error when training with multiple GPUs : AttributeError: Can't pickle local object 'ExtractiveSummarizer.prepare_data.<locals>.longformer_modifier' #25
It's good to hear that you've started training! I've seen this type of error before during abstractive summarization, and I should be able to fix it relatively quickly. I'll have a fix in the next few days.
@HHousen - it's training!
@HHousen - sorry, one more issue: I think my performance is slow because of num_workers. I tried setting it through dataloader_num_workers based on the documentation but got an error.
@moyid The docstring gives two examples of how to split an … I was looking at how the huggingface/transformers seq2seq example deals with this problem. They use the …
Also @moyid, to determine if the number of workers is actually the problem, you can train with the …
Sounds good, I'll try that.
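For reference, a common heuristic for sizing num_workers (an assumption here, not something the thread prescribes) is to derive it from the machine's CPU count; the function name below is illustrative:

```python
import os

def suggested_num_workers(cap=8):
    # Heuristic only: one dataloader worker per CPU core, capped to
    # avoid oversubscribing the machine when several jobs share it.
    cpus = os.cpu_count() or 1
    return min(cpus, cap)
```

Setting the value to 0 keeps data loading on the main process, which is a useful baseline when checking whether the workers themselves are the bottleneck.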
Hi @HHousen -- we have talked in a previous issue -- the good news is that I actually got the longformer training working! But now I'm trying to speed up training by using multiple GPUs. However, I get the following error with multiple GPUs, while it works fine with just 1 GPU:
```
Traceback (most recent call last):
  File "src/main.py", line 393, in <module>
    main(main_args)
  File "src/main.py", line 97, in main
    trainer.fit(model)
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 439, in fit
    results = self.accelerator_backend.train()
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_spawn_accelerator.py", line 65, in train
    mp.spawn(self.ddp_train, nprocs=self.nprocs, args=(self.mp_queue, model,))
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/jupyter/TransformerSum/env/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 149, in start_processes
    process.start()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/opt/conda/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/opt/conda/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/conda/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'ExtractiveSummarizer.prepare_data.<locals>.longformer_modifier'
```
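The root cause is that `longformer_modifier` is defined inside `prepare_data`, and the `spawn` start method must pickle everything it sends to worker processes, which fails for local (nested) functions. A minimal sketch of the failure and the usual fix of moving the callable to module level (the structure here is illustrative, not the repo's actual code):

```python
import pickle

def prepare_data():
    # A function defined inside another function is a "local object"
    # and cannot be pickled by the standard pickle module.
    def longformer_modifier(batch):
        return batch
    return longformer_modifier

# The usual fix: define the callable at module level so pickle can
# reference it by its qualified name.
def longformer_modifier_global(batch):
    return batch

local_fn = prepare_data()
try:
    pickle.dumps(local_fn)
    picklable_local = True
except (AttributeError, pickle.PicklingError):
    # Raises "Can't pickle local object 'prepare_data.<locals>.longformer_modifier'"
    picklable_local = False

# Module-level functions pickle by reference and round-trip intact.
picklable_module = (
    pickle.loads(pickle.dumps(longformer_modifier_global))
    is longformer_modifier_global
)
```

The same constraint applies to lambdas and instance-local closures captured by a DataLoader or model, which is why errors like this often only appear once multi-process training (DDP spawn) is enabled.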