_pickle.UnpicklingError: pickle data was truncated when using more than one GPU #9

Closed
zhong-yy opened this issue Apr 3, 2023 · 1 comment

Comments

@zhong-yy (Contributor) commented Apr 3, 2023

I got the following error when using 2 GPUs. Training runs normally when only one GPU is used.

Epoch 0:   0%|                                           | 0/25 [00:00<?, ?it/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 257, in <module>
    train(config, experiment_name, test_run=args.test_run)
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 87, in train
    trainer.fit(model, train_loader, dev_loader)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 113, in launch
    mp.start_processes(
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 140, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGKILL
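If I read the trace correctly, the failure happens in Lightning's spawn-based multiprocessing launcher: the parent process pickles the training objects and sends them to each spawned worker, and process 1 being terminated with SIGKILL (possibly the OOM killer) leaves the other side reading a truncated pickle stream. A minimal sketch of that spawn pattern for illustration only; this is not code from this repository or from pytorch_lightning itself:

import torch.multiprocessing as mp

def worker(rank: int, payload: dict) -> None:
    # Each spawned worker receives an unpickled copy of `payload`.
    print(f"worker {rank} received {len(payload['data'])} items")

if __name__ == "__main__":
    payload = {"data": list(range(1000))}
    # The arguments are pickled and handed to every worker process;
    # if a worker dies mid-transfer, the reader sees a truncated stream.
    mp.spawn(worker, args=(payload,), nprocs=2)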

@yetinam (Member) commented Apr 3, 2023

Hi @zhong-yy,
thanks for reporting this. We have not tested multi-GPU training and are not planning to implement it for this benchmark. Personally, I wouldn't expect any speed-up from multi-GPU training for now, as the training is not GPU-bound. Nonetheless, if you feel like adding multi-GPU functionality, I'd be happy to review a PR.
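
If anyone wants to explore such a PR, a rough, untested sketch of what a multi-GPU Trainer configuration could look like (the accelerator/devices/strategy arguments are standard pytorch_lightning options, not code from benchmark/train.py; how it would be wired into the training script is still open):

import pytorch_lightning as pl

# Hypothetical configuration, not taken from benchmark/train.py:
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,        # number of GPUs to use
    strategy="ddp",   # subprocess-based DDP; avoids the spawn launcher's pickling step
)
# trainer.fit(model, train_loader, dev_loader)  # as in train.py

The "ddp" strategy relaunches the script in subprocesses instead of pickling state to spawned workers, which sidesteps the truncated-pickle failure mode above, but whether it actually helps here would need testing.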

@yetinam yetinam closed this as completed Apr 3, 2023