_pickle.UnpicklingError: pickle data was truncated when using more than one GPU #9

Closed
zhong-yy opened this issue Apr 3, 2023 · 1 comment

Comments

@zhong-yy (Contributor) commented Apr 3, 2023

I got the following error when using 2 GPUs. Training runs normally when only one GPU is used.

Epoch 0:   0%|                                           | 0/25 [00:00<?, ?it/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 257, in <module>
    train(config, experiment_name, test_run=args.test_run)
  File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 87, in train
    trainer.fit(model, train_loader, dev_loader)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 113, in launch
    mp.start_processes(
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 140, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGKILL
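If I read the trace correctly, the failure happens in Lightning's spawn-based multiprocessing launcher: the parent process pickles the training objects and sends them to each spawned worker, and process 1 being terminated with SIGKILL (possibly the OOM killer) leaves the other side reading a truncated pickle stream. A minimal sketch of that spawn pattern for illustration only; this is not code from this repository or from pytorch_lightning itself:

import torch.multiprocessing as mp

def worker(rank: int, payload: dict) -> None:
    # Each spawned worker receives an unpickled copy of `payload`.
    print(f"worker {rank} received {len(payload['data'])} items")

if __name__ == "__main__":
    payload = {"data": list(range(1000))}
    # The arguments are pickled and handed to every worker process;
    # if a worker dies mid-transfer, the reader sees a truncated stream.
    mp.spawn(worker, args=(payload,), nprocs=2)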

@yetinam (Member) commented Apr 3, 2023

Hi @zhong-yy,
thanks for reporting this. We have not tested multi-GPU training and are not planning to implement it for this benchmark. Personally, I wouldn't expect any speed-up from multi-GPU training for now, as the training is not GPU-bound. Nonetheless, if you feel like adding multi-GPU functionality, I'd be happy to review a PR.
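
If anyone wants to explore such a PR, a rough, untested sketch of what a multi-GPU Trainer configuration could look like (the accelerator/devices/strategy arguments are standard pytorch_lightning options, not code from benchmark/train.py; how it would be wired into the training script is still open):

import pytorch_lightning as pl

# Hypothetical configuration, not taken from benchmark/train.py:
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,        # number of GPUs to use
    strategy="ddp",   # subprocess-based DDP; avoids the spawn launcher's pickling step
)
# trainer.fit(model, train_loader, dev_loader)  # as in train.py

The "ddp" strategy relaunches the script in subprocesses instead of pickling state to spawned workers, which sidesteps the truncated-pickle failure mode above, but whether it actually helps here would need testing.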

@yetinam yetinam closed this as completed Apr 3, 2023