I got the following error when using 2 GPUs. It runs normally when only one GPU is used.
Epoch 0: 0%| | 0/25 [00:00<?, ?it/s]Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 257, in <module>
train(config, experiment_name, test_run=args.test_run)
File "/home/zhongyiyuan/pick-benchmark/benchmark/train.py", line 87, in train
trainer.fit(model, train_loader, dev_loader)
File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
call._call_and_handle_interrupt(
File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 113, in launch
mp.start_processes(
File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/zhongyiyuan/miniconda3/envs/seisbench_v0.3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 140, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGKILL
Hi @zhong-yy,
thanks for reporting this. We have not tested multi-GPU training and are not planning to implement it in this benchmark. Personally, I wouldn't expect any speed-up from multi-GPU training for now, as the training is not GPU-bound. Nonetheless, if you feel like adding multi-GPU functionality, I'd be happy to review a PR.
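For reference, here is a minimal sketch of what a multi-GPU configuration could look like with the PyTorch Lightning Trainer (the library version implied by the traceback, ~1.x). The traceback goes through the spawn-based launcher (`ddp_spawn`), which pickles the training objects into each worker process and can surface failures like the "pickle data was truncated" / SIGKILL pair above; the subprocess-based `"ddp"` strategy avoids that pickling step and would be one thing to try. This is an untested sketch, not a change to the benchmark's train.py; the `max_epochs` value and the commented `fit()` arguments are placeholders.

```python
import pytorch_lightning as pl

# Hedged sketch: use the subprocess-based DDP strategy instead of the spawn
# launcher seen in the traceback. Argument names follow the PyTorch Lightning
# 1.x Trainer API.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,        # two GPUs, as in the failing run
    strategy="ddp",   # subprocess launcher instead of ddp_spawn
    max_epochs=100,   # placeholder value
)
# trainer.fit(model, train_loader, dev_loader)  # same call as in train.py
```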