
Multi GPU Training for a unet_learner #450

Closed
tikurahul opened this issue Aug 3, 2020 · 1 comment

Comments


tikurahul commented Aug 3, 2020

I am trying to figure out how to use multiple GPUs to speed up training for my segmentation model.

I looked at PyTorch's documentation (nn.DataParallel) and this link. However, I have not had success so far.

My first attempt was something like this:

if torch.cuda.device_count() > 1:
    wrapped_model = nn.DataParallel(learner.model)
    learner.model = wrapped_model.module

This does not have the intended effect. I only see 1 GPU being used.
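
For comparison, here is a minimal sketch of how I understand plain PyTorch expects nn.DataParallel to be attached, with the wrapper itself kept as the model rather than unwrapping it again via .module (I have not verified that a fastai Learner trains correctly with the wrapper assigned directly):

import torch
import torch.nn as nn

if torch.cuda.device_count() > 1:
    # keep the DataParallel wrapper; .module would just hand back the original single-GPU model
    learner.model = nn.DataParallel(learner.model, device_ids=[0, 1])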

I also saw the documentation here but from what I can tell unet_learner does not have the parallel_ctx context manager.
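
If I understand the source correctly, parallel_ctx is patched onto Learner by fastai.distributed, so it may only appear once that module has been imported. A sketch of what I would expect to work, assuming parallel_ctx accepts a device_ids argument:

from fastai.distributed import *  # my assumption: this patches parallel_ctx onto Learner

with learner.parallel_ctx(device_ids=[0, 1]):
    learner.fine_tune(20, freeze_epochs=2, wd=0.01, base_lr=0.0006)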

The other thing I tried doing was:

callbacks = [
    ParallelTrainer(device_ids=[0, 1]),
    EarlyStoppingCallback(min_delta=0.001, patience=5)
]
learner.fine_tune(20, freeze_epochs=2, wd=0.01, base_lr=0.0006, cbs=callbacks)

This is more promising, but I end up with the following error message:

RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_batch_norm)
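
Reading the error, the input batch and the batch-norm weights seem to end up on different GPUs. Below is a rough sketch of the starting state I believe is expected before wrapping, assuming everything should begin on the first device in device_ids; I have not confirmed that this resolves the error:

import torch

torch.cuda.set_device(0)                  # make cuda:0 the default CUDA device
learner.dls.to(torch.device('cuda:0'))    # move the DataLoaders so batches start on cuda:0
learner.model.to(torch.device('cuda:0'))  # move the model parameters to cuda:0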

jph00 commented Aug 3, 2020

Please use the forums for help.

jph00 closed this as completed on Aug 3, 2020