Training on multiple GPUs #41

mohummedalee · 2024-06-11T22:12:14Z

I'm re-using the Trainer implemented in examples.classification.src.trainer. It largely looks like a port of the original Trainer source code, but I noticed that it has an additional check that stops training when multiple GPUs are available. Specifically:

if self.args.local_rank != -1 or self.args.n_gpu > 1:
    raise ValueError("Multi-gpu and distributed training is currently not supported.")

What could go wrong if I comment this out and let training proceed on multiple GPUs with torch.nn.DataParallel(model)? I appreciate the well-written code. Thanks in advance for the help.
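
For concreteness, here is a minimal sketch of the kind of change being asked about. It is not taken from the repository's Trainer: the helper name `maybe_wrap_data_parallel` and the `nn.Linear` stand-in model are hypothetical, and `n_gpu` is assumed to mirror `self.args.n_gpu`.

```python
import torch
from torch import nn


def maybe_wrap_data_parallel(model: nn.Module, n_gpu: int) -> nn.Module:
    """Hypothetical helper: wrap `model` in nn.DataParallel when n_gpu > 1."""
    if n_gpu > 1:
        # DataParallel keeps the parameters on the first device, replicates
        # the module on every forward pass, and splits the batch along dim 0.
        return nn.DataParallel(model.cuda())
    # With zero or one GPU, the model is returned unchanged.
    return model


# Stand-in for the real classification model (assumption for illustration).
model = nn.Linear(4, 2)
n_gpu = torch.cuda.device_count()  # assumed to match self.args.n_gpu
wrapped = maybe_wrap_data_parallel(model, n_gpu)
```

One thing this sketch makes visible: after wrapping, the original module sits under `wrapped.module`, so any Trainer code that reads attributes directly from the model would presumably need to account for that.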
