I'm re-using the Trainer implemented in examples.classification.src.trainer. It largely looks like a port of the original Trainer source code, but I noticed that it has an additional check that stops training when distributed training or multiple GPUs are detected. Specifically:
if self.args.local_rank != -1 or self.args.n_gpu > 1:
    raise ValueError("Multi-gpu and distributed training is currently not supported.")
What could go wrong if I comment this out and let multi-GPU training proceed with torch.nn.DataParallel(model)? I appreciate the well-written code; thanks in advance for the help.
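To be concrete, here is a minimal sketch of what I have in mind (the toy model, data, and variable names are placeholders, not code from this repo):

import torch
import torch.nn as nn

# Sketch only: what removing the check and wrapping with DataParallel would amount to.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 2).to(device)

if torch.cuda.device_count() > 1:
    # DataParallel replicates the model on each GPU and splits the batch dimension.
    model = nn.DataParallel(model)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 16, device=device)
labels = torch.randint(0, 2, (8,), device=device)

logits = model(inputs)  # outputs are gathered back onto the default device
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()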