Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 9 additions & 4 deletions src/transformers/trainer.py
Original file line number Diff line number Diff line change
Expand Up @@ -264,11 +264,14 @@ def __init__(
self.eval_dataset = eval_dataset
self.tokenizer = tokenizer

# Model parallel
if not self.is_model_parallel:
# postpone switching model to cuda when:
# 1. MP - since we are trying to fit a much bigger than 1 gpu model
# 2. fp16-enabled DeepSpeed loads the model in half the size and it doesn't need .to() anyway
if not (self.is_model_parallel or args.deepspeed):
model = model.to(args.device)
else:
# Force n_gpu to 1 to avoid DataParallel.

# Force n_gpu to 1 to avoid DataParallel as MP will manage the GPUs
if self.is_model_parallel:
self.args._n_gpu = 1

# later use `self.model is self.model_wrapped` to check if it's wrapped or not
Expand Down Expand Up @@ -790,6 +793,8 @@ def train(
model = ShardedDDP(model, self.optimizer)
elif is_sagemaker_distributed_available():
model = DDP(model, device_ids=[dist.get_local_rank()], broadcast_buffers=False)
if self.deepspeed:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this breaks most integrations, it should be an elif so that we don't fall into the branches after if TPU or sagemaker is here.
Will fix in a commit on master.

Copy link
Contributor Author

@stas00 stas00 Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh boy, my apologies, my branching skills went haywire yesterday.

just the fact that one puts an if foo really close to an existing set of conditionals doesn't make it part of it. need a different programming language that will be more do-what-i-mean-when-i-am-tired

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the fix, @sgugger

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, just wanted to alert you :-) Thankfully we found this just before cutting the release candidate!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my!

As I said above this literally happened to me several times yesterday, something went haywire and I started adding new branches with just if's adjacent to an existing if/elif/else pile - my brain decided that if they are together it's must be part of the other if/else. So odd. Some new programming language must be percolating through my neurons or a rogue AI took over and is using my brain for its experiments.

pass # already initialized its own DDP earlier
elif self.args.local_rank != -1:
if self.args.ddp_find_unused_parameters is not None:
find_unused_parameters = self.args.ddp_find_unused_parameters
Expand Down