Smp grad accum #10488
Conversation
@@ -108,7 +107,7 @@ def _wrap_model(self, model, training=True):
        # Wrapping the base model twice in a DistributedModel will raise an error.
        if isinstance(self.model_wrapped, smp.model.DistributedModel):
            return self.model_wrapped
-       return smp.DistributedModel(model)
+       return smp.DistributedModel(model, backward_passes_per_step=self.args.gradient_accumulation_steps)
This does the equivalent of `no_sync` in regular DDP.
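For context, a minimal sketch of that pattern with plain `DistributedDataParallel`: the gradient all-reduce is skipped on intermediate accumulation steps and only happens on the step that calls `optimizer.step()`. The helper name and loop below are illustrative, not the Trainer's actual code.

```python
import contextlib


def train_with_accumulation(model, dataloader, optimizer, accumulation_steps):
    """Illustrative gradient accumulation loop for a DDP-wrapped model."""
    for step, batch in enumerate(dataloader):
        is_sync_step = (step + 1) % accumulation_steps == 0
        # model.no_sync() delays the DDP gradient all-reduce on intermediate steps.
        context = contextlib.nullcontext() if is_sync_step else model.no_sync()
        with context:
            loss = model(**batch).loss / accumulation_steps  # assumes an HF-style output with .loss
            loss.backward()
        if is_sync_step:
            optimizer.step()
            optimizer.zero_grad()
```

With SMP, passing `backward_passes_per_step` to `smp.DistributedModel` achieves the same effect without the explicit `no_sync` context.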
def _no_sync_in_gradient_accumulation(self):
    """
    Whether or not to use no_sync for the gradients when doing gradient accumulation.
    """
    return not self.deepspeed
This is introduced to make it easy to skip the no_sync part in subclasses.
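A rough sketch of how such a hook could gate the `no_sync` context in the inner training loop; the helper below is illustrative and may not match the PR's exact code.

```python
import contextlib


def accumulation_context(self, model, step):
    """Return the context manager to use around the backward pass.

    Illustrative helper, not part of the actual Trainer API.
    """
    skip_sync = (
        (step + 1) % self.args.gradient_accumulation_steps != 0  # intermediate accumulation step
        and self.args.local_rank != -1                            # distributed training is active
        and self._no_sync_in_gradient_accumulation()              # subclasses can opt out here
    )
    # model.no_sync() delays the DDP gradient all-reduce; nullcontext is a no-op.
    return model.no_sync() if skip_sync else contextlib.nullcontext()
```

A subclass that already handles synchronization itself (for example through SMP's `backward_passes_per_step`) could then override `_no_sync_in_gradient_accumulation` to return `False`, which is presumably the point of factoring it out.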
Thanks a lot for the fix! LGTM.
Very nice! LGTM!
What does this PR do?
This PR adds support for gradient accumulation in `SageMakerTrainer`. It has been tested on the glue script with success (with and without gradient accumulation passed along).
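For illustration, a hedged sketch of enabling gradient accumulation through the training arguments; the import path and argument values below are assumptions, not verbatim from this PR.

```python
# Illustrative only: import path and argument names are assumptions.
from transformers.sagemaker import SageMakerTrainer, SageMakerTrainingArguments

args = SageMakerTrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    # Forwarded to smp.DistributedModel as backward_passes_per_step by this PR.
    gradient_accumulation_steps=4,
)
trainer = SageMakerTrainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```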