Add note about GradientState being in-sync with the dataloader by default (#2134)

* Note about sync

* PR review comments
muellerzr committed Nov 14, 2023
1 parent b55855a commit 8dedb14
Showing 1 changed file with 17 additions and 1 deletion.
18 changes: 17 additions & 1 deletion docs/source/usage_guides/gradient_accumulation.md
@@ -118,8 +118,24 @@ You can remove all the special checks for the step number and the loss adjustment
As you can see, the [`Accelerator`] is able to keep track of the batch number you are on, and it will automatically know whether to step through the prepared optimizer and how to adjust the loss.
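
For reference, a minimal sketch of the accumulation loop this refers to, assuming `model`, `optimizer`, `scheduler`, `training_dataloader`, and `loss_function` are already defined:

```python
from accelerate import Accelerator

# Accumulate gradients over 2 batches before each optimizer step
accelerator = Accelerator(gradient_accumulation_steps=2)
model, optimizer, training_dataloader, scheduler = accelerator.prepare(
    model, optimizer, training_dataloader, scheduler
)

for batch in training_dataloader:
    with accelerator.accumulate(model):
        inputs, targets = batch
        outputs = model(inputs)
        loss = loss_function(outputs, targets)  # placeholder loss computation
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```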

<Tip>

Typically with gradient accumulation, you would need to adjust the number of steps to reflect the change in total batches you are
training on. 🤗 Accelerate automagically does this for you by default. Behind the scenes we instantiate a [`GradientAccumulationPlugin`] configured to do this.

</Tip>
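
For illustration, a rough sketch of what that default setup corresponds to if you were to build the plugin yourself (assuming the `num_steps` and `adjust_scheduler` fields of [`GradientAccumulationPlugin`]):

```python
from accelerate import Accelerator
from accelerate.utils import GradientAccumulationPlugin

# Roughly what `Accelerator(gradient_accumulation_steps=4)` configures behind the scenes:
# accumulate over 4 steps and adjust the scheduler to account for the accumulation.
plugin = GradientAccumulationPlugin(num_steps=4, adjust_scheduler=True)
accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```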

<Tip warning={true}>

The [`state.GradientState`] is synced with the active dataloader being iterated upon. As such, it naively assumes that when the end of the dataloader is reached, everything will sync and a step will be performed. To disable this behavior, set `sync_with_dataloader` to `False` in the [`GradientAccumulationPlugin`]:

```python
from accelerate import Accelerator
from accelerate.utils import GradientAccumulationPlugin

# Do not treat the end of the dataloader as an implicit sync/step point
plugin = GradientAccumulationPlugin(sync_with_dataloader=False)
accelerator = Accelerator(..., gradient_accumulation_plugin=plugin)
```

</Tip>

## The finished code
