From 8dedb140ef8995b4ff6f4b0e2452369a0ab1a969 Mon Sep 17 00:00:00 2001
From: Zach Mueller
Date: Tue, 14 Nov 2023 11:53:57 -0500
Subject: [PATCH] Add note about GradientState being in-sync with the dataloader by default (#2134)

* Note about sync

* PR review comments
---
 .../usage_guides/gradient_accumulation.md | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/docs/source/usage_guides/gradient_accumulation.md b/docs/source/usage_guides/gradient_accumulation.md
index 54863015d8b..7960e6b0e4c 100644
--- a/docs/source/usage_guides/gradient_accumulation.md
+++ b/docs/source/usage_guides/gradient_accumulation.md
@@ -118,8 +118,24 @@ You can remove all the special checks for the step number and the loss adjustmen
 As you can see the [`Accelerator`] is able to keep track of the batch number you are on and it will automatically know whether to step through the prepared optimizer and how to adjust the loss.
+
 Typically with gradient accumulation, you would need to adjust the number of steps to reflect the change in total batches you are
-training on. 🤗 Accelerate automagically does this for you by default. Behind the scenes we instantiate a GradientAccumulationPlugin configured to do this.
+training on. 🤗 Accelerate automagically does this for you by default. Behind the scenes we instantiate a [`GradientAccumulationPlugin`] configured to do this.
+
+
+
+
+
+The [`state.GradientState`] is sync'd with the active dataloader being iterated upon. As such it assumes naively that when we have reached the end of the dataloader everything will sync and a step will be performed. To disable this, set `sync_with_dataloader` to be `False` in the [`GradientAccumulationPlugin`]:
+
+```{python}
+from accelerate import Accelerator
+from accelerate.utils import GradientAccumulationPlugin
+
+plugin = GradientAccumulationPlugin(sync_with_dataloader=False)
+accelerator = Accelerator(..., gradient_accumulation_plugin=plugin)
+```
+
 
 ## The finished code
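
For context alongside the patch, below is a minimal, self-contained sketch of how the plugin configured in the added snippet might plug into an accumulation training loop. It is not part of the diff: the toy model, optimizer, dataset, and the `num_steps=4` value are illustrative assumptions, and the `...` in the patch's `Accelerator(..., gradient_accumulation_plugin=plugin)` stands for whatever other arguments a real setup passes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from accelerate import Accelerator
from accelerate.utils import GradientAccumulationPlugin

# Toy setup so the sketch runs on its own (purely illustrative)
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
dataloader = DataLoader(dataset, batch_size=8)
loss_fn = torch.nn.MSELoss()

# Accumulate over 4 batches and do not force a sync at the end of the dataloader
plugin = GradientAccumulationPlugin(num_steps=4, sync_with_dataloader=False)
accelerator = Accelerator(gradient_accumulation_plugin=plugin)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # `accumulate` tracks the batch count and only syncs gradients and steps
    # the optimizer once every `num_steps` batches
    with accelerator.accumulate(model):
        loss = loss_fn(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```

With `sync_with_dataloader=False`, reaching the end of the dataloader no longer forces a synchronization on its own, so a leftover partial accumulation window at the end of an epoch will not trigger an extra optimizer step.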