MisconfigurationException: Do not set gradient_accumulation_steps in the DeepSpeed config #19891
Bug description
I want to use gradient accumulation in my training process, which uses a manually configured `DeepSpeedStrategy` (via a config file) for distributed training. My first attempt was to set `gradient_accumulation_steps` in `deepspeed_config.json` while simultaneously passing the same value to the `Trainer` as an argument. In this case, `lightning` raises the `MisconfigurationException` from the issue title (full traceback below).

That's fine, but when I follow this advice and unset `gradient_accumulation_steps` in the DeepSpeed config, the `deepspeed` library throws an exception of its own (second traceback below).

As a consequence, I'm unable to use gradient accumulation with `DeepSpeedStrategy`. Am I doing something wrong here, or is this actually a conflict between `deepspeed` and `lightning`? In my view, it would be sufficient if `lightning` threw a warning here, or simply followed the `deepspeed` configuration file.

I've tested this with `deepspeed==0.12.6` and `deepspeed==0.14.2` (latest).
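For reference, the conflicting setup looks roughly like this (a minimal sketch, not my actual script; the batch size and ZeRO stage are placeholders, and `DeepSpeedStrategy` accepts either the inline dict shown here or the path to `deepspeed_config.json`):

```python
# Sketch of the conflicting configuration (placeholder values).
from lightning import Trainer
from lightning.pytorch.strategies import DeepSpeedStrategy

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 2},
    "gradient_accumulation_steps": 4,  # attempt 1: keep this key -> lightning raises
    # attempt 2: delete the key above -> deepspeed raises instead
}

trainer = Trainer(
    strategy=DeepSpeedStrategy(config=ds_config),
    accumulate_grad_batches=4,  # same value as in the DeepSpeed config
)
```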
What version are you seeing the problem on?
v2.2
How to reproduce the bug
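A self-contained sketch that should hit the same conflict (the `ToyModel` module, random data, and hyperparameters below are placeholders, not my actual setup):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import lightning as L
from lightning.pytorch.strategies import DeepSpeedStrategy


class ToyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# Equivalent to pointing DeepSpeedStrategy at deepspeed_config.json.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 2},
    # Keeping this key while also passing accumulate_grad_batches to the Trainer
    # triggers the MisconfigurationException; removing it makes deepspeed itself
    # raise instead.
    "gradient_accumulation_steps": 2,
}

data = DataLoader(
    TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))),
    batch_size=4,
)

trainer = L.Trainer(
    strategy=DeepSpeedStrategy(config=ds_config),
    accelerator="gpu",
    devices=1,
    max_epochs=1,
    accumulate_grad_batches=2,
)
trainer.fit(ToyModel(), data)
```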
Error messages and logs
Here's the full traceback of the first exception thrown by `lightning`.

And here's the full traceback of the second exception thrown by `deepspeed`.

Environment
Current environment
More info
No response