Difficult to understand the behavior of lr_scheduler when using gradient_accumulation
#1160
Closed
System Info

Information

Tasks
- no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)

Reproduction
Add the following code after this line: link
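For reference, a minimal sketch of the kind of logging that could produce the output below, assuming the names `step` and `lr_scheduler` from the example script (the exact insertion point is the elided link above, so treat this as a reconstruction, not the actual added code):

```python
# Hypothetical logging snippet; `step` and `lr_scheduler` are assumed to be
# the batch counter and the prepared scheduler from the example script.
print(f"step {step} lr {lr_scheduler.get_last_lr()[0]}")
```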
I ran gradient_accumulation.py using the command provided in the documentation.

Expected behavior
Expected: the learning rate is 0 at the end of training. However, the result is not 0:

Epoch 3
step 222 lr 1.8779661016949152e-05
step 223 lr 1.8779661016949152e-05
step 224 lr 1.8745762711864407e-05
step 225 lr 1.8745762711864407e-05
step 226 lr 1.8745762711864407e-05
step 227 lr 1.8745762711864407e-05
step 228 lr 1.8745762711864407e-05
step 229 lr 1.8711864406779663e-05 # the last line of log
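The repeated lr values across consecutive steps suggest the scheduler only advances on some batches. A self-contained sketch of one way such a mismatch can arise (the linear schedule and all numbers below are illustrative assumptions, not taken from the example script): if the scheduler's total step count is sized for one step per batch, but the scheduler is only stepped once per accumulated optimizer update, the decay never completes and the final learning rate stays above 0.

```python
# Self-contained demonstration of the step-count mismatch, not the
# accelerate example itself. All numbers here are illustrative.
import torch

total_batches = 230       # batches seen over all epochs (assumption)
accumulation_steps = 4    # gradient accumulation factor (assumption)

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=2e-5)
# Scheduler sized for a decay step on every batch...
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1 - step / total_batches)
)

for batch in range(total_batches):
    # ...but stepped only when the optimizer actually steps.
    if (batch + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()

# Only total_batches // accumulation_steps decay steps ran,
# so the final lr is well above 0.
print(scheduler.get_last_lr()[0])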