Difficult to understand the behavior of lr_scheduler when using gradient_accumulation
#1160
Closed
System Info

Information

Tasks
- no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)

Reproduction
Add the following code after this line: link
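For reference, a minimal sketch of the kind of logging that could produce the output below, assuming the names `step` and `lr_scheduler` from the example script (the exact insertion point is the elided link above, so treat this as a reconstruction, not the actual added code):

```python
# Hypothetical logging snippet; `step` and `lr_scheduler` are assumed to be
# the batch counter and the prepared scheduler from the example script.
print(f"step {step} lr {lr_scheduler.get_last_lr()[0]}")
```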
I ran gradient_accumulation.py using the command provided in the documentation.

Expected behavior
Expected: the learning rate is 0 at the end of training. However, the result is not 0:

Epoch 3
step 222 lr 1.8779661016949152e-05
step 223 lr 1.8779661016949152e-05
step 224 lr 1.8745762711864407e-05
step 225 lr 1.8745762711864407e-05
step 226 lr 1.8745762711864407e-05
step 227 lr 1.8745762711864407e-05
step 228 lr 1.8745762711864407e-05
step 229 lr 1.8711864406779663e-05 # the last line of log
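The repeated lr values across consecutive steps suggest the scheduler only advances on some batches. A self-contained sketch of one way such a mismatch can arise (the linear schedule and all numbers below are illustrative assumptions, not taken from the example script): if the scheduler's total step count is sized for one step per batch, but the scheduler is only stepped once per accumulated optimizer update, the decay never completes and the final learning rate stays above 0.

```python
# Self-contained demonstration of the step-count mismatch, not the
# accelerate example itself. All numbers here are illustrative.
import torch

total_batches = 230       # batches seen over all epochs (assumption)
accumulation_steps = 4    # gradient accumulation factor (assumption)

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=2e-5)
# Scheduler sized for a decay step on every batch...
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1 - step / total_batches)
)

for batch in range(total_batches):
    # ...but stepped only when the optimizer actually steps.
    if (batch + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()

# Only total_batches // accumulation_steps decay steps ran,
# so the final lr is well above 0.
print(scheduler.get_last_lr()[0])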