Make the Scheduler adjust the steps taken relative to the gradient accumulation steps #1187
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for working on this. Just not sure about the API (adding yet another argument to the Accelerator) for this.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@sgugger good for review, a few notes:
Thanks for iterating! I agree that the default should be True for adjust_scheduler. The gradient accumulation API is quite new anyway, so it's okay if it's adjusted a bit like this.
src/accelerate/utils/dataclasses.py (outdated)
num_steps: int = field(default=None, metadata={"help": "The number of steps to accumulate gradients for."})
adjust_scheduler: bool = field(
    default=False,
This should be True by default I think.
Let the AcceleratedScheduler handle gradient accumulation steppage

What does this add?
This PR adjusts the logic in the AcceleratedScheduler to take into account gradient accumulation steps.

Who is it for?
Closes #1170
Closes #1160
Why is it needed?
Currently there is no behavior that automatically "cuts" the LR scheduler for a user when they pass in gradient_accumulation_steps, so unless they are careful and adjust their LR scheduler beforehand, they are not actually stepping the LR scheduler properly.
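A toy illustration of the problem (plain PyTorch, not accelerate code; the batch count, accumulation factor, and LambdaLR schedule are made up): if the scheduler only advances on real optimizer updates, an accumulation factor of 4 means only a quarter of the planned scheduler steps ever happen.

```python
import torch

num_batches, accumulation_steps = 100, 4

model = torch.nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
# Schedule sized for one step per batch, decaying linearly to zero.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: 1.0 - step / num_batches
)

for batch in range(num_batches):
    loss = model(torch.randn(2)).sum() / accumulation_steps
    loss.backward()
    if (batch + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()  # only stepped on real optimizer updates

print(scheduler.last_epoch)     # 25 -- only a quarter of the planned steps
print(scheduler.get_last_lr())  # [0.75] -- the LR never decays to zero
```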
What parts of the API does this impact?

User-facing:
A new GradientAccumulationPlugin is being added which will handle gradient_accumulation_steps and optionally disable the extra steppage involved with the scheduler when performing gradient accumulation.
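A minimal sketch of the intended usage, assuming the num_steps and adjust_scheduler fields shown in the diff above and a gradient_accumulation_plugin argument on the Accelerator (both taken from this PR, so treat the names as illustrative rather than final):

```python
from accelerate import Accelerator
from accelerate.utils import GradientAccumulationPlugin

# Sketch only: the field names come from the diff above; the keyword
# argument on Accelerator is assumed here for illustration.
plugin = GradientAccumulationPlugin(num_steps=2, adjust_scheduler=True)
accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```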
Internal structure:
AcceleratedScheduler's step function will now run n * num_processes, where n == gradient_accumulation_steps, to account for the difference (a simplified sketch of this stepping behavior follows below).

To test performance, I ran the equivalent training with gradient_accumulation_steps==2 on a batch size of 16 against a regular batch size of 32, with negligible performance differences (a difference of 0.5% accuracy and 0.3 F1), likely due to batch norm layers.
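A simplified sketch (not the actual accelerate source) of the stepping behavior described under "Internal structure": once the optimizer has really stepped, the wrapped scheduler is advanced gradient_accumulation_steps * num_processes times so it keeps pace with a schedule written for one step per batch on a single process.

```python
class SchedulerWrapperSketch:
    """Illustrative stand-in for the AcceleratedScheduler stepping logic."""

    def __init__(self, scheduler, num_processes, gradient_accumulation_steps):
        self.scheduler = scheduler
        self.num_processes = num_processes
        self.gradient_accumulation_steps = gradient_accumulation_steps

    def step(self, optimizer_stepped: bool):
        # During accumulation the optimizer did not update, so the schedule
        # should not advance on this batch.
        if not optimizer_stepped:
            return
        # One real update stands in for gradient_accumulation_steps batches on
        # each of num_processes workers, so advance the wrapped scheduler that
        # many times to keep it on its original per-batch schedule.
        for _ in range(self.gradient_accumulation_steps * self.num_processes):
            self.scheduler.step()
```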
Usage Example:
When building the Accelerator, pass in adjust_scheduler_to_accumulation (default False) to enable this behavior:
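A sketch of that usage (adjust_scheduler_to_accumulation is the argument name used in this PR description at this point and was still under discussion in review; the model, optimizer, scheduler, and data setup are purely illustrative):

```python
import torch
from accelerate import Accelerator

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 8), batch_size=16)

# `adjust_scheduler_to_accumulation` is the flag proposed in this PR's
# description; treat it as illustrative rather than the final API.
accelerator = Accelerator(
    gradient_accumulation_steps=2,
    adjust_scheduler_to_accumulation=True,
)
model, optimizer, dataloader, scheduler = accelerator.prepare(
    model, optimizer, dataloader, scheduler
)

for batch in dataloader:
    with accelerator.accumulate(model):
        loss = model(batch).sum()
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()  # the prepared scheduler accounts for accumulation
        optimizer.zero_grad()
```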