use Fused AdamW as default #881
Conversation
tianyu-l left a comment
Thanks for verifying the performance. I have a suggestion inline.
Also, regarding https://github.com/pytorch/torchtitan/blob/main/torchtitan/components/optimizer.py#L212-L213
can fused and foreach coexist?
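For context, a minimal sketch of how recent PyTorch releases treat the two flags on torch.optim.AdamW (the exact error text may differ by version, so worth re-verifying locally):
~~~
import torch

# fused AdamW needs parameters on a supported device (e.g. CUDA)
params = [torch.nn.Parameter(torch.randn(4, 4, device="cuda"))]

torch.optim.AdamW(params, lr=1e-3, fused=True)    # ok: fused kernel
torch.optim.AdamW(params, lr=1e-3, foreach=True)  # ok: multi-tensor path

# the two are mutually exclusive; enabling both raises at construction time
try:
    torch.optim.AdamW(params, lr=1e-3, fused=True, foreach=True)
except RuntimeError as err:
    print(err)  # e.g. "`fused` and `foreach` cannot be `True` together."
~~~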
Updated based on PR feedback to ensure a command-line disable is supported (TOML support was already there). Added --optimizer.disable_fused support and verified both cases: with the flag set (fused disabled) and without it (the default fused setting is used).
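A rough sketch of what that toggle could look like in the job config, assuming a dataclass-style optimizer section like torchtitan's (only the disable_fused field comes from this PR; the surrounding structure is illustrative):
~~~
from dataclasses import dataclass

@dataclass
class OptimizerConfig:
    # illustrative fields; only disable_fused is the flag discussed here,
    # settable via --optimizer.disable_fused or the job .toml
    name: str = "AdamW"
    lr: float = 8e-4
    disable_fused: bool = False  # fused stays the default unless disabled

def adamw_kwargs(cfg: OptimizerConfig) -> dict:
    # map the boolean onto torch.optim.AdamW's `fused` kwarg
    return {"lr": cfg.lr, "fused": not cfg.disable_fused}
~~~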
tianyu-l left a comment
LGTM.
Could you fix CI before merging?
In particular, please update the fused test into a foreach test: https://github.com/pytorch/torchtitan/blob/main/tests/integration_tests.py#L267-L274
Also, there seem to be two optimizers used, which we should trim to one.
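The updated test entry might look roughly like this (a sketch only: the OverrideDefinitions shape is inferred from the linked file, and the exact fields and flags may differ):
~~~
OverrideDefinitions(
    [
        [
            # fused is now the default, so flip this test to cover
            # the foreach path instead
            "--optimizer.implementation foreach",
        ],
    ],
    "Foreach optimizer",
    "optimizer_foreach",
)
~~~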
Currently, Titan does not use fused AdamW by default. This PR makes fused the new default.
After benchmarking the current parallelisms with Llama 8B, I found an average speedup of 2.64%, as follows:
| Model | Parallelism | Fused AdamW speedup |
| --- | --- | --- |
| 8B | FSDP, eager | 2.24% |
| 8B | FSDP, compile | 1.63% |
| 8B | TP | 3.62% |
| 8B | AsyncTP | 3.26% |
| 8B | CP | 2.97% |
| (debug model) | PP | 2.10% |
| Gains | Average | 2.64% |
| | Min | 1.63% |
| | Max | 3.62% |
Updated to add --optimizer.implementation with support for ["for-loop", "foreach", "fused"].
Testing:
Beyond verifying no issues with all the parallelisms above, I verified that fused / foreach / for-loop is set correctly with the new default config:
~~~
[rank0]:Using foreach implementation for optimizer
[rank0]:foreach=True, fused=False
~~~
~~~
[rank0]:Using for-loop implementation for optimizer
[rank0]:foreach=False, fused=False
~~~
~~~
[rank0]:Using fused implementation for optimizer
[rank0]:foreach=False, fused=True
~~~