set the default to use set_to_none for clearing gradients in BF16 optimizer. #5434

Merged 8 commits into microsoft:master on Apr 23, 2024

inkcherry
Contributor

As discussed in #5175, set the default to use set_to_none for clearing gradients in the BF16 optimizer.
Additionally, when gradients are zeroed instead of being set to None, use foreach_zero.
Verified correctness with mega-ds llama 7B training.

FYI @loadams
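
The two clearing modes described above can be sketched roughly as follows. This is a minimal illustration, not the DeepSpeed implementation: the helper name `clear_grads` and the toy parameters are hypothetical, and float32 CPU tensors stand in for the BF16 parameters. The only library call assumed is PyTorch's `torch._foreach_zero_`, which zeroes a list of tensors in place.

```python
import torch


def clear_grads(params, set_to_none=True):
    """Hypothetical helper illustrating the two ways of clearing gradients."""
    if set_to_none:
        # Drop the gradient tensors entirely; they are re-created on the next backward pass.
        for p in params:
            p.grad = None
    else:
        # Zero all existing gradients in place with a single foreach call.
        grads = [p.grad for p in params if p.grad is not None]
        if grads:
            torch._foreach_zero_(grads)


# Toy parameters with non-zero gradients (float32 here for simplicity).
params = [torch.nn.Parameter(torch.ones(4)) for _ in range(3)]
for p in params:
    p.grad = torch.full_like(p, 2.0)

clear_grads(params, set_to_none=False)
all_zero = all(float(p.grad.abs().sum()) == 0.0 for p in params)

clear_grads(params, set_to_none=True)
all_none = all(p.grad is None for p in params)
```

Setting gradients to None avoids keeping the gradient buffers alive between steps, while the foreach path keeps the buffers and replaces a Python loop of per-tensor `zero_()` calls with one batched call.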

@@ -441,11 +441,20 @@ def clear_hp_grads(self):
            self.fp32_groups_has_gradients[i] = [False] * len(group)

    def clear_lp_grads(self):

        # using zero_() fixed memory address for graph replay
        set_to_none = set_to_none = False if self.graph_harvesting else True
Contributor

Suggested change
- set_to_none = set_to_none = False if self.graph_harvesting else True
+ set_to_none = False if self.graph_harvesting else True

Contributor Author

fixed and reverified
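
For readers following the thread, the reasoning behind the flag can be sketched as below. It is an illustration under stated assumptions, not code from the PR: the standalone variable `graph_harvesting` is a hypothetical stand-in for `self.graph_harvesting`, and a plain CPU tensor stands in for a gradient buffer. The sketch shows that in-place `zero_()` keeps the tensor's storage address, which is what the code comment says graph replay relies on.

```python
import torch

# Hypothetical stand-in for self.graph_harvesting in the optimizer.
graph_harvesting = True

# Equivalent to: False if graph_harvesting else True
set_to_none = not graph_harvesting

# A stand-in gradient buffer.
grad = torch.ones(8)
addr_before = grad.data_ptr()

# In-place zeroing reuses the same storage, so the address is unchanged.
grad.zero_()
addr_after = grad.data_ptr()

same_address = addr_before == addr_after
is_zero = float(grad.sum()) == 0.0
```

The chained assignment in the original line was redundant rather than wrong, so the suggested change does not alter behavior; `not self.graph_harvesting` would be a shorter spelling of the same expression.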

@loadams loadams enabled auto-merge April 22, 2024 22:29
@loadams loadams added this pull request to the merge queue Apr 22, 2024
Merged via the queue into microsoft:master with commit c66bc42 Apr 23, 2024
14 checks passed
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
…imizer. (microsoft#5434)

as discussed in microsoft#5175, set the default to use set_to_none for clearing
gradients in BF16 optimizer.
Additionally, for the case of zero clearing, use foreach_zero.
Verified correctness with mega-ds llama 7B training.

FYI @loadams

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024