@jeffra
Collaborator

@jeffra jeffra commented Jul 29, 2021

No description provided.

@jeffra jeffra changed the title from "pass GAS boundary state from PP -> ZeRO" to "Correctness fix PP+ZeRO for gradient accumulation" Jul 29, 2021

def allreduce_gradients(self, bucket_size=MEMORY_OPT_ALLREDUCE_SIZE):
    # Pass (PP) gas boundary flag to optimizer (required for zero)
    self.optimizer.is_gradient_accumulation_boundary = self.is_gradient_accumulation_boundary(
Contributor

Could self.optimizer be an fp32 optimizer, our fp16 optimizer, or client optimizer? Is it safe to assume self.optimizer.is_gradient_accumulation_boundary?
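
To make the question above concrete, here is a minimal, self-contained sketch of the pattern under discussion. It is not DeepSpeed's implementation: `ToyPipelineEngine`, `ToyZeroOptimizer`, `ToyClientOptimizer`, `reduce_gradients`, and `run_micro_batch` are hypothetical names invented for illustration, and the boundary rule (every `gradient_accumulation_steps` micro-batches) is an assumption. It shows the engine writing its boundary flag onto the optimizer, an optimizer that reads the flag and reduces only at the boundary, and a plain optimizer that never reads the flag and so is unaffected by the assignment.

```python
class ToyZeroOptimizer:
    """Hypothetical stand-in for an optimizer that reduces gradients only at the boundary."""

    def __init__(self):
        self.is_gradient_accumulation_boundary = True
        self.reduce_calls = 0

    def reduce_gradients(self):
        # Only communicate once the accumulation window is complete.
        if self.is_gradient_accumulation_boundary:
            self.reduce_calls += 1


class ToyClientOptimizer:
    """Hypothetical client optimizer that knows nothing about the boundary flag."""

    def step(self):
        return "stepped"


class ToyPipelineEngine:
    """Hypothetical engine that owns the micro-step counter (assumed boundary rule)."""

    def __init__(self, optimizer, gradient_accumulation_steps):
        self.optimizer = optimizer
        self.gradient_accumulation_steps = gradient_accumulation_steps
        self.micro_steps = 0

    def is_gradient_accumulation_boundary(self):
        return (self.micro_steps + 1) % self.gradient_accumulation_steps == 0

    def allreduce_gradients(self):
        # Pass the engine's view of the boundary to the optimizer.
        # Plain attribute assignment works on ordinary Python objects
        # (classes without __slots__), whether or not they read the flag.
        self.optimizer.is_gradient_accumulation_boundary = (
            self.is_gradient_accumulation_boundary()
        )
        if hasattr(self.optimizer, "reduce_gradients"):
            self.optimizer.reduce_gradients()

    def run_micro_batch(self):
        self.allreduce_gradients()
        self.micro_steps += 1


zero_engine = ToyPipelineEngine(ToyZeroOptimizer(), gradient_accumulation_steps=4)
for _ in range(8):
    zero_engine.run_micro_batch()

client_engine = ToyPipelineEngine(ToyClientOptimizer(), gradient_accumulation_steps=4)
for _ in range(8):
    client_engine.run_micro_batch()
```

In this toy setup the flag is inert for an optimizer that never reads it. Whether the same holds for every optimizer wrapper in the real code base is exactly what the review comment asks, and the sketch does not settle that.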

@MichaelEk

This PR fixes this issue: #1217

@jeffra jeffra merged commit b712bab into master Sep 10, 2021
@jeffra jeffra deleted the jeffra/pp-zero-gas-fix branch September 10, 2021 14:47
@stas00
Collaborator

stas00 commented Sep 18, 2021

@jeffra, could you please replay to @big-science? Thank you!

@jeffra
Collaborator Author

jeffra commented Sep 18, 2021

I believe this is already replayed on big-science branch: f93e22b

I think this commit was mixed with a few other items as well though.

@stas00
Collaborator

stas00 commented Sep 18, 2021

Indeed. Thank you for finding it, Jeff! I was just comparing the titles of the commits.

6 participants