Pipeline warnings and checkpoint portability #588

Merged
ShadenSmith merged 3 commits into deepspeedai:master from ShadenSmith:pp-portable-ckpt
Dec 8, 2020
@ShadenSmith
Contributor

This PR does two things:

  1. Drops the deprecated allreduce_gradients argument to backward().
  2. Fixes a portability bug in pipeline checkpointing that previously prevented checkpoints from being renamed.
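
The description does not spell out the root cause, so the following is only a hypothetical illustration of one common way a checkpoint becomes non-renameable: recording a save-time absolute path inside the checkpoint instead of a path relative to the checkpoint directory. The file layout (`manifest.json`, `layer_00.json`) and the function names are invented for this sketch and are not DeepSpeed's actual code or format.

```python
import json
import os
import tempfile


def save_checkpoint(ckpt_dir, portable):
    """Write one layer file plus a manifest pointing at it (hypothetical layout)."""
    os.makedirs(ckpt_dir, exist_ok=True)
    layer_file = os.path.join(ckpt_dir, "layer_00.json")
    with open(layer_file, "w") as f:
        json.dump({"weight": [1.0, 2.0]}, f)
    # Non-portable: bake the full save-time path into the manifest.
    # Portable: record only the file name, relative to the checkpoint directory.
    recorded = os.path.basename(layer_file) if portable else os.path.abspath(layer_file)
    with open(os.path.join(ckpt_dir, "manifest.json"), "w") as f:
        json.dump({"layer_file": recorded}, f)


def load_checkpoint(ckpt_dir):
    """Resolve the recorded layer path against the directory being loaded."""
    with open(os.path.join(ckpt_dir, "manifest.json")) as f:
        recorded = json.load(f)["layer_file"]
    # os.path.join returns `recorded` unchanged when it is an absolute path,
    # so a stale absolute path still points at the old directory name.
    with open(os.path.join(ckpt_dir, recorded)) as f:
        return json.load(f)


def survives_rename(portable):
    """Save a checkpoint, rename its directory, and report whether it still loads."""
    with tempfile.TemporaryDirectory() as root:
        old_dir = os.path.join(root, "step100")
        new_dir = os.path.join(root, "best")
        save_checkpoint(old_dir, portable)
        os.rename(old_dir, new_dir)
        try:
            load_checkpoint(new_dir)
            return True
        except FileNotFoundError:
            return False


if __name__ == "__main__":
    print("absolute path survives rename:", survives_rename(portable=False))
    print("relative path survives rename:", survives_rename(portable=True))
```

In this sketch only the relative-path variant loads after the directory is renamed, which is the behavior a portable checkpoint needs.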

Collaborator

@jeffra jeffra left a comment

Looks good, but can we add a test case that covers the checkpoint bug?

@ShadenSmith
Contributor Author

Good point. Our pipeline checkpoint suite already covers this, in a way. This change, which returns None instead of a file path, would trigger the same bug; we'd just fail to open None rather than a non-existent (or incorrect) file. Does that seem like a reasonable test?

@ShadenSmith ShadenSmith merged commit 2f62697 into deepspeedai:master Dec 8, 2020
@ShadenSmith ShadenSmith deleted the pp-portable-ckpt branch December 8, 2020 17:42

Labels

bug Something isn't working

2 participants