[BUG]: Checkpointing Test Failed with PyTorch 1.9 #720

@FrankLeeeee


🐛 Describe the bug

When running the unit tests for model checkpointing, the following exception occurs.

[Screenshot of the exception, taken 2022-04-11 at 5:10:48 PM]

This happens because the _rank_not_in_group API is not exposed at the torch.distributed level in PyTorch 1.9.
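One possible workaround, sketched here under stated assumptions rather than as the project's actual fix, is to look the helper up in more than one module. The sketch assumes that PyTorch 1.9 still defines _rank_not_in_group inside torch.distributed.distributed_c10d even though it is not re-exported from torch.distributed. It is a private API, so this may not hold for every release. The names resolve_attr and get_rank_not_in_group are hypothetical helpers introduced for illustration only.

```python
import importlib


def resolve_attr(name, module_names):
    """Return the first attribute called `name` found in the candidate modules.

    Modules that cannot be imported, or that lack the attribute, are skipped.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not available in this environment
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(
        f"{name!r} was not found in any of: {', '.join(module_names)}"
    )


def get_rank_not_in_group():
    """Look up the private helper, preferring the public namespace.

    Assumption: when torch.distributed does not expose the helper,
    torch.distributed.distributed_c10d still defines it.
    """
    return resolve_attr(
        "_rank_not_in_group",
        ["torch.distributed", "torch.distributed.distributed_c10d"],
    )
```

Calling get_rank_not_in_group() once at import time and reusing the returned function would keep the checkpointing code independent of where a given PyTorch version places the helper. The lookup raises AttributeError with the list of searched modules if neither location provides it.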

Environment

CUDA: 11.1
PyTorch: 1.9.1

Labels: bug
