Update DDP checkpoint documentation #84589
Labels: good first issue, module: ddp, oncall: distributed, triaged
📚 The doc issue
The DDP checkpoint documentation here is outdated:
pytorch/torch/nn/parallel/distributed.py
Line 333 in 88b1cc8
In particular, `checkpoint` with `use_reentrant=False` supports the use cases that the documentation lists as unsupported.
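For reference, a minimal sketch of using non-reentrant activation checkpointing inside a DDP-wrapped module (a single-process `gloo` group and a toy two-layer model are illustrative assumptions, not from the issue):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

# Single-process process group on the gloo backend (illustrative values).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(8, 8)
        self.b = nn.Linear(8, 8)

    def forward(self, x):
        # Non-reentrant checkpointing: activations of self.a are
        # recomputed during backward instead of being stored.
        x = checkpoint(self.a, x, use_reentrant=False)
        return self.b(x)

model = DDP(Net())
loss = model(torch.randn(4, 8)).sum()
loss.backward()  # gradients flow through the checkpointed segment

print(all(p.grad is not None for p in model.parameters()))
dist.destroy_process_group()
```

With the reentrant implementation (`use_reentrant=True`) this pattern can hit the restrictions the docstring describes; the non-reentrant path is what the issue says should be documented as supported.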
Suggest a potential alternative/fix
No response
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @kwen2501