[Activation Checkpointing] Investigate pin_memory for CPU offload #86097
Labels
module: checkpoint
Related to torch.utils.checkpoint
oncall: distributed
Add this issue/PR to distributed oncall triage queue
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🚀 The feature, motivation and pitch
@awgu made a good point in #85459 (comment): we shouldn't assume that the pinned memory region has unlimited space. Right now, save_on_cpu is called with pin_memory=True hardcoded. We should investigate the performance implications of this and improve our intuition.
Alternatives
No response
Additional context
No response
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @kwen2501