Eliminate use of deprecated 'device' argument in Tensor.pin_memory() #163102
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163102
Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit be83b24 with merge base 86db4de.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Would it be possible to add a test for this?
6 tests added.
Does it make sense to just add the tests to
I can make that change if you prefer, although I noticed that test_dataloader.py is already quite large (approximately 3,680 lines).
@eqy: Could you please review this PR?
divyanshk left a comment
@eqy: I moved the 6 tests.
ngimel left a comment
For the future: AI assistance on PRs has to be disclosed.
Bumps the recipes container to 25.10. We have to xfail / disable a couple of features to get the nvidia-internal torch fork to work:

* megatron-fsdp's `gather_uneven_dtensor_to_full_tensor` seems to break with newer versions of torch; fix WIP.
* torchdata's StatefulDataLoader uses internal PyTorch APIs that changed with pytorch/pytorch#163102 (merged in the NVIDIA NGC images), so to use StatefulDataLoader we need to set `pin_memory=False`, which impacts performance. We now have the option to fall back to the standard PyTorch dataloader with `use_stateful_dataloader=False`.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
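The fallback described above can be sketched as follows. The factory function `make_dataloader` and its flag are assumptions for illustration, mirroring the `use_stateful_dataloader` option, not the actual recipe code:

```python
from torch.utils.data import DataLoader

def make_dataloader(dataset, use_stateful_dataloader: bool, **kwargs):
    """Build either torchdata's StatefulDataLoader or the standard DataLoader."""
    if use_stateful_dataloader:
        from torchdata.stateful_dataloader import StatefulDataLoader
        # pin_memory=False works around the internal-API change from
        # pytorch/pytorch#163102, at some cost in host-to-device copy speed.
        return StatefulDataLoader(dataset, pin_memory=False, **kwargs)
    # Standard path: no state_dict()/load_state_dict() support, but no
    # dependency on the changed internal pin_memory plumbing.
    return DataLoader(dataset, **kwargs)
```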
- Added `use_stateful_dataloader: false` to all hydra configs (matches ESM2)
- Updated train_ddp.py and train_fsdp2.py to conditionally pass the dataloader to checkpoint functions
- Updated test_distributed_checkpointing.py to enable the stateful dataloader in all tests
- Works around the pin_memory issue (pytorch/pytorch#163102) by defaulting to the regular DataLoader
- Tests can still validate full checkpoint/resume with `use_stateful_dataloader=true`

Signed-off-by: savitha-eng <savithas@nvidia.com>
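A minimal config fragment matching the first bullet above; the surrounding config structure is an assumption, only the key/value is from the description:

```yaml
# Hypothetical hydra config fragment: default to the regular DataLoader
# to avoid the pin_memory issue (pytorch/pytorch#163102).
use_stateful_dataloader: false
```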
A crash was observed due to the use of the deprecated `device` parameter. This PR removes its use to ensure compatibility with current PyTorch versions.
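A minimal sketch of the change this PR makes, assuming the caller only needs the default pinning behavior (the helper name `pin_if_possible` is hypothetical): `Tensor.pin_memory()` is called without the deprecated `device` argument, guarded so it only runs when an accelerator is available.

```python
import torch

def pin_if_possible(t: torch.Tensor) -> torch.Tensor:
    # Deprecated call shape being eliminated:
    #     t.pin_memory(device="cuda")
    # Current call shape: no device argument. The tensor is copied into
    # page-locked host memory, which speeds up host-to-device transfers.
    if torch.cuda.is_available():
        return t.pin_memory()
    return t  # no accelerator present: pinning is unnecessary and may fail
```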