add XPU backend to support torch.save and torch.load #89679
Conversation
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/89679
Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 97e63b4. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@ezyang could you help review this change? Thanks a lot.
I'm OK with the direction (replacing manual memcpy with a dispatched copy). I ask you to go further: there is no need to gate on HIP/XPU/CUDA; we should work for all non-CPU devices this way.
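The direction suggested here can be sketched in Python terms (an illustrative sketch with hypothetical names, not the actual C++ in torch/csrc/serialization.cpp): `copy_()` dispatches on the source tensor's device, so one code path covers every backend.

```python
import torch

# Illustrative Python analogue of the dispatched-copy idea. A CPU staging
# tensor receives the source tensor's data via copy_(), which dispatches to
# whichever backend the source lives on (CUDA, HIP, XPU, ...), so no
# per-device gating is needed.
def pull_to_cpu(device_tensor: torch.Tensor) -> torch.Tensor:
    cpu_tensor = torch.empty_like(device_tensor, device="cpu")
    cpu_tensor.copy_(device_tensor)  # D2H transfer via the dispatcher
    return cpu_tensor
```

With a CPU source the same call degenerates to a plain copy, which is exactly what makes the path uniform across devices.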
@ezyang Following your comments, I have updated the related code and the PR description. Does it look good?
torch/csrc/serialization.cpp (outdated)

```cpp
    (void*)cpu_data.get(),
    {size_bytes},
    at::device(at::kCPU).dtype(c10::kByte));
cpu_tensor.copy_(device_tensor);
```
This seems a bit more circuitous than is necessary. Since you're doing a dispatched copy, the operator can take care of allocating a CPU tensor for you. So you can just directly convert `device_tensor` to CPU, and then pull out the data pointer directly; no need to hand-allocate the `cpu_data` buffer anymore.
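The suggested simplification, sketched in Python terms (hypothetical helper name; the real change is in C++): `.to()` allocates the destination tensor and performs the device-to-host copy in one dispatched call.

```python
import torch

# Hypothetical Python analogue of the suggested simplification: instead of
# hand-allocating a host buffer and copying into it, let .to("cpu") allocate
# the CPU tensor and do the D2H copy in one step, then read the raw data
# pointer off the result.
def cpu_copy_and_ptr(device_tensor: torch.Tensor):
    cpu_tensor = device_tensor.to("cpu")  # allocation + D2H copy in one call
    return cpu_tensor, cpu_tensor.data_ptr()
```

Note that the returned tensor must be kept alive for as long as the pointer is used, since the pointer refers to its storage.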
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It looks clearer now.
@ezyang Thanks, any more comments?
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 additional job has failed, first few of them are: .github/workflows/trunk.yml. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 additional job has failed, first few of them are: trunk. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "idk why this failed everything looks good"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Motivation
We need to add an XPU backend to support torch.save and torch.load when the parameter _use_new_zipfile_serialization=False.
# Solution
We wrap the raw data as a tensor and then:
1. use an in-place copy for H2D;
2. directly call tensor.to() for D2H.
This helps us:
1. unify the generic code path for all backends;
2. support all non-CPU device backends.
# Additional Context
No additional unit tests are needed; test/test_serialization.py covers this code change.
Pull Request resolved: pytorch#89679
Approved by: https://github.com/ezyang
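As a usage-level illustration of the serialization path this change exercises (shown with a CPU tensor since XPU hardware may not be available; on an XPU machine the same calls would apply to a tensor on device "xpu"):

```python
import io

import torch

# Round-trip through the legacy (non-zipfile) serialization path that this
# PR extends to non-CPU backends.
buf = io.BytesIO()
t = torch.arange(8, dtype=torch.float32)
torch.save(t, buf, _use_new_zipfile_serialization=False)

buf.seek(0)
loaded = torch.load(buf)
assert torch.equal(loaded, t)
```

The same round trip is what test/test_serialization.py covers across devices.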