
add XPU backend to support torch.save and torch.load #89679

Closed
wants to merge 1 commit into from
Conversation

guangyey
Collaborator

@guangyey guangyey commented Nov 25, 2022

Motivation

We need to add an XPU backend to support torch.save and torch.load when the parameter _use_new_zipfile_serialization=False is used.

Solution

Our design wraps the raw data as a tensor and then:

  1. uses an in-place copy for H2D, and
  2. directly calls tensor.to() for D2H.

This lets us:

  1. unify the generic code for all backends.
  2. support all non-CPU device backends.
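The design above can be sketched in Python for illustration only; the actual change lives in PyTorch's C++ legacy-serialization path, and the helper names below are invented for this sketch:

```python
import io
import torch

# Illustrative Python analogue of the design; the PR itself modifies
# PyTorch's C++ legacy (non-zipfile) serialization code.

def host_to_device(cpu_tensor: torch.Tensor, device: str) -> torch.Tensor:
    # H2D: allocate on the target device, then use an in-place, dispatched
    # copy_(); the dispatcher routes it to the right backend (CUDA/HIP/XPU/...).
    dst = torch.empty_like(cpu_tensor, device=device)
    dst.copy_(cpu_tensor)
    return dst

def device_to_host(device_tensor: torch.Tensor) -> torch.Tensor:
    # D2H: a dispatched .to() handles the copy for any non-CPU backend.
    return device_tensor.to("cpu")

# Legacy-format round trip of the kind test/test_serialization.py exercises:
buf = io.BytesIO()
torch.save(torch.arange(4.0), buf, _use_new_zipfile_serialization=False)
buf.seek(0)
restored = torch.load(buf)
```

Because copy_() and to() go through the dispatcher, the same generic code serves any registered device backend rather than being gated on a specific one.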

Additional Context

No additional unit tests are needed; test/test_serialization.py already covers this code change.

@pytorch-bot

pytorch-bot bot commented Nov 25, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89679

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 97e63b4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@gujinghui
Collaborator

@ezyang could you help review this change? Thanks a lot.

@ezyang
Contributor

ezyang commented Nov 29, 2022

I'm OK with the direction (replacing the manual memcpy with a dispatched copy). I'd ask you to go further: there is no need to gate on HIP/XPU/CUDA; this should work for all non-CPU devices.

@guangyey
Collaborator Author

@ezyang Following your comments, I have updated the related code and the PR description. Does it look good now?

@albanD albanD requested a review from ezyang November 29, 2022 14:22
@albanD albanD added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 29, 2022
(void*)cpu_data.get(),
{size_bytes},
at::device(at::kCPU).dtype(c10::kByte));
cpu_tensor.copy_(device_tensor);
Contributor

This seems a bit more circuitous than is necessary. Since you're doing a dispatched copy, the operator can take care of allocating a CPU tensor for you. So you can just directly convert device_tensor to CPU, and then pull out the data pointer directly, no need to hand allocate cpu_data buffer anymore.
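Expressed as a Python analogue (the real code is C++, and both helper names here are hypothetical), the suggested simplification looks like:

```python
import torch

def d2h_manual(device_tensor: torch.Tensor) -> torch.Tensor:
    # Roundabout version: hand-allocate a CPU buffer, wrap it as a tensor,
    # then copy into it (what the C++ snippet above does via a raw buffer).
    cpu_tensor = torch.empty_like(device_tensor, device="cpu")
    cpu_tensor.copy_(device_tensor)
    return cpu_tensor

def d2h_direct(device_tensor: torch.Tensor) -> torch.Tensor:
    # Suggested version: the dispatched copy allocates the CPU tensor for
    # you; pull the data pointer out of the result afterwards.
    return device_tensor.to("cpu")
```

Both return a CPU tensor with identical contents; the direct form just drops the hand-managed intermediate buffer.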

Collaborator Author

Done. It looks clearer now.
@ezyang Thanks, any more comments?

@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 30, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: guangyey / name: Yu, Guangye (97e63b4)

@ezyang
Contributor

ezyang commented Nov 30, 2022

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 30, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 additional jobs have failed, first few of them are: .github/workflows/trunk.yml

Details for Dev Infra team Raised by workflow job

@ezyang
Contributor

ezyang commented Nov 30, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 additional jobs have failed, first few of them are: trunk

Details for Dev Infra team Raised by workflow job

@ezyang
Contributor

ezyang commented Nov 30, 2022

@pytorchbot merge -f "idk why this failed everything looks good"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@guangyey guangyey deleted the guangyey/save_load branch December 1, 2022 10:01
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#89679
Approved by: https://github.com/ezyang
neggles pushed a commit to neggles/pytorch that referenced this pull request Mar 9, 2023
Pull Request resolved: pytorch#89679
Approved by: https://github.com/ezyang
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

6 participants