
fix hpu storage serialization #101680

Closed

Conversation


@ppiskorski ppiskorski commented May 17, 2023

Change-Id: Ia534400a0e8972590374eceba5b62a2525b796e5

Fixes #ISSUE_NUMBER

cc @mruberry @mikaylagawarecki


pytorch-bot bot commented May 17, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/101680

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4fc1665:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg drisspg added module: serialization Issues related to serialization (e.g., via pickle, or otherwise) of PyTorch objects triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 17, 2023
@github-actions

This PR needs a label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@ppiskorski ppiskorski force-pushed the hpu_storage_serialization branch 2 times, most recently from f3f4f9b to 6cf7306 Compare May 22, 2023 07:19
@ppiskorski ppiskorski force-pushed the hpu_storage_serialization branch 2 times, most recently from 2831579 to 990b4a6 Compare June 5, 2023 16:15
@mikaylagawarecki mikaylagawarecki self-requested a review June 9, 2023 16:56

@mikaylagawarecki mikaylagawarecki left a comment


This looks good to me, just two questions

"""
non_blocking = _get_async_or_non_blocking("hpu", non_blocking, kwargs)
hpu = getattr(torch, "hpu", None)
assert hpu is not None, "HPU device module is not loaded"
Contributor


is it guaranteed that hpu will be available if hasattr(torch, "hpu")? do we have an equivalent of torch.{device}.is_available() for hpu?

Contributor Author


Actually, the availability of torch.hpu only means that the module has been loaded; a subsequent check for an actual device can be done with torch.hpu.is_available, just as you suspect. It's only that at this level (_utils.py) there are no such checks.
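The two-level check being discussed (module loaded vs. device actually present) can be sketched without real hardware. Everything below is illustrative: it uses a SimpleNamespace stand-in for the torch module rather than the actual PyTorch API, and describe_hpu_state is a hypothetical helper, not PyTorch code:

```python
from types import SimpleNamespace

def describe_hpu_state(torch_like):
    # Mirror the two-level check from the review thread:
    # 1) is the hpu backend module loaded at all?
    hpu = getattr(torch_like, "hpu", None)
    if hpu is None:
        return "module not loaded"
    # 2) the module being loaded does not imply a device exists
    if not hpu.is_available():
        return "module loaded, no device"
    return "device available"

# Simulated environments instead of a real torch import:
no_module = SimpleNamespace()
no_device = SimpleNamespace(hpu=SimpleNamespace(is_available=lambda: False))
with_device = SimpleNamespace(hpu=SimpleNamespace(is_available=lambda: True))

print(describe_hpu_state(no_module))    # module not loaded
print(describe_hpu_state(no_device))    # module loaded, no device
print(describe_hpu_state(with_device))  # device available
```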

Contributor


hm why is it not available in _utils.py? I see that torch.xpu.is_available() is used in this file

Contributor Author


This is aligned with _cuda. The user requested HPU storage on a particular device. That's a non-negotiable request, so we'd error anyway if we can't satisfy it. An explicit check of hpu.is_available does not buy us much: we could use it to report that there are no HPU devices at all, but without it we still fail, just because the particular requested device is not there.
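The point above can be sketched in a few lines: with or without an availability pre-check, an impossible request fails; the pre-check only changes which message the user sees. The function name and messages here are hypothetical, not PyTorch code:

```python
def select_hpu_device(available_count, requested_index):
    # Hypothetical sketch: the user's device request is non-negotiable.
    # A separate is_available()-style pre-check would only specialize
    # the error for the zero-device case...
    if available_count == 0:
        raise RuntimeError("no HPU devices")
    # ...because an out-of-range index fails here regardless.
    if requested_index >= available_count:
        raise RuntimeError(f"invalid HPU device index {requested_index}")
    return requested_index
```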

Contributor


Got it, thanks for clarifying

def validate_hpu_device(location):
    hpu = getattr(torch, "hpu", None)
    assert hpu is not None, "HPU device module is not loaded"
    device = hpu._utils._get_device_index(location)
Contributor


For my understanding, is there a reason why we don't pass optional=True here? (Just comparing with validate_cuda_device and curious to know if this is intentional.)

Contributor Author


That's a good point. The hpu module's API did not implement the 'optional' argument, but there is no good reason for that and I'll fix it.
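The 'optional' semantics under discussion can be sketched roughly, modeled on the behavior of the CUDA counterpart: a bare device string like "hpu" (no index) is only accepted when optional=True, in which case the current device is used. The function below is illustrative only; names, messages, and the -1 stand-in are assumptions, not the actual torch.hpu implementation:

```python
def get_device_index(location, optional=False):
    # Illustrative sketch of optional-index resolution.
    if location == "hpu":
        if optional:
            return -1  # stand-in for "use the current device"
        raise ValueError("expected a device index, e.g. 'hpu:0'")
    if location.startswith("hpu:"):
        # "hpu:2" -> 2
        return int(location.split(":", 1)[1])
    raise ValueError(f"not an HPU location: {location!r}")
```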

check if hpu is loaded before accessing torch.hpu

make validate_hpu_device public

Change-Id: I8e356cc03f83e96810a6e504031c7f6f305680be

@mikaylagawarecki mikaylagawarecki left a comment


Thanks, this lgtm! Just one question

@mikaylagawarecki

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 21, 2023
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot

Merge failed

Reason: 1 job has failed: trunk / win-vs2019-cuda11.8-py3 / build

Details for Dev Infra team: raised by workflow job.

@mikaylagawarecki

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

Labels
ciflow/trunk · Merged · module: serialization · open source · release notes: python_frontend · topic: improvements · triaged

5 participants