Fix DLPack CUDA stream convention #67618

Closed
wants to merge 3 commits from the fix-dlpack branch

Conversation

@emcastillo (Collaborator) commented Nov 1, 2021

Apparently, for the array API, the CUDA default stream and the per-thread default stream should be encoded as 1 and 2 instead of 0 and 1:

https://data-apis.org/array-api/latest/API_specification/array_object.html?dlpack-self-stream-none#dlpack-self-stream-none.

This caused an interoperability problem with CuPy: cupy/cupy#5970 (comment).
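
For reference, here is a minimal sketch of the stream-number mapping the array API expects on CUDA devices; the helper name `to_dlpack_stream` is hypothetical and is not part of `torch.utils.dlpack`:

```python
# Sketch of the array API stream-number convention for CUDA devices
# (see the array_object.__dlpack__ spec linked above); to_dlpack_stream is a
# hypothetical helper, not an actual torch.utils.dlpack function.
import torch

def to_dlpack_stream(stream: torch.cuda.Stream) -> int:
    """Map a torch.cuda.Stream to the integer passed to __dlpack__(stream=...)."""
    if stream.cuda_stream == 0:
        # The legacy default stream must be encoded as 1 (2 would denote the
        # per-thread default stream); 0 is disallowed because it is ambiguous.
        return 1
    # Any other stream is passed as its raw cudaStream_t handle.
    return stream.cuda_stream

# Usage: the consumer hands its current stream to the producer's __dlpack__ so
# the producer can synchronize before the memory is shared, e.g.
# capsule = ext_tensor.__dlpack__(stream=to_dlpack_stream(torch.cuda.current_stream()))
```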

cc @rgommers @leofang @mruberry

@pytorch-probot bot commented Nov 1, 2021

⚛️ CI Flow Status

Ruleset - Version: v1
Ruleset - File: https://github.com/emcastillo/pytorch/blob/c4494d0d2d2c180b047f461f5ec6c833b2db8220/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflow | Labels (bold = enabled) | Status
Triggered Workflows
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos 🚫 skipped
macos-10-15-py3-x86-64 ciflow/all, ciflow/macos 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Nov 1, 2021


💊 CI failures summary and remediations

As of commit c4494d0 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.


@leofang (Contributor) left a comment


Thanks, Emilio! Looks like we missed stream_ptr here?

Review thread on torch/utils/dlpack.py (outdated, resolved)
@leofang (Contributor) commented Nov 1, 2021

Question for @rgommers @mruberry: will there be a PyTorch 1.10.1? If so, we should backport this patch.

@emcastillo force-pushed the fix-dlpack branch 2 times, most recently from 06a8985 to 8114b16 on November 2, 2021 at 04:57
@mruberry added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Nov 2, 2021
@rgommers (Collaborator) left a comment


This LGTM, thanks @emcastillo!

@mruberry (Collaborator) commented

Hey @emcastillo!

Thanks for this fix and your patience (I was moving from the West Coast to the East Coast)!

It looks like the ROCm failure is real, however:

08:18:53 ======================================================================
08:18:53 FAIL [0.075s]: test_dlpack_default_stream_cuda (__main__.TestTorchDeviceTypeCUDA)
08:18:53 ----------------------------------------------------------------------
08:18:53 Traceback (most recent call last):
08:18:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
08:18:53     method(*args, **kwargs)
08:18:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
08:18:53     method(*args, **kwargs)
08:18:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
08:18:53     result = test(self, **param_kwargs)
08:18:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 772, in dep_fn
08:18:53     return fn(slf, *args, **kwargs)
08:18:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 891, in only_fn
08:18:53     return fn(slf, *args, **kwargs)
08:18:53   File "test_torch.py", line 7336, in test_dlpack_default_stream
08:18:53     from_dlpack(x)
08:18:53   File "/opt/conda/lib/python3.6/site-packages/torch/utils/dlpack.py", line 68, in from_dlpack
08:18:53     dlpack = ext_tensor.__dlpack__(stream=stream_ptr)
08:18:53   File "test_torch.py", line 7328, in __dlpack__
08:18:53     assert stream == 1
08:18:53 AssertionError

Any idea what's going on there? One option would be to file a follow-up issue and skip the test on ROCm for now (with a link to the issue).
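
For illustration, the skip-on-ROCm option could look roughly like the sketch below, using the existing skipIfRocm decorator; this is not the change the PR ultimately makes, and the follow-up issue referenced in the comment is hypothetical.

```python
# Sketch of the "skip on ROCm" option: guard the failing test with skipIfRocm
# and point to a follow-up issue. Requires a CUDA device; not the actual
# test_torch.py code.
import torch
from torch.utils.dlpack import from_dlpack
from torch.testing._internal.common_utils import TestCase, run_tests, skipIfRocm

class TestDLPackStream(TestCase):
    @skipIfRocm  # TODO: re-enable once the ROCm default-stream value is handled (follow-up issue)
    def test_dlpack_default_stream(self):
        x = torch.zeros(4, device="cuda")
        capsule = x.__dlpack__(stream=1)  # 1 == CUDA legacy default stream
        self.assertEqual(from_dlpack(capsule), x)

if __name__ == "__main__":
    run_tests()
```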

@rgommers (Collaborator) commented

> It looks like the ROCm failure is real,

Stream numbering differs between CUDA and ROCm: the default stream is 1 for CUDA and 0 for ROCm. See https://data-apis.org/array-api/latest/API_specification/array_object.html#dlpack-self-stream-none.

I have a vague memory of us talking about this before, and of there being an issue with detecting whether we're running on ROCm, but I can't find it.
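
In other words, a producer stub that validates what PyTorch passes would need a backend-dependent expected value, along the lines of this hypothetical wrapper (a sketch, not the actual test code):

```python
# Hypothetical wrapper illustrating the backend-dependent default-stream value;
# per the array API spec, PyTorch's from_dlpack should pass 1 on CUDA builds
# and 0 on ROCm builds when the current stream is the default stream.
import torch
from torch.utils.dlpack import from_dlpack

EXPECTED_DEFAULT = 0 if torch.version.hip else 1

class StreamCheckingProducer:
    """Wraps a CUDA/ROCm tensor and checks the stream value passed to __dlpack__."""
    def __init__(self, tensor):
        self.tensor = tensor

    def __dlpack_device__(self):
        return self.tensor.__dlpack_device__()

    def __dlpack__(self, stream=None):
        assert stream == EXPECTED_DEFAULT, f"got {stream}, expected {EXPECTED_DEFAULT}"
        return self.tensor.__dlpack__(stream=stream)

# Usage (on a machine with a CUDA or ROCm device):
# x = torch.zeros(4, device="cuda")
# y = from_dlpack(StreamCheckingProducer(x))
```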

@emcastillo (Collaborator, Author) commented

Thanks @mruberry, let me take a closer look and fix it!

@emcastillo (Collaborator, Author) commented

@mruberry all tests passed :)

@facebook-github-bot (Contributor) commented

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented

@mruberry merged this pull request in 533e72e.

Labels: cla signed, Merged, open source, triaged