
Reland: Fix CUDA device guard usage when first arg of kernel is scalar #39956

Conversation

kurtamohler
Collaborator

Reland PR #39870

Closes #38889

@kurtamohler kurtamohler requested a review from ngimel June 12, 2020 19:56
@dr-ci

dr-ci bot commented Jun 12, 2020

💊 CI failures summary and remediations

As of commit 928ac64 (more details on the Dr. CI page):



🚧 2 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


ci.pytorch.org: 1 failed



@kurtamohler
Collaborator Author

It turns out I get a failure in TestTorchDeviceTypeCUDA.test_serialization_cuda only when I run all of test_torch.py. If I run just that failing test by itself, it passes.

@ngimel
Collaborator

ngimel commented Jun 12, 2020

Yeah, that happens because you are changing the device in your test and don't set it back. You can either explicitly set the device back to the original, or, better, use the with torch.cuda.device context manager.
So there are two problems here:

  1. The failing test is poorly written; it should not rely on global state.
  2. Your test should not change global state.

It's OK to fix only 2).

@kurtamohler
Collaborator Author

Oh right, that makes sense.

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in db2b273.

xwang233 pushed a commit to xwang233/pytorch that referenced this pull request Jun 20, 2020

Reland: Fix CUDA device guard usage when first arg of kernel is scalar (pytorch#39956)

Summary:
Reland PR pytorch#39870

Closes pytorch#38889
Pull Request resolved: pytorch#39956

Differential Revision: D22027956

Pulled By: ngimel

fbshipit-source-id: e6029f450e2da3782b2d05bcc2012c19b82291da
Successfully merging this pull request may close these issues:

Normal.icdf is differs on different cuda devices