
randperm: add torch check to ensure generator device = tensor device #47022

Closed
wants to merge 1 commit into from

Conversation

Contributor

@janeyx99 janeyx99 commented Oct 28, 2020

BC-breaking Note:

This PR disallows passing a generator whose device differs from the device of the tensor being created by randperm. For example, the following code, which used to work, no longer works.

>>> torch.randperm(3, device='cuda', generator=torch.Generator(device='cpu'))
tensor([0, 1, 2], device='cuda:0')

It now errors:

>>> torch.randperm(3, device='cuda', generator=torch.Generator(device='cpu'))
RuntimeError: Expected a 'cuda:0' generator device but found 'cpu'
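
For reference, the check only fires on a device mismatch; a generator on the matching device still works. A minimal sketch (a CUDA-capable build is assumed):

>>> cuda_gen = torch.Generator(device='cuda')
>>> torch.randperm(3, device='cuda', generator=cuda_gen)  # OK: generator and tensor devices match
>>> torch.randperm(3, generator=torch.Generator(device='cpu'))  # CPU path is unchanged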

PR Summary:

Fixes #44714

Also added and ran tests to cover this behavior.

Disclaimer: more work needs to be done with regard to small CUDA tensors when a generator is specified; see the issue thread for more details.
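
For intuition, the added native check behaves roughly like the following Python-level sketch (the helper name is hypothetical; the real check is a torch check in the C++ randperm implementation):

import torch

def _check_generator_device(generator, device):
    # Hypothetical Python rendering of the native device check; `device` is
    # assumed to be a torch.device.
    if generator is not None and generator.device != device:
        raise RuntimeError(
            f"Expected a '{device}' generator device but found '{generator.device}'")

# e.g. _check_generator_device(torch.Generator(device='cpu'), torch.device('cuda'))
# raises: Expected a 'cuda' generator device but found 'cpu'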

@janeyx99 janeyx99 requested review from ezyang, ssnl and a team October 28, 2020 22:07
@janeyx99 janeyx99 requested a review from malfet October 28, 2020 22:19
Contributor

@samestep samestep left a comment


lgtm 👍 side note, it looks like test_randperm is fairly long... can/should it be split up into multiple smaller tests, or is it a series of steps that must be done sequentially?

@janeyx99
Contributor Author

lgtm 👍 side note, it looks like test_randperm is fairly long... can/should it be split up into multiple smaller tests, or is it a series of steps that must be done sequentially?

Yea, it is marked as a slow test--I'm not sure what the best thing to do is, since that whole test is holistic in that it only deals with randperm, but it could def be split up.

@samestep
Contributor

Yea, it is marked as a slow test--I'm not sure what the best thing to do is, since that whole test is holistic in that it only deals with randperm, but it could def be split up.

gotcha; yeah, I'm less concerned about the speed since, as you mentioned, it's already marked as slow, but if it can be split up, I think it should be, since that would make it clear which parts of it are independent from each other
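
Purely as an illustration of the kind of split being discussed, a minimal sketch using the internal test harness (the class name, test names, and bodies are hypothetical, not the actual test_randperm contents):

import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class TestRandpermSplit(TestCase):
    # Hypothetical split: one small test per independent property.
    def test_randperm_is_permutation(self):
        n = 100
        perm = torch.randperm(n)
        self.assertEqual(perm.sort().values, torch.arange(n))

    def test_randperm_generator_device_mismatch(self):
        if not torch.cuda.is_available():
            self.skipTest("CUDA required")
        with self.assertRaisesRegex(RuntimeError, "Expected a .* generator device"):
            torch.randperm(3, device='cuda', generator=torch.Generator(device='cpu'))

if __name__ == '__main__':
    run_tests()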

@ssnl
Collaborator

ssnl commented Oct 28, 2020

Just want to note that this is BC breaking. But I don't think it matters too much. I'll let others decide.

@ssnl ssnl added the module: bc-breaking Related to a BC-breaking change label Oct 28, 2020
Contributor

@samestep samestep left a comment


I rescind my approval 😛 these are the results from running the tests on my end, after commenting out the @slowTest on test_randperm:

$ python test/test_tensor_creation_ops.py -k test_randperm
/home/sestep/github/pytorch/pytorch/torch/random.py:95: UserWarning: CUDA reports that you have 8 available devices, and you have used fork_rng without explicitly specifying which devices are being used. For safety, we initialize *every* CUDA device by default, which can be quite slow if you have a lot of GPUs.  If you know that you are only making use of a few CUDA devices, set the environment variable CUDA_VISIBLE_DEVICES or the 'devices' keyword argument of fork_rng with the set of devices you are actually using.  For example, if you are using CPU only, set CUDA_VISIBLE_DEVICES= or devices=[]; if you are using GPU 0 only, set CUDA_VISIBLE_DEVICES=0 or devices=[0].  To initialize all devices and suppress this warning, set the 'devices' keyword argument to `range(torch.cuda.device_count())`.
  warnings.warn(
FF
======================================================================
FAIL: test_randperm_cpu (__main__.TestRandomTensorCreationCPU)
----------------------------------------------------------------------
RuntimeError: Expected a 'cuda' generator device but found 'cpu'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sestep/github/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 274, in instantiated_test
    result = test_fn(self, *args)
  File "test/test_tensor_creation_ops.py", line 1263, in test_randperm
    self.assertRaisesRegex(RuntimeError, regex, lambda: torch.randperm(n, device='cuda', generator=cpu_gen))
AssertionError: "Expected a * generator device but found *" does not match "Expected a 'cuda' generator device but found 'cpu'"

======================================================================
FAIL: test_randperm_cuda (__main__.TestRandomTensorCreationCUDA)
----------------------------------------------------------------------
RuntimeError: Expected a 'cuda' generator device but found 'cpu'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sestep/github/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 833, in wrapper
    method(*args, **kwargs)
  File "/home/sestep/github/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 833, in wrapper
    method(*args, **kwargs)
  File "/home/sestep/github/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 274, in instantiated_test
    result = test_fn(self, *args)
  File "test/test_tensor_creation_ops.py", line 1263, in test_randperm
    self.assertRaisesRegex(RuntimeError, regex, lambda: torch.randperm(n, device='cuda', generator=cpu_gen))
AssertionError: "Expected a * generator device but found *" does not match "Expected a 'cuda' generator device but found 'cpu'"

----------------------------------------------------------------------
Ran 2 tests in 31.307s

FAILED (failures=2)

@janeyx99
Contributor Author

janeyx99 commented Oct 28, 2020

I rescind my approval 😛 these are the results from running the tests on my end, after commenting out the @slowTest on test_randperm:

[full test output quoted above]

LOL ya I just spotted my regex mistake/didn't realize the tests were skipped last time 😅
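
For context, the * in the test's expected-message pattern is a regex quantifier, not a glob wildcard, so the pattern never matches the actual error text. A minimal sketch of a corrected assertion, with variable names taken from the tracebacks above and assumed to live inside the test method:

regex = r"Expected a .* generator device but found .*"
cpu_gen = torch.Generator(device='cpu')
self.assertRaisesRegex(RuntimeError, regex,
                       lambda: torch.randperm(n, device='cuda', generator=cpu_gen))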

Contributor

@facebook-github-bot facebook-github-bot left a comment


@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@samestep samestep left a comment


approving again since the tests pass now :) it would be good to address Rong's comments though

@dr-ci

dr-ci bot commented Oct 29, 2020

💊 CI failures summary and remediations

As of commit 3339eac (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


1 failure confirmed as flaky and can be ignored:

  • pytorch_python_doc_build


@janeyx99
Contributor Author

Adding @mruberry and @gchanan for their input on the BC-breaking part

@mruberry
Collaborator

mruberry commented Nov 2, 2020

Add a "BC-breaking note" at the top of the PR summary describing what's changing and why it's BC-breaking. This section is used when the release notes are compiled. The current PR summary can appear in a separate section below it:

BC-breaking Note:

explanation for release notes

PR Summary:

current PR Summary

The PR's name should be updated to clarify it's changing the behavior of the randperm operation. Follow-up question: what's with the documentation of torch.randperm? I don't even see a generator argument. torch.randperm also appears twice in search results when queried for. Does it have 2 rst entries?

@janeyx99 janeyx99 changed the title from "Add torch check to ensure generator device type = tensor device type" to "randperm: add torch check to ensure generator device = tensor device" Nov 2, 2020
@janeyx99
Contributor Author

janeyx99 commented Nov 2, 2020

@mruberry Just updated the summary of the PR with why I think it's considered BC-breaking, but I'm not 100% sure that's the reason. Can @ssnl confirm?

Also, the randperm docs seem to be hardcoded here:
https://github.com/pytorch/pytorch/blob/master/torch/_torch_docs.py#L6462-L6464

@ssnl
Collaborator

ssnl commented Nov 2, 2020

Yeah, to me this is BC-breaking mainly because CPU generators can't be used for CUDA generation when the size is small anymore, as @janeyx99 has indicated in the main description now.

@ssnl
Collaborator

ssnl commented Nov 2, 2020

A separate thing: it might be good to consider moving these checks to randperm_out.

@janeyx99
Contributor Author

janeyx99 commented Nov 2, 2020

A separate thing: it might be good to consider moving these checks to randperm_out.

Oh what would be the difference?

@ssnl
Collaborator

ssnl commented Nov 2, 2020

torch.randperm(n, out=tensor) directly calls randperm_out instead of going through randperm, so you would cover both paths if you put the check in both the CPU and GPU versions of randperm_out.
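
A minimal sketch of the out= path being referenced (the out tensor's dtype and device are assumed for illustration):

>>> out = torch.empty(3, dtype=torch.int64, device='cuda')
>>> torch.randperm(3, generator=torch.Generator(device='cpu'), out=out)  # dispatches to randperm_out

so a check placed only in the non-out overload would not cover this call.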

@mruberry
Collaborator

mruberry commented Nov 2, 2020

Also, the randperm docs seem to be hardcoded here:
https://github.com/pytorch/pytorch/blob/master/torch/_torch_docs.py#L6462-L6464

Right. Seems like we should update the docs (and remove redundant entries while we're at it).

Contributor

@facebook-github-bot facebook-github-bot left a comment


@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@janeyx99 merged this pull request in e4bc785.

facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2020
Summary:
The `randperm` documentation is outdated and does not include the optional `generator` parameter. This PR adds it along with the `pin_memory` parameter.

This PR was brought up in [PR 47022](#47022), but is now rebased onto master.

New docs look like:
![image](https://user-images.githubusercontent.com/31798555/97923963-e6084400-1d2c-11eb-9d46-573ba3189ad6.png)

Pull Request resolved: #47231

Reviewed By: mruberry

Differential Revision: D24711960

Pulled By: janeyx99

fbshipit-source-id: 3ff8be62ec33e34ef87d017ea97bb950621a3064
Labels
cla signed · Merged · module: bc-breaking (Related to a BC-breaking change)
Development

Successfully merging this pull request may close these issues.

torch.utils.data.random_split crashes without an error message with non CPU Generator object