
Clarify, make consistent, and test the behavior of logspace when dtype is integral #47647

Closed · wants to merge 4 commits

Conversation

@xuhdev (Collaborator) commented Nov 10, 2020:

torch.logspace's documentation doesn't explain how integer dtypes are handled. Add some clarification and some tests for when dtype is integral.

The CUDA implementation is also updated to be consistent with the CPU implementation.
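For reference, a small usage example of the case being documented; the values mirror the test added later in this PR (the exact documentation wording is not shown in this thread):

```python
import torch

# With an integral dtype, logspace computes the values in floating point
# and then converts them to the requested integer type (the behavior this
# PR clarifies and tests).
x = torch.logspace(0, 3, 4, base=2, dtype=torch.int64)
print(x)  # tensor([1, 2, 4, 8])
```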

@facebook-github-bot (Contributor):

Hi @xuhdev!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@dr-ci (bot) commented Nov 10, 2020:

💊 CI failures summary and remediations

As of commit 878a465 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 2/3 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test1 (1/1), step "Run tests":

Jan 08 04:21:11 RuntimeError: test_tensor_creation_ops failed!
Jan 08 04:21:11     #123 0x55d03d94f43d in main /tmp/build/80754af9/python_1599604603603/work/Programs/python.c:69
Jan 08 04:21:11     #124 0x7f00bf0bb83f in __libc_start_main /build/glibc-e6zv40/glibc-2.23/csu/../csu/libc-start.c:291
Jan 08 04:21:11     #125 0x55d03da2ed0a in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103
Jan 08 04:21:11 
Jan 08 04:21:11 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/RangeFactories.cpp:86:5 in 
Jan 08 04:21:11 Traceback (most recent call last):
Jan 08 04:21:11   File "test/run_test.py", line 910, in <module>
Jan 08 04:21:11     main()
Jan 08 04:21:11   File "test/run_test.py", line 889, in main
Jan 08 04:21:11     raise RuntimeError(err_message)
Jan 08 04:21:11 RuntimeError: test_tensor_creation_ops failed!
Jan 08 04:21:12 + cleanup
Jan 08 04:21:12 + retcode=1
Jan 08 04:21:12 + set +x
Jan 08 04:21:12 =================== sccache compilation log ===================
Jan 08 04:21:12 =========== If your build fails, please take a look at the log above for possible reasons ===========
Jan 08 04:21:12 Compile requests                      0
Jan 08 04:21:12 Compile requests executed             0
Jan 08 04:21:12 Cache hits                            0
Jan 08 04:21:12 Cache misses                          0
Jan 08 04:21:12 Cache timeouts                        0

ci.pytorch.org: 1 failed



@xuhdev xuhdev marked this pull request as draft November 10, 2020 08:26
@xuhdev xuhdev force-pushed the logspace-int branch 4 times, most recently from 8343c4d to cf01217 Compare January 5, 2021 01:39
@xuhdev xuhdev closed this Jan 5, 2021
@xuhdev xuhdev reopened this Jan 5, 2021
@xuhdev xuhdev force-pushed the logspace-int branch 3 times, most recently from 326b298 to 375a47b Compare January 5, 2021 07:49
@xuhdev xuhdev marked this pull request as ready for review January 5, 2021 10:33
@xuhdev xuhdev requested a review from walterddr January 5, 2021 10:33
@ezyang ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jan 5, 2021
@@ -133,10 +133,10 @@ Tensor& logspace_cuda_out(Tensor& result, Scalar start, Scalar end, c10::optiona
     r.fill_(std::pow(base, start.to<double>()));
   } else if (isIntegralType(r.scalar_type(), 0)) {
     AT_DISPATCH_INTEGRAL_TYPES(r.scalar_type(), "logspace_cuda", [&]() {
-      float scalar_base = static_cast<float>(base); // Use float to avoid promotion to double
+      double scalar_base = static_cast<double>(base);
Contributor:

Could you clarify your change here, please?
Based on the test_tensor_creation_ops.py change, you are comparing within the device type instead of across CPU/GPU. Maybe adding a test that does the CPU/GPU cross-comparison would help us review this change.

Collaborator Author:

I have added the test. This change makes the CUDA implementation consistent with the CPU one, because the CPU implementation uses double instead of float here.

Contributor:

Does your added CPU/GPU comparison test break when you use the original float here?

I guess my point was that the CUDA impl used float for a reason, and according to the comment on the CPU side, it was "autopromoted", i.e. we might not need that extra precision.

@xuhdev (Collaborator Author) commented Jan 5, 2021:

@walterddr Yes, it broke when I kept them as float; actually, it broke the tests on CUDA alone. I suspect this is about log/exp, where low precision can easily cause trouble.

> I guess my point was that the CUDA impl used float for a reason, and according to the comment on the CPU side, it was "autopromoted", i.e. we might not need that extra precision.

My speculation is that the CUDA implementation used float perhaps for performance reasons? cc @colesbury

Also, I'm having a hard time understanding the current ASAN test failure, which doesn't show what is broken (it still failed for the same reason when I did not add anything to the CUDA implementation).
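A rough sketch of the precision concern discussed above (illustrative only; the exact printed values depend on the platform's math library):

```python
import torch

# base ** step is typically evaluated as exp(step * log(base)). In float32
# the result may land slightly below the exact integer (e.g. 124.999...),
# and the subsequent cast to an integral dtype truncates it downward.
for dt in (torch.float32, torch.float64):
    base = torch.tensor(5.0, dtype=dt)
    step = torch.tensor(3.0, dtype=dt)
    val = torch.exp(step * torch.log(base))  # ~ 5 ** 3 = 125
    print(dt, val.item(), val.to(torch.int64).item())
```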

@xuhdev xuhdev changed the title Clarify and test the behavior of logspace when dtype is integral Clarify, make consistent, and test the behavior of logspace when dtype is integral Jan 5, 2021
@xuhdev xuhdev requested a review from walterddr January 5, 2021 18:18
@walterddr (Contributor) left a comment:

Looks good. I have one more comment and then it should be good to go.

test/test_tensor_creation_ops.py (outdated)
@@ -2782,6 +2782,20 @@ def test_logspace(self, device, dtype):
     y = torch.logspace(0, 3, 4, base=2, device=device, dtype=dtype, out=x.narrow(1, 1, 2))
     self.assertEqual(x, torch.tensor(((0, 1, 2), (0, 4, 8)), device=device, dtype=dtype), atol=0, rtol=0)

+    # Integer test
+    if torch.testing.is_integral(dtype):
Contributor:

you can just create a separate test

@skipCPUIf(True, "compares with CPU")
@dtypesIfCUDA(torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64)
def test_logspace_integral(self, device, dtype):
  ...

See how the other specially-typed logspace tests were written before this one.

Collaborator Author:

I added this. I chose not to skip the test on CPU entirely, because some parts of it are not about the CPU comparison.
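A minimal sketch of what such a CPU/CUDA comparison could look like, assuming a plain unittest layout rather than PyTorch's device-generic test framework (the actual test added in this PR may be structured differently):

```python
import unittest
import torch

class TestLogspaceIntegral(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")
    def test_logspace_integral_matches_cpu(self):
        # For every integral dtype, the CUDA result should match the CPU
        # reference now that both compute the intermediate values in double.
        for dtype in (torch.uint8, torch.int8, torch.int16,
                      torch.int32, torch.int64):
            cpu = torch.logspace(0, 5, 6, base=2, device="cpu", dtype=dtype)
            cuda = torch.logspace(0, 5, 6, base=2, device="cuda", dtype=dtype)
            self.assertTrue(torch.equal(cpu, cuda.cpu()))

if __name__ == "__main__":
    unittest.main()
```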

@facebook-github-bot (Contributor) left a comment:

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@walterddr (Contributor) left a comment:

Thanks for the change. LGTM now. I'll add @colesbury to see if there's any concern about the float->double type change on the CUDA side.

@facebook-github-bot (Contributor):

@walterddr merged this pull request in 0ae0fac.

@xuhdev xuhdev deleted the logspace-int branch January 15, 2021 20:39
@facebook-github-bot (Contributor):

This pull request has been reverted by 3df5f9c.

@xuhdev (Collaborator Author) commented Jan 16, 2021:

@walterddr Is there a reason for the reversion?

@walterddr (Contributor):

The change actually broke the ASAN build; the failure was not just flaky.

@walterddr (Contributor):

Also, I should have checked this more carefully before, but it seems like torch.logspace doesn't produce the same expected results as numpy.logspace. It's a matter of whether we convert start/stop to integers before computing the values, or only convert the results afterwards.

I will look into it and iterate on top of this PR.
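For anyone following along, here is one way to probe the torch/NumPy discrepancy being described; no specific diverging inputs are claimed here, since the thread does not name them:

```python
import numpy as np
import torch

# Compare the two libraries' results for an integral dtype. Whether
# start/end are converted to integers before or after exponentiation
# can change the output whenever base ** step is not an exact integer.
start, end, steps, base = 0, 4, 5, 3
t = torch.logspace(start, end, steps, base=base, dtype=torch.int64)
n = np.logspace(start, end, num=steps, base=base, dtype=np.int64)
print(t.tolist())
print(n.tolist())
```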

xuhdev added a commit to xuhdev/pytorch that referenced this pull request Jan 16, 2021
…e is integral

torch.logspace's documentation doesn't explain how integers are handled. Add some clarification and some tests for when dtype is integral.

The CUDA implementation is also updated to be consistent with the CPU implementation.

Follow-up to pytorch#47647
@xuhdev (Collaborator Author) commented Jan 16, 2021:

@walterddr Thanks, please keep me posted.

@xuhdev (Collaborator Author) commented Apr 12, 2021:

@walterddr Do we have any plan to further resolve this issue?

Labels
cla signed · Merged · open source · Reverted · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
5 participants