Use scalar implementation to keep the precision in linspace of integral types #89048
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89048
Note: Links to docs will display an error until the doc builds have completed. ✅ No failures as of commit 159dfd1. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -74,7 +74,7 @@ class Vectorized<int64_t> : public Vectorizedi {
     return _mm256_blendv_epi8(a.values, b.values, mask.values);
   }
   template <typename step_t>
-  static Vectorized<int64_t> arange(int64_t base = 0, step_t step = static_cast<step_t>(1)) {
+  static Vectorized<int64_t> arange(double base = 0, step_t step = static_cast<step_t>(1)) {
Doesn't double have smaller range than full int64? (53 bits instead of 64 bits, right?) (especially for very large values of int64?)
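A quick sketch (not from the PR) of the concern raised here: Python floats are IEEE-754 doubles with a 53-bit significand, so integers above 2**53 no longer round-trip through `double`:

```python
# Python floats are IEEE-754 doubles: 53 significand bits.
v = 2**53 + 1                    # representable as int64, but not as a double
assert float(2**53) == 2.0**53   # 2**53 itself is exactly representable
assert int(float(v)) == 2**53    # 2**53 + 1 rounds to the nearest even double
assert int(float(v)) != v        # precision lost for |values| > 2**53
```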
Yes, you're correct. We will not modify the base type here, since the vectorized implementation has been removed.
Make sure to update the test case to reflect the reported failure.
@@ -74,7 +74,7 @@ class Vectorized<int64_t> : public Vectorizedi {
     return _mm256_blendv_epi8(a.values, b.values, mask.values);
   }
   template <typename step_t>
-  static Vectorized<int64_t> arange(int64_t base = 0, step_t step = static_cast<step_t>(1)) {
+  static Vectorized<int64_t> arange(double base = 0, step_t step = static_cast<step_t>(1)) {
Consider changing the `int64_t` of `base` to `step_t`. Changing it to `double` is not appropriate here.
Does this also apply to cases other than `int64_t`?
@pytorchbot label intel
LGTM. Worth double-checking whether the scalar implementation also aligns with numpy.
Hi @mingfeima, about the expected-failure UTs in https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py#L17040-L17043: their reference is https://github.com/pytorch/pytorch/blob/master/torch/_refs/__init__.py#L4323-L4330, but the reference is not quite correct. For example, with start = 0, end = -3, steps = 50, the last value in our updated kernel is -3, while in the reference it is -2. This is because the last value that comes from arange is 0.999999999999999888977; scaled to the range ending at -3 it becomes -2.999999…, and converting to an integral type truncates it to -2. For now, I am not removing these expected-failure UTs, since they are also related to #81996, UTs like …
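A minimal sketch of the truncation effect described in this comment, using the last arange value quoted above (Python floats are IEEE-754 doubles, and `int()` truncates toward zero, like a C++ `static_cast` to an integral type):

```python
# The reference computes values in double precision, then casts to the
# integral dtype. Truncation toward zero turns -2.999... into -2.
last = 0.999999999999999888977   # last arange value reported above (just below 1.0)
scaled = last * -3               # scale [0, 1] out to the range ending at -3
assert scaled > -3.0             # -2.9999999999999996, not -3.0
assert int(scaled) == -2         # truncation toward zero gives -2, not -3
```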
Hi @mingfeima, could you please help review this PR?
@pytorchbot merge
Merge failed. Reason: Approval needed from one of the following (Rule 'superuser'). Raised by workflow job.
Fix CI failure: the range of TensorIterator is missing.
SGTM
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #88652

In the CPU implementation of linspace for integral types, the `base` type in the vectorized implementation is `int64_t`, which drops precision when `base` comes from a floating-point number. Moreover, the vectorized implementation tends to suffer from catastrophic cancellation in floating-point arithmetic, since both the `base` (`start + step * idx`) and the `step` are not exact. The scalar implementation is fine, since `start` is always an integer and the result is truncated to an integer as well.

Therefore, this PR skips the vectorized implementation, since vectorization does not contribute to performance here anyway. Now the behaviors of CPU and GPU are the same. In some cases the results match numpy's; in other cases they differ from numpy's, but the difference is not related to the device (CPU vs. GPU). See #81996 (comment).
cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10