
Use scalar implementation to keep the precision in linspace of integral types #89048

Closed
wants to merge 1 commit

Conversation

yanbing-j
Collaborator

@yanbing-j yanbing-j commented Nov 15, 2022

Fixes #88652

In the CPU implementation of linspace for integral types, the base type in the vectorized implementation is int64_t, which drops precision when the base comes from a floating-point number. The vectorized implementation also tends to suffer from catastrophic cancellation in floating-point arithmetic, since both the base (start + step * idx) and the step are inexact. The scalar implementation is fine, since start is always an integer and the result is truncated to an integer as well.

Therefore, in this PR we skip the vectorized implementation, since vectorization does not contribute to performance here anyway. The behaviors of CPU and GPU are now the same. In some cases the results match numpy's; in other cases they differ from numpy's, but the difference is not related to the device (CPU vs. GPU). #81996 (comment)
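The scalar path the PR falls back to can be sketched in a few lines of Python (an illustrative sketch, not the ATen kernel; `linspace_int_scalar` is an invented name): start is kept as an exact integer and each element is computed as start + step * i, then truncated.

```python
def linspace_int_scalar(start, end, steps):
    # Scalar path: compute start + step * i per element and truncate to int.
    # start stays an exact integer, so there is no lossy int64 -> double base cast.
    if steps == 1:
        return [int(start)]
    step = (end - start) / (steps - 1)
    return [int(start + step * i) for i in range(steps)]

print(linspace_int_scalar(0, 10, 11))  # [0, 1, 2, ..., 10]
```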

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Nov 15, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89048

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 159dfd1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 15, 2022
@@ -74,7 +74,7 @@ class Vectorized<int64_t> : public Vectorizedi {
     return _mm256_blendv_epi8(a.values, b.values, mask.values);
   }
   template <typename step_t>
-  static Vectorized<int64_t> arange(int64_t base = 0, step_t step = static_cast<step_t>(1)) {
+  static Vectorized<int64_t> arange(double base = 0, step_t step = static_cast<step_t>(1)) {
Contributor

Doesn't double have a smaller range than full int64? (53 bits of significand instead of 64 bits, right?) (especially for very large int64 values?)
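The reviewer's concern is easy to confirm: a double has a 53-bit significand, so not every int64 survives a round trip (Python's float is an IEEE-754 double):

```python
# Integers up to 2**53 round-trip through double exactly; 2**53 + 1 does not.
assert int(float(2**53)) == 2**53
assert int(float(2**53 + 1)) == 2**53            # precision lost: rounds back down
assert int(float((1 << 62) + 1)) != (1 << 62) + 1  # larger odd int64 values also lose bits
```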

Collaborator Author

Yes, you're correct. We will not modify the base type here, since the vectorized implementation has been removed.

Collaborator

@mingfeima mingfeima left a comment

Make sure to update the test case to reflect the reported failure.

Collaborator

Consider changing the int64_t base to step_t; changing it to double is not appropriate here.

Contributor

Does this also apply to cases other than int64_t?
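The representability cliff is indeed not unique to int64 with a double base; it just moves with the widths involved. For instance, a 32-bit float base cannot represent every int32 above 2**24. A quick illustrative check, round-tripping through IEEE-754 single precision via the struct module (`to_f32` is an invented helper):

```python
import struct

def to_f32(x):
    # Round-trip a number through IEEE-754 single precision ('f' format).
    return struct.unpack('f', struct.pack('f', float(x)))[0]

assert to_f32(2**24) == 2**24          # 16777216 is exactly representable
assert to_f32(2**24 + 1) == 2**24      # 16777217 rounds back down: precision lost
```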

@yanbing-j yanbing-j changed the title Use double as base type to keep the precision Use step_t as base type to keep the precision Nov 16, 2022
@yanbing-j
Collaborator Author

@pytorchbot label intel

@pytorch-bot pytorch-bot bot added the intel This tag is for PR from Intel label Nov 17, 2022
Collaborator

@jgong5 jgong5 left a comment

LGTM. Worth double-checking whether the scalar implementation also aligns with numpy.

@yanbing-j yanbing-j force-pushed the yanbing/fix_88652 branch 2 times, most recently from 1f616ca to 506738e Compare November 29, 2022 07:17
@yanbing-j yanbing-j changed the title Use step_t as base type to keep the precision Use scalar implementation to keep the precision in linspace of integral types Nov 29, 2022
@yanbing-j yanbing-j force-pushed the yanbing/fix_88652 branch 2 times, most recently from 7099f4c to 4cc4f21 Compare December 1, 2022 05:56
@yanbing-j yanbing-j marked this pull request as ready for review December 2, 2022 06:25
@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 2, 2022
@yanbing-j
Collaborator Author

Hi @mingfeima, about the expected-failure UTs in https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py#L17040-L17043, their reference implementation is https://github.com/pytorch/pytorch/blob/master/torch/_refs/__init__.py#L4323-L4330. But that reference is not quite correct.

For example, with start = 0, end = -3, steps = 50, the last value from our updated kernel is -3, while in the reference it is -2. This is because the last value produced by arange is 0.999999999999999888977; scaled to the range ending at -3 it becomes about -2.999999, which is then truncated to -2 when converted to an integral type.
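This rounding path can be reproduced with plain doubles (a hypothetical reconstruction of the effect described above, not the _refs code itself):

```python
# The last arange value is the double just below 1.0, i.e. 1 - 2**-53,
# which is 0.999999999999999888977... and prints as 0.9999999999999999.
last = 1 - 2**-53
scaled = last * -3        # "extend to -3": about -2.9999999999999996
assert int(scaled) == -2  # truncation toward zero yields -2, not the exact -3
```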

Currently, I don't remove these expected-failure UTs, since they are also related to #81996; UTs like torch.linspace(4.3, 0, 50) do not match numpy's result.

@yanbing-j yanbing-j requested review from mingfeima and jgong5 and removed request for mingfeima and jgong5 December 5, 2022 08:14
@chunyuan-w chunyuan-w added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 6, 2022
@yanbing-j
Collaborator Author

Hi @mingfeima , could you please help review this PR?

@yanbing-j
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
jianingfu, muchulee8, bashnick, briancoutinho, djthorne, ...


Fix CI failures

Fix CI failure: the range of TensorIterator was missing
Collaborator

@albanD albanD left a comment

SGTM

@albanD
Collaborator

albanD commented Dec 19, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Labels: ciflow/trunk, intel, Merged, module: cpu, open source, triaged

Projects: Status: Done

Successfully merging this pull request may close these issues:
- linspace behave differently than numpy linspace

9 participants