
Use scalar implementation to keep the precision in linspace of integral types #89048

Closed
wants to merge 1 commit

Conversation

yanbing-j
Collaborator

@yanbing-j yanbing-j commented Nov 15, 2022

Fixes #88652

In the CPU implementation of linspace for integral types, the base type in the vectorized implementation is int64_t, which drops precision when the base comes from a floating-point number. The vectorized implementation also tends to suffer from catastrophic cancellation in floating-point arithmetic, since both the base (start + step * idx) and the step are inexact. The scalar implementation is fine, since start is always an integer and the result is truncated to an integer as well.

Therefore, in this PR we skip the vectorized implementation, since vectorization does not contribute to performance here anyway. The behaviors of CPU and GPU are now the same. In some cases the results match numpy's; in other cases they differ from numpy's, but the difference is not related to the device (CPU vs. GPU). #81996 (comment)
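The scalar path the PR falls back to can be sketched in a few lines of Python (an illustrative sketch, not the ATen kernel; `linspace_int_scalar` is an invented name): start is kept as an exact integer and each element is computed as start + step * i, then truncated.

```python
def linspace_int_scalar(start, end, steps):
    # Scalar path: compute start + step * i per element and truncate to int.
    # start stays an exact integer, so there is no lossy int64 -> double base cast.
    if steps == 1:
        return [int(start)]
    step = (end - start) / (steps - 1)
    return [int(start + step * i) for i in range(steps)]

print(linspace_int_scalar(0, 10, 11))  # [0, 1, 2, ..., 10]
```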

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot

pytorch-bot bot commented Nov 15, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89048

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 159dfd1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 15, 2022
@@ -74,7 +74,7 @@ class Vectorized<int64_t> : public Vectorizedi {
     return _mm256_blendv_epi8(a.values, b.values, mask.values);
   }
   template <typename step_t>
-  static Vectorized<int64_t> arange(int64_t base = 0, step_t step = static_cast<step_t>(1)) {
+  static Vectorized<int64_t> arange(double base = 0, step_t step = static_cast<step_t>(1)) {
Contributor

Doesn't double have a smaller range than full int64? (53 bits of significand instead of 64 bits, right?) (especially for very large int64 values?)
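The reviewer's concern is easy to confirm: a double has a 53-bit significand, so not every int64 survives a round trip (Python's float is an IEEE-754 double):

```python
# Integers up to 2**53 round-trip through double exactly; 2**53 + 1 does not.
assert int(float(2**53)) == 2**53
assert int(float(2**53 + 1)) == 2**53            # precision lost: rounds back down
assert int(float((1 << 62) + 1)) != (1 << 62) + 1  # larger odd int64 values also lose bits
```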

Collaborator Author

Yes, you're correct. We will not modify the base type here, since the vectorized implementation has been removed.

Collaborator

@mingfeima mingfeima left a comment

Make sure to update the test case to reflect the reported failure.

Collaborator

Consider changing the int64_t base to step_t; changing it to double is not appropriate here.

Contributor

Does this also apply to cases other than int64_t?
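The representability cliff is indeed not unique to int64 with a double base; it just moves with the widths involved. For instance, a 32-bit float base cannot represent every int32 above 2**24. A quick illustrative check, round-tripping through IEEE-754 single precision via the struct module (`to_f32` is an invented helper):

```python
import struct

def to_f32(x):
    # Round-trip a number through IEEE-754 single precision ('f' format).
    return struct.unpack('f', struct.pack('f', float(x)))[0]

assert to_f32(2**24) == 2**24          # 16777216 is exactly representable
assert to_f32(2**24 + 1) == 2**24      # 16777217 rounds back down: precision lost
```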

@yanbing-j yanbing-j changed the title Use double as base type to keep the precision Use step_t as base type to keep the precision Nov 16, 2022
@yanbing-j
Collaborator Author

@pytorchbot label intel

@pytorch-bot pytorch-bot bot added the intel This tag is for PR from Intel label Nov 17, 2022
Collaborator

@jgong5 jgong5 left a comment

LGTM. Worth double-checking whether the scalar implementation also aligns with numpy.

@yanbing-j yanbing-j force-pushed the yanbing/fix_88652 branch 2 times, most recently from 1f616ca to 506738e Compare November 29, 2022 07:17
@yanbing-j yanbing-j changed the title Use step_t as base type to keep the precision Use scalar implementation to keep the precision in linspace of integral types Nov 29, 2022
@yanbing-j yanbing-j force-pushed the yanbing/fix_88652 branch 2 times, most recently from 7099f4c to 4cc4f21 Compare December 1, 2022 05:56
@yanbing-j yanbing-j marked this pull request as ready for review December 2, 2022 06:25
@janeyx99 janeyx99 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 2, 2022
@yanbing-j
Collaborator Author

Hi @mingfeima, about the expected-failure UTs in https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py#L17040-L17043, their reference implementation is https://github.com/pytorch/pytorch/blob/master/torch/_refs/__init__.py#L4323-L4330. But that reference is not quite correct.

For example, with start = 0, end = -3, steps = 50, the last value from our updated kernel is -3, while in the reference it is -2. This is because the last value produced by arange is 0.999999999999999888977; scaled to the range ending at -3 it becomes about -2.999999, which is then truncated to -2 when converted to an integral type.
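This rounding path can be reproduced with plain doubles (a hypothetical reconstruction of the effect described above, not the _refs code itself):

```python
# The last arange value is the double just below 1.0, i.e. 1 - 2**-53,
# which is 0.999999999999999888977... and prints as 0.9999999999999999.
last = 1 - 2**-53
scaled = last * -3        # "extend to -3": about -2.9999999999999996
assert int(scaled) == -2  # truncation toward zero yields -2, not the exact -3
```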

Currently, I don't remove these expected-failure UTs, since they are also related to #81996; UTs like torch.linspace(4.3, 0, 50) do not match numpy's result.

@yanbing-j yanbing-j requested review from mingfeima and jgong5 and removed request for mingfeima and jgong5 December 5, 2022 08:14
@chunyuan-w chunyuan-w added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 6, 2022
@yanbing-j
Collaborator Author

Hi @mingfeima , could you please help review this PR?

@yanbing-j
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
jianingfu, muchulee8, bashnick, briancoutinho, djthorne, ...


Fix CI failures

Fix CI failure: the range of TensorIterator was missing
Collaborator

@albanD albanD left a comment

SGTM

@albanD
Collaborator

albanD commented Dec 19, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Labels: ciflow/trunk, intel, Merged, module: cpu, open source, triaged

Projects: Status: Done

Successfully merging this pull request may close these issues:
- linspace behave differently than numpy linspace

9 participants