Improve complex lerp performance #84844

peterbell10 · 2022-09-11T22:09:41Z

Stack from ghstack (oldest at bottom):

-> Improve complex lerp performance #84844

The complex lerp kernel uses std::abs(z) < 0.5, which involves
computing a sqrt. Comparing the squared magnitude against 0.25
instead has much lower latency, and so performs much better overall.

In a simple timeit benchmark I see more than a 10x speedup on CPU for a
4096-element complex lerp, from 84 us to 6.7 us.
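
For readers who want to see the idea outside the kernel, here is a minimal pure-Python sketch of the two predicates. The function names (`is_small_abs`, `is_small_sq`, `lerp`) are hypothetical and not taken from the PyTorch source, and the two-branch lerp formula is an assumption about how the weight check is used, not a copy of the actual kernel. The two checks are mathematically equivalent because both sides of `|z| < 0.5` are non-negative, although floating-point rounding could in principle make them disagree for weights sitting exactly on the boundary.

```python
def is_small_abs(z: complex) -> bool:
    # Original style of check: abs() on a complex computes a magnitude,
    # which needs a square root (hypot).
    return abs(z) < 0.5


def is_small_sq(z: complex) -> bool:
    # Sqrt-free check: compare the squared magnitude against 0.5 ** 2.
    return z.real * z.real + z.imag * z.imag < 0.25


def lerp(start: complex, end: complex, weight: complex) -> complex:
    # Assumed two-branch lerp: pick the form anchored at the nearer endpoint.
    diff = end - start
    if is_small_sq(weight):
        return start + weight * diff
    return end - diff * (1 - weight)


if __name__ == "__main__":
    for w in (0.3 + 0.3j, 0.4 + 0.4j, 1 + 0j):
        print(w, is_small_abs(w), is_small_sq(w))
    print(lerp(1 + 1j, 3 + 5j, 0.25 + 0j))
```

The sketch only illustrates why the comparison can skip the sqrt; it says nothing about the measured speedup, which comes from the compiled kernel.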

pytorch-bot · 2022-09-11T22:09:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84844

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 92b3719:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2022-10-04T00:24:08Z

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

linux-foundation-easycla · 2022-10-04T00:24:12Z

The committers listed above are authorized under a signed CLA.

✅ login: peterbell10 (e300161, 8ee5992, 25fe4f7, 60160f0, 8995a1c)

peterbell10 · 2022-10-13T14:39:12Z

@ngimel ping

ngimel · 2022-10-13T17:40:13Z

@pytorchbot rebase

pytorchmergebot · 2022-10-13T17:42:02Z

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot · 2022-10-13T17:42:19Z

Successfully rebased gh/peterbell10/420/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/84844)

ngimel · 2022-10-13T20:39:06Z

@pytorchbot merge -g

pytorchmergebot · 2022-10-13T20:40:50Z

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

github-actions · 2022-10-13T21:57:19Z

Hey @peterbell10.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot added the cla signed label Sep 11, 2022

This was referenced Sep 11, 2022

Vectorize tensor lerp kernel #84845

Closed

Add specialized lerp kernel to accelerate make_tensor #84846

Closed

pytorchbot added the open source label Sep 11, 2022

peterbell10 added 3 commits September 20, 2022 18:53

peterbell10 mentioned this pull request Sep 22, 2022

Improve make_tensor performance for float and complex types #85473

Closed

peterbell10 marked this pull request as ready for review September 23, 2022 16:01

peterbell10 requested a review from ngimel September 23, 2022 16:01

peterbell10 added 2 commits October 6, 2022 15:44

ngimel approved these changes Oct 13, 2022

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 13, 2022

pytorchmergebot added the Merged label Oct 13, 2022

pytorchmergebot closed this in 66979fb Oct 13, 2022

peterbell10 added the "release notes: complex" and "topic: performance" labels Oct 14, 2022

ZailiWang mentioned this pull request Mar 15, 2023

torch.lerp: discrepancy between CUDA and CPU (with extremal inputs) #78484

Open

facebook-github-bot deleted the gh/peterbell10/420/head branch June 8, 2023 18:20
