logaddexp2: Use log1p and exp2 #92116

peterbell10 · 2023-01-13T00:06:43Z

Stack from ghstack (oldest at bottom):

This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))`, which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Note that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)`, which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With `numel=1e7`, for example, I see a 1.35x speedup on CPU.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
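
To make the precision argument concrete, here is a minimal scalar sketch in plain C++ (not the ATen kernel itself). The helper names `log2_1p_naive`, `log2_1p_stable` and `logaddexp2_sketch` are made up for illustration; only the formulas come from the description above.

```cpp
#include <cmath>

// Old form: 1.0f + x is rounded to float first, so for x below roughly
// 1e-7 the sum collapses to exactly 1.0f and the result becomes 0.
float log2_1p_naive(float x) {
  return std::log2(1.0f + x);
}

// New form: log1p never forms 1 + x explicitly, so small x survives.
// The natural log is converted to base 2 by multiplying by 1 / log(2).
float log2_1p_stable(float x) {
  const float inv_log_2 = 1.0f / std::log(2.0f);
  return std::log1p(x) * inv_log_2;
}

// logaddexp2(a, b) = log2(2^a + 2^b), written in the new form.
// The argument of log1p is exp2(-|a - b|), which always lies in (0, 1].
// Infinite inputs are not handled here; this is only a sketch.
float logaddexp2_sketch(float a, float b) {
  const float inv_log_2 = 1.0f / std::log(2.0f);
  const float m = std::fmax(a, b);
  return m + std::log1p(std::exp2(-std::fabs(a - b))) * inv_log_2;
}
```

On a platform that evaluates `float` arithmetic in single precision, `log2_1p_naive(1e-10f)` returns zero while `log2_1p_stable(1e-10f)` stays close to `1e-10 / log(2)`. The sketch says nothing about the speedup, which depends on the vectorized Sleef path.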

pytorch-bot · 2023-01-13T00:06:45Z

✅ No Failures

As of commit 96f6185:
💚 Looks good so far! There are no failures yet. 💚

lezcano

In general, if you do `scalar_t min = min(a, b)` (or an equivalent `auto [min, max] = minmax(a, b)`), you can change `-abs(a - b)` to `min - max`, which should hopefully be faster.
Otherwise this LGTM

peterbell10 · 2023-01-16T14:05:33Z

> you can change `-abs(a-b)` to be `min - max`

I think the former is better actually, since min/max has 4 cycles of latency while abs and neg are bitwise operators with 1 cycle latency each. So it's 2 cycles lower latency.
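
For reference, the two expressions under discussion compute the same exponent. A small sketch in plain C++ (the helper names `neg_abs_diff` and `min_minus_max` are made up for illustration):

```cpp
#include <algorithm>
#include <cmath>

// Form used in the kernel: negate the absolute difference.
float neg_abs_diff(float a, float b) {
  return -std::fabs(a - b);
}

// Form suggested in review: smaller value minus larger value.
float min_minus_max(float a, float b) {
  return std::min(a, b) - std::max(a, b);
}
```

For finite inputs both return the same value, since `a - b` and `b - a` round to exact negatives of each other. The sketch only checks equivalence; the latency figures are the ones quoted above and are not measured here.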

ngimel · 2023-01-20T01:09:31Z

aten/src/ATen/native/cuda/LogAddExpKernel.cu

          else {
            scalar_t m = ::max(a, b);
-            return m + ::log2((scalar_t)(1.0) + ::pow((scalar_t)(2.0), -::abs(a - b)));
+            return m + ::log1p(::exp2(-::abs(a - b))) * inv_log_2;


It's a pre-existing condition, but it looks like the intermediate computations are not done in fp32. Can you fix it here and in `logaddexp` (and also replace `accscalar` with `opmath`)?

peterbell10 · 2023-01-20T18:02:39Z

@pytorchbot merge

pytorchmergebot · 2023-01-20T18:04:23Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

This was referenced Jan 13, 2023

Improve performance for unary kernels using vml #91963

Closed

Disable vml's abs and log1p #92113

Closed

Replace log(1 + x) with log1p(x) #92114

Closed

Vectorize torch.exp2 on CPU and add complex support #92115

Closed

github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Jan 13, 2023

pytorchbot added the open source label Jan 13, 2023

peterbell10 added the topic: not user facing topic category label Jan 13, 2023

peterbell10 mentioned this pull request Jan 13, 2023

Replace exp(x) - 1 with expm1(x) #92154

Closed

lezcano approved these changes Jan 16, 2023

View reviewed changes

peterbell10 marked this pull request as ready for review January 16, 2023 14:11

peterbell10 added 2 commits January 16, 2023 14:30

peterbell10 added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 18, 2023

peterbell10 mentioned this pull request Jan 19, 2023

[inductor] Lower torch.exp2 and use it for torch.pow(2, x) #92632

Closed

ngimel reviewed Jan 20, 2023

View reviewed changes

peterbell10 added 2 commits January 20, 2023 03:00

pytorchmergebot added the Merged label Jan 20, 2023

pytorchmergebot closed this in 2af2952 Jan 20, 2023

facebook-github-bot deleted the gh/peterbell10/500/head branch June 8, 2023 18:25
