logaddexp2: Use log1p and exp2 #92116
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))`, which improves precision when `x` is small by avoiding the truncation incurred when calculating `(1 + x) - 1`. Note that `x` is always `<= 1` in this formula. This also replaces `pow(2, x)` with `exp2(x)`, which improves performance, particularly on CPU where the constant value cannot be inlined into Sleef. With numel=1e7, for example, I see a 1.35x speedup on CPU.
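The precision argument above can be illustrated with a minimal standalone sketch. It uses plain `double` and the standard `<cmath>` functions rather than the kernel's `scalar_t`, and the function names `log2_1p_naive` and `log2_1p_stable` are hypothetical, chosen only for this illustration.

```cpp
#include <cmath>

// Naive form: forms 1 + x explicitly. For very small x the sum rounds to
// exactly 1, so the information carried by x is lost before the log is taken.
double log2_1p_naive(double x) {
  return std::log2(1.0 + x);
}

// Form described in the PR: log1p(x) scaled by 1 / log(2).
// log1p never forms 1 + x, so small x keeps its significant digits.
double log2_1p_stable(double x) {
  const double inv_log_2 = 1.0 / std::log(2.0);
  return std::log1p(x) * inv_log_2;
}
```

For `x = 1e-20`, `1.0 + x` rounds to `1.0` in double precision, so the naive form returns `0`, while the `log1p` form returns roughly `1.44e-20`. For `x = 1`, the largest value the formula produces, both forms agree to within rounding.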
In general, if you do `scalar_t min = min(a, b)` (or an equivalent `auto [min, max] = minmax(a, b)`), you can change `-abs(a - b)` to `min - max`, which should hopefully be faster.
Otherwise this LGTM.
I think the former is better actually, since
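As a rough check of the `min - max` suggestion in this thread, the following sketch compares the two ways of writing the exponent. It is an illustration in plain `double`, not the kernel code, and the names `neg_abs_diff` and `min_minus_max` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// The exponent as written in the kernel line under review: -|a - b|.
double neg_abs_diff(double a, double b) {
  return -std::abs(a - b);
}

// The same quantity written as min - max, as the review comment suggests.
double min_minus_max(double a, double b) {
  const double lo = std::min(a, b);
  const double hi = std::max(a, b);
  return lo - hi;
}
```

For finite inputs the two agree, since `a - b` and `b - a` round to values of equal magnitude. One caveat of this particular sketch: `std::min` and `std::max` do not propagate NaN consistently (when only `b` is NaN, both return `a`, giving `0` instead of NaN), so NaN inputs would need separate handling.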
      else {
        scalar_t m = ::max(a, b);
    -   return m + ::log2((scalar_t)(1.0) + ::pow((scalar_t)(2.0), -::abs(a - b)));
    +   return m + ::log1p(::exp2(-::abs(a - b))) * inv_log_2;
It's a pre-existing condition, but it looks like intermediate computations are not done in fp32. Can you fix it here and in logaddexp (also replace accscalar with opmath)?
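One way to read the request above is sketched below: convert the inputs to a wider type, do all intermediate arithmetic there, and cast back only at the end. This is a standalone illustration, not the PyTorch kernel. `logaddexp2_opmath` and its `OpMath` template parameter are hypothetical stand-ins for the op-math type the comment mentions, and the equal-infinities guard is an assumption about what precedes the `else` branch in the diff.

```cpp
#include <algorithm>
#include <cmath>

// T is the storage type; OpMath is the (assumed) wider type used for all
// intermediate arithmetic, e.g. T = float with OpMath = double in this sketch.
template <typename T, typename OpMath>
T logaddexp2_opmath(T a_, T b_) {
  const OpMath a = static_cast<OpMath>(a_);
  const OpMath b = static_cast<OpMath>(b_);
  // Assumed guard: for equal infinities, a - b would be inf - inf = NaN.
  if (std::isinf(a) && a == b) {
    return static_cast<T>(a);
  }
  const OpMath inv_log_2 = OpMath(1) / std::log(OpMath(2));
  const OpMath m = std::max(a, b);
  // Same formula as the added line in the diff, evaluated entirely in OpMath.
  return static_cast<T>(m + std::log1p(std::exp2(-std::abs(a - b))) * inv_log_2);
}
```

Calling `logaddexp2_opmath<float, double>(3.0f, 3.0f)` evaluates `log2(2^3 + 2^3)` in `double` and rounds once on return, giving `4.0f`.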
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10