Skip to content

Conversation

@peterbell10
Copy link
Collaborator

@peterbell10 peterbell10 commented Jan 13, 2023

Stack from ghstack (oldest at bottom):

This replaces log2(1 + x) with log1p(x) * (1 / log(2)) which improves
precision when x is small by avoiding the truncation from calculating
(1 + x) - 1. Noting that x is always <= 1 in this formula.

This also replaces pow(2, x) with exp2(x) which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92116

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 96f6185:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

[ghstack-poisoned]
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

[ghstack-poisoned]
peterbell10 added a commit that referenced this pull request Jan 13, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: a3e1fa4
Pull Request resolved: #92116
@peterbell10 peterbell10 added the topic: not user facing topic category label Jan 13, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
peterbell10 added a commit that referenced this pull request Jan 13, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: 2d3a2a7
Pull Request resolved: #92116
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
peterbell10 added a commit that referenced this pull request Jan 14, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: 149051a
Pull Request resolved: #92116
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
peterbell10 added a commit that referenced this pull request Jan 15, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: b795d24
Pull Request resolved: #92116
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
peterbell10 added a commit that referenced this pull request Jan 15, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: d452205
Pull Request resolved: #92116
Copy link
Collaborator

@lezcano lezcano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, if you do scalar_t min = min(a,b) (or an equivalent auto [min, max] = minmax(a, b) you can change -abs(a-b) to be min - max, which should hopefully be faster.
Otherwise this LGTM

@peterbell10
Copy link
Collaborator Author

you can change -abs(a-b) to be min - max

I think the former is better actually, since min/max has 4 cycles of latency while abs and neg are bitwise operators with 1 cycle latency each. So it's 2 cycles lower latency.

This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@peterbell10 peterbell10 marked this pull request as ready for review January 16, 2023 14:11
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Jan 18, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: 4479aff
Pull Request resolved: pytorch#92116
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
peterbell10 added a commit that referenced this pull request Jan 18, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: d6bd2ad
Pull Request resolved: #92116
@peterbell10 peterbell10 added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 18, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
peterbell10 added a commit that referenced this pull request Jan 19, 2023
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

ghstack-source-id: 69e4ecd
Pull Request resolved: #92116
else {
scalar_t m = ::max(a, b);
return m + ::log2((scalar_t)(1.0) + ::pow((scalar_t)(2.0), -::abs(a - b)));
return m + ::log1p(::exp2(-::abs(a - b))) * inv_log_2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a pre-existing condition, but looks like intermediate computations are not done in fp32, can you fix it here and in logaddexp (also replace accscalar with opmath)?

This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
This replaces `log2(1 + x)` with `log1p(x) * (1 / log(2))` which improves
precision when `x` is small by avoiding the truncation from calculating
`(1 + x) - 1`. Noting that `x` is always `<= 1` in this formula.

This also replaces `pow(2, x)` with `exp2(x)` which improves performance,
particularly on CPU where the constant value cannot be inlined into Sleef.
With numel=1e7 for example, I see a 1.35x speedup on CPU.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

[ghstack-poisoned]
@peterbell10
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@facebook-github-bot facebook-github-bot deleted the gh/peterbell10/500/head branch June 8, 2023 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: cpu CPU specific problem (e.g., perf, algorithm) open source topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants