Optimize 3-bit packing #1029

metascroy · 2024-10-07T22:43:28Z

Summary:
Optimizes 3-bit packing as outlined here: T199311618
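For readers unfamiliar with the operation, here is a minimal scalar sketch of 3-bit packing: eight values in [0, 7] fit in exactly three bytes. The helper names `pack_8_uint3` and `unpack_8_uint3` and the bit layout are assumptions made for illustration. They are not taken from this PR, whose optimized kernels may use a different layout and vectorized instructions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the PR's implementation.
// Assumed layout: value i occupies bits [3*i, 3*i + 3) of a 24-bit word,
// and the word is stored least-significant byte first.
inline std::array<std::uint8_t, 3> pack_8_uint3(
    const std::array<std::uint8_t, 8>& values) {
  std::uint32_t word = 0;
  for (int i = 0; i < 8; ++i) {
    assert(values[i] < 8);  // each value must fit in 3 bits
    word |= static_cast<std::uint32_t>(values[i]) << (3 * i);
  }
  return {static_cast<std::uint8_t>(word & 0xFF),
          static_cast<std::uint8_t>((word >> 8) & 0xFF),
          static_cast<std::uint8_t>((word >> 16) & 0xFF)};
}

// Inverse of pack_8_uint3 under the same assumed layout.
inline std::array<std::uint8_t, 8> unpack_8_uint3(
    const std::array<std::uint8_t, 3>& packed) {
  const std::uint32_t word = static_cast<std::uint32_t>(packed[0]) |
                             (static_cast<std::uint32_t>(packed[1]) << 8) |
                             (static_cast<std::uint32_t>(packed[2]) << 16);
  std::array<std::uint8_t, 8> values{};
  for (int i = 0; i < 8; ++i) {
    values[i] = static_cast<std::uint8_t>((word >> (3 * i)) & 0x7);
  }
  return values;
}
```

A round trip through both helpers should return the original eight values. A vectorized kernel would process many such groups per instruction instead of one value per loop iteration.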

Before change:

Benchmark                                   Time      CPU  Iterations
benchmark_pack_uint_values<3>/128/8      47.0 ns  46.4 ns    15106555
benchmark_pack_uint_values<3>/128/64     6.94 ns  6.90 ns   101226284
benchmark_pack_uint_values<3>/128/128    3.27 ns  3.24 ns   215022716
benchmark_unpack_uint_values<3>/128/8    22.0 ns  21.9 ns    32585572
benchmark_unpack_uint_values<3>/128/64   6.02 ns  5.98 ns   116910230
benchmark_unpack_uint_values<3>/128/128  2.74 ns  2.73 ns   257088291

After change:

Benchmark                                   Time      CPU  Iterations
benchmark_pack_uint_values<3>/128/8      19.5 ns  19.5 ns    36050883
benchmark_pack_uint_values<3>/128/64     3.90 ns  3.87 ns   181151919
benchmark_pack_uint_values<3>/128/128    1.57 ns  1.57 ns   447247194
benchmark_unpack_uint_values<3>/128/8    20.5 ns  20.4 ns    34490914
benchmark_unpack_uint_values<3>/128/64   3.19 ns  3.11 ns   228019714
benchmark_unpack_uint_values<3>/128/128  1.71 ns  1.70 ns   408587338

Unpacking perf for 128 values is 1.60x faster (2.74/1.71).
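The quoted 1.60x figure is the before time divided by the after time. The sketch below applies the same arithmetic to every row of the benchmark output above. The struct, the function names, and the shortened row labels are hypothetical; only the timings come from this PR.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One benchmark row: time in ns before and after the change.
struct BenchRow {
  const char* name;  // shortened label, chosen for this sketch
  double before_ns;
  double after_ns;
};

// Speedup factor: how many times faster the "after" run is.
inline double speedup(double before_ns, double after_ns) {
  assert(after_ns > 0.0);
  return before_ns / after_ns;
}

// Values copied from the first timing column of the benchmark output.
inline std::vector<BenchRow> reported_rows() {
  return {
      {"pack/128/8", 47.0, 19.5},
      {"pack/128/64", 6.94, 3.90},
      {"pack/128/128", 3.27, 1.57},
      {"unpack/128/8", 22.0, 20.5},
      {"unpack/128/64", 6.02, 3.19},
      {"unpack/128/128", 2.74, 1.71},
  };
}

// Prints one "<label> <speedup>x" line per row.
inline void print_speedups() {
  for (const BenchRow& row : reported_rows()) {
    std::printf("%-16s %.2fx\n", row.name,
                speedup(row.before_ns, row.after_ns));
  }
}
```

Calling `print_speedups()` lists the ratio for each configuration, which makes it easy to see that the packing rows improve as well as the unpacking row cited above.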

Differential Revision: D64010666

pytorch-bot · 2024-10-07T22:43:32Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1029

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit af0ea95 with merge base dec0313:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2024-10-07T22:43:42Z

This pull request was exported from Phabricator. Differential Revision: D64010666

Summary:
Optimizes 3-bit packing as outlined here: T199311618

Before change:

Benchmark                                   Time      CPU  Iterations
benchmark_pack_uint_values<3>/128/8      47.0 ns  46.4 ns    15106555
benchmark_pack_uint_values<3>/128/64     6.94 ns  6.90 ns   101226284
benchmark_pack_uint_values<3>/128/128    3.27 ns  3.24 ns   215022716
benchmark_unpack_uint_values<3>/128/8    22.0 ns  21.9 ns    32585572
benchmark_unpack_uint_values<3>/128/64   6.02 ns  5.98 ns   116910230
benchmark_unpack_uint_values<3>/128/128  2.74 ns  2.73 ns   257088291

After change:

Benchmark                                   Time      CPU  Iterations
benchmark_pack_uint_values<3>/128/8      19.5 ns  19.5 ns    36050883
benchmark_pack_uint_values<3>/128/64     3.90 ns  3.87 ns   181151919
benchmark_pack_uint_values<3>/128/128    1.57 ns  1.57 ns   447247194
benchmark_unpack_uint_values<3>/128/8    20.5 ns  20.4 ns    34490914
benchmark_unpack_uint_values<3>/128/64   3.19 ns  3.11 ns   228019714
benchmark_unpack_uint_values<3>/128/128  1.71 ns  1.70 ns   408587338

Unpacking perf for 128 values is 1.60x faster (2.74/1.71).

Reviewed By: digantdesai

Differential Revision: D64010666

facebook-github-bot · 2024-10-07T23:19:12Z

This pull request was exported from Phabricator. Differential Revision: D64010666

Differential Revision: D64010666
Pull Request resolved: #1029

facebook-github-bot added the CLA Signed label Oct 7, 2024

facebook-github-bot added the fb-exported label Oct 7, 2024

metascroy force-pushed the export-D64010666 branch from 5387317 to af0ea95 October 7, 2024 23:19

kirklandsign approved these changes Oct 7, 2024


facebook-github-bot merged commit 93ff876 into pytorch:main Oct 8, 2024
17 of 19 checks passed

jainapurva pushed a commit that referenced this pull request Oct 9, 2024

Optimize 3-bit packing

164b978

Differential Revision: D64010666
Pull Request resolved: #1029

jainapurva pushed a commit that referenced this pull request Oct 15, 2024

Optimize 3-bit packing

fa6d9c3

Differential Revision: D64010666
Pull Request resolved: #1029
