Update bsr_dense_addmm kernel parameters for sizes 3 x 2 ^ N #122506

pearu · 2024-03-22T17:19:45Z

As in the title. For a particular set of input sizes, the speed-ups range from about 7 % to 85 %, depending on the BSR tensor block sizes used.
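For readers unfamiliar with the notation in the title, "sizes 3 x 2 ^ N" refers to dimensions of the form 3 · 2^N, such as 768 or 3072. The following is a minimal sketch that enumerates a few such sizes and checks whether a given size has that form. The helper name `is_three_times_power_of_two` and the range of exponents are illustrative assumptions; the PR does not state which exponents were tuned.

```python
def is_three_times_power_of_two(n: int) -> bool:
    """Return True if n == 3 * 2**k for some integer k >= 0.

    Hypothetical helper for illustration only; it is not part of PyTorch.
    """
    if n <= 0 or n % 3 != 0:
        return False
    q = n // 3
    # q >= 1 here; a positive integer is a power of two
    # exactly when it has a single bit set.
    return q & (q - 1) == 0


# Illustrative exponent range (an assumption, not taken from the PR).
sizes = [3 * 2**k for k in range(4, 11)]
print(sizes)
```

Sizes such as 1024 or 4096 are plain powers of two and do not fall into this family, which is why they would be covered by separate kernel parameters.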

Stack from ghstack (oldest at bottom):

-> Update bsr_dense_addmm kernel parameters for sizes 3 x 2 ^ N #122506

cc @albanD

[ghstack-poisoned]

pytorch-bot · 2024-03-22T17:19:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122506


✅ You can merge normally! (1 Unrelated Failure)

As of commit cab36fa with merge base 3db64c1:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu) (gh)
test_foreach.py::TestForeachCUDA::test_binary_op_list_error_cases__foreach_add_cuda_bfloat16

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pearu · 2024-03-23T08:03:34Z

@pytorchbot merge

pytorchmergebot · 2024-03-23T08:06:26Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


As in the title. The speed-ups for a particular set of input sizes range from about 7 to 85 % depending on the used BSR tensor block sizes. Pull Request resolved: #122506 Approved by: https://github.com/cpuhrsch

Update bsr_dense_addmm kernel parameters for sizes 3 x 2 ^ N

4b9e44a

[ghstack-poisoned]

pytorch-bot bot added the release notes: sparse release notes category label Mar 22, 2024

pearu self-assigned this Mar 22, 2024

pearu added topic: not user facing topic category topic: performance topic category labels Mar 22, 2024

pearu requested a review from cpuhrsch March 22, 2024 17:25

pytorchbot added the open source label Mar 22, 2024

cpuhrsch approved these changes Mar 22, 2024

cpuhrsch added the skip-pr-sanity-checks label Mar 22, 2024

Update on "Update bsr_dense_addmm kernel parameters for sizes 3 x 2 ^ N"

cab36fa

As in the title. The speed-ups for a particular set of input sizes range from about 7 to 85 % depending on the used BSR tensor block sizes. cc albanD [ghstack-poisoned]

pearu added a commit that referenced this pull request Mar 22, 2024

Update bsr_dense_addmm kernel parameters for sizes 3 x 2 ^ N

b305226

ghstack-source-id: e8c373c Pull Request resolved: #122506

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 23, 2024

pytorchmergebot added the merging label Mar 23, 2024

pytorchmergebot closed this in a39e638 Mar 23, 2024

pytorchmergebot added Merged and removed merging labels Mar 23, 2024

Aidyn-A mentioned this pull request Apr 17, 2024

test_triton_scaled_dot_product_attention_block_size_16_cuda_bfloat16 is broken on A100 #124333

Closed

github-actions bot deleted the gh/pearu/125/head branch April 23, 2024 01:56

4 participants
