Dense->BSR performance improvement #83085
Conversation
❌ 1 new failure as of commit e1cfaa6 (details on the Dr. CI page); the failure was not recognized by known patterns. This comment was automatically generated by Dr. CI.
Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way. [ghstack-poisoned]
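The re-batching idea can be sketched in a few lines of Python (hypothetical helper name; the actual implementation lives in PyTorch's C++/CUDA sparse code): when every batch has the same nnz, the compressed indices can be computed once over the batch-flattened matrix and then sliced back into per-batch form with pure tensor ops, instead of launching one kernel per batch.

```python
import torch

def rebatch_compressed_indices(flat_crow: torch.Tensor,
                               n_batch: int, rows: int) -> torch.Tensor:
    """Hypothetical sketch: turn CSR-style compressed row indices computed
    once over a batch-flattened (n_batch * rows, cols) matrix back into
    per-batch crow_indices, with no per-batch kernel launches.

    flat_crow: shape (n_batch * rows + 1,), cumulative nnz over all batches.
    Returns:   shape (n_batch, rows + 1), each batch restarting at 0.
    """
    # Gather every batch's slice of the flat compressed indices in one shot;
    # batch b covers positions b*rows .. b*rows + rows (inclusive).
    idx = (rows * torch.arange(n_batch).unsqueeze(1)
           + torch.arange(rows + 1).unsqueeze(0))        # (n_batch, rows + 1)
    batched = flat_crow[idx]
    # Each batch must start at zero: subtract the batch's leading offset,
    # which equals batch_index * nnz when nnz is equal across batches.
    return batched - batched[:, :1]

# Example: 2 batches of a 2-row matrix, each with nnz == 2.
# Batch 0 crow: [0, 1, 2]; batch 1 crow: [0, 2, 2].
flat_crow = torch.tensor([0, 1, 2, 4, 4])
print(rebatch_compressed_indices(flat_crow, n_batch=2, rows=2))
# → tensor([[0, 1, 2],
#           [0, 2, 2]])
```

The subtraction step is where the equal-nnz invariant matters: it guarantees each batch's leading offset is a simple multiple of nnz, so the whole operation vectorizes.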
Not just a perf improvement: going by the deleted comments, this also relaxes the same-sparsity-pattern restriction, right? If so, that's as big a deal as the perf win, I'd say.
// This requirement is not included in Pearu's blog post on BSR invariants.
// He specifically states that different batches may have different sparsity
// patterns as long as the number of specified elements is the same for all
// If the input is n-d we require that nnz is the same for all
@bhosmer if this is the comment you are referring to: the sparsity-pattern restriction was already relaxed when batch-dim support was added. The first three lines are from work by @cpuhrsch that added support for one batch dim (3-D inputs). I relaxed the requirement to match the BSR invariants document after I took over that work to generalize it to an arbitrary number of batch dims, but left the original comment plus my note so I would remember to point it out in review.
So this change does not introduce the relaxation of the sparsity-pattern restriction; it only removes the confusing comment talking about it.
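For illustration, here is a small example of the invariant under discussion (assuming a recent PyTorch build with batched `to_sparse_bsr` support): the two batches below have different sparsity patterns, but the same nnz, which is what batched BSR requires.

```python
import torch

# Two 4x4 batches whose single nonzero 2x2 block sits in different positions:
# the sparsity patterns differ, but both batches have the same nnz.
dense = torch.zeros(2, 4, 4)
dense[0, :2, :2] = 1.0   # batch 0: block at block-row 0, block-col 0
dense[1, 2:, 2:] = 1.0   # batch 1: block at block-row 1, block-col 1

bsr = dense.to_sparse_bsr(blocksize=(2, 2))
print(bsr.crow_indices())   # batched: one (nrowblocks + 1)-length row per batch
print(bsr.values().shape)   # (2 batches, 1 block each, 2, 2)
```

If the batches had differing nnz (say batch 1 had two nonzero blocks), the batched conversion would have to reject the input, since a single `values` tensor cannot hold ragged per-batch block counts.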
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Faster and less code - double win!
@pytorchbot merge
Merge failed. Reason: Command
If you believe this is an error, you can use the old behavior. Please reach out to the PyTorch DevX Team with feedback or questions!
Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way.
Pull Request resolved: #83085
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
Summary: Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way.
Pull Request resolved: #83085
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/a5a01e443ce1dd8e31ef7d0b3fd6a2359881a922
Reviewed By: mehtanirav, izaitsevfb
Differential Revision: D39277549
fbshipit-source-id: d3df886ac6ba688afd2fc8539a57ad6878fbc23e
Stack from ghstack (oldest at bottom):
Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way.

cc @alexsamardzic @nikitaved @pearu @cpuhrsch @bhosmer