Dense->BSR performance improvement #83085
Conversation
❌ 1 new failure as of commit e1cfaa6 (details on the Dr. CI page); the failure was not recognized by known patterns. This comment was automatically generated by Dr. CI.
Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way. [ghstack-poisoned]
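The re-batching idea can be sketched in a few lines of Python (hypothetical helper name; the actual implementation lives in PyTorch's C++/CUDA sparse code): when every batch has the same nnz, the compressed indices can be computed once over the batch-flattened matrix and then sliced back into per-batch form with pure tensor ops, instead of launching one kernel per batch.

```python
import torch

def rebatch_compressed_indices(flat_crow: torch.Tensor,
                               n_batch: int, rows: int) -> torch.Tensor:
    """Hypothetical sketch: turn CSR-style compressed row indices computed
    once over a batch-flattened (n_batch * rows, cols) matrix back into
    per-batch crow_indices, with no per-batch kernel launches.

    flat_crow: shape (n_batch * rows + 1,), cumulative nnz over all batches.
    Returns:   shape (n_batch, rows + 1), each batch restarting at 0.
    """
    # Gather every batch's slice of the flat compressed indices in one shot;
    # batch b covers positions b*rows .. b*rows + rows (inclusive).
    idx = (rows * torch.arange(n_batch).unsqueeze(1)
           + torch.arange(rows + 1).unsqueeze(0))        # (n_batch, rows + 1)
    batched = flat_crow[idx]
    # Each batch must start at zero: subtract the batch's leading offset,
    # which equals batch_index * nnz when nnz is equal across batches.
    return batched - batched[:, :1]

# Example: 2 batches of a 2-row matrix, each with nnz == 2.
# Batch 0 crow: [0, 1, 2]; batch 1 crow: [0, 2, 2].
flat_crow = torch.tensor([0, 1, 2, 4, 4])
print(rebatch_compressed_indices(flat_crow, n_batch=2, rows=2))
# → tensor([[0, 1, 2],
#           [0, 2, 2]])
```

The subtraction step is where the equal-nnz invariant matters: it guarantees each batch's leading offset is a simple multiple of nnz, so the whole operation vectorizes.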
Not just a perf improvement: going by the deleted comments, this also relaxes the same-sparsity-pattern restriction, right? If so, that's as big a deal as the perf win, I'd say.
// This requirement is not included in Pearu's blog post on BSR invariants.
// He specifically states that different batches may have different sparsity
// patterns as long as the number of specified elements is the same for all
// If the input is n-d we require that nnz is the same for all
@bhosmer if this is the comment you are referring to: the sparsity-pattern restriction was already relaxed when batch-dim support was added. The first three lines are from work by @cpuhrsch that added support for one batch dim (3-D inputs). I relaxed the requirement to match the BSR invariants document after I took over that work to generalize it to an arbitrary number of batch dims, but left the original comment plus my note so I would remember to point it out in review.
So this change does not introduce the relaxation of the sparsity-pattern restriction; it only removes the confusing comment talking about it.
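For illustration, here is a small example of the invariant under discussion (assuming a recent PyTorch build with batched `to_sparse_bsr` support): the two batches below have different sparsity patterns, but the same nnz, which is what batched BSR requires.

```python
import torch

# Two 4x4 batches whose single nonzero 2x2 block sits in different positions:
# the sparsity patterns differ, but both batches have the same nnz.
dense = torch.zeros(2, 4, 4)
dense[0, :2, :2] = 1.0   # batch 0: block at block-row 0, block-col 0
dense[1, 2:, 2:] = 1.0   # batch 1: block at block-row 1, block-col 1

bsr = dense.to_sparse_bsr(blocksize=(2, 2))
print(bsr.crow_indices())   # batched: one (nrowblocks + 1)-length row per batch
print(bsr.values().shape)   # (2 batches, 1 block each, 2, 2)
```

If the batches had differing nnz (say batch 1 had two nonzero blocks), the batched conversion would have to reject the input, since a single `values` tensor cannot hold ragged per-batch block counts.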
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Faster and less code - double win!
@pytorchbot merge
Merge failed. Reason: Command
If you believe this is an error, you can use the old behavior. Please reach out to the PyTorch DevX Team with feedback or questions!
Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way.
Pull Request resolved: #83085
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
Summary: Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way.
Pull Request resolved: #83085
Approved by: https://github.com/bhosmer, https://github.com/nikitaved
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/a5a01e443ce1dd8e31ef7d0b3fd6a2359881a922
Reviewed By: mehtanirav, izaitsevfb
Differential Revision: D39277549
fbshipit-source-id: d3df886ac6ba688afd2fc8539a57ad6878fbc23e
Stack from ghstack (oldest at bottom):
Applies the algorithm for re-batching compressed indices to avoid n-batch kernel launches. This is an optimization for `dim() >= 3` inputs and does not change behavior in any way.

cc @alexsamardzic @nikitaved @pearu @cpuhrsch @bhosmer