Adds row stride support to offset calculation methods #22
Conversation
Updates offset calculation logic to conditionally use row stride when sum_s_q is not -1, providing more flexible memory access patterns for different data layouts. Adds row_stride parameter to both zoh_offset and active_mask_offset methods to support this new calculation mode.
Pull Request Overview
This PR extends offset calculation in BlockInfo to support a row_stride when sum_s_q is set, enabling flexible memory layouts.
- Added `row_stride` parameter to both `zoh_offset` and `active_mask_offset` signatures
- Updated offset logic to use `row_stride` when `sum_s_q != -1`
- Retained legacy behavior when `sum_s_q == -1`
Comments suppressed due to low confidence (1)
csrc/src/block_info.h:40
- The new `row_stride` path (`sum_s_q != -1`) isn't exercised by existing tests; consider adding unit tests to cover behavior when `sum_s_q` is set.
```diff
 template <typename index_t>
-__forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int bidb
+__forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int row_stride, const int bidb
 ) const {
```
Copilot AI · Jun 23, 2025
The row_stride parameter is declared as int while batch_stride uses index_t; consider making row_stride an index_t as well to maintain consistency and avoid implicit narrowing or overflow.
```diff
 template <typename index_t>
-__forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, const int bidb) const {
-    return bidb * batch_stride;
+__forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, int row_stride, const int bidb) const {
```
Copilot AI · Jun 23, 2025
The row_stride parameter here omits the const qualifier present in zoh_offset; for consistency, mark it const int or better yet const index_t.
```diff
-__forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, int row_stride, const int bidb) const {
+__forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, const int row_stride, const int bidb) const {
```
```diff
 __forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int row_stride, const int bidb
 ) const {
-    return bidb * batch_stride;
+    return sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
```
Copilot AI · Jun 23, 2025
Casting sum_s_q to uint32_t may narrow its value and doesn’t match index_t; consider using static_cast<index_t>(sum_s_q) * static_cast<index_t>(row_stride).
```diff
 __forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int row_stride, const int bidb
 ) const {
-    return bidb * batch_stride;
+    return sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
 }

 template <typename index_t>
-__forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, const int bidb) const {
-    return bidb * batch_stride;
+__forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, int row_stride, const int bidb) const {
+    return sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
```
Copilot AI · Jun 23, 2025
The logic in zoh_offset and active_mask_offset is identical; consider extracting the conditional expression into a shared helper to reduce duplication.