@LoserCheems
Collaborator

Updates offset calculation logic to conditionally use row stride when sum_s_q is not -1, providing more flexible memory access patterns for different data layouts.

Adds row_stride parameter to both zoh_offset and active_mask_offset methods to support this new calculation mode.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR extends offset calculation in BlockInfo to support a row_stride when sum_s_q is set, enabling flexible memory layouts.

  • Added row_stride parameter to both zoh_offset and active_mask_offset signatures
  • Updated offset logic to use row_stride when sum_s_q != -1
  • Retained legacy behavior when sum_s_q == -1
Comments suppressed due to low confidence (1)

csrc/src/block_info.h:40

  • The new row_stride path (sum_s_q != -1) isn’t exercised by existing tests; consider adding unit tests to cover behavior when sum_s_q is set.
    ) const {
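
A test along the lines suggested above could look like the following sketch. It is a host-side stand-in, not the project's code: `BlockInfoSketch` and `test_zoh_offset_paths` are hypothetical names, the device qualifiers are dropped so the arithmetic can run on the CPU, and the method body copies the expression shown in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in for the device struct; only the offset
// arithmetic from this PR is reproduced here.
struct BlockInfoSketch {
    int sum_s_q;  // -1 selects the legacy batch-stride path

    template <typename index_t>
    index_t zoh_offset(const index_t batch_stride, const int row_stride, const int bidb) const {
        // Same expression as the merged change.
        return sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
    }
};

// One check per branch of the conditional.
inline void test_zoh_offset_paths() {
    const BlockInfoSketch legacy{-1};
    assert(legacy.zoh_offset<int64_t>(1024, 64, 3) == 3 * 1024);  // bidb * batch_stride

    const BlockInfoSketch with_sum{10};
    assert(with_sum.zoh_offset<int64_t>(1024, 64, 3) == 10 * 64);  // sum_s_q * row_stride
}
```

The values are small on purpose, so both branches stay far away from any 32-bit limit; a device-side test would additionally need a kernel launch to call the real method.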

  template <typename index_t>
- __forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int bidb
+ __forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int row_stride, const int bidb
  ) const {
Copilot AI Jun 23, 2025

The row_stride parameter is declared as int while batch_stride uses index_t; consider making row_stride an index_t as well to maintain consistency and avoid implicit narrowing or overflow.
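
To illustrate why the parameter type matters, the sketch below compares the type of the product in both cases. It assumes a platform where `int` is 32 bits wide and `index_t` is instantiated as `int64_t`; the alias names are hypothetical.

```cpp
#include <cstdint>
#include <utility>

// Product type when row_stride stays a 32-bit int, as in the PR's signature:
// the int operand is converted to unsigned, so the result is 32 bits wide.
using narrow_product_t = decltype(std::declval<uint32_t>() * std::declval<int>());

// Product type when row_stride is a 64-bit index_t:
// the uint32_t operand is widened, so the result is 64 bits wide.
using wide_product_t = decltype(std::declval<uint32_t>() * std::declval<int64_t>());

static_assert(sizeof(narrow_product_t) == 4, "uint32_t * int is computed in 32 bits");
static_assert(sizeof(wide_product_t) == 8, "uint32_t * int64_t is computed in 64 bits");
```

Under these assumptions, declaring `row_stride` as `index_t` would by itself move the multiplication into 64-bit arithmetic, without touching the cast on `sum_s_q`.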

Copilot uses AI. Check for mistakes.
  template <typename index_t>
- __forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, const int bidb) const {
- return bidb * batch_stride;
+ __forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, int row_stride, const int bidb) const {
Copilot AI Jun 23, 2025

The row_stride parameter here omits the const qualifier present in zoh_offset; for consistency, mark it const int or better yet const index_t.

Suggested change
- __forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, int row_stride, const int bidb) const {
+ __forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, const int row_stride, const int bidb) const {

+ __forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int row_stride, const int bidb
  ) const {
- return bidb * batch_stride;
+ return sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
Copilot AI Jun 23, 2025

Casting sum_s_q to uint32_t may narrow its value and doesn’t match index_t; consider using static_cast<index_t>(sum_s_q) * static_cast<index_t>(row_stride).
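
One way to see the effect of the suggested casts is to compare both forms on values whose product exceeds 32 bits. This is a host-side sketch with hypothetical function names, assuming a 32-bit `int` and `index_t = int64_t`; the inputs are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the merged expression: the product is formed in 32-bit unsigned
// arithmetic and only then widened to the 64-bit return type.
inline int64_t offset_uint32_cast(int sum_s_q, int row_stride) {
    return uint32_t(sum_s_q) * row_stride;
}

// Mirrors the suggested form: both operands are widened before multiplying.
template <typename index_t>
inline index_t offset_index_cast(int sum_s_q, int row_stride) {
    return static_cast<index_t>(sum_s_q) * static_cast<index_t>(row_stride);
}

inline void compare_cast_forms() {
    // 100000 * 50000 = 5,000,000,000, which does not fit in 32 bits.
    assert(offset_index_cast<int64_t>(100000, 50000) == 5000000000LL);
    // The 32-bit product wraps modulo 2^32: 5,000,000,000 - 4,294,967,296.
    assert(offset_uint32_cast(100000, 50000) == 705032704LL);
}
```

Whether products of this size occur in practice depends on the tensor shapes the kernels are launched with, which this conversation does not state.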

Comment on lines +39 to +46
+ __forceinline__ __device__ index_t zoh_offset(const index_t batch_stride, const int row_stride, const int bidb
  ) const {
- return bidb * batch_stride;
+ return sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
  }

  template <typename index_t>
- __forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, const int bidb) const {
- return bidb * batch_stride;
+ __forceinline__ __device__ index_t active_mask_offset(const index_t batch_stride, int row_stride, const int bidb) const {
+ return sum_s_q == -1 ? bidb * batch_stride : uint32_t(sum_s_q) * row_stride;
Copilot AI Jun 23, 2025

The logic in zoh_offset and active_mask_offset is identical; consider extracting the conditional expression into a shared helper to reduce duplication.
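
A shared helper of the kind suggested here might be sketched as follows. The names `SKETCH_HD`, `batch_or_row_offset`, and `BlockInfoHelperSketch` are hypothetical, and the helper also widens both operands to `index_t`, which goes beyond what the merged code does.

```cpp
#include <cassert>
#include <cstdint>

// Compile as host+device under nvcc, as a plain inline function otherwise.
#ifdef __CUDACC__
#define SKETCH_HD __forceinline__ __host__ __device__
#else
#define SKETCH_HD inline
#endif

// Hypothetical shared helper: picks the batch-stride or row-stride offset.
template <typename index_t>
SKETCH_HD index_t batch_or_row_offset(const int sum_s_q, const index_t batch_stride,
                                      const index_t row_stride, const int bidb) {
    return sum_s_q == -1
        ? static_cast<index_t>(bidb) * batch_stride
        : static_cast<index_t>(sum_s_q) * row_stride;
}

// Stand-in for the struct in csrc/src/block_info.h; only sum_s_q is modeled.
struct BlockInfoHelperSketch {
    int sum_s_q;

    template <typename index_t>
    SKETCH_HD index_t zoh_offset(const index_t batch_stride, const index_t row_stride, const int bidb) const {
        return batch_or_row_offset<index_t>(sum_s_q, batch_stride, row_stride, bidb);
    }

    template <typename index_t>
    SKETCH_HD index_t active_mask_offset(const index_t batch_stride, const index_t row_stride, const int bidb) const {
        return batch_or_row_offset<index_t>(sum_s_q, batch_stride, row_stride, bidb);
    }
};
```

With this layout the conditional lives in one place, so a later change to the offset rule would apply to both methods at once.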

@LoserCheems LoserCheems merged commit c9c6f8d into main Jun 23, 2025