Updates copy operations to use improved vectorization #44

LoserCheems · 2025-06-30T14:49:23Z

Replaces generic copy struct with AutoVectorizingCopyWithAssumedAlignment for better memory access patterns.

Reduces vector layout from 8 to 4 values per read for ZOH and ActiveMask operations to optimize memory bandwidth usage.

Copilot

Pull Request Overview

This PR updates copy operations to leverage improved vectorization for better memory access patterns. Key changes include:

Replacing Gmem_copy_struct with AutoVectorizingCopyWithAssumedAlignment for ZOH and ActiveMask copy operations
Reducing the vector layout from 8 to 4 values per read for ZOH and ActiveMask operations to optimize memory bandwidth usage
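
The 8-to-4 change can be read as a width calculation. The sketch below is plain C++ with no CuTe dependency, and it assumes that Element is a 16-bit type such as half or bfloat16, which the PR does not state. `vector_bits` and `kElemBits` are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper (not from the PR): width in bits of one vectorized
// access that moves `vals_per_read` elements of `bits_per_elem` bits each.
constexpr std::size_t vector_bits(std::size_t vals_per_read,
                                  std::size_t bits_per_elem) {
    return vals_per_read * bits_per_elem;
}

// Assumption: Element is a 16-bit type (for example half or bfloat16).
constexpr std::size_t kElemBits = 16;

// 8 values per read fill a full 128-bit access.
static_assert(vector_bits(8, kElemBits) == 128, "8 x 16 bits = 128 bits");
// 4 values per read fill a 64-bit access, matching the <64> template argument.
static_assert(vector_bits(4, kElemBits) == 64, "4 x 16 bits = 64 bits");
```

Under that assumption, a value layout of 4 per read lines up with the 64-bit width given to AutoVectorizingCopyWithAssumedAlignment, while 8 per read would need 128 bits.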

Comments suppressed due to low confidence (2)

csrc/src/kernel_traits.h:157

Consider adding a brief inline comment or documentation note explaining why AutoVectorizingCopyWithAssumedAlignment with a 64-bit assumed alignment (the template argument is a width in bits, not bytes) is used for ZOH and ActiveMask, especially given that a different alignment (128) is used for the 'O' operation.

        make_tiled_copy(Copy_Atom<AutoVectorizingCopyWithAssumedAlignment<64>, Element>{},

csrc/src/kernel_traits.h:159

Ensure that tests cover the new vector layout configuration to confirm that reducing from 8 to 4 values per read does not introduce unintended behavior.

                        Layout<Shape<_1, _4>>{}));      // Val layout, 4 vals per read
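
One way to approach the test coverage asked for above is to check, on the host, that the addresses a kernel will read satisfy the assumed alignment. The following is a minimal sketch in plain C++. Both helpers are hypothetical and are not part of the PR or of CuTe, and the 16-bit element size is again an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if a byte address lies on a `bits`-bit boundary.
constexpr bool address_is_aligned(std::uintptr_t addr, std::size_t bits) {
    return addr % (bits / 8) == 0;
}

// Hypothetical helper: true if every row of a row-major tile starts on a
// `bits`-bit boundary, provided the base pointer itself is aligned.
constexpr bool rows_stay_aligned(std::size_t row_stride_elems,
                                 std::size_t bits_per_elem,
                                 std::size_t bits) {
    return (row_stride_elems * bits_per_elem) % bits == 0;
}

// An address that is 8-byte aligned but not 16-byte aligned is acceptable
// for 64-bit accesses and not for 128-bit ones.
static_assert(address_is_aligned(0x1008, 64), "0x1008 is 64-bit aligned");
static_assert(!address_is_aligned(0x1008, 128), "0x1008 is not 128-bit aligned");

// Assumption: 16-bit elements. A row stride of 64 elements keeps rows aligned;
// a stride of 66 elements does not.
static_assert(rows_stay_aligned(64, 16, 64), "64 x 16 bits is a multiple of 64");
static_assert(!rows_stay_aligned(66, 16, 64), "66 x 16 bits is not a multiple of 64");
```

A check like this only covers the alignment precondition. Confirming that the 4-value layout produces the same results as the 8-value one would still need a numerical comparison of kernel outputs.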

Updates copy operations to use improved vectorization

10b22a2

LoserCheems requested review from Evanwu1125, SNHuan, Copilot and wubingheng111 June 30, 2025 14:49

LoserCheems assigned SNHuan, Evanwu1125, wubingheng111 and LoserCheems Jun 30, 2025

LoserCheems added the bug ("Something isn't working") label Jun 30, 2025

Copilot AI reviewed Jun 30, 2025

LoserCheems merged commit ce9873c into main Jun 30, 2025
