Fix the fixed warp tile used in global to shared memory load/store. #18

@lcy-seso

Description

The current global-to-shared load uses a fixed 16x16 base tile to match the TensorCore warp tile requirement. This is inefficient when the overall problem size is large enough to support a larger warp tile, which would allow better-coalesced memory access.
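To make the coalescing concern concrete, here is a rough host-side sketch of the arithmetic, not the project's actual code. It assumes a row-major source in global memory, 2-byte (half-precision) elements, 16-byte vectorized loads per thread, and a 32-thread warp; `TileShape`, `contiguous_bytes_per_row`, and `rows_per_warp_access` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not part of the project's API.
struct TileShape {
    int rows;
    int cols;
};

// Bytes that are contiguous in global memory within one row of the warp tile,
// assuming a row-major source matrix.
inline int contiguous_bytes_per_row(TileShape tile, int elem_bytes) {
    return tile.cols * elem_bytes;
}

// Number of distinct rows (separate global-memory segments) that one warp
// touches in a single vectorized load, assuming each thread loads `vec_bytes`
// and consecutive threads first fill a row before moving to the next one.
inline int rows_per_warp_access(TileShape tile, int elem_bytes, int vec_bytes,
                                int warp_size = 32) {
    int row_bytes = contiguous_bytes_per_row(tile, elem_bytes);
    assert(row_bytes % vec_bytes == 0);          // row must split into whole vectors
    int threads_per_row = row_bytes / vec_bytes; // threads needed to cover one row
    assert(threads_per_row > 0 && warp_size % threads_per_row == 0);
    return warp_size / threads_per_row;
}
```

Under these assumptions, a 16x16 tile gives only 32 contiguous bytes per row, so one warp-wide load is spread over 16 separate row segments. A 16x64 tile gives 128 contiguous bytes per row, so the same warp-wide load covers 4 rows. The total bytes moved per load are the same (512), but the wider tile touches fewer, longer contiguous segments, which is the situation the issue describes as more favorable for coalescing.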

Metadata

Labels

enhancement: New feature or request
