@raikonenfnu
Contributor

With the IREE bump, we can now properly use resetOffset to enable > 4GB buffer ops. It works because Wave generates:

memref.reinterpret_cast w/ workgroup offset + buffer_cast w/ resetOffset

With resetOffset, amdgpu.raw_buffer_cast sets the buffer's base pointer to the memref's base_ptr plus the workgroup_offset from the reinterpret_cast. The offsets used during each read are then relative to that new base and are much smaller, which lets us do buffer loads/writes on buffers larger than 4GB.
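To make the offset arithmetic above concrete, here is a minimal Python sketch. It is not Wave or IREE code: the shapes, the one-batch-per-workgroup split, and the helper name `split_offset` are all hypothetical, and it assumes the buffer op can only encode a 32-bit byte offset. It only shows why folding the workgroup offset into the base pointer keeps the remaining offset in range.

```python
# Illustrative arithmetic only; shapes and names are hypothetical, not taken from Wave.
UINT32_MAX = 2**32 - 1  # assumed largest byte offset a buffer op can encode


def split_offset(batch, row, col, rows, cols, elem_bytes):
    """Split a byte offset into a per-workgroup part and a local part.

    Assumes (for illustration) that each workgroup handles one batch.
    """
    # Part that resetOffset would fold into the base pointer.
    workgroup_offset = batch * rows * cols * elem_bytes
    # Part that remains as the offset seen by the buffer load/store.
    local_offset = (row * cols + col) * elem_bytes
    return workgroup_offset, local_offset


# Hypothetical batched tensor: 8 batches of 32768 x 32768 one-byte elements (8 GiB total).
batches, rows, cols, elem_bytes = 8, 32768, 32768, 1

# Worst case: the last element of the last batch.
wg_off, local_off = split_offset(batches - 1, rows - 1, cols - 1, rows, cols, elem_bytes)

full_offset = wg_off + local_off               # offset from the original base pointer
fits_without_reset = full_offset <= UINT32_MAX  # False: exceeds 32 bits
fits_with_reset = local_off <= UINT32_MAX       # True: relative to the shifted base

print(f"full offset:  {full_offset} (fits in 32 bits: {fits_without_reset})")
print(f"local offset: {local_off} (fits in 32 bits: {fits_with_reset})")
```

In this sketch the whole buffer is 8 GiB, but any single workgroup only ever addresses 1 GiB past its own base, so the per-access offset stays well within 32 bits.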

We also enable bufferOps in the batched GEMM mxfp4 tests whose buffers exceed 4GB.

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
@raikonenfnu raikonenfnu changed the title [Wave] Enable 4G using resetOffset [Wave] Enable > 4GB bufferOps using resetOffset Jul 23, 2025
@raikonenfnu raikonenfnu merged commit 47518bf into iree-org:main Jul 23, 2025
12 of 15 checks passed
badgerbroch pushed a commit to badgerbroch/wave that referenced this pull request Jul 29, 2025