Transition to physical addressing #477

kpet · 2022-12-04T20:15:02Z

Physical addressing ought to become the default when available, but we should reach parity with logical addressing on the CTS before we transition. Once we land #453, we can start working on the following:

Auto-detect and enable physical addressing when available
Fix all CTS tests that have regressed (to be replaced with a list of tests once the PR lands)

rjodinchr · 2023-05-30T08:36:53Z

Here are the regressions I got running on an Nvidia device:

DATA ERROR

CRASH

FAILED

buffers/buffer_read_struct
buffers/buffer_map_read_struct

kpet · 2023-05-30T09:26:56Z

Thanks for posting this! Looks about the same as what I'm seeing at my end.

rjodinchr · 2023-05-30T10:03:39Z

I am working on those right now.

rjodinchr · 2023-05-30T13:23:10Z

I have started with buffers/buffer_read_struct, which in fact fails in the same way as the tests in DATA ERROR.
It looked like a simple one to start with, but I am having trouble understanding what is wrong.

Here is the kernel and the compiled version: https://godbolt.org/z/1Ejas96Yb

The output is:

dst[tid].a = 3.40282346638528860e+38
dst[tid].b = 0

I have tried swapping the two stores, dst[tid].a and dst[tid].b. I ended up with:

dst[tid].a = ((1<<16)+1)
dst[tid].b = 0.0

It looks as if only the second store takes effect, and the last index of the OpPtrAccessChain used to compute the store address is ignored.

I have also tried forcing clspv to use ulong for the index of the two OpPtrAccessChain instructions used to compute the store addresses, but the behavior was the same.

@alan-baker , @kpet , any ideas?

alan-baker · 2023-05-30T19:04:09Z

The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

kpet · 2023-05-30T20:29:40Z

Agree the shader looks correct after a quick scan. As tempting as it is to blame drivers, I am seeing failures across two Vulkan stacks and HW from 3 vendors (NVIDIA, Mesa/AMD, Mesa/Intel). clvk is definitely not allocating the memory properly: the code in main is missing some of the flags used in my original prototype (the VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT usage flag on the buffer and the corresponding VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT memory allocation flag). After fixing this, the validation layers are happy, but it does not change what I'm reading back from memory. On an Intel A750, I'm seeing all zeros. On AMD HW, I'm seeing one of the two values (the float with the test unmodified, the int if I comment out the line storing the float), but 3 bytes off from where it should be. I've tried strengthening the memory barriers after kernels, to no avail. This is going to be a fun one ...

rjodinchr · 2023-05-31T08:09:28Z

> The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

Yes, I have tried that; the writes occur in the right threads.

kpet · 2023-05-31T19:01:29Z

I've got the test passing. We're not laying out the structure, but all structures in the PhysicalStorageBuffer storage class must be explicitly laid out. I've hacked clspv to test this, but a proper solution will require more work. There might be cases where the same struct type is used to declare objects both in storage classes where they must be explicitly laid out and in others where they must not. Since the layout decorations are applied to the type, we would need multiple types. I vaguely remember that there's logic to deal with this in clspv already. Happy to hand over to @alan-baker or @rjodinchr for a full clspv solution.
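
To make the layout requirement concrete, here is a minimal host-side C++ sketch (not clspv or CTS code). It assumes a hypothetical two-member struct and emulates the kernel's two stores with memcpy. With explicit member offsets both values survive. If every member is assumed to resolve to offset 0, the second store overwrites the first, similar to the output reported earlier in the thread. That is only one plausible failure mode: behaviour without layout decorations is undefined and differed across the drivers mentioned here. All names (TestStruct, MemberOffsets, store_and_read_back) are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the struct used by the test; the real
// definition lives in the OpenCL CTS and may differ.
struct TestStruct {
    int32_t a;
    float b;
};

// Byte offsets of the two members, i.e. the information an explicit
// layout (SPIR-V Offset decorations) has to carry.
struct MemberOffsets {
    std::size_t a;
    std::size_t b;
};

// Offsets that match the host-side view of the struct.
constexpr MemberOffsets kExplicitLayout{offsetof(TestStruct, a),
                                        offsetof(TestStruct, b)};

// Assumed failure mode, for illustration only: every member lands at offset 0.
constexpr MemberOffsets kMissingLayout{0, 0};

// Emulate the kernel's two stores (a first, then b) into raw memory,
// then read the struct back the way the host would.
inline TestStruct store_and_read_back(const MemberOffsets& offsets) {
    unsigned char memory[sizeof(TestStruct)] = {};
    const int32_t a = (1 << 16) + 1;
    const float b = 1.5f;

    assert(offsets.a + sizeof(a) <= sizeof(memory));
    assert(offsets.b + sizeof(b) <= sizeof(memory));

    std::memcpy(memory + offsets.a, &a, sizeof(a));
    std::memcpy(memory + offsets.b, &b, sizeof(b));

    TestStruct result;
    std::memcpy(&result, memory, sizeof(result));
    return result;
}
```

Calling store_and_read_back(kExplicitLayout) returns both stored values intact, while store_and_read_back(kMissingLayout) returns a struct whose first member holds the bit pattern of the float and whose second member is zero.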

kpet · 2023-05-31T19:03:50Z

Here's the clspv hack I used for reference:

diff --git a/lib/SPIRVProducerPass.cpp b/lib/SPIRVProducerPass.cpp
index d39555e5..f1eb95bc 100644
--- a/lib/SPIRVProducerPass.cpp
+++ b/lib/SPIRVProducerPass.cpp
@@ -1918,9 +1918,9 @@ SPIRVID SPIRVProducerPassImpl::getSPIRVType(Type *Ty, bool needs_layout) {
     StructType *canonical = cast<StructType>(CanonicalType(STy));
     bool use_layout =
         (Option::SpvVersion() < SPIRVVersion::SPIRV_1_4) || needs_layout;
-    if (TypesNeedingLayout.idFor(STy) &&
+    if (/*TypesNeedingLayout.idFor(STy) &&
         (canonical == STy || !TypesNeedingLayout.idFor(canonical)) &&
-        use_layout) {
+        use_layout*/true) {
       for (unsigned MemberIdx = 0; MemberIdx < STy->getNumElements();
            MemberIdx++) {
         // Ops[0] = Structure Type ID

alan-baker · 2023-05-31T19:20:07Z

I'll take a look, I would have thought this was handled already.


rjodinchr · 2023-06-30T16:41:16Z

I am working on the DATA ERROR issue.
I have made some progress, but I need more time to have it fixed.

kpet mentioned this issue Dec 13, 2022

Add support for using buffer device addresses #453

Merged

alan-baker added a commit to alan-baker/clspv that referenced this issue May 31, 2023

Ensure types in physical storage buffers get layouts

ad19dfc

See kpet/clvk#477 * Collect physical ssbo types that need a layout early in the producer

alan-baker mentioned this issue May 31, 2023

Ensure types in physical storage buffers get layouts google/clspv#1126

Merged

alan-baker added a commit to google/clspv that referenced this issue Jun 1, 2023

Ensure types in physical storage buffers get layouts (#1126)

13a1238

See kpet/clvk#477 * Collect physical ssbo types that need a layout early in the producer

rjodinchr mentioned this issue Jul 13, 2023

fix physicalStorageBuffers when a kernel is calling another kernel google/clspv#1152

Merged
