Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition to physical addressing #477

Open
2 tasks
kpet opened this issue Dec 4, 2022 · 11 comments
Open
2 tasks

Transition to physical addressing #477

kpet opened this issue Dec 4, 2022 · 11 comments

Comments

@kpet
Copy link
Owner

kpet commented Dec 4, 2022

Physical addressing ought to become the default when available but we should reach parity wrt. logical addressing on the CTS before we transition. Once we land #453, we can start working on the following:

  • Auto-detect and enable physical addressing when available
  • Fix all CTS tests that have regressed (to replace with a list of tests once the PR lands)
@rjodinchr
Copy link
Contributor

rjodinchr commented May 30, 2023

Here are the regressions I got running on an Nvidia device:

DATA ERROR

  • api/set_kernel_arg_struct_array
  • basic/async_copy_local_to_global
  • basic/async_copy_global_to_local
  • basic/async_strided_copy_global_to_local
  • basic/async_strided_copy_local_to_global
  • basic/vload_local
  • basic/vstore_local
  • basic/work_item_functions
  • compiler/compile_and_link_status_options_log
  • compiler/execute_after_serialize_reload_library
  • compiler/execute_after_simple_library_with_link
  • compiler/execute_after_two_file_link
  • compiler/multi_file_libraries
  • compiler/multiple_files
  • compiler/multiple_files_multiple_libraries
  • compiler/multiple_libraries
  • compiler/program_binary_type
  • compiler/execute_after_included_header_link
  • images/kernel_image_methods/1D-CL_SIGNED_INT32
  • images/kernel_image_methods/1D-CL_SIGNED_INT8
  • images/kernel_image_methods/1D-CL_SNORM_INT16
  • images/kernel_image_methods/1D-CL_SNORM_INT8
  • images/kernel_image_methods/1D-CL_UNORM_INT16
  • images/kernel_image_methods/1D-CL_UNORM_INT8
  • images/kernel_image_methods/1D-CL_UNORM_SHORT_565
  • images/kernel_image_methods/1D-CL_UNSIGNED_INT16
  • images/kernel_image_methods/1D-CL_UNSIGNED_INT32
  • images/kernel_image_methods/1D-CL_UNSIGNED_INT8
  • images/kernel_image_methods/1Darray-CL_FLOAT
  • images/kernel_image_methods/1Darray-CL_HALF_FLOAT
  • images/kernel_image_methods/1Darray-CL_SIGNED_INT16
  • images/kernel_image_methods/1Darray-CL_SIGNED_INT32
  • images/kernel_image_methods/1Darray-CL_SIGNED_INT8
  • images/kernel_image_methods/1Darray-CL_SNORM_INT16
  • images/kernel_image_methods/1Darray-CL_SNORM_INT8
  • images/kernel_image_methods/1Darray-CL_UNORM_SHORT_565
  • images/kernel_image_methods/2D-CL_FLOAT
  • images/kernel_image_methods/2D-CL_HALF_FLOAT
  • images/kernel_image_methods/2D-CL_SIGNED_INT16
  • images/kernel_image_methods/2D-CL_SIGNED_INT32
  • images/kernel_image_methods/2D-CL_SIGNED_INT8
  • images/kernel_image_methods/2D-CL_SNORM_INT16
  • images/kernel_image_methods/2D-CL_SNORM_INT8
  • images/kernel_image_methods/2D-CL_UNORM_INT16
  • images/kernel_image_methods/2D-CL_UNSIGNED_INT16
  • images/kernel_image_methods/2Darray-CL_FLOAT
  • images/kernel_image_methods/2Darray-CL_SIGNED_INT16
  • images/kernel_image_methods/2Darray-CL_SIGNED_INT32
  • images/kernel_image_methods/2Darray-CL_SIGNED_INT8
  • images/kernel_image_methods/2Darray-CL_SNORM_INT8
  • images/kernel_image_methods/2Darray-CL_UNORM_INT8
  • images/kernel_image_methods/2Darray-CL_UNORM_SHORT_565
  • images/kernel_image_methods/2Darray-CL_UNSIGNED_INT16
  • images/kernel_image_methods/2Darray-CL_UNSIGNED_INT8
  • images/kernel_image_methods/3D-CL_FLOAT
  • images/kernel_image_methods/3D-CL_SIGNED_INT16
  • images/kernel_image_methods/3D-CL_SIGNED_INT32
  • images/kernel_image_methods/3D-CL_SIGNED_INT8
  • images/kernel_image_methods/3D-CL_SNORM_INT8
  • images/kernel_image_methods/3D-CL_UNORM_INT16
  • images/kernel_image_methods/3D-CL_UNORM_INT8
  • images/kernel_image_methods/3D-CL_UNSIGNED_INT16
  • images/kernel_image_methods/3D-CL_UNSIGNED_INT32
  • images/kernel_image_methods/3D-CL_UNSIGNED_INT8
  • images/kernel_image_methods/1D-CL_FLOAT
  • images/kernel_image_methods/1D-CL_HALF_FLOAT
  • images/kernel_image_methods/1D-CL_SIGNED_INT16
  • images/kernel_image_methods/1Darray-CL_UNORM_INT16
  • images/kernel_image_methods/1Darray-CL_UNORM_INT8
  • images/kernel_image_methods/1Darray-CL_UNSIGNED_INT16
  • images/kernel_image_methods/1Darray-CL_UNSIGNED_INT32
  • images/kernel_image_methods/1Darray-CL_UNSIGNED_INT8
  • images/kernel_image_methods/2D-CL_UNORM_INT8
  • images/kernel_image_methods/2D-CL_UNORM_SHORT_565
  • images/kernel_image_methods/2D-CL_UNSIGNED_INT32
  • images/kernel_image_methods/2D-CL_UNSIGNED_INT8
  • images/kernel_image_methods/2Darray-CL_HALF_FLOAT
  • images/kernel_image_methods/2Darray-CL_SNORM_INT16
  • images/kernel_image_methods/2Darray-CL_UNORM_INT16
  • images/kernel_image_methods/2Darray-CL_UNSIGNED_INT32
  • images/kernel_image_methods/3D-CL_HALF_FLOAT
  • images/kernel_image_methods/3D-CL_SNORM_INT16
  • images/kernel_image_methods/3D-CL_UNORM_SHORT_565
  • vectors/vec_align_packed_struct_arr
  • profiling/read_array_struct

CRASH

  • atomics/atomic_add
  • atomics/atomic_and
  • atomics/atomic_cmpxchg
  • atomics/atomic_dec
  • atomics/atomic_max
  • atomics/atomic_min
  • atomics/atomic_or
  • atomics/atomic_sub
  • atomics/atomic_xchg
  • atomics/atomic_inc
  • atomics/atomic_xor
  • c11_atomics/atomic_compare_exchange_strong
  • c11_atomics/atomic_exchange
  • c11_atomics/atomic_fetch_add
  • c11_atomics/atomic_fetch_and
  • c11_atomics/atomic_fetch_min
  • c11_atomics/atomic_fetch_orand
  • c11_atomics/atomic_fetch_sub
  • c11_atomics/atomic_fetch_xor
  • c11_atomics/atomic_fetch_xor2
  • c11_atomics/atomic_compare_exchange_weak
  • c11_atomics/atomic_fence
  • c11_atomics/atomic_fetch_max
  • c11_atomics/atomic_fetch_or
  • half/vstore_half_rtn-wimpy
  • half/vstore_half_rtp-wimpy
  • half/vstore_half_rte-wimpy
  • half/vstore_half_rtz-wimpy
  • half/vstorea_half_rte-wimpy
  • non_uniform_work_group/non_uniform_1d_atomics
  • non_uniform_work_group/non_uniform_1d_barriers
  • non_uniform_work_group/non_uniform_2d_atomics
  • non_uniform_work_group/non_uniform_2d_barriers
  • non_uniform_work_group/non_uniform_3d_atomics
  • non_uniform_work_group/non_uniform_3d_barriers
  • non_uniform_work_group/non_uniform_3d_basic
  • non_uniform_work_group/non_uniform_other_basic
  • non_uniform_work_group/non_uniform_1d_basic
  • non_uniform_work_group/non_uniform_2d_basic
  • non_uniform_work_group/non_uniform_other_atomics

FAILED

  • buffers/buffer_read_struct
  • buffers/buffer_map_read_struct

@kpet
Copy link
Owner Author

kpet commented May 30, 2023

Thanks for posting this! Looks about the same as what I'm seeing at my end.

@rjodinchr
Copy link
Contributor

I am working on those right now.

@rjodinchr
Copy link
Contributor

I have started with buffer/buffer_read_struct which is in fact as those in DATA ERROR.
It feels like a simple one to start with, but I have trouble understanding what is wrong.

Here is the kernel and the compiled version: https://godbolt.org/z/1Ejas96Yb

The output is:

  • dst[tid].a = 3.40282346638528860e+38
  • dst[tid].b = 0

I have tried to inverse the two stores: dst[tid].a and dst[tid].b. I ended up with:

  • dst[tid].a = ((1<<16)+1)
  • dst[tid].b = 0.0

It is like only the second storing is taken into account, and the last index of the OpPtrAccessChain used to compute the storing address is not taken into account.

I have also tried to force clspv to use ulong for the index of the two OpPtrAccessChain used to compute the storing address, but it had the same behavior.

@alan-baker , @kpet , any ideas?

@alan-baker
Copy link
Contributor

The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

@kpet
Copy link
Owner Author

kpet commented May 30, 2023

Agree the shader looks correct after a quick scan. As tempting as it is to blame drivers, I am seeing failures across two Vulkan stacks and HW from 3 vendors (NVIDIA, Mesa/AMD, Mesa/Intel). clvk is definitely not allocating the memory properly, the code in main does not have some of the flags used in my original prototype (VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT usage flag on the buffer and corresponding VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT memory allocation flag). After fixing this the validation layers are happy but it does not affect what I'm reading back from memory. On an Intel A750, I'm seeing all zeros. On AMD HW, I'm seeing one of the two values (the float with the test unmodified, the int if I comment out the line storing the float) but 3 bytes off where it should be. I've tried to strengthen memory barriers after kernels, to no avail. This is going to be a fun one ...

@rjodinchr
Copy link
Contributor

The shader looks ok to me. If you make the stores have thread dependent values (e.g. store tid) is the writing occurring in the right threads? I would guess driver bug, but not sure beyond that.

Yes I have tried that, the writing occurs in the right thread.

@kpet
Copy link
Owner Author

kpet commented May 31, 2023

I've got the test passing, we're not laying out the structure but all structures in the PhysicalStorageBuffer storage class must be explicitly laid out. I've hacked clspv to test but a proper solution will require more work. There might be cases where the same struct type is used to declare objects in storages classes where they must be explicitly laid out and in others where they must not but the layout decorations are applied to the type so we need multiple types. I vaguely remember that there's logic to deal with this in clspv already. Happy to hand over to @alan-baker or @rjodinchr for a full clspv solution.

@kpet
Copy link
Owner Author

kpet commented May 31, 2023

Here's the clspv hack I used for reference:

diff --git a/lib/SPIRVProducerPass.cpp b/lib/SPIRVProducerPass.cpp
index d39555e5..f1eb95bc 100644
--- a/lib/SPIRVProducerPass.cpp
+++ b/lib/SPIRVProducerPass.cpp
@@ -1918,9 +1918,9 @@ SPIRVID SPIRVProducerPassImpl::getSPIRVType(Type *Ty, bool needs_layout) {
     StructType *canonical = cast<StructType>(CanonicalType(STy));
     bool use_layout =
         (Option::SpvVersion() < SPIRVVersion::SPIRV_1_4) || needs_layout;
-    if (TypesNeedingLayout.idFor(STy) &&
+    if (/*TypesNeedingLayout.idFor(STy) &&
         (canonical == STy || !TypesNeedingLayout.idFor(canonical)) &&
-        use_layout) {
+        use_layout*/true) {
       for (unsigned MemberIdx = 0; MemberIdx < STy->getNumElements();
            MemberIdx++) {
         // Ops[0] = Structure Type ID

@alan-baker
Copy link
Contributor

I'll take a look, I would have thought this was handled already.

alan-baker added a commit to alan-baker/clspv that referenced this issue May 31, 2023
See kpet/clvk#477

* Collect physical ssbo types that need a layout early in the producer
alan-baker added a commit to google/clspv that referenced this issue Jun 1, 2023
See kpet/clvk#477

* Collect physical ssbo types that need a layout early in the producer
@rjodinchr
Copy link
Contributor

I am working on the DATA ERROR issue.
I have made some progress, but I need more time to have it fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants