Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitcast lost due to llvm's opaque pointer #1352

Open
fredchow99 opened this issue May 17, 2024 · 4 comments
Open

Bitcast lost due to llvm's opaque pointer #1352

fredchow99 opened this issue May 17, 2024 · 4 comments

Comments

@fredchow99
Copy link

This is the cut-down test case:

kernel void postprocessHistogram(global uint* hist, int grid_height,
                                 int grid_stride) {
  const size_t grid = get_global_id(0);

  int grid_x_idx = grid % grid_stride;
  int grid_y_idx = grid / grid_stride;
  int i = grid_y_idx * grid_height + grid_height / 2;

  ulong4 sum4 = {0, 0, 0, 0};
  global uint4* histAddr4 = (global uint4*)(hist+grid);
  for (uint i = 0; i < 4086; i += 4) {
    const uint4 val4 = *histAddr4;
    const uint4 indices = {i, i + 1, i + 2, i + 3};
    sum4 += convert_ulong4(val4 * indices);
    ++histAddr4;
  }
  ulong sum = sum4.x + sum4.y + sum4.z + sum4.w;
  const ushort avg = convert_ushort_rte(sum / (float)(grid_height));

  ulong var = 0;
  for (int i = 0; i < 4086; i += 4) {
    ulong4 diff = {avg - i, avg - i - 1, avg - i - 2, avg - i - 3};
    diff *= diff;
    diff *= convert_ulong4(*histAddr4);
    var += diff.x;
    var += diff.y;
  }
}

The cast at this line:
global uint4* histAddr4 = (global uint4*)(hist+grid);
was dropped in the llvm ir due to opaque pointers. and the resulting llvm instruction is:
%add.ptr = getelementptr inbounds i32, ptr addrspace(1) %7, i32 %8
If the cast was not dropped, it should have been:
%add.ptr = getelementptr inbounds <4 x i32>, ptr addrspace(1) %7, i32 %8
The generated spv fails spirv validation because the OpPhi for histAddr4, whose first operand is defined by this getelementptr, has inconsistent types:

        %31 = OpAccessChain %_ptr_StorageBuffer_uint %14 %uint_0 %29
               . . .
        %35 = OpPhi %_ptr_StorageBuffer_v4uint %31 %22 %52 %33
Fail to validate SPIR-V binary:
error: 88: OpPhi's result type <id> '51[%_ptr_StorageBuffer_v4uint]' does not match incoming value <id> '31[%31]' type <id> '30[%_ptr_StorageBuffer_uint]'.
  %35 = OpPhi %_ptr_StorageBuffer_v4uint %31 %22 %52 %33

We need to recover the dropped cast type for the getelementptr instruction. I see there are type inferencing code in Types.cpp to recover lost type information. In this example, the type in the OpPhi was actually returned by InferUsersType(). If we apply type inferencing to the getelementptr, there would be a circular dependency.
Any suggestion on how to recover the lost typecast for the above getelementptr?

@rjodinchr
Copy link
Collaborator

There is obviously something wrong in the compiler here as it should emit valid SPIR-V.

But it also feels to me that this source is not correct. The compiler has no way to know that grid is a multiple of 4.
Thus global uint4* histAddr4 = (global uint4*)(hist+grid); result in a potential unaligned pointer, which lead to undefined behavior according to the specification .

A workaround for this bug would be to either:

  • define hist as global uint4* hist and change the cast if grid is known to be a multiple of 4:
- global uint4* histAddr4 = (global uint4*)(hist+grid);
+ global uint4* histAddr4 = (global uint4*)(hist + grid/4);
  • perform all the loads from hist with a vload4 which supports unaligned access.

@rjodinchr
Copy link
Collaborator

In fact grid = get_global_id(0); will not always be a multiple of 4. So a proper implementation should use vload4. Something like that:

kernel void postprocessHistogram(global uint *hist, int grid_height, int grid_stride)
{
    const size_t grid = get_global_id(0);

    int grid_x_idx = grid % grid_stride;
    int grid_y_idx = grid / grid_stride;
    int i = grid_y_idx * grid_height + grid_height / 2;

    ulong4 sum4 = { 0, 0, 0, 0 };
    hist += grid;
    for (uint i = 0; i < 4086; i += 4) {
        const uint4 val4 = vload4(0, hist);
        const uint4 indices = { i, i + 1, i + 2, i + 3 };
        sum4 += convert_ulong4(val4 * indices);
        hist += 4;
    }
    ulong sum = sum4.x + sum4.y + sum4.z + sum4.w;
    const ushort avg = convert_ushort_rte(sum / (float)(grid_height));

    ulong var = 0;
    for (int i = 0; i < 4086; i += 4) {
        ulong4 diff = { avg - i, avg - i - 1, avg - i - 2, avg - i - 3 };
        diff *= diff;
        diff *= convert_ulong4(vload4(0, hist));
        var += diff.x;
        var += diff.y;
    }
}

Which seems to be well compiled by clspv according to spirv-val.

It leads to this kind of pattern in the SPIR-V:

         %37 = OpLoad %uint %34
         %39 = OpPtrAccessChain %_ptr_StorageBuffer_uint %34 %uint_1
         %40 = OpLoad %uint %39
         %42 = OpPtrAccessChain %_ptr_StorageBuffer_uint %34 %uint_2
         %43 = OpLoad %uint %42
         %45 = OpPtrAccessChain %_ptr_StorageBuffer_uint %34 %uint_3
         %46 = OpLoad %uint %45
         %47 = OpCompositeConstruct %v4uint %37 %40 %43 %46

And then it's up to the Vulkan SPIR-V compiler to know whether or not the hardware supports unaligned loads and can coalesce those loads into one I guess.

@fredchow99
Copy link
Author

The point that the casting may cause run-time alignment issue is well taken. But not sure if similar invalid spirv (phi with inconsistent types) will show up with a different kind of cast.

We'll do the workaround in the source code. Thanks.

@rjodinchr
Copy link
Collaborator

I agree, there is a bug to fix here.
2 ideas for it:

  • Return a proper error with clspv when this kind of pattern is detected.
  • Find a way to force lowering the users of the gep to use i32 instead of v4i32.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants