Pointer Dereference Optimization Bug in Clang-18 on ARM64 Depending on Data Patterns at Different Optimization Levels #69294

Open
gyuminb opened this issue Oct 17, 2023 · 4 comments

Comments

@gyuminb

gyuminb commented Oct 17, 2023

Description:

When the PoC below is compiled for ARM64 with Clang-18, a store through a dereferenced pointer appears to be optimized incorrectly. The program's output changes with the optimization level, with the data pattern being stored, and with the form of the adjacent printf calls. For some data patterns, the issue appears at O1 through O3. When the two identical printf calls before and after the problematic line are replaced with two distinct ones, the issue appears only at O3. This suggests that the optimization is sensitive not only to the data pattern but also to the presence and form of the surrounding printf calls.

Environment:

  • Compiler: Clang-18
  • Target Architecture: ARM64
  • Optimization Level: This issue is noticeable at O1, O2, and O3 depending on the data patterns used. For patterns like 0x123456789abcdeff, the issue can be observed from O1 to O3, but for patterns like 0x1234567fffffffff, it appears only at O3.
  • OS: Ubuntu 22.04.2

PoC:

#include <stdio.h>
#include <stdint.h>
struct StructA {
   uint32_t  val1;
   const int8_t  val2;
   uint64_t  val3;
   uint16_t  val4;
};

union UnionB {
   uint32_t  u_val1;
   struct StructA  s_val;
   uint32_t  u_val2;
   int32_t   u_val3;
   int32_t   u_val4;
   uint64_t  u_val5;
};

static union UnionB main_union = {1UL};
static uint32_t *ptr_val1 = &main_union.s_val.val1;
static uint32_t **double_ptr = &ptr_val1;

static uint32_t ***triple_ptr = &double_ptr;

int main() {
    printf("main_union.u_val5: %lx\n", main_union.u_val5);

    uint32_t **local_double_ptr = &ptr_val1;
    uint64_t local_val = 0x123456789abcedffLL;
    uint64_t *local_ptr = &main_union.u_val5;

    (*local_ptr) = local_val;
    (triple_ptr = &local_double_ptr);
    (***triple_ptr) = 0UL;

    printf("main_union.u_val5: %lx\n", main_union.u_val5);

    return 0;
}

Expected Behavior:

The value of main_union.u_val5 should be consistent across different optimization levels after the pointer dereference operation.

Observed Behavior:

The value of main_union.u_val5 changes depending on the optimization level, data patterns, and the structure of adjacent printf calls.

Analysis:

The optimization seems to overlook the (***triple_ptr) = 0UL; operation. The discrepancy in output, depending on the structure of printf calls and data patterns, indicates a misoptimization during compilation. Notably, when the structure of the printf statements is changed or a data pattern with repeating digits is used, the issue appears only at the O3 optimization level. This shows that the bug is sensitive to both the data patterns and the surrounding code structure.

Steps to Reproduce:

  1. Compile the PoC code using Clang-18 on ARM64 with various optimization levels (O1, O2, and O3).
  2. Execute the compiled binary.
  3. Observe the inconsistent behavior dependent on optimization level, data patterns, and printf structure.

Evidence:

The following output showcases the behavior for various optimization levels:


O0 Output:
main_union.u_val5: 1
main_union.u_val5: 1234567800000000

O1 Output:
main_union.u_val5: 1
main_union.u_val5: 123456789abcdeff

O2 Output:
main_union.u_val5: 1
main_union.u_val5: 123456789abcdeff

O3 Output:
main_union.u_val5: 1
main_union.u_val5: 123456789abcdeff

What's intriguing is that when we replace two identical printf calls before and after the problematic line with two distinct printf calls, such as:

printf("Before main_union.u_val5: %lx\n", main_union.u_val5);

and

printf("After main_union.u_val5: %lx\n", main_union.u_val5);

the issue only manifests at O3 optimization level.

Conclusion:

Across optimization levels O1 to O3, there is clear evidence of a bug, likely resulting from incorrect compiler optimization. The specific scenarios in which it emerges, especially when the printf structures or data patterns are altered, further underline how unpredictable the issue is. It needs attention to ensure consistent and correct behavior across all optimization levels.

@llvmbot

llvmbot commented Oct 17, 2023

@llvm/issue-subscribers-backend-aarch64

Author: None (gyuminb)


@topperc
Collaborator

topperc commented Oct 17, 2023

Does it give consistent results with -fno-strict-aliasing?

@gyuminb gyuminb closed this as completed Oct 18, 2023
@gyuminb gyuminb reopened this Oct 18, 2023
@gyuminb
Author

gyuminb commented Oct 18, 2023

Does it give consistent results with -fno-strict-aliasing?

Thank you for the suggestion, topperc.

While the use of -fno-strict-aliasing provides consistent results, it's worth noting that the observed issue is specific to the ARM64 architecture when compiled without this option. On other architectures like x86-64, and with other compilers like GCC, the behavior is consistent and as expected, even without the -fno-strict-aliasing flag.

This specificity suggests that there might be a deeper underlying issue with the ARM64 architecture optimization in Clang-18, rather than just a generic strict aliasing concern. Would appreciate further insights into this matter.

@smithp35
Collaborator

I've taken a brief look at the Arm and AArch64 code. They both generate the same IR:

define hidden i32 @main() local_unnamed_addr #0 {
  %1 = alloca ptr, align 8
  %2 = load i64, ptr @main_union, align 8, !tbaa !5
  %3 = tail call i32 (ptr, ...) @__2printf(ptr noundef nonnull dereferenceable(1) @.str, i64 noundef %2) #2
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %1) #2
  store ptr @ptr_val1, ptr %1, align 8, !tbaa !8
  store i64 1311768467463794175, ptr @main_union, align 8, !tbaa !10
  store ptr %1, ptr @triple_ptr, align 8, !tbaa !8
  %4 = load ptr, ptr @ptr_val1, align 8, !tbaa !8
  store i32 0, ptr %4, align 4, !tbaa !12
  %5 = load i64, ptr @main_union, align 8, !tbaa !5
  %6 = call i32 (ptr, ...) @__2printf(ptr noundef nonnull dereferenceable(1) @.str, i64 noundef %5) #2
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %1) #2
  ret i32 0
}

I'm not well versed enough in TBAA to trace all this through, although I note that the store of 0 and the store of the initializing value carry different TBAA tags (!12 and !10). An expert may be able to help point out an aliasing problem.

Comparing the generated code for Arm and AArch64, this looks like a matter of scheduling. The write of 0 is scheduled before the write of the initializing value, which then overwrites it. Selecting a CPU with a different scheduling model, such as cortex-a53, puts the write of 0 after the initialization, so you get the answer you expect.

I think this adds some weight to an aliasing problem rather than a specific bug in the AArch64 backend although I couldn't be sure. I normally work on linkers.
