Skip to content

A miscompilation bug with unsigned char #36817

@aqjune

Description

@aqjune
Bugzilla Link 37469
Version trunk
OS Linux
Attachments A source file that raises the bug.
CC @chandlerc,@majnemer,@dwblaikie,@efriedma-quic,@hfinkel,@dobbelaj-snps,@MattPD,@mizvekov,@nunoplopes,@RalfJung,@regehr,@zygoloid,@sanjoy,@rotateright,@yuanfang-chen

Extended Description

$ cat test-main.c
#include <string.h>
#include <stdio.h>
#include <stdint.h>

// If f(A, B + 4) is given, and integer representation of A and B + 4
// are the same, c1 == c2 in the loop becomes true,
// so arr3 = arr1. Hence r = A, and *A should be 10.
// However, if compiled with -O3, *A is still 1.
void store_10_to_p(int *p, int *q) {
  unsigned char arr1[8];
  unsigned char arr2[8];
  unsigned char arr3[8];
  // Type punning with char* is allowed.
  memcpy((unsigned char*)arr1, (unsigned char *)&p, sizeof(p));
  memcpy((unsigned char*)arr2, (unsigned char *)&q, sizeof(q));
  // Now arr1 is p, arr2 is q.

  for (int i = 0; i < sizeof(q); i++) {
    int c1 = (int)arr1[i], c2 = (int)arr2[i];
    // Note that c1 == c2 is a comparison between _integers_ (not pointers).
    if (c1 == c2)
      // Always true if p and q have same integer representation.
      arr3[i] = arr1[i];
    else
      arr3[i] = arr2[i];
  }
  // Now arr3 is equivalent to arr1, which is p.
  int *r;
  memcpy(&r, (unsigned char *)arr3, sizeof(r));
  // Now r is p.
  *p = 1;
  *r = 10;
}

int main() {
  int A[4], B[4];
  printf("%p %p\n", A, &B[4]);
  if ((uintptr_t)A == (uintptr_t)&B[4]) {
    store_10_to_p(A, &B[4]);
    printf("%d\n", A[0]);
  }
  return 0;
}

$ clang -O3 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
1
$ clang -O0 -o test-main test-main.c
$ ./test-main
0x7fffffffe580 0x7fffffffe580
10

This is what is happening inside LLVM:
(1) Instcombine changes the loop body to "arr3[i] = arr2[i];"
(2) Loop idiom recognizer replaces the loop with a "memcpy(arr3, arr2, 8)"
(3) Instcombine does the store forwarding from the memcpy to the load

I think this is related with lowering 'unsigned char' in C into 'i8' in LLVM.

There are two choices:
(1) Lowering 'unsigned char' into 'i8' is correct.
(2) Lowering 'unsigned char' into 'i8' is incorrect.

If (1) is right, then one of the optimizations happening in this example should be disabled, to stop the miscompilation.
If (2) is right, then it is clang which miscompiles this example. 'unsigned char' should be lowered into something else.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions