Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompile in x86 backend, incorrectly removed bit mask #69965

Closed
meheff opened this issue Oct 23, 2023 · 10 comments
Closed

Miscompile in x86 backend, incorrectly removed bit mask #69965

meheff opened this issue Oct 23, 2023 · 10 comments
Assignees
Labels
llvm:SelectionDAG SelectionDAGISel as well miscompilation

Comments

@meheff
Copy link

meheff commented Oct 23, 2023

EDIT: Initial post was incorrect. See comment below #69965 (comment) for correct repro.

https://godbolt.org/z/h4cWeWqP6

Input IR:

source_filename = "__module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"

define i16 @test(i8 %_in) local_unnamed_addr #0 {
  %_1 = and i8 %_in, 127
  %_2 = xor i8 %_1, 127
  %_3 = or i8 %_2, -128
  %_4 = zext i8 %_3 to i16
  %_6 = shl nuw i16 %_4, 8
  %_7 = shl nuw i8 %_2, 1
  %_8 = zext i8 %_7 to i16
  %_9 = or i16 %_6, %_8
  ret i16 %_9
}

Resulting assembly with llc (trunk):

test:                                   # @test
        not     dil
        mov     eax, edi
        or      al, -128
        movzx   ecx, al
        shl     ecx, 8
        add     dil, dil
        movzx   eax, dil
        or      eax, ecx
        ret

It looks like the initial mask and i8 %_in, 127 is being incorrectly optimized away. -O0 looks correct.

@hongted
Copy link

hongted commented Oct 23, 2023

Suspect 2984e35 as being the incorrect optimization.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 23, 2023

@llvm/issue-subscribers-backend-x86

Author: None (meheff)

https://godbolt.org/z/h4cWeWqP6

Input IR:

source_filename = "__module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"

define i16 @<!-- -->test(i8 %_in) local_unnamed_addr #<!-- -->0 {
  %_1 = and i8 %_in, 127
  %_2 = xor i8 %_1, 127
  %_3 = or i8 %_2, -128
  %_4 = zext i8 %_3 to i16
  %_6 = shl nuw i16 %_4, 8
  %_7 = shl nuw i8 %_2, 1
  %_8 = zext i8 %_7 to i16
  %_9 = or i16 %_6, %_8
  ret i16 %_9
}

Resulting assembly with llc (trunk):

test:                                   # @<!-- -->test
        not     dil
        mov     eax, edi
        or      al, -128
        movzx   ecx, al
        shl     ecx, 8
        add     dil, dil
        movzx   eax, dil
        or      eax, ecx
        ret

It looks like the initial mask and i8 %_in, 127 is being incorrectly optimized away. -O0 looks correct.

@RKSimon RKSimon self-assigned this Oct 23, 2023
@RKSimon
Copy link
Collaborator

RKSimon commented Oct 23, 2023

Suspect 2984e35 as being the incorrect optimization.

I might be missing something, but how can this be when the same asm is generated back on 15.x? https://godbolt.org/z/37oshEnxK

@hongted
Copy link

hongted commented Oct 23, 2023

Can't speak for 15.x and this may be a latent issue revealed by that change.

All I can say is that by backing out that change -- notably the line 2984e35#diff-b2d3b99d3a02afc2ac5e2777da4194f8281f7113e1ac580d1e9c89c988dd22e1R2648 that added the additional !Src->getFlags().hasNoUnsignedWrap() constraint -- I no longer see a similar miscompile.

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 24, 2023

I think I've found the (very old...) underlying problem - TargetLowering.SimplifyDemandedBits ISD::SHL handling doesn't adjust the upper demanded bits depending on NUW/NSW

Let me take a further look.

@RKSimon RKSimon added llvm:SelectionDAG SelectionDAGISel as well and removed backend:X86 labels Oct 24, 2023
RKSimon added a commit that referenced this issue Oct 24, 2023
RKSimon added a commit to RKSimon/llvm-project that referenced this issue Oct 24, 2023
…sw/nuw ops

Matches InstCombinerImpl::SimplifyDemandedUseBits

Exposes an issue with AND(CTPOP(X),1) -> PARITY(X) fold which fails to correctly demand known zero upper bits

Fixes llvm#69965
@meheff
Copy link
Author

meheff commented Oct 24, 2023

FWIW the bug is still there after removing nuw: https://godbolt.org/z/enPndGjaW

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 24, 2023

The xls original error was a miscompile affecting the final result. But the reduced ir looks just to be less optimal codegen, do you agree?

@meheff
Copy link
Author

meheff commented Oct 24, 2023

Whoops. Looks like I ended up stripping out the actual problem when I simplified things to post to the LLVM bug then I misread the assembly and convinced myself the bug was still there. Sorry for the confusion!

Here is a repro you can execute.

Instructions to compare O0 and O2 results:

# O2
$ llc test.ir -filetype=obj -o test.o
$ clang main.c test.o -o run_test
$ ./run_test
0x9c36

# O0
$ llc -O0 test.ir -filetype=obj -o test.o
$ clang main.c test.o -o run_test
$ ./run_test
0x9b36

main.c

#include <stdio.h>
#include <stdint.h>

uint64_t __sample__main(void**, void**, void*, void*, void*, int64_t);

int main (int argc, char** argv) {
  uint8_t input0 = 0x64;
  void* inputs[1] = {&input0};

  uint16_t output0;
  void* outputs[1] = {&output0};
  __sample__main(inputs, outputs, NULL, NULL, NULL, 0);

  printf("0x%x\n", output0);

  return 0;
}

test.ir

; ModuleID = '__module'
source_filename = "__module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(readwrite, inaccessiblemem: none)
define i64 @__sample__main(ptr nocapture readonly %input_ptrs, ptr nocapture readonly %output_ptrs, ptr nocapture readnone %tmp_buffer, ptr nocapture readnone %events, ptr nocapture readnone %user_data, ptr nocapture readnone %runtime, i64 %continuation_point) local_unnamed_addr #0 {
entry:
  %0 = load ptr, ptr %input_ptrs, align 8
  %.val.i = load i8, ptr %0, align 1
  %1 = and i8 %.val.i, 127
  %2 = xor i8 %1, 127
  %3 = or i8 %2, -128
  %4 = zext i8 %3 to i16
  %5 = load ptr, ptr %output_ptrs, align 8
  %6 = shl nuw i16 %4, 8
  %7 = shl nuw i8 %2, 1
  %8 = zext i8 %7 to i16
  %9 = or i16 %6, %8
  store i16 %9, ptr %5, align 2
  ret i64 0
}

RKSimon added a commit that referenced this issue Oct 26, 2023
The previous reduced test case just showed a minor codegen regression, this test now shows the actual miscompilation
zahiraam pushed a commit to zahiraam/llvm-project that referenced this issue Oct 26, 2023
The previous reduced test case just showed a minor codegen regression, this test now shows the actual miscompilation
zahiraam pushed a commit to zahiraam/llvm-project that referenced this issue Oct 26, 2023
…plify a SHL node's input

We already do this for variable shifts, but we missed it for constant shifts

Fixes llvm#69965
@RKSimon
Copy link
Collaborator

RKSimon commented Oct 27, 2023

@meheff Please can you confirm if this has now fixed the original case?

@meheff
Copy link
Author

meheff commented Oct 27, 2023

The original issue is resolved. Thanks for the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well miscompilation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants