New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscompile in x86 backend, incorrectly removed bit mask #69965
Comments
Suspect 2984e35 as being the incorrect optimization. |
@llvm/issue-subscribers-backend-x86 Author: None (meheff)
https://godbolt.org/z/h4cWeWqP6
Input IR:
Resulting assembly with llc (trunk):
It looks like the initial mask |
I might be missing something, but how can this be when the same asm is generated back on 15.x? https://godbolt.org/z/37oshEnxK |
Can't speak for 15.x and this may be a latent issue revealed by that change. All I can say is that by backing out that change -- notably the line 2984e35#diff-b2d3b99d3a02afc2ac5e2777da4194f8281f7113e1ac580d1e9c89c988dd22e1R2648 that added the additional |
I think I've found the (very old...) underlying problem - TargetLowering.SimplifyDemandedBits ISD::SHL handling doesn't adjust the upper demanded bits depending on NUW/NSW Let me take a further look. |
…sw/nuw ops Matches InstCombinerImpl::SimplifyDemandedUseBits Exposes an issue with AND(CTPOP(X),1) -> PARITY(X) fold which fails to correctly demand known zero upper bits Fixes llvm#69965
FWIW the bug is still there after removing nuw: https://godbolt.org/z/enPndGjaW |
The xls original error was a miscompile affecting the final result. But the reduced ir looks just to be less optimal codegen, do you agree? |
Whoops. Looks like I ended up stripping out the actual problem when I simplified things to post to the LLVM bug then I misread the assembly and convinced myself the bug was still there. Sorry for the confusion! Here is a repro you can execute. Instructions to compare O0 and O2 results:
main.c #include <stdio.h>
#include <stdint.h>
uint64_t __sample__main(void**, void**, void*, void*, void*, int64_t);
int main (int argc, char** argv) {
uint8_t input0 = 0x64;
void* inputs[1] = {&input0};
uint16_t output0;
void* outputs[1] = {&output0};
__sample__main(inputs, outputs, NULL, NULL, NULL, 0);
printf("0x%x\n", output0);
return 0;
} test.ir ; ModuleID = '__module'
source_filename = "__module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(readwrite, inaccessiblemem: none)
define i64 @__sample__main(ptr nocapture readonly %input_ptrs, ptr nocapture readonly %output_ptrs, ptr nocapture readnone %tmp_buffer, ptr nocapture readnone %events, ptr nocapture readnone %user_data, ptr nocapture readnone %runtime, i64 %continuation_point) local_unnamed_addr #0 {
entry:
%0 = load ptr, ptr %input_ptrs, align 8
%.val.i = load i8, ptr %0, align 1
%1 = and i8 %.val.i, 127
%2 = xor i8 %1, 127
%3 = or i8 %2, -128
%4 = zext i8 %3 to i16
%5 = load ptr, ptr %output_ptrs, align 8
%6 = shl nuw i16 %4, 8
%7 = shl nuw i8 %2, 1
%8 = zext i8 %7 to i16
%9 = or i16 %6, %8
store i16 %9, ptr %5, align 2
ret i64 0
} |
The previous reduced test case just showed a minor codegen regression, this test now shows the actual miscompilation
The previous reduced test case just showed a minor codegen regression, this test now shows the actual miscompilation
…plify a SHL node's input We already do this for variable shifts, but we missed it for constant shifts Fixes llvm#69965
@meheff Please can you confirm if this has now fixed the original case? |
The original issue is resolved. Thanks for the quick fix! |
EDIT: Initial post was incorrect. See comment below #69965 (comment) for correct repro.
https://godbolt.org/z/h4cWeWqP6
Input IR:
Resulting assembly with llc (trunk):
It looks like the initial mask
and i8 %_in, 127
is being incorrectly optimized away. -O0 looks correct.The text was updated successfully, but these errors were encountered: