Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression: Compilation hangs with -O2 -mavx for certain input (valid code) #26728

Closed
llvmbot opened this issue Jan 28, 2016 · 7 comments
Closed
Labels
bugzilla Issues migrated from bugzilla c++14 regression

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 28, 2016

Bugzilla Link 26354
Resolution FIXED
Resolved on Jan 30, 2016 06:11
Version 3.8
OS All
Blocks #26433
Attachments bzip2-compressed preprocessor dump
Reporter LLVM Bugzilla Contributor
CC @majnemer,@zmodem,@rotateright

Extended Description

I have a (unfortunately very large still) chunk of C++ code which clang++ never appears to finish compiling.

The following three criteria need to met in order for this to happen:

  • I need to use clang 3.8.0rc1 (the issue does not occur with 3.7.1)
  • I need to pass -mavx (or something that implies it, like -march=sandybridge)
  • I need to pass -O2 (the issue does not occur with -O1)

I've created a preprocessor dump using clang 3.7.1 and -save-temps that I'm attaching to this report. As expected, it shows the following behaviour:

clang++ 3.7.1 didn't have the problem

% ( ulimit -t 10; time clang++3.7.1 -c -std=c++14 -o /dev/null uggridgeometry.ii -O2 -march=sandybridge )
1.26s user 0.02s system 99% cpu 1.287 total

clang++ 3.8.0rc1 has the problem

% ( ulimit -t 10; time clang++3.8.0rc1 -c -std=c++14 -o /dev/null uggridgeometry.ii -O2 -march=sandybridge )
#​0 0x0000000001c048e5 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1c048e5)
#​1 0x0000000001c028a6 llvm::sys::RunSignalHandlers() (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1c028a6)
#​2 0x0000000001c02ac4 SignalHandler(int) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1c02ac4)
#​3 0x00007efdc42e08d0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0xf8d0)
#​4 0x000000000147cebc llvm::FindAvailableLoadedValue(llvm::Value*, llvm::BasicBlock*, llvm::ilist_iteratorllvm::Instruction&, unsigned int, llvm::AAResults*, llvm::AAMDNodes*) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x147cebc)
#​5 0x00000000019a1b5a llvm::InstCombiner::visitLoadInst(llvm::LoadInst&) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x19a1b5a)
#​6 0x0000000001965213 llvm::InstCombiner::run() (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1965213)
#​7 0x0000000001966309 combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::LoopInfo*) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1966309)
#​8 0x0000000001966c70 (anonymous namespace)::InstructionCombiningPass::runOnFunction(llvm::Function&) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1966c70)
#​9 0x00000000018c7083 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x18c7083)
#​10 0x00000000018c76cb llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x18c76cb)
#​11 0x0000000001d1d0d2 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::raw_pwrite_stream*) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1d1d0d2)
#​12 0x0000000002271917 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x2271917)
#​13 0x000000000255755d clang::ParseAST(clang::Sema&, bool, bool) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x255755d)
#​14 0x00000000022719fb clang::CodeGenAction::ExecuteAction() (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x22719fb)
#​15 0x0000000001fd9b26 clang::FrontendAction::Execute() (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1fd9b26)
#​16 0x0000000001fb3216 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x1fb3216)
#​17 0x00000000020601b3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0x20601b3)
#​18 0x0000000000a985c8 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0xa985c8)
#​19 0x0000000000a57c82 main (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0xa57c82)
#​20 0x00007efdc350ab45 __libc_start_main /build/glibc-3Vu5mt/glibc-2.19/csu/libc-start.c:321:0
#​21 0x0000000000a947c4 _start (/home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8+0xa947c4)
Stack dump:
0. Program arguments: /home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/clang-3.8 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name uggridgeometry.ii -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu sandybridge -momit-leaf-frame-pointer -dwarf-column-info -debugger-tuning=gdb -coverage-file /dev/null -resource-dir /home/mi/pipping/dune/inst/clang-3.8.0rc1/bin/../lib/clang/3.8.0 -O2 -std=c++14 -fdeprecated-macro -fdebug-compilation-dir /home/mi/pipping/dune/build-Release/dune-grid/lib -ferror-limit 19 -fmessage-length 175 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /dev/null -x c++-cpp-output uggridgeometry.ii

  1. parser at end of file
  2. Per-module optimization passes
  3. Running pass 'Function Pass Manager' on module 'uggridgeometry.ii'.
  4. Running pass 'Combine redundant instructions' on function '@_ZN4Dune5UG_NSILi3EE14TransformationEiPPdRKNS_11FieldVectorIdLi3EEERNS_11FieldMatrixIdLi3ELi3EEE'
    clang-3.8: error: unable to execute command: CPU time limit exceeded
    clang-3.8: error: clang frontend command failed due to signal (use -v to see invocation)
    clang version 3.8.0 (tags/RELEASE_380/rc1)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /home/mi/pipping/dune/inst/clang-3.8.0rc1/bin
    clang-3.8: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
    clang-3.8: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.
    11.12s user 0.06s system 98% cpu 11.348 total

The problem goes away with -O1

% ( ulimit -t 10; time clang++3.8.0rc1 -c -std=c++14 -o /dev/null uggridgeometry.ii -O1 -march=sandybridge )
1.30s user 0.04s system 99% cpu 1.340 total

The problem goes away with an architecture that does not support AVX

% ( ulimit -t 10; time clang++3.8.0rc1 -c -std=c++14 -o /dev/null uggridgeometry.ii -O2 -march=westmere )
1.29s user 0.04s system 99% cpu 1.340 total

The problem is triggered by the addition of AVX

% ( ulimit -t 10; time clang++3.8.0rc1 -c -std=c++14 -o /dev/null uggridgeometry.ii -O2 -march=westmere -mavx )
< killed after 10s, same output a above >

The problem is triggered by AVX alone

% ( ulimit -t 10; time clang++3.8.0rc1 -c -std=c++14 -o /dev/null uggridgeometry.ii -O2 -mavx )
< killed after 10s, same output a above >

@zmodem
Copy link
Collaborator

zmodem commented Jan 28, 2016

+David, looks like it has instcombine on the stack.

@majnemer
Copy link
Mannequin

majnemer mannequin commented Jan 28, 2016

reduced IR:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i1 @​f(double* %tmp, i1 %B) align 2 {
entry:
%tmp1 = bitcast double* %tmp to <2 x double>*
%tmp2 = load <2 x double>, <2 x double>* %tmp1, align 8
%tmp3 = extractelement <2 x double> %tmp2, i32 0
%sub12 = fsub double undef, %tmp3
%tmp5 = extractelement <2 x double> %tmp2, i32 1
br i1 %B, label %if.else, label %if.end

if.else: ; preds = %entry
%tmp6 = insertelement <4 x double> zeroinitializer, double %tmp5, i32 3
br label %if.end

if.end: ; preds = %if.else, %entry
%phi = phi <4 x double> [ undef, %entry ], [ %tmp6, %if.else ]
%tmp7 = fadd <4 x double> undef, %phi
%tmp9 = extractelement <4 x double> %tmp7, i32 1
%mul37 = fmul double %sub12, %tmp9
%mul38 = fmul double %mul37, undef
%tobool39 = fcmp une double %mul38, 0.000000e+00
ret i1 %tobool39
}

@majnemer
Copy link
Mannequin

majnemer mannequin commented Jan 28, 2016

Bisected to r256394:
Author: Sanjay Patel spatel@rotateright.com
Date: Thu Dec 24 21:17:56 2015 +0000

[InstCombine] transform more extract/insert pairs into shuffles (#2109 )

This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve llvm/llvm-project#2481 :
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Sanjay, can you please take a look?

@rotateright
Copy link
Contributor

Sanjay, can you please take a look?

Yep, checking it out now. Thanks for the reduction!

@rotateright
Copy link
Contributor

I've checked in a patch here:
http://reviews.llvm.org/rL259236

The change should be strictly limiting the offending transform, so I think it should be safe to accept to the 3.8 branch (assuming there's no bot fallout).

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 30, 2016

This fixes the problem for me. :)

Thanks a lot to David Majnemer for the reduction!
Thanks a lot to Sanjay Patel for the fix!

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#30923

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla c++14 regression
Projects
None yet
Development

No branches or pull requests

3 participants