Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang fails to pass validation when compiled with gcc-9 in release build. #39893

Closed
serge-sans-paille opened this issue Jan 31, 2019 · 53 comments
Labels
bugzilla Issues migrated from bugzilla clang Clang issues not falling into any other category

Comments

@serge-sans-paille
Copy link
Collaborator

Bugzilla Link 40547
Resolution FIXED
Resolved on Nov 27, 2019 11:59
Version 8.0
OS Linux
Blocks #42705
CC @asb,@dexonsmith,@foutrelis,@zmodem,@LW-archlinux,@slacka,@marxin,@zygoloid,@tstellar
Fixed by commit(s) r372281 7fc9f12

Extended Description

Building clang 8.0.0rc1 for fedora using gcc 9 compiles file, but validation fails in ~150 places.

When building with -O0, everything validates fine.
When building with -O1, everything validates fine.
When building with -O2, validation fails.
When building with -O2 -fsanitize=address, everything validates fine.

Running clang under valgrind gives good hint on the error location:

valgrind /builddir/build/BUILD/cfe-8.0.0rc1.src/_build/bin/clang -cc1 -internal-isystem /builddir/build/BUILD/cfe-8.0.0rc1.src/_build/lib64/clang/8.0.0/include -nostdsysteminc -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm /builddir/build/BUILD/cfe-8.0.0rc1.src/test/OpenMP/nvptx_declare_target_var_ctor_dtor_codegen.cpp -o -

==2582== Conditional jump or move depends on uninitialised value(s)
==2582== at 0x880DC79: clang::CodeGen::CodeGenModule::ConstructAttributeList(llvm::StringRef, clang::CodeGen::CGFunctionInfo const&, clang::CodeGen::CGCalleeInfo, llvm::AttributeList&, unsigned int&, bool) (CGCall.cpp:2060)
==2582== by 0x8A1E749: clang::CodeGen::CodeGenModule::SetLLVMFunctionAttributes(clang::GlobalDecl, clang::CodeGen::CGFunctionInfo const&, llvm::Function*) (CodeGenModule.cpp:1171)
==2582== by 0x8A238AA: clang::CodeGen::CodeGenModule::SetFunctionAttributes(clang::GlobalDecl, llvm::Function*, bool, bool) (CodeGenModule.cpp:1548)
==2582== by 0x8A5CA16: clang::CodeGen::CodeGenModule::GetOrCreateLLVMFunction(llvm::StringRef, llvm::Type*, clang::GlobalDecl, bool, bool, bool, llvm::AttributeList, clang::CodeGen::ForDefinition_t) (CodeGenModule.cpp:2824)
==2582== by 0x87F7FAC: clang::CodeGen::CodeGenModule::getAddrOfCXXStructor(clang::CXXMethodDecl const*, clang::CodeGen::StructorType, clang::CodeGen::CGFunctionInfo const*, llvm::FunctionType*, bool, clang::CodeGen::ForDefinition_t) (CGCXX.cpp:253)
==2582== by 0x882255E: clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall(clang::CXXConstructorDecl const*, clang::CXXCtorType, bool, bool, clang::CodeGen::Address, clang::CodeGen::CallArgList&, clang::CodeGen::AggValueSlot::Overlap_t, clang::SourceLocation, bool) (CGClass.cpp:2134)
==2582== by 0x882343C: clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall(clang::CXXConstructorDecl const*, clang::CXXCtorType, bool, bool, clang::CodeGen::Address, clang::CXXConstructExpr const*, clang::CodeGen::AggValueSlot::Overlap_t, bool) (CGClass.cpp:2052)
==2582== by 0x88B1B79: clang::CodeGen::CodeGenFunction::EmitCXXConstructEx

(...)

@serge-sans-paille
Copy link
Collaborator Author

The problem lies in the initialization of the bitfield members of ABIArgInfo.
Rewriting the default constructor to properly set each member fixes the error.

@pogo59
Copy link
Collaborator

pogo59 commented Jan 31, 2019

+hans for release bugs.

@serge-sans-paille
Copy link
Collaborator Author

Fix available in https://reviews.llvm.org/D57523

@zmodem
Copy link
Collaborator

zmodem commented Feb 6, 2019

==2582== Conditional jump or move depends on uninitialised value(s)
==2582== at 0x880DC79:
clang::CodeGen::CodeGenModule::ConstructAttributeList(llvm::StringRef,
clang::CodeGen::CGFunctionInfo const&, clang::CodeGen::CGCalleeInfo,
llvm::AttributeList&, unsigned int&, bool) (CGCall.cpp:2060)

CGCall.cpp:2060 at 8.0.0-rc1 is this line:

  else if (AI.getInReg())

that returns the InReg bitfield.

But that bitfield does get initialized (to false) in both of ABIArgInfo's constructors.

So I don't understand what the error is here exactly?

@zmodem
Copy link
Collaborator

zmodem commented Feb 6, 2019

Can you provide more information about exactly what gcc version you're using (I believe 9 doesn't even have a release branch yet), how your llvm build is configured, and in what way "validation" fails, i.e. an example of a test that's failing.

@serge-sans-paille
Copy link
Collaborator Author

Sure.

GCC version is from the gcc-9 branch, that's the one used on fedora rawhide:

svn export svn://gcc.gnu.org/svn/gcc/branches/redhat/gcc-9-branch@268371

The llvm build uses RelWithDebInfo

You can get the whole setup here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=32372028

@serge-sans-paille
Copy link
Collaborator Author

EDIT: the above link is for a rebuild of 7.01, but I had the same issue with 8.0.0rc1.

@zmodem
Copy link
Collaborator

zmodem commented Feb 6, 2019

Sure.

GCC version is from the gcc-9 branch, that's the one used on fedora rawhide:

svn export svn://gcc.gnu.org/svn/gcc/branches/redhat/gcc-9-branch@268371

The llvm build uses RelWithDebInfo

You can get the whole setup here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=32372028

Perfect, I'll take a look.

@serge-sans-paille
Copy link
Collaborator Author

@​hans: same problem when rebuilding from master with the following setup:

gcc built from svn export svn://gcc.gnu.org/svn/gcc/branches/redhat/gcc-9-branch@268371

llvm master

CXXFLAGS -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection

CFLAGS=CXXFLAGS

cmake options: ../llvm-project/llvm -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_CXX_COMPILER=/opt/notnfs/sergesanspaille/install/bin/g++ -DCMAKE_C_COMPILER=/opt/notnfs/sergesanspaille/install/bin/gcc

@zmodem
Copy link
Collaborator

zmodem commented Feb 7, 2019

I was able to reproduce this with the specified gcc version, except I only got four test failures:

Failing Tests (4):
Clang :: CodeGen/ppc64-extend.c
Clang :: CodeGen/ppc64-inline-asm.c
Clang :: CodeGenCXX/mips-size_t-ptrdiff_t.cpp
Clang :: Driver/le32-unknown-nacl.cpp

I'm still not sure what's going on though. Maybe the next step is to try bisecting gcc.

@zmodem
Copy link
Collaborator

zmodem commented Feb 7, 2019

I'm still not sure what's going on though. Maybe the next step is to try
bisecting gcc.

Does the redhat/gcc-9-branch branch exist in gcc's git mirror?

@zmodem
Copy link
Collaborator

zmodem commented Feb 7, 2019

I'm still not sure what's going on though. Maybe the next step is to try
bisecting gcc.

Does the redhat/gcc-9-branch branch exist in gcc's git mirror?

Never mind, I can reproduce with trunk gcc.

@zmodem
Copy link
Collaborator

zmodem commented Feb 8, 2019

My bisection last night failed, and I'll probably not have time to try again today, so if someone else wants to give it a try, please do so.

@zmodem
Copy link
Collaborator

zmodem commented Feb 19, 2019

I think my previous bisection failed because the test failure isn't completely deterministic. The same test doesn't fail each time.

I tried bisecting again, building gcc, then clang, then running check-all at each step, and it points to this:


75a6f948423540b728611660963d1740440d87eb is the first bad commit
commit 75a6f948423540b728611660963d1740440d87eb
Author: hubicka hubicka@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Sat Jan 12 18:18:23 2019 +0000

        * params.def (inline-unit-growth): Set to 40.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267883 138bc75d-0d04-0410-961f-82ee72b054a4

:040000 040000 73c34e4b0fc97de49ad228c609d495461e496b90 5882541f2e6fbcd712fd641f535238b4856ff045 M gcc

That doesn't seem very interesting. If it is indeed the culprit, I guess it only uncovered some already-existing problem :-/

@zmodem
Copy link
Collaborator

zmodem commented Feb 20, 2019

I can't really spend more time on this. Landing https://reviews.llvm.org/D57523 seems like the way to go.

@serge-sans-paille
Copy link
Collaborator Author

Patch merged, thanks Hans!

@zmodem
Copy link
Collaborator

zmodem commented Feb 21, 2019

Looks like it was reverted in r354549. Let's keep this open until it sticks and also makes it to the release_80 branch.

@zmodem
Copy link
Collaborator

zmodem commented Feb 25, 2019

Serge, do you have any ideas for how to do this without breaking various GCC versions? :-)

@serge-sans-paille
Copy link
Collaborator Author

Not quite :-) As gcc 9 is not officially out yet, as of https://gcc.gnu.org/releases.html at least, I guess it's okay to still do the release, and eventually backport a patch if needed?

@zmodem
Copy link
Collaborator

zmodem commented Feb 26, 2019

Not quite :-) As gcc 9 is not officially out yet, as of
https://gcc.gnu.org/releases.html at least, I guess it's okay to still do
the release, and eventually backport a patch if needed?

As we have an idea of how to work around it, it would still be nice to fix before release if possible.

Of course it would be even better if GCC was fixed before released. Do you know if there's a bug tracking this on the GCC side?

@zmodem
Copy link
Collaborator

zmodem commented Mar 5, 2019

Of course it would be even better if GCC was fixed before released. Do you know if there's a bug tracking this on the GCC side?

Ping?

@serge-sans-paille
Copy link
Collaborator Author

I'll give it another try today :-)

@zmodem
Copy link
Collaborator

zmodem commented Mar 8, 2019

Unblocking the release from this.

@slacka
Copy link
Mannequin

slacka mannequin commented Aug 15, 2019

On Ubuntu 18.08, I'm seeing the following Failing Tests (5):
Clang :: CodeGen/arm-fp16-arguments.c
Clang :: CodeGen/lanai-regparm.c
caused by https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=267883
* params.def (inline-unit-growth): Set to 40.

Clang :: CodeGen/ppc64-complex-parms.c
Clang :: CodeGen/ppc64-complex-return.c
Clang :: CodeGen/ppc64-vector.c

caused by https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=268448
* parms.def (MAX_INLINE_INSNS_SINGLE): Reduce from 400 to 200.

But these changes only caused failures in clang 7+. Bisecting clan on
Clang :: CodeGen/ppc64-complex-parms.c
Clang :: CodeGen/ppc64-complex-return.c
Clang :: CodeGen/ppc64-vector.c
I get:
http://llvm.org/viewvc/llvm-project?view=revision&revision=322396
Refactor handling of signext/zeroext in ABIArgInfo

Alex Bradbury,
Could you take a look at this to see why gcc is over optimizing your code after r268448?

@tstellar
Copy link
Collaborator

*** Bug llvm/llvm-bugzilla-archive#42927 has been marked as a duplicate of this bug. ***

@tstellar
Copy link
Collaborator

I'm still getting around 129 test failures when building clang with gcc-9 on Fedora 31.

@tstellar
Copy link
Collaborator

Also, re-applying https://reviews.llvm.org/rL354546 fixes the failures for me.

@llvmbot
Copy link
Member

llvmbot commented Aug 15, 2019

The patch changing inliner metrics should not trigger wrong code, so indeed it probably just uncovers latent problem. Is it understood whether this is a GCC bug and if so, do you have a testcase?

Note that I have also run into misoptimization issues while compiling clang with GCC 9 and FDO. I had no chance to debug it, but it was crashing after parsing even quite simple source file. It would be great to figure out where the problem is.

To compile with profile feedback you need to pass -fprofile-generate in addition to standard flags and build whole tree, then train by testsuite and then rebuilding with -fprofile-use.

@LW-archlinux
Copy link

Also, re-applying https://reviews.llvm.org/rL354546 fixes the failures for
me.

I had trouble re-applying that cleanly due to the changes in https://reviews.llvm.org/D60349 and created the attached patch.

I'm not a programmer or coder, please verify the attached patch.

@LW-archlinux
Copy link

@slacka
Copy link
Mannequin

slacka mannequin commented Aug 16, 2019

One more data point. I bisected Clang against:
Clang :: CodeGen/arm-fp16-arguments.c
Clang :: CodeGen/lanai-regparm.c

which pointed to:
http://llvm.org/viewvc/llvm-project?view=revision&revision=337514
Reapply "ADT: Shrink size of SmallVector by 8B on 64-bit platforms"

Duncan,
This commit appears to be be causing gcc to over optimize.

@dexonsmith
Copy link
Mannequin

dexonsmith mannequin commented Aug 16, 2019

One more data point. I bisected Clang against:
Clang :: CodeGen/arm-fp16-arguments.c
Clang :: CodeGen/lanai-regparm.c

which pointed to:
http://llvm.org/viewvc/llvm-project?view=revision&revision=337514
Reapply "ADT: Shrink size of SmallVector by 8B on 64-bit platforms"

Duncan,
This commit appears to be be causing gcc to over optimize.

Interesting! It caused problems in very old (unsupported) versions of clang too, likely due to a bug in the AArch64 backend. I won’t be able to investigate GCC to figure out the bug, but let me know if you want more details about what that commit did. I suspect it’s a recent regression in GCC since that commit landed over a year ago in Clang.

@slacka
Copy link
Mannequin

slacka mannequin commented Aug 20, 2019

I took one last stab at bisecting gcc. This time rather trace back a current failure, I bisected against the first validation failures, which yielded:
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261090
* c-cppbuiltin.c (c_cpp_builtins): Bump __cpp_deduction_guides to 201703

@zmodem
Copy link
Collaborator

zmodem commented Aug 27, 2019

It would be nice to get to the bottom of this, but I don't think we can block the release on it.

@slacka
Copy link
Mannequin

slacka mannequin commented Sep 12, 2019

I made another attempt to bisect gcc. Rather target a specific failure, this time I bisected against any validation failure. This points to
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261089
gimple-ssa-store-merging.c: Include gimple-fold.h.

While subsequent gcc tweaks resulted in different tests failing, this commit appears to be the genesis of all of the failures.

@tstellar
Copy link
Collaborator

I made another attempt to bisect gcc. Rather target a specific failure, this
time I bisected against any validation failure. This points to
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261089
gimple-ssa-store-merging.c: Include gimple-fold.h.

While subsequent gcc tweaks resulted in different tests failing, this commit
appears to be the genesis of all of the failures.

Have you filed a gcc bug for this?

@slacka
Copy link
Mannequin

slacka mannequin commented Sep 12, 2019

Have you filed a gcc bug for this?

Done. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91758

Please add any useful information that I may have left out, as I am neither a clang nor gcc developer.

@marxin
Copy link
Contributor

marxin commented Sep 17, 2019

Ok, I've got a patch. It's an issue in clang where ABIArgInfo constructor does not initialize some of the bit fields and then we end up with usage of an uninitialized variables. Following patch should fix it:

diff --git a/clang/include/clang/CodeGen/CGFunctionInfo.h b/clang/include/clang/CodeGen/CGFunctionInfo.h
index 1f81072e23d..ec2567555a3 100644
--- a/clang/include/clang/CodeGen/CGFunctionInfo.h
+++ b/clang/include/clang/CodeGen/CGFunctionInfo.h
@@ -109,14 +109,11 @@ private:
UnpaddedCoerceAndExpandType = T;
}

  • ABIArgInfo(Kind K)
  •  : TheKind(K), PaddingInReg(false), InReg(false) {
    
  • }

public:

  • ABIArgInfo()
  •  : TypeData(nullptr), PaddingType(nullptr), DirectOffset(0),
    
  •    TheKind(Direct), PaddingInReg(false), InReg(false) {}
    
  • ABIArgInfo(Kind K = Direct)

  •  : TheKind(K), PaddingInReg(false), InAllocaSRet(false),
    
  •    IndirectByVal(false), IndirectRealign(false), SRetAfterThis(false),
    
  •    InReg(false), CanBeFlattened(false), SignExt(false) {}
    

    static ABIArgInfo getDirect(llvm::Type *T = nullptr, unsigned Offset = 0,
    llvm::Type *Padding = nullptr,

I'm going to send the patch to Phabricator.

@tstellar
Copy link
Collaborator

Have you tested this patch with gcc-7? This similar patch was committed and reverted due to ICE in gcc-7: https://reviews.llvm.org/D57523

@marxin
Copy link
Contributor

marxin commented Sep 17, 2019

Have you tested this patch with gcc-7? This similar patch was committed and
reverted due to ICE in gcc-7: https://reviews.llvm.org/D57523

Ah, you had the patch. Yes, the patch should be re-applied!
I'll rebuild it with gcc-7 to see the ICE.
Do you have a link to the ICE please?

@tstellar
Copy link
Collaborator

Have you tested this patch with gcc-7? This similar patch was committed and
reverted due to ICE in gcc-7: https://reviews.llvm.org/D57523

Ah, you had the patch. Yes, the patch should be re-applied!
I'll rebuild it with gcc-7 to see the ICE.
Do you have a link to the ICE please?

There was a link in the revert commit, but the logs have been deleted now:
https://reviews.llvm.org/rL354549

@marxin
Copy link
Contributor

marxin commented Sep 17, 2019

There was a link in the revert commit, but the logs have been deleted now:
https://reviews.llvm.org/rL354549

Ok, I'm rebuilding clang with the CC=gcc-7 CXX=g++-7 with the very same instructions like here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91758#c0

The ICE was a normal ICE in GCC compiler during compilation of a clang object file, right?

@marxin
Copy link
Contributor

marxin commented Sep 17, 2019

There was a link in the revert commit, but the logs have been deleted now:
https://reviews.llvm.org/rL354549

Ok, I'm rebuilding clang with the CC=gcc-7 CXX=g++-7 with the very same
instructions like here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91758#c0

The ICE was a normal ICE in GCC compiler during compilation of a clang
object file, right?

I can't see the ICE with:
g++-7 --version
g++-7 (SUSE Linux) 7.4.1 20190725 [gcc-7-branch revision 273795]

So please reapply the patch back.

@tstellar
Copy link
Collaborator

There was a link in the revert commit, but the logs have been deleted now:
https://reviews.llvm.org/rL354549

Ok, I'm rebuilding clang with the CC=gcc-7 CXX=g++-7 with the very same
instructions like here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91758#c0

The ICE was a normal ICE in GCC compiler during compilation of a clang
object file, right?

I can't see the ICE with:
g++-7 --version
g++-7 (SUSE Linux) 7.4.1 20190725 [gcc-7-branch revision 273795]

So please reapply the patch back.

Can you submit your patch? The old patch doesn't apply cleanly any more.

@marxin
Copy link
Contributor

marxin commented Sep 17, 2019

Proposed patch
Please make a patch submission on my behalf. Thanks.

@zmodem
Copy link
Collaborator

zmodem commented Sep 18, 2019

Ok, I've got a patch. It's an issue in clang where ABIArgInfo constructor
does not initialize some of the bit fields and then we end up with usage of
an uninitialized variables.

Can you point to where the uninitialized variable gets used? I stared hard at the code but couldn't find it.

Maybe initializing all of them is a good idea in any case though.

@marxin
Copy link
Contributor

marxin commented Sep 18, 2019

Ok, I've got a patch. It's an issue in clang where ABIArgInfo constructor
does not initialize some of the bit fields and then we end up with usage of
an uninitialized variables.

Can you point to where the uninitialized variable gets used? I stared hard
at the code but couldn't find it.

It happens in void computeInfo(CGFunctionInfo &FI) const override {:
I.info = classifyArgumentType(I.type, State);

Here there's an assignment which eventually will copy members of the ABIArgInfo.

Maybe initializing all of them is a good idea in any case though.

@serge-sans-paille
Copy link
Collaborator Author

Commited as https://reviews.llvm.org/rGe93aded7f02d661234ad81aac0785ccdef6a79dd
Let's hope there's no gcc regression this time :-/

@slacka
Copy link
Mannequin

slacka mannequin commented Sep 22, 2019

Verified that r372281 fixes the issue. Both
gcc-7: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
gcc-10: gcc-8.3 (GCC) 10.0.0 20190822 (experimental)

'make check-all' for clang r372468 with no failures.

@llvmbot
Copy link
Member

llvmbot commented Nov 23, 2019

Can we get this backported to 9.0.1 (if it isn't yet)?

@tstellar
Copy link
Collaborator

Merged: 7fc9f12

@tstellar
Copy link
Collaborator

mentioned in issue llvm/llvm-bugzilla-archive#42927

@nico
Copy link
Contributor

nico commented Nov 27, 2021

mentioned in issue #42705

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang Clang issues not falling into any other category
Projects
None yet
Development

No branches or pull requests

8 participants