Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang] incorrect code generation when building gawk 5.2.1 using -O2/-O3 #59792

Closed
0xC0ncord opened this issue Jan 2, 2023 · 17 comments
Closed
Assignees
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" miscompilation release:backport

Comments

@0xC0ncord
Copy link

0xC0ncord commented Jan 2, 2023

It was found that when building gawk 5.2.1 with -O2 or -O3, incorrect machine code is generated which is causing differing runtime behavior than expected.

Background

Originally I ran into this when upgrading gawk on my personal Gentoo systems to 5.2.1. There is a particular regex[1] used in plymouth which was suddenly erroring that did not previously. Downgrading to gawk 5.1.1, building it with gcc, or disabling compiler optimizations would work around the issue.

This issue is being submitted after some discussion[2] on the gawk-bug mailing list, particularly because of these findings[3].

A simple test case is this command:

head /dev/zero | awk 'BEGIN { RS="[[][:blank:]]" }'

When running this command, the expected behavior is no output and a clean exit. However, when building gawk 5.2.1 with clang and -O2 or -O3, we see this error (and a non-zero exit code):

awk: cmd. line:1: fatal: invalid character class

[1] https://gitlab.freedesktop.org/plymouth/plymouth/-/blob/main/scripts/plymouth-set-default-theme.in#L50
[2] https://lists.gnu.org/archive/html/bug-gawk/2022-12/msg00010.html
[3] https://lists.gnu.org/archive/html/bug-gnulib/2023-01/msg00002.html

@thesamesam
Copy link
Member

cc @eggert

@MaskRay MaskRay self-assigned this Jan 2, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 2, 2023

@llvm/issue-subscribers-clang-codegen

@MaskRay
Copy link
Member

MaskRay commented Jan 3, 2023

As noticed, gawk/support/dfa.c lines 1141-1143 confuses Clang. Clang thinks the ConditionalOperator (... ? dfaerror : dfawarn) is noreturn as dfaerror is noreturn (while it shouldn't since dfawarn isn't noreturn).

    ((dfa->syntax.dfaopts & DFA_CONFUSING_BRACKETS_ERROR
      ? dfaerror : dfawarn)
     (_("character class syntax is [[:space:]], not [:space:]")));

This problem has been known since 2010 (8c6b56f)

QualType ASTContext::mergeFunctionTypes(QualType lhs, QualType rhs, ...
...
  // It's noreturn if either type is.
  // FIXME: some uses, e.g. conditional exprs, really want this to be 'both'.
  bool NoReturn = lbaseInfo.getNoReturn() || rbaseInfo.getNoReturn();

In some contexts (merging two declarations), we do want to use the logical OR result. But here for the ConditionalOperator, we want to use the logical AND result.

@MaskRay
Copy link
Member

MaskRay commented Jan 3, 2023

Patch: https://reviews.llvm.org/D140868

@EugeneZelenko EugeneZelenko added clang:frontend Language frontend issues, e.g. anything involving "Sema" and removed clang:codegen labels Jan 3, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 3, 2023

@llvm/issue-subscribers-clang-frontend

@tru
Copy link
Collaborator

tru commented Jan 5, 2023

@tstellar not sure this really should go into a hotfix. If I read the history here correctly this has been known since at least 2010 and in my mind, it doesn't qualify as a regression that should be fixed at this point.

But maybe it's harmless enough to warrant to pick over.

@thesamesam
Copy link
Member

thesamesam commented Jan 5, 2023

"Known" in the sense of a FIXME, but not known that it miscompiles a core part of Linux userland until 2 days ago.

If it was well-known, we wouldn't be in the situation we are wrt the standard (see the phab review).

Part of my interest here is in, if possible, making LLVM 15 as solid as possible given it'll be a while before we can use Clang 16 due to the large disruption we're working on. So even if it doesn't go into 15.0.7, people who want to keep using Clang as a system compiler will have to cherry-pick it anyway.

@tru
Copy link
Collaborator

tru commented Jan 5, 2023

I have read up on the issue and I still hold the same position: I don't think this is a regression in LLVM 15.x. Since we are rolling 15.0.7 because of #59242 I don't object to it being included as long as the risk of this is low, but I wouldn't suggest a new patch for this issue alone.

This sounds maybe a bit like a lawyer discussion because the end result is probably the same (that we ship this fix in 15.0.7), but I think it's important that we figure out exactly what kind of fixes we are putting into releases this late in the cycle and set the right expectations for users and testers.

@tstellar
Copy link
Collaborator

tstellar commented Jan 5, 2023

@tru I added this one to the milestone, just so we don't lose track of it. We don't have to pull it into the release.
That being said, miscompiling of a project like gawk, seems like a bad bug, and I wouldn't object to pulling it into 15.0.7 since we are planning the release. My biggest concern with this one is that it's still not committed yet, and it won't have spent much time in main by the time of the release.

@thesamesam Do you know if previous versions of clang/llvm can gompile gawk 5.2.1 correctly?

@MaskRay
Copy link
Member

MaskRay commented Jan 5, 2023

If llvm-project or a Linux distribution wants to cherry pick the fix, the commits are:
b58927e (test update) and cf8fd21 (with a clang/docs/ReleaseNotes.rst change which needs to resolve conflicts).

@thesamesam
Copy link
Member

@thesamesam Do you know if previous versions of clang/llvm can gompile gawk 5.2.1 correctly?

Clang 14 miscompiles it as well at least. I can try 13 if needed.

@tstellar
Copy link
Collaborator

tstellar commented Jan 9, 2023

/cherry-pick b58927e cf8fd21

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 9, 2023

Failed to cherry-pick: cf8fd21

https://github.com/llvm/llvm-project/actions/runs/3877397004

Please manually backport the fix and push it to your github fork. Once this is done, please add a comment like this:

/branch <user>/<repo>/<branch>

tstellar pushed a commit to tstellar/llvm-project that referenced this issue Jan 9, 2023
In C mode, if e1 has __attribute__((noreturn)) but e2 doesn't, `(c ? e1 : e2)`
is incorrectly noreturn and Clang codegen produces `unreachable`
which may lead to miscompiles (see [1] `gawk/support/dfa.c`).
This problem has been known since
8c6b56f (2010) or earlier.

Fix this by making the result type noreturn only if both e1 and e2 are
noreturn, matching GCC.

`_Noreturn` and `[[noreturn]]` do not have the aforementioned problem.

Fix llvm#59792 [1]

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D140868
@tstellar
Copy link
Collaborator

tstellar commented Jan 9, 2023

/branch tstellar/llvm-project/issue-59792

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 9, 2023

/pull-request llvm/llvm-project-release-prs#227

@tstellar
Copy link
Collaborator

This patch introduces an ABI change in libclang-cpp.so. It needs to be rewritten so the old signatures of mergeFunctionTypes ( ) and mergeTypes ( ) are added back in.

@nikic
Copy link
Contributor

nikic commented Jan 26, 2023

This was not included in the 15.0.7 release, so closing this issue.

@nikic nikic closed this as completed Jan 26, 2023
thesamesam added a commit to thesamesam/llvm-project that referenced this issue Feb 22, 2023
thesamesam added a commit that referenced this issue Feb 22, 2023
This is for 16.x.

Bug: #59792

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D144540
Noxime pushed a commit to Noxime/llvm-project that referenced this issue Jun 16, 2023
This is for 16.x.

Bug: llvm#59792

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D144540
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Sep 26, 2023
This is for 16.x.

Bug: llvm#59792

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D144540
tarunprabhu pushed a commit to tarunprabhu/kitsune that referenced this issue Sep 26, 2023
This is for 16.x.

Bug: llvm#59792

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D144540
karouzakisp pushed a commit to karouzakisp/llvm-project that referenced this issue Jul 20, 2024
This is for 16.x.

Bug: llvm#59792

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D144540
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" miscompilation release:backport
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants