Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug when using -O2/-O3 in clang 13 for ARMv7 #52669

Closed
mikhailramalho opened this issue Dec 13, 2021 · 13 comments
Closed

Possible bug when using -O2/-O3 in clang 13 for ARMv7 #52669

mikhailramalho opened this issue Dec 13, 2021 · 13 comments
Labels
backend:ARM invalid Resolved as invalid, i.e. not a bug

Comments

@mikhailramalho
Copy link
Member

mikhailramalho commented Dec 13, 2021

When I build JSC using either -O2 or -O3, I get random garbage when querying for the "Infinity" constant from javascript, as if the constant was not being initialized. The variable is being initialized correctly, that I'm sure.

Some tests I did:

  1. Using -O1 or no optimization doesn't trigger the issue.
  2. Using either -O2 or -O3 with address or the undef behavior sanitizers doesn't trigger the issue.
  3. Building JSC with clang 11.0.1-2 (from Debian) and clang 12.0.1 (from github) doesn't trigger the issue.
  4. The issue happens with clang 13.0.0 (from github) and the 13.0.1-rc1 (also from github).

It seems like some optimization introduced by -O2 is causing the issue.

Steps to reproduce:

To build JSC, you need to get WebKit from https://github.com/WebKit/WebKit (shallow clone is a friend here), and run:

$ ./Tools/Scripts/build-jsc --Release --jsc-only '--cmakeargs=-DCMAKE_CXX_COMPILER=<path-to-clang-13>/bin/clang++ -DCMAKE_C_COMPILER=<path-to-clang-13>/bin/clang'

Release by default builds with -O3 -DNDEBUG. JSC will be built in WebKitBuild/Release/bin/jsc.

To build the debug version, you must replace --release by --debug, and JSC will be built in WebKitBuild/Debug/bin/jsc. To rebuild, you can either remove the WebKitBuild dir, or go in WebKitBuild/Release/ and do a ninja clean + ninja.

The program I'm using is:

$ cat foo.js
print(Infinity)
let a = Infinity / Infinity
print(Number.isNaN(a))

JSC built in release mode (the value infinity changes every time):

$ WebKitBuild/Release/bin/jsc foo.js
-1.1089394371691584e+269
false

Expected output:

$ ./WebKitBuild/Debug/bin/jsc foo.js
Infinity
true

I'm running the tests in:

$ uname -a
Linux bbox-11-armhf 5.10.0-0.bpo.7-arm64 #1 SMP Debian 5.10.40-1~bpo10+1 (2021-06-04) armv8l GNU/Linux
@DavidSpickett
Copy link
Collaborator

The good news is, it does reproduce in our of our armv8 dev containers. Your instructions were spot on, thanks!

Built with the config from our quick bot: https://lab.llvm.org/buildbot/#/builders/171/builds/7188
Build clang, then over in WebKit:

$ rm -rf WebKitBuild && ./Tools/Scripts/build-jsc --Release --jsc-only '--cmakeargs=-DCMAKE_CXX_COMPILER=/home/david.spickett/build-llvm-arm/bin/clang++ -DCMAKE_C_COMPILER=/home/david.spickett/build-llvm-arm/bin/clang' && ./WebKitBuild/Release/bin/jsc /tmp/test.js

Bisect found this change:

$ git bisect good
395607af3cb80f25ee05420ea5ae0ad0be948533 is the first bad commit
commit 395607af3cb80f25ee05420ea5ae0ad0be948533
Author: Juneyoung Lee <aqjune@gmail.com>
Date:   Sun Nov 29 04:26:44 2020 +0900

    Reapply [ConstantFold] Fold more operations to poison

https://reviews.llvm.org/D92270

I'm not 100% sure of the "why" of the bug here, but I'm going to read more about it today. Others have reported issues with the change.

@DavidSpickett
Copy link
Collaborator

DavidSpickett commented Dec 15, 2021

as if the constant was not being initialized. The variable is being initialized correctly, that I'm sure.

Can you explain how you know that, is there a single place in WebKit that this happens? It would be good to compare what the code is doing to the reproducers that have already been logged against the change.

Might be able to match one of those against what WebKit is doing. Certainly a few mention fp to int conversions.

@mikhailramalho
Copy link
Member Author

as if the constant was not being initialized. The variable is being initialized correctly, that I'm sure.

Can you explain how you know that, is there a single place in WebKit that this happens? It would be good to compare what the code is doing to the reproducers that have already been logged against the change.

Might be able to match one of those against what WebKit is doing. Certainly a few mention fp to int conversions.

We store the inf value in a union: https://github.com/mikhailramalho/WebKit/blob/main/Source/JavaScriptCore/runtime/JSCJSValue.h#L95

But before it reaches the store, we do the following check:

https://github.com/WebKit/WebKit/blob/d91ff40795821ebc7645be5ec85213e0ba645a7d/Source/JavaScriptCore/runtime/MathCommon.h#L214

which is undefined behavior so maybe the value is being marked as poison and that's why it returns a different value every time?

@mikhailramalho
Copy link
Member Author

The line causing the issue is: https://github.com/mikhailramalho/WebKit/blob/main/Source/JavaScriptCore/runtime/JSGlobalObject.cpp#L706

Compiling with -O1 I get the following IR:

  call void @llvm.dbg.value(metadata i64 9218868437227405312, metadata !63481, metadata !DIExpression()) #22, !dbg !63516
  %3765 = getelementptr inbounds [3 x %"struct.JSC::JSGlobalObject::GlobalPropertyInfo"], [3 x %"struct.JSC::JSGlobalObject::GlobalPropertyInfo"]* %6, i32 0, i32 1, i32 2, i32 0, i32 0, !dbg !63535
  store i64 9218868437227405312, i64* %3765, align 8, !dbg !63535
...

9218868437227405312 is inf.

When compiling with -O2 I get:

  call void @llvm.dbg.value(metadata i64 poison, metadata !81229, metadata !DIExpression()), !dbg !81264
  %32 = getelementptr inbounds [3 x %"struct.JSC::JSGlobalObject::GlobalPropertyInfo"], [3 x %"struct.JSC::JSGlobalObject::GlobalPropertyInfo"]* %3, i32 0, i32 1, i32 3, !dbg !81284

You can see the poison and no store.

@DavidSpickett
Copy link
Collaborator

Thanks. That there is a change in clang's ouptut is very clear.

Off the top of my head I don't know whether we'd call this a regression or simply a change in UB which clang would be allowed to do (though there is sometimes argument to keep it as is). To be honest, I don't think I'm qualified to do so but the authors of that patch are.

So with your knowledge of what WebKit is doing exactly, could you make a small reproducer in C?

I did observe simple double to int32_t casts changing behaviour but wasn't sure of WebKit's exact use case and I don't want to miss communicate it.

@DavidSpickett
Copy link
Collaborator

(not sure if this will be relevant, we'll see if it looks anything like your reproducer)

https://reviews.llvm.org/D115804 just landed which means we use a saturating cast when the -fno-strict-float-cast-overflow flag is passed.

$ cat /tmp/test.cpp
#include <limits>
#include <stdint.h>

int32_t foo() {
  return std::numeric_limits<double>::infinity();
}
$ ./bin/clang++ /tmp/test.cpp -O2 -o - -emit-llvm -S
<...>
define dso_local i32 @_Z3foov() local_unnamed_addr #0 {
entry:
  ret i32 poison
}
<...>
$ ./bin/clang++ /tmp/test.cpp -O2 -o - -emit-llvm -S -fno-strict-float-cast-overflow
<...>
define dso_local i32 @_Z3foov() local_unnamed_addr #0 {
entry:
  ret i32 2147483647
}
<...>

@mikhailramalho
Copy link
Member Author

mikhailramalho commented Dec 16, 2021

Hi,

(not sure if this will be relevant, we'll see if it looks anything like your reproducer)

https://reviews.llvm.org/D115804 just landed which means we use a saturating cast when the -fno-strict-float-cast-overflow flag is passed.

I tried adding the flag and it seems to work, but I still think that WebKit is correct here: while there is an undef behavior, that call checks if the double can be represented as an int32_t but never actually uses the int32_t. The path is:

  1. Call jsNumber: https://github.com/WebKit/WebKit/blob/96e383c2b4cd73899f67bd31565a09bd5a5faf78/Source/JavaScriptCore/runtime/JSGlobalObject.cpp#L706
  2. jsNumber which calls JSValue::JSValue: https://github.com/WebKit/WebKit/blob/96e383c2b4cd73899f67bd31565a09bd5a5faf78/Source/JavaScriptCore/runtime/JSCJSValue.h#L603
  3. JSValue::JSValue which calls canBeStrictInt32 (which has the undef behavior): https://github.com/WebKit/WebKit/blob/96e383c2b4cd73899f67bd31565a09bd5a5faf78/Source/JavaScriptCore/runtime/JSCJSValueInlines.h#L241

But here canBeStrictInt32 is false, so it jumps to the next statement in JSValue::JSValue and stores as a double. But doing the check, does it poisons both sides of the branch?

Here's the bytecode generated by -O0:

  %call7 = call double @_ZNSt14numeric_limitsIdE8infinityEv() #13, !dbg !55315
  call void @llvm.experimental.noalias.scope.decl(metadata !55316), !dbg !55319
  store double %call7, double* %d.addr.i, align 8, !noalias !55316

Where the store is double.

Here's the bytecode with -O3 using clang 12.1 and clang 13 if I revert the change in https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/ConstantFold.cpp#L462:

  call void @llvm.dbg.value(metadata i64 9218868437227405312, metadata !173195, metadata !DIExpression()), !dbg !173230
  %32 = getelementptr inbounds [3 x %"struct.JSC::JSGlobalObject::GlobalPropertyInfo"], [3 x %"struct.JSC::JSGlobalObject::GlobalPropertyInfo"]* %3, i32 0, i32 1, i32 2, i32 0, i32 0, !dbg !173250
  store i64 9218868437227405312, i64* %32, align 8, !dbg !173250

My guess is that clang is changing the store from double to i64, which then results in a poison value because the original type of the constant is a double. Does that make sense? Sorry, I'm not an expert on the optimizations LLVM performs so this is a wild guess.

I'll try to remove the undef behavior call from WebKit, then I'll try to use bugpoint or llvm-reduce to try to reduce the generated bytecode (the -O1 is 306793 LOC).

@nikic
Copy link
Contributor

nikic commented Dec 16, 2021

But here canBeStrictInt32 is false, so it jumps to the next statement in JSValue::JSValue and stores as a double. But doing the check, does it poisons both sides of the branch?

Why is canBeStrictInt32 false? It also performs a double -> int32_t cast on infinity, and as such is UB.

@mikhailramalho
Copy link
Member Author

But here canBeStrictInt32 is false, so it jumps to the next statement in JSValue::JSValue and stores as a double. But doing the check, does it poisons both sides of the branch?

Why is canBeStrictInt32 false? It also performs a double -> int32_t cast on infinity, and as such is UB.

Right, so you're saying that since there is undefined behavior there, clang can decide that canBeStrictInt32 is true? That would explain the problem.

@nikic
Copy link
Contributor

nikic commented Dec 16, 2021

But here canBeStrictInt32 is false, so it jumps to the next statement in JSValue::JSValue and stores as a double. But doing the check, does it poisons both sides of the branch?

Why is canBeStrictInt32 false? It also performs a double -> int32_t cast on infinity, and as such is UB.

Right, so you're saying that since there is undefined behavior there, clang can decide that canBeStrictInt32 is true? That would explain the problem.

Yes. We specify branch on poison as undefined behavior, though currently it will just pick one of the two branches.

I think on the LLVM side everything is fine here, because the source program has undefined behavior. The only bit I find concerning is

Using either -O2 or -O3 with address or the undef behavior sanitizers doesn't trigger the issue.

because this problem should be caught by -fsanitize=float-cast-overflow.

@mikhailramalho
Copy link
Member Author

mikhailramalho commented Dec 16, 2021 via email

@nikic
Copy link
Contributor

nikic commented Jan 4, 2022

IIRC undef behaviour does complain about it but we get the expected result, the Infinity constant is initialized.

Ah okay, that's fine then.

Is -fsanitize=float-cast-overflow enabled by default if we use -fsanitize=address? I'll try it again tomorrow.

It's not part of -fsanitize=address, but I believe it should be part of -fsanitize=undefined by default.

@mikhailramalho
Copy link
Member Author

Closing this issue as it was an undefined behavior on WebKit.

@EugeneZelenko EugeneZelenko added the invalid Resolved as invalid, i.e. not a bug label Feb 14, 2023
@EugeneZelenko EugeneZelenko closed this as not planned Won't fix, can't repro, duplicate, stale Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM invalid Resolved as invalid, i.e. not a bug
Projects
None yet
Development

No branches or pull requests

4 participants