Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unoptimized IR output has different ordering of functions on different platforms and produces different bytecode #14361

Closed
cameel opened this issue Jun 26, 2023 · 8 comments · Fixed by #14374
Assignees
Labels
bug 🐛 medium effort Default level of effort medium impact Default level of impact selected for development It's on our short-term development
Milestone

Comments

@cameel
Copy link
Member

cameel commented Jun 26, 2023

Discovered in the via IR bytecode comparison PR (#14355). The PR revealed that unoptimized via IR produces slightly different bytecode on some platforms (specifically emscripten and macOS).

I investigated 288_conditional_with_all_types_sol and in this case the differences are caused by functions being ordered differently in the generated IR.

Repro

Input

input.json
{
    "language": "Solidity",
    "sources": {
        "A": {"content": "
            contract C {
                function f2() public {
                    bytes1[2] memory k;
                    k[0] = bytes1(0);
                }
            }"
        }
    },
    "settings": {
        "optimizer": {"enabled": false},
        "viaIR": true,
        "outputSelection": {
            "*": {"*": [
                "ir",
                "evm.assembly",
                "evm.bytecode.object"
            ]}
        }
    }
}
Script
npm install solc

emscripten_output=$(cat input.json | npx --no -- solcjs --standard-json)
linux_output=$(cat input.json | solc --standard-json)

emscripten_bytecode=$(echo "$emscripten_output" | jq --raw-output .contracts.A.C.evm.bytecode.object | fold --width 80)
emscripten_ir=$(echo "$emscripten_output" | jq --raw-output .contracts.A.C.ir)
emscripten_asm=$(echo "$emscripten_output" | jq --raw-output .contracts.A.C.evm.assembly)

linux_bytecode=$(echo "$linux_output" | jq --raw-output .contracts.A.C.evm.bytecode.object | fold --width 80)
linux_ir=$(echo "$linux_output" | jq --raw-output .contracts.A.C.ir)
linux_asm=$(echo "$linux_output" | jq --raw-output .contracts.A.C.evm.assembly)

echo "=== BYTECODE DIFF ==="
diff --color --unified <(echo "$emscripten_bytecode") <(echo "$linux_bytecode")
echo "=== IR DIFF ==="
diff --color --unified <(echo "$emscripten_ir") <(echo "$linux_ir")
echo "=== ASM DIFF ==="
diff --color --unified <(echo "$emscripten_asm") <(echo "$linux_asm")

Output

Versions
0.8.20+commit.a1b79de6.Emscripten.clang
solc, the solidity compiler commandline interface
Version: 0.8.20+commit.a1b79de6.Linux.g++
Bytecode diff
@@ -13,11 +13,11 @@
 01a7818361017f565b5050919050565b5f6101b96002610187565b905090565b5f81905091905056
 5b5f7fff000000000000000000000000000000000000000000000000000000000000008216905091
 9050565b5f8160f81b9050919050565b5f61021861021361020e846101be565b6101f2565b6101c7
-565b9050919050565b7f4e487b710000000000000000000000000000000000000000000000000000
-00005f52603260045260245ffd5b5f60029050919050565b5f6102608261024c565b831061026f57
-61026e61021f565b5b6020830280830191505092915050565b5f819050919050565b5f8190509190
-50565b5f6102ab6102a66102a1846101be565b610288565b61027f565b9050919050565b6102bb82
+565b9050919050565b5f819050919050565b5f819050919050565b5f61024b610246610241846101
+be565b610228565b61021f565b9050919050565b7f4e487b71000000000000000000000000000000
+000000000000000000000000005f52603260045260245ffd5b5f60029050919050565b5f61029382
+61027f565b83106102a2576102a1610252565b5b6020830280830191505092915050565b6102bb82
 6101c7565b81525050565b60025f6102cc6101ae565b8091505f6102d9816101fe565b83805f8361
-02f8816102f36102ed85610291565b86610256565b6102b2565b505050505050505050565bfea264
+02f8816102f36102ed85610231565b86610289565b6102b2565b505050505050505050565bfea264
 6970667358221220a48f1dac064e7879796321b58551b9e9892676c3ba4d848e77dd46134e985a74
 64736f6c63430008140033
IR diff
@@ -168,6 +168,18 @@
                 converted := cleanup_t_bytes1(shift_left_248(cleanup_t_rational_0_by_1(value)))
             }

+            function cleanup_t_uint256(value) -> cleaned {
+                cleaned := value
+            }
+
+            function identity(value) -> ret {
+                ret := value
+            }
+
+            function convert_t_rational_0_by_1_to_t_uint256(value) -> converted {
+                converted := cleanup_t_uint256(identity(cleanup_t_rational_0_by_1(value)))
+            }
+
             function panic_error_0x32() {
                 mstore(0, 35408467139433450592217433187231851964531694900788300625387963629091585785856)
                 mstore(4, 0x32)
@@ -190,18 +202,6 @@
                 addr := add(baseRef, offset)
             }

-            function cleanup_t_uint256(value) -> cleaned {
-                cleaned := value
-            }
-
-            function identity(value) -> ret {
-                ret := value
-            }
-
-            function convert_t_rational_0_by_1_to_t_uint256(value) -> converted {
-                converted := cleanup_t_uint256(identity(cleanup_t_rational_0_by_1(value)))
-            }
-
             function write_to_memory_t_bytes1(memPtr, value) {
                 mstore(memPtr, cleanup_t_bytes1(value))
             }
Assembly diff
@@ -358,12 +358,17 @@
       pop
       jump     // out
     tag_22:
-      mstore(0x00, 0x4e487b7100000000000000000000000000000000000000000000000000000000)
-      mstore(0x04, 0x32)
-      revert(0x00, 0x24)
+      0x00
+      dup2
+      swap1
+      pop
+      swap2
+      swap1
+      pop
+      jump     // out
     tag_23:
       0x00
-      0x02
+      dup2
       swap1
       pop
       swap2
@@ -373,45 +378,31 @@
     tag_24:
       0x00
       tag_85
-      dup3
-      tag_23
-      jump     // in
-    tag_85:
-      dup4
-      lt
       tag_86
-      jumpi
       tag_87
-      tag_22
+      dup5
+      tag_18
       jump     // in
     tag_87:
+      tag_23
+      jump     // in
     tag_86:
-      0x20
-      dup4
-      mul
-      dup1
-      dup4
-      add
-      swap2
-      pop
-      pop
-      swap3
-      swap2
-      pop
-      pop
-      jump     // out
-    tag_25:
-      0x00
-      dup2
+      tag_22
+      jump     // in
+    tag_85:
       swap1
       pop
       swap2
       swap1
       pop
       jump     // out
+    tag_25:
+      mstore(0x00, 0x4e487b7100000000000000000000000000000000000000000000000000000000)
+      mstore(0x04, 0x32)
+      revert(0x00, 0x24)
     tag_26:
       0x00
-      dup2
+      0x02
       swap1
       pop
       swap2
@@ -421,22 +412,31 @@
     tag_27:
       0x00
       tag_91
+      dup3
+      tag_26
+      jump     // in
+    tag_91:
+      dup4
+      lt
       tag_92
+      jumpi
       tag_93
-      dup5
-      tag_18
+      tag_25
       jump     // in
     tag_93:
-      tag_26
-      jump     // in
     tag_92:
-      tag_25
-      jump     // in
-    tag_91:
-      swap1
+      0x20
+      dup4
+      mul
+      dup1
+      dup4
+      add
+      swap2
+      pop
       pop
+      swap3
       swap2
-      swap1
+      pop
       pop
       jump     // out
     tag_28:
@@ -483,11 +483,11 @@
       tag_100
       tag_101
       dup6
-      tag_27
+      tag_24
       jump     // in
     tag_101:
       dup7
-      tag_24
+      tag_27
       jump     // in
     tag_100:
       tag_28
@cameel cameel added bug 🐛 codegen error Compiler generates invalid code. Critical. medium effort Default level of effort medium impact Default level of impact must have Something we consider an essential part of Solidity 1.0. labels Jun 26, 2023
@ekpyron ekpyron added selected for development It's on our short-term development and removed must have Something we consider an essential part of Solidity 1.0. labels Jun 26, 2023
@ekpyron
Copy link
Member

ekpyron commented Jun 26, 2023

must have is orders of magnitude too low for this. We should fix this before the next release.

@ekpyron ekpyron added this to the 0.8.21 milestone Jun 26, 2023
@ekpyron ekpyron removed the codegen error Compiler generates invalid code. Critical. label Jun 26, 2023
@ekpyron
Copy link
Member

ekpyron commented Jun 26, 2023

But it's also not a code generation error in its defined sense :-). Or do the two versions behave differently as well?

@cameel
Copy link
Member Author

cameel commented Jun 26, 2023

I intepreted codegen error pretty loosely here - basically what it generates is not exactly what it should, even if that stuff works. I was going to discuss on the channel if this is the right label, but that's when I finish reporting two other things I found.

@cameel
Copy link
Member Author

cameel commented Jun 26, 2023

@kuzdogan, this will is probably be quite problematic for source verification. One workaround for this would be to keep multiple binaries and try with each one when verification with one fails. I haven't verified that yet, but I suspect that keeping just two (emscripten and linux) might be enough to cover all bases. But I'll be able to say more when we know the exact cause of this.

@kuzdogan
Copy link
Member

Thanks for the heads up. We don't run the emscripten compiler if it compiles successfully with the native Linux. In this case looks like we should.

The pattern we should look for is this right ?

        "optimizer": {"enabled": false},
        "viaIR": true,

@cameel
Copy link
Member Author

cameel commented Jun 26, 2023

Yes.

@cameel
Copy link
Member Author

cameel commented Jun 26, 2023

Ah, actually "optimizer": {"enabled": false} may not be the only way to trigger this. You probably can do the same with "enabled": true by disabling the Yul optimizer or changing the sequence.

When I figure out the cause, I'll let you know what specifically triggers this and which versions are affected.

@cameel
Copy link
Member Author

cameel commented Jun 28, 2023

The cause turns out be a difference in expression evaluation order between Clang and GCC (which is generally undefined in C++). macOS and Emscripten binaries are built with Clang, while Linux ones are build with GCC.

In the codegen we generate functions as we go through the AST and we often do this inside function calls embedded in expressions. If the expression contains more than one such call, the evaluation order determines which which function we will be generated first. I'm actually surprised that we had only one case of this. This is something we should not be relying upon and we mostly don't, but it's very easy to do this by uintentionally. The only safeguard is the bytecode comparison, which for IR we'll be running only starting with this version.

@kuzdogan So unfortunately --via-ir is potentially affected in all versions before 0.8.21. This particular instance of the problem seems to have been introduced in #7015, way back in in 0.5.11, before the IR became non-experimental. It's even possible that more issues of this kind were introduced and then disappeared in any older version supporting IR - I only did bytecode comparison on the current develop. Anything from before that is suspect.

Given that, I think it's not safe to assume this only ever happens with optimizer disabled. Any optimizer step that reorders functions could potentially be the one responsible for restoring the deterministic order and the sequence is fully customizable by the user. It's also possible that other bugs of this kind existed before 0.8.21. It's best to assume that any --via-ir bytecode from before 0.8.21 is potentially affected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 medium effort Default level of effort medium impact Default level of impact selected for development It's on our short-term development
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants