Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences #14829

klkvr · 2024-02-02T00:26:23Z

Description

This is probably related to #14494, I am using solc 0.8.23 and still encounter the issue

Steps to Reproduce

Here are two standard JSON input files:
input1.json
input2.json

The only difference between those is a small source file:

"contracts/DummyContract.sol": {
    "content": "contract DummyContract {}"
},

We are interested in the UniversalRouter bytecode. It was obtained by using the following command:

cat input1.json | '/path/to/0.8.23' --standard-json | jq '.contracts."contracts/UniversalRouter.sol".UniversalRouter.evm.bytecode.object' > bytecode1.txt

Resulted bytecodes:
bytecode1.txt
bytecode2.txt

Bytecodes are expected to be equal, however they have a pretty big diff:

The text was updated successfully, but these errors were encountered:

cameel · 2024-02-05T18:36:40Z

Thanks for the report! Unfortunately I can confirm that this is still reproducible on 0.8.24 and indeed looks like a difference in the bytecode coming from the optimizer. Including DummyContract.sol shifts AST IDs and they seem to have impact on the output even though they shouldn't.

It only happens with the --optimize flag. After inspecting the --ir-optimized output I see some IR differences that go beyond just slightly different names and immutable IDs. There seem to be two spots where decisions taken by the optimizer differ.

IR differences

There's one spot where an expression does not get split into subexpressions:

                         let usr$firstWord := calldataload(expr_component_2)
                         let var_token0 := shr(96, usr$firstWord)
-                        let _36 := 0xffffff
-                        let _37 := add(expr_component_2, 23)
-                        let _38 := calldataload(_37)
-                        let var_token1 := shr(96, _38)
-                        let expr_4 := fun_computePoolAddress(var_token0, var_token1, and(shr(72, usr$firstWord), _36))
+                        let _36 := add(expr_component_2, 23)
+                        let var_token1 := shr(96, calldataload(_36))
+                        let expr_4 := fun_computePoolAddress(var_token0, var_token1, and(shr(72, usr$firstWord), 0xffffff))

There are two places where a function called fun_swap() gets inlined. Here's the first one, from external dispatch:

                             default {
                                 if gt(23, expr_component_1) { revert(0, 0) }
-                                let lengthOut := add(expr_component_1, not(22))
                                 if iszero(lt(expr_component_4, shl(255, 1))) { revert(0, 0) }
-                                let _41 := negate_int256(expr_component_4)
-                                if lt(lengthOut, 43)
-                                {
-                                    let _42 := mload(_2)
-                                    mstore(_42, shl(224, 0x3b99b53d))
-                                    revert(_42, _3)
-                                }
-                                let var_token1_1 := shr(96, calldataload(add(expr_component_2, 46)))
-                                let expr_5 := lt(var_token1_1, var_token1)
-                                let _43 := and(fun_computePoolAddress(var_token1, var_token1_1, and(shr(72, _38), _36)), _33)
-                                let expr_6 := 0
-                                switch expr_5
-                                case 0 {
-                                    expr_6 := 0xfffd8963efd1fc6a506488495d951d5263988d25
-                                }
-                                default { expr_6 := 0x01000276a4 }
-                                let expr_mpos := mload(_2)
-                                mstore(add(expr_mpos, 32), _2)
-                                let tail := abi_encode_bytes_calldata(_37, lengthOut, add(expr_mpos, 96))
-                                mstore(add(expr_mpos, _2), _34)
-                                let _44 := sub(tail, expr_mpos)
-                                mstore(expr_mpos, add(_44, not(31)))
-                                finalize_allocation(expr_mpos, _44)
-                                let _45 := mload(_2)
-                                mstore(_45, shl(227, 0x02515961))
-                                mstore(add(_45, _3), caller())
-                                mstore(add(_45, 36), expr_5)
-                                mstore(add(_45, 68), _41)
-                                mstore(add(_45, 100), and(expr_6, _33))
-                                mstore(add(_45, 132), 160)
-                                let _46 := call(gas(), _43, 0, _45, sub(abi_encode_bytes(expr_mpos, add(_45, 164)), _45), _45, _2)
-                                if iszero(_46)
-                                {
-                                    let pos_2 := mload(_2)
-                                    returndatacopy(pos_2, 0, returndatasize())
-                                    revert(pos_2, returndatasize())
-                                }
-                                if _46
-                                {
-                                    let _47 := _2
-                                    if gt(_2, returndatasize()) { _47 := returndatasize() }
-                                    finalize_allocation(_45, _47)
-                                    if slt(sub(add(_45, _47), _45), _2) { revert(0, 0) }
-                                }
+                                let expr_component_5, expr_component_6, expr_component_7 := fun_swap(negate_int256(expr_component_4), caller(), _36, add(expr_component_1, not(22)), _34)
                             }

Repro on the CLI

Here's my CLI repro. For the ease of debugging it skips some options that are not relevant and also uses the same input files in both cases, just changing the command.

json_file="input1.json"
for source_unit_name in $(jq --raw-output '.sources | keys[]' "$json_file"); do
    mkdir -p ./"$(dirname "$source_unit_name")"
    jq --raw-output '.sources["'"$source_unit_name"'"].content' "$json_file" > ./"$source_unit_name"
done
jq '.settings' "$json_file" --indent 4

function solc-universal-router {
    solc \
        --via-ir \
        --optimize \
        --no-cbor-metadata \
        --debug-info none \
        "solmate/=lib/solmate/" \
        "permit2/=lib/permit2/" \
        "forge-std/=lib/forge-std/src/" \
        "@openzeppelin/=node_modules/@openzeppelin/" \
        "@uniswap/=node_modules/@uniswap/" \
        "ds-test/=lib/forge-std/lib/ds-test/src/" \
        "$@"
}

function solc-universal-router-bin {
    solc-universal-router --bin "$@" |
        grep '======= contracts/UniversalRouter.sol:UniversalRouter =======' --after-context=2
}

diff --color --unified --report-identical-files \
    <(solc-universal-router-bin contracts/UniversalRouter.sol contracts/DummyContract.sol | fold --width 100) \
    <(solc-universal-router-bin contracts/UniversalRouter.sol                             | fold --width 100)

diff --color --unified --report-identical-files \
    <(solc-universal-router --ir-optimized contracts/UniversalRouter.sol contracts/DummyContract.sol) \
    <(solc-universal-router --ir-optimized contracts/UniversalRouter.sol)

klkvr added the bug 🐛 label Feb 2, 2024

cameel added selected for development It's on our short-term development medium effort Default level of effort medium impact Default level of impact labels Feb 5, 2024

cameel changed the title ~~Different bytecode depending on sources in scope (0.8.23)~~ Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences Feb 5, 2024

cameel added this to the 0.8.25 milestone Feb 5, 2024

nikola-matic self-assigned this Feb 7, 2024

cameel modified the milestones: 0.8.25, 0.8.26 Mar 13, 2024

nikola-matic mentioned this issue Mar 27, 2024

Bytecode difference caused by SSA transform #14968

Merged

nikola-matic closed this as completed in #14968 Apr 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences #14829

Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences #14829

klkvr commented Feb 2, 2024

cameel commented Feb 5, 2024

Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences #14829

Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences #14829

Comments

klkvr commented Feb 2, 2024

Description

Steps to Reproduce

cameel commented Feb 5, 2024

IR differences

Repro on the CLI