Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences #14829

Closed
klkvr opened this issue Feb 2, 2024 · 1 comment · Fixed by #14968
Assignees
Labels
bug 🐛 medium effort Default level of effort medium impact Default level of impact selected for development It's on our short-term development
Milestone

Comments

@klkvr
Copy link

klkvr commented Feb 2, 2024

Description

This is probably related to #14494, I am using solc 0.8.23 and still encounter the issue

Steps to Reproduce

Here are two standard JSON input files:
input1.json
input2.json

The only difference between those is a small source file:

"contracts/DummyContract.sol": {
    "content": "contract DummyContract {}"
},

We are interested in the UniversalRouter bytecode. It was obtained by using the following command:

cat input1.json | '/path/to/0.8.23' --standard-json | jq '.contracts."contracts/UniversalRouter.sol".UniversalRouter.evm.bytecode.object' > bytecode1.txt 

Resulted bytecodes:
bytecode1.txt
bytecode2.txt

Bytecodes are expected to be equal, however they have a pretty big diff:

image
@klkvr klkvr added the bug 🐛 label Feb 2, 2024
@cameel
Copy link
Member

cameel commented Feb 5, 2024

Thanks for the report! Unfortunately I can confirm that this is still reproducible on 0.8.24 and indeed looks like a difference in the bytecode coming from the optimizer. Including DummyContract.sol shifts AST IDs and they seem to have impact on the output even though they shouldn't.

It only happens with the --optimize flag. After inspecting the --ir-optimized output I see some IR differences that go beyond just slightly different names and immutable IDs. There seem to be two spots where decisions taken by the optimizer differ.

IR differences

  1. There's one spot where an expression does not get split into subexpressions:

                             let usr$firstWord := calldataload(expr_component_2)
                             let var_token0 := shr(96, usr$firstWord)
    -                        let _36 := 0xffffff
    -                        let _37 := add(expr_component_2, 23)
    -                        let _38 := calldataload(_37)
    -                        let var_token1 := shr(96, _38)
    -                        let expr_4 := fun_computePoolAddress(var_token0, var_token1, and(shr(72, usr$firstWord), _36))
    +                        let _36 := add(expr_component_2, 23)
    +                        let var_token1 := shr(96, calldataload(_36))
    +                        let expr_4 := fun_computePoolAddress(var_token0, var_token1, and(shr(72, usr$firstWord), 0xffffff))
  2. There are two places where a function called fun_swap() gets inlined. Here's the first one, from external dispatch:

                                 default {
                                     if gt(23, expr_component_1) { revert(0, 0) }
    -                                let lengthOut := add(expr_component_1, not(22))
                                     if iszero(lt(expr_component_4, shl(255, 1))) { revert(0, 0) }
    -                                let _41 := negate_int256(expr_component_4)
    -                                if lt(lengthOut, 43)
    -                                {
    -                                    let _42 := mload(_2)
    -                                    mstore(_42, shl(224, 0x3b99b53d))
    -                                    revert(_42, _3)
    -                                }
    -                                let var_token1_1 := shr(96, calldataload(add(expr_component_2, 46)))
    -                                let expr_5 := lt(var_token1_1, var_token1)
    -                                let _43 := and(fun_computePoolAddress(var_token1, var_token1_1, and(shr(72, _38), _36)), _33)
    -                                let expr_6 := 0
    -                                switch expr_5
    -                                case 0 {
    -                                    expr_6 := 0xfffd8963efd1fc6a506488495d951d5263988d25
    -                                }
    -                                default { expr_6 := 0x01000276a4 }
    -                                let expr_mpos := mload(_2)
    -                                mstore(add(expr_mpos, 32), _2)
    -                                let tail := abi_encode_bytes_calldata(_37, lengthOut, add(expr_mpos, 96))
    -                                mstore(add(expr_mpos, _2), _34)
    -                                let _44 := sub(tail, expr_mpos)
    -                                mstore(expr_mpos, add(_44, not(31)))
    -                                finalize_allocation(expr_mpos, _44)
    -                                let _45 := mload(_2)
    -                                mstore(_45, shl(227, 0x02515961))
    -                                mstore(add(_45, _3), caller())
    -                                mstore(add(_45, 36), expr_5)
    -                                mstore(add(_45, 68), _41)
    -                                mstore(add(_45, 100), and(expr_6, _33))
    -                                mstore(add(_45, 132), 160)
    -                                let _46 := call(gas(), _43, 0, _45, sub(abi_encode_bytes(expr_mpos, add(_45, 164)), _45), _45, _2)
    -                                if iszero(_46)
    -                                {
    -                                    let pos_2 := mload(_2)
    -                                    returndatacopy(pos_2, 0, returndatasize())
    -                                    revert(pos_2, returndatasize())
    -                                }
    -                                if _46
    -                                {
    -                                    let _47 := _2
    -                                    if gt(_2, returndatasize()) { _47 := returndatasize() }
    -                                    finalize_allocation(_45, _47)
    -                                    if slt(sub(add(_45, _47), _45), _2) { revert(0, 0) }
    -                                }
    +                                let expr_component_5, expr_component_6, expr_component_7 := fun_swap(negate_int256(expr_component_4), caller(), _36, add(expr_component_1, not(22)), _34)
                                 }

Repro on the CLI

Here's my CLI repro. For the ease of debugging it skips some options that are not relevant and also uses the same input files in both cases, just changing the command.

json_file="input1.json"
for source_unit_name in $(jq --raw-output '.sources | keys[]' "$json_file"); do
    mkdir -p ./"$(dirname "$source_unit_name")"
    jq --raw-output '.sources["'"$source_unit_name"'"].content' "$json_file" > ./"$source_unit_name"
done
jq '.settings' "$json_file" --indent 4
function solc-universal-router {
    solc \
        --via-ir \
        --optimize \
        --no-cbor-metadata \
        --debug-info none \
        "solmate/=lib/solmate/" \
        "permit2/=lib/permit2/" \
        "forge-std/=lib/forge-std/src/" \
        "@openzeppelin/=node_modules/@openzeppelin/" \
        "@uniswap/=node_modules/@uniswap/" \
        "ds-test/=lib/forge-std/lib/ds-test/src/" \
        "$@"
}

function solc-universal-router-bin {
    solc-universal-router --bin "$@" |
        grep '======= contracts/UniversalRouter.sol:UniversalRouter =======' --after-context=2
}

diff --color --unified --report-identical-files \
    <(solc-universal-router-bin contracts/UniversalRouter.sol contracts/DummyContract.sol | fold --width 100) \
    <(solc-universal-router-bin contracts/UniversalRouter.sol                             | fold --width 100)

diff --color --unified --report-identical-files \
    <(solc-universal-router --ir-optimized contracts/UniversalRouter.sol contracts/DummyContract.sol) \
    <(solc-universal-router --ir-optimized contracts/UniversalRouter.sol)

@cameel cameel added selected for development It's on our short-term development medium effort Default level of effort medium impact Default level of impact labels Feb 5, 2024
@cameel cameel changed the title Different bytecode depending on sources in scope (0.8.23) Shift in AST IDs affects inlining and expression splitting decisions in Yul Optimizer, causing bytecode differences Feb 5, 2024
@cameel cameel added this to the 0.8.25 milestone Feb 5, 2024
@nikola-matic nikola-matic self-assigned this Feb 7, 2024
@cameel cameel modified the milestones: 0.8.25, 0.8.26 Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 medium effort Default level of effort medium impact Default level of impact selected for development It's on our short-term development
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants