Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[seqbench] the-good-parts-mk2 sequence #14928

Closed
wants to merge 3 commits into from

Conversation

cameel
Copy link
Member

@cameel cameel commented Mar 8, 2024

Related to #14406

Do not merge. This is just a test PR to evaluate how the the-good-parts-mk2 sequence would affect costs in CI.

A small tweak of the-good-parts sequence (#14909), which makes it deal better with the erc20.sol contract.

The difference is the addition of the gvif part before the final segment that makes code "short and pretty".

…and some extra refinements

- `the-good-parts` variant found using seqbench.
- second refinement, based on the erc20.sol contract.
@cameel cameel self-assigned this Mar 8, 2024
@cameel
Copy link
Member Author

cameel commented Mar 8, 2024

gas_diff_stats

File name IR optimized Legacy optimized Legacy
array/create_memory_array.sol 9%
array/array_storage_push_empty_length_address.sol 8%
externalContracts/base64.sol 8% -0%
array/array_storage_index_boundary_test.sol 7%
array/delete/bytes_delete_element.sol 7% -0%
array/copying/cleanup_during_multi_element_per_slot_copy.sol 6%
array/copying/bytes_storage_to_storage.sol 5% -0%
array/copying/storage_memory_packed_dyn.sol 5%
array/byte_array_transitional_2.sol 5%
array/copying/array_copy_cleanup_uint40.sol 5%
abiEncoderV2/calldata_array.sol 3%
array/array_storage_push_pop.sol 3%
inheritance/inherited_function_calldata_memory_interface.sol 3%
abiEncoderV2/abi_encode_calldata_slice.sol 3% -0%
abiEncoderV1/abi_encode_calldata_slice.sol 3% -0%
array/array_storage_push_empty.sol 2%
array/array_storage_index_zeroed_test.sol 2%
isoltestTesting/balance_other_contract.sol 2%
externalContracts/prbmath_unsigned.sol 2%
array/dynamic_arrays_in_storage.sol 2%
externalContracts/FixedFeeRegistrar.sol 1% +0%
various/erc20.sol 1% +0%
events/event_indexed_string.sol 1% -0%
abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol 1% -0%
array/fixed_arrays_as_return_type.sol 1% -0%
array/push/byte_array_push_transition.sol 1%
various/selfdestruct_pre_cancun_multiple_beneficiaries.sol 1%
userDefinedValueType/erc20.sol 1% +0%
array/copying/array_of_struct_calldata_to_storage.sol 1%
array/copying/array_copy_clear_storage.sol 1%
array/copying/memory_dyn_2d_bytes_to_storage.sol 1% -0%
events/event_dynamic_array_storage_v2.sol 1% +0%
events/event_dynamic_array_storage.sol 1% +0%
array/array_storage_index_access.sol 1%
externalContracts/prbmath_signed.sol 1%
constructor/bytes_in_constructors_packer.sol +0% +0%
various/value_complex.sol 1%
various/value_insane.sol +0%
salted_create/salted_create_with_value.sol +0%
array/push/array_push.sol +0%
structs/memory_structs_nested_load.sol +0%
externalContracts/snark.sol +0%
byte_array_to_storage_cleanup.sol 1% -0%
array/copying/array_of_struct_memory_to_storage.sol +0%
array/pop/byte_array_pop_long_storage_empty.sol +0%
array/copying/storage_memory_nested_from_pointer.sol +0%
array/copying/storage_memory_nested.sol +0%
various/selfdestruct_pre_cancun_redeploy.sol +0%
array/copying/array_copy_storage_storage_different_base.sol +0%
array/copying/copy_byte_array_to_storage.sol +0% -0%
array/push/push_no_args_bytes.sol +0%
structs/struct_memory_to_storage_function_ptr.sol +0%
array/copying/array_copy_storage_storage_dynamic_dynamic.sol +0%
structs/struct_delete_storage_with_array.sol +0%
array/push/nested_bytes_push.sol +0% -0%
array/constant_var_as_array_length.sol +0% +0%
array/copying/array_of_structs_containing_arrays_calldata_to_storage.sol +0%
array/dynamic_multi_array_cleanup.sol +0%
structs/struct_copy_via_local.sol +0%
array/push/array_push_struct.sol +0%
abiEncoderV1/struct/struct_storage_ptr.sol +0%
libraries/using_library_mappings_public.sol +0%
array/fixed_arrays_in_constructors.sol +0%
array/copying/array_copy_different_packing.sol +0%
constructor/constructor_arguments_external.sol +0%
abiEncoderV2/abi_encode_v2.sol +0% +0%
array/copying/nested_array_element_storage_to_storage.sol +0%
viaYul/copy_struct_invalid_ir_bug.sol +0%
userDefinedValueType/calldata.sol +0% -0%
array/copying/array_copy_storage_to_memory_nested.sol +0% +0%
storage/packed_storage_structs_bytes.sol +0%
array/copying/calldata_array_dynamic_to_storage.sol +0% -0%
array/dynamic_array_cleanup.sol +0%
inheritance/value_for_constructor.sol +0%
array/copying/array_nested_calldata_to_storage.sol +0%
immutable/use_scratch.sol +0%
constructor/constructor_static_array_argument.sol +0% -0%
events/event_dynamic_nested_array_storage_v2.sol +0% +0%
array/push/array_push_struct_from_calldata.sol +0% -0%
various/destructuring_assignment.sol +0% -0%
array/copying/array_copy_target_simple.sol +0%
array/copying/array_copy_including_array.sol +0%
functionCall/creation_function_call_with_salt.sol +0%
array/copying/array_copy_storage_storage_different_base_nested.sol +0%
array/copying/storage_memory_nested_struct.sol +0%
array/copying/nested_array_of_structs_calldata_to_storage.sol +0%
array/copying/elements_of_nested_array_of_structs_calldata_to_storage.sol +0%
structs/copy_struct_array_from_storage.sol +0%
functionCall/creation_function_call_with_args.sol +0%
array/copying/array_nested_memory_to_storage.sol +0%
types/mapping/copy_from_mapping_to_mapping.sol +0% +0%
array/copying/nested_dynamic_array_element_calldata_to_storage.sol +0%
array/fixed_array_cleanup.sol +0%
array/copying/nested_array_of_structs_memory_to_storage.sol +0%
array/copying/array_copy_storage_storage_struct.sol +0%
array/copying/array_copy_storage_storage_static_dynamic.sol +0%
abiencodedecode/abi_decode_simple_storage.sol +0% -0%
structs/calldata/calldata_struct_with_nested_array_to_storage.sol +0% +0%
array/bytes_length_member.sol +0%
array/reusing_memory.sol +0%
array/pop/array_pop_array_transition.sol +0%
array/push/array_push_nested_from_calldata.sol +0% -0%
array/copying/nested_array_of_structs_storage_to_storage.sol +0%
array/copying/array_of_structs_containing_arrays_memory_to_storage.sol +0%
abiEncoderV1/abi_decode_v2_storage.sol +0% +0%
array/push/push_no_args_2d.sol +0%
array/copying/array_storage_multi_items_per_slot.sol +0%
structs/struct_copy.sol +0%
abiEncoderV2/calldata_overlapped_dynamic_arrays.sol +0% -0%
array/copying/copy_function_internal_storage_array.sol +0%
array/copying/array_copy_storage_storage_dyn_dyn.sol +0%
various/staticcall_for_view_and_pure.sol -0%
array/array_storage_length_access.sol -0%
array/copying/copy_byte_array_in_struct_to_storage.sol +0% -0%
calldata/copy_from_calldata_removes_bytes_data.sol -0%
array/copying/array_copy_storage_storage_static_static.sol -0%
array/copying/bytes_inside_mappings.sol -0%
array/invalid_encoding_for_storage_byte_array.sol -0% -0%
array/copying/array_of_function_external_storage_to_storage_dynamic_different_mutability.sol -0% +0%
constructor/no_callvalue_check.sol -0%
array/copying/array_copy_nested_array.sol -0% -0%
array/copying/elements_of_nested_array_of_structs_memory_to_storage.sol -0%
array/array_memory_index_access.sol -0%
array/copying/copy_removes_bytes_data.sol -0%
various/many_subassemblies.sol -0%
immutable/multi_creation.sol -0%
array/copying/calldata_array_to_mapping.sol -0%
array/copying/array_to_mapping.sol -0% +0%
array/copying/array_of_function_external_storage_to_storage_dynamic.sol -0% -0%
structs/struct_containing_bytes_copy_and_delete.sol -0% -0%
array/copying/array_copy_target_simple_2.sol -0%
abiEncoderV2/storage_array_encoding.sol -0% -0%
structs/copy_substructures_from_mapping.sol -0% -0%
array/copying/array_elements_to_mapping.sol -0% +0%
array/copying/function_type_array_to_storage.sol -0% -0%
various/skip_dynamic_types_for_structs.sol -0% -0%
storage/empty_nonempty_empty.sol -0% -0%
structs/copy_to_mapping.sol -0% -0%
structs/copy_from_mapping.sol -0% -0%
array/pop/byte_array_pop_long_storage_empty_garbage_ref.sol -0%
array/copying/array_copy_calldata_storage.sol -0% -0%
array/copying/storage_memory_nested_bytes.sol -0% +0%
array/pop/array_pop_uint16_transition.sol -0%
array/copying/arrays_from_and_to_storage.sol -0%
functionCall/external_call_to_nonexisting_debugstrings.sol -0% +0%
array/pop/array_pop_uint24_transition.sol -0%
functionCall/gas_and_value_brace_syntax.sol -0%
functionCall/gas_and_value_basic.sol -0%
structs/copy_substructures_to_mapping.sol -0% -0%
libraries/internal_types_in_library.sol -0%
constructor/arrays_in_constructors.sol -1% +0%
structs/structs.sol -0%
functionCall/mapping_array_internal_argument.sol -0%
array/pop/byte_array_pop_masking_long.sol -0% -0%
array/copying/array_copy_target_leftover.sol -0% -0%
structs/conversion/recursive_storage_memory.sol -0%
externalContracts/strings.sol -0% +0%
libraries/using_library_mappings_return.sol -1%
array/arrays_complex_from_and_to_storage.sol -1% +0%
functionCall/external_call_to_nonexisting.sol -1% +0%
constructor/bytes_in_constructors_unpacker.sol -1% -0%
externalContracts/deposit_contract.sol -1% +0%
various/address_code.sol -1% -0%
array/function_array_cross_calls.sol -2% 1%
various/create_calldata.sol -1% -0%
externalContracts/ramanujan_pi.sol -6%

Changes compared to the same table for the-good-parts

-| `various/erc20.sol`                                   | 4%  | +0% |   |
+| `various/erc20.sol`                                   | 1%  | +0% |   |
-| `userDefinedValueType/erc20.sol`                      | 3%  | +0% |   |
+| `userDefinedValueType/erc20.sol`                      | 1%  | +0% |   |
-| `byte_array_to_storage_cleanup.sol`                   | 3%  | -0% |   |
+| `byte_array_to_storage_cleanup.sol`                   | 1%  | -0% |   |
-| `array/push/push_no_args_bytes.sol`                   | 3%  |     |   |
+| `array/push/push_no_args_bytes.sol`                   | +0% |     |   |
-| `externalContracts/FixedFeeRegistrar.sol`             | 2%  | +0% |   |
+| `externalContracts/FixedFeeRegistrar.sol`             | 1%  | +0% |   |
-| `salted_create/salted_create_with_value.sol`          | 1%  |     |   |
+| `salted_create/salted_create_with_value.sol`          | +0% |     |   |
-| `structs/conversion/recursive_storage_memory.sol`     | 1%  |     |   |
+| `structs/conversion/recursive_storage_memory.sol`     | -0% |     |   |
-| `structs/memory_structs_nested_load.sol`              | 1%  |     |   |
+| `structs/memory_structs_nested_load.sol`              | +0% |     |   |
-| `array/copying/array_of_struct_memory_to_storage.sol` | 1%  |     |   |
+| `array/copying/array_of_struct_memory_to_storage.sol` | +0% |     |   |
-| `array/pop/array_pop_uint16_transition.sol`           | 1%  |     |   |
+| `array/pop/array_pop_uint16_transition.sol`           | -0% |     |   |
-| `array/pop/array_pop_uint24_transition.sol`           | 1%  |     |   |
+| `array/pop/array_pop_uint24_transition.sol`           | -0% |     |   |
-| `array/copying/array_elements_to_mapping.sol`         | +0% | +0% |   |
+| `array/copying/array_elements_to_mapping.sol`         | -0% | +0% |   |
-| `array/copying/function_type_array_to_storage.sol`    | +0% | -0% |   |
+| `array/copying/function_type_array_to_storage.sol`    | -0% | -0% |   |
-| `externalContracts/snark.sol`                         | 1%  |     |   |
+| `externalContracts/snark.sol`                         | +0% |     |   |
-| `libraries/using_library_mappings_return.sol`         | -0% |     |   |
+| `libraries/using_library_mappings_return.sol`         | -1% |     |   |
-| `immutable/multi_creation.sol`                        | +0% |     |   |
+| `immutable/multi_creation.sol`                        | -0% |     |   |

@cameel
Copy link
Member Author

cameel commented Mar 8, 2024

External test benchmark diff

CI run on develop vs CI run on seqbench-sequence-the-good-parts-mk2

ir-no-optimize

project bytecode_size deployment_gas method_gas
bleeps
brink 0%
chainlink
colony 0%
elementfi 0%
ens 0%
euler
gnosis
gp2 0%
perpetual-pools 0% +0% +0.01% ❌
pool-together 0%
trident 0%
uniswap 0% -0% -0%
yield_liquidator 0% -0% 0%
zeppelin

ir-optimize-evm+yul

project bytecode_size deployment_gas method_gas
bleeps
brink +0.39% ❌
chainlink -2.4% ✅
colony +0.13% ❌
elementfi -1.65% ✅
ens -0.16% ✅ -3.08% ✅ -0.01% ✅
euler +1.18% ❌
gnosis
gp2 +0.49% ❌
perpetual-pools +0.08% ❌ -0.22% ✅ +0.13% ❌
pool-together -1.06% ✅
trident +0.27% ❌
uniswap +1.65% ❌ +1.66% ❌ +2.04% ❌
yield_liquidator +0.92% ❌ -0.1% ✅ +0.04% ❌
zeppelin -0.38% ✅ -1.18% ✅ +0.01% ❌

ir-optimize-evm-only

project bytecode_size deployment_gas method_gas
bleeps
brink 0%
chainlink
colony 0%
elementfi 0%
ens 0% -0% 0%
euler
gnosis
gp2 0%
perpetual-pools 0% +0% -0.01% ✅
pool-together 0%
trident 0%
uniswap 0% -0% +0%
yield_liquidator 0% +0% -0%
zeppelin 0%

legacy-no-optimize

project bytecode_size deployment_gas method_gas
bleeps
brink 0%
chainlink 0%
colony 0%
elementfi 0%
ens 0%
euler 0%
gnosis 0%
gp2 0%
perpetual-pools 0% +0% -0.02% ✅
pool-together 0%
trident 0%
uniswap 0% -0% -0%
yield_liquidator 0% -0% 0%
zeppelin 0% -0% -0.09% ✅

legacy-optimize-evm+yul

project bytecode_size deployment_gas method_gas
bleeps +0.13% ❌
brink -0.03% ✅
chainlink +0.12% ❌
colony +0.13% ❌
elementfi +0.09% ❌
ens +0.18% ❌ -0.02% ✅ -0%
euler +0.14% ❌
gnosis +0.16% ❌
gp2 +0.27% ❌
perpetual-pools -0.03% ✅ +0% +0.03% ❌
pool-together +0.03% ❌
trident +0.1% ❌
uniswap +0.04% ❌ +0.03% ❌ +0%
yield_liquidator +0.04% ❌ +0.02% ❌ -0.01% ✅
zeppelin +0.15% ❌ +0.17% ❌ -0.06% ✅

legacy-optimize-evm-only

project bytecode_size deployment_gas method_gas
bleeps
brink 0%
chainlink 0%
colony 0%
elementfi 0%
ens 0% 0% 0%
euler 0%
gnosis 0%
gp2 0%
perpetual-pools 0% -0% +0%
pool-together 0%
trident 0%
uniswap 0% +0% +0%
yield_liquidator 0% -0% 0%
zeppelin 0% -0% +0.02% ❌

!V = version mismatch
!B = no value in the "before" version
!A = no value in the "after" version
!T = one or both values were not numeric and could not be compared
-0 = very small negative value rounded to zero
+0 = very small positive value rounded to zero

@cameel
Copy link
Member Author

cameel commented Mar 8, 2024

External test comparison with the-good-parts.

Interestingly, there is only one test where the new sequence performs worse (perpetual-pools) in almost all others it does the same or better, with just a few being neutral.

the-good-parts -> the-good-parts-mk2 (ir-optimize-evm+yul diff)

-|            brink | +0.73% ❌ |           |           |
+|            brink | +0.39% ❌ |           |           |

-|        chainlink | -1.83% ✅ |           |           |
+|        chainlink |  -2.4% ✅ |           |           |

-|        elementfi | -1.33% ✅ |           |           |
+|        elementfi | -1.65% ✅ |           |           |

-|              ens | +0.39% ❌ | -2.77% ✅ | +0.02% ❌ |
+|              ens | -0.16% ✅ | -3.08% ✅ | -0.01% ✅ |

-|            euler | +1.29% ❌ |           |           |
+|            euler | +1.18% ❌ |           |           |

-|              gp2 | +0.86% ❌ |           |           |
+|              gp2 | +0.49% ❌ |           |           |

-|  perpetual-pools | -0.39% ✅ | -0.78% ✅ | +0.32% ❌ |
+|  perpetual-pools | +0.08% ❌ | -0.22% ✅ | +0.13% ❌ |

-|    pool-together | -0.83% ✅ |           |           |
+|    pool-together | -1.06% ✅ |           |           |

-|          trident |  +0.4% ❌ |           |           |
+|          trident | +0.27% ❌ |           |           |

-|          uniswap | +2.06% ❌ |  +2.2% ❌ | +2.13% ❌ |
+|          uniswap | +1.65% ❌ | +1.66% ❌ | +2.04% ❌ |

-| yield_liquidator |  +1.3% ❌ | +0.34% ❌ | +0.17% ❌ |
+| yield_liquidator | +0.92% ❌ |  -0.1% ✅ | +0.04% ❌ |

-|         zeppelin |  -0.2% ✅ | -1.16% ✅ | +0.12% ❌ |
+|         zeppelin | -0.38% ✅ | -1.18% ✅ | +0.01% ❌ |

the-good-parts -> the-good-parts-mk2 (legacy-optimize-evm+yul diff)

+|        chainlink | +0.12% ❌ |           |           |
-|        chainlink | +0.13% ❌ |           |           |

-|        elementfi | +0.12% ❌ |           |           |
+|        elementfi | +0.09% ❌ |           |           |

-|            euler | +0.17% ❌ |           |           |
+|            euler | +0.14% ❌ |           |           |

-|           gnosis |  +0.2% ❌ |           |           |
+|           gnosis | +0.16% ❌ |           |           |

-|  perpetual-pools | -0.03% ✅ |       +0% | -0.01% ✅ |
+|  perpetual-pools | -0.03% ✅ |       +0% | +0.03% ❌ |

-|    pool-together | +0.05% ❌ |           |           |
+|    pool-together | +0.03% ❌ |           |           |

-| yield_liquidator | +0.09% ❌ | +0.06% ❌ | -0.01% ✅ |
+| yield_liquidator | +0.04% ❌ | +0.02% ❌ | -0.01% ✅ |

-|         zeppelin | +0.15% ❌ | +0.17% ❌ | -0.02% ✅ |
+|         zeppelin | +0.15% ❌ | +0.17% ❌ | -0.06% ✅ |

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Mar 23, 2024
@cameel cameel removed the stale The issue/PR was marked as stale because it has been open for too long. label Mar 25, 2024
@ethereum ethereum deleted a comment from github-actions bot Mar 25, 2024
@cameel

This comment was marked as duplicate.

@cameel
Copy link
Member Author

cameel commented Mar 26, 2024

Hmm... after trying it out, it does not seem to affect as many cases as I expected. Basically devcon_example.yul and two others. Still, does not make anything worse either so probably does not hurt to include it in the cleanup.

Comment on lines +22 to +24
// f_63()
// f()
// f()
// f()
// f_65()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a regression compared to the default sequence. UnusedFunctionParameterPruner (p) does not remove the obviously unnecessary parameter so the function gets specialized. It happens because when we run p, the body contains an (unused) assignment to the parameter:

        function f(a) -> x {
            (...)
            for { } true { (...) } {
                (...)
                a := a_1
                mstore(a_1, 0)
                (...)
            }
        }

The assignment is created by the SSA but not removed by UnusedAssignEliminator (r) that runs afterwards because the variable is used as initialization for some (unused) variables at that point:

        function f(a) -> x {
            (...)
            for { } true { (...) let a_2 := a (...) } {
                (...)
                let a_3 := a
                (...)
                a := a_4
                (...)
            }
        }

This can be solved by either replacing the [r] loop after the SSA with [ru] or by running r before p.

I think that the default sequence does not have this problem due to the presence of the simplification segment, which includes u and runs before we reverse SSA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun

Tweak

-       "Tpeul"                        // Run functional expression inliner
+       "Trpeul"                       // Run functional expression inliner

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-r-before-p vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
elementfi -0.03% ✅
ens -0.01% ✅ 0% 0%
euler -0.07% ✅
uniswap +0.35% ❌
yield_liquidator -0.02% ✅ -0.03% ✅ -0%
zeppelin -0.01% ✅ -0.01% ✅ +0.04% ❌

Gas diff stats

No differences over +/-0.5%.

// }
// function gcd(_a, _b) -> out
// {
// switch _b
// case 0 { out := _a }
// default { out := gcd(_b, mod(_a, _b)) }
// }
// function foo_singlereturn()
// { extcodecopy(1, msize(), 1, 1) }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks odd. Is the test buggy? The name of foo_singlereturn() suggests it should return a single value and the function technically does have a single out return, but that's never assigned to and gets optimized out. Hard to tell if if was intentional or not.

In any case, the main thing that changes here is that the function does not get inlined. Note that it wasn't being inlined ether back in 0.5.11 when the test was added (0c0b5a0), so not sure if I should classify it as a regression.

Comment on lines +14 to 18
// let x := 0
// switch mload(0)
// case 0 { }
// case 1 { }
// case 0 { x := 0 }
// case 1 { x := 1 }
// default { invalid() }
Copy link
Member Author

@cameel cameel Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one also seems to be the fault of the removed simplification segment. Adding an extra u solves both this and the unused parameter pruner regression (#14928 (comment)).

Comment on lines -26 to +32
// sstore(0, add(mload(0x40), 128))
// sstore(1, 0x20)
// let p := mload(0x40)
// mstore(0x40, add(p, 96))
// let _1 := add(p, 128)
// mstore(_1, 2)
// mstore(0x40, 0x20)
// sstore(0, _1)
// sstore(1, mload(0x40))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is caused by LoadResolver (L) not being able to resolve mload(0x40) when the unused mstore(0x80, 0x40) stands between it and mstore(0x40, 0x20).

In the default sequence the xa[r]cL segment can deal with it because it runs after the structural simplification segment, which includes StructuralSimplifier (t) and UnusedStoreEliminator (S). The new sequence instead encounters this:

            if 1
            {
                if 1 { mstore(0x40, 0x20) }
                for { } true { }
                {
                    if 1 { break }
                    mstore(0x80, 0x40)
                }
            }
            sstore(0, _1)
            sstore(1, mload(0x40))

Replacing xa[r]cL with txa[r]cSL fixes the problem and mload(0x40) gets resolved into 0x20.
Note that t here is ineffective if we run it on SSA form while the opposite is true for S.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly, while the change helps with this test and also minimally decreases gas usage in several other tests, there are also a few where it makes things a little worse.

For example for functionCall/external_call_to_nonexisting.sol bytecode size seems to grow enough to increase the code deposit cost by almost 2%:

-// gas irOptimized: 89315
-// gas irOptimized code: 169800
+// gas irOptimized: 89578
+// gas irOptimized code: 173000

So my proposed fix has mixed effects. I'll have to see the effect on external tests to see if it's actually worth it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun

Tweak

-       "xa[r]cL"                      // Turn into SSA again and simplify
+       "txa[r]cSL"                    // Turn into SSA again and simplify

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-t-and-c-before-load-resolver vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
elementfi -0.09% ✅
ens +0.25% ❌ +1.42% ❌ -0.02% ✅
euler +0.05% ❌
gp2 +0.08% ❌
perpetual-pools !A !A !A
pool-together +0.05% ❌
uniswap +0.03% ❌
yield_liquidator +0.03% ❌ +0.01% ❌ -0%
zeppelin +0.06% ❌ +0.1% ❌ +0.22% ❌

For perpetual-pools compilation fails with YulException: Variable _16 is 1 too deep in the stack. (memoryguard was present).

Gas diff stats

File name IR optimized Legacy optimized Legacy
functionCall/external_call_to_nonexisting.sol 1%
inlineAssembly/transient_storage_selfdestruct.sol 1%
functionCall/external_call_to_nonexisting_debugstrings.sol 1%
functionCall/gas_and_value_brace_syntax.sol 1%
functionCall/gas_and_value_basic.sol 1%

Comment on lines -15 to 21
// let _1 := iszero(caller())
// for { }
// true
// {
// let _1 := iszero(caller())
// for { } iszero(_1) { }
// { }
// mstore(192, 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this tests still tests what it's supposed to test - the comment says that the expression containing caller() is not supposed to be movable but the expectations clearly expect it to be moved.

Still, regardless of that the version with caller() outside of both loops seems both more optimal and correct so this is a regression. It happens because the new sequence runs LoopInvariantCodeMotion (M) only once and it happens before inlining. In the default sequence this is not a problem because it has the big loop.

Appending M after the inlining segment (gvif) fixes the problem. Perhaps this step should always follow inlining.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun

Tweak

-       "gvif"                         // Run full inliner
+       "gvifM"                        // Run full inliner
        "CTUca[r]LSsTFOtfDnca[r]Iulc"  // SSA plus simplify

        "scCTUt"
-       "gvif"                         // Run full inliner
+       "gvifM"                        // Run full inliner
        "[scCTUt] TOntnfDIul"          // Perform structural simplification
-       "gvif"                         // Run full inliner
+       "gvifM"                        // Run full inliner

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-M-after-inlining vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
ens +0.01% ❌ -0% 0%
gp2 +0.03% ❌
perpetual-pools +0.04% ❌ +0.04% ❌ -0%
pool-together -0.04% ✅
uniswap +0.01% ❌
zeppelin -0% -0.01% ✅ +0.02% ❌

Gas diff stats

File name IR optimized Legacy optimized Legacy
array/byte_array_transitional_2.sol -4%
array/copying/bytes_storage_to_storage.sol -6%

Comment on lines 16 to +17
// let x := calldataload(0)
// sstore(shr(248, x), and(shr(4, x), 3855))
// sstore(shr(248, x), and(and(shr(4, x), sub(shl(244, 1), 1)), 3855))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is yet another test that seems to have broken expectations. The comment says that we're checking that expression simplifier does not modify the unsplit expressions, but expectations are in a simplified form.

This is probably still a regression though. The new expectation seems to be simplified too, but not fully. it's transformed to something weird.

Copy link
Member Author

@cameel cameel Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking into it, I see it's just that it's not fully optimized with the new sequence. ExpressionSimplifier (s) in the default sequence eventually figures out that part of this expression is an application of two constant masks ((4 >> x) & 0x01e7 & 0xf0f) and one of them can be safely dropped because only the three lowest bits matter and those are identical.

The simplification does not take place though if the expression is not split. In the default sequence that happens eventually due to repetition, in the new one it does not. I found that adding ExpressionSplitter (x) resolves this. Either just x before [scCTUt] or xsu somewhere later in the sequence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun

Tweak

-       "[scCTUt] TOntnfDIul"          // Perform structural simplification
+       "x[scCTUt] TOntnfDIul"         // Perform structural simplification

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-x-before-s vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
brink +0.06% ❌
elementfi -0.1% ✅
ens -0.26% ✅ 0% +0%
euler -0.13% ✅
gp2 -0.04% ✅
perpetual-pools -0.33% ✅ -0.33% ✅ -0.11% ✅
pool-together -0.06% ✅
uniswap -0.33% ✅
yield_liquidator -0.07% ✅ -0.08% ✅ +0.1% ❌
zeppelin -0.08% ✅ -0.07% ✅ +0.06% ❌

Gas diff stats

File name IR optimized Legacy optimized Legacy
externalContracts/base64.sol -1%
array/array_storage_push_pop.sol -1%
array/function_array_cross_calls.sol -1%
array/push/byte_array_push_transition.sol -1%
array/pop/byte_array_pop_long_storage_empty.sol -1%
array/byte_array_transitional_2.sol -1%

Comment on lines 19 to +21
// ----
// f() -> "A", 8, 4, "B"
// gas irOptimized: 125822
// gas irOptimized: 136664
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this one the default sequence seems to win mostly due to efficient use of FunctionSpecializer (F).

The heaviest parts of the code are the two allocations of 2D arrays, which turn into loops in Yul. For example the first one looks like this with the default sequence:

                        for { } lt(i, 9600) { i := add(i, _5) }
                        {
                            let memPtr_1 := allocate_memory()
                            calldatacopy(memPtr_1, calldatasize(), _4)
                            mstore(add(add(memPtr, i), _5), memPtr_1)
                        }

and like this with the new sequence:

                        for { } lt(i, _3) { i := add(i, 32) }
                        {
                            let size_2 := 0
                            let _4 := 0
                            _4 := 0
                            size_2 := 64
                            let memPtr_2 := allocate_memory(size_2)
                            let size_3 := 0
                            let _5 := 0
                            _5 := 0
                            size_3 := size_2
                            calldatacopy(memPtr_2, calldatasize(), size_2)
                            mstore(add(add(memPtr_1, i), 32), memPtr_2)
                        }

Surprisingly, the bit that accounts for most of the cost increase is the parameter passed into allocate_memory(). The default sequence inlines the uses of the function outside of the loop and then can specialize it for the loop, making it significantly cheaper. Other differences are much less significant.

I managed to replicate this behavior with an extra TFviu at the end of the new sequence (the v is important, since without it the function gets completely inlined). The downside of this is that it significantly hurts some other examples. E.g. the bytecode size in ramanujan_pi.sol blows up by ~20%. The default sequence seems to be able to avoid it, probably due to repetition of other sequence parts so maybe adding some of them at the end could help. I also noticed that I can move the new segment a bit earlier in the sequence by prepending some steps to it (e.g. ciTFviu) and this makes the other examples degrade less.

Another change that improves this test case a little is adding Tr to the cleanup sequence (i.e. fDnTOcTrUmu).

Overall, this one could possibly be improved with more experimentation, but it's much harder than the other tweaks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, regression for ramanujan_pi.sol is not as bad as it seemed at first, because I was comparing with the original results for the new sequence. With the above tweaks and compared with the default sequence it's only 8%. Still, it was -6% before so by doing that we just trade one regression for another.

Copy link
Member Author

@cameel cameel Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun (ciTFviu + Tr)

Tweak

+       "ciTFviu"
        "[scCTUt] TOntnfDIul"          // Perform structural simplification
        "gvif"                         // Run full inliner

        "jmul[jul] VcTOcul jmul";      // Make source short and pretty

-   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcTrmu";

External tests

seqbench-sequence-the-good-parts-mk2-tweaks-ciTFviu-Tr vs seqbench-sequence-the-good-parts-mk2-cancun

ir-optimize-evm+yul
project bytecode_size deployment_gas method_gas
brink +1.24% ❌
colony -0.01% ✅
elementfi +2.22% ❌
ens +0.29% ❌ -1.04% ✅ -0.04% ✅
euler -0.13% ✅
gp2 +0.25% ❌
perpetual-pools -2.11% ✅ -2.31% ✅ -0.26% ✅
pool-together +1.47% ❌
uniswap -1.18% ✅
yield_liquidator -0.3% ✅ -0.34% ✅ -0.1% ✅
zeppelin +1.66% ❌ +1.53% ❌ -0.02% ✅
legacy-optimize-evm+yul
project bytecode_size deployment_gas method_gas
colony -0.01% ✅
elementfi -0.01% ✅
ens -0.01% ✅ +0% 0%
euler -0.06% ✅
gp2 -0.04% ✅
perpetual-pools -0.02% ✅ -0.02% ✅ -0%
pool-together +0.01% ❌
yield_liquidator -0.02% ✅ -0.01% ✅ 0%
zeppelin -0.01% ✅ -0.01% ✅ -0.08% ✅

Gas diff stats

File name IR optimized Legacy optimized Legacy
externalContracts/ramanujan_pi.sol 14%
externalContracts/prbmath_signed.sol 1%
array/function_array_cross_calls.sol 1%
abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol 1% -0%
various/address_code.sol 1%
events/event_emit_from_other_contract.sol -1%
functionCall/gas_and_value_brace_syntax.sol -2%
functionCall/gas_and_value_basic.sol -2%
functionCall/external_call_to_nonexisting_debugstrings.sol -2%
inlineAssembly/transient_storage_selfdestruct.sol -3%
externalContracts/base64.sol -3%
functionCall/external_call_to_nonexisting.sol -4%
byte_array_to_storage_cleanup.sol -4%
array/create_memory_array.sol -8%

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun (Tr only)

Tweak

-   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcTrmu";

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-Tr vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
brink +0.2% ❌
colony -0.01% ✅
elementfi +0.07% ❌
ens -0.03% ✅ -0.31% ✅ -0%
euler -0.06% ✅
gp2 +0.1% ❌
perpetual-pools +0% -0.07% ✅ -0.01% ✅
pool-together +0.15% ❌
uniswap -0.07% ✅
yield_liquidator 0% -0.05% ✅ -0%
zeppelin -0.34% ✅ -0.63% ✅ -0.01% ✅

Gas diff stats

File name IR optimized Legacy optimized Legacy
externalContracts/ramanujan_pi.sol 3%
abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol 1% -0%
various/address_code.sol 1%
array/create_memory_array.sol -1%
functionCall/gas_and_value_brace_syntax.sol -2%
functionCall/gas_and_value_basic.sol -2%

Comment on lines 35 to +37
// constructor()
// gas irOptimized: 79503
// gas irOptimized code: 328000
// gas irOptimized: 77762
// gas irOptimized code: 307000
Copy link
Member Author

@cameel cameel Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My TFviu fix for create_memory_array.sol (#14928 (comment)) destroys this gain and makes the cost increase by 8% here instead.

I see now that it's actually all coming from constructor - neither the new sequence nor the one with the fix seems to have affected execution cost of the prb_pi() function. So it's all just bytecode size.

Interestingly, the new sequence (without the fix) produces smaller bytecode even thought the Yul source is visibly larger. There's a whole chunk of ~125 lines that's not present in the output from default sequence. It does have one function less, but the extra chunk does not comes from inlining of that function. The function is a specialization of another, present in both versions. Overall the bytecode size difference is ~100 bytes.

One thing that draws attention when comparing assembly is that the bigger version produced by the default sequence has more large constants in it. My theory is that this is the major factor affecting the bytecode size for this contract. It aligns with my observation that versions with bigger bytecode have more copies of the specialized function I mentioned above. The function has several large numbers in it:

            function fun_mulDivFixedPoint(var_x, var_y) -> var_result
            {
                let usr$mm := mulmod(var_x, var_y, not(0))
                let var_prod0 := mul(var_x, var_y)
                let _1 := lt(usr$mm, var_prod0)
                let _2 := sub(usr$mm, var_prod0)
                let var_prod1 := sub(_2, _1)
                let _3 := 1000000000000000000
                let var_remainder := mulmod(var_x, var_y, _3)
                let var_roundUpUnit := gt(var_remainder, 499999999999999999)
                if eq(_2, _1)
                {
                    var_result := add(div(var_prod0, _3), var_roundUpUnit)
                    leave
                }
                if iszero(gt(_3, var_prod1)) { revert(0, 0) }
                var_result := add(mul(or(shr(18, sub(var_prod0, var_remainder)), shl(238, sub(var_prod1, gt(var_remainder, var_prod0)))), 78156646155174841979727994598816262306175212592076161876661508869554232690281), var_roundUpUnit)
           }

The code optimized with default sequence has two copies (the original and a specialization).
The one from the new sequence (-6% gas) has one.
The one from the new sequence with the TFviu fix (+8% gas) has three.

My conclusion here is that this is again a problem with the specializer. Unfortunately this time I don't have a good fix. I can tweak the sequence to make the specialization more or less aggressive but some contracts gain while others lose. I think that dealing with this properly would require some heuristics that can decide when it's worth it to specialize. Or maybe some adjustment to inlining heuristics - those specialized functions do not get inlined and perhaps they should be.

In the light of that I'm not sure if I should really treat the original 9% increase for create_memory_array.sol as a regression. It's balanced out by the 6% decrease for ramanujan_pi.sol and I think it might also balance out for real contracts. That's pretty much what happens in external tests.

Comment on lines -28 to 34
// for { } lt(i, length) { i := add(i, 1) }
// for { } true { i := add(i, 1) }
// {
// let _2 := iszero(lt(i, length))
// if _2 { break }
// _2 := 0
// sum := add(sum, calldataload(add(add(_1, shl(5, i)), 0x20)))
// }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reposting #14928 (comment), so that I can keep it contained in a thread.

A quick observation from reviewing the modified code: in a lot of cases the difference is in which variables get rematerialized or which expressions get joined. Sometimes this also prevents further optimizations (see e.g. devcon_example.yul for example where it interferes with ForLoopConditionOutOfBody).

That's unexpected to me given that the cleanup sequence ends with Rematerialiser (m) and that ExpressionJoiner (j) is also used quite liberally in the sequence. After some experimentation I see that this seems to be due to the extra zero assignments that ConditionalSimplifier (C) inserts. I'm not sure why they interfere but using ConditionalUnsimplifier (U) before mu or j seems to help.

The conclusion is that adding U to the cleanup sequence is probably a good idea.

EDIT: An extra O at the end of cleanup is also needed to move the condition out of body (though it's just a cosmetic difference).
EDIT2: Maybe not just cosmetic - I expected gas use to be the same but it actually decreased by a very small amount in several tests (and increased in one).

Copy link
Member Author

@cameel cameel Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun (U)

Tweak

-   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcUmu";

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-U vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
elementfi +0.01% ❌
ens -0.01% ✅ -0% 0%
euler +0.02% ❌
gp2 +0.06% ❌
perpetual-pools +0.02% ❌ +0.03% ❌ +0%
pool-together +0.11% ❌
uniswap +0.06% ❌
yield_liquidator +0.03% ❌ +0.03% ❌ 0%
zeppelin +0% +0.01% ❌ +0%

Gas diff stats

No differences over +/-0.5%.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun (U + O)

Tweak

-   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcUmuO";

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-U-O vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
elementfi +0.01% ❌
ens -0.01% ✅ -0% 0%
euler +0.02% ❌
gp2 +0.06% ❌
perpetual-pools +0.02% ❌ +0.03% ❌ +0%
pool-together +0.11% ❌
uniswap +0.06% ❌
yield_liquidator +0.03% ❌ +0.03% ❌ 0%
zeppelin +0% +0.01% ❌ +0.11% ❌

Gas diff stats

No differences over +/-0.5%.

Summary

Pretty much no change compared to U alone. The difference in method_gas for zeppelin is probably still below the margin or error - it sometimes gets worse in runs with only U too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison with unmodified the-good-parts-mk2 on cancun (O)

Tweak

-   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+   static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmuO";

External tests (ir-optimize-evm+yul)

seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-O vs seqbench-sequence-the-good-parts-mk2-cancun

project bytecode_size deployment_gas method_gas
perpetual-pools 0% +0% +0.02% ❌
zeppelin 0% +0% -0.02% ✅

Gas diff stats

No differences at all.

@cameel
Copy link
Member Author

cameel commented Apr 15, 2024

Closing in favor of #15031.

@cameel cameel closed this Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant