New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[seqbench] the-good-parts-mk2
sequence
#14928
Conversation
…and some extra refinements - `the-good-parts` variant found using seqbench. - second refinement, based on the erc20.sol contract.
|
File name | IR optimized | Legacy optimized | Legacy |
---|---|---|---|
array/create_memory_array.sol |
9% | ||
array/array_storage_push_empty_length_address.sol |
8% | ||
externalContracts/base64.sol |
8% | -0% | |
array/array_storage_index_boundary_test.sol |
7% | ||
array/delete/bytes_delete_element.sol |
7% | -0% | |
array/copying/cleanup_during_multi_element_per_slot_copy.sol |
6% | ||
array/copying/bytes_storage_to_storage.sol |
5% | -0% | |
array/copying/storage_memory_packed_dyn.sol |
5% | ||
array/byte_array_transitional_2.sol |
5% | ||
array/copying/array_copy_cleanup_uint40.sol |
5% | ||
abiEncoderV2/calldata_array.sol |
3% | ||
array/array_storage_push_pop.sol |
3% | ||
inheritance/inherited_function_calldata_memory_interface.sol |
3% | ||
abiEncoderV2/abi_encode_calldata_slice.sol |
3% | -0% | |
abiEncoderV1/abi_encode_calldata_slice.sol |
3% | -0% | |
array/array_storage_push_empty.sol |
2% | ||
array/array_storage_index_zeroed_test.sol |
2% | ||
isoltestTesting/balance_other_contract.sol |
2% | ||
externalContracts/prbmath_unsigned.sol |
2% | ||
array/dynamic_arrays_in_storage.sol |
2% | ||
externalContracts/FixedFeeRegistrar.sol |
1% | +0% | |
various/erc20.sol |
1% | +0% | |
events/event_indexed_string.sol |
1% | -0% | |
abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol |
1% | -0% | |
array/fixed_arrays_as_return_type.sol |
1% | -0% | |
array/push/byte_array_push_transition.sol |
1% | ||
various/selfdestruct_pre_cancun_multiple_beneficiaries.sol |
1% | ||
userDefinedValueType/erc20.sol |
1% | +0% | |
array/copying/array_of_struct_calldata_to_storage.sol |
1% | ||
array/copying/array_copy_clear_storage.sol |
1% | ||
array/copying/memory_dyn_2d_bytes_to_storage.sol |
1% | -0% | |
events/event_dynamic_array_storage_v2.sol |
1% | +0% | |
events/event_dynamic_array_storage.sol |
1% | +0% | |
array/array_storage_index_access.sol |
1% | ||
externalContracts/prbmath_signed.sol |
1% | ||
constructor/bytes_in_constructors_packer.sol |
+0% | +0% | |
various/value_complex.sol |
1% | ||
various/value_insane.sol |
+0% | ||
salted_create/salted_create_with_value.sol |
+0% | ||
array/push/array_push.sol |
+0% | ||
structs/memory_structs_nested_load.sol |
+0% | ||
externalContracts/snark.sol |
+0% | ||
byte_array_to_storage_cleanup.sol |
1% | -0% | |
array/copying/array_of_struct_memory_to_storage.sol |
+0% | ||
array/pop/byte_array_pop_long_storage_empty.sol |
+0% | ||
array/copying/storage_memory_nested_from_pointer.sol |
+0% | ||
array/copying/storage_memory_nested.sol |
+0% | ||
various/selfdestruct_pre_cancun_redeploy.sol |
+0% | ||
array/copying/array_copy_storage_storage_different_base.sol |
+0% | ||
array/copying/copy_byte_array_to_storage.sol |
+0% | -0% | |
array/push/push_no_args_bytes.sol |
+0% | ||
structs/struct_memory_to_storage_function_ptr.sol |
+0% | ||
array/copying/array_copy_storage_storage_dynamic_dynamic.sol |
+0% | ||
structs/struct_delete_storage_with_array.sol |
+0% | ||
array/push/nested_bytes_push.sol |
+0% | -0% | |
array/constant_var_as_array_length.sol |
+0% | +0% | |
array/copying/array_of_structs_containing_arrays_calldata_to_storage.sol |
+0% | ||
array/dynamic_multi_array_cleanup.sol |
+0% | ||
structs/struct_copy_via_local.sol |
+0% | ||
array/push/array_push_struct.sol |
+0% | ||
abiEncoderV1/struct/struct_storage_ptr.sol |
+0% | ||
libraries/using_library_mappings_public.sol |
+0% | ||
array/fixed_arrays_in_constructors.sol |
+0% | ||
array/copying/array_copy_different_packing.sol |
+0% | ||
constructor/constructor_arguments_external.sol |
+0% | ||
abiEncoderV2/abi_encode_v2.sol |
+0% | +0% | |
array/copying/nested_array_element_storage_to_storage.sol |
+0% | ||
viaYul/copy_struct_invalid_ir_bug.sol |
+0% | ||
userDefinedValueType/calldata.sol |
+0% | -0% | |
array/copying/array_copy_storage_to_memory_nested.sol |
+0% | +0% | |
storage/packed_storage_structs_bytes.sol |
+0% | ||
array/copying/calldata_array_dynamic_to_storage.sol |
+0% | -0% | |
array/dynamic_array_cleanup.sol |
+0% | ||
inheritance/value_for_constructor.sol |
+0% | ||
array/copying/array_nested_calldata_to_storage.sol |
+0% | ||
immutable/use_scratch.sol |
+0% | ||
constructor/constructor_static_array_argument.sol |
+0% | -0% | |
events/event_dynamic_nested_array_storage_v2.sol |
+0% | +0% | |
array/push/array_push_struct_from_calldata.sol |
+0% | -0% | |
various/destructuring_assignment.sol |
+0% | -0% | |
array/copying/array_copy_target_simple.sol |
+0% | ||
array/copying/array_copy_including_array.sol |
+0% | ||
functionCall/creation_function_call_with_salt.sol |
+0% | ||
array/copying/array_copy_storage_storage_different_base_nested.sol |
+0% | ||
array/copying/storage_memory_nested_struct.sol |
+0% | ||
array/copying/nested_array_of_structs_calldata_to_storage.sol |
+0% | ||
array/copying/elements_of_nested_array_of_structs_calldata_to_storage.sol |
+0% | ||
structs/copy_struct_array_from_storage.sol |
+0% | ||
functionCall/creation_function_call_with_args.sol |
+0% | ||
array/copying/array_nested_memory_to_storage.sol |
+0% | ||
types/mapping/copy_from_mapping_to_mapping.sol |
+0% | +0% | |
array/copying/nested_dynamic_array_element_calldata_to_storage.sol |
+0% | ||
array/fixed_array_cleanup.sol |
+0% | ||
array/copying/nested_array_of_structs_memory_to_storage.sol |
+0% | ||
array/copying/array_copy_storage_storage_struct.sol |
+0% | ||
array/copying/array_copy_storage_storage_static_dynamic.sol |
+0% | ||
abiencodedecode/abi_decode_simple_storage.sol |
+0% | -0% | |
structs/calldata/calldata_struct_with_nested_array_to_storage.sol |
+0% | +0% | |
array/bytes_length_member.sol |
+0% | ||
array/reusing_memory.sol |
+0% | ||
array/pop/array_pop_array_transition.sol |
+0% | ||
array/push/array_push_nested_from_calldata.sol |
+0% | -0% | |
array/copying/nested_array_of_structs_storage_to_storage.sol |
+0% | ||
array/copying/array_of_structs_containing_arrays_memory_to_storage.sol |
+0% | ||
abiEncoderV1/abi_decode_v2_storage.sol |
+0% | +0% | |
array/push/push_no_args_2d.sol |
+0% | ||
array/copying/array_storage_multi_items_per_slot.sol |
+0% | ||
structs/struct_copy.sol |
+0% | ||
abiEncoderV2/calldata_overlapped_dynamic_arrays.sol |
+0% | -0% | |
array/copying/copy_function_internal_storage_array.sol |
+0% | ||
array/copying/array_copy_storage_storage_dyn_dyn.sol |
+0% | ||
various/staticcall_for_view_and_pure.sol |
-0% | ||
array/array_storage_length_access.sol |
-0% | ||
array/copying/copy_byte_array_in_struct_to_storage.sol |
+0% | -0% | |
calldata/copy_from_calldata_removes_bytes_data.sol |
-0% | ||
array/copying/array_copy_storage_storage_static_static.sol |
-0% | ||
array/copying/bytes_inside_mappings.sol |
-0% | ||
array/invalid_encoding_for_storage_byte_array.sol |
-0% | -0% | |
array/copying/array_of_function_external_storage_to_storage_dynamic_different_mutability.sol |
-0% | +0% | |
constructor/no_callvalue_check.sol |
-0% | ||
array/copying/array_copy_nested_array.sol |
-0% | -0% | |
array/copying/elements_of_nested_array_of_structs_memory_to_storage.sol |
-0% | ||
array/array_memory_index_access.sol |
-0% | ||
array/copying/copy_removes_bytes_data.sol |
-0% | ||
various/many_subassemblies.sol |
-0% | ||
immutable/multi_creation.sol |
-0% | ||
array/copying/calldata_array_to_mapping.sol |
-0% | ||
array/copying/array_to_mapping.sol |
-0% | +0% | |
array/copying/array_of_function_external_storage_to_storage_dynamic.sol |
-0% | -0% | |
structs/struct_containing_bytes_copy_and_delete.sol |
-0% | -0% | |
array/copying/array_copy_target_simple_2.sol |
-0% | ||
abiEncoderV2/storage_array_encoding.sol |
-0% | -0% | |
structs/copy_substructures_from_mapping.sol |
-0% | -0% | |
array/copying/array_elements_to_mapping.sol |
-0% | +0% | |
array/copying/function_type_array_to_storage.sol |
-0% | -0% | |
various/skip_dynamic_types_for_structs.sol |
-0% | -0% | |
storage/empty_nonempty_empty.sol |
-0% | -0% | |
structs/copy_to_mapping.sol |
-0% | -0% | |
structs/copy_from_mapping.sol |
-0% | -0% | |
array/pop/byte_array_pop_long_storage_empty_garbage_ref.sol |
-0% | ||
array/copying/array_copy_calldata_storage.sol |
-0% | -0% | |
array/copying/storage_memory_nested_bytes.sol |
-0% | +0% | |
array/pop/array_pop_uint16_transition.sol |
-0% | ||
array/copying/arrays_from_and_to_storage.sol |
-0% | ||
functionCall/external_call_to_nonexisting_debugstrings.sol |
-0% | +0% | |
array/pop/array_pop_uint24_transition.sol |
-0% | ||
functionCall/gas_and_value_brace_syntax.sol |
-0% | ||
functionCall/gas_and_value_basic.sol |
-0% | ||
structs/copy_substructures_to_mapping.sol |
-0% | -0% | |
libraries/internal_types_in_library.sol |
-0% | ||
constructor/arrays_in_constructors.sol |
-1% | +0% | |
structs/structs.sol |
-0% | ||
functionCall/mapping_array_internal_argument.sol |
-0% | ||
array/pop/byte_array_pop_masking_long.sol |
-0% | -0% | |
array/copying/array_copy_target_leftover.sol |
-0% | -0% | |
structs/conversion/recursive_storage_memory.sol |
-0% | ||
externalContracts/strings.sol |
-0% | +0% | |
libraries/using_library_mappings_return.sol |
-1% | ||
array/arrays_complex_from_and_to_storage.sol |
-1% | +0% | |
functionCall/external_call_to_nonexisting.sol |
-1% | +0% | |
constructor/bytes_in_constructors_unpacker.sol |
-1% | -0% | |
externalContracts/deposit_contract.sol |
-1% | +0% | |
various/address_code.sol |
-1% | -0% | |
array/function_array_cross_calls.sol |
-2% | 1% | |
various/create_calldata.sol |
-1% | -0% | |
externalContracts/ramanujan_pi.sol |
-6% |
Changes compared to the same table for the-good-parts
-| `various/erc20.sol` | 4% | +0% | |
+| `various/erc20.sol` | 1% | +0% | |
-| `userDefinedValueType/erc20.sol` | 3% | +0% | |
+| `userDefinedValueType/erc20.sol` | 1% | +0% | |
-| `byte_array_to_storage_cleanup.sol` | 3% | -0% | |
+| `byte_array_to_storage_cleanup.sol` | 1% | -0% | |
-| `array/push/push_no_args_bytes.sol` | 3% | | |
+| `array/push/push_no_args_bytes.sol` | +0% | | |
-| `externalContracts/FixedFeeRegistrar.sol` | 2% | +0% | |
+| `externalContracts/FixedFeeRegistrar.sol` | 1% | +0% | |
-| `salted_create/salted_create_with_value.sol` | 1% | | |
+| `salted_create/salted_create_with_value.sol` | +0% | | |
-| `structs/conversion/recursive_storage_memory.sol` | 1% | | |
+| `structs/conversion/recursive_storage_memory.sol` | -0% | | |
-| `structs/memory_structs_nested_load.sol` | 1% | | |
+| `structs/memory_structs_nested_load.sol` | +0% | | |
-| `array/copying/array_of_struct_memory_to_storage.sol` | 1% | | |
+| `array/copying/array_of_struct_memory_to_storage.sol` | +0% | | |
-| `array/pop/array_pop_uint16_transition.sol` | 1% | | |
+| `array/pop/array_pop_uint16_transition.sol` | -0% | | |
-| `array/pop/array_pop_uint24_transition.sol` | 1% | | |
+| `array/pop/array_pop_uint24_transition.sol` | -0% | | |
-| `array/copying/array_elements_to_mapping.sol` | +0% | +0% | |
+| `array/copying/array_elements_to_mapping.sol` | -0% | +0% | |
-| `array/copying/function_type_array_to_storage.sol` | +0% | -0% | |
+| `array/copying/function_type_array_to_storage.sol` | -0% | -0% | |
-| `externalContracts/snark.sol` | 1% | | |
+| `externalContracts/snark.sol` | +0% | | |
-| `libraries/using_library_mappings_return.sol` | -0% | | |
+| `libraries/using_library_mappings_return.sol` | -1% | | |
-| `immutable/multi_creation.sol` | +0% | | |
+| `immutable/multi_creation.sol` | -0% | | |
External test benchmark diffCI run on
|
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | |||
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
||
euler | |||
gnosis | |||
gp2 | 0% |
||
perpetual-pools | 0% |
+0% |
+0.01% ❌ |
pool-together | 0% |
||
trident | 0% |
||
uniswap | 0% |
-0% |
-0% |
yield_liquidator | 0% |
-0% |
0% |
zeppelin |
ir-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | +0.39% ❌ |
||
chainlink | -2.4% ✅ |
||
colony | +0.13% ❌ |
||
elementfi | -1.65% ✅ |
||
ens | -0.16% ✅ |
-3.08% ✅ |
-0.01% ✅ |
euler | +1.18% ❌ |
||
gnosis | |||
gp2 | +0.49% ❌ |
||
perpetual-pools | +0.08% ❌ |
-0.22% ✅ |
+0.13% ❌ |
pool-together | -1.06% ✅ |
||
trident | +0.27% ❌ |
||
uniswap | +1.65% ❌ |
+1.66% ❌ |
+2.04% ❌ |
yield_liquidator | +0.92% ❌ |
-0.1% ✅ |
+0.04% ❌ |
zeppelin | -0.38% ✅ |
-1.18% ✅ |
+0.01% ❌ |
ir-optimize-evm-only
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | |||
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
-0% |
0% |
euler | |||
gnosis | |||
gp2 | 0% |
||
perpetual-pools | 0% |
+0% |
-0.01% ✅ |
pool-together | 0% |
||
trident | 0% |
||
uniswap | 0% |
-0% |
+0% |
yield_liquidator | 0% |
+0% |
-0% |
zeppelin | 0% |
legacy-no-optimize
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | 0% |
||
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
||
euler | 0% |
||
gnosis | 0% |
||
gp2 | 0% |
||
perpetual-pools | 0% |
+0% |
-0.02% ✅ |
pool-together | 0% |
||
trident | 0% |
||
uniswap | 0% |
-0% |
-0% |
yield_liquidator | 0% |
-0% |
0% |
zeppelin | 0% |
-0% |
-0.09% ✅ |
legacy-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | +0.13% ❌ |
||
brink | -0.03% ✅ |
||
chainlink | +0.12% ❌ |
||
colony | +0.13% ❌ |
||
elementfi | +0.09% ❌ |
||
ens | +0.18% ❌ |
-0.02% ✅ |
-0% |
euler | +0.14% ❌ |
||
gnosis | +0.16% ❌ |
||
gp2 | +0.27% ❌ |
||
perpetual-pools | -0.03% ✅ |
+0% |
+0.03% ❌ |
pool-together | +0.03% ❌ |
||
trident | +0.1% ❌ |
||
uniswap | +0.04% ❌ |
+0.03% ❌ |
+0% |
yield_liquidator | +0.04% ❌ |
+0.02% ❌ |
-0.01% ✅ |
zeppelin | +0.15% ❌ |
+0.17% ❌ |
-0.06% ✅ |
legacy-optimize-evm-only
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | 0% |
||
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
0% |
0% |
euler | 0% |
||
gnosis | 0% |
||
gp2 | 0% |
||
perpetual-pools | 0% |
-0% |
+0% |
pool-together | 0% |
||
trident | 0% |
||
uniswap | 0% |
+0% |
+0% |
yield_liquidator | 0% |
-0% |
0% |
zeppelin | 0% |
-0% |
+0.02% ❌ |
!V
= version mismatch
!B
= no value in the "before" version
!A
= no value in the "after" version
!T
= one or both values were not numeric and could not be compared
-0
= very small negative value rounded to zero
+0
= very small positive value rounded to zero
External test comparison with
|
This comment was marked as duplicate.
This comment was marked as duplicate.
Hmm... after trying it out, it does not seem to affect as many cases as I expected. Basically |
// f_63() | ||
// f() | ||
// f() | ||
// f() | ||
// f_65() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a regression compared to the default sequence. UnusedFunctionParameterPruner
(p
) does not remove the obviously unnecessary parameter so the function gets specialized. It happens because when we run p
, the body contains an (unused) assignment to the parameter:
function f(a) -> x {
(...)
for { } true { (...) } {
(...)
a := a_1
mstore(a_1, 0)
(...)
}
}
The assignment is created by the SSA but not removed by UnusedAssignEliminator
(r
) that runs afterwards because the variable is used as initialization for some (unused) variables at that point:
function f(a) -> x {
(...)
for { } true { (...) let a_2 := a (...) } {
(...)
let a_3 := a
(...)
a := a_4
(...)
}
}
This can be solved by either replacing the [r]
loop after the SSA with [ru]
or by running r
before p
.
I think that the default sequence does not have this problem due to the presence of the simplification segment, which includes u
and runs before we reverse SSA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
Tweak
- "Tpeul" // Run functional expression inliner
+ "Trpeul" // Run functional expression inliner
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-r-before-p
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
elementfi | -0.03% ✅ |
||
ens | -0.01% ✅ |
0% |
0% |
euler | -0.07% ✅ |
||
uniswap | +0.35% ❌ |
||
yield_liquidator | -0.02% ✅ |
-0.03% ✅ |
-0% |
zeppelin | -0.01% ✅ |
-0.01% ✅ |
+0.04% ❌ |
Gas diff stats
No differences over +/-0.5%.
// } | ||
// function gcd(_a, _b) -> out | ||
// { | ||
// switch _b | ||
// case 0 { out := _a } | ||
// default { out := gcd(_b, mod(_a, _b)) } | ||
// } | ||
// function foo_singlereturn() | ||
// { extcodecopy(1, msize(), 1, 1) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks odd. Is the test buggy? The name of foo_singlereturn()
suggests it should return a single value and the function technically does have a single out
return, but that's never assigned to and gets optimized out. Hard to tell if if was intentional or not.
In any case, the main thing that changes here is that the function does not get inlined. Note that it wasn't being inlined ether back in 0.5.11 when the test was added (0c0b5a0), so not sure if I should classify it as a regression.
// let x := 0 | ||
// switch mload(0) | ||
// case 0 { } | ||
// case 1 { } | ||
// case 0 { x := 0 } | ||
// case 1 { x := 1 } | ||
// default { invalid() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one also seems to be the fault of the removed simplification segment. Adding an extra u
solves both this and the unused parameter pruner regression (#14928 (comment)).
// sstore(0, add(mload(0x40), 128)) | ||
// sstore(1, 0x20) | ||
// let p := mload(0x40) | ||
// mstore(0x40, add(p, 96)) | ||
// let _1 := add(p, 128) | ||
// mstore(_1, 2) | ||
// mstore(0x40, 0x20) | ||
// sstore(0, _1) | ||
// sstore(1, mload(0x40)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is caused by LoadResolver
(L
) not being able to resolve mload(0x40)
when the unused mstore(0x80, 0x40)
stands between it and mstore(0x40, 0x20)
.
In the default sequence the xa[r]cL
segment can deal with it because it runs after the structural simplification segment, which includes StructuralSimplifier
(t
) and UnusedStoreEliminator
(S
). The new sequence instead encounters this:
if 1
{
if 1 { mstore(0x40, 0x20) }
for { } true { }
{
if 1 { break }
mstore(0x80, 0x40)
}
}
sstore(0, _1)
sstore(1, mload(0x40))
Replacing xa[r]cL
with txa[r]cSL
fixes the problem and mload(0x40)
gets resolved into 0x20
.
Note that t
here is ineffective if we run it on SSA form while the opposite is true for S
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oddly, while the change helps with this test and also minimally decreases gas usage in several other tests, there are also a few where it makes things a little worse.
For example for functionCall/external_call_to_nonexisting.sol
bytecode size seems to grow enough to increase the code deposit cost by almost 2%:
-// gas irOptimized: 89315
-// gas irOptimized code: 169800
+// gas irOptimized: 89578
+// gas irOptimized code: 173000
So my proposed fix has mixed effects. I'll have to see the effect on external tests to see if it's actually worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
Tweak
- "xa[r]cL" // Turn into SSA again and simplify
+ "txa[r]cSL" // Turn into SSA again and simplify
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-t-and-c-before-load-resolver
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
elementfi | -0.09% ✅ |
||
ens | +0.25% ❌ |
+1.42% ❌ |
-0.02% ✅ |
euler | +0.05% ❌ |
||
gp2 | +0.08% ❌ |
||
perpetual-pools | !A |
!A |
!A |
pool-together | +0.05% ❌ |
||
uniswap | +0.03% ❌ |
||
yield_liquidator | +0.03% ❌ |
+0.01% ❌ |
-0% |
zeppelin | +0.06% ❌ |
+0.1% ❌ |
+0.22% ❌ |
For perpetual-pools
compilation fails with YulException: Variable _16 is 1 too deep in the stack.
(memoryguard was present).
Gas diff stats
File name | IR optimized | Legacy optimized | Legacy |
---|---|---|---|
functionCall/external_call_to_nonexisting.sol |
1% | ||
inlineAssembly/transient_storage_selfdestruct.sol |
1% | ||
functionCall/external_call_to_nonexisting_debugstrings.sol |
1% | ||
functionCall/gas_and_value_brace_syntax.sol |
1% | ||
functionCall/gas_and_value_basic.sol |
1% |
// let _1 := iszero(caller()) | ||
// for { } | ||
// true | ||
// { | ||
// let _1 := iszero(caller()) | ||
// for { } iszero(_1) { } | ||
// { } | ||
// mstore(192, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this tests still tests what it's supposed to test - the comment says that the expression containing caller()
is not supposed to be movable but the expectations clearly expect it to be moved.
Still, regardless of that the version with caller()
outside of both loops seems both more optimal and correct so this is a regression. It happens because the new sequence runs LoopInvariantCodeMotion
(M
) only once and it happens before inlining. In the default sequence this is not a problem because it has the big loop.
Appending M
after the inlining segment (gvif
) fixes the problem. Perhaps this step should always follow inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
Tweak
- "gvif" // Run full inliner
+ "gvifM" // Run full inliner
"CTUca[r]LSsTFOtfDnca[r]Iulc" // SSA plus simplify
"scCTUt"
- "gvif" // Run full inliner
+ "gvifM" // Run full inliner
"[scCTUt] TOntnfDIul" // Perform structural simplification
- "gvif" // Run full inliner
+ "gvifM" // Run full inliner
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-M-after-inlining
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
ens | +0.01% ❌ |
-0% |
0% |
gp2 | +0.03% ❌ |
||
perpetual-pools | +0.04% ❌ |
+0.04% ❌ |
-0% |
pool-together | -0.04% ✅ |
||
uniswap | +0.01% ❌ |
||
zeppelin | -0% |
-0.01% ✅ |
+0.02% ❌ |
Gas diff stats
File name | IR optimized | Legacy optimized | Legacy |
---|---|---|---|
array/byte_array_transitional_2.sol |
-4% | ||
array/copying/bytes_storage_to_storage.sol |
-6% |
// let x := calldataload(0) | ||
// sstore(shr(248, x), and(shr(4, x), 3855)) | ||
// sstore(shr(248, x), and(and(shr(4, x), sub(shl(244, 1), 1)), 3855)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is yet another test that seems to have broken expectations. The comment says that we're checking that expression simplifier does not modify the unsplit expressions, but expectations are in a simplified form.
This is probably still a regression though. The new expectation seems to be simplified too, but not fully. it's transformed to something weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking into it, I see it's just that it's not fully optimized with the new sequence. ExpressionSimplifier
(s
) in the default sequence eventually figures out that part of this expression is an application of two constant masks ((4 >> x) & 0x01e7 & 0xf0f
) and one of them can be safely dropped because only the three lowest bits matter and those are identical.
The simplification does not take place though if the expression is not split. In the default sequence that happens eventually due to repetition, in the new one it does not. I found that adding ExpressionSplitter
(x
) resolves this. Either just x
before [scCTUt]
or xsu
somewhere later in the sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
Tweak
- "[scCTUt] TOntnfDIul" // Perform structural simplification
+ "x[scCTUt] TOntnfDIul" // Perform structural simplification
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-x-before-s
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
brink | +0.06% ❌ |
||
elementfi | -0.1% ✅ |
||
ens | -0.26% ✅ |
0% |
+0% |
euler | -0.13% ✅ |
||
gp2 | -0.04% ✅ |
||
perpetual-pools | -0.33% ✅ |
-0.33% ✅ |
-0.11% ✅ |
pool-together | -0.06% ✅ |
||
uniswap | -0.33% ✅ |
||
yield_liquidator | -0.07% ✅ |
-0.08% ✅ |
+0.1% ❌ |
zeppelin | -0.08% ✅ |
-0.07% ✅ |
+0.06% ❌ |
Gas diff stats
File name | IR optimized | Legacy optimized | Legacy |
---|---|---|---|
externalContracts/base64.sol |
-1% | ||
array/array_storage_push_pop.sol |
-1% | ||
array/function_array_cross_calls.sol |
-1% | ||
array/push/byte_array_push_transition.sol |
-1% | ||
array/pop/byte_array_pop_long_storage_empty.sol |
-1% | ||
array/byte_array_transitional_2.sol |
-1% |
// ---- | ||
// f() -> "A", 8, 4, "B" | ||
// gas irOptimized: 125822 | ||
// gas irOptimized: 136664 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this one the default sequence seems to win mostly due to efficient use of FunctionSpecializer
(F
).
The heaviest parts of the code are the two allocations of 2D arrays, which turn into loops in Yul. For example the first one looks like this with the default sequence:
for { } lt(i, 9600) { i := add(i, _5) }
{
let memPtr_1 := allocate_memory()
calldatacopy(memPtr_1, calldatasize(), _4)
mstore(add(add(memPtr, i), _5), memPtr_1)
}
and like this with the new sequence:
for { } lt(i, _3) { i := add(i, 32) }
{
let size_2 := 0
let _4 := 0
_4 := 0
size_2 := 64
let memPtr_2 := allocate_memory(size_2)
let size_3 := 0
let _5 := 0
_5 := 0
size_3 := size_2
calldatacopy(memPtr_2, calldatasize(), size_2)
mstore(add(add(memPtr_1, i), 32), memPtr_2)
}
Surprisingly, the bit that accounts for most of the cost increase is the parameter passed into allocate_memory()
. The default sequence inlines the uses of the function outside of the loop and then can specialize it for the loop, making it significantly cheaper. Other differences are much less significant.
I managed to replicate this behavior with an extra TFviu
at the end of the new sequence (the v
is important, since without it the function gets completely inlined). The downside of this is that it significantly hurts some other examples. E.g. the bytecode size in ramanujan_pi.sol
blows up by ~20%. The default sequence seems to be able to avoid it, probably due to repetition of other sequence parts so maybe adding some of them at the end could help. I also noticed that I can move the new segment a bit earlier in the sequence by prepending some steps to it (e.g. ciTFviu
) and this makes the other examples degrade less.
Another change that improves this test case a little is adding Tr
to the cleanup sequence (i.e. fDnTOcTrUmu
).
Overall, this one could possibly be improved with more experimentation, but it's much harder than the other tweaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, regression for ramanujan_pi.sol
is not as bad as it seemed at first, because I was comparing with the original results for the new sequence. With the above tweaks and compared with the default sequence it's only 8%. Still, it was -6% before so by doing that we just trade one regression for another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
(ciTFviu
+ Tr
)
Tweak
+ "ciTFviu"
"[scCTUt] TOntnfDIul" // Perform structural simplification
"gvif" // Run full inliner
"jmul[jul] VcTOcul jmul"; // Make source short and pretty
- static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+ static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcTrmu";
External tests
seqbench-sequence-the-good-parts-mk2-tweaks-ciTFviu-Tr
vs seqbench-sequence-the-good-parts-mk2-cancun
ir-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
brink | +1.24% ❌ |
||
colony | -0.01% ✅ |
||
elementfi | +2.22% ❌ |
||
ens | +0.29% ❌ |
-1.04% ✅ |
-0.04% ✅ |
euler | -0.13% ✅ |
||
gp2 | +0.25% ❌ |
||
perpetual-pools | -2.11% ✅ |
-2.31% ✅ |
-0.26% ✅ |
pool-together | +1.47% ❌ |
||
uniswap | -1.18% ✅ |
||
yield_liquidator | -0.3% ✅ |
-0.34% ✅ |
-0.1% ✅ |
zeppelin | +1.66% ❌ |
+1.53% ❌ |
-0.02% ✅ |
legacy-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
colony | -0.01% ✅ |
||
elementfi | -0.01% ✅ |
||
ens | -0.01% ✅ |
+0% |
0% |
euler | -0.06% ✅ |
||
gp2 | -0.04% ✅ |
||
perpetual-pools | -0.02% ✅ |
-0.02% ✅ |
-0% |
pool-together | +0.01% ❌ |
||
yield_liquidator | -0.02% ✅ |
-0.01% ✅ |
0% |
zeppelin | -0.01% ✅ |
-0.01% ✅ |
-0.08% ✅ |
Gas diff stats
File name | IR optimized | Legacy optimized | Legacy |
---|---|---|---|
externalContracts/ramanujan_pi.sol |
14% | ||
externalContracts/prbmath_signed.sol |
1% | ||
array/function_array_cross_calls.sol |
1% | ||
abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol |
1% | -0% | |
various/address_code.sol |
1% | ||
events/event_emit_from_other_contract.sol |
-1% | ||
functionCall/gas_and_value_brace_syntax.sol |
-2% | ||
functionCall/gas_and_value_basic.sol |
-2% | ||
functionCall/external_call_to_nonexisting_debugstrings.sol |
-2% | ||
inlineAssembly/transient_storage_selfdestruct.sol |
-3% | ||
externalContracts/base64.sol |
-3% | ||
functionCall/external_call_to_nonexisting.sol |
-4% | ||
byte_array_to_storage_cleanup.sol |
-4% | ||
array/create_memory_array.sol |
-8% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
(Tr
only)
Tweak
- static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+ static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcTrmu";
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-Tr
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
brink | +0.2% ❌ |
||
colony | -0.01% ✅ |
||
elementfi | +0.07% ❌ |
||
ens | -0.03% ✅ |
-0.31% ✅ |
-0% |
euler | -0.06% ✅ |
||
gp2 | +0.1% ❌ |
||
perpetual-pools | +0% |
-0.07% ✅ |
-0.01% ✅ |
pool-together | +0.15% ❌ |
||
uniswap | -0.07% ✅ |
||
yield_liquidator | 0% |
-0.05% ✅ |
-0% |
zeppelin | -0.34% ✅ |
-0.63% ✅ |
-0.01% ✅ |
Gas diff stats
File name | IR optimized | Legacy optimized | Legacy |
---|---|---|---|
externalContracts/ramanujan_pi.sol |
3% | ||
abiEncoderV2/abi_encode_v2_in_function_inherited_in_v1_contract.sol |
1% | -0% | |
various/address_code.sol |
1% | ||
array/create_memory_array.sol |
-1% | ||
functionCall/gas_and_value_brace_syntax.sol |
-2% | ||
functionCall/gas_and_value_basic.sol |
-2% |
// constructor() | ||
// gas irOptimized: 79503 | ||
// gas irOptimized code: 328000 | ||
// gas irOptimized: 77762 | ||
// gas irOptimized code: 307000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My TFviu
fix for create_memory_array.sol
(#14928 (comment)) destroys this gain and makes the cost increase by 8% here instead.
I see now that it's actually all coming from constructor - neither the new sequence nor the one with the fix seems to have affected execution cost of the prb_pi()
function. So it's all just bytecode size.
Interestingly, the new sequence (without the fix) produces smaller bytecode even thought the Yul source is visibly larger. There's a whole chunk of ~125 lines that's not present in the output from default sequence. It does have one function less, but the extra chunk does not comes from inlining of that function. The function is a specialization of another, present in both versions. Overall the bytecode size difference is ~100 bytes.
One thing that draws attention when comparing assembly is that the bigger version produced by the default sequence has more large constants in it. My theory is that this is the major factor affecting the bytecode size for this contract. It aligns with my observation that versions with bigger bytecode have more copies of the specialized function I mentioned above. The function has several large numbers in it:
function fun_mulDivFixedPoint(var_x, var_y) -> var_result
{
let usr$mm := mulmod(var_x, var_y, not(0))
let var_prod0 := mul(var_x, var_y)
let _1 := lt(usr$mm, var_prod0)
let _2 := sub(usr$mm, var_prod0)
let var_prod1 := sub(_2, _1)
let _3 := 1000000000000000000
let var_remainder := mulmod(var_x, var_y, _3)
let var_roundUpUnit := gt(var_remainder, 499999999999999999)
if eq(_2, _1)
{
var_result := add(div(var_prod0, _3), var_roundUpUnit)
leave
}
if iszero(gt(_3, var_prod1)) { revert(0, 0) }
var_result := add(mul(or(shr(18, sub(var_prod0, var_remainder)), shl(238, sub(var_prod1, gt(var_remainder, var_prod0)))), 78156646155174841979727994598816262306175212592076161876661508869554232690281), var_roundUpUnit)
}
The code optimized with default sequence has two copies (the original and a specialization).
The one from the new sequence (-6% gas) has one.
The one from the new sequence with the TFviu
fix (+8% gas) has three.
My conclusion here is that this is again a problem with the specializer. Unfortunately this time I don't have a good fix. I can tweak the sequence to make the specialization more or less aggressive but some contracts gain while others lose. I think that dealing with this properly would require some heuristics that can decide when it's worth it to specialize. Or maybe some adjustment to inlining heuristics - those specialized functions do not get inlined and perhaps they should be.
In the light of that I'm not sure if I should really treat the original 9% increase for create_memory_array.sol
as a regression. It's balanced out by the 6% decrease for ramanujan_pi.sol
and I think it might also balance out for real contracts. That's pretty much what happens in external tests.
// for { } lt(i, length) { i := add(i, 1) } | ||
// for { } true { i := add(i, 1) } | ||
// { | ||
// let _2 := iszero(lt(i, length)) | ||
// if _2 { break } | ||
// _2 := 0 | ||
// sum := add(sum, calldataload(add(add(_1, shl(5, i)), 0x20))) | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reposting #14928 (comment), so that I can keep it contained in a thread.
A quick observation from reviewing the modified code: in a lot of cases the difference is in which variables get rematerialized or which expressions get joined. Sometimes this also prevents further optimizations (see e.g.
devcon_example.yul
for example where it interferes withForLoopConditionOutOfBody
).That's unexpected to me given that the cleanup sequence ends with
Rematerialiser
(m
) and thatExpressionJoiner
(j
) is also used quite liberally in the sequence. After some experimentation I see that this seems to be due to the extra zero assignments thatConditionalSimplifier
(C
) inserts. I'm not sure why they interfere but usingConditionalUnsimplifier
(U
) beforemu
orj
seems to help.The conclusion is that adding
U
to the cleanup sequence is probably a good idea.EDIT: An extra
O
at the end of cleanup is also needed to move the condition out of body (though it's just a cosmetic difference).
EDIT2: Maybe not just cosmetic - I expected gas use to be the same but it actually decreased by a very small amount in several tests (and increased in one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
(U
)
Tweak
- static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+ static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcUmu";
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-U
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
elementfi | +0.01% ❌ |
||
ens | -0.01% ✅ |
-0% |
0% |
euler | +0.02% ❌ |
||
gp2 | +0.06% ❌ |
||
perpetual-pools | +0.02% ❌ |
+0.03% ❌ |
+0% |
pool-together | +0.11% ❌ |
||
uniswap | +0.06% ❌ |
||
yield_liquidator | +0.03% ❌ |
+0.03% ❌ |
0% |
zeppelin | +0% |
+0.01% ❌ |
+0% |
Gas diff stats
No differences over +/-0.5%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
(U
+ O
)
Tweak
- static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+ static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcUmuO";
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-U-O
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
elementfi | +0.01% ❌ |
||
ens | -0.01% ✅ |
-0% |
0% |
euler | +0.02% ❌ |
||
gp2 | +0.06% ❌ |
||
perpetual-pools | +0.02% ❌ |
+0.03% ❌ |
+0% |
pool-together | +0.11% ❌ |
||
uniswap | +0.06% ❌ |
||
yield_liquidator | +0.03% ❌ |
+0.03% ❌ |
0% |
zeppelin | +0% |
+0.01% ❌ |
+0.11% ❌ |
Gas diff stats
No differences over +/-0.5%.
Summary
Pretty much no change compared to U
alone. The difference in method_gas
for zeppelin
is probably still below the margin or error - it sometimes gets worse in runs with only U
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison with unmodified the-good-parts-mk2
on cancun
(O
)
Tweak
- static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmu";
+ static char constexpr DefaultYulOptimiserCleanupSteps[] = "fDnTOcmuO";
External tests (ir-optimize-evm+yul
)
seqbench-sequence-the-good-parts-mk2-tweaks-cleanup-sequence-add-O
vs seqbench-sequence-the-good-parts-mk2-cancun
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
perpetual-pools | 0% |
+0% |
+0.02% ❌ |
zeppelin | 0% |
+0% |
-0.02% ✅ |
Gas diff stats
No differences at all.
Closing in favor of #15031. |
Related to #14406
Do not merge. This is just a test PR to evaluate how the
the-good-parts-mk2
sequence would affect costs in CI.A small tweak of
the-good-parts
sequence (#14909), which makes it deal better with theerc20.sol
contract.The difference is the addition of the
gvif
part before the final segment that makes code "short and pretty".