New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Yul Optimizer] Replace zero by returndatasize before calls. #11093
Conversation
0e9c28f
to
55bd7b0
Compare
if (BuiltinFunction const* builtin = _dialect.builtin(_funCall.functionName.name)) | ||
if (!builtin->sideEffects.keepsReturndatasize) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it rather weird that the side effect propagator does not seem to report the side-effects of builtins themselves...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we should change that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd need to see how it interacts with the other uses, but maybe, yeah... the only downside would be that the set is larger, but that's not a major issue I guess. And for example SideEffectsCollector::operator()(FunctionCall const& _functionCall)
has to do the same as this helper here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'll just use the SideEffectsCollector
instead for this PR for now, then we can change this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, actually I'll revert this again :-D. The SideEffectsCollector
will visit the function call arguments, which is not a problem, but not needed for this use case here, so it's just wasteful...
{ | ||
ASTModifier::operator()(_funCall); | ||
if (m_seenCall) | ||
util::BreadthFirstSearch<YulString>{{_funCall.functionName.name}, m_badFunctions}.run([&](YulString _fn, auto&& _addChild) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So yeah - this is the core part 1:
The side effect propagator already propagates invalidating returndatasize backwards through the call graph. This here now, propagates forwards, whenever the control flow I'm currently following has unknown returndatasize already.
if (!hadSeenCall && m_seenCall) | ||
{ | ||
m_badLoops.insert(&_loop); | ||
ASTModifier::operator()(_loop); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core part 2:
If I knew that returndatasize was zero before a loop, but I don't know anymore afterwards, then I must have had a call in the loop - but to properly propagate this fact down to all potential function calls in the beginning of the loop, I need to visit the loop again (this time with m_seenCall
set to true
from the start).
c6be8c3
to
228f6c1
Compare
Damn, our test suite can't handle the fact that this optimization step depends on the EVM version... |
@@ -47,7 +47,7 @@ struct OptimiserSettings | |||
"gvif" // Run full inliner | |||
"CTUcarrLsTOtfDncarrIulc" // SSA plus simplify | |||
"]" | |||
"jmuljuljul VcTOcul jmul"; // Make source short and pretty | |||
"jmuljuljul VcTOcul jmulz"; // Make source short and pretty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do it before the constant optimizer and the rematerializer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, moving it anywhere further up will increase gas cost.
The reason is probably that the rematerializer actually cannot rematerialize returndatasize
since in general it's not movable.
So I'd just keep it at the very end for now. Maybe things will be different, once we make side-effects more fine-grained.
ba21375
to
d253b2e
Compare
I think this is a nice "hack", it goes in the direction of x86 optimisations 😉 One thing to keep in mind: there are some proposals to have transaction packages, i.e. a chain of transactions as a single block. Such proposals state that returndata would be populated by the previous transaction (up until returndata is overwritten by something else). So even if we set the evmversion, this would mean such proposal can't be introduced without account versioning. While I understand people can already use this trick, I don't think anyone is, but Solidity having this step will put a large set of users on the chain. The second remark I have is how would this affect static analyzers? I think they should already track calls and any potential state, but still it may be nice to reach out to some projects. And the last remark is readability of the source/IR. I think this change reduces readability. We should really add the #10475 and include real world examples to see actual gas change and not of those synthetic tests we have. |
(local.set $_1 (i64.const 0)) | ||
(call $mstore_internal (i32.const 0) (local.get $_1) (local.get $_1) (local.get $_1) (local.get $_1)) | ||
(call $mstore_internal (i32.const 32) (local.get $_1) (local.get $_1) (local.get $_1) (i64.const 1)) | ||
(local.set $_1 (i64.extend_i32_u (call $eth.getReturnDataSize))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we don't care about wasm, but it would need an exception or another optimiser to get rid of this very expensive step :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, funnily for the wasm transform we'd actually probably need to start with a step that does exactly the inverse :-).
Yeah, fair points. As I wrote in the description, this mainly started as an exercise in seeing how complex doing such kind of analysis would be in the optimizer, because checking that memory operations are safe for the stack limit evader will involve something similar (only that's a bit more complicated, since the base property is more complex than just invalidating returndata). So I wouldn't be too unhappy with not doing this in the end - but it does pay out and I'll also expect it to keep paying out in real-world cases - not so much in runtime cost, since it's only one gas each time, but in code size and deploy cost, since I save an entire byte... I wouldn't worry too much about static analyzers with this, but the EIP is an interesting point to consider indeed. Not sure the deploy time gas savings here are worth complicating stuff like that... |
@axic just quickly skimmed through the EIP... do you think there is much reason to put a transaction in a transaction package that doesn't have specific support for it (and would proactively read the "returndata" from the last transaction)? EDIT: turns out that batching existing contract calls that do produce non-empty returndata is likely desirable for the EIP, so this is indeed an issue to consider. |
As a last point regarding your comments: |
The last comment on https://ethereum-magicians.org/t/eip-2733-transaction-package/4365/10 seems to indicate that we might not have to puzzle our heads too much over that... |
On a call with @chriseth our tendency was not to do this at least right now. I'll still keep it open for now and maybe we want to have another look at the gas savings - and it might also be interesting to have a quick look at the logic in the PR, because I think the process of defining a bad property, then propagating it from leafs in the call graph upwards (here done by the side effect propagator), and then visiting the AST forward propagating the property down calls (here done by the BreadthFirstSearch) while treating loops specially, is something we might need for memory analysis/optimization later - but generally I'm fine with closing this. |
d253b2e
to
e8f464e
Compare
Just for the fun of it, even though we probably won't merge this as is anyways, this is the gas cost changes - which are often nice, but occasionally actually bad even (not sure whay that is): Click for a table of gas differences
Ańywaýs, I think I'll just close this for now - we might want to keep the branch, though, since, as I said, this may be a good prototype for propagating global information through the Yul AST. |
This is mainly a toy optimizer step to check how complicated it gets to trace a property through the Yul AST that is changed at some random builtin function calls - but since I think it works now, it does save quite some gas actually.