
Reverse lookup for DataFlowAnalyzer #14112

Open

nikola-matic wants to merge 1 commit into develop from dataflow-analyzer-reverse-lookup

Conversation

nikola-matic (Collaborator) commented Apr 12, 2023

Part of #13719 and #13822.

Benchmarking was run on the chains.sol contract from our repo; per-step optimizer timings are below.

Mostly helps with CSE (CommonSubexpressionEliminator), LiteralRematerialiser, LoadResolver and ExpressionSimplifier, i.e. steps that inherit from, and thus rely on, the DataFlowAnalyzer. The results are as follows:

This PR is essentially a rework of @chriseth's Improve DataFlowAnalyzer PR, which included many other improvements (all of which, I believe, have already been merged in separate PRs).
The question here is whether we'd prefer the approach taken here (where the in-order and reverse lookups are wrapped in a separate class) or the one in Chris's PR, where the reverse lookup is simply added to the DataFlowAnalyzer::State object and used directly from there.
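
For context, the wrapper approach looks conceptually like the sketch below. This is illustrative only: std::string stands in for YulString so the snippet is self-contained, and the actual class in libyul/optimiser/VariableAssignmentMap.h may differ in its exact signatures.

```cpp
#include <set>
#include <string>
#include <unordered_map>

// Rough sketch of the wrapper idea, not the PR's actual implementation.
class VariableAssignmentMap
{
public:
	// Record that _variable was assigned an expression referencing _references.
	void set(std::string const& _variable, std::set<std::string> const& _references)
	{
		erase(_variable);
		m_ordered[_variable] = _references;
		for (auto const& reference: _references)
			m_reversed[reference].insert(_variable);
	}

	// Remove _variable from both directions of the lookup.
	void erase(std::string const& _variable)
	{
		for (auto const& reference: m_ordered[_variable])
			if (auto it = m_reversed.find(reference); it != m_reversed.end())
			{
				it->second.erase(_variable);
				if (it->second.empty())
					m_reversed.erase(it);
			}
		m_ordered.erase(_variable);
	}

	std::set<std::string> const* getOrderedOrNullptr(std::string const& _variable) const
	{
		auto it = m_ordered.find(_variable);
		return it == m_ordered.end() ? nullptr : &it->second;
	}

	std::set<std::string> const* getReversedOrNullptr(std::string const& _variable) const
	{
		auto it = m_reversed.find(_variable);
		return it == m_reversed.end() ? nullptr : &it->second;
	}

private:
	// variable -> variables its current value was built from
	std::unordered_map<std::string, std::set<std::string>> m_ordered;
	// variable -> variables whose current values reference it
	std::unordered_map<std::string, std::set<std::string>> m_reversed;
};
```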

develop (baseline)

Performance metrics of optimizer steps
======================================
  0.000% (8e-06 s): FunctionGrouper
  0.001% (1.4e-05 s): VarDeclInitializer
  0.001% (2.3e-05 s): ForLoopInitRewriter
  0.003% (8.1e-05 s): FunctionHoister
  0.017% (0.000421 s): ExpressionInliner
  0.035% (0.00085 s): UnusedFunctionParameterPruner
  0.075% (0.001829 s): ForLoopConditionIntoBody
  0.077% (0.001873 s): ForLoopConditionOutOfBody
  0.082% (0.002001 s): SSAReverser
  0.093% (0.002263 s): StructuralSimplifier
  0.135% (0.003282 s): BlockFlattener
  0.148% (0.003606 s): CircularReferencesPruner
  0.223% (0.00542 s): ExpressionJoiner
  0.235% (0.005713 s): EquivalentFunctionCombiner
  0.313% (0.007626 s): Rematerialiser
  0.335% (0.008165 s): LoopInvariantCodeMotion
  0.561% (0.013661 s): FunctionSpecializer
  0.845% (0.020568 s): ExpressionSplitter
  0.902% (0.021972 s): ConditionalUnsimplifier
  0.961% (0.023401 s): ConditionalSimplifier
  1.059% (0.025783 s): ControlFlowSimplifier
  1.072% (0.026108 s): DeadCodeEliminator
  1.551% (0.037774 s): EqualStoreEliminator
  3.646% (0.088806 s): FullInliner
  3.853% (0.09384 s): UnusedStoreEliminator
  4.819% (0.117355 s): UnusedPruner
  7.358% (0.179209 s): SSATransform
  7.365% (0.179365 s): UnusedAssignEliminator
  9.883% (0.240684 s): ExpressionSimplifier
 11.218% (0.273217 s): LoadResolver
 16.676% (0.40613 s): LiteralRematerialiser
 26.459% (0.644396 s): CommonSubexpressionEliminator
--------------------------------------
    100% (2.435 s)

with reverse lookup (this)

Performance metrics of optimizer steps
======================================
  0.001% (1e-05 s): FunctionGrouper
  0.001% (2.1e-05 s): VarDeclInitializer
  0.002% (2.9e-05 s): ForLoopInitRewriter
  0.006% (9.3e-05 s): FunctionHoister
  0.026% (0.00041 s): ExpressionInliner
  0.053% (0.000846 s): UnusedFunctionParameterPruner
  0.113% (0.001798 s): ForLoopConditionIntoBody
  0.125% (0.001986 s): ForLoopConditionOutOfBody
  0.134% (0.002124 s): SSAReverser
  0.146% (0.002312 s): StructuralSimplifier
  0.183% (0.002902 s): BlockFlattener
  0.228% (0.003604 s): CircularReferencesPruner
  0.332% (0.005263 s): ExpressionJoiner
  0.336% (0.005319 s): Rematerialiser
  0.367% (0.005814 s): EquivalentFunctionCombiner
  0.514% (0.008142 s): LoopInvariantCodeMotion
  0.963% (0.015248 s): FunctionSpecializer
  1.211% (0.019182 s): EqualStoreEliminator
  1.309% (0.020744 s): ExpressionSplitter
  1.360% (0.021539 s): ConditionalUnsimplifier
  1.429% (0.022642 s): ConditionalSimplifier
  1.634% (0.02588 s): ControlFlowSimplifier
  1.656% (0.026226 s): DeadCodeEliminator
  5.166% (0.081833 s): FullInliner
  5.865% (0.092911 s): UnusedStoreEliminator
  7.376% (0.116846 s): UnusedPruner
  8.498% (0.134622 s): LoadResolver
  8.557% (0.135551 s): ExpressionSimplifier
  9.066% (0.143613 s): LiteralRematerialiser
 11.084% (0.175588 s): SSATransform
 11.256% (0.178307 s): UnusedAssignEliminator
 21.004% (0.332742 s): CommonSubexpressionEliminator
--------------------------------------
    100% (1.584 s)

nikola-matic force-pushed the dataflow-analyzer-reverse-lookup branch from 45dc45c to aabdfa9 on April 12, 2023 13:47
nikola-matic self-assigned this on April 12, 2023
nikola-matic marked this pull request as draft on April 12, 2023 14:06
nikola-matic force-pushed the dataflow-analyzer-reverse-lookup branch 2 times, most recently from 830857a to 5f80ae7 on April 12, 2023 15:50
ethereum deleted 9 comments from stackenbotten on April 12, 2023
nikola-matic marked this pull request as ready for review on April 20, 2023 12:40
nikola-matic (Collaborator, Author):

Also, @chriseth, do you have a suggestion for the changelog entry?

nikola-matic force-pushed the dataflow-analyzer-reverse-lookup branch 2 times, most recently from 88545c5 to f759e94 on April 24, 2023 17:09
nikola-matic force-pushed the dataflow-analyzer-reverse-lookup branch 2 times, most recently from f79d84e to b6a2886 on May 11, 2023 09:00
nikola-matic force-pushed the dataflow-analyzer-reverse-lookup branch from b6a2886 to 232d967 on May 16, 2023 08:05

/**
* Erase entries in both maps based on provided ``_variable``.
* For example, after deleting ``c`` for ``Ref 1``, ``Ref 1`` would contain the following:
chriseth (Contributor):

Can we say that erase(x) is exactly the same as set(x, {})?

nikola-matic (Collaborator, Author):

Eh, yes and no, since we do `m_ordered[_variable] = _references;`, which will still make an insertion for key x. I don't think it would ultimately affect the behaviour. I could insert an empty check for _references however, in which case your statement would be fully correct.

edit: Actually, I'm assuming you already knew that this will insert a key with an empty value, so yes, it's exactly the same as set(x, {}).

cameel (Member):

I don't get your answer here. You're saying that it's not the same but then that it's the same after all. It doesn't look the same to me.

> I could insert an empty check for _references however, in which case your statement would be fully correct.

I'd add this check because I don't think we care about distinguishing x being assigned an empty set of variables from not being assigned anything. The former can't even be expressed in the language. With this check the behavior of the container will be more consistent - currently with getOrderedOrNullptr() you have to check for an empty set explicitly, while with getReversedOrNullptr() you can assume you'll always get nullptr instead.
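
Concretely, the variant being suggested might look like this (hypothetical, written against the std::string stand-in sketch in the PR description, not the PR's actual code):

```cpp
// Hypothetical set() with the early return discussed above, so that
// set(x, {}) and erase(x) become observably identical.
void set(std::string const& _variable, std::set<std::string> const& _references)
{
	erase(_variable);
	if (_references.empty())
		return; // treat "assigned an empty set" the same as "not tracked at all"
	m_ordered[_variable] = _references;
	for (auto const& reference: _references)
		m_reversed[reference].insert(_variable);
}
```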

cameel (Member):

Actually, now that I think about it, it can be expressed after all. x can just be assigned a constant expression that does not depend on other variables. So yeah, depends on whether we want the ability to express that. Does not seem to me like we're using that distinction for anything currently.

nikola-matic (Collaborator, Author):

The edit's the final answer, i.e. yes, it's exactly as Chris suggested; adding the empty _references check would alter the behaviour of the analysis (I would assume we'd see failing tests, but I'd have to check). That is, we fetch by key and then use the value set either to perform arithmetic (i.e. add two sets together) or to do a lookup, neither of which needs an empty-set check.

cameel (Member):

OK, you're right about this altering the analysis.

But I'm still confused as to why you think erase(x) and set(x, {}) would be equivalent. This just does not seem true to me. Are you referring to the fact that m_ordered[_variable] will modify m_ordered and insert the key if it's not there? You're still doing m_ordered.erase() at the end of the function, so yes, it will technically insert the key, but the key won't be there when the function finishes. So I don't think it's true that "The behaviour is the same as ``set("x", {})``". This bit should be removed from the docstring unless I'm missing something here.
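
To make the observable difference concrete (using the std::string stand-in sketch from the PR description, which, like the PR, has no empty-set check in set()):

```cpp
#include <cassert>

int main()
{
	VariableAssignmentMap m; // stand-in sketch, not the PR's actual class
	m.set("x", {"y"});

	m.erase("x");
	assert(m.getOrderedOrNullptr("x") == nullptr);  // key is gone entirely

	m.set("x", {});
	assert(m.getOrderedOrNullptr("x") != nullptr);  // key stays, mapped to an empty set
	assert(m.getOrderedOrNullptr("x")->empty());
}
```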

@nikola-matic nikola-matic force-pushed the dataflow-analyzer-reverse-lookup branch from 232d967 to 80454c1 Compare May 26, 2023 08:49
chriseth (Contributor):

Looks good! Please squash.

@nikola-matic nikola-matic force-pushed the dataflow-analyzer-reverse-lookup branch from 80454c1 to a8cc9bd Compare May 26, 2023 09:09
nikola-matic (Collaborator, Author):

> Looks good! Please squash.

Done!

@@ -0,0 +1,39 @@
#include <libyul/optimiser/VariableAssignmentMap.h>
cameel (Member) commented May 26, 2023:

Something felt off to me about this file and I finally realized why. It looks too clean. We can't have such nice things here :P You must add the ugly license boilerplate.

* Class that implements a reverse lookup for an ``unordered_map<YulString, set<YulString>>`` by wrapping the
* two such maps - one ordered, and one reversed, e.g.
*
* m_ordered m_reversed
cameel (Member):

This naming is a bit confusing. What specifically is ordered here? Its type is unordered_map so it surely can't be referring to the order of elements, can it?

Maybe something like m_assignments and m_uses would be better names? Or maybe m_lValues and m_rValues?

nikola-matic (Collaborator, Author):

It's ordered in the sense that it's the opposite of reversed. m_assignments and m_uses isn't really correct either, e.g. if we have

a = x + y;

then (a,x) and (a,y) could be assignments - but what are the uses? (x, a), (y, a)? That doesn't really make sense to me either. `lvalues` and `rvalues` do make sense, but are then somewhat confusing in terms of C++ semantics. In any case, I've spent quite a while trying to come up with names for these, and I'm still convinced these are the best, especially since the whole purpose of this PR is to implement a reverse lookup.
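
For the example above, the two directions would hold the following (plain standard-library stand-ins, not the PR's actual members):

```cpp
#include <set>
#include <string>
#include <unordered_map>

int main()
{
	// For the assignment  a = x + y  (Yul: a := add(x, y)):
	std::unordered_map<std::string, std::set<std::string>> ordered{
		{"a", {"x", "y"}}  // a's current value was computed from x and y
	};
	std::unordered_map<std::string, std::set<std::string>> reversed{
		{"x", {"a"}},      // x occurs in the expression currently stored for a
		{"y", {"a"}}
	};
	(void)ordered; (void)reversed;
}
```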

*
* m_ordered m_reversed
* x -> (y, z,) y -> (x,)
* y -> (z,)
cameel (Member):

Suggested change
* y -> (z,)
* z -> (x,)


if (names.count(variableToClear))
_variables.emplace(ref);
if (auto&& references = m_state.references.getReversedOrNullptr(variableToClear))
_variables += *references;
cameel (Member):

Are you sure getReversedOrNullptr() will never return nullptr here? Seems like it should happen any time we have a variable that's assigned to and then never used. If this does not crash on a null dereference then perhaps we don't have any case like that in tests? Sounds unlikely though.

Please either handle nullptr here or add an assert.

By the way, having two completely different things named references here makes this bit unnecessarily confusing to read.

nikola-matic (Collaborator, Author):

It's handled already (see if condition). What do you mean by having two things named references? auto&& references and m_state.references?

cameel (Member) commented May 26, 2023:

> It's handled already (see if condition).

Ah, you're right. That's why I'm not really sold on this if-declaration syntax personally. It can be convenient but sometimes it just does not register as a proper condition when I read it :)

> What do you mean by having two things named references? auto&& references and m_state.references?

Yes.
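
For readers skimming the diff, the two spellings are equivalent; the pointer returned by getReversedOrNullptr() converts to false when it is nullptr. The fragment below uses the PR's own identifiers and is illustrative only:

```cpp
// Condition-with-declaration form used in the PR:
if (auto&& references = m_state.references.getReversedOrNullptr(variableToClear))
	_variables += *references;

// More explicit equivalent (illustrative only):
{
	auto&& references = m_state.references.getReversedOrNullptr(variableToClear);
	if (references != nullptr)
		_variables += *references;
}
```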

Comment on lines 354 to 357
// Also clear variables that reference variables to be cleared.
for (auto const& variableToClear: _variables)
for (auto const& [ref, names]: m_state.references)
if (names.count(variableToClear))
_variables.emplace(ref);
if (auto&& references = m_state.references.getReversedOrNullptr(variableToClear))
_variables += *references;
cameel (Member) commented May 26, 2023:

This whole loop looks suspect to me, both before and after your change. We're inserting new items into _variables while iterating over it. Apparently adding to a set in C++ does not invalidate iterators so this won't crash if it's wrong and we have a finite number of items so it won't fall into an infinite loop either. Still, it looks like it would inconsistently iterate over some added elements while skipping others - depending on whether they sort before or after the current element. Also, iterating over newly added elements seems wrong in the first place because the comment above says we're not supposed to clear variables recursively and that would basically be the result.

I think that clearing too little is more dangerous here than clearing too much, and therefore this does not outright break things. It probably just makes the analyzer less effective instead.
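
One hypothetical way to make the non-recursive intent explicit (not part of this PR; shown as a fragment against the PR's own identifiers) would be to iterate over a snapshot so that variables inserted during the loop are not themselves processed:

```cpp
// Hypothetical rewrite: clear exactly one level of dependents, deterministically.
std::set<YulString> const snapshot = _variables;
for (auto const& variableToClear: snapshot)
	if (auto&& references = m_state.references.getReversedOrNullptr(variableToClear))
		_variables += *references;
```

The extra copy costs something, but it removes the question of whether newly inserted elements are visited.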



Comment on lines +16 to +26
for (auto&& reference: m_ordered[_variable])
	if (m_reversed.find(reference) != m_reversed.end())
	{
		if (m_reversed[reference].size() > 1)
			m_reversed[reference].erase(_variable);
		else
			// Only fully remove an entry if no variables other than _variable
			// are contained in the set pointed to by reference.
			m_reversed.erase(reference);
	}
m_ordered.erase(_variable);
cameel (Member):

There's an easy micro-optimization here - you're looking up reference 3 times, while you could instead just find an iterator to the element and then use that. Similar with _variable - 2 separate lookups.

It's a fixed number of times so it won't change the complexity of the whole algorithm, but it's very easy to do. I'm curious if it will make any kind of difference in benchmark results given that we perform this operation a lot. The lookup is cheap (amortized constant time for the unordered_map), so nothing compared to the linear search we had before, but still worth a try.
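
A sketch of what that could look like (hypothetical, mirroring the erase() quoted above but with a single lookup per key; not code from this PR, and the real signature may differ):

```cpp
void erase(YulString _variable)
{
	auto orderedIt = m_ordered.find(_variable);
	if (orderedIt == m_ordered.end())
		return;
	for (auto const& reference: orderedIt->second)
		if (auto reversedIt = m_reversed.find(reference); reversedIt != m_reversed.end())
		{
			if (reversedIt->second.size() > 1)
				reversedIt->second.erase(_variable);
			else
				// Only fully remove an entry if no variables other than _variable
				// are contained in the set pointed to by reference.
				m_reversed.erase(reversedIt);
		}
	m_ordered.erase(orderedIt);
}
```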

cameel (Member) commented May 26, 2023:

I checked how this PR affects compilation times in external tests.
Here are the times for the ir-optimize-evm+yul preset from run 29920 on develop (8c7404f) vs run 29928 on dataflow-analyzer-reverse-lookup (a8cc9bd).

| test | develop | this PR | difference |
|---|---|---|---|
| zeppelin | 6m48.256s | 5m46.622s | -1:02 |
| colony | 1m51.012s | 1m27.582s | -0:24 |
| uniswap | 1m46.737s | 1m40.339s | -0:06 |
| prb-math | 0m13.031s | 0m10.887s | -0:03 |
| gp2 | 0m44.495s | 0m43.621s | -0:01 |
| brink | 0m04.842s | 0m05.302s | +0:01 |
| ens | 0m45.856s | 0m48.436s | +0:03 |
| perpetual-pools | 1m28.025s | 1m32.749s | +0:04 |
| yield-liquidator | 0m31.450s | 0m38.634s | +0:07 |
| elementfi | 2m48.506s | 3m15.299s | +0:27 |

I wonder what's happening with ElementFi. I double-checked that I got the numbers from the right pages, and it looks like it really took ~30 s longer. It may be worth checking whether that's repeatable or just a fluke. Still, some other times increased too, so there might be something more to it.

The table contains only the real time. Full results are below in case anyone's interested.

zeppelin
real	6m48.256s user	6m47.482s sys	0m2.189s
real	5m46.622s user	5m46.191s sys	0m1.551s

uniswap
real	1m46.737s user	1m48.668s sys	0m0.732s
real	1m40.339s user	1m41.697s sys	0m0.673s

ens
real	0m45.856s user	0m45.927s sys	0m0.656s
real	0m48.436s user	0m48.450s sys	0m0.580s

gp2
real	0m44.495s user	0m52.945s sys	0m1.442s
real	0m43.621s user	0m51.973s sys	0m1.369s

brink
real	0m4.842s user	0m4.763s sys	0m0.603s
real	0m5.302s user	0m5.272s sys	0m0.556s

elementfi
real	2m48.506s user	2m56.110s sys	0m2.225s
real	3m15.299s user	3m24.894s sys	0m2.584s

perpetual-pools
real	1m28.025s user	1m30.875s sys	0m1.107s
real	1m32.749s user	1m36.234s sys	0m1.284s

prb-math
real	0m13.031s user	0m13.804s sys	0m0.469s
real	0m10.887s user	0m11.300s sys	0m0.444s

yield-liquidator
real	0m31.450s user	0m34.304s sys	0m0.567s
real	0m38.634s user	0m41.636s sys	0m0.534s

colony
real	1m51.012s user	2m17.810s sys	0m3.484s
real	1m27.582s user	1m50.964s sys	0m2.876s

github-actions (bot):
This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

github-actions bot added the stale label (the issue/PR was marked as stale because it has been open for too long) on Jun 19, 2023
nikola-matic added the roadmap label and removed the stale label on Jun 19, 2023
cameel force-pushed the dataflow-analyzer-reverse-lookup branch from a8cc9bd to 0d1ec65 on April 19, 2024 16:39
cameel (Member) commented Apr 19, 2024:

I gathered fresh timing data (and translated the original data into the same format).

  1. It seems to me that all the timing differences we see in external tests, even the positive ones, may be just noise:
    • The differences are much smaller now. Especially for zeppelin, elementfi and colony, which originally had the largest difference, the difference not only decreased but completely reversed direction.
    • In some cases the differences flip between positive and negative in different runs for the same project.
    • zeppelin and uniswap compile much faster on develop now. In case of zeppelin this could very well be due to upstream changes in tests; for uniswap, though, we have a forked repo, so that explanation doesn't apply.
  2. On the other hand, there is actually a 2x difference in our own timing benchmark - though only for a single contract (chains.sol). Others, e.g. OptimizorClub.sol, do not seem affected.

My conclusion here is that there's no strong evidence that this change significantly affects external tests, be it positively or negatively. If there is a difference, it's lower than the normal variance of CI timing. Still, the change does seem to help one especially pathological contract from our repo (chains.sol). I wonder if that might be because profiling was done specifically on that contract, so the PR addresses its particular bottleneck, which happens to be different from the one in more common scenarios.

Original timing

| Project | develop | this PR |
|---|---|---|
| brink | 5 s | 5 s |
| colony | 111 s | 88 s |
| elementfi | 169 s | 195 s |
| ens | 46 s | 48 s |
| euler | | |
| gnosis | | |
| gp2 | 44 s | 44 s |
| perpetual-pools | 88 s | 93 s |
| pool-together | | |
| uniswap | 107 s | 100 s |
| yield_liquidator | 31 s | 39 s |
| zeppelin | 408 s | 347 s |
| prb-math | 13 s | 11 s |

Current timing

| Project | develop | this PR (run 1) | this PR (run 2) | this PR (run 3) |
|---|---|---|---|---|
| brink | 5 s | 4 s | 5 s | 5 s |
| colony | 105 s | 113 s | 110 s | 114 s |
| elementfi | 172 s | 162 s | 163 s | 167 s |
| ens | 40 s | 43 s | 43 s | 47 s |
| euler | 52 s | 59 s | 62 s | 72 s |
| gnosis | | | | |
| gp2 | 42 s | 48 s | 47 s | 60 s |
| perpetual-pools | 81 s | 72 s | 84 s | 79 s |
| pool-together | 56 s | 49 s | 51 s | 61 s |
| uniswap | 82 s | 84 s | 71 s | 84 s |
| yield_liquidator | 26 s | 27 s | 26 s | 25 s |
| zeppelin | 261 s | 266 s | 274 s | 277 s |
| prb-math | | | | |

Timing diff with develop

| Project | Diff (original) | Diff (run 1) | Diff (run 2) | Diff (run 3) |
|---|---|---|---|---|
| brink | 1 s | -1 s | 0 s | 0 s |
| colony | -24 s | 8 s | 5 s | 9 s |
| elementfi | 27 s | -10 s | -9 s | -5 s |
| ens | 3 s | 3 s | 3 s | 7 s |
| euler | | 7 s | 10 s | 20 s |
| gnosis | | | | |
| gp2 | -1 s | 6 s | 5 s | 18 s |
| perpetual-pools | 4 s | -9 s | 3 s | -2 s |
| pool-together | | -7 s | -5 s | 5 s |
| uniswap | -6 s | 2 s | -11 s | 2 s |
| yield_liquidator | 7 s | 1 s | 0 s | -1 s |
| zeppelin | -62 s | 5 s | 13 s | 16 s |
| prb-math | -3 s | | | |

Timing benchmark

develop

Binary from b_ubu_static from the last run on develop

| File | Pipeline | Bytecode size | Time | Exit code |
|---|---|---|---|---|
| verifier.sol | legacy | 4940 bytes | 0.20 s | 0 |
| verifier.sol | via-ir | 4417 bytes | 0.66 s | 0 |
| OptimizorClub.sol | legacy | 0 bytes | 0.52 s | 1 |
| OptimizorClub.sol | via-ir | 22391 bytes | 4.05 s | 0 |
| chains.sol | legacy | 5878 bytes | 0.17 s | 0 |
| chains.sol | via-ir | 23076 bytes | 23.16 s | 0 |

this PR

Binary from b_ubu_static from the last run on dataflow-analyzer-reverse-lookup

| File | Pipeline | Bytecode size | Time | Exit code |
|---|---|---|---|---|
| verifier.sol | legacy | 4940 bytes | 0.14 s | 0 |
| verifier.sol | via-ir | 4417 bytes | 0.65 s | 0 |
| OptimizorClub.sol | legacy | 0 bytes | 0.57 s | 1 |
| OptimizorClub.sol | via-ir | 22391 bytes | 4.23 s | 0 |
| chains.sol | legacy | 5878 bytes | 0.17 s | 0 |
| chains.sol | via-ir | 23076 bytes | 10.52 s | 0 |
