C++: Fix overlappingVariableMemoryLocations perf#2570
Conversation
The `overlappingVariableMemoryLocations` predicate was a helper
predicate introduced to fix a join-order issue in
`overlappingIRVariableMemoryLocations`. Unfortunately it caused a
performance issue of its own because it could grow too large. On the
small project (38MB zip) awslabs/s2n there were 181M rows in
`overlappingVariableMemoryLocations`, and it took 134s to evaluate.
The fix is to collapse the two predicates into one and fix join ordering
by including an extra column in the predicates being joined.
In addition, some parameters were reordered to avoid the overhead of
auto-generated `join_rhs` predicates.
Tuple counts of `overlappingVariableMemoryLocations` before:
623285 ~176% {2} r1 = JOIN AliasedSSA::isCoveredOffset#fff_120#join_rhs AS L WITH AliasedSSA::isCoveredOffset#fff_120#join_rhs AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
119138 ~3% {2} r2 = SCAN AliasedSSA::VariableMemoryLocation::getVirtualVariable_dispred#ff AS I OUTPUT I.<1>, I.<0>
172192346 ~0% {2} r3 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r2.<1>
172815631 ~0% {2} r4 = r1 \/ r3
172192346 ~0% {2} r5 = JOIN r2 WITH AliasedSSA::hasUnknownOffset#ff_10#join_rhs AS R ON FIRST 1 OUTPUT r2.<1>, R.<1>
345007977 ~87% {2} r6 = r4 \/ r5
return r6
Tuple counts of `overlappingIRVariableMemoryLocations` after:
117021 ~134% {2} r1 = JOIN AliasedSSA::isCoveredOffset#ffff AS L WITH AliasedSSA::isCoveredOffset#ffff AS R ON FIRST 3 OUTPUT L.<3>, R.<3>
201486 ~1% {2} r2 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT L.<2>, R.<2>
318507 ~26% {2} r3 = r1 \/ r2
201486 ~3% {2} r4 = JOIN AliasedSSA::hasUnknownOffset#fff AS L WITH AliasedSSA::hasVariableAndVirtualVariable#fff AS R ON FIRST 2 OUTPUT R.<2>, L.<2>
519993 ~92% {2} r5 = r3 \/ r4
return r5
|
I made this change without a deep understanding of exactly what's computed and why. Please check that it computes the same thing as before. I've changed all the relations so they join on both the |
|
I won't be around to address review comments next week, so please implement any fixes directly and review them for each other instead. |
|
Yeah, there's no correctness reason to join on |
geoffw0
left a comment
There was a problem hiding this comment.
This sounds good and the code changes look good. Unfortunately it's difficult to verify any deeper than those tuple counts (which I've checked are greatly improved on other projects as well), because the affected projects all seem to time out both before and after this change. I've also verified that the predicate has the same (nontrivial) number of results before and after this change.
I'm keen to get this merged once @rdmarsh has finished tinkering with it (or as is).
|
Looks like I can't push to this branch while @jbj is on vacation. I'll merge this and make a new PR with the tinkering |
The
overlappingVariableMemoryLocationspredicate was a helper predicate introduced to fix a join-order issue inoverlappingIRVariableMemoryLocations. Unfortunately it caused a performance issue of its own because it could grow too large. On the small project (38MB zip) awslabs/s2n there were 181M rows inoverlappingVariableMemoryLocations, and it took 134s to evaluate.The fix is to collapse the two predicates into one and fix join ordering by including an extra column in the predicates being joined.
In addition, some parameters were reordered to avoid the overhead of auto-generated
join_rhspredicates.Tuple counts of
overlappingVariableMemoryLocationsbefore:Tuple counts of
overlappingIRVariableMemoryLocationsafter: