C++: Split index calculation from BB membership #123
Instead of computing these two things in one predicate, they are computed in separate predicates and then joined. This splits the predicate `primitive_basic_block_member`, which took 77s before, into predicates that together take 18s on a medium-sized db.
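As a rough illustration of the split described above (not the QL from this PR), the sketch below computes the index of each node and its owning block in two separate passes and then joins them. The function names, the `step` dictionary, and the assumption that each node has at most one in-block successor are all hypothetical, chosen only to make the idea concrete.

```python
def member_index(entries, step):
    """Position of each node in its block: entry nodes are 0, each step adds 1."""
    index = {entry: 0 for entry in entries}
    frontier = list(entries)
    while frontier:
        node = frontier.pop()
        nxt = step.get(node)  # hypothetical: next node inside the same block
        if nxt is not None and nxt not in index:
            index[nxt] = index[node] + 1
            frontier.append(nxt)
    return index


def block_of(entries, step):
    """Owning entry node of each node, found by walking forward from each entry."""
    owner = {}
    for entry in entries:
        node = entry
        while node is not None and node not in owner:
            owner[node] = entry
            node = step.get(node)
    return owner


def block_members(entries, step):
    """Join the two separately computed relations into (node, block, index)."""
    index = member_index(entries, step)
    owner = block_of(entries, step)
    return {(node, owner[node], index[node]) for node in owner}


# Two blocks: a -> b -> c and d -> e.
members = block_members(["a", "d"], {"a": "b", "b": "c", "d": "e"})
```

Here `members` holds one `(node, block, index)` triple per node, which mirrors the three columns of `primitive_basic_block_member`.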
This change gave a slight speed-up by eliminating an unnecessary intermediate predicate.
I've retargeted this to ...
If someone wants to test this for performance, be sure not to trust the "Clause timing report" printed at the end of the log in QL4E. I've observed it to underestimate the time spent on recursive predicates by a factor of 10. It's better to search through the whole log and subtract the timestamps between the beginning and end of the evaluation of the predicate(s) of interest.
The QL is correct. It's surprising that it gives such a big speed up! The recursion in ...
I can't really explain why it gets so much faster -- or rather, why it's so slow in the first place. It could be a bad join order that just happens to get fixed by changing up the definitions. I did a little bit of fiddling with ... I also tried using the ...
By the way, I've been testing for correctness by checking that this query gives the same results before and after:
I should also follow up by checking if the IRBlock construction can benefit from the same change.
Does this merit a change note?
I don't think there's a need for a change note. Every release makes some things slower (but better) and other things faster, and we hope it roughly balances out.
LGTM
I haven't had a chance to reproduce the performance improvement, but I will try to do so today. Feel free to merge before that.
I just tested this on Wireshark and saw only a reduction from 35s to 19s. Still, I think it's worth merging even if performance is already adequate on some snapshots.
The use of transitive closure for BB index calculation has been the cause of an out-of-memory error. This commit switches the calculation to use the `shortestDistances` HOP, which still has the problem that the result needs to fit in RAM, but at least the RAM requirements are sure to be linear in the size of the result.

The `shortestDistances` HOP is already used for BB index calculation for the C++ IR and for C#. We could guard even better against OOM by switching the calculation to use manual recursion, but that would undo the much-needed performance improvements we got from github#123.

This change improves performance on Wireshark, which is notorious for having long basic blocks. When I benchmarked `shortestDistances` for github#123, it was slower than TC. With the current evaluator, it looks like `shortestDistances` is faster. Performance before was:

    PrimitiveBasicBlocks::Cached::getMemberIndex#ff ................... 9.7s (executed 8027 times)
    #PrimitiveBasicBlocks::Cached::member_step#ffPlus ................. 6.6s
    PrimitiveBasicBlocks::Cached::primitive_basic_block_entry_node#f .. 3.5s
    PrimitiveBasicBlocks::Cached::primitive_basic_block_member#fff .... 2.3s

Performance with this commit is:

    PrimitiveBasicBlocks::Cached::primitive_basic_block_entry_node#f ................................................................... 3.5s
    shortestDistances@PrimitiveBasicBlocks::Cached::primitive_basic_block_entry_node#1@PrimitiveBasicBlocks::Cached::member_step#2#fff . 3s
    PrimitiveBasicBlocks::Cached::primitive_basic_block_member#fff ..................................................................... 963ms
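To make the size difference between the two approaches concrete, here is a small sketch, not the evaluator's implementation, that contrasts a full transitive closure with per-entry shortest distances on a single long chain of nodes. Both function names and the `edges` dictionary are hypothetical stand-ins for the `member_step` relation and the entry-node set.

```python
from collections import deque


def transitive_closure(edges):
    """All (start, reachable) pairs, for every node that has an outgoing edge."""
    pairs = set()
    for start in edges:
        seen = set()
        stack = list(edges[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        pairs.update((start, node) for node in seen)
    return pairs


def shortest_distances(sources, edges):
    """(source, node, distance) triples via breadth-first search from each source."""
    result = set()
    for source in sources:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for succ in edges.get(node, ()):
                if succ not in dist:
                    dist[succ] = dist[node] + 1
                    queue.append(succ)
        result.update((source, node, d) for node, d in dist.items())
    return result


# One "basic block" of five nodes: 0 -> 1 -> 2 -> 3 -> 4, entered at node 0.
chain = {i: [i + 1] for i in range(4)}
closure = transitive_closure(chain)
distances = shortest_distances([0], chain)
```

On a chain of n nodes the closure holds n(n-1)/2 pairs while the distance relation from the single entry holds n triples, which is one way to see why long basic blocks are costly for the closure-based formulation.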
This change gave me a great speedup. I'm not totally done testing this, but I'm opening the PR just to mark it as a 1.18 item.