Optimize optimizer eliminate stage #3732
#ifdef PROFILING
tstmtscan += clock() - start;
start = clock();
#endif
Indentation looks off here and on the block above.
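For readers unfamiliar with the pattern in the snippet above: under `#ifdef PROFILING`, CPU time from `clock()` is charged to a per-phase accumulator and the timer is restarted for the next phase. The sketch below shows that pattern in isolation. It is an illustration only: `tStmtScan`, `tRest`, `busyWork`, `runPhases` and `dumpProfile` are hypothetical names, not the optimizer's actual code.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>

#ifdef PROFILING
// Hypothetical per-phase accumulators (names are illustrative).
static std::clock_t tStmtScan = 0;
static std::clock_t tRest = 0;
#endif

// Stand-in for real work, so each phase has something to time.
static long busyWork(int n) {
  long sum = 0;
  for (int i = 0; i < n; i++) sum += i % 7;
  return sum;
}

long runPhases(int n) {
#ifdef PROFILING
  std::clock_t start = std::clock();
#endif
  long result = busyWork(n);  // phase 1: "statement scan"
#ifdef PROFILING
  tStmtScan += std::clock() - start;  // charge elapsed CPU time to phase 1
  start = std::clock();               // restart the timer for phase 2
#endif
  result += busyWork(n / 2);  // phase 2: everything else
#ifdef PROFILING
  tRest += std::clock() - start;
#endif
  return result;
}

#ifdef PROFILING
// Print the accumulated totals, e.g. once at the end of a run.
void dumpProfile() {
  std::printf("stmt scan: %.3fs, rest: %.3fs\n",
              double(tStmtScan) / CLOCKS_PER_SEC,
              double(tRest) / CLOCKS_PER_SEC);
}
#endif
```

Compiling with `-DPROFILING` enables the timers. Without it, the timing code is removed by the preprocessor and the hot path pays nothing for it.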
Branch updated from 4f73ab0 to 31e528a.
All core asm2 tests succeed.
Reading the code, this all looks good (and I like the refactoring), and those numbers sound great. However, I tested on another codebase (a large game engine) and actually saw a slowdown, from 7.37 seconds to 8.27 seconds. I also tested on Poppler from the test suite, with […]. This makes me think we should test this more: it might help the original testcase that motivated this but harm others.
Is the game engine available for me to compile? Possibly as a .bc? Big, slow testcases are good. I'll take a look at Poppler as well and see if I can massage it into something usable. I'm kinda surprised it's slower, but on reflection I can see that the changes I made could punish lots of small functions; I wonder if that's what's happening. There is a fairly invasive change I was considering but held off on until I'd heard the thoughts so far. Let me work on that and see if it delivers any improvement.
The engine isn't open, even as a .bc. But I'll test locally tomorrow with each patch separately, to at least try to narrow down the issue. I'm hoping, though, that this isn't specific to Unity. Maybe Poppler or BananaBread (I can provide a .bc for that) with […]
I suspect you'll find it's the final commit, 31e528a. I really should've gone through the benchmarks before; I'll try to do so tomorrow.
Yes, it does look like the last commit ("Avoid iterating over irrelevant tracked variables") is where the Unity slowdown comes from.
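The idea named in that commit title can be illustrated with a reverse-dependency index. When a variable is assigned, every tracked variable whose value reads it must be invalidated. A naive pass scans all tracked variables. An index from each dependency to its dependents touches only the affected entries. The sketch below is hypothetical throughout: `TrackedVars` and its methods are illustrative and not the eliminator's real data structures, and it is not the reverted commit's code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical tracker: names and layout are illustrative only.
class TrackedVars {
 public:
  // Start tracking `var`, whose value depends on the variables in `deps`.
  void track(const std::string& var, const std::vector<std::string>& deps) {
    untrack(var);  // drop any stale index entries from a previous tracking
    deps_[var] = deps;
    for (const auto& d : deps) dependents_[d].insert(var);
  }

  bool isTracked(const std::string& var) const { return deps_.count(var) != 0; }

  // `var` was assigned: invalidate every tracked variable that reads it.
  // Only dependents_[var] is visited, not every tracked variable.
  void invalidateDependentsOf(const std::string& var) {
    auto it = dependents_.find(var);
    if (it == dependents_.end()) return;
    // Copy first: untrack() edits (and may erase) the set we are reading.
    std::vector<std::string> victims(it->second.begin(), it->second.end());
    for (const auto& v : victims) untrack(v);
  }

 private:
  void untrack(const std::string& var) {
    auto it = deps_.find(var);
    if (it == deps_.end()) return;
    for (const auto& d : it->second) {
      auto dep = dependents_.find(d);
      if (dep == dependents_.end()) continue;
      dep->second.erase(var);
      if (dep->second.empty()) dependents_.erase(dep);
    }
    deps_.erase(it);
  }

  std::unordered_map<std::string, std::vector<std::string>> deps_;
  std::unordered_map<std::string, std::unordered_set<std::string>> dependents_;
};
```

One plausible trade-off, offered as a guess rather than a diagnosis: the index adds bookkeeping to every `track`/`untrack`. In many small functions with only a few tracked variables, that overhead can outweigh the cost of the scan it replaces.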
Branch updated from 7a28638 to 0edde45.
Can you try the latest branch against Unity?
Sorry, that posted before I was ready. I've got rid of the problematic commit, and the lambdas testcase still speeds up, from 42s to 18s. There's even a very small SQLite speedup (~3%).
Branch updated from 6eb10bc to 46b372f.
I've added a couple of new commits after managing to grab the Unity .bc file from a Unity install, and I'm mostly happy with where this is up to. I benchmarked the optimizer run directly against an arbitrary Unity JS chunk (14M), SQLite (14M) and the lambdas testcase (22M). Each measurement covers all O3 optimisation passes, best of 5 runs, user time as reported by the […]
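The "best of 5 runs" methodology can be sketched as a small harness. It runs the same work several times and keeps the minimum CPU time, since the minimum is the run least disturbed by other processes and cold caches. The sketch assumes `std::clock` as the timer. On POSIX it reports process CPU time (user plus system), so it approximates but is not identical to the `user` figure from a shell timer. `bestOfRuns` is a hypothetical helper, not part of the optimizer.

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <functional>
#include <limits>

// Hypothetical helper: run `work` `runs` times and return the smallest
// CPU time observed, in seconds.
double bestOfRuns(int runs, const std::function<void()>& work) {
  double best = std::numeric_limits<double>::infinity();
  for (int i = 0; i < runs; i++) {
    std::clock_t start = std::clock();
    work();
    double elapsed = double(std::clock() - start) / CLOCKS_PER_SEC;
    best = std::min(best, elapsed);  // keep the least-disturbed run
  }
  return best;
}
```

Usage might look like `bestOfRuns(5, [&] { runAllPasses(chunk); })`, where `runAllPasses` is a placeholder for whatever drives the O3 passes over one JS chunk.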
It's worth taking a look at "Avoid array bounds checking for traverse* functions". It's theoretically less safe because Ref[] does bounds checks, but I feel that these functions are small enough to be verifiable.
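Skipping bounds checks is safe in a traversal when the loop bound comes from the container itself, because every index is in range by construction. The sketch below contrasts a checked `operator[]` with direct indexing inside a pre-order walk. `Node`, `Ref` and `traversePre` are hypothetical stand-ins; the optimizer's real `Ref`/`Value` types differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical AST node; not the optimizer's real Value type.
struct Node {
  int value;
  std::vector<Node*> children;
};

// Hypothetical reference wrapper with a bounds-checked operator[].
struct Ref {
  Node* node;
  Ref operator[](std::size_t i) const {
    assert(i < node->children.size());  // the per-access check
    return Ref{node->children[i]};
  }
};

// Pre-order traversal that indexes children directly. Since i < n and n is
// the vector's own size, each access is in range and a check would be
// redundant. This assumes `visit` does not resize `node->children` while
// the loop runs; that assumption is the "theoretically less safe" part.
template <typename Visit>
void traversePre(Node* node, Visit visit) {
  visit(node);
  std::vector<Node*>& kids = node->children;
  for (std::size_t i = 0, n = kids.size(); i < n; i++) {
    traversePre(kids[i], visit);  // unchecked access
  }
}
```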
I still see a slowdown here on Unity. When I set […] How are you measuring your times?
Branch updated from 46b372f to 4aa1d65.
That's very interesting. I took my measurement by pausing the build process during js opts, taking one of the chunks out and running the optimizer directly against it ([…]). When I do a full compilation using […] Assuming you're on Linux, which distro and gcc version are you using? I've got Ubuntu 14.04 and the default compiler, gcc 4.8.4; I wonder if a different compiler version is doing something differently.
Ignore me, I've done something very stupid. Let me look again...
Nothing would happen anyway. Additionally, add some commentary justifying the skip.
Branch updated from 19036f3 to 02ba28a.
Summary of changes:
Now that I've fixed my issues and tweaked the dep lookup, the Unity js opts phase on a single core goes from about 54.1s down to 53.3s.
I still see qualitatively different numbers here: no change or a tiny slowdown on Unity, a tiny slowdown on Poppler, and no change on Bullet. For Poppler, for example, I run […] Maybe a different CPU or a different local compiler and stdlib accounts for us seeing different things. But if so, it suggests the changes here are going to vary across machines. If we can't figure this out, it might still make sense to merge this or parts of it, as the refactorings are nice. But perhaps we should take the refactorings only for clarity, and not for speed?
No, no, it's not good enough to slow things down! I'll rummage around for some low-hanging fruit.
Another thought: there might be some bigger things to optimize in that pass. If I recall correctly, around the comment "[…]"
- !isSpace in Frag() is not necessary because that case will fall through and abort anyway
- any successful switch case will assign str, no need to recheck
- there is only one callsite of parseAfterIdent, move the skipspace inside the call (and therefore avoid calling skipspace twice before expressions)
- no need for two checks for 0 bytes
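Two of the points in that commit message can be illustrated with a toy parser. First, when a helper has exactly one caller, whitespace skipping can live inside the helper, so it runs once instead of twice. Second, `isspace('\0')` and `isalnum('\0')` are both false, so a separate test for the terminating zero byte is redundant in these loops. The sketch is hypothetical: `Parser`, `parseIdent`, `parseAfterIdent` and `parseIdentAndNext` only echo the names in the commit message and are not the emscripten parser.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical mini-parser over a NUL-terminated buffer.
struct Parser {
  const char* p;

  // No separate '\0' test: isspace('\0') is false, so the loop already
  // stops at the terminator.
  void skipSpace() {
    while (std::isspace(static_cast<unsigned char>(*p))) p++;
  }

  std::string parseIdent() {
    const char* start = p;
    while (std::isalnum(static_cast<unsigned char>(*p)) || *p == '_') p++;
    return std::string(start, p);
  }

  // Skips leading whitespace itself, so its single caller must not.
  char parseAfterIdent() {
    skipSpace();
    return *p;  // '\0' at end of input
  }

  // The only caller: no skipSpace() here any more.
  std::string parseIdentAndNext(char& next) {
    std::string name = parseIdent();
    next = parseAfterIdent();
    return name;
  }
};
```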
We know that var nodes will have no assignment part, because normalisation has either removed it or aborted because the var node needs fixing.
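Relying on an invariant established by an earlier pass lets a later pass drop a whole code path. The sketch below records var names without inspecting any initializer and uses an `assert` to document that normalisation already guarantees there is none. `Expr`, `VarDecl` and `collectVarNames` are hypothetical types and names for illustration, not the optimizer's AST.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Expr;  // hypothetical expression node, left opaque here

// Hypothetical var declaration: `init` non-null would mean `var x = <expr>`.
struct VarDecl {
  std::string name;
  const Expr* init;
};

// After normalisation, var declarations carry no assignment part, so this
// pass only records names. The assert documents (and in debug builds
// checks) the invariant instead of handling an impossible case.
std::vector<std::string> collectVarNames(const std::vector<VarDecl>& decls) {
  std::vector<std::string> names;
  names.reserve(decls.size());
  for (const VarDecl& d : decls) {
    assert(d.init == nullptr && "normalisation should have removed it");
    names.push_back(d.name);
  }
  return names;
}
```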
I've pushed a bunch of changes. I opted not to go for the harder idea and instead pursued the low-hanging fruit. There are some speedups from […]
Unfortunately, the final item makes the traverse* functions really ugly. However, the rest of the commits are generally just refactoring. You may wish to examine the following two with suspicion:
The rest of the refactorings should be easily verifiable. I see a ~10% speedup in the optimisation of Unity (54.5s to 48.5s with […]). I've run all asm2 tests (+ […])
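As a quick check on the Unity figures quoted in the previous comment (54.5s before, 48.5s after), the relative reduction in time and the equivalent speedup factor work out to:

$$
\frac{54.5 - 48.5}{54.5} = \frac{6.0}{54.5} \approx 0.110, \qquad \frac{54.5}{48.5} \approx 1.124
$$

That is roughly an 11% reduction in run time, in line with the "~10%" quoted.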
Confirmed, I see an 8% speedup. Nice! Reviewing now.
Please add any changes after this point as follow-up commits, so I don't need to re-read the ones I'm going through now.
That's it for my comments. Excellent work!
Made review changes.
Merged, thanks!
These are just speed and style changes; there should be no output difference at all.
Using the .bc file from #3718, before:
After:
So that's a 20-25% optimisation speedup and a 15-20% overall speedup. The optimizer tests pass; I haven't run any others yet but will do so now.
I've got a few other optimisations, but want to get these and the switch optimisation (#3733) fix done first.