Operate on WebAssembly? #323
Related: a simple, offline super optimizer for wasm by @kripken: https://github.com/WebAssembly/binaryen/blob/superoptimizer/src/tools/wasm-analyze.cpp |
I'll need to think about this, but I'll just note right now that Souper doesn't support MSVC, but rather MSVC IR and Souper IR are similar enough that Gratian at Microsoft wrote a quick MSVC->Souper translator and then manually implemented a bunch of optimizations that it discovered. Also, we recently learned that the Mono compiler people used Souper similarly! But I don't have much detail about that yet. I assume they used their SSA IR but I don't know for sure. As far as non-SSA IRs go, a while ago I started on a CompCert->Souper translator but got stalled for lack of time. I don't have a good sense for how easy it would be to do a WebAssembly->Souper translator. |
Ok, I thought about this a bit more. The first issue is getting Souper to understand WebAssembly, this is less trivial than other cases due to the stack machine. On the other hand, a very nice thing about conversion to Souper is that you can stop whenever you want, for example because you run into something tricky. I've never written code to symbolically execute stack machine code to make it into register code, but conceptually this doesn't sound that hard (again, as long as we can bail out when things get tricky). As far as I know, after un-stack-conversion (is there a word for this?) the underlying operations should map nicely between WebAssembly and Souper, and Souper's synthesizer should find optimizations perfectly normally. The tricky part, then, is translating the synthesized results back into WebAssembly optimizations. I suppose this would be 100% manual at first, and then there are probably things that make sense to automate. So anyhow my initial impression is that we want an unstackifier that makes Souper IR out of webasm, and then we can proceed from there. I think this makes more sense than trying to shoehorn a stack machine into the Souper implementation. |
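The "unstackifier with bail-out" idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the translator that was eventually written: the instruction tuples, the `destackify` function, and the `Unsupported` exception are all hypothetical, only a handful of i32 operations are handled, and the emitted text merely follows the general shape of Souper IR.

```python
# Hypothetical sketch: symbolically execute a tiny subset of wasm-like stack
# code and emit register-style (SSA) text in the general shape of Souper IR.
# Anything outside the supported subset makes the translation bail out.

BINARY_OPS = {"i32.add": "add", "i32.sub": "sub", "i32.mul": "mul", "i32.and": "and"}


class Unsupported(Exception):
    """Raised when the translator chooses to give up on a construct."""


def destackify(instrs):
    stack = []       # symbolic operand stack holding value names or constants
    lines = []       # emitted register-style instructions
    local_vars = {}  # wasm local index -> SSA value name
    counter = 0

    def fresh():
        nonlocal counter
        name = f"%{counter}"
        counter += 1
        return name

    for op, *args in instrs:
        if op == "local.get":
            index = args[0]
            if index not in local_vars:
                # First read of a local: treat it as an unknown input.
                local_vars[index] = fresh()
                lines.append(f"{local_vars[index]}:i32 = var")
            stack.append(local_vars[index])
        elif op == "i32.const":
            stack.append(f"{args[0]}:i32")
        elif op in BINARY_OPS:
            if len(stack) < 2:
                raise Unsupported("stack underflow")
            rhs = stack.pop()
            lhs = stack.pop()
            name = fresh()
            lines.append(f"{name}:i32 = {BINARY_OPS[op]} {lhs}, {rhs}")
            stack.append(name)
        else:
            # Control flow, memory, calls, etc.: stop rather than guess.
            raise Unsupported(op)

    if not stack or not stack[-1].startswith("%"):
        raise Unsupported("no instruction result to infer")
    lines.append(f"infer {stack[-1]}")
    return lines


if __name__ == "__main__":
    code = [("local.get", 0), ("i32.const", 1), ("i32.add",),
            ("local.get", 0), ("i32.sub",)]
    print("\n".join(destackify(code)))
```

Because every unsupported opcode raises `Unsupported`, coverage can be extended one opcode at a time while the rest of the pipeline keeps working on whatever already translates.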
Thanks for the thoughtful reply :) One benefit I see of destackifying wasm into Souper IR is that experimentation developing that tool can happen out-of-tree and can move fast and loose. Souper's ability to bail out on tricky constructs is nice, too. This enables incremental development, where we could get an MVP working and then start ratcheting up the percentage of wasm that is understood by the translator. On the other hand, I also have a concern with this approach. Code size is super important for wasm because it is delivered over the network, and therefore potential size optimizations are particularly enticing. I fear that translating wasm into Souper IR won't help us discover "silly" wasm-specific size optimizations, like https://kripken.github.io/blog/binaryen/2018/04/13/a-silly-optimization.html Maybe this fear is unfounded, and size optimizations aren't as specific to the instruction set and its encoding as I'm assuming? Or maybe translation into Souper IR just isn't the right approach for wasm size optimization, and further development of a wasm-specific super optimizer is a more promising avenue of investigation? |
I have the same worry about size! Best we can do is give it a try, and maybe it'll turn out that re-stackifying the Souper output gets back the benefits? |
I agree this is definitely worth trying! I could look into creating a Binaryen pass to convert Binaryen IR to Souper IR, if that would be helpful? (Binaryen IR is almost identical to wasm, can be converted to and from it, and preserves size and performance aspects.) It might be simplest for me to convert to the Souper text format maybe, to get started? About the size concern, yeah, that seems like it could be an issue. For example if the wasm => Souper IR conversion is inefficient, then Souper's proposed optimizations might be to remove those inefficiencies. We'll just have to make sure it's a good conversion :) |
Alon, if you could hack something up that generates Souper text, this would be great! I'm happy to read code, play with crappy prototypes, etc. Also I'm happy to do the Souper runs (the synthesizer can be finicky). One thing to keep in mind is that one of Souper's most powerful features is its path conditions, so if there's a way to generate those easily as part of the conversion, this will be especially helpful in generating optimizations (though applying the optimizations becomes less straightforward). |
Great! I'll see what I can hack up. Is the Souper paper the best place for docs on the IR text format? About path conditions, naively it seems like that should be easy since we know the conditions of conditional branches etc. But if we need to do stuff like forward conditions through multiple conditional branches etc. (if the condition is still true in them) then I'm not sure offhand. |
Argh we don't really have docs for Souper IR -- but it is really simple. I guess maybe ask me lots of questions and look at the test cases?
|
Re. path conditions, extracting the condition from only the most recent branch is probably the way to go. |
Sorry for chiming in on this :) Besides *.opt, as John says, I think souper/InstRef.rst is a good place to look for Souper IR. Otherwise, you may end up having to scan the relevant code. In terms of path conditions (PC in Souper's terminology), I think knowing the conditions of conditional branches should be enough for the first version. You can skip the BlockPC part for the moment. |
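The suggestion of using only the most recent conditional branch as the path condition might look something like the sketch below. It is an illustration under assumptions: the helper name and its parameters are hypothetical, the condition is modelled as wasm's "i32 is nonzero" test, and the `pc` line follows the form used in Souper's test cases as best I recall, so it should be checked against those tests before relying on it.

```python
# Hypothetical sketch: wrap a candidate LHS with a single path condition
# taken from the most recent conditional branch that dominates it.

def emit_with_path_condition(cond_value, pc_name, taken, body_lines, root):
    """Build Souper-like text for `root`, guarded by one path condition.

    cond_value: name of the i32 value tested by the branch (e.g. "%0")
    pc_name:    name to give the i1 "condition is nonzero" value (e.g. "%1")
    taken:      True if the code sits on the arm where the condition held
    body_lines: already-translated instructions computing `root`
    """
    lines = [f"{cond_value}:i32 = var"]
    # wasm branches test an i32 for nonzero, so make that test explicit as an i1.
    lines.append(f"{pc_name}:i1 = ne {cond_value}, 0:i32")
    # The path condition states which outcome of the test is known to hold.
    lines.append(f"pc {pc_name} {1 if taken else 0}:i1")
    lines.extend(body_lines)
    lines.append(f"infer {root}")
    return lines


if __name__ == "__main__":
    text = emit_with_path_condition(
        cond_value="%0",
        pc_name="%1",
        taken=True,
        body_lines=["%2:i32 = and %0, 1:i32"],
        root="%2",
    )
    print("\n".join(text))
```

Forwarding conditions through several nested branches would mean emitting more than one `pc` line; this sketch deliberately stops at one, in line with the advice above.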
Ah, regarding docs/InstRef.rst, let me take a look and make sure it's up to date; it has not been touched for quite a while. |
@regehr , right, I just realized that we haven't touched it for a while :) |
Thanks @regehr and @chenyang78, those look like what I need to get started! |
Ok, I wrote up something that seems like it's starting to have the right general shape for simple things. Feedback welcome! Code is in a To try it, you can do something like
(
can be written in wasm like this
and
which looks like it might be valid Souper IR. I didn't try to load it yet though... :) Still a lot of TODOs (IR not yet converted) and FIXMEs (no Some thoughts as I was doing this:
|
Awesome. Is there a particular C/C++ code base that is particularly representative of code you're interested in optimizing, that I should start with? Otherwise I'll just build a few random things. So far I've been creating wasm by passing --target=wasm32 to clang, is that the right approach? Indeed, Go's equivalent of InstCombine is quite weak so Souper will find a lot of stuff. There's an interesting piece of this puzzle that I don't think has been attacked seriously yet, that would not only be useful to solve but also would make a good submission to a conference like PLDI. This is taking patterns from Souper, perhaps abstracting them so they're more broadly applicable, and then making a very fast matcher for them. Just mentioning this since it's something I'm very interested in working on. |
Hmm, some good samples are here: https://github.com/kripken/embenchen/tree/master/asm_v_wasm (that's output from the emscripten benchmark suite). In particular the larger ones like bullet, box2d, and lua, those are important real-world codebases I think.
In general yes, however it's worth running the binaryen optimizer on it too ( |
Thanks Alon. I should have a bit of time this week to work on this! |
|
I've been reading the code looking to fix a couple of easy things, but if you're in there right now it'll be quicker for you to do these if you have time!
I'm working on updating our Inst documentation to reflect this kind of thing, thanks for your patience! |
Hi Alon, great idea to apply superoptimization on wasm! Particularly, I second the thought of applying Souper on the wasm directly that comes from non-LLVM back-ends. Let's start with getting the Souper IR right first. |
@regehr thanks! I fixed those 3 things on the branch. Output looks better now, but probably still a lot left to do to get it right. Also, to add operations aside from binaries (unaries, etc.)... |
Ok, great! First, while most LHSs have the proper var declarations there are also some degenerate LHSs that are just this, which makes Souper's parser barf:
Then second there's a common pattern of comparing the results of comparisons against zero using the wrong width as seen here:
Since Souper doesn't have any implicit sext/zext, it should be:
But leaving aside these issues I'm already seeing some perhaps-useful stuff coming out of the synthesizer! |
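The width mismatch described above (a comparison result being compared against a zero of a different width) can be illustrated with a small sketch. This is not the snippet from the original comment, which did not survive here; the helper names are hypothetical, and inserting a `zext` is only one possible way to make the widths agree.

```python
# Hypothetical sketch: Souper has no implicit extensions, so when wasm reuses
# an i1 comparison result as an i32, the translator must widen it explicitly.

def make_counter():
    count = 0

    def fresh():
        nonlocal count
        name = f"%{count}"
        count += 1
        return name

    return fresh


def widen(value, width, target, fresh, lines):
    """Return a name holding `value` at `target` bits, emitting a zext if needed."""
    if width == target:
        return value
    if width > target:
        raise ValueError("narrowing needs an explicit trunc, not handled here")
    name = fresh()
    lines.append(f"{name}:i{target} = zext {value}")
    return name


def compare_result_against_zero():
    """Translate the wasm-style pattern (x < y) == 0 with consistent widths."""
    fresh = make_counter()
    lines = []
    x, y = fresh(), fresh()
    lines += [f"{x}:i32 = var", f"{y}:i32 = var"]
    lt = fresh()
    lines.append(f"{lt}:i1 = slt {x}, {y}")      # comparison yields an i1
    wide = widen(lt, 1, 32, fresh, lines)        # wasm treats it as an i32
    root = fresh()
    lines.append(f"{root}:i1 = eq {wide}, 0:i32")  # both operands are now i32
    lines.append(f"infer {root}")
    return lines


if __name__ == "__main__":
    print("\n".join(compare_result_against_zero()))
```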
As a random example, here's an optimization that Souper would like to perform.
RHS:
|
Exciting to see results already! And nice optimization it found there. Those two issues should be fixed on the branch. Also added unary ops (cttz etc.). |
Ok!
we want:
|
And here are a few more results: To get these, I took wasm_lua_scimark.c.wasm and ran it through binaryen -O3, then ran the output through Souper in "constant synthesis" mode, where the only thing it does is try to prove that the LHS can be replaced by a constant. I haven't gone and tracked these back to the wasm, so some of them could be artifacts of the translation, but it looks like at least some of them represent real opportunities for shaving code size. To facilitate backtracking, would it make sense for --souperify to also print the wasm in a comment? Or print a line number for the root of the LHS maybe? |
You'll notice that sometimes the LHS contains far more context than is required to justify the optimization, making it harder to understand what's going on. I have a reducer for LHSs that I can run in the future. |
Thanks! Fixed those two minor issues on the branch. About backtracking, I need to think about that - we have a DebugInfo mechanism that might work somehow, but it's not trivial to connect it. Looks like we definitely need a way to do that for debugging, though, as several of those results do look like my translation is wrong, for example the optimizer definitely knows that multiplying an integer by zero is zero, and can add two constants, etc. |
To get something simple I added support for setting However, when trying to use this to debug those weird muls and adds, I think I might be misreading your gist:
Does that mean I should be able to search in the output from processing that lua file (after |
I see what's going on, yeah - if you ran two commands, then binaryen by default won't save function names (you need |
And yeah, now I see those trivial add/sub pairs - they are in the wasmbackend versions, not the asm2wasm ones. E.g.
|
And here's the relevant wast:
Looks like stack prologue/epilogue stuff - things added by the wasm backend, after LLVM IR. I think that could be improved, but I'm not sure. Opened https://bugs.llvm.org/show_bug.cgi?id=37299 |
Interesting! |
Here's an example where we need to ask what the cost model should be. Souper wants to eliminate an instruction, but is that an appropriate optimization here?
|
Some constants and nops:
|
Does it seem weird that Souper gets 16k LHSs out of asm2wasm_lua_scimark.c.js.bc, but --souperify gets 200k LHSs out of asm2wasm_lua_scimark.c.wasm? |
Well, in the ongoing instruction synthesis run, I'm getting quite a bit of this:
Something I'd like to do is make a custom cost function for Souper that closely mirrors the actual code size costs in the webassembly binary format. I gather that some of the sexts/zexts in --souperify output are zero-cost in the sense that they were added only to make Souper happy. One thing we might consider is marking (in --souperify output) which instructions are freebies that have zero cost in the original wasm file. I could hack Souper to take this into account when it attempts to make a profitable optimization. This, plus some handling of external uses, might significantly reduce the noise level in synthesis results. |
Does "synthesis" mean stuff aside from const/nop? In general results on the wasm backend are more important for the future (as we plan to switch to it from asm2wasm), however right now it is less optimized so more trivial stuff is possible there. But that's trivial stuff we need to fix ;) Yeah, the cost model seems like it matters in that example. For code size, a |
Yeah, sorry, "synthesis" means making new instructions that weren't there before. The issue of cost models is one that we've spent a lot of time thinking about. I think the issues are basically the same for LLVM IR and WebAssembly. Options include:
Hope this all makes sense. Summary is that this is a problem we need to solve anyway, so happy to put some effort into it. |
Interesting about the cost model. Yeah, sounds like a lot to do there. Meanwhile I generated some LHSes that might be interesting, this is from a function in the entropy coder in the AV1 media codec, which is something they tell me is relevant for superoptimization: I've also been improving the LHS generation mechanism. I'm working on a fuzzer that will verify that we generate proper LHSes from wasm, by doing trivial operations on them that should change nothing and then running to see that the output doesn't change. Already found one bug. |
Great! I'm traveling and won't do anything on this until the weekend unfortunately. Looking forward to seeing the new stuff. Instruction synthesis hadn't finished before I left home this morning argh. |
Ok, here are synthesis results from wasm_lua_scimark.c.wasm: And here's what Souper finds in the corresponding LLVM bitcode: Is the AV1 codec still the best thing to do next? |
Results from the AV1 LHSs linked above: |
Thanks! Reading the lua synthesis results, one problem is that we tell Souper to infer the artificial instructions that we add just for legal reasons, like
Another issue is that the cost of
A third issue is the presence of additional uses not seen in the LHS, as you brought up before. I looked into a couple manually, and in all of them that was why Binaryen didn't already perform the optimization. I wonder if this is more of an issue on wasm than on LLVM IR for some reason. In any case, I wonder if it wouldn't be more interesting to focus on single-use instructions, and I added a pass for that,
Thinking about this, the single-use case might be interesting to focus on for an additional reason, that it should be much easier to create a fast matcher for such patterns - in particular, such a matcher could operate on Binaryen IR directly as opposed to first constructing the DataFlow IR. That should be far faster.
Reading the AV1 results, those are very interesting :) I'll pass those along. |
Great, I'm glad we're at the point of worrying about these sorts of details. The current Souper cost model is dead simple: I'll make an alternate version that has cost of ext/trunc at 0. Regarding external uses: empirically, the single-use restriction will be a serious one, though certainly we should pursue the low hanging fruit there. But soon I'd like to move to a more sophisticated model where you mark instructions with external uses in the Souper IR and then synthesis will take them into account. The syntax for this isn't worked out yet but we'll do this soon. There's room for even more improvement beyond that, but let's not worry about that right now. |
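As a rough illustration of a cost model in which ext/trunc instructions are free, consider the sketch below. It rests on assumptions: every other instruction is given unit cost purely for the example (the thread does not spell out Souper's actual model here), and the line parsing is a hypothetical helper that only understands the simple `name:type = op args` shape.

```python
# Hypothetical sketch: count instructions, treating extensions and truncations
# as zero-cost because they may exist only to satisfy Souper's typing rules.

FREE_OPS = {"zext", "sext", "trunc"}


def opcode(line):
    """Return the opcode of a 'name:type = op args' line, or None otherwise."""
    if "=" not in line:
        return None  # e.g. "infer %3" or "result %2"
    return line.split("=", 1)[1].split()[0]


def cost(lines):
    total = 0
    for line in lines:
        op = opcode(line)
        if op is None or op == "var" or op in FREE_OPS:
            continue  # inputs, directives, and width adjustments are free
        total += 1    # assumed unit cost for everything else
    return total


def is_profitable(lhs_lines, rhs_lines):
    """A rewrite is worth reporting only if it strictly lowers the cost."""
    return cost(rhs_lines) < cost(lhs_lines)


if __name__ == "__main__":
    lhs = ["%0:i32 = var", "%1:i1 = eq %0, 0:i32", "%2:i32 = zext %1",
           "%3:i32 = and %2, 1:i32", "infer %3"]
    rhs = ["%0:i32 = var", "%1:i1 = eq %0, 0:i32", "%2:i32 = zext %1",
           "result %2"]
    print(cost(lhs), cost(rhs), is_profitable(lhs, rhs))
```

A size-oriented variant could replace the unit cost with an estimate of each instruction's encoded size in the wasm binary format, which is the direction suggested earlier in the thread.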
Using the latest binaryen, I've just kicked off a synthesis run for these LHSs:
I'm worried, however, that there may be a bug because I'm seeing quite a few optimizations such as this one that are trivial due to an operation having the same value passed in both argument positions:
|
OK, feel free to generate Souper that looks like this:
We'll modify synthesis to treat externally used values as fixed, and not try to replace them.
|
Thanks, yes, definitely a bug. Fixed on the branch now, and added a test.
Very good point, yeah. Fixed now. This was unnecessarily excluding a lot of LHSes! :)
Fair point, but this does bring us to a tradeoff with the speed of matching the pattern. If (aside from the root) we have only single uses, then we are just looking at some simple linear wasm expression code (with no |
Great! I'll kick off a new synthesis run tonight but then I'm leaving town tomorrow so it'll be the weekend again before I see the results (maybe I should poke holes in my home firewall but I don't really want to). |
Here are a few synthesis results from wasmbackend_bullet.wasm: |
Also here's souper's cache dump of the same information (which I find easier to read without the webasm noise): https://gist.github.com/regehr/152b5e8359509c52a6c36829506d9d05 |
Thanks! Seeing some trivial stuff that shouldn't be there, I found that the single-use calculation was actually wrong - it ignored uses in things like stores and calls. Fixed, hopefully now it's correct. |
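The bug described above, where uses in stores and calls were ignored, can be illustrated with a small use-counting sketch. The node representation is hypothetical and much simpler than Binaryen's DataFlow IR; the point is only that every consumer of a value counts as a use, including nodes that produce no value themselves.

```python
# Hypothetical sketch: count uses of each value across all nodes, including
# side-effecting nodes such as stores and calls that have no result name.

from collections import Counter


def count_uses(nodes):
    """nodes: list of (name, op, operands); name is None for stores/calls."""
    uses = Counter()
    for _name, _op, operands in nodes:
        for operand in operands:
            uses[operand] += 1
    return uses


def single_use_values(nodes):
    """Return the names of values that are consumed exactly once."""
    uses = count_uses(nodes)
    return {name for name, _op, _operands in nodes
            if name is not None and uses[name] == 1}


if __name__ == "__main__":
    nodes = [
        ("x", "var", []),
        ("y", "add", ["x", "x"]),
        ("z", "mul", ["y", "x"]),
        (None, "store", ["y"]),  # y is also stored, so it has two uses
        (None, "call", ["z"]),   # z's only use is as a call argument
    ]
    print(sorted(single_use_values(nodes)))
```

If the `store` node were skipped, `y` would wrongly appear to be single-use, which is the kind of mistake that lets trivial or invalid patterns through.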
( I think that's all I had on my list of stuff to fix. |
Thanks! I'll post some new synthesis results soon. |
Background: google/souper#323 This adds a --souperify pass, which emits Souper IR in text format. That can then be read by Souper, which can emit superoptimization rules. We hope that eventually we can integrate those rules into Binaryen. How this works is we emit an internal "DataFlow IR", which is an SSA-based IR, and then write that out into Souper text. This also adds a --dfo pass, which stands for data-flow optimizations. It generates a DataFlow IR, like in souperify, and then performs some trivial optimizations using it. There are very few things it can do that our other optimizations can't already, but this is also good testing for the DataFlow IR, plus it is good preparation for using Souper's superoptimization output (which would also construct DataFlow IR, like here, but then do some matching on the Souper rules).
@shrin18 please explain in more detail what you want to do |
I don't see any references to it anymore, but my understanding was that Souper supported not just LLVM IR but also MSVC's IR. With that in mind, I think it would be exciting if Souper also supported WebAssembly as an IR (potentially hinting to binaryen what optimizations it is missing).
Brief intro to WebAssembly for the uninitiated (from http://webassembly.org/ ):
WebAssembly also has an extensive specification, which hopefully makes supporting it easier than it would otherwise be.
I'm not proposing anything concrete (yet?), just trying to gauge interest and get feedback.
Thoughts?
cc @kripken @sunfishcode