Operate on WebAssembly? #323

Open
fitzgen opened this issue Apr 17, 2018 · 82 comments
fitzgen commented Apr 17, 2018

I don't see any references to it anymore, but my understanding was that Souper supported not just LLVM IR but also MSVC's IR. With that in mind, I think it would be exciting if Souper also supported WebAssembly as an IR (potentially hinting to binaryen what optimizations it is missing).

Brief intro to WebAssembly for the uninitiated (from http://webassembly.org/ ):

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

WebAssembly also has an extensive specification, which hopefully makes supporting it easier than it would otherwise be.

I'm not proposing anything concrete (yet?), just trying to gauge interest and get feedback.

Thoughts?

cc @kripken @sunfishcode


fitzgen commented Apr 17, 2018

Related: a simple, offline super optimizer for wasm by @kripken: https://github.com/WebAssembly/binaryen/blob/superoptimizer/src/tools/wasm-analyze.cpp


regehr commented Apr 17, 2018

I'll need to think about this, but I'll just note right now that Souper doesn't support MSVC directly; rather, MSVC's IR and Souper IR are similar enough that Gratian at Microsoft wrote a quick MSVC->Souper translator and then manually implemented a bunch of optimizations that it discovered. Also, we recently learned that the Mono compiler people used Souper similarly! But I don't have much detail about that yet. I assume they used their SSA IR, but I don't know for sure.

As far as non-SSA IRs go, a while ago I started on a CompCert->Souper translator but got stalled for lack of time. I don't have a good sense for how easy it would be to do a WebAssembly->Souper translator.


regehr commented Apr 20, 2018

Ok, I thought about this a bit more.

The first issue is getting Souper to understand WebAssembly; this is less trivial than other cases because of the stack machine. On the other hand, a very nice thing about conversion to Souper is that you can stop whenever you want, for example because you run into something tricky. I've never written code to symbolically execute stack machine code to make it into register code, but conceptually this doesn't sound that hard (again, as long as we can bail out when things get tricky).

As far as I know, after un-stack-conversion (is there a word for this?) the underlying operations should map nicely between WebAssembly and Souper, and Souper's synthesizer should find optimizations perfectly normally.

The tricky part, then, is translating the synthesized results back into WebAssembly optimizations. I suppose this would be 100% manual at first, and then there are probably things that make sense to automate.

So anyhow my initial impression is that we want an unstackifier that makes Souper IR out of webasm, and then we can proceed from there. I think this makes more sense than trying to shoehorn a stack machine into the Souper implementation.
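As a rough illustration of the unstackifier idea, here is a toy sketch: it symbolically executes a stack-machine instruction stream into register-style SSA, bailing out on anything unhandled. The tiny instruction vocabulary and tuple encoding are invented for illustration; real wasm handling (blocks, local writes, traps) is omitted.

```python
# Toy sketch of "unstackifying": symbolically execute stack code,
# emitting a fresh SSA name for each operation's result.
def unstackify(instrs):
    stack, ssa, counter = [], [], 0
    for op, *args in instrs:
        if op == "i32.const":
            stack.append(str(args[0]))
        elif op == "get_local":
            stack.append(f"${args[0]}")
        elif op in ("i32.add", "i32.sub", "i32.and"):
            rhs, lhs = stack.pop(), stack.pop()
            counter += 1
            reg = f"%{counter}"
            ssa.append(f"{reg} = {op.split('.')[1]} {lhs}, {rhs}")
            stack.append(reg)
        else:
            raise NotImplementedError(op)  # "bail out when things get tricky"
    return ssa, stack

ssa, stack = unstackify([("get_local", "x"), ("i32.const", 1), ("i32.and",)])
assert ssa == ["%1 = and $x, 1"]
assert stack == ["%1"]
```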


fitzgen commented Apr 20, 2018

Thanks for the thoughtful reply :)

One benefit I see of destackifying wasm into Souper IR is that experimentation developing that tool can happen out-of-tree and can move fast and loose.

Souper's ability to bail out on tricky constructs is nice, too. This enables incremental development, where we could get an MVP working and then start ratcheting up the percentage of wasm that is understood by the translator.

On the other hand, I also have a concern with this approach. Code size is super important for wasm because it is delivered over the network, and therefore potential size optimizations are particularly enticing. I fear that translating wasm into Souper IR won't help us discover "silly" wasm-specific size optimizations, like https://kripken.github.io/blog/binaryen/2018/04/13/a-silly-optimization.html

Maybe this fear is unfounded, and size optimizations aren't as specific to the instruction set and its encoding as I'm assuming? Or maybe translation into Souper IR just isn't the right approach for wasm size optimization, and further development of a wasm-specific super optimizer is a more promising avenue of investigation?


regehr commented Apr 20, 2018

I have the same worry about size! Best we can do is give it a try, and maybe it'll turn out that re-stackifying the Souper output gets back the benefits?
Synthesis is a hard enough problem that I think it's worth exploring with Souper before doing more one-offs.
Random thought: I wonder if perhaps the webassembly typechecker/bytecode verifier/whatever contains the simple symbolic execution engine that we want for unstackifying webasm?


kripken commented Apr 20, 2018

I agree this is definitely worth trying! I could look into creating a Binaryen pass to convert Binaryen IR to Souper IR, if that would be helpful? (Binaryen IR is almost identical to wasm, can be converted to and from it, and preserves size and performance aspects.) It might be simplest to convert to the Souper text format to get started?

About the size concern, yeah, that seems like it could be an issue. For example if the wasm => Souper IR conversion is inefficient, then Souper's proposed optimizations might be to remove those inefficiencies. We'll just have to make sure it's a good conversion :)


regehr commented Apr 20, 2018

Alon, if you could hack something up that generates Souper text, this would be great! I'm happy to read code, play with crappy prototypes, etc. Also I'm happy to do the Souper runs (synthesizer can be finicky).

One thing to keep in mind is that one of Souper's most powerful features is its path conditions-- so if there's a way to generate those easily as part of the conversion, this will be especially helpful in generating optimizations (though applying the optimizations becomes less straightforward).


kripken commented Apr 20, 2018

Great! I'll see what I can hack up.

Is the Souper paper the best place for docs on the IR text format?

About path conditions, naively it seems like that should be easy since we know the conditions of conditional branches etc. But if we need to do stuff like forward conditions through multiple conditional branches etc. (if the condition is still true in them) then I'm not sure offhand.


regehr commented Apr 20, 2018

Argh we don't really have docs for Souper IR -- but it is really simple. I guess maybe ask me lots of questions and look at the test cases?

find souper/test -name '*.opt'


regehr commented Apr 20, 2018

Re. path conditions, extracting the condition from only the most recent branch is probably the way to go.

@chenyang78

Sorry for chiming in on this :) Besides the *.opt tests John mentioned, I think souper/InstRef.rst is a good place to learn Souper IR. Otherwise, you may end up scanning the relevant code.

In terms of path condition (PC in Souper's terminology), I think knowing the conditions of conditional branches should be enough for the first version. You can skip the BlockPC part at the moment.


regehr commented Apr 20, 2018

Ah, regarding docs/InstRef.rst, let me take a look and make sure it's up to date; it hasn't been touched for quite a while.

@chenyang78

@regehr , right, I just realized that we haven't touched it for a while :)


kripken commented Apr 20, 2018

Thanks @regehr and @chenyang78, those look like what I need to get started!


kripken commented Apr 22, 2018

Ok, I wrote up something that seems like it's starting to have the right general shape for simple things. Feedback welcome! Code is in a souper branch in binaryen, and the new pass is here.

To try it, you can do something like

wasm-opt input.wasm --flatten --simplify-locals-nonesting --souperify

(flatten flattens the IR; simplify-locals-nonesting simplifies it while keeping it flat; souperify runs the actual conversion). This will print out all the LHSes it can handle. Example input and output are in the test suite. For example, something conceptually like

int func(int x) {
  if (x & 1)
    x += 1;
  else
    x += 2;
  return x & 1;
}

can be written in wasm like this

    (if
      (i32.and
        (get_local $x)
        (i32.const 1)
      )
      (set_local $x
        (i32.add
          (get_local $x)
          (i32.const 1)
        )
      )
      (set_local $x
        (i32.add
          (get_local $x)
          (i32.const 2)
        )
      )
    )
    (return
      (i32.and
        (get_local $x)
        (i32.const 1)
      )
    )

and --souperify will emit 5 LHSes for it, the interesting one is for the returned value, which is

%0 = block 2
%1 = var
%2 = and %1, 1:i32
%3 = blockpc %0 0 %2 1:i32
%4 = blockpc %0 1 %2 0:i32
%5 = add %1, 1:i32
%6 = add %1, 2:i32
%7 = phi %0, %5, %6
%8 = and %7, 1:i32
infer %8

which looks like it might be valid Souper IR. I didn't try to load it yet though... :)

Still a lot of TODOs (IR not yet converted) and FIXMEs (no i1 conversions yet). It just stops a particular LHS when it sees stuff it can't handle. For control flow, so far only ifs are handled. blockpcs for them are implemented too (see example above), but no regular pcs yet. I suspect ifs are enough control flow to get interesting results.

Some thoughts as I was doing this:

  • One thing I would expect the Souper approach to help with is "backend" inefficiencies. wasm is often code that has been run through the LLVM optimizer (nothing new there for Souper), but then it goes through an LLVM backend, which maybe introduces extra work there.
  • wasm not run through the LLVM optimizer is interesting too, e.g. generated from Go or AssemblyScript, and maybe optimized by Binaryen. Souper on such wasm could find stuff the Go, AssemblyScript, and Binaryen optimizers could do better. I'm guessing there is quite a lot there!
  • We can't expect this to handle very low-level wasm things like -(-64) being smaller than 64, or things with tees, block and if values, etc. All those details get lost in the translation to Souper IR. Overall I expect we can benefit both from the Souper approach and from superoptimizing on wasm directly (I recently got that building again for more tests), they are probably complementary.
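One plausible reading of the -(-64)-vs-64 point above, assuming standard signed LEB128 immediates (which is what wasm uses for i32.const operands): the immediate -64 fits in one byte while 64 needs two. A small sketch of the size computation:

```python
# Sketch (assumed, not binaryen code): byte count of a signed LEB128
# immediate, the encoding wasm uses for integer constants.
def sleb128_size(value: int) -> int:
    size = 0
    while True:
        byte = value & 0x7F
        value >>= 7          # Python's >> is an arithmetic shift
        size += 1
        # stop once the remaining bits are pure sign extension of this byte
        if (value == 0 and not (byte & 0x40)) or \
           (value == -1 and (byte & 0x40)):
            return size

# i32.const -64 needs a 1-byte immediate; i32.const 64 needs 2 bytes
assert sleb128_size(-64) == 1
assert sleb128_size(64) == 2
```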


regehr commented Apr 22, 2018

Awesome. Is there a particular C/C++ code base that is particularly representative of code you're interested in optimizing, that I should start with? Otherwise I'll just build a few random things.

So far I've been creating wasm by passing --target=wasm32 to clang, is that the right approach?

Indeed, Go's equivalent of InstCombine is quite weak so Souper will find a lot of stuff.

There's an interesting piece of this puzzle that I don't think has been attacked seriously yet, that would not only be useful to solve but also would make a good submission to a conference like PLDI. This is taking patterns from Souper, perhaps abstracting them so they're more broadly applicable, and then making a very fast matcher for them. Just mentioning this since it's something I'm very interested in working on.


kripken commented Apr 22, 2018

Awesome. Is there a particular C/C++ code base that is particularly representative of code you're interested in optimizing, that I should start with? Otherwise I'll just build a few random things.

Hmm, some good samples are here: https://github.com/kripken/embenchen/tree/master/asm_v_wasm (that's output from the emscripten benchmark suite). In particular the larger ones like bullet, box2d, and lua, those are important real-world codebases I think.

So far I've been creating wasm by passing --target=wasm32 to clang, is that the right approach?

In general yes, however it's worth running the binaryen optimizer on it too (wasm-opt -Os input.wasm -o output.wasm) as it will shrink it by 10% or so. Probably not the same optimizations as Souper would look for, but while shrinking it makes the code cleaner which is more likely to work in the --souperify pass.


regehr commented Apr 22, 2018

Thanks Alon. I should have a bit of time this week to work on this!


kripken commented Apr 23, 2018

pc implementation (for ifs, the only control flow so far) added to that branch. Output looks like it might be correct, but I'm not sure :)


regehr commented Apr 23, 2018

I've been reading the code looking to fix a couple of easy things, but if you're in there right now it'll be quicker for you to do these if you have time!

  • var instructions require a width, e.g.
    %x:i32 = var
  • we don't support redundant comparison instructions, so we don't have sgt, sge, ugt, uge (I know this is quirky and weird, but canonicalization has zero relevance in Souper IR so we've not bothered to fix this)
  • pc arguments need to be i1 instead of i32
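Since Souper's text format omits those redundant comparisons, a translator can canonicalize them by swapping operands (a > b is b < a, and so on). A minimal sketch; the names are illustrative, not binaryen's actual code:

```python
# Map the comparisons Souper doesn't accept onto the ones it does,
# swapping the operand order to preserve meaning.
CANON = {"sgt": "slt", "sge": "sle", "ugt": "ult", "uge": "ule"}

def canonicalize_cmp(op, lhs, rhs):
    if op in CANON:
        return CANON[op], rhs, lhs  # a > b  ==  b < a
    return op, lhs, rhs

assert canonicalize_cmp("sgt", "%1", "5:i32") == ("slt", "5:i32", "%1")
assert canonicalize_cmp("ule", "%1", "%2") == ("ule", "%1", "%2")
```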

I'm working on updating our Inst documentation to reflect this kind of thing, thanks for your patience!


rsas commented Apr 23, 2018

Hi Alon, great idea to apply superoptimization to wasm! In particular, I second the thought of applying Souper directly to wasm that comes from non-LLVM back-ends. Let's start with getting the Souper IR right first.


kripken commented Apr 23, 2018

@regehr thanks! I fixed those 3 things on the branch.

Output looks better now, but probably still a lot left to do to get it right. Also, to add operations aside from binaries (unaries, etc.)...


regehr commented Apr 24, 2018

Ok, great!
I'm now seeing two small issues.

First, while most LHSs have the proper var declarations there are also some degenerate LHSs that are just this, which makes Souper's parser barf:

; start LHS
infer %0

Second, there's a common pattern of comparing the results of comparisons against zero using the wrong width, as seen here:

; start LHS
%0:i32 = var
%1 = slt -1000999:i32, %0
%2 = sle 0:i32, %0
%3 = eq %2, 0:i32
pc %3 1:i1
infer %1

Since Souper doesn't have any implicit sext/zext, it should be:

%3 = eq %2, 0:i1

But leaving aside these issues I'm already seeing some perhaps-useful stuff coming out of the synthesizer!


regehr commented Apr 24, 2018

As a random example, here's an optimization that Souper would like to perform.
LHS:

%0:i32 = var
%1 = sle 0:i32, %0
%2:i32 = zext %1
%3:i32 = var
%4 = sle 0:i32, %3
%5:i32 = zext %4
%6 = or %2, %5
%7 = eq %6, 0:i32
infer %7

RHS:

%8:i32 = and %0, %3
%9:i1 = slt %8, 0:i32
result %9


kripken commented Apr 24, 2018

Exciting to see results already! And nice optimization it found there.

Those two issues should be fixed on the branch. Also added unary ops (cttz etc.).


regehr commented Apr 24, 2018

Ok!
Next two issues should also be easy:

  • please use lshr/ashr instead of sshr/ushr
  • blockpc doesn't get a label in Souper IR, so instead of:
%3 = blockpc %0 0 %2 1:i1

we want:

blockpc %0 0 %2 1:i1


regehr commented Apr 24, 2018

And here are a few more results:
https://gist.github.com/regehr/a428f164ae60ff783e5872d5efc5416e

To get these, I took wasm_lua_scimark.c.wasm and ran it through binaryen -O3, then ran the output through Souper in "constant synthesis" mode, where the only thing it does is try to prove that the LHS can be replaced by a constant.

I haven't gone and tracked these back to the wasm, so some of them could be artifacts of the translation, but it looks like at least some of them represent real opportunities for shaving code size.

To facilitate backtracking, would it make sense for --souperify to also print the wasm in a comment? Or print a line number for the root of the LHS maybe?


regehr commented Apr 24, 2018

You'll notice that sometimes the LHS contains far more context than is required to justify the optimization, making it harder to understand what's going on. I have a reducer for LHSs that I can run in the future.


kripken commented Apr 24, 2018

Thanks! Fixed those two minor issues on the branch.

About backtracking, I need to think about that - we have a DebugInfo mechanism that might work somehow, but it's not trivial to connect it. Looks like we definitely need a way to do that for debugging, though, as several of those results do look like my translation is wrong, for example the optimizer definitely knows that multiplying an integer by zero is zero, and can add two constants, etc.


kripken commented Apr 25, 2018

To get something simple, I added support for setting BINARYEN_DEBUG_SOUPERIFY=1 in the env; it then prints each wasm expression before the Souper one. However, it's not in a comment (no trivial way to do that atm, I'll look into it), and worse, it's printing the "flattened" IR, so it may not be useful enough for real debugging.

However, when trying to use this to debug those weird muls and adds, I think I might be misreading your gist:

; RHS inferred successfully
%0:i32 = add 0:i32, 1:i32
cand %0 1:i32

; RHS inferred successfully
%0:i32 = var
%1:i32 = mul 0:i32, %0
cand %1 0:i32

Does that mean I should be able to search in the output from processing that lua file (after -O3) and find the strings = add 0 or = mul 0 in the LHSes? Neither seems to show up. All the adds and muls have a non-constant left side (starting with %) as I'd expect, so maybe I'm confused? Or was this another file than https://github.com/kripken/embenchen/blob/master/asm_v_wasm/wasm_lua_scimark.c.wasm ?


kripken commented May 1, 2018

I see what's going on, yeah - if you ran two commands, then binaryen by default won't save function names (you need -g to preserve them). And yeah, it's fine to compose those things, -O3, --souperify etc. are all just passes (or lists of passes), run in order, so a single command like my example is fine too.


kripken commented May 1, 2018

And yeah, now I see those trivial add/sub pairs - they are in the wasmbackend versions, not the asm2wasm ones. E.g.

; start LHS (in $luaL_error)
%0:i32 = var
; (i32.sub(get_local $5)(i32.const 128))
%1 = sub %0, 128:i32
; (i32.add(get_local $3)(i32.const 128))
%2 = add %1, 128:i32
infer %2


kripken commented May 1, 2018

And here's the relevant wast:

 (func $luaL_error (; 41 ;) (type $3) (param $var$0 i32) (param $var$1 i32) (param $var$2 i32) (result i32)
  (local $var$3 i32)
  (local $var$4 i32)
  (i32.store
   (i32.const 1024)
   (tee_local $var$3
    (i32.sub
     (i32.load
      (i32.const 1024)
     )
     (i32.const 128)
    )
   )
  )
  [..]
  (i32.store
   (i32.const 1024)
   (i32.add
    (get_local $var$3)
    (i32.const 128)
   )
  )
  (get_local $var$0)
 )

Looks like stack prologue/epilogue stuff - things added by the wasm backend, after LLVM IR.

I think that could be improved, but I'm not sure. Opened https://bugs.llvm.org/show_bug.cgi?id=37299


regehr commented May 1, 2018

Interesting!
I'll do the next synthesis run with instructions on the asm2wasm version.
Unfortunately it may not finish before I leave town again, and my current fastest machine that I use for synthesis happens to be at home, so it may be the weekend before I post results.
Let me know if it would be useful to do a constant/nop synthesis run on the asm2wasm version, I can turn that around quickly.
I have money for faster machines at work but haven't gotten around to ordering anything, argh.


regehr commented May 1, 2018

Here's an example where we need to ask what the cost model should be. Souper wants to eliminate an instruction, but is that an appropriate optimization here?

; start LHS (in $_lua_settop)
%0:i32 = var
; (i32.le_s(get_local $1)(i32.const -1))
%1 = sle %0, -1:i32
%2:i32 = zext %1
infer %2

; RHS inferred successfully
%3:i32 = lshr %0, 31:i32
result %3


regehr commented May 1, 2018

Some constants and nops:
https://gist.github.com/regehr/649c148d8c154b2550062d865b30a244
From:

BINARYEN_DEBUG_SOUPERIFY=1 ~/binaryen/build/bin/wasm-opt asm2wasm_lua_scimark.c.wasm -O3 --flatten --simplify-locals-nonesting --souperify


regehr commented May 1, 2018

Does it seem weird that Souper gets 16k LHSs out of asm2wasm_lua_scimark.c.js.bc, but --souperify gets 200k LHSs out of asm2wasm_lua_scimark.c.wasm?


regehr commented May 1, 2018

Well, in the ongoing instruction synthesis run, I'm getting quite a bit of this:

; start LHS (in $_lua_tolstring)
%0:i32 = var
; (i32.le_s(i32.const 0)(get_local $68))
%1 = sle 0:i32, %0
%2:i32 = zext %1
; (i32.eq(unreachable)(unreachable))
%3 = eq %2, 0:i32
infer %3

; RHS inferred successfully
%4:i32 = ashr %0, 31:i32
%5:i1 = trunc %4
result %5

Something I'd like to do is make a custom cost function for Souper that closely mirrors the actual code size costs in the webassembly binary format.

I gather that some of the sexts/zexts in --souperify output are zero-cost in the sense that they were added only to make Souper happy. One thing we might consider is marking (in --souperify output) which instructions are freebies that have zero cost in the original wasm file. I could hack Souper to take this into account when it attempts to make a profitable optimization. This, plus some handling of external uses, might significantly reduce the noise level in synthesis results.


kripken commented May 1, 2018

Does "synthesis" mean stuff aside from const/nop?

In general results on the wasm backend are more important for the future (as we plan to switch to it from asm2wasm), however right now it is less optimized so more trivial stuff is possible there. But that's trivial stuff we need to fix ;)

Yeah, the cost model seems like it matters in that example. For code size, a zext of an i1 is free in wasm. On the other hand, I'm not sure it's free in the compiled machine code? So it could matter whether we're optimizing for download size or runtime speed.


regehr commented May 1, 2018

Yeah, sorry, "synthesis" means making new instructions that weren't there before.

The issue of cost models is one that we've spent a lot of time thinking about. I think the issues are basically the same for LLVM IR and WebAssembly. Options include:

  • right now we just eyeball it, tweaking the model when synthesis comes up with something it shouldn't have. this is only minimally workable. LLVM has these somewhat complicated and poorly documented ideas about canonicalization and they're hard to capture in a simple cost model.
  • probably the next thing I'll try is to run a ton of small LLVM functions through opt and then build a cost model by observing LLVM's behavior. I'm not certain that this approach makes sense for WebAssembly since the optimizer isn't as thorough yet.
  • I also want to run a ton of small functions through a backend, apply an assembly-level cost model, and then try to map that back to the LLVM level. this should work really well when optimizing for the size of the eventual object code; it isn't clear how well it'll work when optimizing for throughput, since predicting throughput of assembly code is difficult for modern processors. we can try this approach for webassembly assuming that you have an ahead-of-time webassembly backend?

Hope this all makes sense. Summary is that this is a problem we need to solve anyway, so happy to put some effort into it.


kripken commented May 3, 2018

Interesting about the cost model. Yeah, sounds like a lot to do there.

Meanwhile I generated some LHSes that might be interesting, this is from a function in the entropy coder in the AV1 media codec, which is something they tell me is relevant for superoptimization:
https://gist.github.com/kripken/b5da8f8da6b3cb88de012aa24e3393ce Would be interesting if it finds any especially clever stuff there.

I've also been improving the LHS generation mechanism. I'm working on a fuzzer that will verify that we generate proper LHSes from wasm, by doing trivial operations on them that should change nothing and then running to see that the output doesn't change. Already found one bug.


regehr commented May 3, 2018

Great! I'm traveling and won't do anything on this until the weekend unfortunately. Looking forward to seeing the new stuff. Instruction synthesis hadn't finished before I left home this morning argh.


regehr commented May 5, 2018

Ok, here are synthesis results from wasm_lua_scimark.c.wasm:
https://gist.github.com/regehr/0271e1060f791d7d813c000fa5596c8e
A few patterns dominate the output but there's some interesting stuff hiding in there.

And here's what Souper finds in the corresponding LLVM bitcode:
https://gist.github.com/regehr/b19854020a21a39fb34e50eff922e187

Is the AV1 codec still the best thing to do next?


regehr commented May 6, 2018

Results from the AV1 LHSs linked above:
https://gist.github.com/regehr/be7fd81ee4940086fcbf1fa3dd521fe0
The last one (with bswap) might be cute for processors where bswap isn't microcoded.


kripken commented May 6, 2018

Thanks!

Reading the lua synthesis results, one problem is that we tell Souper to infer the artificial instructions that we add just for legal reasons, like zexts and comparisons to zero for path conditions. I pushed an improvement to avoid that.

Another issue is that the cost of zext and trunc should be 0. Not inferring those instructions (last paragraph) will help, but they can also appear in the middle of LHSes. Then again, maybe that will be rare enough that it's not a problem.

A third issue is the presence of additional uses not seen in the LHS, as you brought up before. I looked into a couple manually, and in all of them that was why Binaryen didn't already perform the optimization. I wonder if this is more of an issue on wasm than on LLVM IR for some reason. In any case, I wonder if it wouldn't be more interesting to focus on single-use instructions, and I added a pass for that, --souperify-single-use (can use it just like --souperify).

Thinking about this, the single-use case might be interesting to focus on for an additional reason, that it should be much easier to create a fast matcher for such patterns - in particular, such a matcher could operate on Binaryen IR directly as opposed to first constructing the DataFlow IR. That should be far faster.

Reading the AV1 results, those are very interesting :) I'll pass those along.


regehr commented May 7, 2018

Great, I'm glad we're at the point of worrying about these sorts of details.

The current Souper cost model is dead simple:
https://github.com/google/souper/blob/master/lib/Inst/Inst.cpp#L640

I'll make an alternate version that has cost of ext/trunc at 0.
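A minimal sketch of what such an alternate model could look like, assuming an invented tuple representation of Souper instructions (this is not Souper's actual C++ cost function linked above): ext/trunc and pure dataflow bookkeeping cost 0, everything else costs 1.

```python
# Hypothetical size-oriented cost model: zero-cost casts, as discussed.
FREE_OPS = {"zext", "sext", "trunc", "var"}

def lhs_cost(insts):
    """insts: list of (opcode, ...) tuples in Souper's text vocabulary."""
    return sum(0 if op in FREE_OPS else 1 for op, *_ in insts)

# e.g. `%0 = var; %1 = sle 0, %0; %2 = zext %1` costs just 1:
assert lhs_cost([("var",), ("sle", 0, "%0"), ("zext", "%1")]) == 1
```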

Regarding external uses: empirically, the single-use restriction will be a serious one, though certainly we should pursue the low hanging fruit there. But soon I'd like to move to a more sophisticated model where you mark instructions with external uses in the Souper IR and then synthesis will take them into account. The syntax for this isn't worked out yet but we'll do this soon. There's room for even more improvement beyond that, but let's not worry about that right now.


regehr commented May 7, 2018

Using the latest binaryen, I've just kicked off a synthesis run for these LHSs:

BINARYEN_DEBUG_SOUPERIFY=1 ~/binaryen/build/bin/wasm-opt wasmbackend_bullet.wasm -O3 --flatten --simplify-locals-nonesting --souperify-single-use

I'm worried, however, that there may be a bug, because I'm seeing quite a few optimizations such as this one that are trivial due to an operation having the same value passed in both argument positions:

; start LHS (in $free)
%0:i32 = var
%1:i32 = var
; (i32.ne(get_local $36)(get_local $2))
%2 = ne %0, %0
infer %2

; RHS inferred successfully
result 0:i1


regehr commented May 7, 2018

OK, feel free to generate Souper that looks like this:

%0:i32 = var (hasExternalUses)
%1:i32 = add 0:i32, %0
%2:i32 = add 0:i32, %1
%3:i32 = add 0:i32, %2 (hasExternalUses)
%4:i32 = add 0:i32, %3

We'll modify synthesis to treat externally used values as fixed, and not try to replace them.
Please keep in mind that:

  • checking for a single use is not the right algorithm, since many values have multiple uses that are captured by the Souper LHS; we really want to only mark values that have uses that are not visible in the Souper code
  • the root of a LHS can obviously have as many uses as it likes and all of them are, by definition, invisible to Souper, and these have no impact on synthesis or the subsequent optimization


kripken commented May 7, 2018

%2 = ne %0, %0

Thanks, yes, definitely a bug. Fixed on the branch now, and added a test.

the root of a LHS can obviously have as many uses as it likes and all of them are, by definition, invisible to Souper, and these have no impact on synthesis or the subsequent optimization

Very good point, yeah. Fixed now. This was unnecessarily excluding a lot of LHSes! :)

checking for a single use is not the right algorithm, since many values have multiple uses that are captured by the Souper LHS; we really want to only mark values that have uses that are not visible in the Souper code

Fair point, but this does bring us to a tradeoff with the speed of matching the pattern. If (aside from the root) we have only single uses, then we are just looking at some simple linear wasm expression code (with no set_locals in the middle, in particular), which is very fast to compare to a pattern. And I could be wrong but my current intuition is that we'll have many or most of the wins this way anyhow.
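To illustrate why the single-use case allows fast matching: with only single uses, an LHS is a pure expression tree, so matching is a simple recursive walk with no dataflow bookkeeping. A toy sketch with invented tuple encodings, using the add/sub pair from $luaL_error above:

```python
# Toy tree matcher for single-use LHS patterns. Patterns are tuples like
# ("add", sub-pattern, sub-pattern); strings are pattern variables;
# anything else (e.g. an integer constant) must match exactly.
def match(pattern, expr, binding):
    if isinstance(pattern, str):          # pattern variable, e.g. "x"
        if pattern in binding:
            return binding[pattern] == expr
        binding[pattern] = expr
        return True
    if isinstance(pattern, tuple):        # operator node
        return (isinstance(expr, tuple)
                and len(pattern) == len(expr)
                and pattern[0] == expr[0]
                and all(match(p, e, binding)
                        for p, e in zip(pattern[1:], expr[1:])))
    return pattern == expr                # constant

# (x - 128) + 128  ==>  x, as in the $luaL_error example above
pattern = ("add", ("sub", "x", 128), 128)
expr = ("add", ("sub", ("get_local", 5), 128), 128)
binding = {}
assert match(pattern, expr, binding)
assert binding["x"] == ("get_local", 5)
```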


regehr commented May 7, 2018

Great! I'll kick off a new synthesis run tonight but then I'm leaving town tomorrow so it'll be the weekend again before I see the results (maybe I should poke holes in my home firewall but I don't really want to).


regehr commented May 14, 2018

here are a few synthesis results from wasmbackend_bullet.wasm:
https://gist.github.com/regehr/2378313b8b4ed859948341e79c375eeb


regehr commented May 14, 2018

Also here's souper's cache dump of the same information (which I find easier to read without the webasm noise): https://gist.github.com/regehr/152b5e8359509c52a6c36829506d9d05


kripken commented May 15, 2018

Thanks!

Seeing some trivial stuff that shouldn't be there, I found that the single-use calculation was actually wrong - it ignored uses in things like stores and calls. Fixed, hopefully now it's correct.


kripken commented May 26, 2018

--souperify now emits (hasExternalUses) when relevant. This required some redesign of the IR to properly handle a single wasm instruction that turns into multiple Souper instructions, etc.

(--souperify-single-use still exists, and should never emit that notation, which is useful for fuzzing)

I think that's all I had on my list of stuff to fix.


regehr commented May 29, 2018

Thanks! I'll post some new synthesis results soon.

kripken added a commit to WebAssembly/binaryen that referenced this issue Aug 27, 2018
Background: google/souper#323

This adds a --souperify pass, which emits Souper IR in text format. That can then be read by Souper which can emit superoptimization rules. We hope that eventually we can integrate those rules into Binaryen.

How this works is we emit an internal "DataFlow IR", which is an SSA-based IR, and then write that out into Souper text.

This also adds a --dfo pass, which stands for data-flow optimizations. A DataFlow IR is generated, as in souperify, and the pass then performs some trivial optimizations using it. There are very few things it can do that our other optimizations can't already, but this is also good testing for the DataFlow IR, plus it is good preparation for using Souper's superoptimization output (which would also construct DataFlow IR, as here, but then do some matching on the Souper rules).

regehr commented Oct 28, 2019

@shrin18 please explain in more detail what you want to do
