
Build Box2D and make Cool Demos #22

Closed
kripken opened this Issue Jun 13, 2011 · 47 comments

4 participants

@kripken
Owner
kripken commented Jun 13, 2011

No description provided.

@kripken
Owner
kripken commented Jun 13, 2011

Quick compilation of Box2D + HelloWorld, unoptimized build:

http://syntensity.com/static/box2d.ll
http://syntensity.com/static/box2d.js

Partially optimized:

http://syntensity.com/static/box2d.opt.js

(To see one of the .js files work, download it and then run it in a console JS engine.)

I don't know Box2D much myself, but looks like it works, output is the same as HelloWorld compiled natively.

Main question is how to move forward. The raw C++ functions are exposed (see _main, that is main() from HelloWorld), which are not JS-friendly. We should probably wrap them in a JS API somehow?

@fjenett
fjenett commented Jun 13, 2011

Hey, this was fast! Great!
You are right, "Hello World" seems to work fine.

Regarding the wrapper, is this something that needs to be done manually? Are there known strategies? And/or can it be set up in a way that future releases of box2d can easily be wrapped again?

Another aspect to look into is size, I guess. Closure compiles it from 2+ MB down to 640k, which is still heavy for something that may be used on the web. Btw., the compiler reports many (600+) "unreachable code" warnings; that code could be removed.

@kripken
Owner
kripken commented Jun 13, 2011

Well, Emscripten exposes the raw C++ functions. We don't really have a good idea of how to automate the creation of nice JS-friendly APIs. Ideas are welcome!

If the functions were C and not C++, this would be easier. You would have things like box2d_new_vector2() and so forth, with clear parameters. With C++, things are messier. But, even with C the functions are still not very JS-friendly.

The current tools Emscripten has here are something that generates the unmangled C++ function names, so you can at least identify them in the compiled code. But getting from that to a proper API isn't clear to me.

A manual API could be written. This would need some modifications for new versions of Box2D (how much depending on how much changed).

About size: First, closure advanced yields something of similar size to the native binary, in my experience. Second, when you have a final program, you can remove unused functions and dead code.

The "unreachable codes" are a bug in the compiler. Closure fixes it, so I haven't been motivated to fix the issue.

@fjenett
fjenett commented Jun 14, 2011

Ok, thanks!

Looking at the generated code and the Box2D API, I think manually writing a full wrapper would be pretty much a whole project of its own.

To get to a fancy working demo one strategy could then be to write the Box2D code in C++ with ways to create shapes (addBox, addCircle, addPoly) and add a rendering service (drawBox, drawCircle, drawPoly) from outside. These ins and outs could then be replaced on the JS side by real implementations. Something like a "tiny simple API".
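One way to picture the "tiny simple API" idea is the sketch below. Only the names addBox, addCircle, drawBox and drawCircle come from the comment above; `createTinyWorld`, `step`, `render` and the fake constant-speed fall are assumptions made for illustration. In the real proposal the simulation would live in compiled C++, not in JS.

```javascript
// Hypothetical "tiny API": the world keeps shapes and calls back into a
// renderer supplied from outside. The physics here is a placeholder.
function createTinyWorld(renderer) {
  const shapes = [];
  return {
    addBox(x, y, w, h) { shapes.push({ kind: 'box', x, y, w, h }); },
    addCircle(x, y, r) { shapes.push({ kind: 'circle', x, y, r }); },
    // Stand-in for the compiled Box2D step: every shape falls at 10 units/s.
    step(dt) { for (const s of shapes) s.y -= 10 * dt; },
    // Hand each shape to the externally supplied drawing functions.
    render() {
      for (const s of shapes) {
        if (s.kind === 'box') renderer.drawBox(s.x, s.y, s.w, s.h);
        else renderer.drawCircle(s.x, s.y, s.r);
      }
    },
  };
}

// A renderer that only records calls; a demo would draw to a canvas instead.
const calls = [];
const recordingRenderer = {
  drawBox: (x, y, w, h) => calls.push(['box', x, y, w, h]),
  drawCircle: (x, y, r) => calls.push(['circle', x, y, r]),
};

const world = createTinyWorld(recordingRenderer);
world.addBox(0, 4, 1, 1);
world.addCircle(2, 8, 0.5);
world.step(0.5);
world.render();
console.log(calls);
```

The point of the design is that the JS side only needs to implement a handful of drawing callbacks, so the surface that has to be wrapped by hand stays small.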

Looking at the C++ code, I have a hard time grasping how it needs to be wrapped so that it can be called from outside. Consider:

// write a wrapper to create a world
function createWorld ( gravityX, gravityY, doSleep, groundX, groundY ) {
    // this should wrap anything up to
    // groundBody->CreateFixture( ... )
    // from "HelloWorld.cpp"
    // and return the world
}

What would that look like using the generated code? How would one go about handling all that stackBase and such?

Thanks!

@kripken
Owner
kripken commented Jun 17, 2011

You shouldn't need to do anything with stackBase (which is the current position of the stack).

In general, you might do stuff like this, to create a wrapper for a single function:

function myWrapper(arg1, arg2) {
    // convert JS arguments into 'c++' arguments. For example, a JS string becomes an integer, which is a 'pointer' into the string in memory
    theFunction(cArg1, cArg2); // normal function call
    // convert the return value back to JS
}

This will leak memory though. Managing allocations is hard here.
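A minimal, self-contained sketch of that wrapper pattern follows. The heap, `mallocBytes`, `freeBytes` and `compiledStrlen` are hypothetical stand-ins written by hand for this example; they are not Emscripten's runtime. The sketch shows a JS string being copied into memory, passed as an integer "pointer", and freed afterwards so the wrapper does not leak.

```javascript
// Hypothetical stand-in for the compiled program's memory: one flat byte array.
const HEAP = new Uint8Array(1024);
let heapTop = 8;                    // next free address; 0 is kept as "null"
const liveAllocations = new Set();  // tracks what has not been freed yet

// Toy bump allocator (assumption: real code would call the compiled malloc/free).
function mallocBytes(size) {
  const ptr = heapTop;
  heapTop += size;
  liveAllocations.add(ptr);
  return ptr;
}
function freeBytes(ptr) {
  liveAllocations.delete(ptr);
}

// Stand-in for a compiled C function with signature: size_t strlen(const char *s)
function compiledStrlen(ptr) {
  let n = 0;
  while (HEAP[ptr + n] !== 0) n++;
  return n;
}

// Convert a JS string into a zero-terminated byte string in the heap.
function stringToHeap(str) {
  const bytes = new TextEncoder().encode(str);
  const ptr = mallocBytes(bytes.length + 1);
  HEAP.set(bytes, ptr);
  HEAP[ptr + bytes.length] = 0;
  return ptr;
}

// The wrapper: JS arguments in, plain function call, JS value out.
function strlenWrapper(str) {
  const cStr = stringToHeap(str);
  try {
    return compiledStrlen(cStr);
  } finally {
    freeBytes(cStr); // without this, every call would leak the copied string
  }
}

console.log(strlenWrapper('hello'));
```

Freeing in a `finally` block is one simple policy; it only works when the compiled function does not keep the pointer after returning, which is exactly the kind of ownership question that makes automatic wrappers hard.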

You can also wrap bigger stuff. For example converting the HelloWorld.cpp could start with

var gravity = Box2D.b2Vec2.__new__1(0.0, -10.0); // This uses an autogenerated wrapper for the constructor (from Emscripten's namespacer tool). It is a parallel to new b2Vec2(0, -10);

var world = Box2D.b2World.new(gravity, true);

var groundBodyDef = ...; // call a wrapper, or allocate memory and call constructor yourself. Sadly the namespacer tool didn't succeed in creating an automatic wrapper here...
// get the position element by finding the right memory offset
// call Set on the position element, achieving groundBodyDef.position.Set(0.0, -10.0);
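The "find the right memory offset" step can be sketched as below. The struct layout (position at byte offset 4) and the address 64 are assumptions made up for this example and are not taken from the Box2D headers or from the compiled output; `vec2Set` is a hand-written stand-in for the compiled `b2Vec2::Set`.

```javascript
// Hypothetical flat memory, accessed through a DataView.
const memory = new ArrayBuffer(256);
const view = new DataView(memory);

// Assumed layout, for illustration only:
//   offset 0: int32   type
//   offset 4: float32 position.x
//   offset 8: float32 position.y
const POSITION_OFFSET = 4;

// Stand-in for b2Vec2::Set(x, y); the first argument plays the role of `this`.
function vec2Set(vecPtr, x, y) {
  view.setFloat32(vecPtr, x, true);      // little-endian float at vecPtr
  view.setFloat32(vecPtr + 4, y, true);  // next float, 4 bytes later
}

// Pretend an allocator returned this address for the body definition.
const groundBodyDef = 64;

// Equivalent of groundBodyDef.position.Set(0.0, -10.0):
// a member's address is the struct's address plus the member's offset.
vec2Set(groundBodyDef + POSITION_OFFSET, 0.0, -10.0);

console.log(view.getFloat32(groundBodyDef + POSITION_OFFSET, true),
            view.getFloat32(groundBodyDef + POSITION_OFFSET + 4, true));
```

Hard-coding offsets like this is fragile, since they change whenever the C++ struct changes, which is part of the motivation for generating wrappers automatically.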

I am starting to think we need a compiler plugin here, so we can scan all the classes and functions as we compile them, and generate wrappers automatically. Not sure how hard that is though.

@fjenett
fjenett commented Jun 17, 2011

Aha! That looks much better, i think i can work with that ...
Can you link the code generated by the namespacer tool? .. or did i miss that in the files above?
Is there documentation of all the JS -> C/C++ type conversion options?

Thanks!

@kripken
Owner
kripken commented Jun 18, 2011

Here is, for example, what the namespacer generates for the scriptaclass test (python tests/runner.py clang_0_0.test_scriptaclass):

Module["_"] = {
  "ScriptMe": {
    "ScriptMe": __ZN8ScriptMeC2Ei, 
    "ScriptMe__params": "int", 
    "__new__": function() { var ret = _malloc($struct_ScriptMe___SIZE); Module._.ScriptMe.ScriptMe.apply(null, [ret].concat(Array.prototype.slice.apply(arguments))); return ret; }, 
    "getVal": __ZN8ScriptMe6getValEv, 
    "getVal__params": "", 
    "mulVal": __ZN8ScriptMe6mulValEi, 
    "mulVal__params": "int"
  }
};

That is generated from


struct ScriptMe {
  int value;
  ScriptMe(int val);
  int getVal(); // XXX Sadly, inlining these will result in LLVM not
                // producing any code for them (when just building
                // as a library)
  void mulVal(int mul);
};
ScriptMe::ScriptMe(int val) : value(val) { }
int ScriptMe::getVal() { return value; }
void ScriptMe::mulVal(int mul) { value *= mul; }

There is some documentation inside the tools (tools/namespacer.py, etc.).
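To make the generated `__new__` helper easier to follow, here is a runnable imitation of it. The heap, `malloc`, and the three `ScriptMe_*` functions are hand-written stand-ins for the compiled code (the real ones are the mangled `__ZN8ScriptMe...` functions), and `STRUCT_SCRIPTME_SIZE` is an assumed size for a struct holding one int.

```javascript
// Hypothetical memory and allocator standing in for the compiled runtime.
const HEAP32 = new Int32Array(256);
let nextFree = 8; // byte address of the next free slot
function malloc(size) {
  const ptr = nextFree;
  nextFree += (size + 3) & ~3; // keep allocations 4-byte aligned
  return ptr;
}

const STRUCT_SCRIPTME_SIZE = 4; // assumption: the struct is a single int

// Stand-ins for the compiled methods. Each takes the object's address
// as its first argument, the way `this` is passed in compiled C++.
function ScriptMe_ctor(thisPtr, val) { HEAP32[thisPtr >> 2] = val; }
function ScriptMe_getVal(thisPtr) { return HEAP32[thisPtr >> 2]; }
function ScriptMe_mulVal(thisPtr, mul) {
  HEAP32[thisPtr >> 2] = Math.imul(HEAP32[thisPtr >> 2], mul);
}

const ScriptMe = {
  ScriptMe: ScriptMe_ctor,
  // Same idea as the generated __new__: allocate, then run the
  // constructor with the fresh address prepended to the arguments.
  __new__: function (...args) {
    const ret = malloc(STRUCT_SCRIPTME_SIZE);
    ScriptMe.ScriptMe(ret, ...args);
    return ret;
  },
  getVal: ScriptMe_getVal,
  mulVal: ScriptMe_mulVal,
};

const obj = ScriptMe.__new__(10); // parallel to: new ScriptMe(10)
ScriptMe.mulVal(obj, 2);          // parallel to: obj->mulVal(2)
console.log(ScriptMe.getVal(obj));
```

Note that `obj` is just an integer address, so nothing frees it automatically; a matching delete helper would be needed in real use.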

Meanwhile, I am thinking perhaps using SWIG would make sense here. It's used for making bindings in general, maybe we can make a js/emscripten one.

@joelgwebber

@kripken : I've been trying to get Box2D working through emscripten, and have been hitting something of a brick wall. The code compiles fine, but no matter what flags I pick, the resulting code seems to produce bad output (even for Box2D's HelloWorld, all the values come out "0"). I'm sure I'm doing something boneheaded, but can't seem to pin down what it is, so any help (even if it's just the Makefile you used to compile "box2d.opt.js" above) would be greatly appreciated.

If you care to have a look, the code (with nasty hacky makefiles) I'm using is here: https://github.com/joelgwebber/bench2d (look in the "c" subdirectory -- Bench2d.c currently contains a simple benchmark #if 0'd out, with a copy of HelloWorld buried in the #else case).

@kripken
Owner
kripken commented Dec 12, 2011

jgw: I took a look at the makefile there. Some possible issues:

  1. -std-compile-opts for LLVM will run all the LLVM optimizations, which are neither safe nor portable, and can break Emscripten.

  2. Try to not use any Emscripten flags. The default code generation mode is the safest. (Once everything works, to get full speed you will need those things, though.) But right now, you should first use the defaults, in particular, the defaults will not disable corrections (CORRECT_OVERFLOWS etc.), which the settings in your makefile will. (Corrections make the code slow, but are necessary in some cases; PGO is the solution for that.)

  3. RELOOP is misspelled as RELOP (so no relooping is done, but that isn't important now, it just means compilation is faster and code is slower).

  4. What OS are you on? I test on Linux, I am not entirely sure things work on OS X and Windows. If that is the issue here we should fix it, please let us know.

@joelgwebber

@kripken: Thanks for the quick reply. I'm compiling on OSX, and I've tried building clang/llvm from trunk, as well as the 2.9 and 3.0 releases, to no avail. I've got an Ubuntu machine at work, but for various reasons (specific to my company's particular Ubuntu install) have been having trouble getting a reasonably modern version of clang/llvm working. I'll keep banging on that to see if I can get it going, but for now I don't know if that will fix it or not.

I've removed all the flags and pushed a new version of the makefile (thanks for the heads-up on -std-compile-opts -- that was purely cargo-culting on my part). I'm still getting bad results -- Hello trips an assertion in b2PolygonShape now, and just prints "nan 0.00 nan" repeatedly if I disable assertions.

Again, thanks for taking the time to look at this. I'll keep banging on it to see if I can get Clang working on my Linux box at work. Hopefully that will turn out to be the problem. In the meantime, if you happen to notice anything else I'm doing that's screwy, please feel free to chime in. And if you happen to have a working emscripten makefile for box2d's hello world lying about, I'd be glad to give that a shot over here :)

@kripken
Owner
kripken commented Dec 13, 2011

My makefile from back then is too old, even if I could find it (emscripten changed a lot since then). But I am sure we can get this to work, at least on Linux (fixing everything for Windows and OS X might take a bit more work, I honestly don't know how much).

I'll take another look at your makefile later today or tomorrow.

@kripken
Owner
kripken commented Dec 13, 2011

With this diff:

https://gist.github.com/1470674

it builds for me without any optimizations. The build commands are in the diff. Sorry for the hackishness, but it's late ;)

Running it in node, I get

[..]
1.529999 :: 2.000000
1.529999 :: 1.000000
1.529999 :: 2.000000
1.529999 :: 1.000000
[..]

I hope that's good? :)

@joelgwebber

@kripken Thanks again for your help. I finally managed to get a build of LLVM on my Ubuntu box, and oddly enough it works fine there (both with my original makefile and your modified one). I can't explain that, but the stack of clang, llvm, headers, and so forth is so complex that it could be almost anything :P

I confirmed that both the Box2D HelloWorld and my benchmark produce sane output now, and the performance numbers aren't all over the map like they were with builds I did on the Mac. Now I feel comfortable moving forward with getting real benchmark numbers. To that end, I re-ran the build with all the emscripten flags in the makefile re-enabled, and confirmed that the output is a bit faster, and still generates sane values. But not being deeply familiar with emscripten, I don't have a good sense for whether those flags actually make sense.

Given that none of them breaks the code, do you have any suggestions on whether I should change them in any way to get the best performance? And have you found that running emscripten output through the Closure compiler makes any difference in performance (other than for startup)? I would like to make sure I get an accurate representation of the best code emscripten can generate.

@kripken
Owner
kripken commented Dec 14, 2011

I am actually right now working on a new compiler frontend (emcc) to make optimizing code much much easier. It should be usable in a few days. Until then, optimizing is not very convenient, but overall see the docs at

https://github.com/kripken/emscripten/wiki/Optimizing-Code

The crucial points are

  • Avoid CORRECT options, or at least use PGO
  • Use typed arrays (1 or 2, worth checking both)
  • If the code works with memory compression, do that (QUANTUM_SIZE=1, USE_TYPED_ARRAYS=1)
  • Use the emscripten JS optimizer
  • Use the emscripten eliminator
  • Use closure compiler (advanced! makes a big difference not just for startup, but performance later)

You can see all of these in action in the emscripten benchmarks (python tests/runner.py benchmark). But again, the easiest thing might be to wait a few days for emcc; if you are not in a rush, I would recommend that (you can also try emcc now, there is some --help in it, but expect breakage until it's stable).

@kripken
Owner
kripken commented Dec 15, 2011

emcc is now usable for optimizing, see docs at

https://github.com/kripken/emscripten/wiki/Optimizing-Code

Basically, compile the final bitcode bc file, then do emcc -O3 box.bc and you should get an optimized JS file. Aside from -O3, there are some additional potential optimizations as mentioned in the docs there, specifically memory compression and typed arrays mode 1, it's worth trying both of those out. The Emscripten benchmark suite uses both of those, which is why the results are so good there (3-4X slower than native code).

@kripken
Owner
kripken commented Dec 15, 2011

jgw, I just noticed this benchmark is hit by the problem mentioned at the bottom of issue 132. Until that is fixed (a few days, I hope), the JS code generated here will be much slower than it should be. Sorry about this.

@joelgwebber

@kripken : Thanks for the heads-up. I've got it compiling with emcc, but am seeing 200-300ms/frame, which I presume is in line with what you'd expect given issue 132. I'll make a note in my benchmarks that the emscripten output is known to be suboptimal and on its way to being fixed.

@kripken
Owner
kripken commented Dec 16, 2011

jgw, I see you checked in some emscripten-generated code into bench2d, is that the code you benchmarked with? It's only partially optimized (no closure compiler, for example, which emcc will run automatically for you).

I fixed most of the slowdown bug in the emccbydefault branch in emscripten. It's not ready to be pushed to master yet. But I'm testing it in a fork of bench2d here, to get it to optimize as much as possible with the new emcc

https://github.com/kripken/bench2d

Edit: Note that emcc in master, while not as fast as in that branch, would be significantly faster than the generated code in the repo (due to closure, eliminator, and js optimizer passes), even with -O2.

@kripken
Owner
kripken commented Dec 16, 2011

Last comment for today, -O3 gives the same output as -O2, and is about 10% faster. Both seem to be much faster than the code in the repo, which for some reason maxes out the memory on my machine (?) on both v8 and sm.

I pushed an optimized build to my fork. I still have some more optimizations to test (memory compression, etc.), and still need to finish fixing the slowdown bug, but most of the performance should be present in that build. If you can compare it to yours, I'm curious what the results are.

@kripken
Owner
kripken commented Dec 16, 2011

Really last comment ;) Memory compression gives another 10% speedup, and the code doesn't seem to have broken. Pushed that to my fork.

@joelgwebber

@kripken Thanks for all the help. I pulled the latest emscripten head and recompiled using EMCC and your flags. The numbers are now much more in line with the other implementations (mean=90ms, stddev=11ms). I've pushed the updated makefile and a copy of the compiled output (it's in c/bench2d.js) to save others the trouble of reproducing it.
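For readers who want to produce the same kind of summary (mean and standard deviation of per-frame times), a small sketch follows. It is not the bench2d harness; `summarize` and `timeFrames` are names chosen here, the population form of the standard deviation is an assumption, and the sample values are made up rather than measured.

```javascript
// Mean and (population) standard deviation of a list of timings.
function summarize(samples) {
  const n = samples.length;
  const mean = samples.reduce((sum, x) => sum + x, 0) / n;
  const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  return { mean, stddev: Math.sqrt(variance) };
}

// Time `frames` calls of stepFrame and return each duration in milliseconds.
function timeFrames(stepFrame, frames) {
  const samples = [];
  for (let i = 0; i < frames; i++) {
    const start = process.hrtime.bigint();
    stepFrame();
    const end = process.hrtime.bigint();
    samples.push(Number(end - start) / 1e6); // nanoseconds -> milliseconds
  }
  return samples;
}

// Made-up timings, only to show the arithmetic.
console.log(summarize([2, 4, 4, 4, 5, 5, 7, 9]));

// Timing a dummy workload; a real run would step the physics world instead.
const measured = timeFrames(() => { for (let i = 0; i < 1e5; i++) Math.sqrt(i); }, 20);
console.log(summarize(measured));
```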

I didn't run the closure compiler on the output because I don't currently have it setup locally, and minification made no noticeable performance difference on the mandreel output. If your results are different, let me know and I'll take a moment to compile the output.

You can see my updated numbers off to one side in the spreadsheet. I'm writing up a couple of followup edits, and will update all the graphs once I'm done with that.

@kripken
Owner
kripken commented Dec 16, 2011

Closure compiler doesn't just minify: In advanced mode, it can greatly speed up the code in many cases by coalescing variables and inlining. emcc will run it in advanced mode by default, the emscripten compilation strategy relies on closure compiler - so not running it means the code is not fully optimized for speed.

(Mandreel code is known to break on closure advanced, I spoke to them about that, and non-advanced just minifies as you said but has no effect on speed.)

What spreadsheet do you refer to?

@kripken
Owner
kripken commented Dec 16, 2011

Another question, even aside from closure compiler the code in your repo is unoptimized (it doesn't have the variable eliminator run on it, for example, which should happen with -O1 or above). But your makefile says it is running with -O3. Also, that command will not work at all if closure is not installed, so the emscripten-generated code in your repo seems to not be created by that makefile. Unless I am missing something?

edit: Also, the makefile will create bench2d.opt.js, not bench2d.js as in the repo, further confusing me

@joelgwebber

The spreadsheet I'm referring to is here: https://docs.google.com/spreadsheet/ccc?key=0Ag3_0ZPxr2HrdEdoUy1RVDQtX2k3a0ZISnRiZVZBaEE (it's the one the graphs on my writeup are derived from).

The output that you see is what came from running:

emcc -O3 -s USE_TYPED_ARRAYS=1 -s QUANTUM_SIZE=1 -s TOTAL_MEMORY=150000000 bench2d.bc -o bench2d.js

I didn't realize emscripten code would actually survive Closure advanced optimizations intact. In that case, I can definitely see how it would make a difference. I've gone ahead and spun a new build with the closure compiler enabled, and the results are still better than the previous run. I've pushed that copy of bench2d.js, and updated the spreadsheet above (see the column "Emscripten Test" off to the right).

@kripken
Owner
kripken commented Dec 16, 2011

I just pushed most of the emcc enhancements to master just now - it's worth pulling.

@kripken
Owner
kripken commented Dec 16, 2011

I should have mentioned in the docs that we use closure advanced, sorry about that. I added a note to emcc --help now.

I tried to run the other benchmarks to get some numbers on my machine, however the mandreel one seems broken in chrome and firefox, 'startApp is not defined'.

@joelgwebber

Ok, just updated the compiled output and numbers with a new pull of emscripten, and it's yet again better than before. Now the mean is a bit over 70ms. I also updated the mandreel-compiled output -- it has a slightly odd loading mechanism, so you'll need to load it out of the /c/mandreel directory directly. It's now compiled with full optimizations, though the performance isn't noticeably different from before.

@kripken
Owner
kripken commented Dec 17, 2011

I also updated the mandreel-compiled output -- it has a slightly odd loading mechanism, so you'll need to load it out of the /c/mandreel directory directly

Can you please elaborate? I can't figure out how to run it, I tried both using a local httpserver and as a file:// url. Same error in both cases.

@joelgwebber

Probably wasn't entirely clear -- what I meant was that I had to move the mandreel html file directly into the /c/mandreel subdirectory for it to load properly (I pushed this sometime yesterday). You can load it from a file URL with no problem: file:///path/to/bench2d/c/mandreel/bench2d_mandreel.html

Just pulled onto my home laptop and verified this works properly.

@kripken
Owner
kripken commented Dec 17, 2011

Thanks. It's still broken though,

[11:10:09.784] uncaught exception: [Exception... "A parameter or an operation is not supported by the underlying object" code: "15" nsresult: "0x8053000f (NS_ERROR_DOM_INVALID_ACCESS_ERR)" location: "http://localhost:8888/mandreel/mandreel.js Line: 963"]

I've seen this before with mandreel generated code, we discussed it with them on the FF bug tracker, it doesn't look like they test much on dev versions of browsers (I'm on FF11).

@joelgwebber

Ah, I see that now in Firefox (version 8) as well. I had been testing on Chrome. Not sure why it's screwed up on FF, but I was using Chrome as the baseline for my tests of different libraries, so it at least works out ok for the benchmarks. I'll ping the guy I've been talking to at Mandreel to see if they have a fix for this.

@kripken
Owner
kripken commented Dec 17, 2011

Note that in Chrome perf results will differ a lot from FF9+ (FF9 will be stable in 1 week). Mandreel is tuned for Chrome and is significantly slower on FF, I can find the bug number where the details are discussed if you want. Emscripten is more balanced, at least in my benchmarks. So would be interesting to see results on a browser other than Chrome.

@kripken
Owner
kripken commented Feb 12, 2012

@joelgwebber I recently finished some additional optimizations in emscripten and ran your box2d benchmark on it. Comparing the printed final averages, I get

clang native -O3          7.5
clang js on sm trunk     47.4
clang js on node 0.6.6  106.8

(Note that I compare clang and not gcc as in your tests, I prefer clang because then I have the same compiler - or at least frontend - in both native code and JS).

So the fastest JS engine, at least on my machine (2GHz Core 2 Duo laptop, Linux), is just 6.3x slower than native code, which is about twice as fast as in your earlier tests, I think?
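The slowdown factors follow directly from the three averages listed above, dividing each by the native figure. A short check (the variable and function names are chosen here for illustration):

```javascript
// Final averages as reported in the comment above (lower is better).
const finalAverage = {
  'clang native -O3': 7.5,
  'clang js on sm trunk': 47.4,
  'clang js on node 0.6.6': 106.8,
};

const nativeAverage = finalAverage['clang native -O3'];

// Slowdown relative to native: how many times longer the run takes.
function slowdown(name) {
  return finalAverage[name] / nativeAverage;
}

for (const name of Object.keys(finalAverage)) {
  console.log(name.padEnd(24), slowdown(name).toFixed(1) + 'x');
}
```

So SpiderMonkey trunk comes out at about 6.3x and node 0.6.6 at about 14.2x relative to the native build.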

Code is in my fork, https://github.com/kripken/bench2d

@joelgwebber

Great work, Alon. Does your fork include a new version of the emscripten-compiled output? If so, I'd like to go ahead and pull it, then re-run numbers on my machine as well. I also have a Flash version I keep meaning to finish up. When I do so, I'll post updated numbers.

Side note: The only reason I used gcc rather than clang is that, for some reason, the clang output on my mac (using XCode's clang) produced non-functional output, and I just never had time to track down the issue. I doubt one will produce wildly better results than the other, though.

@kripken
Owner
kripken commented Feb 19, 2012

The code is in my fork (the 'inline' file), but please wait on benchmarking as I want to finish a few last things (probably take a few days).

I am very curious about Flash results. Any preliminary numbers there?

About gcc and clang, yeah, I don't see a big difference in practice, my main reason is more theoretical to keep the comparison as close as possible.

@joelgwebber
@kripken
Owner
kripken commented Feb 21, 2012

Interesting. So Java was 2.5x slower than C, I think I remember? So Flash is 5x slower? Making it faster than JS at 6.3x slower, but not by a huge amount.

Flash does have some potential advantages, aside from types and native matrix/vector classes it also has the alchemy stuff which lets it emulate memory very efficiently. That might not be relevant in this handwritten code (I think that is what Flash Box2D is?), but it is relevant in that JS engines running compiled code are slowed down by memory emulation quite a bit. However, Chrome and Firefox devs are working on this so it'll be interesting to see how much of a speedup that will give.

@joelgwebber
@kripken
Owner
kripken commented Feb 21, 2012

Alchemy aside from being a compiler has also led to some VM additions (or perhaps the VM additions came first and I got that wrong?), that make it easier to run compiled code, stuff like special arrays that are accessed very quickly (more than typed arrays in JS). That is motivating some optimizations in JS engines to get similar performance.

@kripken
Owner
kripken commented Feb 21, 2012

Ok, I just wanted to check some stuff but everything seems fine. bench2d.js in my fork is benchmarkable. Thanks for doing these benchmarks, btw :)

Btw, two feature requests for your benchmarks: Code size (after gzip), and benchmarks of raw JS engines (not just in browsers). Code size is interesting for obvious reasons I think, benchmarks of raw JS engines are useful because sometimes people do run outside of browsers, say in node.js.
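Measuring code size after gzip can be done in a few lines of Node using its built-in zlib module. This is a sketch, not part of the benchmark suite; `sizes` is a name chosen here and the input string is a made-up, highly repetitive sample, so its compression ratio says nothing about real compiled output.

```javascript
const zlib = require('zlib');

// Return the raw and gzipped byte sizes of a piece of source text.
function sizes(source) {
  const raw = Buffer.byteLength(source, 'utf8');
  const gzipped = zlib.gzipSync(Buffer.from(source, 'utf8')).length;
  return { raw, gzipped };
}

// Made-up input: the same small function repeated 1000 times.
const sample = 'function f(a){return a+1}\n'.repeat(1000);
const result = sizes(sample);
console.log('raw bytes:', result.raw, 'gzipped bytes:', result.gzipped);
```

For a real measurement, the source would be read from the generated file (for example with `fs.readFileSync` on whatever path the build produced) and passed to `sizes`.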

Returning to the original topic of this bug, I finished porting Box2D using the emscripten bindings generator, so it's easy to use from normal JS (C++ classes get JS wrappers),

https://github.com/kripken/box2d.js

It isn't optimized yet (closure compiler breaks it, need to find out why) but a demo is up at http://syntensity.com/static/box2d.html

@kripken
Owner
kripken commented Feb 22, 2012

Fixed the closure compiler bug, box2d.js is now closured too.

@joelgwebber

Sorry for the slow response. Nathan Hammond submitted your additions as a pull request (joelgwebber/bench2d#4), which I just merged. Feel free to send any further changes, tweaks, optimizations, etc.

As for code-size, I definitely agree that's worth measuring. It's a bit of work, because it would only be fair to measure on raw Javascript that's been as optimized as possible -- but it's tricky to pin down which Closure optimizations will actually work on it. Still worth doing, though.

When you have a moment, please look at the code that's checked in under /emscripten to make sure nothing's broken. I'm getting numbers on my MacBook Pro (same as I used for the others) on the order of 90ms/frame. I seem to recall that you were getting better numbers than this. Can you try out the code in the repo to see if perhaps I'm missing something?

@kripken
Owner
kripken commented Feb 27, 2012

The pull request uses box2d, which is a library version, and might perform a little differently than the raw compiled version of the benchmark, since for example the main benchmark code is in JS and not compiled C. Also, it's much larger. It would be fairer to compare the raw compiled benchmark like the previous benchmarks did, I think.

I'll submit a pull request.

@kripken
Owner
kripken commented Feb 27, 2012

Submitted joelgwebber/bench2d#5

I get 47.5ms on my 2009 laptop running Firefox nightly with that, and 114.5ms with Chrome dev. What browser are you using to test, that's probably the biggest factor here?

@joelgwebber

[thanks, merged]

Ah, I see. My numbers were with Chrome, as it was coming out ahead on the emscripten output in the past. On FF10, I'm getting just under 30ms pretty reliably. That's pretty impressive -- I can now definitively state that emscripten+ff has broken the order-of-magnitude barrier w.r.t. C++.

A quick update on the AS3 tests -- I've confirmed that they aren't "cheating" by using built-in native vector/matrix classes (just the ones from the original Box2D, transliterated into AS3), and I'm getting a reliable 15ms/frame. I'm still surprised by this, to be honest, but so far I can't poke any holes in it.

@kripken
Owner
kripken commented Feb 27, 2012

Great!

About AS3, that's an impressive result for Flash. Part of the difference with JS is probably because the compiled JS uses patterns that JS engines haven't really optimized for, while the AS3 implementation uses patterns the runtime has been optimized for. I think that is changing though, at least in V8 and SpiderMonkey, so I hope to see big speedups later this year in Chrome and Firefox.

@juj
Collaborator
juj commented Apr 14, 2015

Closing as resolved: Box2D demo at http://kripken.github.io/box2d.js/webgl_demo/box2d.html , and since Box2D, I think we pretty much have built the best demos in town! ;)

@juj juj closed this Apr 14, 2015