I finished hacking a simple benchmark together. It's pretty ugly code, but I think it's good enough to work with.
It's a simple 2d graphics renderer that does a little collision detection. It basically loads some images, adds a bunch of entities to the scene, and starts a loop which continually checks for collisions, updates entities, and renders them.
For collision detection, it divides the screen into a grid and caches the entities that overlap each grid cell. That way, it's fast to look up which entities are near the one you're checking. So the collision detection loops through all objects, fetches the ones around each one, and does a 2d axis-aligned box collision check. If the "player" entity collides with any other entity, it removes the other entity from the scene.
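The grid lookup and box check described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: CELL_SIZE, the entity shape ({x, y, w, h}), and every function name here are assumptions.

```javascript
// Sketch only: CELL_SIZE and all function names are hypothetical.
const CELL_SIZE = 64; // assumed cell size in pixels

// Axis-aligned bounding box overlap test for entities shaped {x, y, w, h}.
function aabbOverlap(a, b) {
  return a.x < b.x + b.w && a.x + a.w > b.x &&
         a.y < b.y + b.h && a.y + a.h > b.y;
}

// Range of grid cells covered by an entity's box, clamped to the grid.
function cellRange(e, cols, rows) {
  return {
    x0: Math.max(0, Math.floor(e.x / CELL_SIZE)),
    y0: Math.max(0, Math.floor(e.y / CELL_SIZE)),
    x1: Math.min(cols - 1, Math.floor((e.x + e.w) / CELL_SIZE)),
    y1: Math.min(rows - 1, Math.floor((e.y + e.h) / CELL_SIZE)),
  };
}

// Bucket every entity into each grid cell its box overlaps.
function buildGrid(entities, width, height) {
  const cols = Math.ceil(width / CELL_SIZE);
  const rows = Math.ceil(height / CELL_SIZE);
  const cells = Array.from({ length: cols * rows }, () => []);
  for (const e of entities) {
    const r = cellRange(e, cols, rows);
    for (let cy = r.y0; cy <= r.y1; cy++) {
      for (let cx = r.x0; cx <= r.x1; cx++) {
        cells[cy * cols + cx].push(e);
      }
    }
  }
  return { cells, cols, rows };
}

// Entities overlapping the player, checking only the cells the player touches.
function collisionsWith(player, grid) {
  const hits = new Set(); // a Set avoids duplicates from shared cells
  const r = cellRange(player, grid.cols, grid.rows);
  for (let cy = r.y0; cy <= r.y1; cy++) {
    for (let cx = r.x0; cx <= r.x1; cx++) {
      for (const other of grid.cells[cy * grid.cols + cx]) {
        if (other !== player && aabbOverlap(player, other)) hits.add(other);
      }
    }
  }
  return [...hits];
}
```

Note that this naive version allocates fresh arrays every time the grid is rebuilt, which is exactly the kind of per-frame allocation that matters for the GC.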
The LLJS version was compiled this way:
./LLJS/bin/ljc -o main.js main.ljs
Edit: I'm using Firefox 16.0.2 on OS X.
I just tried it on my Mac: I get 60ms (34-88) with LLJS and 100ms (56-144) with JS in Firefox 19 Nightly. The strange thing is Chrome Canary, where I get 300ms (32-321) with LLJS and 80ms (22-102) with JS. Can you try modifying the benchmark so it doesn't paint anything?
Done. Use the same URLs above.
That's very weird; I'm using nightly now (19.0a1) and with the rendering turned off I still get ~200ms with LLJS and ~90ms without it. That's a stable average, not a spike. Not sure why you get so much better numbers.
I have incremental gc pref'ed on. Do you? Is there anything else I should check?
Just tried it on my wife's Macbook Air (newest model) with Firefox 16, and got similar numbers to mine.
Good news! I have the LLJS version running faster.
The main thing was to refactor the cells implementation. It now avoids any allocation at run-time, and only pre-allocates stuff. This seems to be a big win. Code here: https://github.com/jlongster/game-off-2012/blob/master/main.ljs#L188
I did a few other things too, but made sure to transfer the same optimizations to the JS version to keep the comparison fair. The main thing was to put a limit on how many entities can be added to a grid cell, because realistically we don't need to handle 1000 objects all in the same space.
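To illustrate the idea of pre-allocated cells with a per-cell cap, here is a rough sketch in plain JS. It is not the code at the link above: the typed-array layout, MAX_PER_CELL, and the function names are all assumptions chosen for the example.

```javascript
// Sketch only: layout and names are hypothetical, not taken from main.ljs.
const MAX_PER_CELL = 100; // assumed cap on entities stored per cell

// Allocate all storage once, up front. Cells hold entity indices, not objects.
function createCells(cellCount, maxPerCell) {
  return {
    maxPerCell,
    counts: new Int32Array(cellCount),             // entities in each cell
    slots: new Int32Array(cellCount * maxPerCell), // flat per-cell storage
  };
}

// Reset every cell for the next frame without allocating anything.
function clearCells(cells) {
  cells.counts.fill(0);
}

// Store an entity index in a cell; returns false when the cell is full.
function addToCell(cells, cellIndex, entityIndex) {
  const n = cells.counts[cellIndex];
  if (n >= cells.maxPerCell) return false; // cap reached, drop the entry
  cells.slots[cellIndex * cells.maxPerCell + n] = entityIndex;
  cells.counts[cellIndex] = n + 1;
  return true;
}

// Read back the i-th entity index stored in a cell.
function entityAt(cells, cellIndex, i) {
  return cells.slots[cellIndex * cells.maxPerCell + i];
}
```

Each frame would call clearCells, then addToCell for every entity, so the run-time loop only writes into memory that already exists.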
Both tests process 3000 entities in the scene, but the first test only allows 100 entities to be stored as "near neighbors" for each entity, drastically reducing the amount of collision detection taking place. The second one allows 1000, which basically lets it process everything.
The LLJS version is not only way faster but also much more consistent in performance. I'm really happy this stuff actually works! I still can't really believe that it does, but now I have the numbers to prove it.
(Side note: I almost gave up tonight because for some reason I could not beat the JS version. I forgot to enable something in the JS version to fully process each entity which made it much slower and fairer, and now have these accurate numbers)
Great to hear! I'd still like to figure out why the shell version is 2x faster, though.
Were you comparing it with the in-browser version that didn't render anything? I think so, just making sure.
I'd be curious too. The big difference here is allocation. Previously a struct type was being allocated probably 10000 times a frame, so the js shell is faster at allocating for some reason.
Agh. My numbers above are not a fair comparison.
I realized that it was unfair to pre-allocate only in the LLJS version but still dynamically create 1000s of arrays each frame in the JS version. If I pre-allocate in the JS version as well, it is again beating LLJS by 5-10ms and there are no noticeable GC pauses.
Strangely, the LLJS version is still about 10ms faster in Chrome with the update.
At this point, both versions avoid any allocation while running so it just comes down to how well the calculations are JIT-ed. There are ~6000 small objects though, and I'm surprised that I'm not seeing more GC pauses in Firefox.
I'm going to try to push the GC harder to see if I can make it clear where LLJS will help me.
(I updated the demos on jlongster.com with the JS optimization)
Ok, here's a version that, instead of removing objects as you move the dinosaur around, adds a new entity to the scene whenever one is removed. This involves more allocation and theoretically stresses the GC some more.
LLJS: ~25ms / frame (17-35) (http://jlongster.com/s/game-off-2012-v3/)
JS: ~20ms / frame (16-31) (http://jlongster.com/s/game-off-2012-v3/ref/)
They are about the same, and there are still no noticeable GC pauses in Firefox.
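For anyone wanting to collect figures in the same "average (min-max)" shape, a small accumulator along these lines would do. This is a sketch under the assumption that the numbers in this thread are per-frame times in milliseconds; the function names are made up for the example.

```javascript
// Sketch only: names are hypothetical.
function createFrameStats() {
  return { frames: 0, total: 0, min: Infinity, max: -Infinity };
}

// Record how long one frame took, in milliseconds.
function recordFrame(stats, ms) {
  stats.frames += 1;
  stats.total += ms;
  if (ms < stats.min) stats.min = ms;
  if (ms > stats.max) stats.max = ms;
}

// Format as "~avg ms / frame (min-max)", rounded to whole milliseconds.
function formatStats(stats) {
  const avg = stats.frames ? stats.total / stats.frames : 0;
  return `~${Math.round(avg)}ms / frame (${Math.round(stats.min)}-${Math.round(stats.max)})`;
}
```

In a game loop, the value passed to recordFrame would be the difference between two Date.now() (or performance.now()) readings taken around the update step.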
The new implementations run similarly in the shell:
james:~/tmp/lljs% time node main
node main 7.69s user 0.06s system 100% cpu 7.691 total
james:~/tmp/lljs% time node main2
node main2 6.63s user 0.06s system 100% cpu 6.642 total
GUYS. I was wrong! Again. Sorry to keep going back and forth, benchmarking is difficult.
I've been a little bothered that I couldn't see any performance difference, because it really should be better. So I kept hacking on it and suddenly discovered a performance anomaly with the js code on this line:
Basically I was giving all the entities an x, y position as an integer. Since they never move, it never becomes a float. Evidently integer arithmetic is way faster, and that was producing the fast speeds above for the js version. If I take out the Math.floor call, it shoots up to ~55ms / frame, and the LLJS version is way faster.
Positions should always be floats because entities can move arbitrarily. I fixed the benchmark by making the entities shift around a little bit each frame. I also increased the number of entities a lot to make the numbers bigger.
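The integer-versus-float effect can be pictured with a sketch like this. It is not the original line from the benchmark; makeEntity, jitter, and the injectable rand parameter are assumptions made for illustration.

```javascript
// Sketch only: names are hypothetical, not the benchmark's code.

// Positions created with Math.floor are whole numbers, and they stay whole
// for as long as the entity never moves by a fractional amount.
function makeEntity(width, height, rand = Math.random) {
  return {
    x: Math.floor(rand() * width),
    y: Math.floor(rand() * height),
  };
}

// Shifting by a small fractional offset each frame turns positions into
// non-integer values, as they would be for freely moving entities.
function jitter(entity, rand = Math.random) {
  entity.x += rand() - 0.5;
  entity.y += rand() - 0.5;
}
```

Calling jitter on every entity once per frame keeps the engine from treating the positions as integers for the whole run, so both versions are measured on the same kind of arithmetic.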
Here are the results, with 3000 entities and caching 100 nearest neighbors at a time:
LLJS: ~87ms / frame (53-117) (http://jlongster.github.com/game-off-2012/game-lljs/)
JS: ~155ms / frame (136-427) (http://jlongster.github.com/game-off-2012/game-js/)
(you can move around a dino with the arrow keys if you didn't know that)
emscripten: ~95ms / frame (84-481) (http://jlongster.github.com/game-off-2012/game-emscripten)
It generally runs fast but seems a little choppy here and there, not sure exactly what's going on.
I get consistently faster (much faster!) numbers now. I feel confident that this stuff really works, and I've hand-optimized the pure js version as much as one should, and lljs still beats it. What's great is that I can get the benefits of emscripten without having to actually write my game in C. In fact, LLJS seems to be more lightweight because it doesn't have to emulate a bunch of OS stuff, and it starts quicker.
You can see the work here: https://github.com/jlongster/game-off-2012
I'm definitely using this to write my game now. Sorry for the length of this, I thought you'd be interested in a thorough review from somebody not involved in the project :)
Cool. I just tried it on Chrome and strangely enough the emscripten version is really slow. Benchmarking is indeed really hard.
Oh wow, you're right. Maybe I'll ping Alon next week when I'm in MV about that. I'm sure it's a single bottleneck somewhere, but I'm not familiar enough with it to debug.