Consider bumping jit.call_til_compile #1426

Closed
hosiawak opened this Issue Nov 25, 2011 · 1 comment

hosiawak commented Nov 25, 2011

I've been playing with changing jit.call_til_compile and observing how it affects running common programs like specs/web apps etc.

What I've found after observing a lot of debug_search = true output (in state.cpp) is that the default value of 4000 is too small for most of my workloads (I spend most of my dev time using Rails). It makes the JIT compile a lot of methods, which seems to negatively affect the startup and execution of the majority of my programs (mostly running specs and Rails/Sinatra web apps). I've also been playing with the value 200 here:

https://github.com/rubinius/rubinius/blob/master/vm/llvm/state.cpp#L915

What follows are some benchmarks that show how changing jit.call_til_compile and the value 200 (above) affects execution.

All the benchmarks were run on a 32-bit Linux box. I varied the 4000 value from 1000 up to 256000 and the 200 value from 100 up to 256000. The benchmarks below show my "sweet spot" values of 32000 / 8000 and compare them with the current defaults of 4000 / 200.

First of all, I noticed that running a lot of programs in a typical "everyday" way (run a script, start the server, run specs etc.) is usually faster with -Xint than with the JIT enabled. This was a big surprise: I'd expect the JIT to have enough time to kick in and optimize the hot code paths during a full CI mspec run, but that doesn't seem to be the case (or the effect is there but very limited for some reason), unless I'm missing something:

....................................

4000 / 200

karol@mint ~/projects/personal/rubinius (master) $ time ./bin/mspec ci -T -Xjit.call_til_compile=4000
rubinius 2.0.0dev (1.8.7 cf388af yyyy-mm-dd JI) [i686-pc-linux-gnu]

Finished in 196.584561 seconds

3833 files, 16371 examples, 44097 expectations, 0 failures, 0 errors

real 3m18.271s
user 2m20.833s
sys 0m9.245s

..........................................

32000 / 8000

karol@mint ~/projects/personal/rubinius (master) $ time ./bin/mspec ci -T -Xjit.call_til_compile=32000
rubinius 2.0.0dev (1.8.7 cf388af yyyy-mm-dd JI) [i686-pc-linux-gnu]

Finished in 159.089020 seconds

3833 files, 16371 examples, 44097 expectations, 0 failures, 0 errors

real 2m40.379s
user 1m31.018s
sys 0m8.073s

.....................................

-Xint

karol@mint ~/projects/personal/rubinius (master) $ time ./bin/mspec ci -T -Xint
rubinius 2.0.0dev (1.8.7 cf388af yyyy-mm-dd) [i686-pc-linux-gnu]

Finished in 160.008294 seconds

3833 files, 16371 examples, 44097 expectations, 0 failures, 0 errors

real 2m41.407s
user 1m32.330s
sys 0m6.944s

......................................................

So changing 4000 / 200 to 32000 / 8000 makes mspec run at interpreter speed (which is about 22% faster than the JIT with the default settings).
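
As a quick check of that figure, here is a minimal sketch that recomputes the gap from the "Finished in" times printed by the three mspec runs above. The constant and helper names (`MSPEC_SECONDS`, `percent_slower`) are only illustrative and are not part of Rubinius or mspec.

```ruby
# "Finished in" times in seconds, copied from the three mspec runs above.
MSPEC_SECONDS = {
  "4000 / 200"   => 196.584561,
  "32000 / 8000" => 159.089020,
  "-Xint"        => 160.008294
}

# Percentage by which the `slow` run takes longer than the `fast` run.
def percent_slower(slow, fast)
  ((slow / fast) - 1.0) * 100.0
end

baseline = MSPEC_SECONDS["4000 / 200"]
MSPEC_SECONDS.each do |label, seconds|
  next if label == "4000 / 200"
  puts format("vs %-13s defaults take %.1f%% longer", label, percent_slower(baseline, seconds))
end
```

With these numbers the default settings come out a little under 23% slower than -Xint and a little under 24% slower than 32000 / 8000, which is in line with the rough estimate above.
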
Does it then affect code known to perform well under the JIT? Let's take a look at the Red Black Tree benchmark, which is known to run very well under the JIT:

..................................

karol@mint ~/projects/personal/rubinius (master) $ ./bin/benchmark -T -Xjit.call_til_compile=4000 benchmark/real_world/bench_red_black_tree.rb
=== bin/rbx ===
#delete 9.2 (±0.0%) i/s - 46 in 5.012411s (cycle=1)
#add 18.6 (±5.4%) i/s - 94 in 5.047660s (cycle=1)
#search 62.2 (±1.6%) i/s - 312 in 5.020513s (cycle=6)
#inorder_walk 154.1 (±0.6%) i/s - 784 in 5.087315s (cycle=14)
#rev_inorder_walk 156.9 (±0.6%) i/s - 795 in 5.066015s (cycle=15)
#minimum 71.3 (±0.0%) i/s - 357 in 5.010415s (cycle=7)
#maximum 74.7 (±0.0%) i/s - 378 in 5.058158s (cycle=7)

...................................

karol@mint ~/projects/personal/rubinius (master) $ ./bin/benchmark -T -Xjit.call_til_compile=32000 benchmark/real_world/bench_red_black_tree.rb
=== bin/rbx ===
#delete 9.0 (±0.0%) i/s - 45 in 5.009886s (cycle=1)
#add 18.9 (±0.0%) i/s - 95 in 5.031541s (cycle=1)
#search 62.6 (±1.6%) i/s - 318 in 5.079273s (cycle=6)
#inorder_walk 154.1 (±0.6%) i/s - 784 in 5.087316s (cycle=14)
#rev_inorder_walk 158.4 (±0.0%) i/s - 795 in 5.018382s (cycle=15)
#minimum 71.9 (±1.4%) i/s - 364 in 5.061567s (cycle=7)
#maximum 76.2 (±0.0%) i/s - 385 in 5.053930s (cycle=7)

..................................

karol@mint ~/projects/personal/rubinius (master) $ ./bin/benchmark -T -Xint benchmark/real_world/bench_red_black_tree.rb
=== bin/rbx ===
#delete 3.0 (±0.0%) i/s - 16 in 5.298532s (cycle=1)
#add 7.0 (±0.0%) i/s - 35 in 5.007694s (cycle=1)
#search 13.5 (±0.0%) i/s - 68 in 5.028847s (cycle=1)
#inorder_walk 55.1 (±0.0%) i/s - 280 in 5.083075s (cycle=5)
#rev_inorder_walk 55.8 (±0.0%) i/s - 280 in 5.019383s (cycle=5)
#minimum 19.9 (±0.0%) i/s - 100 in 5.018715s (cycle=1)
#maximum 20.6 (±0.0%) i/s - 104 in 5.040521s (cycle=2)

....................................
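
To make the three runs easier to compare, the sketch below turns the iterations-per-second figures above into speedups over -Xint and shows how far the two JIT settings differ from each other. The numbers are copied from the output above; the names `RBT_IPS`, `speedup` and `relative_diff` are made up for this illustration.

```ruby
# Iterations per second copied from the three bench_red_black_tree.rb runs above.
RBT_IPS = {
  "#delete"           => { jit_4000: 9.2,   jit_32000: 9.0,   int: 3.0 },
  "#add"              => { jit_4000: 18.6,  jit_32000: 18.9,  int: 7.0 },
  "#search"           => { jit_4000: 62.2,  jit_32000: 62.6,  int: 13.5 },
  "#inorder_walk"     => { jit_4000: 154.1, jit_32000: 154.1, int: 55.1 },
  "#rev_inorder_walk" => { jit_4000: 156.9, jit_32000: 158.4, int: 55.8 },
  "#minimum"          => { jit_4000: 71.3,  jit_32000: 71.9,  int: 19.9 },
  "#maximum"          => { jit_4000: 74.7,  jit_32000: 76.2,  int: 20.6 }
}

# How many times faster the JIT run is than the interpreter run.
def speedup(jit_ips, int_ips)
  jit_ips / int_ips
end

# Relative difference between the two JIT settings, as a fraction of the 4000 run.
def relative_diff(row)
  (row[:jit_32000] - row[:jit_4000]).abs / row[:jit_4000]
end

RBT_IPS.each do |name, row|
  puts format("%-18s 4000: %.1fx  32000: %.1fx  diff: %.1f%%",
              name,
              speedup(row[:jit_4000], row[:int]),
              speedup(row[:jit_32000], row[:int]),
              relative_diff(row) * 100.0)
end
```

On this data both JIT settings are several times faster than the interpreter, and the two settings stay within a few percent of each other on every operation.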

I also tested a "Hello world" Rails 3.1 app under WEBrick. I didn't notice any difference in reqs/s between 4000 / 200, 32000 / 8000 and -Xint; they all gave roughly the same results.

A small difference was noticeable in a "Hello world" Sinatra app running on WEBrick:

4000 / 200 setting returned from 150 to 190 reqs/s
32000 / 8000 setting returned from 150 to 190 reqs/s
-Xint setting returned a fairly constant 170 reqs/s

I also noticed a somewhat faster WEBrick startup time with -Xint / 32000 (Rails and Sinatra, e.g. 35 seconds vs. 42 seconds).

I've been trying to find code that runs slower with 32000 / 8000 than with 4000 / 200 but haven't had much luck. It seems the JIT still kicks in with 32000 / 8000, but without the high compilation cost associated with 4000 / 200. What do you think?
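
To illustrate the intuition, here is a toy model of a call-count threshold. It is not Rubinius's actual JIT logic: `ToyJit`, `run_workload` and the call counts are all hypothetical, chosen only to show why a higher threshold can still catch the truly hot code while skipping many lukewarm methods.

```ruby
# Toy model only, NOT Rubinius's real implementation: every method has a call
# counter, and it is "compiled" the moment the counter reaches the threshold.
class ToyJit
  attr_reader :compiled

  def initialize(threshold)
    @threshold = threshold
    @counters = Hash.new(0)
    @compiled = []
  end

  def call(method_name)
    @counters[method_name] += 1
    @compiled << method_name if @counters[method_name] == @threshold
  end
end

# Hypothetical workload: one genuinely hot method plus 50 lukewarm ones.
def run_workload(jit)
  100_000.times { jit.call(:hot_loop) }
  50.times do |i|
    name = :"lukewarm_#{i}"
    5_000.times { jit.call(name) }
  end
  jit
end

low  = run_workload(ToyJit.new(4_000))
high = run_workload(ToyJit.new(32_000))
puts "threshold 4000:  #{low.compiled.size} methods compiled"
puts "threshold 32000: #{high.compiled.size} methods compiled"
```

In this made-up workload the lower threshold compiles every lukewarm method as well as the hot one, while the higher threshold compiles only the hot method. Real programs will have a different call distribution, so the model shows the shape of the trade-off and nothing more.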

dbussink closed this in ec6eede Jan 30, 2012

@hosiawak @dbussink wow, nice! Also ~12% improvement in Rails memory usage.

rails startup time (MRI 1.8.7 is ~3.5)
2012-01-30 10:58:41 UTC rbx-head    21.2 seconds    999b9395
2012-01-31 10:49:34 UTC rbx-head    16.0 seconds    e5aba274

rails requests per second (MRI 1.8.7 is ~74)
2012-01-30 10:48:09 UTC rbx-head    49.0 requests per second    999b9395
2012-01-31 10:44:03 UTC rbx-head    69.8 requests per second    e5aba274

rails memory usage (MRI 1.8.7 is ~56, 1.9.3 is ~82, JRuby is ~194)
2012-01-30 10:42:38 UTC rbx-head    159.0 MB    999b9395
2012-01-31 10:38:32 UTC rbx-head    139.9 MB    e5aba274
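
For reference, the relative changes between the two rbx-head measurements above can be worked out with a small sketch like this one. The figures are copied from the lines above; `RAILS_NUMBERS` and `percent_change` are just illustrative names.

```ruby
# [before, after] pairs copied from the rbx-head measurements above
# (999b9395 on 2012-01-30 vs e5aba274 on 2012-01-31).
RAILS_NUMBERS = {
  "startup time (seconds)" => [21.2, 16.0],
  "requests per second"    => [49.0, 69.8],
  "memory usage (MB)"      => [159.0, 139.9]
}

# Signed percentage change from `before` to `after`.
def percent_change(before, after)
  (after - before) / before * 100.0
end

RAILS_NUMBERS.each do |label, (before, after)|
  puts format("%-24s %+.1f%%", label, percent_change(before, after))
end
```

The memory figure matches the ~12% mentioned above; startup time drops by roughly a quarter and requests per second rise by a bit over 40%.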