
Faster unpackm0 #5961

Merged
merged 4 commits into from Nov 7, 2019

Conversation

@enebo (Member) commented Nov 6, 2019

Rails uses unpack1 'm0' to decode base64 cookies. Recent work had improved the performance of unpack1, but I was somewhat disturbed that MRI did so much better once the string being decoded got larger. This PR fast-paths m0 with a version that does not use ByteBuffer.
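For context (this is not from the PR itself), the difference between the lenient 'm' directive and the strict 'm0' directive that Rails relies on can be seen in plain Ruby; a small sketch of the documented behavior:

```ruby
# 'm' is lenient base64: characters outside the base64 alphabet
# (here, the space) are skipped during decoding.
"aGVs bG8=".unpack1('m')    # => "hello"

# 'm0' is strict base64 (what Rails uses for cookies): the input must
# be valid base64 with no stray characters.
"aGVsbG8=".unpack1('m0')    # => "hello"

# Invalid input under 'm0' raises instead of being silently skipped.
begin
  "aGVs bG8=".unpack1('m0')
rescue ArgumentError
  # strict decoding rejected the space
end
```

The strictness is what makes a dedicated fast path attractive: 'm0' can assume a clean input stream rather than filtering as it goes.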

This benchmark looks at some larger strings (note: the largest one is a bit bigger than the max size of a cookie, but it is close enough to 4096 to give an idea of the cost at that size):
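The benchmark script itself (../snippets/unpack1.rb) is not shown in this PR; a stdlib-only sketch of the same shape, using Benchmark.realtime with an arbitrary iteration count instead of benchmark-ips, might look like:

```ruby
require 'benchmark'

# Build strict base64 strings of the tested lengths: n * 3 / 4 raw bytes
# encode to exactly n base64 characters (every length here divides evenly,
# so no '=' padding is produced).
[288, 576, 1152, 2304, 4608].each do |len|
  encoded = ['x' * (len * 3 / 4)].pack('m0')   # 'm0' packs without newlines
  t = Benchmark.realtime { 1_000.times { encoded.unpack1('m0') } }
  puts format('unpack1 m len: %4d  %10.1f i/s', len, 1_000 / t)
end
```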

MRI 2.7

```
Warming up --------------------------------------
  unpack1 m len: 288     2.089k i/100ms
  unpack1 m len: 576     1.330k i/100ms
 unpack1 m len: 1152   858.000  i/100ms
 unpack1 m len: 2304   469.000  i/100ms
 unpack1 m len: 4608   249.000  i/100ms
Calculating -------------------------------------
  unpack1 m len: 288     22.723k (± 1.8%) i/s -    114.895k in   5.057910s
  unpack1 m len: 576     14.569k (± 2.3%) i/s -     73.150k in   5.023969s
 unpack1 m len: 1152      8.632k (± 2.2%) i/s -     43.758k in   5.071798s
 unpack1 m len: 2304      4.716k (± 2.4%) i/s -     23.919k in   5.074664s
 unpack1 m len: 4608      2.508k (± 2.5%) i/s -     12.699k in   5.066543s
```

9.2.9.0

```
Warming up --------------------------------------
  unpack1 m len: 288     1.168k i/100ms
  unpack1 m len: 576   633.000  i/100ms
 unpack1 m len: 1152   315.000  i/100ms
 unpack1 m len: 2304   162.000  i/100ms
 unpack1 m len: 4608    83.000  i/100ms
Calculating -------------------------------------
  unpack1 m len: 288     12.171k (± 2.3%) i/s -     61.904k in   5.089356s
  unpack1 m len: 576      6.316k (± 2.5%) i/s -     31.650k in   5.014613s
 unpack1 m len: 1152      3.282k (± 1.6%) i/s -     16.695k in   5.088612s
 unpack1 m len: 2304      1.657k (± 1.5%) i/s -      8.424k in   5.085228s
 unpack1 m len: 4608    841.896  (± 1.5%) i/s -      4.233k in   5.029165s
```

after

```
Warming up --------------------------------------
  unpack1 m len: 288     1.895k i/100ms
  unpack1 m len: 576     1.057k i/100ms
 unpack1 m len: 1152   534.000  i/100ms
 unpack1 m len: 2304   274.000  i/100ms
 unpack1 m len: 4608   142.000  i/100ms
Calculating -------------------------------------
  unpack1 m len: 288     19.856k (± 2.8%) i/s -    100.435k in   5.062359s
  unpack1 m len: 576     10.702k (± 2.8%) i/s -     53.907k in   5.041445s
 unpack1 m len: 1152      5.448k (± 1.9%) i/s -     27.234k in   5.001111s
 unpack1 m len: 2304      2.735k (± 3.1%) i/s -     13.700k in   5.014877s
 unpack1 m len: 4608      1.413k (± 3.3%) i/s -      7.100k in   5.032026s
```

So, strangely, we still do not fully catch MRI on these larger base64 strings, but the new version is quite a bit faster than before. I really wonder if there is a way to tweak this new version of unpack to eliminate bounds checking. Nonetheless, these larger sizes are not what I am seeing in a simple Rails app. In that case I took the 4 decodes which actually happen and benchmarked those:

```
system ~/work/jruby master 1328% mri27 ../snippets/unpack1.rb
Warming up --------------------------------------
  unpack1 m len: 288     2.235k i/100ms
   unpack1 m len: 16     7.086k i/100ms
   unpack1 m len: 24     6.759k i/100ms
  unpack1 m len: 148     2.846k i/100ms
Calculating -------------------------------------
  unpack1 m len: 288     21.187k (± 7.2%) i/s -    107.280k in   5.092817s
   unpack1 m len: 16     72.179k (± 3.0%) i/s -    361.386k in   5.011767s
   unpack1 m len: 24     68.537k (± 2.1%) i/s -    344.709k in   5.031934s
  unpack1 m len: 148     32.150k (± 2.9%) i/s -    162.222k in   5.050202s
```

9.2.9.0

```
Warming up --------------------------------------
  unpack1 m len: 288     1.187k i/100ms
   unpack1 m len: 16     8.225k i/100ms
   unpack1 m len: 24     6.841k i/100ms
  unpack1 m len: 148     1.906k i/100ms
Calculating -------------------------------------
  unpack1 m len: 288     10.701k (± 1.9%) i/s -     54.602k in   5.104411s
   unpack1 m len: 16     83.846k (± 1.9%) i/s -    419.475k in   5.004838s
   unpack1 m len: 24     71.789k (± 3.4%) i/s -    362.573k in   5.056873s
  unpack1 m len: 148     19.899k (± 2.5%) i/s -    101.018k in   5.079880s
```

after

```
Warming up --------------------------------------
  unpack1 m len: 288     1.801k i/100ms
   unpack1 m len: 16     7.937k i/100ms
   unpack1 m len: 24     6.929k i/100ms
  unpack1 m len: 148     3.267k i/100ms
Calculating -------------------------------------
  unpack1 m len: 288     17.733k (± 6.9%) i/s -     90.050k in   5.102682s
   unpack1 m len: 16     98.730k (± 6.6%) i/s -    492.094k in   5.009752s
   unpack1 m len: 24     94.109k (± 2.5%) i/s -    471.172k in   5.009911s
  unpack1 m len: 148     33.853k (± 2.8%) i/s -    169.884k in   5.022907s
```

(note: indy makes these numbers a little better, but the vast bulk of the performance work is all in Java)

We can see here that we are winning in 3 out of 4 unpacks and we are now close on the 4th.

My long-term view is that ByteBuffer is impairing our unpack performance, and when we rewrite the unpack impls we should move to all-byte[] versions. This fast path probably will not be needed once that is done.

I would like to understand why we are not catching MRI's performance here for longer strings, but this is a positive step forward.

A wildly unrealistic but not difficult idea would be to split the decode string and process the pieces across multiple threads.
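That splitting idea works in principle because strict base64 can be cut at any 4-character boundary and each piece decoded independently (every 4 base64 characters map to 3 raw bytes, and '=' padding only ever appears in the final chunk). A toy sketch in plain Ruby, not the PR's Java code:

```ruby
require 'base64'

data    = 'x' * 4608
encoded = Base64.strict_encode64(data)

# Pick a split point that is a multiple of 4 so each half remains
# valid strict base64 on its own.
mid    = (encoded.length / 8) * 4
chunks = [encoded[0...mid], encoded[mid..]]

# Decode the halves concurrently and reassemble in order.
decoded = chunks.map { |c| Thread.new { c.unpack1('m0') } }
                .map(&:value)
                .join

decoded == data  # => true
```

Whether the thread handoff would ever pay for itself at cookie-sized inputs is exactly why the author calls the idea unrealistic; the decode cost at 4 KB is still in the microsecond range.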

@enebo enebo added this to the JRuby 9.2.10.0 milestone Nov 6, 2019
@enebo (Member, Author) commented Nov 6, 2019

LOL...extra commit in there for a missing add-opens as a "special" bonus unrelated to this PR :)

@enebo (Member, Author) commented Nov 6, 2019

For fun here is 9.2.7.0 before some work went into unpack:

```
Warming up --------------------------------------
  unpack1 m len: 288   577.000  i/100ms
  unpack1 m len: 576   384.000  i/100ms
 unpack1 m len: 1152   257.000  i/100ms
 unpack1 m len: 2304   146.000  i/100ms
 unpack1 m len: 4608    78.000  i/100ms
Calculating -------------------------------------
  unpack1 m len: 288      6.557k (± 3.4%) i/s -     32.889k in   5.022262s
  unpack1 m len: 576      4.345k (± 3.7%) i/s -     21.888k in   5.045440s
 unpack1 m len: 1152      2.651k (± 2.6%) i/s -     13.364k in   5.045537s
 unpack1 m len: 2304      1.467k (± 2.4%) i/s -      7.446k in   5.078940s
 unpack1 m len: 4608    775.922  (± 2.4%) i/s -      3.900k in   5.029380s
system ~/work/jruby (9.2.7.0) 1405% jruby ../snippets/unpack1.rb
Warming up --------------------------------------
  unpack1 m len: 288   605.000  i/100ms
   unpack1 m len: 16     1.108k i/100ms
   unpack1 m len: 24     1.074k i/100ms
  unpack1 m len: 148   767.000  i/100ms
  unpack1 m len: 288   598.000  i/100ms
Calculating -------------------------------------
  unpack1 m len: 288      5.986k (± 2.1%) i/s -     30.250k in   5.056054s
   unpack1 m len: 16     11.218k (± 2.1%) i/s -     56.508k in   5.039890s
   unpack1 m len: 24     10.705k (± 6.5%) i/s -     53.700k in   5.042451s
  unpack1 m len: 148      7.779k (± 3.8%) i/s -     39.117k in   5.036783s
  unpack1 m len: 288      6.053k (± 1.7%) i/s -     30.498k in   5.039550s
```

@enebo enebo merged commit c56caa3 into jruby:master Nov 7, 2019