New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open addressing algorithm for RubyHash #5215

Merged
merged 25 commits into from Aug 6, 2018

Conversation

Projects
None yet
6 participants
@ChrisBr
Copy link
Contributor

ChrisBr commented Jun 11, 2018

This is a very early stage of implementing open addressing for JRuby hashes suggested in #4708. The MRI issue is located here: https://bugs.ruby-lang.org/issues/12142.

After discussing with @headius, he suggested to open a PR to continue discussion here. Please note that this is far from a mergable state.

I tried a very simple benchmark and recognize a small performance improvement:

require 'benchmark'
n = 50_000
small = 1000
tmp = {}
Benchmark.bmbm do |x|
  x.report("Add:") { n.times { small.times { |small| tmp[small] = true } } }
  x.report("Fetch:") { n.times { small.times { |small| tmp[small] } } }
end

Gives me with this PR:

Rehearsal ------------------------------------------
Add:     5.880000   0.220000   6.100000 (  5.127261)
Fetch:   5.770000   0.130000   5.900000 (  5.375460)
-------------------------------- total: 12.000000sec

             user     system      total        real
Add:     4.900000   0.000000   4.900000 (  4.884514)
Fetch:   5.270000   0.010000   5.280000 (  5.234178)

Current master:

Rehearsal ------------------------------------------
Add:     6.140000   0.160000   6.300000 (  5.220546)
Fetch:   6.020000   0.070000   6.090000 (  5.695494)
-------------------------------- total: 12.390000sec

             user     system      total        real
Add:     5.150000   0.050000   5.200000 (  5.086812)
Fetch:   6.260000   0.000000   6.260000 (  6.224505)

(Intel Core i7-7820HQ, 32 GB RAM)

However, I would need some guidance how to gather meaningful results.

Currently is also missing the approach to only use one bucket for small hashes. I have this already working in another branch, however, did not see a (speed) improvement. Could be worth to investigate from a memory improvement though.

Thanks for any feedback!


private final int[] internalCopyBins() {
int[] newBins = new int[bins.length];
for(int i = 0; i < bins.length; i++) {

This comment has been minimized.

@kares

kares Jun 12, 2018

Member

these can now do a simple System.arraycopy(bins, 0, newBins, 0, bins.length)

}
if (entry != null && entry.isLive()) {
if (entry == null) break;

This comment has been minimized.

@kares

kares Jun 12, 2018

Member

so there's no longer null buckets for old (removed) entries, is that correct?

This comment has been minimized.

@ChrisBr

ChrisBr Jun 12, 2018

Author Contributor

Currently we mark them with isLive() = false and iterate to the first null value (current end of the entries array).
But the better approach is probably set to null when deleting and iterate till size. I will implement this in the next round, this was just quick & dirty to get ruby booting again ...

This comment has been minimized.

@ChrisBr

ChrisBr Jun 12, 2018

Author Contributor

In the bins array we have three different values now:

  • -1 empty, stop searching
  • -2 deleted, jump over it
  • >= index of the hash / key
@headius

This comment has been minimized.

Copy link
Member

headius commented Jun 12, 2018

I modified your benchmark to use benchmark-ips and simple loops, since plain benchmark tends not to exercise methods well and block-based loops have much more overhead than simple loops.

require 'benchmark/ips'
N = 50_000
small = 1000
tmp = {}
Benchmark.ips do |x|
  x.report("Add:") { i = 0; while i < N; i+=1; tmp[small] = true; end }
  x.report("Fetch:") { i = 0; while i < N; i+=1; tmp[small]; end }
end

Here's numbers with standard options, invokedynamic, and invokedynamic on GraalVM.

Before your work:

[] ~/projects/jruby $ jruby hash_bench.rb 
Warming up --------------------------------------
                Add:    48.000  i/100ms
              Fetch:    62.000  i/100ms
Calculating -------------------------------------
                Add:    589.298  (± 5.1%) i/s -      2.976k in   5.065141s
              Fetch:    677.965  (± 5.8%) i/s -      3.410k in   5.049201s

[] ~/projects/jruby $ jruby -Xcompile.invokedynamic hash_bench.rb 
Warming up --------------------------------------
                Add:    70.000  i/100ms
              Fetch:    95.000  i/100ms
Calculating -------------------------------------
                Add:    912.031  (± 8.2%) i/s -      4.550k in   5.029019s
              Fetch:      1.025k (± 5.1%) i/s -      5.130k in   5.020724s

[] ~/projects/jruby $ (pickjdk 1 ; jruby -Xcompile.invokedynamic -Xfixnum.cache=false hash_bench.rb)
New JDK: graalvm
Warming up --------------------------------------
                Add:    97.000  i/100ms
              Fetch:   153.000  i/100ms
Calculating -------------------------------------
                Add:      1.487k (± 2.7%) i/s -      7.469k in   5.027800s
              Fetch:      1.744k (± 5.3%) i/s -      8.721k in   5.022021s

And after your work:

[] ~/projects/jruby $ jruby hash_bench.rb 
Warming up --------------------------------------
                Add:    59.000  i/100ms
              Fetch:    65.000  i/100ms
Calculating -------------------------------------
                Add:    723.540  (± 6.8%) i/s -      3.599k in   5.002143s
              Fetch:    726.660  (± 5.4%) i/s -      3.640k in   5.026187s

[] ~/projects/jruby $ jruby -Xcompile.invokedynamic hash_bench.rb 
Warming up --------------------------------------
                Add:    73.000  i/100ms
              Fetch:    98.000  i/100ms
Calculating -------------------------------------
                Add:    935.883  (±12.6%) i/s -      4.599k in   5.041101s
              Fetch:      1.190k (± 8.2%) i/s -      5.978k in   5.071986s

[] ~/projects/jruby $ (pickjdk 1 ; jruby -Xcompile.invokedynamic -Xfixnum.cache=false hash_bench.rb)
New JDK: graalvm
Warming up --------------------------------------
                Add:   116.000  i/100ms
              Fetch:   173.000  i/100ms
Calculating -------------------------------------
                Add:      1.634k (± 3.8%) i/s -      8.236k in   5.048905s
              Fetch:      1.959k (± 4.3%) i/s -      9.861k in   5.048234s

Looks like a good win so far!

@headius

This comment has been minimized.

Copy link
Member

headius commented Jun 12, 2018

I did a size comparison. For a Hash with 100_000 entries mapping 0-99_000 to 1, the old impl is 11.847MB in memory and the new version is 11.382MB. If we subtract some of the common objects:

  • RubyHashEntry: 37 bytes * 100_000 = 3.7MB
  • RubyFixnum: 56 bytes * 100_000 = 5.6MB
  • Total shared between old and new (roughly): 9.3

That gives us old and new sizes of 2.547 and 2.082, a savings of 0.465MB or over 18% on the base RubyHash structure. That's pretty good!

So based on that I thought I'd try some small hashes like {a:1}, which we obviously see a lot of in Ruby too.

Subtracting the size of the symbol and fixnum, this hash is 166 bytes on the new impl and 52 bytes on the old impl, so this may be justification for packing simple hashes down a bit tighter.

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 12, 2018

@headius thanks for the benchmark 👍

Subtracting the size of the symbol and fixnum, this hash is 166 bytes on the new impl and 52 bytes on the old impl, so this may be justification for packing simple hashes down a bit tighter.

Ok! I already had some code for a LinearSearch approach without the bins array. I will try to merge these two implementations tonight.

I will also look into getting rid of the RubyHashEntry elements completely, discussed yesterday with @lopex and extracting it into a Util class.

Stay tuned 🎸

@headius

This comment has been minimized.

Copy link
Member

headius commented Jun 12, 2018

If we can get rid of the RubyHashEntry objects that would be a tremendous memory savings.

entries = new RubyHashEntry[MRI_INITIAL_CAPACITY];
bins = new int[MRI_INITIAL_CAPACITY * 2];
entries = new IRubyObject[MRI_INITIAL_CAPACITY * 2];
bins = new int[MRI_INITIAL_CAPACITY * 4];

This comment has been minimized.

@ChrisBr

ChrisBr Jun 12, 2018

Author Contributor

This might not make sense. The entries array is now twice as big but is also twice as fast full (because we store key and value in separate rows now). So I think we could still go with MRI_INITIAL_CAPACITY * 2 for the bins array.

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 12, 2018

If we can get rid of the RubyHashEntry objects that would be a tremendous memory savings.

I pushed an initial version of getting rid of the RubyHashEntries. For now, I don't store them anymore but use them as temporary objects. We now store the key and values in separate rows in the entries array. Key's are stored in the even rows and values in the odd rows. This adds quite some complexity from my POV 😢

I haven't benchmarked it yet as I think there is a bug in the script you posted (you never increase small variable, do you?). Would be great if you could have a look at it again and run your benchmarks @headius.

I also tried to move this to a Util class but I have some open questions:

  • I would need to access the flags in the Utils class (e.g. COMPARE_BY_IDENTITY_F). How would I do that?
  • We also would need to implement the iterator, checkIterating, checkModify etc. I guess this would be quite some work ...

However, I we want to implement the LinearSearchHash for small hashes, I think it would be very beneficial if we could extract these two classes...

@ChrisBr ChrisBr force-pushed the ChrisBr:performance/hash branch from 90cb4b7 to f296f24 Jun 13, 2018

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 13, 2018

@headius I added now a linear search approach for small hashes (< 16 elements).

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 19, 2018

@headius any update regarding your benchmarking script or how I should proceed?

@enebo

This comment has been minimized.

Copy link
Member

enebo commented Jun 19, 2018

@ChrisBr he is on vacation this week. You could fix the bench and generate some numbers and see if removing the extra types helped (which of course means running fixed benchmark with original RubyHashEntry - linear search. Memory we can probably calculate manually. We would largely remove 3.7M - field size.

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 20, 2018

@enebo okay! Here you go:

This is the script I used:

require 'benchmark/ips'

[3, 15, 16, 100, 1_000, 10_000].each do |n|
  puts "=========== #{n} elements ============="
  tmp = {}
  Benchmark.ips do |x|
    x.report("Add:") { i = 0; while i < n; tmp[i] = true; i += 1; end }
    x.report("Fetch:") { i = 0; while i < n; tmp[i]; i += 1; end }
  end
end

Benchmark of this PR

  • linear search for hash size < 16
  • open addressing for hash size > 16
  • "no" HashEntry elements
=========== 3 elements =============
Warming up --------------------------------------
                Add:    97.614k i/100ms
              Fetch:   103.702k i/100ms
Calculating -------------------------------------
                Add:      3.957M (± 6.4%) i/s -     19.718M in   5.005939s
              Fetch:      3.560M (± 7.3%) i/s -     17.733M in   5.009620s

=========== 15 elements =============
Warming up --------------------------------------
                Add:    39.879k i/100ms
              Fetch:    24.936k i/100ms
Calculating -------------------------------------
                Add:    306.576k (± 4.2%) i/s -      1.555M in   5.082280s
              Fetch:    316.506k (± 4.4%) i/s -      1.596M in   5.052030s

=========== 16 elements =============
Warming up --------------------------------------
                Add:    63.316k i/100ms
              Fetch:    67.519k i/100ms
Calculating -------------------------------------
                Add:      1.243M (± 2.8%) i/s -      6.268M in   5.048272s
              Fetch:      1.438M (± 3.1%) i/s -      7.224M in   5.029939s

=========== 100 elements =============
Warming up --------------------------------------
                Add:    19.133k i/100ms
              Fetch:    20.735k i/100ms
Calculating -------------------------------------
                Add:    233.558k (± 1.4%) i/s -      1.186M in   5.079982s
              Fetch:    259.941k (± 3.8%) i/s -      1.306M in   5.033026s

=========== 1_000 elements =============
Warming up --------------------------------------
                Add:     1.846k i/100ms
              Fetch:     1.880k i/100ms
Calculating -------------------------------------
                Add:     19.931k (± 9.9%) i/s -     99.684k in   5.058294s
              Fetch:     21.963k (± 4.1%) i/s -    110.920k in   5.059002s

=========== 10_000 elements =============
Warming up --------------------------------------
                Add:   182.000  i/100ms
              Fetch:   201.000  i/100ms
Calculating -------------------------------------
                Add:      1.960k (± 3.9%) i/s -      9.828k in   5.021143s
              Fetch:      2.060k (± 3.0%) i/s -     10.452k in   5.077663s

Benchmark open addressing with HashEntry elements, no linear search

=========== 3 elements =============
Warming up --------------------------------------
                Add:    88.846k i/100ms
              Fetch:    89.483k i/100ms
Calculating -------------------------------------
                Add:      3.946M (± 5.4%) i/s -     19.724M in   5.015148s
              Fetch:      4.206M (± 5.1%) i/s -     21.028M in   5.015265s
=========== 15 elements =============
Warming up --------------------------------------
                Add:    61.520k i/100ms
              Fetch:    59.891k i/100ms
Calculating -------------------------------------
                Add:      1.186M (± 3.2%) i/s -      5.967M in   5.034577s
              Fetch:      1.113M (± 8.4%) i/s -      5.570M in   5.042245s

=========== 16 elements =============
Warming up --------------------------------------
                Add:    54.942k i/100ms
              Fetch:    60.876k i/100ms
Calculating -------------------------------------
                Add:      1.060M (± 6.6%) i/s -      5.329M in   5.050794s
              Fetch:      1.119M (± 4.2%) i/s -      5.600M in   5.012535s

=========== 100 elements =============
Warming up --------------------------------------
                Add:    16.894k i/100ms
              Fetch:    17.104k i/100ms
Calculating -------------------------------------
                Add:    197.118k (± 5.6%) i/s -    996.746k in   5.073804s
              Fetch:    184.110k (± 6.7%) i/s -    923.616k in   5.038682s

=========== 1_000 elements =============
Warming up --------------------------------------
                Add:     2.071k i/100ms
              Fetch:     2.644k i/100ms
Calculating -------------------------------------
                Add:     19.272k (± 3.6%) i/s -     97.337k in   5.057518s
              Fetch:     28.354k (± 9.2%) i/s -    142.776k in   5.081599s

=========== 10_000 elements =============
Warming up --------------------------------------
                Add:   186.000  i/100ms
              Fetch:   254.000  i/100ms
Calculating -------------------------------------
                Add:      1.926k (± 4.2%) i/s -      9.672k in   5.031638s
              Fetch:      3.073k (± 5.0%) i/s -     15.494k in   5.054577s

Benchmark master

=========== 3 elements =============
Warming up --------------------------------------
                Add:    84.669k i/100ms
              Fetch:    90.737k i/100ms
Calculating -------------------------------------
                Add:      3.554M (± 7.2%) i/s -     17.696M in   5.007616s
              Fetch:      3.981M (± 5.9%) i/s -     19.871M in   5.009714s

=========== 15 elements =============
Warming up --------------------------------------
                Add:    59.260k i/100ms
              Fetch:    60.791k i/100ms
Calculating -------------------------------------
                Add:      1.034M (± 6.7%) i/s -      5.156M in   5.011235s
              Fetch:      1.221M (± 4.2%) i/s -      6.140M in   5.038107s

=========== 16 elements =============
Warming up --------------------------------------
                Add:    57.091k i/100ms
              Fetch:    58.878k i/100ms
Calculating -------------------------------------
                Add:      1.063M (± 4.2%) i/s -      5.309M in   5.001955s
              Fetch:      1.148M (± 3.9%) i/s -      5.770M in   5.036016s

=========== 100 elements =============
Warming up --------------------------------------
                Add:    13.132k i/100ms
              Fetch:    15.879k i/100ms
Calculating -------------------------------------
                Add:    171.201k (± 5.3%) i/s -    866.712k in   5.077764s
              Fetch:    190.280k (± 3.4%) i/s -    952.740k in   5.013155s

=========== 1_000 elements=============
Warming up --------------------------------------
                Add:     2.091k i/100ms
              Fetch:     2.384k i/100ms
Calculating -------------------------------------
                Add:     18.318k (± 4.8%) i/s -     92.004k in   5.034774s
              Fetch:     24.077k (± 8.3%) i/s -    121.584k in   5.085813s

=========== 10_000 elements =============
Warming up --------------------------------------
                Add:   159.000  i/100ms
              Fetch:   207.000  i/100ms
Calculating -------------------------------------
                Add:      1.651k (± 4.4%) i/s -      8.268k in   5.018704s
              Fetch:      2.065k (± 6.1%) i/s -     10.350k in   5.030614s

Summary

Looks like linear search approach for 15 elements (the max size for using linear search in MRI) is quite slower than hashing (open/closed addressing). We could try to use a smaller array (e.g. only using for 8 elements) with the cost of adding one more resize.

Also 1_000 / 10_000 elements seems to be slower without HashEntry objects. Maybe because the array is general bigger?
For 1_000 elements closed addressing (master) seems even to be a little faster, no idea why this could be?

I don't have a version for linear search with HashEntry objects, @enebo do you think it makes sense to implement such a version for comparison?

Thanks

@lopex

This comment has been minimized.

Copy link
Member

lopex commented Jun 20, 2018

@ChrisBr @enebo just looking at https://github.com/ruby/ruby/blob/trunk/st.c#L73 is enough to get rid of that hash cache value for me. If we get rid of it we can store the pairs as values in the bin array

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 21, 2018

@ChrisBr @enebo just looking at https://github.com/ruby/ruby/blob/trunk/st.c#L73 is enough to get rid of that hash cache value for me. If we get rid of it we can store the pairs as values in the bin array

We already got rid of the hash cache value as we only store the key and values in the entries array. The bin array contains the index of the location in the entries array. So you would suggest to store key & value in the bins and entries array? Or getting rid of one array? Getting rid is probably not possible as the one makes it possible to access the key/values in the insertion order (entries) and the other one by the hash key index.

cc @lopex

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 21, 2018

So I changed the boundary for linear search to 8 elements as 16 seemed clearly too high. These are the results for 7 elements in the hash.

Open Addressing without RubyHashEntry

Warming up --------------------------------------
                Add:    83.871k i/100ms
              Fetch:    86.148k i/100ms
Calculating -------------------------------------
                Add:      2.060M (± 3.8%) i/s -     10.316M in   5.016326s
              Fetch:      2.129M (± 4.3%) i/s -     10.682M in   5.027352s

Linear search without RubyHashEntry (this PR)

Warming up --------------------------------------
                Add:    77.096k i/100ms
              Fetch:    78.024k i/100ms
Calculating -------------------------------------
                Add:      2.752M (± 4.5%) i/s -     13.723M in   4.999213s
              Fetch:      2.533M (± 4.6%) i/s -     12.640M in   5.001592s

master

Warming up --------------------------------------
                Add:    85.253k i/100ms
              Fetch:    85.682k i/100ms
Calculating -------------------------------------
                Add:      2.757M (± 6.0%) i/s -     13.726M in   4.997771s
              Fetch:      3.062M (± 4.3%) i/s -     15.337M in   5.018572s

Adding seems to be around the same for master and linear search but fetching is still faster with current master.

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 21, 2018

I also tried the fast skip bucket approach with calculating the hash on the fly (because we don't store it in the RubyHashEntry anymore), but is seems to be a lot slower:

For linear search with 7 elements (we need to calculate the hash two times because we don't need to calculate the hash to find the bucket as we just iterate)

Warming up --------------------------------------
                Add:    75.843k i/100ms
              Fetch:    61.195k i/100ms
Calculating -------------------------------------
                Add:      2.705M (± 3.8%) i/s -     13.500M in   4.999016s
              Fetch:    745.768k (± 3.0%) i/s -      3.733M in   5.010025s

For open addressing with 16 & 100 elements (we need to calculate the hash only once as we already have it calculated to find the bucket)

Warming up --------------------------------------
                Add:    63.963k i/100ms
              Fetch:    54.923k i/100ms
Calculating -------------------------------------
                Add:      1.391M (± 3.1%) i/s -      6.972M in   5.018705s
              Fetch:      1.085M (± 3.0%) i/s -      5.437M in   5.017587s
Warming up --------------------------------------
                Add:    16.932k i/100ms
              Fetch:    14.819k i/100ms
Calculating -------------------------------------
                Add:    245.067k (± 2.5%) i/s -      1.236M in   5.046987s
              Fetch:    180.837k (± 2.4%) i/s -    903.959k in   5.001831s

See #5215 (comment) for reference

@headius

This comment has been minimized.

Copy link
Member

headius commented Jun 26, 2018

Ok I'm back :-)

Thoughts on what we else we might try to speed up fetches? I think overall I want to go forward with this PR, since the memory sizing and locality will probably make up for a minor degradation in straight-line perf.

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jun 29, 2018

@headius great you're back, hope you had amazing vacations 🌴 🌞

I just saw your conversation on IRC.

For linear search, we currently store everything in the entries array (without a bin array) like this:

entries
key
value
another key
another value
...

So we do not cache the hash anymore which means we always do full object comparison (eql) for every entry when we iterate over it (fetch, add, delete ...). If we cache the hash, we could do a quick skip and only check the hash. The idea I discussed with @lopex was now:

entries
key
value
hash
another key
another value
another hash
...

However, as we need to store objects, we need to store the hash as boxed int. The other idea you mentioned could also be to introduce another hash array, something like this:

entries
key
value
another key
another value
...
hashes
hash
another hash
...

This would introduce some indirection.

So far I'm about to introduce the first approach (not done yet) to see the impact.

We also still have the RubyHashEntry as intermediate objects which we should git rid of.

In general, do we want to merge everything altogether or first only the linear search or what is our preferred way to go forward?

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jul 3, 2018

So I tried the hash caching from #5215 (comment)

Result storing hash in the entries array (boxing of the hash int value):

Warming up --------------------------------------
                Add:    75.137k i/100ms
              Fetch:    74.403k i/100ms
Calculating -------------------------------------

                Add:      1.836M (± 4.9%) i/s -      9.167M in   5.007020s

              Fetch:      1.921M (± 5.8%) i/s -      9.598M in   5.015082s

For comparison, this is the result without any hash caching:

Warming up --------------------------------------
                Add:    77.096k i/100ms
              Fetch:    78.024k i/100ms
Calculating -------------------------------------
                Add:      2.752M (± 4.5%) i/s -     13.723M in   4.999213s
              Fetch:      2.533M (± 4.6%) i/s -     12.640M in   5.001592s

edit: updated to make the result more clear

@eregon

This comment has been minimized.

Copy link
Member

eregon commented Jul 5, 2018

Just for information, TruffleRuby uses the same layout for small Hashes, a Object[] with (hash, key, value) per entry: https://github.com/oracle/truffleruby/blob/41797d45bd6dc95af17c8829a7dd0e9c4ef3a207/src/main/java/org/truffleruby/core/hash/PackedArrayStrategy.java#L30-L32
http://chrisseaton.com/truffleruby/small-data-structures/ explains it and related optimizations in more details.

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jul 5, 2018

@eregon thanks, very interesting read!

@headius

This comment has been minimized.

Copy link
Member

headius commented Jul 12, 2018

@ChrisBr Today @enebo pointed out to me that these benchmarks are probably a best case scenario for hash, which might make the real cost of eliminating the cache a bit harder to see. If these were user-defined objects, for example, they'd have a nontrivial hash calculation cost plus dynamic dispatch plus a Ruby method, which would be much more overhead than say, Float#hash.

This is just more justification for doing the tagalong long[] for caching hash values, I guess.

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jul 13, 2018

@headius @enebo I updated the PR and here are the latest numbers:

Here are the latest numbers:

7 Entries

=========== PR =============
Warming up --------------------------------------
                Add:    53.919k i/100ms
              Fetch:    58.226k i/100ms
Calculating -------------------------------------
                Add:      1.146M (± 4.5%) i/s -      5.715M in   4.997110s
              Fetch:      1.190M (± 3.9%) i/s -      5.939M in   4.998983s

=========== master =============
Warming up --------------------------------------
                Add:    54.473k i/100ms
              Fetch:    55.501k i/100ms
Calculating -------------------------------------
                Add:      1.238M (± 4.7%) i/s -      6.210M in   5.027007s
              Fetch:      1.150M (± 3.9%) i/s -      5.772M in   5.027577s

50

=========== PR =============
Warming up --------------------------------------
                Add:    12.941k i/100ms
              Fetch:    14.554k i/100ms
Calculating -------------------------------------
                Add:    168.951k (± 4.7%) i/s -    854.106k in   5.067267s
              Fetch:    177.780k (± 4.0%) i/s -    887.794k in   5.002208s

=========== master =============
Warming up --------------------------------------
                Add:    10.658k i/100ms
              Fetch:    11.819k i/100ms
Calculating -------------------------------------
                Add:    146.557k (± 4.2%) i/s -    735.402k in   5.027122s
              Fetch:    139.718k (± 3.3%) i/s -    709.140k in   5.081481s

100

=========== PR =============
Warming up --------------------------------------
                Add:     6.364k i/100ms
              Fetch:     7.673k i/100ms
Calculating -------------------------------------
                Add:     86.035k (± 4.1%) i/s -    432.752k in   5.038878s
              Fetch:     89.486k (± 3.5%) i/s -    452.707k in   5.065653s

=========== master =============
Warming up --------------------------------------
                Add:     5.573k i/100ms
              Fetch:     5.952k i/100ms
Calculating -------------------------------------
                Add:     70.386k (± 4.8%) i/s -    356.672k in   5.079609s
              Fetch:     63.540k (± 5.0%) i/s -    321.408k in   5.071384s

1_000

=========== PR =============
Warming up --------------------------------------
                Add:   627.000  i/100ms
              Fetch:   641.000  i/100ms
Calculating -------------------------------------
                Add:      7.508k (± 4.5%) i/s -     37.620k in   5.021288s
              Fetch:      7.404k (± 3.7%) i/s -     37.178k in   5.028579s

=========== master =============
Warming up --------------------------------------
                Add:   494.000  i/100ms
              Fetch:   535.000  i/100ms
Calculating -------------------------------------
                Add:      5.878k (± 7.0%) i/s -     29.640k in   5.069555s
              Fetch:      5.921k (± 5.8%) i/s -     29.960k in   5.078294s

10_000

=========== PR =============
Warming up --------------------------------------
                Add:    58.000  i/100ms
              Fetch:    61.000  i/100ms
Calculating -------------------------------------
                Add:    735.166  (± 4.9%) i/s -      3.712k in   5.061558s
              Fetch:    747.408  (± 5.1%) i/s -      3.782k in   5.073927s

=========== master =============
Warming up --------------------------------------
                Add:    45.000  i/100ms
              Fetch:    50.000  i/100ms
Calculating -------------------------------------
                Add:    525.851  (± 4.0%) i/s -      2.655k in   5.057292s
              Fetch:    524.928  (± 4.6%) i/s -      2.650k in   5.059683s

Summary

  • almost all RubyHashEntries (except where we need it to not break API)
  • small hashes (less than 8 entries) maintain now a hashes array for caching the hash values
  • code is almost in a mergable state, there are just a few Todo's marked in the code where I need some feedback

@ChrisBr ChrisBr force-pushed the ChrisBr:performance/hash branch from a0afbdc to 09965f3 Jul 13, 2018

@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jul 13, 2018

This is the script I used for benchmarking. Compared to the previous script, I switched to strings as the hashes for Fixnum is basically the value which made a really bad distribution in the hash.

require 'benchmark/ips'
n = 7
tmp = {}
Benchmark.ips do |x|
  x.report("Add:") { i = 0; while i < n; tmp[i.to_s] = true; i += 1; end }
  x.report("Fetch:") { i = 0; while i < n; tmp[i.to_s]; i += 1; end }
end
@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jul 13, 2018

Here are the results with Fixnum as keys:

7 entries

=========== PR =============
Warming up --------------------------------------
                Add:    78.345k i/100ms
              Fetch:    88.949k i/100ms
Calculating -------------------------------------
                Add:      1.992M (± 3.9%) i/s -      9.950M in   5.003316s
              Fetch:      2.813M (± 4.6%) i/s -     14.054M in   5.008151s

=========== master =============
Warming up --------------------------------------
                Add:    78.743k i/100ms
              Fetch:    91.848k i/100ms
Calculating -------------------------------------
                Add:      2.086M (± 4.2%) i/s -     10.473M in   5.030410s
              Fetch:      2.358M (± 3.9%) i/s -     11.848M in   5.032691s

100 entries

=========== PR =============
Warming up --------------------------------------
                Add:    17.493k i/100ms
              Fetch:    20.460k i/100ms
Calculating -------------------------------------
                Add:    198.844k (± 2.7%) i/s -    997.101k in   5.018367s
              Fetch:    279.817k (± 2.8%) i/s -      1.412M in   5.049486s

=========== master =============
Warming up --------------------------------------
                Add:    19.260k i/100ms
              Fetch:    20.710k i/100ms
Calculating -------------------------------------
                Add:    255.548k (± 6.9%) i/s -      1.271M in   4.998424s
              Fetch:    243.930k (± 6.0%) i/s -      1.222M in   5.026877s

1_000 entries

=========== PR =============
Warming up --------------------------------------
                Add:     1.908k i/100ms
              Fetch:     2.194k i/100ms
Calculating -------------------------------------
                Add:     23.979k (± 4.3%) i/s -    120.204k in   5.023167s
              Fetch:     25.095k (± 3.8%) i/s -    127.252k in   5.078665s
=========== master =============
Warming up --------------------------------------
                Add:     1.859k i/100ms
              Fetch:     2.124k i/100ms
Calculating -------------------------------------
                Add:     22.449k (± 4.2%) i/s -    113.399k in   5.061254s
              Fetch:     18.735k (± 3.7%) i/s -     95.580k in   5.109285s

10_000 entries

=========== PR =============
Warming up --------------------------------------
                Add:   174.000  i/100ms
              Fetch:   211.000  i/100ms
Calculating -------------------------------------
                Add:      1.966k (± 3.8%) i/s -      9.918k in   5.051684s
              Fetch:      2.300k (± 4.6%) i/s -     11.605k in   5.056634s

=========== master =============
Warming up --------------------------------------
                Add:   182.000  i/100ms
              Fetch:   206.000  i/100ms
Calculating -------------------------------------
                Add:      2.114k (± 5.9%) i/s -     10.738k in   5.098915s
              Fetch:      2.007k (± 6.9%) i/s -     10.094k in   5.053988s
@ChrisBr

This comment has been minimized.

Copy link
Contributor Author

ChrisBr commented Jul 13, 2018

As requested, here are some numbers with a Struct as key:

require 'benchmark/ips'

Struct.new("Key", :number)
[10000].each do |n|
  puts "=========== #{n} ============="
  tmp = {}
  Benchmark.ips do |x|
    x.report("Add:") { i = 0; while i < n; tmp[Struct::Key.new(i)] = true; i += 1; end }
    x.report("Fetch:") { i = 0; while i < n; tmp[Struct::Key.new(i)]; i += 1; end }
  end
end

7 entries

=========== PR =============
Warming up --------------------------------------
                Add:    24.410k i/100ms
              Fetch:    25.626k i/100ms
Calculating -------------------------------------
                Add:    363.832k (± 4.6%) i/s -      1.831M in   5.043230s
              Fetch:    368.314k (± 4.6%) i/s -      1.845M in   5.021579s
=========== master =============
Warming up --------------------------------------
                Add:    25.832k i/100ms
              Fetch:    27.607k i/100ms
Calculating -------------------------------------
                Add:    396.205k (± 4.8%) i/s -      1.989M in   5.033827s
              Fetch:    378.239k (± 4.4%) i/s -      1.905M in   5.046487s

100 entries

=========== PR =============
Warming up --------------------------------------
                Add:     2.708k i/100ms
              Fetch:     3.210k i/100ms
Calculating -------------------------------------
                Add:     31.264k (± 4.3%) i/s -    157.064k in   5.033652s
              Fetch:     30.637k (± 3.4%) i/s -    154.080k in   5.035370s

=========== master =============
Warming up --------------------------------------
                Add:     2.234k i/100ms
              Fetch:     2.439k i/100ms
Calculating -------------------------------------
                Add:     26.292k (± 3.9%) i/s -    131.806k in   5.021136s
              Fetch:     24.783k (± 4.0%) i/s -    124.389k in   5.027655s

1_000 entries

=========== PR =============
Warming up --------------------------------------
                Add:   244.000  i/100ms
              Fetch:   265.000  i/100ms
Calculating -------------------------------------
                Add:      2.747k (± 4.3%) i/s -     13.908k in   5.072170s
              Fetch:      2.648k (± 4.3%) i/s -     13.250k in   5.013028s

=========== master =============
Warming up --------------------------------------
                Add:   222.000  i/100ms
              Fetch:   220.000  i/100ms
Calculating -------------------------------------
                Add:      2.388k (± 4.5%) i/s -     11.988k in   5.030474s
              Fetch:      2.325k (± 6.4%) i/s -     11.660k in   5.036870s

10_000 entries

=========== PR =============
Warming up --------------------------------------
                Add:    23.000  i/100ms
              Fetch:    24.000  i/100ms
Calculating -------------------------------------
                Add:    256.426  (± 5.1%) i/s -      1.288k in   5.037974s
              Fetch:    248.294  (± 6.0%) i/s -      1.248k in   5.046306s

=========== master =============
Warming up --------------------------------------
                Add:    20.000  i/100ms
              Fetch:    19.000  i/100ms
Calculating -------------------------------------
                Add:    207.432  (± 4.3%) i/s -      1.040k in   5.024066s
              Fetch:    200.244  (± 4.5%) i/s -      1.007k in   5.039651s
@headius

This comment has been minimized.

Copy link
Member

headius commented Jul 14, 2018

Ok I have some great news!

I ran a little keyword argument benchmark against 9.2, 9.2.1, and 9.2.1 with this PR. The results are extremely promising:

9.2 on Hotspot C2: 1.40s
9.2 on Graal JIT: 0.61s
9.2.1 on Hotspot C2: 1.36s
9.2.1 on Graal JIT: 0.30s
9.2.1+PR on Hotspot: 1.38s
9.2.1+PR on Graal JIT: 0.10s

So while these changes have a small perf drop on Hotspot C2 versus non-PR 9.2.1 (potentially within margin of error) they are in fact MUCH faster on Graal: around 3x, which is in turn 7x faster than MRI and 14x faster than 9.2 on Hotspot C2.

If this is passing tests, I propose we land it now so it can bake for a while on master. I've pushed the PR contents to my improve_kwargs branch, which includes some tweaks for kwarg processing.

@ChrisBr This is very exciting!

@ChrisBr ChrisBr force-pushed the ChrisBr:performance/hash branch from 09965f3 to 5ae7260 Jul 14, 2018

ChrisBr added some commits Jul 19, 2018

@ChrisBr ChrisBr force-pushed the ChrisBr:performance/hash branch from f12d7fc to fc561b1 Aug 4, 2018

ChrisBr added some commits Aug 4, 2018

Remove small hash
which basically just created a linked list.
This approach is not possible anymore with open addressing.
As open addressing hashes are anyway compacter, we don't need it anymore.

@ChrisBr ChrisBr force-pushed the ChrisBr:performance/hash branch from 360e1e9 to 6ac904d Aug 5, 2018

@headius headius added this to the JRuby 9.2.1.0 milestone Aug 6, 2018

@headius

This comment has been minimized.

Copy link
Member

headius commented Aug 6, 2018

Ok, @enebo and I see no reason not to go ahead and merge!

@headius headius merged commit e483fb2 into jruby:master Aug 6, 2018

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment