improve Ruby's Set performance #4334

Open
kares opened this Issue Nov 24, 2016 · 18 comments

Projects

None yet

3 participants

@kares
Member
kares commented Nov 24, 2016 edited

... seems that Java's HashSet is constantly twice as fast on include? these days :

require 'set'
require 'java'
require 'benchmark'

TIMES = (ARGV[0] || 5).to_i

CONTENTS = (1..36).to_a

LOOP = 10_000_000

TIMES.times do
  Benchmark.bm(10) do |bm|
    bm.report('Ruby Set#include? hit ') do
      set = Set.new CONTENTS
      hit = 32
      LOOP.times { set.include?(hit) }
    end
    bm.report('Ruby Set#include? miss') do
      set = Set.new CONTENTS
      miss = 0
      LOOP.times { set.include?(miss) }
    end

    bm.report('HashSet#include? hit  ') do
      set = java.util.HashSet.new CONTENTS
      hit = 32
      LOOP.times { set.include?(hit) }
    end
    bm.report('HashSet#include? miss ') do
      set = java.util.HashSet.new CONTENTS
      miss = 0
      LOOP.times { set.include?(miss) }
    end
  end
end
                 user     system      total        real
Ruby Set#include? hit   1.780000   0.000000   1.780000 (  1.092435)
Ruby Set#include? miss  1.280000   0.000000   1.280000 (  1.014602)
HashSet#include? hit    0.710000   0.000000   0.710000 (  0.540562)
HashSet#include? miss   0.660000   0.000000   0.660000 (  0.524503)
                 user     system      total        real
Ruby Set#include? hit   0.850000   0.000000   0.850000 (  0.846941)
Ruby Set#include? miss  0.910000   0.000000   0.910000 (  0.901849)
HashSet#include? hit    0.480000   0.000000   0.480000 (  0.478561)
HashSet#include? miss   0.470000   0.000000   0.470000 (  0.465833)
                 user     system      total        real
Ruby Set#include? hit   0.840000   0.000000   0.840000 (  0.841024)
Ruby Set#include? miss  0.910000   0.000000   0.910000 (  0.904074)
HashSet#include? hit    0.480000   0.000000   0.480000 (  0.474359)
HashSet#include? miss   0.460000   0.000000   0.460000 (  0.465034)
                 user     system      total        real
Ruby Set#include? hit   0.850000   0.000000   0.850000 (  0.846504)
Ruby Set#include? miss  0.910000   0.000000   0.910000 (  0.905145)
HashSet#include? hit    0.470000   0.000000   0.470000 (  0.475149)
HashSet#include? miss   0.480000   0.000000   0.480000 (  0.469817)
                 user     system      total        real
Ruby Set#include? hit   0.830000   0.000000   0.830000 (  0.832843)
Ruby Set#include? miss  0.990000   0.000000   0.990000 (  0.927099)
HashSet#include? hit    0.480000   0.000000   0.480000 (  0.479107)
HashSet#include? miss   0.490000   0.000000   0.490000 (  0.468669)

NOTE: Java's Set#include? used to be slow but has been optimized along the 9K line

@kares kares added the performance label Nov 24, 2016
@enebo
Member
enebo commented Nov 28, 2016

@kares what do you plan on changing? Ruby's Set is implemented in Ruby itself and is backed by a Ruby Hash. Or are you thinking of a full Java native version?

@kares
Member
kares commented Nov 28, 2016

@enebo hey! no plan so far. thought I report - to have some feedback whether its preferable the way it is.
is there any reason why it wasn't implemented in native (it's used extensibly by Rails) besides having less to worry about ?

@enebo
Member
enebo commented Nov 28, 2016

@kares maintenance would be a primary reason but if set is significant for something as important as Rails perhaps we should be looking at things that way?

@kares
Member
kares commented Nov 29, 2016 edited

looked into whether it gets even slower with Rails (due AS patching) ... seems that Set is somehow the only Ruby std library piece they leave as is :)

                 user     system      total        real
Ruby Set#include? hit   0.660000   0.000000   0.660000 (  0.664855)
Ruby Set#include? miss  0.760000   0.000000   0.760000 (  0.753341)
HashSet#include? hit    0.470000   0.000000   0.470000 (  0.456537)
HashSet#include? miss   0.460000   0.000000   0.460000 (  0.447680)

.. with active_support/core_ext loaded - guess it's not worth the effort unless real-world use-cases shows up.

@headius
Member
headius commented Nov 29, 2016

set.rb definitely could use a rework. I've thought about writing it in Java a few times to get MAXIMUM SPEED but maintaining the "hash wrapper" behavior is tricky.

I'm actually rather impressed we only lose this much for the set.rb impl, since every operation is wrapped by Ruby code.

@headius
Member
headius commented Nov 29, 2016

Oh other rework item needed: make Set thread-safe.

@kares
Member
kares commented Nov 30, 2016

@headius yy, also kind of expected it to be slower (maybe other ops are - only micro-benched include?)
I am not sure why it should be made thread-safe - none of the core structures are (maybe concurrent-ruby should include such collection: Concurrent::Set)

@headius
Member
headius commented Dec 2, 2016

One item slowing this (and other) Set methods down is the use of Hash#[], which to IR looks like it might need a frame (for $~ in String#[]). If I modify Set#include? to call Hash#include? instead of Hash#[], things improve a bit:

before/after:

[] ~/projects/jruby $ jruby set_bench.rb 
                 user     system      total        real
Ruby Set#include? hit   2.010000   0.020000   2.030000 (  1.120571)
Ruby Set#include? miss  1.350000   0.000000   1.350000 (  1.120890)
                 user     system      total        real
Ruby Set#include? hit   0.880000   0.010000   0.890000 (  0.879133)
Ruby Set#include? miss  1.050000   0.000000   1.050000 (  1.036543)
                 user     system      total        real
Ruby Set#include? hit   0.870000   0.000000   0.870000 (  0.864939)
Ruby Set#include? miss  1.040000   0.000000   1.040000 (  1.034190)
                 user     system      total        real
Ruby Set#include? hit   0.900000   0.000000   0.900000 (  0.897126)
Ruby Set#include? miss  1.040000   0.010000   1.050000 (  1.037943)
                 user     system      total        real
Ruby Set#include? hit   0.840000   0.000000   0.840000 (  0.841677)
Ruby Set#include? miss  1.050000   0.000000   1.050000 (  1.041484)

[] ~/projects/jruby $ jruby set_bench.rb 
                 user     system      total        real
Ruby Set#include? hit   1.750000   0.020000   1.770000 (  0.967469)
Ruby Set#include? miss  1.320000   0.010000   1.330000 (  0.943466)
                 user     system      total        real
Ruby Set#include? hit   0.800000   0.000000   0.800000 (  0.804858)
Ruby Set#include? miss  0.880000   0.010000   0.890000 (  0.882824)
                 user     system      total        real
Ruby Set#include? hit   0.800000   0.000000   0.800000 (  0.800321)
Ruby Set#include? miss  0.860000   0.000000   0.860000 (  0.852134)
                 user     system      total        real
Ruby Set#include? hit   0.800000   0.000000   0.800000 (  0.800266)
Ruby Set#include? miss  0.850000   0.000000   0.850000 (  0.856337)
                 user     system      total        real
Ruby Set#include? hit   0.900000   0.000000   0.900000 (  0.799154)
Ruby Set#include? miss  0.860000   0.000000   0.860000 (  0.847760)

I'll open a bug with MRI to change that include? impl.

@enebo
Member
enebo commented Dec 2, 2016

@headius they have an optaref instr in yarv which will not dyndispatch in hash case for []. Have you timed this bench using this change with MRI?

@headius
Member
headius commented Dec 2, 2016

@enebo Bah, you're right. https://bugs.ruby-lang.org/issues/10754

They are just a bit slower on the bench if it uses Hash#include?.

They probably will reject my change, but I guess this means we could implement it using Hash#include? if we want to.

@kares
Member
kares commented Dec 2, 2016

heh, najs trick - esp. the Hash.new(false) part to get [] to return false for missing keys :)
... seems like set.rb is hacked with MRI internals in mind -> no point for JRuby not to do its own bits.

@kares
Member
kares commented Dec 7, 2016

decided to look into that -> realized that I also miss JI for Ruby's Set and the set.rb code is a bit ugly on places :

  • SortedSet doing eval on first initialize,
  • singleton class eval in Set's divide algorithm.
  • Thread.current in inspect due recursion detection (JRuby has native API for doing that)

all seem unnecessary + there's no consistency on API (Set#add seems to be an API to use for all additions but its not -> there's certainly room for improvements when going native)

hopefully I will manage to get this acceptable for Ruby 2.4 (and do not change my mind 🎱)

@kares kares self-assigned this Dec 7, 2016
@headius
Member
headius commented Dec 7, 2016

@kares Ahh, so you're looking into some set.rb improvements! That's great! Try to make sure you isolate jruby-specific stuff so merging from MRI upstream doesn't get too complicated. I'm not opposed to having a native ext if there's something really hot to go there, but the overall performance of set.rb shouldn't usually be limited by the Ruby code.

One place it may be necessary would be synchronization; if we decide we want Set to be thread-safe, adding block-based synchronization as in Monitor would be a significant performance hit. Having some native synchronization would help there.

Of course, I still hold out hope that some day we might have thread-safe Array and Hash anyway.

@headius
Member
headius commented Dec 7, 2016 edited

Oh one other thought about the [] calls that were slowing down include?: we have talked about adding pragmas in the past, and this would be a good case for it. Specifically, we could have a pragma that indicates to the IR compiler that this is NOT going to be a String#[Regexp] call that would require backref logic. It would be a way of saying "I know this method or call will never need backref".

Of course we still plan to find a way to lazily allocate space for backref only when it's needed, but for now static compiler tricks are not bad.

@enebo
Member
enebo commented Dec 7, 2016

I wonder if we could do this another way?

@lookup = Hash.method(:[])

#...

def include?(element)
  @lookup.call(@hash, element)
end

I don't neccesarily mean using unbound method it could be our own API which returns a callable thing but we will know what it is because it is just a method. IR could then know for our thing what it is and make it into a static dispatch.

@kares
Member
kares commented Dec 8, 2016

I am still interested in going native (full set.rb rewrite in Java). set.rb is not changing that much (and isn't actually used beyond some level -> surprised to no find any SortedSet usage in Rails). anyway, SortedSet is quite ugly (behaves differently whether a particular gem is installed) and could be implemented easily with Java's TreeSet.

what I mostly desire - beyond performance - is Set implements java.util.Set (like RubyArray, RubyHash do -> no collection conversion between Java <-> Ruby)

@kares
Member
kares commented Dec 8, 2016

looks promising, 9.1.6.0 (set.rb) :

                 user     system      total        real
Ruby Set#include? hit   0.870000   0.000000   0.870000 (  0.866756)
Ruby Set#include? miss  0.930000   0.000000   0.930000 (  0.912139)
HashSet#include? hit    0.500000   0.000000   0.500000 (  0.493965)
HashSet#include? miss   0.510000   0.000000   0.510000 (  0.506165)
     Set.new([1,2,3])    8.630000   0.000000   8.630000 (  8.590427)
SortedSet.new([1,2,3])  14.520000   0.030000  14.550000 ( 14.508678)
     Set#inspect         9.150000   0.010000   9.160000 (  9.137015)
SortedSet#inspect        5.900000   0.010000   5.910000 (  5.836307)
     Set#flatten         8.610000   0.010000   8.620000 (  8.602421)
     Set#each {|e| e}    1.540000   0.000000   1.540000 (  1.533745)

... native Set/SortedSet (without call site-caching) :

                 user     system      total        real
Ruby Set#include? hit   0.460000   0.000000   0.460000 (  0.461569)
Ruby Set#include? miss  0.520000   0.010000   0.530000 (  0.521436)
HashSet#include? hit    0.490000   0.000000   0.490000 (  0.486099)
HashSet#include? miss   0.460000   0.000000   0.460000 (  0.455156)
     Set.new([1,2,3])    2.080000   0.000000   2.080000 (  2.076242)
SortedSet.new([1,2,3])   4.310000   0.000000   4.310000 (  4.299778)
     Set#inspect         4.830000   0.000000   4.830000 (  4.821318)
SortedSet#inspect        4.770000   0.000000   4.770000 (  4.757981)
     Set#flatten         1.710000   0.010000   1.720000 (  1.701329)
     Set#each {|e| e}    1.690000   0.000000   1.690000 (  1.691799)

its roughly ~ the same for max compatibility (e.g. has the @hash instance-var on Set instances)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment