wrapper for google's hash functions, for ruby
C++ HTML C Shell Ruby Makefile Other
Switch branches/tags
Nothing to show
Latest commit 19264e6 Dec 15, 2015 @rdp cleanup
Permalink
Failed to load latest commit information.
ext cleanup Dec 15, 2015
spec cleanup Dec 15, 2015
ChangeLog.txt changelog Oct 12, 2015
README readme May 28, 2014
Rakefile rakefile Jan 21, 2015
TODO ready for release May 28, 2014
VERSION Version bump to 0.8.9 Oct 12, 2015
google_hash.gemspec Regenerate gemspec for version 0.8.9 Oct 12, 2015
results.txt update readme for consumption Oct 10, 2012
to_build_locally_run_ext_go_bat possibly work with more compilers Jan 2, 2013

README

The "google_hash" gem.

Its goal. To boldly be faster than any ruby hash has before (cue star trek TNG theme...).

Well, really the goal is a better hash for Ruby, either one that is faster or more space efficient than ruby's default.  
To attempt to accomplish this, this library wraps the google hash sparse and dense hashes [1], which may perform better
for your use case [make sure to benchmark before and after!].  
It also creates some "specialized" hashes, for instance, those that take an integer for their key, for even better performance
and decreased "garbage collected" RAM use, which can significantely speed up certain apps.

# how to run this benchmark:

$ cd ext && ruby extconf.rb && make
$ cd ../spec
$ ruby benchmark.rb

ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux]
inserting 400000 objects

Ruby Standard Hash
populate integer        0.324
#each                   0.660
lookup int              0.083

GoogleHashDenseIntToRuby
populate integer        0.114
#each                   0.050
lookup int              0.080


GoogleHashSparseIntToInt # said to be more memory efficient
populate integer        0.242
#each                   0.046
lookup int              0.099

These also use less memory, because (if you specify IntToInt, it stores only 4 bytes per int, instead of Ruby's
usual 20 bytes), and usually less overall RAM for the hash store.
This also frees up Ruby so it doesn't hvae to garbage collect as much.  Yea!

For instance, with 1M ints, doing a GC takes this long, comparatively:
GoogleHashDenseIntToInt "dense took 0.002"
"ruby hash took 0.103"

per GC.  And those garbage collects happen all the time, so this is meant to speed those up.

See also the results.txt file for more OS benchmark results.

You can also run your own benchmarks to see how much faster it would be for you ("try before you buy"), see spec/benchmark.rb file.

The best benchmark, of course, is to integrate and run it in your own app.

Here is how it performs, if used as a "replacement" for the Ruby standard Hash:

Ruby Standard Hash
populate string         0.252
populate symbol         0.109
populate integer        0.324
#each                   0.660
lookup int              0.083
lookup string           0.642
lookup symbol           0.082

GoogleHashDenseRubyToRuby
populate string         0.312 # slower here
populate symbol         0.136
populate integer        0.172
#each                   0.077
lookup int              0.102
lookup string           0.271
lookup symbol           0.098

GoogleHashSparseRubyToRuby
populate string         0.276
populate symbol         0.102
populate integer        0.428
#each                   0.047
lookup int              0.098
lookup string           0.293
lookup symbol           0.099

basically slightly slower, except for an #each method that is *much* faster, and, as I said, RAM usage might be much better using this than the standard Ruby hash, esp. 
for the case of the Sparse version.

== Installation ==

gem install google_hash (if on windows, you'll also need the devkit installed).

== usage ==

a = GoogleHashDenseRubyToRuby.new # like a normal Ruby hash, with slightly different performance characteristics, see results.txt
b = GoogleHashDenseIntToRuby.new # :int => Ruby -- keys are stored as "native" ints, values are ruby objects
c = GoogleHashSparseIntToInt.new # :int => :int

Here's the full list of availables:

>> puts Object.constants.select{|k| k =~ /google/i}.sort; nil

GoogleHashDenseLongToLong
GoogleHashSparseLongToLong
GoogleHashDenseLongToInt
GoogleHashSparseLongToInt
GoogleHashDenseLongToRuby
GoogleHashSparseLongToRuby
GoogleHashDenseDoubleToLong
GoogleHashSparseDoubleToLong
GoogleHashDenseDoubleToInt
GoogleHashSparseDoubleToInt
GoogleHashDenseDoubleToRuby
GoogleHashSparseDoubleToRuby
GoogleHashDenseIntToLong
GoogleHashSparseIntToLong
GoogleHashDenseIntToInt
GoogleHashSparseIntToInt
GoogleHashDenseIntToRuby
GoogleHashSparseIntToRuby
GoogleHashDenseRubyToLong
GoogleHashSparseRubyToLong
GoogleHashDenseRubyToInt
GoogleHashSparseRubyToInt
GoogleHashDenseRubyToRuby
GoogleHashSparseRubyToRuby

(long is "better" than int on 64 bit systems only, on 32 bit it's the same)

and how to use them:

a[3] = 4
b[4] = 'abc'
b['abc'] = 'some complex object' 
c[3] = 4 # all you can use are ints...

Use them like "normal" hashes, the method names try to map well.

== feedback ==

If you have a desired use case that's not covered, let me know and I might well be able to code it up for you and add it.

ex: currently it uses longs internally instead of ints--if you want ints or strings added, let me know.

if you want it to remember insertion order, I could do that, too, or native "store away" strings/bignums, whatever.

Could also add vectors, vector(pairs), priority queues, floats, native bignums, other more complex types, if anybody wants me to.

This is meant to be one more tool in the rubyists toolbelt when trying to optimize speed-wise, and plans to expand to more types, but at least with this release it has a #each method.

Could add #sum methods, etc. for the numeric types, for instance.

Enjoy.

-r

[1] http://code.google.com/p/google-sparsehash

If you want to see the code/hack on it, run extconf.rb within the ext directory, to create the code it actually uses (from a template).

Related:

judy http://groups.google.com/group/ruby-talk-google/browse_thread/thread/05ed587925526a7f/314375891d12b672?lnk=raot

NArray gem : provides "native" type arrays (and X-Dimensional arrays--all in native memory, so also saves memory as this gem does)