Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate using an external hash library #244

Closed
coke opened this issue Dec 18, 2008 · 5 comments
Closed

Investigate using an external hash library #244

coke opened this issue Dec 18, 2008 · 5 comments

Comments

@coke
Copy link
Contributor

coke commented Dec 18, 2008

We spend a lot of cycles on our hand-rolled hashing, both in terms of developer effort and CPU.

It would be nice if we could use an off the shelf hash package.

Any package under consideration would have to have a compatible license, and work on our minimal core platforms.

Originally http://trac.parrot.org/parrot/ticket/61

@Whiteknight
Copy link
Contributor

I set this ticket up as a task for GCI 2010. I'm hoping some intrepid young student can come up with a good list of information and recommendations for us to follow. If we can't make some kind of decision after that, we need to close this old ticket.

@Whiteknight
Copy link
Contributor

From GCI student Atanas:

I checked some hashing libraries and I chose the following:
LIBHASHISH

  • Build-in key support for char arrays (strings) and uint_{8,16,32} types and support for own (possible complex) key data types
  • Support rbtree's as collision strategy instead of bucked lists (avoid worst case O(n) scenario)
  • Dynamic or manual adjustable table size (reordering if proportion entries/table_size is unfavorable)
  • Build as an static or dynamic library (.a and .so)
  • Iterator support (equivalent to ruby's hash.each() method)
  • Thread clean - fine-grained lock mechanisms (mutex locks, reader writer locks, ...)
  • Bloom filter implementation
  • Many built-in hash algorithms (from trivial algorithms till cryptographic ones)
  • Architecture clean - runs on 32bit machines as well as 64bit machines
  • As lightweight as possible - no bloated code
  • Makefile test target plus benchmark applications for comparing the different hashing algorithm
  • Dual licensed under the GNU General Public and BSD License
    Website is http://libhashish.sourceforge.net/. I think this is the best hashing library for your project.

So there's that input. I'm going to reassign this RFC to cotto while we figure out if we want to move to a new hashing library and, if so, which one.

(Fixing formatting -benabik)

@nwellnhof
Copy link
Member

LSHKIT is for locality sensitive hashing, CMPH for minimal perfect hashing and Mhash for cryptographic hash functions. That's not what we need.

libghthash  http://www.ipd.bth.se/ska/sim_home/libghthash.html would be an example for a useful hash table library.

But I don't think we could gain much by using an external library. If we want to optimize our current implementation, I'd suggest a dead simple hash table with open addressing and linear probing. That would give us at most one dcache miss for practically all reads if we make sure that the fill factor stays low enough by rehashing.

Such a hash table would only require 2 words per entry vs. 4 words per entry for our current chained implementation. So even if we limit the fill factor to 50%, we wouldn't need more memory than now.

We'd need two additional bits to mark occupied and deleted entries. For pointer keys or values we could store them in the low bits of the pointers. For integer keys and values, we'd need an additional array for those flags.

@Benabik
Copy link
Member

Benabik commented Jan 25, 2012

I was poking around for information on hash libraries and came across this comparison page: http://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/

Of particular interest to me is the "khash (C)" line in the table, which is his one include file hash library that allows for type specific hashes in C (via preprocessor fun). It seems to perform very well in both space and time. It's MIT licensed, so I don't know if that would be a problem to include in parrot.

@Whiteknight
Copy link
Contributor

This task is too open-ended and vague to be actionable. We've clearly looked through several hashing libraries and listed the results of those inquiries above and in other places. The questions to be pursued now are: Do we want to use a hashing library internally instead of our hand-rolled variants, and what exactly are the use-cases we want it for? I'm closing this ticket since the "discovery phase" is over now. We can pursue more concrete issues later as necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants