Investigate using an external hash library #244
Comments
I set this ticket up as a task for GCI 2010. I'm hoping some intrepid young student can come up with a good list of information and recommendations for us to follow. If we can't make some kind of decision after that, we need to close this old ticket.
From GCI student Atanas:
So there's that input. I'm going to reassign this RFC to cotto while we figure out if we want to move to a new hashing library and, if so, which one. (Fixing formatting -benabik)
LSHKIT is for locality-sensitive hashing, CMPH for minimal perfect hashing, and Mhash for cryptographic hash functions. That's not what we need. libghthash http://www.ipd.bth.se/ska/sim_home/libghthash.html would be an example of a useful hash table library. But I don't think we could gain much by using an external library. If we want to optimize our current implementation, I'd suggest a dead simple hash table with open addressing and linear probing. That would give us at most one dcache miss for practically all reads, provided we keep the fill factor low enough by rehashing. Such a hash table would only require 2 words per entry vs. 4 words per entry for our current chained implementation. So even if we limit the fill factor to 50%, we wouldn't need more memory than now. We'd need two additional bits per entry to mark occupied and deleted entries. For pointer keys or values, we could store these flag bits in the low bits of the pointers. For integer keys and values, we'd need an additional array for those flags.
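To make the suggestion above concrete, here is a minimal sketch of an open-addressing table with linear probing, written under several assumptions: keys and values are single machine words, the flags live in a separate byte array (the integer-key case), the table grows by doubling once occupied plus deleted slots would exceed 50%, and the hash function is an arbitrary multiplicative mix. All names (`oa_table`, `oa_put`, and so on) are hypothetical and do not come from Parrot's source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { SLOT_EMPTY = 0, SLOT_FULL = 1, SLOT_DELETED = 2 };

typedef struct { uintptr_t key, val; } oa_slot;   /* 2 words per entry */

typedef struct {
    oa_slot       *slots;
    unsigned char *state;  /* separate flag array (assumed: integer keys) */
    size_t cap;            /* always a power of two */
    size_t used;           /* FULL + DELETED slots */
    size_t live;           /* FULL slots only */
} oa_table;

/* Arbitrary multiplicative mix; any reasonable hash would do here. */
static size_t oa_hash(uintptr_t k) {
    k *= (uintptr_t)0x9E3779B97F4A7C15ull;
    return (size_t)(k ^ (k >> 15));
}

static int oa_init(oa_table *t, size_t cap) {
    assert(cap >= 2 && (cap & (cap - 1)) == 0);
    t->slots = calloc(cap, sizeof *t->slots);
    t->state = calloc(cap, 1);
    t->cap = cap; t->used = 0; t->live = 0;
    return (t->slots && t->state) ? 0 : -1;
}

static void oa_free(oa_table *t) { free(t->slots); free(t->state); }

static int oa_put(oa_table *t, uintptr_t key, uintptr_t val);

/* Rehash into a table twice the size; this also drops deleted markers. */
static int oa_grow(oa_table *t) {
    oa_table n;
    size_t i;
    if (oa_init(&n, t->cap * 2) != 0) { oa_free(&n); return -1; }
    for (i = 0; i < t->cap; i++)
        if (t->state[i] == SLOT_FULL)
            oa_put(&n, t->slots[i].key, t->slots[i].val);
    oa_free(t);
    *t = n;
    return 0;
}

static int oa_put(oa_table *t, uintptr_t key, uintptr_t val) {
    size_t mask, i, tomb = (size_t)-1;
    /* Keep the fill factor at or below 50% so probe runs stay short. */
    if ((t->used + 1) * 2 > t->cap && oa_grow(t) != 0) return -1;
    mask = t->cap - 1;
    for (i = oa_hash(key) & mask; t->state[i] != SLOT_EMPTY; i = (i + 1) & mask) {
        if (t->state[i] == SLOT_FULL && t->slots[i].key == key) {
            t->slots[i].val = val;          /* overwrite existing key */
            return 0;
        }
        if (t->state[i] == SLOT_DELETED && tomb == (size_t)-1) tomb = i;
    }
    if (tomb != (size_t)-1) i = tomb;       /* reuse first deleted slot */
    else t->used++;
    t->state[i] = SLOT_FULL;
    t->slots[i].key = key;
    t->slots[i].val = val;
    t->live++;
    return 0;
}

/* Returns 1 and stores the value in *out if found, else 0. */
static int oa_get(const oa_table *t, uintptr_t key, uintptr_t *out) {
    size_t mask = t->cap - 1, i;
    for (i = oa_hash(key) & mask; t->state[i] != SLOT_EMPTY; i = (i + 1) & mask)
        if (t->state[i] == SLOT_FULL && t->slots[i].key == key) {
            *out = t->slots[i].val;
            return 1;
        }
    return 0;
}

/* Marks the slot deleted so later probes still walk past it. */
static int oa_del(oa_table *t, uintptr_t key) {
    size_t mask = t->cap - 1, i;
    for (i = oa_hash(key) & mask; t->state[i] != SLOT_EMPTY; i = (i + 1) & mask)
        if (t->state[i] == SLOT_FULL && t->slots[i].key == key) {
            t->state[i] = SLOT_DELETED;
            t->live--;
            return 1;
        }
    return 0;
}
```

A lookup hashes once and then walks adjacent slots, which is where the cache-friendliness argument comes from. For pointer keys, the `state` array could in principle be folded into the low bits of aligned pointers as described above; this sketch does not attempt that.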
I was poking around for information on hash libraries and came across this comparison page: http://attractivechaos.wordpress.com/2008/08/28/comparison-of-hash-table-libraries/ Of particular interest to me is the "khash (C)" line in the table, which is the author's single-header hash library that allows for type-specific hashes in C (via preprocessor fun). It seems to perform very well in both space and time. It's MIT licensed, so I don't know if that would be a problem to include in Parrot.
This task is too open-ended and vague to be actionable. We've clearly looked through several hashing libraries and listed the results of those inquiries above and in other places. The questions to be pursued now are: Do we want to use a hashing library internally instead of our hand-rolled variants, and what exactly are the use-cases we want it for? I'm closing this ticket since the "discovery phase" is over now. We can pursue more concrete issues later as necessary.
We spend a lot of cycles on our hand-rolled hashing, both in terms of developer effort and CPU.
It would be nice if we could use an off-the-shelf hash package.
Any package under consideration would have to have a compatible license, and work on our minimal core platforms.
Originally http://trac.parrot.org/parrot/ticket/61