Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider sxhash as a generic function? #173

Open
mseddon opened this issue Aug 30, 2020 · 5 comments
Open

Consider sxhash as a generic function? #173

mseddon opened this issue Aug 30, 2020 · 5 comments

Comments

@mseddon
Copy link

mseddon commented Aug 30, 2020

In many situations, a domain object can be more efficiently hashed in some context (e.g. it has a giant blob of irrelevant cache data that a naive hashing algorithm would have to traverse, or there is a particular restriction that we, the designers, understand).

Should sxhash be a generic function, defaulting to the current implementation, but extendable for users of SICL based compilers?

Care must be taken with subclasses, however, so perhaps it should only apply to a particular class via a mechanism outside of generic functions, or compose the default fallback sxhash onto subclass members not implementing sxhash.

It must also always be true that when (equal x y) then (= (sxhash x) (sxhash y)), of course.

@mseddon
Copy link
Author

mseddon commented Aug 30, 2020

Hm, coming back, I worry, since following this to it's natural conclusion, now (equal x y) would also need to be a generic function. which, to me seems right; if we lived in an ideal world, but at the same time sort of hoses the entire semantics of CL... closing because the consequences are probably far hairier than I can imagine.

With enough time there may yet be gold in them hills though. But at this point it's probably not really CL.

@mseddon mseddon closed this as completed Aug 30, 2020
@Bike
Copy link
Collaborator

Bike commented Aug 30, 2020

sxhash generally does different things depending on the type of object it's given anyway, so making it generic seems reasonable to me. if a client wants to use its own implementation it can just do that, though, it's not like cleavir depends on sxhash deets

@no-defun-allowed
Copy link
Contributor

no-defun-allowed commented Aug 30, 2020

In the case of SICL's current SXHASH implementation, SXHASH is implemented as the value of an internal function which passes a hash state through limited depth-first traversal of an object. Here, SXHASH doesn't recursively call itself, and the protocol is a bit more complicated, as the implementor must combine whatever entropy-producing values they hash using the internal hash function (currently FNV-1a).

Hash tables are also a little more complicated; we use a random initial state for the hash function, so that different sets of keys will be stored differently, preventing complexity attacks. (lwn.net's Denial of service via hash collisions describes why we want this pretty well.) That also precludes using SXHASH there, and we use the aforementioned internal functions in hash tables as well.

We could make the internal hash functions generic, and either have the client provide data for the hash function to combine, or perform the combination themselves; but that probably would not manifest in specializing SXHASH itself. It might look more like the murmurhash generic function in cl-murmurhash.

Another less invasive option would be to give a specialised :hash-function to hash tables that have domain objects that should be hashed specially.

@mseddon mseddon reopened this Aug 31, 2020
@mseddon
Copy link
Author

mseddon commented Sep 8, 2020

I actually realise after some thought that sxhash as a gf could lead to all manner of subtle bugs, and the hash algorithm MUST be associated with the table. One could, for example, destroy other system hashtables by innocently adding a definition of sxhash for a particular type after it had already been inserted into another hashtable. That or you would have to re-hash all tables every time someone adds a definition to sxhash, at least. Maybe timestamp the latest change, and lazily rehash when needed.

@no-defun-allowed
Copy link
Contributor

Lazy rehashing would increase the minimum complexity of a hash table implementation quite a lot; providing a :hash-function would probably be the easiest and least dangerous solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants