Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better instance Hashable IntSet? #269

Open
jwaldmann opened this issue Jul 17, 2023 · 5 comments
Open

better instance Hashable IntSet? #269

jwaldmann opened this issue Jul 17, 2023 · 5 comments

Comments

@jwaldmann
Copy link

I have an application that uses HashTable IntSet _ (see https://gitlab.imn.htwk-leipzig.de/waldmann/fdaln/-/blob/master/DFA.hs for benchmark - concluding that HashTable is a good choice, and actual use case https://gitlab.imn.htwk-leipzig.de/waldmann/presburger-arithmetic )

I find that it spends quite some time in Data.HashMap.Internal.hash, evaluating (what I assume to be inlined)

instance Hashable IntSet.IntSet where
    hashWithSalt salt x = IntSet.foldl' hashWithSalt
        (hashWithSalt salt (IntSet.size x))
        x

Can we do better here? If the IntSet is small, all the information is in (very few, often just one) Word64. The above implementation will unpack those - via Data.IntSet.Internals.fold'Bits which (I guess) is not fully inlined (it has a recursive go function. In theory, that could be unrolled, then fused ...)

Well, such an instance must then go into containers (not hashable) since it accesses the internal representation?

I may find some time (or a student) to work on this but I wanted to get some general comment and input first.

@phadej
Copy link
Contributor

phadej commented Jul 17, 2023

You are correct, the current implementation doesn't dig into internals of IntSet on purpose.

It's probably possible to be more efficient if IntSet structure is used, after all hash @IntSet xs doesn't need to be the same as hash (size xs, toList xs) (which current code computes), as long as it the value respects the equality, includes size (structure bits, to prevent issues like #74) and a digest of elements.

But such hashIntSet, which uses internals, should live in containers indeed.

@treeowl
Copy link

treeowl commented Jul 17, 2023

Yes, we definitely should do something custom. Where it belongs is going to be awkward. There are two good options I see.

  1. Split containers-internals from containers (which I want to do anyway) and then have hashable depend on the former. I like this best.
  2. Expose some sort of non-hashable-specific hashing function from containers. Don't like.

Things have been really crazy for me the last couple weeks, so I can't guarantee I'll be able to get (1) done as soon as you might like, but I'll try.

@phadej
Copy link
Contributor

phadej commented Jul 17, 2023

Split containers-internals from containers (which I want to do anyway) and then have hashable depend on the former. I like this best.

This is not much better than having hashable depend on containers internals as now.

containers-internals would be a properly versioned package so that is a good thing (but I think you can just make containers Internals module be versioned - when you really changed something there?)

On the other hand https://hackage.haskell.org/package/containers-0.6.7/docs/Data-IntSet-Internal.html#t:IntSet is completely undocumented. I have no idea how to I would start, so I repeat: having versioned internals won't do it alone, unless they are properly documented so they can be actually used by non-containers authors.

There is precedence for "Expose some sort of non-hashable-specific hashing function" however. For example https://hackage.haskell.org/package/base-4.18.0.0/docs/Data-Typeable.html#v:typeRepFingerprint makes it possible to implement Hashable TypeRep without knowing anything about (somewhat complicated) TypeRep internal structure.

@treeowl
Copy link

treeowl commented Jul 18, 2023

Splitting containers would include removing those warning notes. We could follow Typeable and produce a 128-bit fingerprint from which to generate salted hashes, but that seems like overkill.

@phadej
Copy link
Contributor

phadej commented Jul 18, 2023

Typeable's Fingeprint was just an example. (They produce fingerprints not only for hashing but also to implement equality - and the fingerprints are cached.) The point was that fingerprint is there, so Hashable can be written too.

There won't be internal-using Hashable IntSet in hashable until the internals

  • Are versioned
  • Are properly documented

The two above is more work then just adding hashIntSet to containers. But it's your choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants