Currently, we use the MurmurHash3 hash function from the rust-fasthash crate (to stay close to the scikit-learn implementation). That crate also supports a number of other hash functions:
- City Hash
- Farm Hash
- Metro Hash
- Mum Hash
- Sea Hash
- Spooky Hash
- T1 Hash
- xx Hash
I'm not convinced hashing is currently the performance bottleneck, but in any case using a faster hash function such as xxhash would not hurt.
This would involve updating the text-vectorize crate and adding a `hasher` parameter to the HashingVectorizer Python estimator.

Another use case could be to use different hash functions to reduce the effect of collisions (Svenstrup et al. 2017), discussed e.g. in https://stackoverflow.com/q/53767469/1791279
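To make the proposal a bit more concrete, here is a minimal Python sketch of what a pluggable `hasher` parameter could look like. It is not the text-vectorize or scikit-learn API: `HASHERS`, `hash_features` and `token_signature` are hypothetical names, and the standard-library hashes (CRC32, Adler-32, BLAKE2b) only stand in for the fasthash functions listed above.

```python
import hashlib
import zlib
from collections import Counter

# Hypothetical registry mapping a `hasher` name to a bytes -> int function.
# Stdlib hashes are placeholders for MurmurHash3, xxHash, etc.
HASHERS = {
    "crc32": zlib.crc32,
    "adler32": zlib.adler32,
    "blake2b": lambda b: int.from_bytes(
        hashlib.blake2b(b, digest_size=4).digest(), "little"
    ),
}


def hash_features(tokens, n_features=16, hasher="crc32"):
    """Hashing trick: map tokens to a sparse {column: count} dict."""
    if hasher not in HASHERS:
        raise ValueError(f"unknown hasher: {hasher!r}")
    hash_fn = HASHERS[hasher]
    counts = Counter()
    for token in tokens:
        # The column index is the hash value modulo the feature space size.
        counts[hash_fn(token.encode("utf-8")) % n_features] += 1
    return dict(counts)


def token_signature(token, hashers, n_features=16):
    """Column index of one token under several hash functions.

    Two tokens share the whole signature only if they collide under
    every hash function, which is the intuition behind combining them.
    """
    data = token.encode("utf-8")
    return tuple(HASHERS[name](data) % n_features for name in hashers)
```

For example, `hash_features(["the", "quick", "the"], hasher="blake2b")` would count tokens per hashed column, and `token_signature("fox", ["crc32", "blake2b"])` gives one column per hash function. This only illustrates the general idea of using several hash functions; it is not the method from the paper referenced above.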