The SHA1 hash of maps should not depend on the key ordering #362
Comments
Thanks for reporting this @asciiansi - I hope you find the fix satisfactory. We will be releasing imminently.
As mentioned in the docs, maps are allowed as valid `#crux/id {:type :person :source_id 42 :key "age"}` instead of uuid, or string like
Yep, that's correct - we convert entity ids, attributes and values to content hashes early on and then use those content hashes throughout the query engine.
Do you use SHA1 hashes and, if so, do you have any collision handling?
Yes - we currently use SHA1 without collision handling, on the assumption that collisions are sufficiently unlikely.
Since xtdb#362 we sort the collections before we freeze them, to get a deterministic hash. Unfortunately, this doesn't work if the collections aren't `Comparable`. So, if we can't sort the collections, we sort by the hashes of the elements, which is deterministic. Much as we'd like to apply this across the board, we also take care not to change the hash of any data structure that was correctly hashed previously.
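The fallback described above (sort the elements when they can be compared, otherwise sort by the hashes of the elements) can be sketched roughly as follows. This is a language-agnostic illustration in Python, not the project's actual Clojure code: `sha1_hex`, `element_hash` and `unordered_hash` are hypothetical names, and `repr` stands in for the real freezing/serialization step.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """SHA-1 of a string, as a 40-character hex digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def element_hash(element) -> str:
    # Hypothetical per-element hash: tag with the type name so that
    # 1 and "1" do not collide. repr() is a stand-in for serialization.
    return sha1_hex(f"{type(element).__name__}:{element!r}")


def unordered_hash(elements) -> str:
    """Hash a collection so that iteration order does not matter."""
    items = list(elements)
    try:
        # Preferred path: elements are mutually comparable, so sort
        # them directly and hash them in that order.
        hashes = [element_hash(e) for e in sorted(items)]
    except TypeError:
        # Fallback path: elements cannot be compared (e.g. mixed types),
        # so sort by the element hashes instead.
        hashes = sorted(element_hash(e) for e in items)
    return sha1_hex("|".join(hashes))
```

Keeping the directly-sorted path as the first choice mirrors the stated goal of not changing the hash of anything that was already hashed correctly; only collections that could not be sorted before take the fallback.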
(original question below)
Currently, the content/value hashes for a map depend on the key ordering - with different orders (possibly arising from different concrete map implementations), we generate different content/value hashes for the 'same' map, which means:
(old proposed solution, for posterity, see #474 for what I ended up doing):
(original question)
Is it intended that map keys for a `crux.db/id` always must be in the same order, despite having the same content/hash? Querying for an entity using `(crux/entity db {:a 1 :b 1})` is not the same as `(crux/entity db {:b 1 :a 1})`, for example. The query only seems to work for me using the same key order as the transacted entity.

I can semi-understand why this is the case, but nonetheless it's a bit surprising for keys and led to some brief debugging. Specifically, it seems a bit strange as maps are by definition unordered and are often manipulated programmatically.
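To make the surprise concrete, here is a minimal sketch of how an order-dependent content hash arises and how canonicalizing the key order avoids it. It is written in Python purely for illustration and assumes a simple `repr`-based serialization; `naive_hash` and `canonical_hash` are hypothetical names, not Crux functions.

```python
import hashlib


def naive_hash(mapping: dict) -> str:
    # Serializes entries in iteration (insertion) order, so two equal
    # maps built in different orders produce different bytes.
    payload = repr(list(mapping.items()))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def canonical_hash(mapping: dict) -> str:
    # Sorts entries by key before serializing, so the bytes (and hence
    # the hash) no longer depend on construction order.
    payload = repr(sorted(mapping.items()))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


a_first = {"a": 1, "b": 1}
b_first = {"b": 1, "a": 1}

# The two maps are equal as values...
same_value = a_first == b_first
# ...but only the canonical hash agrees for both.
naive_matches = naive_hash(a_first) == naive_hash(b_first)
canonical_matches = canonical_hash(a_first) == canonical_hash(b_first)
```

Here `same_value` and `canonical_matches` are true while `naive_matches` is false, which is the same shape of mismatch as looking up `{:b 1 :a 1}` after transacting `{:a 1 :b 1}`.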
Moreover, it's typical in a larger system when dealing with dynamically constructed keys to make use of `assoc`, `update`, etc., which becomes very dangerous and instead requires construction of a new map, creating even more garbage. I currently work around this simply by using a function every time I create keys dynamically, but there are a few cases where this is a bit annoying.

I find maps in general useful as keys since they allow a composite and additionally remove some overhead for reaching into an entity via a query using `crux/q`. I would prefer other options for keys and would love to discuss further if you are willing, along with use-cases and reasoning. For now, I'm mostly using maps to work around dissatisfaction with other key types which aren't really suitable for my apps.

For context, I am using 100% RocksDB on both ends (though I prefer/will eventually switch to LMDB). Here are my deps:
At the very least, if it's not considered a bug, this should be documented, given that I found it surprising. Thanks.