bincode doesn't generate stable serializations of HashMap #230
Yeah, this is mostly a limitation of Rust's HashMap: iteration order is not guaranteed. Have you tried using a HashMap with a hasher other than the default? If that doesn't work, I'm afraid your only other option is to pull the items out into a Vec, sort them, and then serialize the Vec.
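A minimal sketch of the sort-then-serialize workaround, using only the standard library. `encode_entries` is a hypothetical stand-in for bincode (a simple length-prefixed little-endian layout, not bincode's actual wire format), and `serialize_sorted` is likewise a name made up for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for bincode: a u64 little-endian entry count, then for
// each entry a length-prefixed UTF-8 key followed by a little-endian u32 value.
// This is NOT bincode's real wire format; it only illustrates the ordering fix.
fn encode_entries(entries: &[(&String, &u32)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(entries.len() as u64).to_le_bytes());
    for (key, value) in entries {
        out.extend_from_slice(&(key.len() as u64).to_le_bytes());
        out.extend_from_slice(key.as_bytes());
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}

// Pull the entries out into a Vec, sort them by key, then serialize the Vec.
fn serialize_sorted(map: &HashMap<String, u32>) -> Vec<u8> {
    let mut entries: Vec<(&String, &u32)> = map.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));
    encode_entries(&entries)
}

fn main() {
    let mut a: HashMap<String, u32> = HashMap::new();
    let mut b: HashMap<String, u32> = HashMap::new();
    for (k, v) in [("x", 1), ("y", 2), ("z", 3)] {
        a.insert(k.to_string(), v);
    }
    for (k, v) in [("z", 3), ("x", 1), ("y", 2)] {
        b.insert(k.to_string(), v);
    }
    // Equal maps now produce identical bytes, whatever their iteration order.
    assert_eq!(serialize_sorted(&a), serialize_sorted(&b));
}
```

The cost is an O(n log n) sort on every serialization; the map itself is untouched.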
For now I've worked around this by using the indexmap crate. Wouldn't it be possible to do a
That would have to be done in serde. Have you tried using other hashers? Is using BTreeMap an option? BTreeMap iterates in sorted key order, so its ordering is guaranteed.
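A quick std-only check of the BTreeMap property: two maps built from the same pairs in different insertion orders iterate identically, in ascending key order. `keys_in_order` is a hypothetical helper for the demonstration.

```rust
use std::collections::BTreeMap;

// BTreeMap iterates in ascending key order, regardless of insertion order.
fn keys_in_order(pairs: &[(&'static str, i32)]) -> Vec<&'static str> {
    let map: BTreeMap<&str, i32> = pairs.iter().copied().collect();
    map.keys().copied().collect()
}

fn main() {
    let forward = keys_in_order(&[("a", 1), ("b", 2), ("c", 3)]);
    let shuffled = keys_in_order(&[("c", 3), ("a", 1), ("b", 2)]);
    assert_eq!(forward, shuffled);
    println!("{:?}", forward); // prints ["a", "b", "c"]
}
```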
I'm going to close this issue because I don't think there's anything that can be done on the bincode side. But feel free to keep using this issue for discussion; I'm interested in seeing your use case succeed.
I solved this for now with BTreeMap indeed. This is not ideal, because every insert and lookup then pays the O(log n) cost of keeping the keys sorted, when I only really care about ordering for serialization. Another solution would be to provide my own serialization implementation, but that would still impose a sorting penalty on serialization. An even better solution may be to figure out how to seed the
Thanks for the help!
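One way to sketch "sort only at serialization time": keep the HashMap for everyday operations and build a sorted, borrowed BTreeMap view only when serializing. With serde this idea is commonly expressed as a `#[serde(serialize_with = "...")]` function that collects into a `BTreeMap`; the std-only version below avoids external crates, and `StableMap`, `sorted_view`, and the `key=value;` text format are all hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical wrapper: normal operations go through the HashMap (O(1) average),
// and ordering is imposed only when producing the serialized form.
struct StableMap(HashMap<String, u64>);

impl StableMap {
    // Sorted view of borrowed entries; this O(n log n) step is the
    // serialization-time penalty, and no keys or values are cloned.
    fn sorted_view(&self) -> BTreeMap<&String, &u64> {
        self.0.iter().collect()
    }

    // Hypothetical encoding: "key=value;" pairs in ascending key order.
    fn serialize(&self) -> String {
        self.sorted_view()
            .into_iter()
            .map(|(k, v)| format!("{}={};", k, v))
            .collect()
    }
}

fn main() {
    let mut m = StableMap(HashMap::new());
    m.0.insert("b".to_string(), 2);
    m.0.insert("a".to_string(), 1);
    println!("{}", m.serialize()); // prints a=1;b=2;
}
```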
Have you considered linked-hash-map? It preserves insertion order (it is "sorted" by order of insertion) and should have minimal runtime overhead.
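To show why an insertion-ordered map gives a seed-independent iteration order, here is a minimal std-only sketch. It stores entries in a Vec plus a HashMap from key to position, which is roughly how indexmap works; linked-hash-map instead threads a doubly linked list through its entries. `InsertionOrderedMap` is a hypothetical type, and removal is omitted for brevity.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Entries live in a Vec in insertion order; the HashMap maps each key to its
// position in that Vec, so lookups stay O(1) on average.
struct InsertionOrderedMap<K, V> {
    index: HashMap<K, usize>,
    entries: Vec<(K, V)>,
}

impl<K: Hash + Eq + Clone, V> InsertionOrderedMap<K, V> {
    fn new() -> Self {
        Self { index: HashMap::new(), entries: Vec::new() }
    }

    // Overwriting an existing key keeps its original position.
    fn insert(&mut self, key: K, value: V) {
        match self.index.get(&key) {
            Some(&i) => self.entries[i].1 = value,
            None => {
                self.index.insert(key.clone(), self.entries.len());
                self.entries.push((key, value));
            }
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.index.get(key).map(|&i| &self.entries[i].1)
    }

    // Iteration follows the Vec, so it does not depend on the hasher's seed.
    fn iter(&self) -> std::slice::Iter<'_, (K, V)> {
        self.entries.iter()
    }
}

fn main() {
    let mut m = InsertionOrderedMap::new();
    m.insert("zeta", 1);
    m.insert("alpha", 2);
    m.insert("zeta", 3);
    let keys: Vec<&str> = m.iter().map(|(k, _)| *k).collect();
    println!("{:?}", keys); // prints ["zeta", "alpha"]
}
```

One design consequence: two maps with equal contents but different insertion histories still iterate (and therefore serialize) differently, so this fixes run-to-run determinism but not "equal values give equal bytes".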
I used
Yeah, if you need this behavior, I think sorting (or using a container that keeps its keys sorted) is the only way to make this work.
The ideal solution is to make
One last thought: have you tried using https://doc.servo.org/fnv/struct.FnvHasher.html? I read through the source, and it doesn't randomize its seed.
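To make "no randomization of the seed" concrete, here is a minimal std-only sketch of a 64-bit FNV-1a `Hasher` plugged into `HashMap` through `BuildHasherDefault`. `Fnv1a`, `FnvMap`, and `fnv1a_of` are names made up for this example; this is not the fnv crate's source. As later comments in this thread point out, a fixed hasher makes hashes reproducible but does not by itself guarantee iteration order.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// 64-bit FNV-1a. Every hasher starts from the fixed FNV offset basis, so no
// per-process random seed is involved. (Illustrative only; the fnv crate
// linked above provides a ready-made FnvHasher.)
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// A HashMap whose hasher is deterministic across runs.
type FnvMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn fnv1a_of(bytes: &[u8]) -> u64 {
    let mut h = Fnv1a::default();
    h.write(bytes);
    h.finish()
}

fn main() {
    let mut m: FnvMap<&str, i32> = FnvMap::default();
    m.insert("x", 1);
    println!("{:x}", fnv1a_of(b"a")); // prints af63dc4c8601ec8c
}
```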
The problem with the default hasher isn't that it's seeded; it's that I can't just provide the seed myself, the way you can with https://docs.rs/metrohash/1.0.5/metrohash/struct.MetroHash64.html#method.with_seed
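In the spirit of `with_seed`, here is a std-only sketch of a `BuildHasher` that takes a caller-chosen seed. `FixedSeed` and `hash_with` are hypothetical names, and mixing the seed in by writing it into a `DefaultHasher::new()` before the key is an illustrative choice, not how metrohash works. The standard library documents that `DefaultHasher`'s algorithm may change between releases, so these hashes are reproducible within one toolchain but not guaranteed across Rust versions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

// Hypothetical BuildHasher with a caller-supplied seed. DefaultHasher::new()
// uses fixed keys (unlike RandomState), and the seed is hashed in first.
#[derive(Clone, Copy)]
struct FixedSeed {
    seed: u64,
}

impl BuildHasher for FixedSeed {
    type Hasher = DefaultHasher;

    fn build_hasher(&self) -> DefaultHasher {
        let mut h = DefaultHasher::new();
        h.write_u64(self.seed);
        h
    }
}

// Hash a single value the same way a HashMap using this BuildHasher would.
fn hash_with<T: Hash>(state: &FixedSeed, value: &T) -> u64 {
    let mut h = state.build_hasher();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    let state = FixedSeed { seed: 42 };
    let mut map: HashMap<&str, i32, FixedSeed> = HashMap::with_hasher(state);
    map.insert("key", 1);
    // Same seed, same hash, in this process or any other built with this toolchain.
    assert_eq!(hash_with(&state, &"key"), hash_with(&FixedSeed { seed: 42 }, &"key"));
    println!("{:?}", map.get(&"key"));
}
```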
That looks like it might work! However, I'm not sure the standard library guarantees iteration order even when the seed is constant. I think Rust uses Robin Hood hashing, and from what I remember of that algorithm, it moves existing entries around on insertion, so where an item ends up in the table depends on what was inserted before it.
That might indeed be a problem, and an even trickier one to detect. In most cases, tables of the same size will iterate in the same order. But if one run does a lot of inserts and removes and another doesn't, one table may get resized while the other doesn't, and then the iteration orders differ. So you're right, it's probably not a good solution; good catch! Just sorting the keys is probably the better idea, then. The slowdown is probably not even large enough to care about.
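A minimal sketch of the resize effect, assuming a deterministic hasher (std's `DefaultHasher` via `BuildHasherDefault`) so that random seeding is ruled out. `DetMap`, `build`, and `sorted_keys` are hypothetical names. Since Rust 1.36 std's HashMap has used a SwissTable-based implementation rather than Robin Hood hashing, but it likewise does not promise any iteration order. Whether the two raw orders actually differ is implementation-dependent, so the sketch only prints that comparison and asserts on the sorted keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Deterministic (unseeded) hasher: any difference comes from table history.
type DetMap = HashMap<u32, u32, BuildHasherDefault<DefaultHasher>>;

fn build(extra_churn: bool) -> DetMap {
    let mut m = DetMap::default();
    if extra_churn {
        // Grow the table with temporary keys, then remove them again.
        // HashMap::remove does not shrink the capacity.
        for k in 1000..2000 {
            m.insert(k, 0);
        }
        for k in 1000..2000 {
            m.remove(&k);
        }
    }
    for k in 0..10 {
        m.insert(k, k * k);
    }
    m
}

fn sorted_keys(m: &DetMap) -> Vec<u32> {
    let mut keys: Vec<u32> = m.keys().copied().collect();
    keys.sort_unstable();
    keys
}

fn main() {
    let quiet = build(false);
    let churned = build(true);
    assert_eq!(quiet, churned); // same contents
    let a: Vec<u32> = quiet.keys().copied().collect();
    let b: Vec<u32> = churned.keys().copied().collect();
    // Raw iteration order is not guaranteed to match.
    println!("raw orders equal: {}", a == b);
    // Sorted order always matches.
    assert_eq!(sorted_keys(&quiet), sorted_keys(&churned));
}
```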
When serializing a struct with a HashMap, it's easy to end up with two values that are equal to each other but serialize to different output. An example that fails sometimes:
This is most likely because Rust seeds the hash with different values on different runs, and bincode serializes in hash order rather than key order. This makes it hard to use bincode to hash a struct for deduplication or content-addressing purposes.
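For the content-addressing use case, a minimal std-only sketch: hash the map's entries in key order with a fixed-key hasher, so equal maps get equal hashes even though each `HashMap::new()` gets its own randomly keyed `RandomState`. `content_hash` is a hypothetical helper. Because `DefaultHasher`'s algorithm may change between Rust releases, persisted content addresses would need a hash function with a fixed, documented algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical helper: hash a map's contents in key order, so the result does
// not depend on the map's (randomly seeded) iteration order.
fn content_hash<K: Hash + Ord, V: Hash>(map: &HashMap<K, V>) -> u64 {
    let mut entries: Vec<(&K, &V)> = map.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));
    let mut h = DefaultHasher::new(); // fixed keys, unlike RandomState
    entries.len().hash(&mut h);
    for (k, v) in entries {
        k.hash(&mut h);
        v.hash(&mut h);
    }
    h.finish()
}

fn main() {
    let mut a = HashMap::new();
    let mut b = HashMap::new();
    for i in 0..100u32 {
        a.insert(i, i.to_string());
    }
    for i in (0..100u32).rev() {
        b.insert(i, i.to_string());
    }
    assert_eq!(a, b);
    assert_eq!(content_hash(&a), content_hash(&b));
    println!("{:x}", content_hash(&a));
}
```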