-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: make sure that structs are serialized correctly #34
Conversation
Rust structs have their entries ordered the same way as they are defined. When serialized to CBOR, they become maps by default. In DAG-CBOR the maps need to have a specific order, hence the keys might need to be re-ordered, so that they are independent of the order they were defined. This commit makes sure that it's actually the case. Fixes #31.
// comparison gives us the right order as keys in DAG-CBOR are always (text) strings, hence | ||
// have the same CBOR major type 3. The length of the string is encoded in the following | ||
// bits. This means that a shorter string sorts before a longer string. | ||
self.entries.sort_unstable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is enough. We're stuck with the 7049 rules of length-first sorting. https://ipld.io/specs/codecs/dag-cbor/spec/#strictness
A good extension to your tests would be to add a new field to the struct "a1" which should come after "b".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out and proposing better tests. Though I think it'ss correct. In a previous PR Steb asked me to clarify this a bit in a comment, but it still doesn't seem to be clear. When you've an idea to word it in a better way, please let me know.
As CBOR prefixes the strings with the length, shorter strings are sorted first. I even think this is the reason they came up with this (counter-intuitive) ordering. So that you can take the full encoded value (not just the string itself) and memcmp it for sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Importantly, even though the length itself is variable length, the length itself is either:
- 0-23: inlined into the 5-bit additional data field.
- 24-27: 1-8 byte length.
Which means that lexicographical sorting actually works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, I missed that this was sorting over encoded values; and yeah, neat that it actually works given the flexible prefix!
@@ -65,8 +65,8 @@ impl<'a, W: enc::Write> serde::Serializer for &'a mut Serializer<W> { | |||
type SerializeTupleStruct = BoundedCollect<'a, W>; | |||
type SerializeTupleVariant = BoundedCollect<'a, W>; | |||
type SerializeMap = CollectMap<'a, W>; | |||
type SerializeStruct = BoundedCollect<'a, W>; | |||
type SerializeStructVariant = BoundedCollect<'a, W>; | |||
type SerializeStruct = CollectMap<'a, W>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you educate me on BoundedCollet vs CollectMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names might not be great, but the idea now is that BoundedCollect
does take the input in the same order as it comes in without further checks, the CollectMap
does the whole sorting thing.
Those structs then implement specific traits so that they can be used with those type definitions.
d685c6c
to
a2959cb
Compare
Co-authored-by: Rod Vagg <rod@vagg.org>
Rust structs have their entries ordered the same way as they are defined. When serialized to CBOR, they become maps by default. In
DAG-CBOR the maps need to have a specific order, hence the keys
might need to be re-ordered, so that they are independent of the order they were defined. This commit makes sure that it's actually the case.
Fixes #31.