WIP: Parsing and emitting tagged values #301
///
/// The tags are provided as a pair of format (like `cbor`) and a numeric
/// tag. It is possible to supply different tags for different formats.
/// The serialization library will select an approbiate tag if available
Typo: Appropriate
This means that types need to know about the formats they serialize to... This would break the current separation of serializers and serializees. I'm assuming you want the same type to yield different tag values depending on the serializer? Otherwise this could maybe be merged with some kind of general enum discriminant mapping. Then your tagged types would simply be an enum whose discriminant is converted to a specific type. Another possibility might be to allow specialization to create different impls depending on the serializer, but I have not put any thought into whether that is feasible.
The tag values are defined by the format. Some formats do not support tags at all, and formats that support them will have different values for the same thing. For this reason I need to yield different tag values depending on the serializer. In CBOR every application can define custom tags, so any solution using an enum with a fixed list of types is unusable. Currently there are tags for URLs (tag 32) and UUIDs (tag 37), for example. Such types are useful for multiple formats. But one day a user might define a tag 6877 that he uses in an application to mark color names in French. The user must be able to serialize this specific tag. There is a list of IANA registered tags. I have not yet worked with specialization and don't know how it could be used in this case.
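To make the "pair of format and numeric tag" idea concrete, here is a minimal sketch of how a serializer might pick its own tag out of a list supplied by the type. The function name `select_tag` and the `"other"` format entry are assumptions for illustration only and are not part of this patch; the tag numbers 32 and 37 are the CBOR values mentioned above.

```rust
/// Hypothetical helper (not from the patch): return the tag registered
/// for `format`, or `None` if the type supplied no tag for that format.
/// This is the linear scan over (format, tag) pairs.
fn select_tag(tags: &[(&str, u64)], format: &str) -> Option<u64> {
    tags.iter()
        .find(|entry| entry.0 == format)
        .map(|entry| entry.1)
}

/// Tags a URL type might advertise. Only the CBOR entry (32) comes from
/// the discussion; the "other" entry is made up to show a second format.
const URL_TAGS: [(&str, u64); 2] = [("cbor", 32), ("other", 7)];

/// Tags a UUID type might advertise (CBOR tag 37).
const UUID_TAGS: [(&str, u64); 1] = [("cbor", 37)];
```

A serializer that finds no entry for its format would simply write the untagged value.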
See also #163 and pyfisch/cbor#3
Would it be possible to generalize this to formats in which tags are not u64? YAML tags are strings.
Probably I should generalize it. But I don't know how to do that. Because if I support two different formats ( |
Well... If we can all agree that aggregate types make no sense here, we have a rather short list:
enum Tag<'a> {
    U64(u64), U32(u32), U16(u16), U8(u8),
    I64(i64), I32(i32), I16(i16), I8(i8),
    String(&'a str),
    Bytes(&'a [u8]),
    F64(f64), F32(f32),
}
Then the second tuple field of the tag arguments can simply be that enum, and everyone can decide what they want to do with it.
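As a rough illustration of "everyone can decide what they want to do with it", the sketch below uses a reduced version of the enum (only three variants, to keep it short) and two hypothetical format-side functions. `cbor_tag` and `yaml_tag` are invented names; neither exists in serde or in any CBOR/YAML crate.

```rust
/// Reduced version of the proposed enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag<'a> {
    U64(u64),
    String(&'a str),
    Bytes(&'a [u8]),
}

/// Hypothetical: a CBOR-like format accepts only unsigned integer tags
/// and ignores every other kind.
fn cbor_tag(tag: Tag<'_>) -> Option<u64> {
    match tag {
        Tag::U64(n) => Some(n),
        _ => None,
    }
}

/// Hypothetical: a YAML-like format accepts only string tags.
fn yaml_tag<'a>(tag: Tag<'a>) -> Option<&'a str> {
    match tag {
        Tag::String(s) => Some(s),
        _ => None,
    }
}
```

Each format keeps the variants it understands and treats the rest as "no tag", which matches the idea that tags are optional.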
I've thought about trying to implement something like this for cbor, because I had to do that annoying hack to get tags to work. I'm not crazy about that mapping from serializer to tag though, especially since it would result in a linear scan of tag types. Have you considered other options? I suppose one option could be a central registry of tag names that both the serializers and serializees use, but then that'd make it hard for people to create a custom type. The other option could be to just use a string, but that might make things non-portable. However, tags are to a certain extent a non-portable construct. If I have, say, a
@oli-obk Also, floats make no real sense since they are inexact and for this reason make poor keys. @erickt As I said before, people will want to use custom types, so this should be easy. An alternative to the linear scan could be a callback. A closure is passed to
Some serialization formats like CBOR or BSON allow the definition of "tagged values" or "subtypes" to extend the format or to add additional type information to values. They can only be supported with some help from serde.
This patch should add support to deserialize and serialize these tagged values. The default behavior will be to discard the tag and only use the value so they are completely optional to use.
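One way to get the "discard the tag by default" behavior is a trait method with a default body, sketched below. This is a toy trait written for illustration, under the assumption that tags are plain `u64` values; `ToySerializer`, `Plain`, and `Tagging` are hypothetical names and do not reflect the actual trait or method signatures in this patch.

```rust
/// Toy stand-in for a serializer trait (not serde's real trait).
trait ToySerializer {
    fn serialize_u64(&mut self, value: u64);

    /// Default behavior: drop the tag and serialize only the value,
    /// so formats without tag support need no extra code.
    fn serialize_tagged_u64(&mut self, _tag: u64, value: u64) {
        self.serialize_u64(value)
    }
}

/// A format without tags: relies on the default method.
struct Plain(Vec<String>);

impl ToySerializer for Plain {
    fn serialize_u64(&mut self, value: u64) {
        self.0.push(value.to_string());
    }
}

/// A format with tags: overrides the default to emit `tag(value)`.
struct Tagging(Vec<String>);

impl ToySerializer for Tagging {
    fn serialize_u64(&mut self, value: u64) {
        self.0.push(value.to_string());
    }

    fn serialize_tagged_u64(&mut self, tag: u64, value: u64) {
        self.0.push(format!("{}({})", tag, value));
    }
}
```

With this shape, existing serializers keep compiling unchanged and only tag-aware formats opt in by overriding the method.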
For now the patch only supports serialization but I would like to get early feedback about the inclusion in serde.
See also #163 and pyfisch/cbor#3