WIP: Parsing and emitting tagged values #301

Closed
wants to merge 1 commit into from
Conversation

pyfisch
Contributor

@pyfisch pyfisch commented May 1, 2016

Some serialization formats like CBOR or BSON allow the definition of "tagged values" or "subtypes" to extend the format or to add additional type information to values. They can only be supported with some help from serde.

This patch should add support to deserialize and serialize these tagged values. The default behavior will be to discard the tag and only use the value so they are completely optional to use.

For now the patch only supports serialization but I would like to get early feedback about the inclusion in serde.
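A minimal sketch of the "discard the tag by default" behavior described above. The trait and the names `serialize_tagged_u64`, `PlainText`, and `TaggedText` are hypothetical and are not serde's actual API or the code in this patch; the `(format, tag)` pairs follow the doc comment quoted below.

```rust
/// Hypothetical, heavily simplified stand-in for a serializer trait.
/// This is NOT serde's real `Serializer` trait.
pub trait Serializer {
    /// Short format name such as "cbor" (an assumption of this sketch).
    fn format(&self) -> &'static str;

    fn serialize_u64(&mut self, value: u64);

    /// Default behavior: ignore the tags and emit only the value,
    /// so formats without tags do not have to implement anything.
    fn serialize_tagged_u64(&mut self, _tags: &[(&str, u64)], value: u64) {
        self.serialize_u64(value);
    }
}

/// A format without tags relies on the default method.
pub struct PlainText(pub String);

impl Serializer for PlainText {
    fn format(&self) -> &'static str {
        "plain"
    }
    fn serialize_u64(&mut self, value: u64) {
        self.0.push_str(&value.to_string());
    }
}

/// A format with tags overrides the method and picks its own tag.
pub struct TaggedText(pub String);

impl Serializer for TaggedText {
    fn format(&self) -> &'static str {
        "tagged"
    }
    fn serialize_u64(&mut self, value: u64) {
        self.0.push_str(&value.to_string());
    }
    fn serialize_tagged_u64(&mut self, tags: &[(&str, u64)], value: u64) {
        // Linear scan over the (format, tag) pairs for this format's entry.
        let name = self.format();
        if let Some(&(_, tag)) = tags.iter().find(|&&(f, _)| f == name) {
            self.0.push_str(&format!("{}(", tag));
            self.serialize_u64(value);
            self.0.push(')');
        } else {
            // No tag supplied for this format: fall back to the bare value.
            self.serialize_u64(value);
        }
    }
}
```

With this shape a type can always call the tagged method; a serializer that does not know about tags simply writes the value.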

See also #163 and pyfisch/cbor#3

///
/// The tags are provided as a pair of format (like `cbor`) and a numeric
/// tag. It is possible to supply different tags for different formats.
/// The serialization library will select an approbiate tag if available
Member


Typo: Appropriate

@oli-obk
Member

oli-obk commented May 1, 2016

This means that types need to know about the formats they serialize to... This would break the current separation of serializers and serializees. I'm assuming you want the same type to yield different tag values depending on the serializer? Otherwise this could maybe be merged with some kind of general enum discriminant mapping. Then your tagged types would simply be an enum whose discriminant is converted to a specific type.

Another possibility might be to allow specialization to create different impls depending on the serializer, but I have not put any thought into whether that is feasible.

@pyfisch
Contributor Author

pyfisch commented May 1, 2016

The tag values are defined by the format. Some formats do not support tags at all and formats that support them will have different values for the same thing. For this reason I need to yield different tag values depending on the serializer.

In CBOR every application can define custom tags, so any solution using an enum with a fixed list of types is unusable. Currently there are tags for URLs (tag 32) and UUIDs (tag 37), for example. Such types are useful for multiple formats. But one day a user might define a tag 6877 that they use in an application to mark color names in French. The user must be able to serialize this specific tag. There is a list of IANA registered tags.
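To illustrate why a tag is an open-ended number and not a fixed list, here is a sketch of how a CBOR encoder could write the tag head (major type 6, per the CBOR specification). The function name `encode_tag_head` is made up for this example and is not taken from any existing crate.

```rust
/// Encode the head of a CBOR tag (major type 6).
/// The tagged data item itself would follow these bytes.
pub fn encode_tag_head(tag: u64) -> Vec<u8> {
    const MAJOR_TAG: u8 = 6 << 5; // 0xc0
    let mut out = Vec::new();
    if tag < 24 {
        // Small tags are stored directly in the initial byte.
        out.push(MAJOR_TAG | tag as u8);
    } else if tag <= u8::MAX as u64 {
        out.push(MAJOR_TAG | 24); // one byte follows
        out.push(tag as u8);
    } else if tag <= u16::MAX as u64 {
        out.push(MAJOR_TAG | 25); // two bytes follow, big-endian
        out.extend_from_slice(&(tag as u16).to_be_bytes());
    } else if tag <= u32::MAX as u64 {
        out.push(MAJOR_TAG | 26); // four bytes follow
        out.extend_from_slice(&(tag as u32).to_be_bytes());
    } else {
        out.push(MAJOR_TAG | 27); // eight bytes follow
        out.extend_from_slice(&tag.to_be_bytes());
    }
    out
}
```

For example, tag 32 (URL) encodes as `d8 20`, tag 37 (UUID) as `d8 25`, and the application-defined tag 6877 as `d9 1a dd`.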

I have not yet worked with specialization and don't know if or how it could be used in this case.

@dtolnay
Member

dtolnay commented May 1, 2016

Would it be possible to generalize this to formats in which tags are not u64? YAML tags are strings.

@pyfisch
Contributor Author

pyfisch commented May 4, 2016

Probably I should generalize it. But I don't know how to do that. Because if I support two different formats (u64 and String) I should also support i8 (used by BSON but can be mapped to u64) and probably other formats.

@oli-obk
Member

oli-obk commented May 4, 2016

Well... If we can all agree that aggregate types make no sense here, we have a rather short list:

enum Tag<'a> {
    U64(u64), U32(u32), U16(u16), U8(u8),
    I64(i64), I32(i32), I16(i16), I8(i8),
    String(&'a str),
    Bytes(&'a [u8]),
    F64(f64), F32(f32),
}

Then the second tuple field of the tag arguments can simply be that enum, and everyone can decide what they want to do with it.
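A rough sketch of how "everyone can decide what they want to do with it" might look on the serializer side. It uses a reduced three-variant version of the enum to stay short, and the helper names `as_numeric_tag` and `as_string_tag` are hypothetical.

```rust
/// Reduced version of the proposed `Tag` enum (only three variants here).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tag<'a> {
    U64(u64),
    I8(i8),
    String(&'a str),
}

/// A CBOR-like backend: tags are unsigned integers, so integer variants
/// are widened and everything else is rejected.
pub fn as_numeric_tag(tag: Tag<'_>) -> Option<u64> {
    match tag {
        Tag::U64(n) => Some(n),
        Tag::I8(n) if n >= 0 => Some(n as u64),
        _ => None,
    }
}

/// A YAML-like backend: tags are strings, so only the string variant fits.
pub fn as_string_tag<'a>(tag: Tag<'a>) -> Option<&'a str> {
    match tag {
        Tag::String(s) => Some(s),
        _ => None,
    }
}
```

A backend that gets `None` back could fall back to writing the untagged value, which matches the "tags are optional" default described earlier in the thread.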

@erickt
Member

erickt commented May 4, 2016

I've thought about trying to implement something like this for cbor, because I had to do that annoying hack to get tags to work. I'm not crazy about that mapping from serializer to tag though, especially since it would result in a linear scan of tag types. Have you considered other options?

I suppose one option could be a central registry of tag names that both the serializers and serializees use, but then that'd make it hard for people to create a custom type. The other option could be to just use a string, but that might make things non-portable.

However, tags are to a certain extent a non-portable construct. If I have, say, a DateTime object that I want to serialize to cbor and yaml (assuming they have custom tags), I would have to modify the impl of DateTime to support each of these backends to add in the right tag names for each serializer, so we might not actually need something super generic. Specialization might be the right approach here once it's stabilized.

@pyfisch
Contributor Author

pyfisch commented May 4, 2016

@oli-obk Also, floats make no real sense since they are inexact and for this reason are not good keys.

@erickt as I said before people will want to use custom types so this should be easy. An alternative to the linear scan could be a callback. A closure is passed to serialize_tagged_value that takes a single argument describing the used serialization format.
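The callback idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Format` enum, the `Sketch` serializer, and this particular signature of `serialize_tagged_value` are not part of serde or of this patch. The tag numbers are CBOR tag 37 and BSON binary subtype 4, both used for UUIDs.

```rust
/// Hypothetical description of the serialization format in use.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Format {
    Cbor,
    Bson,
    Json,
}

/// Hypothetical serializer that knows its own format and writes text.
pub struct Sketch {
    pub format: Format,
    pub out: String,
}

impl Sketch {
    /// The caller passes a closure instead of a list of (format, tag) pairs;
    /// the serializer calls it once with its own format, so no scan is needed.
    pub fn serialize_tagged_value<F>(&mut self, pick_tag: F, value: &str)
    where
        F: FnOnce(Format) -> Option<u64>,
    {
        match pick_tag(self.format) {
            Some(tag) => self.out.push_str(&format!("{}({})", tag, value)),
            // No tag for this format: write the bare value.
            None => self.out.push_str(value),
        }
    }
}

/// Example callback a UUID type might supply.
pub fn uuid_tag(format: Format) -> Option<u64> {
    match format {
        Format::Cbor => Some(37),
        Format::Bson => Some(4),
        Format::Json => None,
    }
}
```

The type still has to know about each format it wants to tag for, so this only replaces the linear scan; it does not restore the separation between serializers and serializees raised earlier.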
