Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for alternate encodings of bytes types #656

Closed
wants to merge 2 commits into from

Conversation

jethrogb
Copy link

@jethrogb jethrogb commented Apr 23, 2020

JSON does not natively support binary data. serde_json currently just supports integer arrays and a deserialization mode for raw strings. However, protocols using JSON specify their own mechanisms to handle binary data in JSON, and it's likely not to be what serde_json currently does.

How types in the serde data model are to be encoded should be specified by the (de)serializer, not the type. Suggestions such as #360 (comment) are therefore not appropriate. By specifying the encoding that way in the type, you can't (de)serialize that type anymore with a (de)serializer that does natively support bytes (such as CBOR).

This PR introduces the concept of alternate encodings for bytes types. When creating a JSON (de)serializer, you can specify the mode you want. The default mode is still integer arrays. This PR adds base64-encoding as an option. All changes can be enabled with features and should be backwards-compatible and zero-cost for existing users. The functionality is thoroughly documented with the BytesMode type.

There are a couple of TODO items but I wanted to gather feedback before continuing the work.

TODO:

  • some kind of support for Value as Deserializer
  • tests

cc @jseyfried

#[cfg(feature = "bytes_mode")]
#[derive(Clone, Eq, PartialEq, Debug, Hash)]
#[non_exhaustive]
pub enum BytesMode {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too happy about having this in lib.rs, but no existing module seems like a good place.

@@ -76,3 +77,9 @@ raw_value = []
# overflow the stack after deserialization has completed, including, but not
# limited to, Display and Debug and Drop impls.
unbounded_depth = []

# Support alternate encoding modes for bytes. Available on Rust 1.40+
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agausmann
Copy link

agausmann commented May 10, 2020

How types in the serde data model are to be encoded should be specified by the (de)serializer, not the type. Suggestions such as #360 (comment) are therefore not appropriate. By specifying the encoding that way in the type, you can't (de)serialize that type anymore with a (de)serializer that does natively support bytes (such as CBOR).

I don't agree entirely; I haven't encountered a scenario where I have to implement a specific protocol and efficiently support multiple data formats simultaneously.

My focus when implementing a protocol (de)serializer is making it compatible with that specific protocol, and I am fine with creating custom Deserialize impls to deal with that. If I want to be able to (de)serialize some of that data in another format or send/receive it using another incompatible protocol, I will create another data model to match that other structure and convert between the two.

As an example, I've recently been implementing a game data parser for one of my projects. I wanted to pick out the relevant bits from a dump of the game data, and so I made a parser for that (raw.rs) which required some special logic on top of the existing Deserializer impl that I selected. However, those complex parsing rules and special helpers make it difficult to use as a general-purpose (de)serializable model, and I'd prefer a simpler model that is easier to use for the other modules. So, I implemented a more high-level data model (mod.rs) that is meant to be persistent/cached, and that doesn't use any special serde rules, just (de)serializing the data using whatever the selected format thinks is the best.

@jethrogb
Copy link
Author

jethrogb commented Jun 9, 2020

@dtolnay ping for review?

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I would prefer not to build this into serde_json in this way, but I opened dtolnay/request-for-implementation#48 to propose how a separate library could wrap our serializer/deserializer (or any other serde format's) to expose control over the representation of the bytes type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants