Serialize to binary if the serde format is not human readable #1044

Marwes · 2017-09-07T14:16:40Z

This implements the KISS approach suggested in #790.
It is possible that one of the other approaches may be better, but this
seemed like the simplest one to reignite some discussion.

Personally I find the original suggestion of adding two traits perhaps slightly
cleaner in theory, but I think it ends up more complicated in the end
since the added traits also need to be duplicated for the Seed
traits.

Example usage in https://github.com/rust-lang-nursery/uuid: Marwes/uuid@f908fd3

Closes #790
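
As a rough illustration of the idea in this PR, a type can branch on the serializer's readability flag and pick a textual or a compact representation. The sketch below uses a deliberately tiny stand-in trait (`MiniSerializer`, with hypothetical `TextFormat` and `BinaryFormat` implementations), not serde's real `Serializer`; only the method name `is_human_readable` comes from the PR.

```rust
use std::net::Ipv4Addr;

/// Hypothetical, much-reduced stand-in for a serde-style serializer.
trait MiniSerializer {
    /// Defaults to `true`; formats not meant to be read by people override it.
    fn is_human_readable(&self) -> bool {
        true
    }
    fn write_str(&mut self, s: &str);
    fn write_bytes(&mut self, b: &[u8]);
    fn into_output(self) -> Vec<u8>;
}

#[derive(Default)]
struct TextFormat(Vec<u8>);

#[derive(Default)]
struct BinaryFormat(Vec<u8>);

impl MiniSerializer for TextFormat {
    fn write_str(&mut self, s: &str) {
        self.0.extend_from_slice(s.as_bytes());
    }
    fn write_bytes(&mut self, b: &[u8]) {
        // A text format has to render raw bytes somehow; here as a debug list.
        self.write_str(&format!("{:?}", b));
    }
    fn into_output(self) -> Vec<u8> {
        self.0
    }
}

impl MiniSerializer for BinaryFormat {
    fn is_human_readable(&self) -> bool {
        false
    }
    fn write_str(&mut self, s: &str) {
        self.0.extend_from_slice(s.as_bytes());
    }
    fn write_bytes(&mut self, b: &[u8]) {
        self.0.extend_from_slice(b);
    }
    fn into_output(self) -> Vec<u8> {
        self.0
    }
}

/// Writes "a.b.c.d" for readable formats and four raw octets otherwise.
fn serialize_ip<S: MiniSerializer>(addr: Ipv4Addr, mut s: S) -> Vec<u8> {
    if s.is_human_readable() {
        s.write_str(&addr.to_string());
    } else {
        s.write_bytes(&addr.octets());
    }
    s.into_output()
}
```

With these stand-ins, the same address takes nine bytes in the text format and four in the binary one.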

Marwes · 2017-09-07T14:18:38Z

FYI: This is not fully fleshed out in method names and documentation; I will fix that before merging.

dtolnay

I am still on board with this approach. Thanks for getting things moving!

Could you brainstorm a few advantages and disadvantages of using &self vs no &self?

Marwes · 2017-09-07T17:45:41Z

With &self it becomes possible for [de]serializers to choose at runtime whether to be human readable or not. This is probably not a common occurrence though, so not a big deal either way.

The erased-serde crate will need to take &self so taking &self here would be consistent.

Downside is that one needs an instance to check if the format is human readable. Seems to me that would always be the case? Maybe in the future with const functions it would be useful to query is_human_readable at compile time? If so, maybe it should simply be an associated constant?

&self is in theory some extra overhead but I can't imagine LLVM struggling with that!
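
To make the `&self` trade-off concrete, here is a minimal sketch. The trait `Format` and the types `Configurable` and `AlwaysText` are hypothetical; only the `is_human_readable(&self)` signature reflects what is being discussed.

```rust
/// Hypothetical stand-in trait; only `is_human_readable` mirrors the PR.
trait Format {
    fn is_human_readable(&self) -> bool {
        true
    }
}

/// A format whose readability is decided by a runtime option.
struct Configurable {
    compact: bool,
}

impl Format for Configurable {
    fn is_human_readable(&self) -> bool {
        !self.compact
    }
}

/// A format that simply keeps the default.
struct AlwaysText;

impl Format for AlwaysText {}

/// Because the method takes `&self`, it can be called through a trait
/// object, which is the situation a type-erased serializer is in.
fn describe(format: &dyn Format) -> &'static str {
    if format.is_human_readable() {
        "text"
    } else {
        "binary"
    }
}
```

An associated constant would allow compile-time queries, but a trait with an associated constant cannot be used as `dyn Format`, so the type-erased case above would no longer work in this form.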

dtolnay · 2017-09-07T19:29:47Z

Makes sense to me. The erased-serde case is a great point, I had not thought of how that would work.

What would you recommend for rolling this out? If I release this now, it would be a breaking change to use this in any of our currently built-in Serialize impls later because you might have serialized one in human-readable form to a non-self-describing format and then be unable to deserialize it in non-human-readable form. That seems to mean we need to update all built-in impls like IpAddr in the same release as this change, right?

Changing a format from human-readable to non-human-readable must be done in a breaking change right?
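
The compatibility hazard can be sketched with a toy non-self-describing encoding. Everything here is an assumption made for illustration (the one-byte length prefix and the function names `encode_v1`, `encode_v2`, `decode_v2`); it is not how any real format lays out its data.

```rust
use std::net::Ipv4Addr;

/// "Old" impl: always writes the textual form, prefixed by a one-byte length.
fn encode_v1(addr: Ipv4Addr) -> Vec<u8> {
    let text = addr.to_string();
    let mut out = vec![text.len() as u8];
    out.extend_from_slice(text.as_bytes());
    out
}

/// "New" impl for a non-human-readable format: four raw octets.
fn encode_v2(addr: Ipv4Addr) -> Vec<u8> {
    addr.octets().to_vec()
}

/// "New" decoder: reads four octets. Nothing in the data says which
/// encoding produced it, so old data is silently misread.
fn decode_v2(bytes: &[u8]) -> Option<Ipv4Addr> {
    match bytes {
        [a, b, c, d, ..] => Some(Ipv4Addr::new(*a, *b, *c, *d)),
        _ => None,
    }
}
```

Data written by `encode_v1` for `10.0.0.1` starts with the length byte 8 followed by the ASCII codes of `"10."`, so `decode_v2` returns the unrelated address `8.49.48.46` rather than an error.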

clarfonthey · 2017-09-07T19:44:53Z

Wouldn't it make the most sense for formats to simply allow deserialisation from all formats (readable and not) and only serialise to one or the other depending on whether it's marked as readable or not?

That makes the most sense to me.

Marwes · 2017-09-07T19:49:37Z

> Wouldn't it make the most sense for formats to simply allow deserialisation from all formats (readable and not) and only serialise to one or the other depending on whether it's marked as readable or not?

That is not possible for formats such as bincode, which require the value being deserialized to tell the deserializer what data comes next, since the serialized data is just bytes with no record of what values it contains.

Marwes · 2017-09-07T19:56:10Z

> Changing a format from human-readable to non-human-readable must be done in a breaking change right?

Yeah :/ In theory we could document that making a format aware of the human-readable distinction requires a breaking change. Formats could then either avoid the break by making is_human_readable() == false opt-in, or make a breaking release themselves. Does that make sense?

dtolnay · 2017-09-07T20:37:50Z

Makes sense. We still need to update all the relevant Serialize and Deserialize impls in this PR.

Marwes · 2017-09-11T14:17:24Z

I believe only IP addresses and socket addresses need a non-human-readable serialization? Those are the only types which use str::parse, or did I miss some other types?

I just need to document this a bit better and it should be done.

EDIT: I also need to fix serialization of net::IpAddr, since I broke roundtripping in ad3335e, and add tests for that.

Marwes · 2017-09-15T08:06:11Z

This is ready for review now!

Marwes · 2017-09-22T14:13:39Z

@dtolnay Any problems with the current implementation?

dtolnay · 2017-09-25T07:43:49Z

Sorry I didn't get a chance to review this weekend, but it is high on my list and I will try to get to it in the next couple days.

Can you think of any other backward-compatible ways to expose this in serde_test without adding so many new functions?

Marwes · 2017-09-25T08:27:44Z

> Can you think of any other backward-compatible ways to expose this in serde_test without adding so many new functions?

I could change it to use the builder pattern, but that would not make it any smaller now; it would only reduce the API surface if more configuration parameters were added in the future.

Marwes · 2017-09-25T13:17:57Z

I suppose I could move the assert_tokens functions to be methods on Serializer and Deserializer (forwarding the current implementations to those). That way the current functions are just for convenience/backwards compatibility while the more general way to assert requires one to use Serializer and Deserializer as "builders".

dtolnay · 2017-09-25T16:20:45Z

How about either of these?

- Having the &[Token] arguments accept a type parameter instead, where the type parameter bound is implemented for &[Token] as well as Readable(&[Token]) or something like that. I think this may require Unsize to implement backward compatibly but you would have to experiment.
- Provide a wrapper around a T: Serialize that always serializes it as readable or always as unreadable, regardless of what the T wants to do. Then the caller invokes assert_tokens(&Unreadable(T), ...).

Not saying these are backward compatible or better, but a possible direction to think about.
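
The first option might look roughly like the following. This is only a sketch under assumed names: `TokenSource`, `Compact`, and `describe` are hypothetical, and `Token` is a two-variant stand-in for the serde_test enum.

```rust
/// Reduced stand-in for the serde_test token type.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Str(&'static str),
    Bytes(&'static [u8]),
}

/// Hypothetical wrapper marking a token list as the compact representation.
struct Compact<'a>(&'a [Token]);

/// Hypothetical bound accepted by the assertion helper.
trait TokenSource {
    fn tokens(&self) -> &[Token];
    fn human_readable(&self) -> bool;
}

/// A plain slice keeps today's behaviour: human readable.
impl<'a> TokenSource for &'a [Token] {
    fn tokens(&self) -> &[Token] {
        *self
    }
    fn human_readable(&self) -> bool {
        true
    }
}

impl<'a> TokenSource for Compact<'a> {
    fn tokens(&self) -> &[Token] {
        self.0
    }
    fn human_readable(&self) -> bool {
        false
    }
}

/// Stand-in for an assertion helper: reports what it would check against.
fn describe<T: TokenSource>(source: T) -> (bool, usize) {
    (source.human_readable(), source.tokens().len())
}
```

Note that with a generic parameter, a caller passing an array reference such as `&[Token::Str("a")]` directly would no longer get the automatic array-to-slice coercion, which is the kind of compatibility wrinkle this option raises.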

Marwes · 2017-09-26T08:52:42Z

> Having the &[Token] arguments accept a type parameter instead, where the type parameter bound is implemented for &[Token] as well as Readable(&[Token]) or something like that. I think this may require Unsize to implement backward compatibly but you would have to experiment.

That would work and would be backwards compatible modulo deref coercions failing due to being a generic parameter. Seems more difficult to understand though.

> Provide a wrapper around a T: Serialize that always serializes it as readable or always as unreadable, regardless of what the T wants to do. Then the caller invokes assert_tokens(&Unreadable(T), ...).

How would that work? A wrapper around a generic T can't affect how T serializes/deserializes.

dtolnay · 2017-09-26T16:44:08Z

> A wrapper around a generic T can't affect how T serializes/deserializes.

```rust
use serde::{Serialize, Serializer};

struct Unreadable<T>(T);

impl<T> Serialize for Unreadable<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        // Hand the inner value a serializer that reports "not human readable".
        self.0.serialize(UnreadableSerializer(serializer))
    }
}

struct UnreadableSerializer<S>(S);

impl<S> Serializer for UnreadableSerializer<S>
    where S: Serializer
{
    fn is_human_readable(&self) -> bool {
        false
    }

    /* forward others */
}
```
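
For readers who want to see the forwarding trick run end to end, here is a self-contained sketch against a reduced stand-in trait. `MiniSerializer`, `MiniSerialize`, `Recorder`, and `Port` are hypothetical; serde's real `Serializer` has many more methods that would each need forwarding.

```rust
/// Hypothetical, reduced serializer trait.
trait MiniSerializer: Sized {
    fn is_human_readable(&self) -> bool {
        true
    }
    fn write_str(self, s: &str) -> String;
    fn write_bytes(self, b: &[u8]) -> String;
}

/// Hypothetical counterpart of `Serialize`.
trait MiniSerialize {
    fn serialize<S: MiniSerializer>(&self, s: S) -> String;
}

/// Records which method was called, so the choice is observable.
struct Recorder;

impl MiniSerializer for Recorder {
    fn write_str(self, s: &str) -> String {
        format!("str({})", s)
    }
    fn write_bytes(self, b: &[u8]) -> String {
        format!("bytes({:?})", b)
    }
}

/// Wrapper serializer: overrides the flag and forwards everything else.
struct UnreadableSerializer<S>(S);

impl<S: MiniSerializer> MiniSerializer for UnreadableSerializer<S> {
    fn is_human_readable(&self) -> bool {
        false
    }
    fn write_str(self, s: &str) -> String {
        self.0.write_str(s)
    }
    fn write_bytes(self, b: &[u8]) -> String {
        self.0.write_bytes(b)
    }
}

/// Wrapper value: serializes the inner value through the wrapper serializer.
struct Unreadable<T>(T);

impl<T: MiniSerialize> MiniSerialize for Unreadable<T> {
    fn serialize<S: MiniSerializer>(&self, s: S) -> String {
        self.0.serialize(UnreadableSerializer(s))
    }
}

/// Example type with two representations.
struct Port(u16);

impl MiniSerialize for Port {
    fn serialize<S: MiniSerializer>(&self, s: S) -> String {
        if s.is_human_readable() {
            s.write_str(&self.0.to_string())
        } else {
            s.write_bytes(&self.0.to_be_bytes())
        }
    }
}
```

The wrapper never touches `Port` itself; it only changes what `Port` observes when it asks the serializer about readability.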

Marwes · 2017-09-26T19:25:17Z

Ah, yep that would work. Doing the same thing for Deserialize would work as well I guess.

Personally I feel like it is not very intuitive. If I were to come in with no prior knowledge and look at serde_test I'd find the solution rather strange :/

dtolnay · 2017-09-27T02:23:36Z

I agree that we may be able to come up with something more intuitive. I will keep brainstorming alternatives.

For completeness, here is what that approach would have looked like in tests:

```rust
let s = /* ... */;

// In human readable formats, field `timestamp` serializes as ISO 8601.
assert_tokens(&Readable(s), &[
    Token::Struct { name: "S", len: 1 },
    Token::Str("timestamp"),
    Token::String("2017-09-26T21:49:25Z"),
    Token::StructEnd,
]);

// In binary formats, field `timestamp` serializes in compact byte form.
assert_tokens(&Unreadable(s), &[
    Token::Struct { name: "S", len: 1 },
    Token::Str("timestamp"),
    Token::Bytes(&[226, 152, 131]),
    Token::StructEnd,
]);
```

Marwes · 2017-09-27T08:59:56Z

Builder pattern on Serializer/Deserializer

```rust
Serializer::new(tokens)
    .readable(false)
    .assert_serialize(&serialize_value)
Deserializer::new(tokens)
    .readable(false)
    .assert_deserialize()
```
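
A runnable approximation of that builder shape, with tokens reduced to strings, could look like this. `TestSerializer`, `Port`, `to_tokens`, and `matches` are hypothetical names for illustration and do not describe the serde_test API.

```rust
/// Example value with a readable and a compact token form.
struct Port(u16);

impl Port {
    fn to_tokens(&self, readable: bool) -> Vec<String> {
        if readable {
            vec![format!("Str({})", self.0)]
        } else {
            vec![format!("Bytes({:?})", self.0.to_be_bytes())]
        }
    }
}

/// Hypothetical test serializer configured through a builder-style API.
struct TestSerializer {
    tokens: Vec<&'static str>,
    readable: bool,
}

impl TestSerializer {
    /// Starts out human readable, matching the existing default.
    fn new(tokens: &[&'static str]) -> Self {
        TestSerializer { tokens: tokens.to_vec(), readable: true }
    }

    /// Builder step: choose the representation under test.
    fn readable(mut self, readable: bool) -> Self {
        self.readable = readable;
        self
    }

    /// Returns whether the value produces exactly the expected tokens.
    fn matches(&self, value: &Port) -> bool {
        value.to_tokens(self.readable) == self.tokens
    }

    /// Panicking variant, in the spirit of `assert_tokens`.
    fn assert_serialize(self, value: &Port) {
        assert!(self.matches(value), "tokens did not match");
    }
}
```

As noted in the thread, this does not shrink the API today; its benefit would only show if further configuration options were added later.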

dtolnay · 2017-10-05T21:54:31Z

I am still not sold on the serde_test piece of this but I don't want that to block the feature any longer. Let's get this released in serde and then take a bit more time to design the test api.

To that end:

- Please #[doc(hidden)] the new serde_test code with a mini comment something like // Not public API. That way we can continue to use this in our test suite without committing to it publicly.
- To keep our options open, please have serde_test's serializer and deserializer panic in is_human_readable unless the readableness has been set explicitly through one of the hidden functions. I kind of want to force types that have distinct readable/compact representations to be tested explicitly in one or the other, rather than with a plain assert_tokens which arbitrarily picks one. What do you think?
- File an issue to follow up on exposing this in serde_test, and link to the issue from the panic message.
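
The second request, panicking unless readability was chosen explicitly, can be sketched as below. This models the request as stated in this comment, not necessarily what was merged; `TestSerializer` and its constructors are hypothetical names.

```rust
/// Hypothetical test serializer whose readability is unset by default.
struct TestSerializer {
    readable: Option<bool>,
}

impl TestSerializer {
    /// Plain constructor: no representation chosen.
    fn new() -> Self {
        TestSerializer { readable: None }
    }

    /// Explicitly test the human-readable representation.
    fn readable() -> Self {
        TestSerializer { readable: Some(true) }
    }

    /// Explicitly test the compact representation.
    fn compact() -> Self {
        TestSerializer { readable: Some(false) }
    }

    /// Panics if a type asks about readability and the test never chose one.
    fn is_human_readable(&self) -> bool {
        self.readable
            .expect("is_human_readable was queried but the test did not choose a representation")
    }
}
```

Types that never call `is_human_readable` would be unaffected; only types with two representations would force the test author to pick one.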

Marwes · 2017-10-10T13:03:34Z

@dtolnay Sounds reasonable, should get to it in a few days.

arthurprs · 2017-10-11T20:22:31Z

This is sort of bikeshedding but is_text_format (or something similar) might make more sense.

Btw, this is a great improvement 😄

Marwes · 2017-10-13T15:32:25Z

> I kind of want to force types that have distinct readable/compact representations to be tested explicitly in one or the other, rather than with a plain assert_tokens which arbitrarily picks one. What do you think?

assert_tokens does not arbitrarily pick one though; it will always be is_human_readable == true since that is the default. So I am not sure how I can make it panic, since that would break code that uses serde_test and happens to serialize one of the types that now call is_human_readable.

It should be enough to just hide the functions though, as is_human_readable == true will still mean that the behavior is preserved.

... by utilizing that bincode is not human readable. Uses the changes in serde-rs/serde#1044 which allows data formats to report that they are not human readable. This lets certain types serialize themselves into a more compact form as they know that the serialized form does not need to be readable. Closes bincode-org#215 BREAKING CHANGE This changes how types serialize themselves if they detect the `is_human_readable` state.

Marwes force-pushed the human_readable branch from cff024d to 0dccbb1 Compare September 7, 2017 14:21

dtolnay reviewed Sep 7, 2017

Add non-human readable serializations for ip addresses

40c670e

dtolnay added the wip label Sep 8, 2017

Markus Westerlind added 2 commits September 11, 2017 15:54

Serialize non-human-readble ip addresses as tuples

ad3335e

Since we know exactly how many bytes we should serialize, we can hint to the serializer that a length is not required, which further reduces the serialized size compared to just serializing as bytes.
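
The size difference can be illustrated with a toy compact writer. The 8-byte little-endian length prefix and the names `CompactWriter`, `ipv4_as_bytes`, and `ipv4_as_tuple` are assumptions for this sketch, not a description of any particular format.

```rust
/// Hypothetical compact writer. Byte strings carry a length prefix because
/// the reader cannot know their length; tuples have a fixed arity known to
/// both sides, so only the elements are written.
struct CompactWriter {
    out: Vec<u8>,
}

impl CompactWriter {
    fn new() -> Self {
        CompactWriter { out: Vec::new() }
    }

    /// Variable-length bytes: 8-byte length prefix, then the payload.
    fn write_bytes(&mut self, b: &[u8]) {
        self.out.extend_from_slice(&(b.len() as u64).to_le_bytes());
        self.out.extend_from_slice(b);
    }

    /// Fixed-size tuple of u8 elements: payload only.
    fn write_tuple(&mut self, elems: &[u8]) {
        self.out.extend_from_slice(elems);
    }
}

fn ipv4_as_bytes(octets: [u8; 4]) -> Vec<u8> {
    let mut w = CompactWriter::new();
    w.write_bytes(&octets);
    w.out
}

fn ipv4_as_tuple(octets: [u8; 4]) -> Vec<u8> {
    let mut w = CompactWriter::new();
    w.write_tuple(&octets);
    w.out
}
```

Under these assumptions an IPv4 address costs 12 bytes as a byte string and 4 bytes as a tuple.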

Fix rustc 1.13 and clippy errors on travis

a52f436

Markus Westerlind added 2 commits September 11, 2017 17:18

Document that is_human_readable == false is a breaking change

c2474bf

Fix the non-readble IpAddr serialize implementations

85c05d3

dtolnay removed the wip label Sep 14, 2017

Markus Westerlind added 2 commits September 14, 2017 17:08

Properly deserialize non-readable IpAddr and SocketAddr

e369153

Use the variant_identifier macro for OsString

945d12c

Marwes force-pushed the human_readable branch from 2ba61b3 to 945d12c Compare September 14, 2017 15:08

Try to fix compilation on 1.13

3b13543

Hide is_human_readable constructors in serde_test

e9b530a

Until a good API can be found

Marwes force-pushed the human_readable branch from 73c372d to e9b530a Compare October 13, 2017 15:37

Marwes mentioned this pull request Oct 13, 2017

Design a good API for non-readable [de]serialization in serde_test serde-rs/test#12

Closed

dtolnay merged commit 030459a into serde-rs:master Oct 15, 2017

dtolnay mentioned this pull request Oct 20, 2017

Set is_human_readable to false bincode-org/bincode#215

Closed

Marwes deleted the human_readable branch October 23, 2017 09:41

This was referenced Nov 2, 2017

Allow serialized types to use a more compact representation ... bincode-org/bincode#217

Merged

Serialize to binary if the serde format is not human readable uuid-rs/uuid#104

Merged

Dessix mentioned this pull request Apr 11, 2022

Manually implemented RawObjectId de/serialization to/from string rustyscreeps/screeps-game-api#368

Merged
