Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize to binary if the serde format is not human readable #1044

Merged
merged 10 commits into from
Oct 15, 2017

Conversation

Marwes
Copy link
Contributor

@Marwes Marwes commented Sep 7, 2017

This implements the KISS suggested in #790.
It is possible that one of the other approaches may be better but this
seemed like the simplest one to reignite som discussion.

Personally I find the original suggestion of adding two traits perhaps slightly
cleaner in theory but I think it ends up more complicated in the end
since the added traits also need to be duplicated to to the Seed
traits.

Example usage in https://github.com/rust-lang-nursery/uuid: Marwes/uuid@f908fd3

Closes #790

@Marwes
Copy link
Contributor Author

Marwes commented Sep 7, 2017

FYI: The is not fully fleshed out in method names and documentation, will fix that before merging.

This implements the KISS suggested in serde-rs#790.
It is possible that one of the other approaches may be better but this
seemed like the simplest one to reignite som discussion.

Personally I find the original suggestion of adding two traits perhaps slightly
cleaner in theory but I think it ends up more complicated in the end
since the added traits also need to be duplicated to to the `Seed`
traits.

Closes serde-rs#790
Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still on board with this approach. Thanks for getting things moving!

Could you brainstorm a few advantages and disadvantages of using &self vs no &self?

@Marwes
Copy link
Contributor Author

Marwes commented Sep 7, 2017

With &self it becomes possible for [de]serializers to choose at runtime whether to be human readable or not. This is probably not common occurrence though so not a big deal either way.

The erased-serde crate will need to take &self so taking &self here would be consistent.

Downside is that one needs an instance to check if the format is human readable. Seems to me that would always be the case? Maybe in the future with const functions it would be useful to query is_human_readable at compile time? If so, maybe it should simply be an associated constant?

&self is in theory some extra overhead but I can't imagine LLVM struggling with that!

@dtolnay
Copy link
Member

dtolnay commented Sep 7, 2017

Makes sense to me. The erased-serde case is a great point, I had not thought of how that would work.

What would you recommend for rolling this out? If I release this now, it would be a breaking change to use this in any of our currently built-in Serialize impls later because you might have serialized one in human-readable form to a non-self-describing format and then be unable to deserialize it in non-human-readable form. That seems to mean we need to update all built-in impls like IpAddr in the same release as this change, right?

Changing a format from human-readable to non-human-readable must be done in a breaking change right?

@clarfonthey
Copy link
Contributor

Wouldn't it make the most sense for formats to simply allow deserialisation from all formats (readable and not) and only serialise to one or the other depending on whether it's marked as readable or not?

That makes the most sense to me.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 7, 2017

Wouldn't it make the most sense for formats to simply allow deserialisation from all formats (readable and not) and only serialise to one or the other depending on whether it's marked as readable or not?

That is not possible for formats such as bincode which require that the deserialized value tell the deserializer what the upcoming data is as the serialized data is just bytes without knowledge what values it contains.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 7, 2017

Changing a format from human-readable to non-human-readable must be done in a breaking change right?

Yeah :/ . In theory we could document that making a format aware of the human readable distinction require a breaking change. The formats could then avoid the breaking change by making is_human_readable() == false opt-in or by themselves making a breaking change. Does that make sense?

@dtolnay
Copy link
Member

dtolnay commented Sep 7, 2017

Makes sense. We still need to update all the relevant Serialize and Deserialize impls in this PR.

@dtolnay dtolnay added the wip label Sep 8, 2017
Markus Westerlind added 2 commits September 11, 2017 15:54
Since we know exactly how many bytes we should serialize as we can hint
to the serializer that it is not required which further reduces the
serialized size when compared to just serializing as bytes.
@Marwes
Copy link
Contributor Author

Marwes commented Sep 11, 2017

I believe only ip addresses and socket address needs a non-human readble serialization? Those are the only types which use str::parse or did I miss some other types?.

Just need to document a bit better and this should be done.

EDIT: Also need to fix serialization of net::IpAddr as they I broke roundtripping in ad3335e as well as tests for that.

@dtolnay dtolnay removed the wip label Sep 14, 2017
@Marwes
Copy link
Contributor Author

Marwes commented Sep 15, 2017

This is ready for review now!

@Marwes
Copy link
Contributor Author

Marwes commented Sep 22, 2017

@dtolnay Any problems with the current implementation?

@dtolnay
Copy link
Member

dtolnay commented Sep 25, 2017

Sorry I didn't get a chance to review this weekend, but it is high on my list and I will try to get to it in the next couple days.

Can you think of any other backward-compatible ways to expose this in serde_test without adding so many new functions?

@Marwes
Copy link
Contributor Author

Marwes commented Sep 25, 2017

Can you think of any other backward-compatible ways to expose this in serde_test without adding so many new functions?

I could change it to use the builder pattern, but that won't make it smaller now but would only serve to reduce the API surface if more configuration parameters were added in the future.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 25, 2017

I suppose I could move the assert_tokens functions to be methods on Serializer and Deserializer (forwarding the current implementations to those). That way the current functions are just for convenience/backwards compatibility while the more general way to assert requires one to use Serializer and Deserializer as "builders".

@dtolnay
Copy link
Member

dtolnay commented Sep 25, 2017

How about either of these?

  • Having the &[Token] arguments accept a type parameter instead, where the type parameter bound is implemented for &[Token] as well as Readable(&[Token]) or something like that. I think this may require Unsize to implement backward compatibly but you would have to experiment.
  • Provide a wrapper around a T: Serialize that always serializes it as readable or always as unreadable, regardless of what the T wants to do. Then the caller invokes assert_tokens(&Unreadable(T), ...).

Not saying these are backward compatibly or better, but a possible direction to think about.

@Marwes
Copy link
Contributor Author

Marwes commented Sep 26, 2017

Having the &[Token] arguments accept a type parameter instead, where the type parameter bound is implemented for &[Token] as well as Readable(&[Token]) or something like that. I think this may require Unsize to implement backward compatibly but you would have to experiment.

That would work and would be backwards compatible modulo deref coercions failing due to being a generic parameter. Seems more difficult to understand though.

Provide a wrapper around a T: Serialize that always serializes it as readable or always as unreadable, regardless of what the T wants to do. Then the caller invokes assert_tokens(&Unreadable(T), ...).

How would that work? A wrapper around a generic T can't affect how Tserializes/deserializes.

@dtolnay
Copy link
Member

dtolnay commented Sep 26, 2017

A wrapper around a generic T can't affect how T serializes/deserializes.

struct Unreadable<T>(T);

impl<T> Serialize for Unreadable<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        self.0.serialize(UnreadableSerializer(serializer))
    }
}

struct UnreadableSerializer<S>(S);

impl<S> Serializer for UnreadableSerializer<S>
    where S: Serializer
{
    fn is_human_readable(&self) -> bool {
        false
    }

    /* forward others */
}

@Marwes
Copy link
Contributor Author

Marwes commented Sep 26, 2017

Ah, yep that would work. Doing the same thing for Deserialize would work as well I guess.

Personally I feel like it is not very intuitive. If I were to come in with no prior knowledge and look at serde_test I'd find the solution rather strange :/

@dtolnay
Copy link
Member

dtolnay commented Sep 27, 2017

I agree that we may be able to come up with something more intuitive. I will keep brainstorming alternatives.


For completeness, here is what that approach would have looked like in tests:

let s = /* ... */;

// In human readable formats, field `timestamp` serializes as ISO 8601.
assert_tokens(&Readable(s), &[
    Token::Struct { name: "S", len: 1 },
    Token::Str("timestamp"),
    Token::String("2017-09-26T21:49:25Z"),
    Token::StructEnd,
]);

// In binary formats, field `timestamp` serializes in compact byte form.
assert_tokens(&Unreadable(s), &[
    Token::Struct { name: "S", len: 1 },
    Token::Str("timestamp"),
    Token::Bytes(&[226, 152, 131]),
    Token::StructEnd,
]);

@Marwes
Copy link
Contributor Author

Marwes commented Sep 27, 2017

Builder pattern on Serializer/Deserializer

Serializer::new(tokens)
    .readable(false)
    .assert_serialize(&serialize_value)
Deserializer::new(tokens)
    .readable(false)
    .assert_deserialize()

@dtolnay
Copy link
Member

dtolnay commented Oct 5, 2017

I am still not sold on the serde_test piece of this but I don't want that to block the feature any longer. Let's get this released in serde and then take a bit more time to design the test api.

To that end:

  • Please #[doc(hidden)] the new serde_test code with a mini comment something like // Not public API. That way we can continue to use this in our test suite without committing to it publicly.
  • To keep our options open, please have serde_test's serializer and deserializer panic in is_human_readable unless the readableness has been set explicitly through one of the hidden functions. I kind of want to force types that have distinct readable/compact representations to be tested explicitly in one or the other, rather than with a plain assert_tokens which arbitrarily picks one. What do you think?
  • File an issue to follow up on exposing this in serde_test, and link to the issue from the panic message.

@Marwes
Copy link
Contributor Author

Marwes commented Oct 10, 2017

@dtolnay Sounds reasonable, should get to it in a few days.

@arthurprs
Copy link
Contributor

arthurprs commented Oct 11, 2017

This is sort of bikeshedding but is_text_format (or something similar) might make more sense.

Btw, this is a great improvement 😄

@Marwes
Copy link
Contributor Author

Marwes commented Oct 13, 2017

I kind of want to force types that have distinct readable/compact representations to be tested explicitly in one or the other, rather than with a plain assert_tokens which arbitrarily picks one. What do you think?

assert_tokens does not arbitrarily pick one though, it will always beis_human_readable == true since that is the default. So I am not sure how I can make it panic since that would break code using serde_test but which happen to serialize one of the types that now happen to call is_human_readble.

It should be enough to just hide the functions though as is_human_readble == true will still mean that the behavior is preserved.

@dtolnay dtolnay merged commit 030459a into serde-rs:master Oct 15, 2017
@Marwes Marwes deleted the human_readable branch October 23, 2017 09:41
Marwes pushed a commit to Marwes/bincode that referenced this pull request Nov 2, 2017
... by utilizing that bincode is not human readable.

Uses the changes in serde-rs/serde#1044 which
allows data formats to report that they are not human readable. This
lets certain types serialize themselves into a more compact form as they
know that the serialized form does not need to be readable.

BREAKING CHANGE

This changes how types serialize themselves if they detect the
`is_human_readable` state.
Marwes pushed a commit to Marwes/bincode that referenced this pull request Nov 2, 2017
... by utilizing that bincode is not human readable.

Uses the changes in serde-rs/serde#1044 which
allows data formats to report that they are not human readable. This
lets certain types serialize themselves into a more compact form as they
know that the serialized form does not need to be readable.

Closes bincode-org#215

BREAKING CHANGE

This changes how types serialize themselves if they detect the
`is_human_readable` state.
Marwes pushed a commit to Marwes/bincode that referenced this pull request Nov 2, 2017
... by utilizing that bincode is not human readable.

Uses the changes in serde-rs/serde#1044 which
allows data formats to report that they are not human readable. This
lets certain types serialize themselves into a more compact form as they
know that the serialized form does not need to be readable.

Closes bincode-org#215

BREAKING CHANGE

This changes how types serialize themselves if they detect the
`is_human_readable` state.
TyOverby pushed a commit to bincode-org/bincode that referenced this pull request Nov 20, 2017
... by utilizing that bincode is not human readable.

Uses the changes in serde-rs/serde#1044 which
allows data formats to report that they are not human readable. This
lets certain types serialize themselves into a more compact form as they
know that the serialized form does not need to be readable.

Closes #215

BREAKING CHANGE

This changes how types serialize themselves if they detect the
`is_human_readable` state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants