Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp Map/Seq serialization #437

Merged
merged 18 commits into from
Jul 15, 2016
Merged

Revamp Map/Seq serialization #437

merged 18 commits into from
Jul 15, 2016

Conversation

oli-obk
Copy link
Member

@oli-obk oli-obk commented Jul 12, 2016

I took the liberty of also simplifying some of serde codegen (mainly the parts where we do a lot of effort to create types that are going to get optimized out). I think all that was necessary to fit the visitor pattern. The new system as imagined by @dpc is much easier to implement, although it requires additional helper objects

fixes #386

cc @dpc

Serializers compatible with the new API

dpc and others added 2 commits July 12, 2016 11:46
Visitor is "pull", while `MapSerializer` and `SeqSerializer`
are "push" like the rest of the API.
@oli-obk oli-obk removed this from the v0.8.0 milestone Jul 12, 2016
@@ -363,8 +305,11 @@ impl<A> Serialize for ops::Range<A>
fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
where S: Serializer,
{
let len = iter::Step::steps_between(&self.start, &self.end, &A::one());
serializer.serialize_seq(SeqIteratorVisitor::new(self.clone(), len))
let mut seq_serializer = try!(serializer.serialize_seq(Some(self.len())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is the same len as the original code?

iter::Step::steps_between(&self.start, &self.end, &A::one())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy paste error

@dpc
Copy link
Contributor

dpc commented Jul 12, 2016

Oh, so you went with a route of getting everything into Serializer and adding "start sequence", "serialize element" * x, "end sequence"? Looks OK to me (the API, as I didn't thoroughly review the code) - the inverstion of control is gone.

Just for learning purposes: did you hit any blockers with the breaking up API into Serializer MapSerializer andSeqSerializer`? Any other noteworthy Rust limitation (or just problems with breaking it up)?

@dtolnay
Copy link
Member

dtolnay commented Jul 12, 2016

I am so excited for this.

I only made it partway through but I like the approach. I will review more thoroughly tonight or tomorrow.

This may make tuples and structs a lot faster because they no longer have to do match self.index when serializing each element. On the other hand some Serializers will need to do a lot more work bookkeeping (and maybe more memory allocation) in order to associate the seq_elt and seq_end calls with the corresponding start call. We may need to provide a way to thread some state through these calls without allocation. For example instead of this:

try!(serializer.serialize_seq(Some(self.len()));
for e in self.iter() {
    try!(serializer.serialize_seq_elt(e));
}
serializer.serialize_seq_end(Some(self.len()))

... something like this:

let mut seq_state = try!(serializer.serialize_seq(Some(self.len())));
for e in self.iter() {
    try!(serializer.serialize_seq_elt(&mut seq_state, e));
}
serializer.serialize_seq_end(seq_state)

... or equivalently:

let mut seq_ser = try!(serializer.serialize_seq(Some(self.len())));
for e in self.iter() {
    try!(seq_ser.elt(e));
}
seq_ser.end()

I think the last one is more similar to some of the proposals in #386.

I would like to have working commits against serde_json, bincode, serde_yaml and maybe one more before merging this, just so we validate it against fairly different use cases.

@dpc
Copy link
Contributor

dpc commented Jul 12, 2016

@dtolnay : Yes, so last version you posted:

let mut seq_ser = try!(serializer.serialize_seq(Some(self.len())));
for e in self.iter() {
    try!(seq_ser.elt(e));
}
seq_ser.end()

Has a problem of how will seq_ser access original serializer.

What I've mentioned in #386 , whereserializer.serialize_seq takes serializer by value and wraps it, and then returns it back in seq_ser.end() step. This way only once sequence can be serialized at once, and API becomes harder to use wrong.

But I guess the problem is that serializer needs to be passed by value everywhere then and returned back to the caller. So:

fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
         where S: Serializer;

would have to become:

fn serialize<S>(&self, serializer: S) -> Result<S, S::Error>
         where S: Serializer;

or something like that.

Would that hinder the API too much? Does this introduce additional copying?

Nice alternative where seq_ser is actually borrowing serializer for the time of it's existance does not work well with traits, as it requires HKT(?).

What about:

let mut seq_ser = try!(serializer.serialize_seq(Some(self.len())));
for e in self.iter() {
    try!(seq_ser.elt(e, serializer));
}
seq_ser.end(serializer)

This way seq_ser does not have to keep reference to serializer as it will be provided with it on every SeqSer::elt(...). API can still be misused:

let mut seq_ser1 = try!(serializer.serialize_seq(Some(self.len())));
let mut seq_ser2 = try!(serializer.serialize_seq(Some(self.len())));
try!(seq_ser1.elt(e, serializer));
try!(seq_ser2.elt(e, serializer));
seq_ser2.end(serializer);
seq_ser1.end(serializer);

and user can forget about calling SeqSer::end, but it would still be improvement, I guess.

@dtolnay
Copy link
Member

dtolnay commented Jul 13, 2016

I tried a few variations of having separate SeqSerializer and MapSerializer rather than having all the methods on the Serializer trait, and they were all confusing or limited. I am satisfied with the approach in this PR. I still think it will be beneficial to some serializers to have associated types in which they can track state - we can default them to ().

// I don't know which of these really need to be distinct types...
type SeqState = ();
type MapState = ();
type StructState = ();
type TupleState = ();
type StructVariantState = ();
type TupleVariantState = ();
type FixedSizeArrayState = ();
let mut seq_state = try!(serializer.serialize_seq(Some(self.len())));
for e in self.iter() {
    try!(serializer.serialize_seq_elt(&mut seq_state, e));
}
serializer.serialize_seq_end(seq_state)

Also I noticed that YAML does something weird where it calls the visitor using a Serializer that is different from the one on which serialize_tuple_variant/serialize_struct_variant were called: yaml/src/ser.rs#L168-L171. We need to figure out how that maps to the new API.

@oli-obk
Copy link
Member Author

oli-obk commented Jul 13, 2016

... or equivalently:

actually that was my first design, but it got very tedious to keep it in sync with my very frequent changes to adapt to situations I hadn't considered. We can add this without a problem, but during the design phase I'd rather stay with manual control.

@dpc: borrowing from the Serializer is not a problem:

fn serialize_seq<'a>(&'a mut self) -> Result<SeqSerializer<'a, Self>, Self::Error>

The final end call then consumes the object. In debug mode we can add an additional Drop impl that panics, so tests and debug runs will hit that if the code path is tested and end is forgotten.

If we ever need the Serializer back in the same scope, we can have end return it.

Also I'll add a clippy lint that warns about mis-use. These checks combined should keep everyone from mis-using the API.

@dtolnay
Copy link
Member

dtolnay commented Jul 13, 2016

@oli-obk:

fn serialize_seq<'a>(&'a mut self) -> Result<SeqSerializer<'a, Self>, Self::Error>

I don't think that works if the Serializer has an explicit lifetime as in serde-yaml because the trait can't require the Serializer's lifetime to outlive serialize_seq's 'a.

@dtolnay
Copy link
Member

dtolnay commented Jul 13, 2016

I can take care of trying this out on serde-yaml. I filed dtolnay/serde-yaml#21 to track that.

@dtolnay dtolnay mentioned this pull request Jul 13, 2016
@oli-obk
Copy link
Member Author

oli-obk commented Jul 14, 2016

I still think it will be beneficial to some serializers to have associated types in which they can track state - we can default them to ().

Jup, json-value can profit from it and toml-rs needs it, otherwise the toml-rs serializer needs an internal stack of temporary stuff.

I'll add it now

@oli-obk
Copy link
Member Author

oli-obk commented Jul 14, 2016

we can default them to ().

sadly not on stable rust

error: associated type defaults are unstable (see issue #29661)
help: add #![feature(associated_type_defaults)] to the crate attributes to enable

@oli-obk
Copy link
Member Author

oli-obk commented Jul 14, 2016

toml works now, adjusting the others is trivial (just add the MapState and SeqState associated types as () and add the state: () arg to the *_end methods

where V: Serialize,
{
/// By default, structs are serialized as a map with the field name as the key.
fn serialize_struct_variant_elt<K, V>(&mut self, key: K, value: V) -> Result<(), Self::Error> where K: Serialize, V: Serialize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions as on serialize_struct_elt.

@dtolnay
Copy link
Member

dtolnay commented Jul 15, 2016

I opened oli-obk#1 with a Serializer API that I believe is more flexible and more consistent. It also addresses #219.

@dtolnay
Copy link
Member

dtolnay commented Jul 15, 2016

serde_codegen and serde_test need to be updated for my changes. I will start doing it in the morning if you do not get to it.

@oli-obk
Copy link
Member Author

oli-obk commented Jul 15, 2016

all serializers have been brought up to date.

One more experiment would be to check if we now can get rid of the stack in the serde-value serializer

@erickt
Copy link
Member

erickt commented Jul 15, 2016

This is so cool. You all are awesome. We never quite got to the whole "the type system enforces the serde protocol" so might as well not constrain ourselves to having all these visitors.

Have you also considered how this approach might affect the deserialization API?

@erickt
Copy link
Member

erickt commented Jul 15, 2016

I'm a +1 on this (once the tests pass), and definitely curious if this has any performance implications as well. Great work.

@dtolnay
Copy link
Member

dtolnay commented Jul 15, 2016

I read through the whole PR one more time and this looks good to me. Nicely done @oli-obk.

@erickt I think Visitors still make sense in the Deserializer API to allow the Deserializer to push values out to the Deserialize, for example serde_json::Value.

@dpc
Copy link
Contributor

dpc commented Jul 15, 2016

I've just looked and Visitor pattern in de seems OK to me. It's still "push", as per my original comment about ser::Vistior (which was "pull").

Nice. I'm happy about the result.

@dtolnay dtolnay removed the wip label Jul 15, 2016
@dtolnay dtolnay merged commit 9d015a2 into serde-rs:master Jul 15, 2016
@dtolnay
Copy link
Member

dtolnay commented Jul 16, 2016

I merged this along with 855f3d9 to remind us that master is no longer 0.7. I made an 0.7.x branch in case we need any more 0.7 releases.

@KodrAus
Copy link

KodrAus commented Jul 16, 2016

A bit late to the party, but looking through this it looks like it'll eliminate a lot of code in my manual struct serialisation 👍

I'm guessing it'll still be a little while before this hits crates.io since codegen will need updating?

@dtolnay
Copy link
Member

dtolnay commented Jul 16, 2016

Codegen has been updated in this PR. I would expect sometime next week - there are a few other fixes we want in 0.8.

@dtolnay
Copy link
Member

dtolnay commented Jul 18, 2016

For those subscribed to this issue - I filed #447 to track the release of these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Unable to serialize sequence without knowing all elements.
5 participants