Higher order serialization #552

Closed
withoutboats opened this Issue Sep 19, 2016 · 15 comments

Comments

4 participants
@withoutboats

withoutboats commented Sep 19, 2016

If I have to conform to some format schema which contains multiple "levels" (e.g. a JSON array inside a JSON object), I have to create a new serializable type for every level in the schema, because serialize_map_value and serialize_seq_elt and so on take a type T: Serialize.

It would be great to have higher order versions of every function that takes a T: Serialize, which instead takes an Fn(&mut Self) -> Result<(), Self::Error>, so that these intermediate types could go away. This would also make heterogenous collections much more manageable than they are right now.

For example:

struct Foo {
    tag: u32,
    members: Vec<Bar>,
}

impl Serialize for Foo {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
        let mut state = try!(serializer.serialize_map());
        try!(serializer.serialize_map_key(&mut state, "tag"));
        try!(serializer.serialize_map_value(&mut state, &self.tag));
        try!(serializer.serialize_map_key(&mut state, "members"));
        try!(serializer.serialize_map_value_of(&mut state, |serializer| {
            let mut state = try!(serializer.serialize_seq());
            for member in &self.members {
                try!(serializer.serialize_seq_elt(&mut state, member));
            }
            serializer.serialize_seq_end(state)
        }));
        serializer.serialize_map_end(state)
    }
}

Obviously in this case the derived Serialize will suffice, but I have some cases where it won't. I don't know if there are Serializer impls that wouldn't be able to define functions like this.
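
To make the proposal concrete, here is a minimal, self-contained sketch of the higher-order idea in current Rust syntax (`?` instead of `try!`). `ToyJson`, `State`, `map_value_of` and the other method names are made up for illustration and are not serde's API, and `members` is simplified to `Vec<u32>`. The only point is that the map value is produced by a closure that receives the serializer, so no intermediate type is needed.

```rust
// Toy JSON writer standing in for a serde `Serializer` (hypothetical, not serde's API).
pub struct ToyJson {
    out: String,
}

// Tracks whether a comma is needed before the next map entry or sequence element.
pub struct State {
    first: bool,
}

impl ToyJson {
    pub fn new() -> Self {
        ToyJson { out: String::new() }
    }

    pub fn into_string(self) -> String {
        self.out
    }

    pub fn map_start(&mut self) -> State {
        self.out.push('{');
        State { first: true }
    }

    pub fn map_key(&mut self, state: &mut State, key: &str) {
        if !state.first {
            self.out.push(',');
        }
        state.first = false;
        self.out.push('"');
        self.out.push_str(key);
        self.out.push_str("\":");
    }

    pub fn u32_value(&mut self, v: u32) {
        self.out.push_str(&v.to_string());
    }

    // Higher-order variant: the map value is whatever the closure writes.
    pub fn map_value_of<F>(&mut self, f: F) -> Result<(), String>
    where
        F: FnOnce(&mut ToyJson) -> Result<(), String>,
    {
        f(self)
    }

    pub fn map_end(&mut self, _state: State) {
        self.out.push('}');
    }

    pub fn seq_start(&mut self) -> State {
        self.out.push('[');
        State { first: true }
    }

    pub fn seq_elt_u32(&mut self, state: &mut State, v: u32) {
        if !state.first {
            self.out.push(',');
        }
        state.first = false;
        self.u32_value(v);
    }

    pub fn seq_end(&mut self, _state: State) {
        self.out.push(']');
    }
}

pub struct Foo {
    pub tag: u32,
    pub members: Vec<u32>,
}

// Writes a map containing a nested sequence without defining a type for the sequence.
pub fn foo_to_json(foo: &Foo) -> Result<String, String> {
    let mut s = ToyJson::new();
    let mut state = s.map_start();
    s.map_key(&mut state, "tag");
    s.u32_value(foo.tag);
    s.map_key(&mut state, "members");
    s.map_value_of(|s| {
        let mut seq = s.seq_start();
        for m in &foo.members {
            s.seq_elt_u32(&mut seq, *m);
        }
        s.seq_end(seq);
        Ok(())
    })?;
    s.map_end(state);
    Ok(s.into_string())
}
```

Calling `foo_to_json(&Foo { tag: 7, members: vec![1, 2] })` yields `{"tag":7,"members":[1,2]}`.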

@oli-obk

Member

oli-obk commented Sep 19, 2016

I'm not sure I understand. Could you show an example where you can't express it in serde without intermediate serializeables?

@withoutboats

Author

withoutboats commented Sep 19, 2016

Sure

Serialize is not object safe, but this similar trait is:

trait ObjectSafeSerialize<S: Serializer> {
    fn serialize(&self, serializer: &mut S) -> Result<(), S::Error>;
}

That is, using this definition, if you know what Serializer you're serializing to, you can possibly serialize a heterogenous collection.

You can implement this trait for all Serialize types, forwarding to the real serialize impl:

impl<T, S> ObjectSafeSerialize<S> for T where T: Serialize, S: Serializer {
    fn serialize(&self, serializer: &mut S) -> Result<(), S::Error> {
        <Self as Serialize>::serialize(self, serializer)
    }
}

You can also include methods which forward serialization in specific positions; that is, you could add this method to that trait and impl and it will work (because Self: Serialize in that impl):

fn serialize_seq_elt(&self, serializer: &mut S, state: &mut S::SeqState) -> Result<(), S::Error> {
    serializer.serialize_seq_elt(state, self)
}

So I now have the ability to serialize heterogenous types if I have the serializer present when those trait objects are constructed. I have this type, and I want to serialize it the same way the previous type was serialized:

struct Foo {
    tag: u32
}

impl Foo {
    fn members<S: Serializer>(&self) -> Vec<Box<ObjectSafeSerialize<S>>> { ... }
}

The problem is that I cannot define Serialize for Vec<Box<ObjectSafeSerialize<S>>>, even ignoring the orphan issues (which can be solved with a wrapper type), because the trait objects inside it do not implement Serialize; they can only be serialized if the serializer is already known.
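
A self-contained sketch of the pattern described above, using toy `Serializer` and `Serialize` traits as stand-ins for serde's (all names and signatures here are assumptions, much simpler than the real ones). The trait method is called `serialize_obj` only to avoid a name clash with `Serialize::serialize`, and `Text`, `Num`, `Name`, and `render` are hypothetical.

```rust
// Toy stand-ins for serde's traits; names and signatures are assumptions.
pub trait Serializer {
    type Error;
    fn write_str(&mut self, s: &str) -> Result<(), Self::Error>;
}

// Generic method, so this trait cannot be turned into a trait object.
pub trait Serialize {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error>;
}

// The serializer type lives on the trait, so `dyn ObjectSafeSerialize<S>` is allowed.
pub trait ObjectSafeSerialize<S: Serializer> {
    fn serialize_obj(&self, serializer: &mut S) -> Result<(), S::Error>;
}

// Forward every `Serialize` type to its real impl.
impl<T: Serialize, S: Serializer> ObjectSafeSerialize<S> for T {
    fn serialize_obj(&self, serializer: &mut S) -> Result<(), S::Error> {
        Serialize::serialize(self, serializer)
    }
}

// A concrete toy serializer that appends text to a String.
pub struct Text {
    pub out: String,
}

impl Serializer for Text {
    type Error = ();
    fn write_str(&mut self, s: &str) -> Result<(), ()> {
        self.out.push_str(s);
        Ok(())
    }
}

pub struct Num(pub u32);
pub struct Name(pub &'static str);

impl Serialize for Num {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
        serializer.write_str(&self.0.to_string())
    }
}

impl Serialize for Name {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
        serializer.write_str(self.0)
    }
}

// Heterogeneous collection: the serializer type is fixed up front,
// and each element is reached through one virtual call.
pub fn render(items: &[Box<dyn ObjectSafeSerialize<Text>>]) -> Result<String, ()> {
    let mut text = Text { out: String::new() };
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            text.write_str(",")?;
        }
        item.serialize_obj(&mut text)?;
    }
    Ok(text.out)
}
```

The collection type names the serializer (`Text`), which is exactly the restriction discussed here: the boxed values can be written to that one serializer, but they do not implement the generic `Serialize`.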

However, if I had these higher order functions, I could do this:

impl Serialize for Foo {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
        let mut state = try!(serializer.serialize_map());
        try!(serializer.serialize_map_key(&mut state, "tag"));
        try!(serializer.serialize_map_value(&mut state, &self.tag));
        try!(serializer.serialize_map_key(&mut state, "members"));
        try!(serializer.serialize_map_value_of(&mut state, |serializer| {
            let mut state = try!(serializer.serialize_seq());
            for member in self.members::<S>() {
                try!(member.serialize_seq_elt(serializer, &mut state));
            }
            serializer.serialize_seq_end(state)
        }));
        serializer.serialize_map_end(state)
    }
}
@withoutboats

Author

withoutboats commented Sep 19, 2016

However, beyond this example, I think these functions would provide a meaningful win in ergonomics for users trying to define serialization to some complex schema.

@withoutboats

Author

withoutboats commented Sep 19, 2016

Oh gosh, I just reread your comment and realized I misunderstood. I thought you meant something that can't be expressed even with intermediate structs, but you wanted a case where intermediate structs are necessary. The example above is something that can't be expressed in serde at all right now.

The more general case that this improves the ergonomics of is this:

I have a schema like this:

{ "bar": { ... }, "baz": { ... } }

But internally, the fields of both are in one struct.

struct Foo {
    // bar fields,
    // baz fields
}

impl Serialize for Foo {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
        let bar = Bar { /* bar fields of self */ };
        let baz = Baz { /* baz fields of self  */ };
        //serialize a map with both bar and baz.
    }
}

struct Bar<'a> {
    // references to bar fields
}

impl<'a> Serialize for Bar<'a> { ... }

struct Baz<'a> {
    // references to baz fields
}

impl<'a> Serialize for Baz<'a> { ... }

With these methods, you could drop these temporaries entirely and just provide this multi-level structure in the serialize for Foo.

Basically, the issue is when you have to meet a schema that isn't consistent with your internal representation of the data.
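
A runnable sketch of the temporaries pattern described above, with a toy `ToJson` trait standing in for serde's `Serialize` (the trait, the `map_entry` helper, and the field names `x` and `y` are all hypothetical). `map_entry` mimics the constraint of `serialize_map_value`: the value has to be a type implementing the trait, which is why `Bar` and `Baz` have to exist at all.

```rust
// Toy stand-in for serde's `Serialize`; writes JSON straight into a String.
pub trait ToJson {
    fn write_json(&self, out: &mut String);
}

impl ToJson for u32 {
    fn write_json(&self, out: &mut String) {
        out.push_str(&self.to_string());
    }
}

// Mimics `serialize_map_key` + `serialize_map_value`: the value must be a `ToJson` type.
fn map_entry<T: ToJson>(out: &mut String, first: bool, key: &str, value: &T) {
    if !first {
        out.push(',');
    }
    out.push('"');
    out.push_str(key);
    out.push_str("\":");
    value.write_json(out);
}

// Internal representation: one flat struct.
pub struct Foo {
    pub x: u32,
    pub y: u32,
}

// Borrowed views that exist only to match the two-level schema.
struct Bar<'a> {
    x: &'a u32,
}

struct Baz<'a> {
    y: &'a u32,
}

impl<'a> ToJson for Bar<'a> {
    fn write_json(&self, out: &mut String) {
        out.push('{');
        map_entry(out, true, "x", self.x);
        out.push('}');
    }
}

impl<'a> ToJson for Baz<'a> {
    fn write_json(&self, out: &mut String) {
        out.push('{');
        map_entry(out, true, "y", self.y);
        out.push('}');
    }
}

impl ToJson for Foo {
    fn write_json(&self, out: &mut String) {
        // Temporaries built only to satisfy the `T: ToJson` bound on map values.
        let bar = Bar { x: &self.x };
        let baz = Baz { y: &self.y };
        out.push('{');
        map_entry(out, true, "bar", &bar);
        map_entry(out, false, "baz", &baz);
        out.push('}');
    }
}
```

For `Foo { x: 1, y: 2 }` this writes `{"bar":{"x":1},"baz":{"y":2}}`. A closure-taking variant of `map_entry` would let `Foo::write_json` write both nested objects inline and drop `Bar` and `Baz`.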

@oli-obk

Member

oli-obk commented Sep 20, 2016

I wonder if we could simply implement Serialize for FnOnce(Serializer)... that might give you exactly what you want, without any api change on our side.

@withoutboats

Author

withoutboats commented Sep 20, 2016

Yes, I think that would work if there aren't any coherence problems.

@erickt

Member

erickt commented Sep 20, 2016

@withoutboats: hello! Yes, as of right now serde isn't object safe because we found a lot of value in being able to switch serializers and deserializers during the process. I originally did have closures but switched to this visitor pattern way back when our closures had an implicit allocation. I've been meaning to explore how they could help with a modern design.

However, one limitation with this approach is how would you handle the whole deserialization problem? Right now serde_codegen generates a bit of an obnoxious state machine to pull apart fields since they might come back in arbitrary order. You'd end up having to do this yourself to avoid these intermediate structures.

Speaking to your specific example though, this will work with serde today since there's already an implementation for slices (and assuming someone's already implemented the right impls for the Bar type).

struct Foo {
    tag: u32,
    members: Vec<Bar>,
}

impl Serialize for Foo {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
        let mut state = try!(serializer.serialize_map());
        try!(serializer.serialize_map_key(&mut state, "tag"));
        try!(serializer.serialize_map_value(&mut state, &self.tag));
        try!(serializer.serialize_map_key(&mut state, "members"));
        try!(serializer.serialize_map_value(&mut state, &self.members));
        serializer.serialize_map_end(state)
    }
}

Furthermore, with serde_codegen, you can really just do:

#[derive(Serialize)]
struct Foo {
    tag: u32,
    members: Vec<Bar>,
}

And serde will generate the exact same code that I wrote above.

@oli-obk: that's an interesting idea! I haven't thought about doing crazy magic like that.

@withoutboats

Author

withoutboats commented Sep 20, 2016

@erickt hey :-) The original example was a pretty bad explanation of the problem, because you're right - the current implementation works fine for that example. The problem right now is twofold:

  1. If you need to meet a schema that doesn't match your internal representation, you need to create temporaries in order to temporarily restructure your data into the schema form, because you can't provide a serialization which moves through multiple layers.
  2. Serializing heterogenous collections is just not possible right now.

My real use case is an implementation of JSON API. Here is the code in question. Because the included field is a heterogenous collection, I'm forced to temporarily serialize these objects to json::Values.

But if I could have this higher order representation, I could define it this way instead:

trait Wrapper<T: api::Resource> {
     fn included<S: Serializer>(&self) -> Vec<Box<ObjectSafeSerialize<S>>>;
     ...
}

struct ResourceDocument<T: api::Resource> {
     resource: Resource<T>,
}

impl<T> Serialize for ResourceDocument<T> where T: api::Resource, Resource<T>: Wrapper<T> {
    fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
        let included = self.resource.included::<S>();
        if included.is_empty() {
            ...
        } else {
            let mut state = try!(serializer.serialize_map(Some(3)));
            ...
            try!(serializer.serialize_map_key(&mut state, "included"));
            try!(serializer.serialize_map_value(&mut state, &|serializer| {
                let mut state = try!(serializer.serialize_seq(Some(included.len())));
                for resource in included {
                    try!(resource.serialize_seq_elt(serializer, &mut state));
                }
                serializer.serialize_seq_end(state)
            }));
            ...
            serializer.serialize_map_end(state)
        }
    }
}

The current implementation has to construct multiple BTreeMaps, Strings, and so on for each record, whereas this solution would just have a single virtual call and then serialize them directly to the final representation (probably just writing JSON into the buffer for the HttpResponse body).

@oli-obk

Member

oli-obk commented Sep 21, 2016

We are going to need the Serializeable<S> trait (discussed in other issues, and PRs) to pull off the closure trick, because the following obviously can't work.

impl<S, F> Serialize for F
    where S: Serializer,
          F: Fn(&mut S),
{
    #[inline]
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
        where S: Serializer,
    {
        self(serializer)
    }
}
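
For comparison, a sketch of what moving the serializer type onto the trait could look like. The `Serializeable<S>` defined here is a guess at the shape of the trait mentioned above, not its actual definition, and `Serializer`, `Text`, `value_of`, and `demo` are toy stand-ins. With `S` as a trait parameter, the closure impl has an `S` to tie the closure's argument type to, which the impl above lacks because `serialize` introduces its own `S`.

```rust
// Toy stand-in for serde's `Serializer`; an assumption, not the real trait.
pub trait Serializer {
    type Error;
    fn write_str(&mut self, s: &str) -> Result<(), Self::Error>;
}

// Hypothetical trait with the serializer type as a trait parameter.
pub trait Serializeable<S: Serializer> {
    fn serialize_to(&self, serializer: &mut S) -> Result<(), S::Error>;
}

// Any closure over a specific serializer type is serializable to that serializer.
impl<S, F> Serializeable<S> for F
where
    S: Serializer,
    F: Fn(&mut S) -> Result<(), S::Error>,
{
    fn serialize_to(&self, serializer: &mut S) -> Result<(), S::Error> {
        self(serializer)
    }
}

// A concrete toy serializer that appends text to a String.
pub struct Text {
    pub out: String,
}

impl Serializer for Text {
    type Error = ();
    fn write_str(&mut self, s: &str) -> Result<(), ()> {
        self.out.push_str(s);
        Ok(())
    }
}

// Plays the role of `serialize_map_value`: accepts anything serializable to `S`.
pub fn value_of<S, V>(serializer: &mut S, value: &V) -> Result<(), S::Error>
where
    S: Serializer,
    V: Serializeable<S>,
{
    value.serialize_to(serializer)
}

// Passes a closure where a serializable value is expected.
pub fn demo() -> Result<String, ()> {
    let members = vec![1u32, 2, 3];
    let mut text = Text { out: String::new() };
    value_of(&mut text, &|s: &mut Text| -> Result<(), ()> {
        for m in &members {
            s.write_str(&m.to_string())?;
            s.write_str(";")?;
        }
        Ok(())
    })?;
    Ok(text.out)
}
```

Here `demo()` returns `Ok("1;2;3;")`. The sketch only implements `Serializeable<S>` for closures; it does not attempt the additional blanket impl for every `Serialize` type.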
@withoutboats

Author

withoutboats commented Sep 21, 2016

Iiinteresting.

The trade off in the Serialize trait is informative. I definitely think the Serialize<S> form has uses - usually you are only serializing to a single format, so if you can include that in your typing context having heterogeneous collections is very useful. But it will infect everything, including other types and traits which are bound by Serialize.

What we really want here are higher rank type parameters in trait bounds.

trait SerializeTo<S: Serializer> {
    fn serialize_to(&self, serializer: &mut S) -> Result<(), S::Error>;
}

impl<T> Serialize for T where T: for<X: Serializer> SerializeTo<X> {
     fn serialize<S: Serializer>(&self, serializer: &mut S) -> Result<(), S::Error> {
          self.serialize_to(serializer)
     }
}

There are other configurations of this that might be more sensible also; the point is that higher rank type parameters have the advantage of allowing us to move the type parameter up and down scopes essentially arbitrarily.

But barring that feature, I don't know what serde should do. Expanding the API surface is definitely eugh, so I understand if you don't want to support my rather specific use case.

@oli-obk

Member

oli-obk commented Sep 21, 2016

Well. We can simply impl<S: Serializer, T: Serialize> Serializeable<S> for T, but without negative trait bounds or fancy oibits we can't also do an impl for all Fn(&mut Serializer)

@withoutboats

Author

withoutboats commented Sep 21, 2016

Actually, you can, because Fn traits are tagged #[fundamental]. However, this doesn't help make the Fn traits meet the Serialize bound, so it's not helpful.

Scratch that, #[fundamental] makes individual type impls not conflict with the trait impl, not other blanket impls.

dtolnay added the discussion label Sep 23, 2016

@dtolnay

Member

dtolnay commented Sep 29, 2016

@withoutboats

My real use case is an implementation of JSON API. Here is the code in question. Because the included field is a heterogenous collection, I'm forced to temporarily serialize these objects to json::Values.

Does erased-serde make this any better? You can use Vec<Box<erased_serde::Serialize>>.

I think erased_serde::Serialize is just like your ObjectSafeSerialize<S> except it works for any Serializer. Is there any advantage to ObjectSafeSerialize<S>?

@withoutboats

Author

withoutboats commented Sep 29, 2016

@dtolnay ObjectSafeSerialize<S> produces only 1 virtual call for each object, whereas erased_serde will produce significantly more, because every call in the handshake is now passing trait objects back and forth down to the leaf members of these types. This limits the ability for LLVM to optimize this code.

@dtolnay

Member

dtolnay commented Dec 6, 2017

I would love to see this explored in a different serialization library. In the absence of higher rank type parameters in trait bounds, I believe the current approach will continue to serve us well.

dtolnay closed this Dec 6, 2017
