True push-based serialization #768

dtolnay · 2017-02-17T19:40:57Z

Consider a program that receives a stream of integers:

impl<'a> Program<'a> {
    fn new(out: &'a mut dyn io::Write) -> Self {
        /* ... */
    }

    fn write(&mut self, v: u64) -> Result<()> {
        /* ... */
    }

    fn end(self) -> Result<()> {
        /* ... */
    }
}

and writes out a JSON array:

[4751,2389,6529,3343]

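This is fine in Serde 0.9: the constructor can start the sequence with serialize_seq, each call to write can pass its integer to SerializeSeq::serialize_element, and end can finish it with SerializeSeq::end.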
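A rough std-only sketch of why the flat case streams well follows. It is a stand-in under stated assumptions, not Serde code: it hand-formats JSON into any io::Write instead of going through serde_json's SerializeSeq, and new returns io::Result<Self> (an assumption not in the signature above) because writing the opening bracket can fail. The only state kept between calls is a first flag, so nothing is buffered.

```rust
use std::io::{self, Write};

/// Hypothetical std-only stand-in for `Program`: streams a flat JSON array of
/// integers to `out`, writing each value as soon as `write` is called.
pub struct Program<W: Write> {
    out: W,
    first: bool, // true until the first element has been written
}

impl<W: Write> Program<W> {
    pub fn new(mut out: W) -> io::Result<Self> {
        out.write_all(b"[")?;
        Ok(Program { out, first: true })
    }

    pub fn write(&mut self, v: u64) -> io::Result<()> {
        if !self.first {
            self.out.write_all(b",")?;
        }
        self.first = false;
        write!(self.out, "{}", v)
    }

    /// Closes the array and hands the writer back to the caller.
    pub fn end(mut self) -> io::Result<W> {
        self.out.write_all(b"]")?;
        Ok(self.out)
    }
}
```

A single boolean is all the per-sequence state a flat array needs, which is why the one-level case maps cleanly onto a sequence handle stored between calls.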

But if we want to write out pairs of numbers:

[[4751,2389],[6529,3343]]

it is no longer possible to accomplish this in a streaming way with the current API. The program needs to buffer the entire content of each inner array before writing it. The problem is that SerializeSeq::serialize_element and similar functions are pull-based: they pull data out of the Serialize value that you pass, so that entire value must be available at the time of the call.
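A minimal std-only model of the problem, with hypothetical names (ToJson, Seq and PairProgram are illustrations, not Serde types): Seq::element, like serialize_element, pulls a complete value out of its argument, so PairProgram must collect each inner array in a Vec before it can hand it over. With pairs the buffer is tiny, but for inner arrays of unbounded length it holds their entire contents.

```rust
/// Hypothetical pull-style trait: the receiver pulls the entire value out in
/// a single call, so the value must already be complete.
pub trait ToJson {
    fn to_json(&self, out: &mut String);
}

impl ToJson for u64 {
    fn to_json(&self, out: &mut String) {
        out.push_str(&self.to_string());
    }
}

impl ToJson for [u64] {
    fn to_json(&self, out: &mut String) {
        out.push('[');
        for (i, v) in self.iter().enumerate() {
            if i > 0 {
                out.push(',');
            }
            v.to_json(out);
        }
        out.push(']');
    }
}

/// Pull-based sequence, analogous in spirit to `SerializeSeq::serialize_element`.
pub struct Seq {
    out: String,
    first: bool,
}

impl Seq {
    pub fn new() -> Self {
        Seq { out: String::from("["), first: true }
    }

    pub fn element<T: ToJson + ?Sized>(&mut self, v: &T) {
        if !self.first {
            self.out.push(',');
        }
        self.first = false;
        v.to_json(&mut self.out);
    }

    pub fn end(mut self) -> String {
        self.out.push(']');
        self.out
    }
}

/// Writes `[[a,b],[c,d],...]`, but must buffer each inner array first.
pub struct PairProgram {
    seq: Seq,
    pending: Vec<u64>, // the buffered, not-yet-serialized inner array
}

impl PairProgram {
    pub fn new() -> Self {
        PairProgram { seq: Seq::new(), pending: Vec::new() }
    }

    pub fn write(&mut self, v: u64) {
        self.pending.push(v);
        if self.pending.len() == 2 {
            // Only now is the inner array a complete value we can hand over.
            self.seq.element(&self.pending[..]);
            self.pending.clear();
        }
    }

    pub fn end(mut self) -> String {
        if !self.pending.is_empty() {
            // Flush a trailing, unpaired inner array.
            self.seq.element(&self.pending[..]);
        }
        self.seq.end()
    }
}
```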

Let's map out the advantages and possible disadvantages of moving to an entirely push-based serialization model.

dpc · 2017-02-21T05:27:08Z

I was just going to open an issue about this, but I found it's already open. I noticed recently that things changed at some point since I reported #386. (@dtolnay, weren't we talking about it on IRC recently?)

I think it's especially important in light of tokio being a standard Rust async solution. If Rust had gone with coroutines like Go, this wouldn't be a problem: any code blocked on input could yield and resume when more data is available. With an approach based on futures, a state machine has to drive serialization from the outside. Therefore it seems to me that the standard serialization crate must be entirely push-based; otherwise we're going to be in trouble.

Also, the inconsistency is the worst part. Either everything has to be push or everything has to be pull; otherwise it doesn't work well. For the reason explained above, I think push is the right way here.
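For contrast, a fully push-based writer can be sketched as an explicit state machine that the caller drives one event at a time. This is a hypothetical std-only design (PushWriter, PushPairProgram and their methods are illustrative names, not a Serde or serde_json API): all in-progress state is a stack of per-array flags owned by the writer, so a future could keep it in a field between polls and flush partial output without buffering any inner array.

```rust
/// Hypothetical push-based JSON writer. The caller drives it one event at a
/// time, and all in-progress state lives in this struct.
#[derive(Default)]
pub struct PushWriter {
    out: String,
    // One entry per open array: true until that array receives an element.
    stack: Vec<bool>,
}

impl PushWriter {
    fn separator(&mut self) {
        // Inside an array, every element after the first needs a leading comma.
        if let Some(first) = self.stack.last_mut() {
            if !*first {
                self.out.push(',');
            }
            *first = false;
        }
    }

    pub fn begin_seq(&mut self) {
        self.separator();
        self.out.push('[');
        self.stack.push(true);
    }

    pub fn value(&mut self, v: u64) {
        self.separator();
        self.out.push_str(&v.to_string());
    }

    pub fn end_seq(&mut self) {
        self.stack.pop().expect("end_seq without matching begin_seq");
        self.out.push(']');
    }

    /// Takes everything written so far, e.g. to flush it to a socket.
    pub fn take_output(&mut self) -> String {
        std::mem::take(&mut self.out)
    }
}

/// Writes `[[a,b],[c,d],...]` with no buffering: each value is emitted on arrival.
pub struct PushPairProgram {
    w: PushWriter,
    count: u64,
}

impl PushPairProgram {
    pub fn new() -> Self {
        let mut w = PushWriter::default();
        w.begin_seq(); // outer array
        PushPairProgram { w, count: 0 }
    }

    pub fn write(&mut self, v: u64) {
        if self.count % 2 == 0 {
            self.w.begin_seq(); // start a new inner pair
        }
        self.w.value(v);
        if self.count % 2 == 1 {
            self.w.end_seq(); // the pair is complete
        }
        self.count += 1;
    }

    pub fn take_output(&mut self) -> String {
        self.w.take_output()
    }

    pub fn end(mut self) -> String {
        if self.count % 2 == 1 {
            self.w.end_seq(); // close a trailing, unpaired inner array
        }
        self.w.end_seq(); // close the outer array
        self.w.take_output()
    }
}
```

In this sketch, calling take_output right after the first write already yields [[4751, which is exactly the partial output a pull-based element call cannot produce before the pair is complete.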

BTW, the whole push vs. pull thing has been biting me a lot recently in my own code. I should write a blog post about it. I think that in Rust, because it is reasonably easy to express very efficient, zero-copy data handling code, I immediately notice this "inversion of control" (the term usually refers to something different, but I feel it fits well here).

dtolnay · 2017-03-26T17:09:11Z

Here is some possible inspiration:

pub trait Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
}

pub trait Serializer: Sized {
    type Ok;
    type Error;

    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;

    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}

pub trait SerializeSeq {
    type Ok;
    type Error;

    type Element: Serializer<Ok = Self::Ok, Error = Self::Error>;

    fn element(&mut self) -> Self::Element;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

impl<T> Serialize for Vec<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for v in self {
            v.serialize(seq.element())?;
        }
        seq.end()
    }
}

dtolnay · 2017-03-26T18:41:30Z

I think this is going to be blocked on associated type constructors: type Element<'a>.
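Associated type constructors eventually shipped as generic associated types (GATs), stable since Rust 1.65. Below is a minimal sketch of the first design above rewritten with type Element<'a>, so the element serializer can borrow the sequence. It is hypothetical and simplified: the Ok and Error types are dropped, only u64 and sequences are supported, and JsonSer, JsonSeq and write_pairs are illustrative names rather than serde or serde_json items.

```rust
/// Simplified, hypothetical versions of the traits above (no Ok/Error types).
pub trait Serializer: Sized {
    type SerializeSeq: SerializeSeq;
    fn serialize_u64(self, v: u64);
    fn serialize_seq(self) -> Self::SerializeSeq;
}

pub trait SerializeSeq: Sized {
    // The element serializer may borrow the sequence it belongs to.
    type Element<'a>: Serializer
    where
        Self: 'a;
    fn element(&mut self) -> Self::Element<'_>;
    fn end(self);
}

pub trait Serialize {
    fn serialize<S: Serializer>(&self, s: S);
}

impl Serialize for u64 {
    fn serialize<S: Serializer>(&self, s: S) {
        s.serialize_u64(*self);
    }
}

impl<T: Serialize> Serialize for Vec<T> {
    fn serialize<S: Serializer>(&self, s: S) {
        let mut seq = s.serialize_seq();
        for v in self {
            v.serialize(seq.element());
        }
        seq.end();
    }
}

/// Toy JSON serializer that writes into a borrowed String.
pub struct JsonSer<'o> {
    pub out: &'o mut String,
}

pub struct JsonSeq<'o> {
    out: &'o mut String,
    first: bool,
}

impl<'o> Serializer for JsonSer<'o> {
    type SerializeSeq = JsonSeq<'o>;
    fn serialize_u64(self, v: u64) {
        self.out.push_str(&v.to_string());
    }
    fn serialize_seq(self) -> JsonSeq<'o> {
        self.out.push('[');
        JsonSeq { out: self.out, first: true }
    }
}

impl<'o> SerializeSeq for JsonSeq<'o> {
    type Element<'a> = JsonSer<'a> where Self: 'a;
    fn element(&mut self) -> Self::Element<'_> {
        if !self.first {
            self.out.push(',');
        }
        self.first = false;
        // Reborrow the output: the element cannot outlive this borrow of `self`.
        JsonSer { out: &mut *self.out }
    }
    fn end(self) {
        self.out.push(']');
    }
}

/// Push-style use: values arrive one at a time, no `Serialize` impl required.
pub fn write_pairs(out: &mut String, values: &[u64]) {
    let mut outer = JsonSer { out }.serialize_seq();
    for pair in values.chunks(2) {
        let mut inner = outer.element().serialize_seq();
        for &v in pair {
            inner.element().serialize_u64(v);
        }
        inner.end();
    }
    outer.end();
}
```

In write_pairs the inner sequence borrows outer, which works as long as the whole nesting happens inside one function body. Keeping outer and inner side by side in a struct such as Program would make that struct self-referential, which appears to be the borrowck obstacle dpc describes later in the thread.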

dtolnay · 2017-03-26T18:54:25Z

Oh this could be interesting:

pub trait Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
}

pub trait Serializer: Sized {
    type Ok;
    type Error;

    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;

    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}

pub trait SerializeSeq: Sized {
    type Ok;
    type Error;

    type Element: Serializer<Ok = Self, Error = Self::Error>;

    fn element(self) -> Self::Element;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

impl<T> Serialize for Vec<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for element in self {
            seq = element.serialize(seq.element())?;
        }
        seq.end()
    }
}
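The by-value design above needs no associated type constructors, because each element serializer hands the sequence back as its Ok value. A std-only sketch of a toy JSON backend for it follows, assuming simplified traits without Error types; Done, Top, InSeq, JsonSer and JsonSeq are hypothetical helpers, where Done records where the output buffer goes once a value is finished.

```rust
// Simplified, hypothetical versions of the by-value traits above (no Error type).
pub trait Serializer: Sized {
    type Ok;
    type SerializeSeq: SerializeSeq<Ok = Self::Ok>;
    fn serialize_u64(self, v: u64) -> Self::Ok;
    fn serialize_seq(self) -> Self::SerializeSeq;
}

pub trait SerializeSeq: Sized {
    type Ok;
    // Serializing an element yields the sequence back, as in the proposal.
    type Element: Serializer<Ok = Self>;
    fn element(self) -> Self::Element;
    fn end(self) -> Self::Ok;
}

pub trait Serialize {
    fn serialize<S: Serializer>(&self, s: S) -> S::Ok;
}

impl Serialize for u64 {
    fn serialize<S: Serializer>(&self, s: S) -> S::Ok {
        s.serialize_u64(*self)
    }
}

impl<T: Serialize> Serialize for Vec<T> {
    fn serialize<S: Serializer>(&self, s: S) -> S::Ok {
        let mut seq = s.serialize_seq();
        for v in self {
            seq = v.serialize(seq.element());
        }
        seq.end()
    }
}

// Hypothetical continuation: where the buffer goes once a value is finished.
pub trait Done {
    type Ok;
    fn done(self, out: String) -> Self::Ok;
}

// At the top level the finished buffer is the result.
pub struct Top;
impl Done for Top {
    type Ok = String;
    fn done(self, out: String) -> String { out }
}

// Inside a sequence the buffer is handed back to the parent sequence.
pub struct InSeq<K>(K);
impl<K: Done> Done for InSeq<K> {
    type Ok = JsonSeq<K>;
    fn done(self, out: String) -> JsonSeq<K> {
        JsonSeq { out, first: false, k: self.0 }
    }
}

pub struct JsonSer<K> { pub out: String, pub k: K }
pub struct JsonSeq<K> { out: String, first: bool, k: K }

impl<K: Done> Serializer for JsonSer<K> {
    type Ok = K::Ok;
    type SerializeSeq = JsonSeq<K>;
    fn serialize_u64(mut self, v: u64) -> K::Ok {
        self.out.push_str(&v.to_string());
        self.k.done(self.out)
    }
    fn serialize_seq(mut self) -> JsonSeq<K> {
        self.out.push('[');
        JsonSeq { out: self.out, first: true, k: self.k }
    }
}

impl<K: Done> SerializeSeq for JsonSeq<K> {
    type Ok = K::Ok;
    type Element = JsonSer<InSeq<K>>;
    fn element(mut self) -> JsonSer<InSeq<K>> {
        if !self.first {
            self.out.push(',');
        }
        JsonSer { out: self.out, k: InSeq(self.k) }
    }
    fn end(mut self) -> K::Ok {
        self.out.push(']');
        self.k.done(self.out)
    }
}
```

In this sketch every state is an owned value with no borrows, so a Program could keep it in a field between calls. The catch is that the concrete type changes with nesting depth (JsonSeq<Top>, JsonSeq<InSeq<Top>>, and so on), so nesting whose depth is only known at run time would need those types erased or enumerated.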

dtolnay · 2017-03-26T22:38:47Z

@dpc I don't plan to put much more time into this for Serde 1.0. Would you be interested in helping me prototype an approach that works for you?

Here is some playground code with a simplified Serde 0.9 serializer API and two existing use cases to work from.

dpc · 2017-03-27T00:44:47Z

I'm happy to review and comment as much as I can. BTW, isn't this issue exactly what #386 was? I think in the end things settled on having an explicit State-like type. That was implemented in 0.8 and then changed in 0.9.

I've reviewed the playground link and it looks great to me, though I wouldn't trust my review alone for design decisions like this. I'm sure asking for comments on, e.g., Reddit would bring more eyeballs to it. I spoke too soon: I have some questions.

Also, what is the situation with deserialization? I haven't used it much myself, so I'm not sure.

dpc · 2017-03-27T00:55:03Z

I have always had trouble reasoning about trait bounds of the form where T: &'a Something.

dtolnay · 2017-03-27T02:11:29Z

Here is a brief history of #386:

You open "Unable to serialize sequence without knowing all elements" (#386).
You propose a way to support your use case in a comment on #386.
We release the API you came up with in serde 0.8.0.
We adjust the API in serde 0.9.0 for a variety of reasons explained in the release notes.
Both 0.8 and 0.9 (but not 0.7) support push-based serialization of top-level data structures, but not of nested data structures, as explained at the top of this issue.

The playground code is identical to the 0.9 API, so I'm glad it looks great to you. Note that it does not support push-based serialization of nested data structures. I am leaning toward leaving this as is for 1.0 and possibly building format-specific functionality to fill in any gaps. For serde_json this would just be a slightly higher-level version of serde_json::ser::Formatter.

For deserialization I expect people will use tokio-serde which uses FramedRead behind the scenes.

dpc · 2017-03-27T03:55:33Z

After more thought, I don't think this API will work for non-copy streaming in tokio the way I was thinking. Basically, a type implementing Future would need to be able to store arbitrary serialization state inside itself. If that state is split between two objects with a lifetime dependency, borrowck won't allow saving it during that time.

But I guess the deserialization story is more important here, since it's more common to have "half of a vector" as a byte representation that is being deserialized than half of a vector as a Vec that is being serialized. To be honest, I am completely lost with the deserialization API.

I'm sorry for not being much help here. I feel like I'm neither competent enough nor have enough time to help. :D

The optimistic part is that even if serde has an API that requires some buffering, one can always resort to coroutines (as futures) if that becomes a problem.

dtolnay · 2017-12-06T07:00:12Z

It doesn't seem like this is going to happen. As discussed upthread, we can flesh out this functionality in format-specific ways, similar to what serde_json does with serde_json::ser::Formatter.

hjfreyer · 2020-09-11T13:33:42Z

It may be worth reconsidering this, especially in the context of some libraries replacing Read with AsyncRead. See the discussion in serde-rs/json#575.

haydnv · 2021-01-19T17:21:15Z

Folks stumbling across this thread from a Google search may want to take a look at destream, which I wrote by modifying serde::Deserialize and serde::Serialize to support deserializing from and serializing into a Stream: https://docs.rs/destream/

This didn't require that many changes, so it might make sense to add traits like AsyncDeserialize and AsyncSerialize to Serde and have them forward to Deserialize and Serialize by default (if there's a pressing need for async functionality at all).

dtolnay added the discussion label Feb 17, 2017

dtolnay mentioned this issue Feb 17, 2017: 1.0 triage #769 (closed)

conradev mentioned this issue Feb 21, 2017: Tokio #654 (closed)

dtolnay closed this as completed Dec 6, 2017

hjfreyer mentioned this issue Sep 11, 2020: Consider adding async methods serde-rs/json#575 (closed)

c-nixon mentioned this issue Jan 25, 2021: Rewrite rust client types/buffer handling logdna/logdna-rust#2 (merged)

serde-rs locked and limited conversation to collaborators Jan 23, 2022
