Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

True push-based serialization #768

Closed
dtolnay opened this issue Feb 17, 2017 · 12 comments
Closed

True push-based serialization #768

dtolnay opened this issue Feb 17, 2017 · 12 comments

Comments

@dtolnay
Copy link
Member

dtolnay commented Feb 17, 2017

Consider a program that receives a stream of integers:

impl Program {
    fn new(out: &mut io::Write) -> Self {
        /* ... */
    }

    fn write(&mut self, v: u64) -> Result<()> {
        /* ... */
    }

    fn end(self) -> Result<()> {
        /* ... */
    }
}

and writes out a JSON array:

[4751,2389,6529,3343]

This is fine in Serde 0.9.

But if we want to write out pairs of numbers:

[[4751,2389],[6529,3343]]

it is no longer possible to accomplish this in a streaming way with the current API. The program needs to buffer the entire content of each inner array before writing it. The problem is that SerializeSeq::serialize_element and similar functions are pull-based; they pull data out of the Serialize that you pass, and the entire data must be available.

Let's map out the advantages and possible disadvantages of moving to an entirely push-based serialization model.

@dtolnay dtolnay mentioned this issue Feb 17, 2017
@conradev conradev mentioned this issue Feb 21, 2017
@dpc
Copy link
Contributor

dpc commented Feb 21, 2017

I was just going to open an issue about but I've found it's already open. I've noticed recently that things changed at some point, since I reported #386. (@dtolnay, weren't we talking on IRC recently about it)? .

I think it's especially important in the light of tokio being a standard Rust async solution. If Rust went with coroutines like in Go, it wouldn't be a problem for any code that is being blocked on input to yield and resume when more data is available. With approach based on Futures, a state machine has to drive serialization from the outside. Therefore it seems to me that standard serialization crate must be entirely push-based, otherwise we're going to be in trouble.

Also the inconsistency is the worst part. Either everything has to be push, or everything has to be pull. Otherwise it doesn't work well. And for the reason explained above, push is the right way here, I think.

BTW. The whole push vs pull thing has been bitting me a lot recently in my own code. I should write a blogpost / article about it. I think in Rust, due to ability to reasonably easily express very efficient, zero-copy data handling code, I immediately notice this ... "inversion of control" (the term is used for something different, but I feel it fits well here) .

@dtolnay
Copy link
Member Author

dtolnay commented Mar 26, 2017

Here is some possible inspiration:

pub trait Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
}

pub trait Serializer: Sized {
    type Ok;
    type Error;

    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;

    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}

pub trait SerializeSeq {
    type Ok;
    type Error;

    type Element: Serializer<Ok = Self::Ok, Error = Self::Error>;

    fn element(&mut self) -> Self::Element;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

impl<T> Serialize for Vec<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for v in self {
            v.serialize(seq.element())?;
        }
        seq.end()
    }
}

@dtolnay
Copy link
Member Author

dtolnay commented Mar 26, 2017

I think this is going to be blocked on associated type constructors: type Element<'a>.

@dtolnay
Copy link
Member Author

dtolnay commented Mar 26, 2017

Oh this could be interesting:

pub trait Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
}

pub trait Serializer: Sized {
    type Ok;
    type Error;

    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;

    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}

pub trait SerializeSeq: Sized {
    type Ok;
    type Error;

    type Element: Serializer<Ok = Self, Error = Self::Error>;

    fn element(self) -> Self::Element;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

impl<T> Serialize for Vec<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for element in self {
            seq = element.serialize(seq.element())?;
        }
        seq.end()
    }
}

@dtolnay
Copy link
Member Author

dtolnay commented Mar 26, 2017

@dpc I don't plan to put much more time into this for Serde 1.0. Would you be interested in helping me prototype an approach that works for you?

Here is some playground code of a simplified Serde 0.9 serializer API and two existing use cases to work off of.

@dpc
Copy link
Contributor

dpc commented Mar 27, 2017

I'm happy to review and comment as much as I can. BTW. Isn't this issue exactly what #386 was? I think at the end things settled on having an explicit State-like type. That was implemented in 0.8, and then changed into 0.9.

I've reviewed the playground link and it looks great to me. Though I wouldn't trust a review of me alone for such design decisions. I'm sure asking eg. on reddit for comments could bring some eyeballs to it. I spoke too soon. I have some questions.

Also, how are things with deserialization? I haven't really used it too much myself, so I'm not sure.

@dpc
Copy link
Contributor

dpc commented Mar 27, 2017

I have always had trouble reasoning about trait where T : &'a Something.

@dtolnay
Copy link
Member Author

dtolnay commented Mar 27, 2017

Here is a brief history of #386:

The playground code is identical to the 0.9 API so I'm glad it looks great to you. Note that it does not support push-based serialization of nested data structures. I am leaning toward leaving this as is for 1.0 and possibly building format-specific functionality to fill in any gaps. For serde_json this would just be a slightly higher level version of ser::Formatter.

For deserialization I expect people will use tokio-serde which uses FramedRead behind the scenes.

@dpc
Copy link
Contributor

dpc commented Mar 27, 2017

After more thinking, I don't think this API will work for non-copy streaming in tokio the way I was thinking about. Basically, types implementing Future would need to be able to save the arbitrary state inside itself. If it's split between two objects with a lifetime dependency, borrowck won't allow to save it during that time.

But I guess deserialization story is more important here, since it's more common to have "half-of a a vector" as a byte representation that is being deserialized, than half of a vector as a Vec type that is being serialized. And to be honest, I am completely lost with deserialization API.

I'm sorry for not being much of a help here. I feel like I'm neither that competent, nor I have enough time to help here. :D

The optimistic part would be that even if serde has an API that might require some buffering, one can always resort to coroutines (as futures) if that would be a problem.

@dtolnay
Copy link
Member Author

dtolnay commented Dec 6, 2017

It doesn't seem like this is going to happen. As discussed upthread, we can flesh out this functionality in format-specific ways, similar to what serde_json does with serde_json::ser::Formatter.

@hjfreyer
Copy link

It may be worth reconsidering this, especially in the context of some libraries replacing Read with AsyncRead. See discussion in: serde-rs/json#575

@haydnv
Copy link

haydnv commented Jan 19, 2021

Folks stumbling across this thread from a Google search may want to take a look at destream, which I wrote by modifying serde::Deserialize and serde::Serialize to support deserializing from and serializing into a Stream: https://docs.rs/destream/

This didn't require that many changes, so it might make sense to add traits like AsyncDeserialize and AsyncSerialize to Serde and have them forward to Deserialize and Serialize by default (if there's a pressing need for async functionality at all).

@serde-rs serde-rs locked and limited conversation to collaborators Jan 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

No branches or pull requests

4 participants