True push-based serialization #768
Comments
I was just going to open an issue about but I've found it's already open. I've noticed recently that things changed at some point since I reported #386 (@dtolnay, weren't we talking about it on IRC recently?). I think it's especially important in the light of Also the inconsistency is the worst part. Either everything has to be push, or everything has to be pull; otherwise it doesn't work well. And for the reason explained above, push is the right way here, I think. BTW, the whole push vs pull thing has been biting me a lot recently in my own code. I should write a blog post / article about it. I think in Rust, due to the ability to reasonably easily express very efficient, zero-copy data handling code, I immediately notice this ... "inversion of control" (the term is used for something different, but I feel it fits well here). |
Here is some possible inspiration:

pub trait Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
}

pub trait Serializer: Sized {
    type Ok;
    type Error;
    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;
    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}

pub trait SerializeSeq {
    type Ok;
    type Error;
    type Element: Serializer<Ok = Self::Ok, Error = Self::Error>;
    fn element(&mut self) -> Self::Element;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

impl<T> Serialize for Vec<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for v in self {
            v.serialize(seq.element())?;
        }
        seq.end()
    }
} |
I think this is going to be blocked on associated type constructors: |
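For readers arriving later: associated type constructors were stabilized as generic associated types (GATs) in Rust 1.65, so the `element(&mut self)` shape can be written down today. Below is a minimal, std-only sketch of that idea, not serde's API. `Writer`, `Seq`, `serialize_u32`, `to_string` and `push_pairs` are hypothetical names invented for the illustration, and the output format is a toy JSON subset.

```rust
use std::convert::Infallible;

pub trait Serialize {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error>;
}
pub trait Serializer: Sized {
    type Ok;
    type Error;
    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;
    // Hypothetical scalar method, added so the sketch has a leaf value.
    fn serialize_u32(self, v: u32) -> Result<Self::Ok, Self::Error>;
    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}
pub trait SerializeSeq {
    type Ok;
    type Error;
    // The GAT: each element serializer borrows from the sequence for 'a.
    type Element<'a>: Serializer<Ok = Self::Ok, Error = Self::Error> where Self: 'a;
    fn element(&mut self) -> Self::Element<'_>;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

// Toy JSON writer (assumption: only u32 and arrays are supported).
pub struct Writer<'w> { out: &'w mut String }
pub struct Seq<'w> { out: &'w mut String, first: bool }

impl<'w> Serializer for Writer<'w> {
    type Ok = ();
    type Error = Infallible;
    type SerializeSeq = Seq<'w>;
    fn serialize_u32(self, v: u32) -> Result<(), Infallible> {
        let out = self.out;
        out.push_str(&v.to_string());
        Ok(())
    }
    fn serialize_seq(self, _len: Option<usize>) -> Result<Seq<'w>, Infallible> {
        let out = self.out;
        out.push('[');
        Ok(Seq { out, first: true })
    }
}
impl<'w> SerializeSeq for Seq<'w> {
    type Ok = ();
    type Error = Infallible;
    type Element<'a> = Writer<'a> where Self: 'a;
    fn element(&mut self) -> Self::Element<'_> {
        if !self.first { self.out.push(','); }
        self.first = false;
        Writer { out: &mut *self.out }
    }
    fn end(self) -> Result<(), Infallible> {
        let out = self.out;
        out.push(']');
        Ok(())
    }
}

impl Serialize for u32 {
    fn serialize<S: Serializer>(&self, s: S) -> Result<S::Ok, S::Error> { s.serialize_u32(*self) }
}
impl<T: Serialize> Serialize for Vec<T> {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for v in self {
            v.serialize(seq.element())?;
        }
        seq.end()
    }
}

pub fn to_string<T: Serialize>(value: &T) -> String {
    let mut out = String::new();
    value.serialize(Writer { out: &mut out }).unwrap();
    out
}

// Push-based use: nested arrays are written as values arrive, with no inner Vec.
pub fn push_pairs(pairs: impl IntoIterator<Item = (u32, u32)>) -> String {
    let mut out = String::new();
    let mut outer = Writer { out: &mut out }.serialize_seq(None).unwrap();
    for (a, b) in pairs {
        let mut inner = outer.element().serialize_seq(Some(2)).unwrap();
        inner.element().serialize_u32(a).unwrap();
        inner.element().serialize_u32(b).unwrap();
        inner.end().unwrap();
    }
    outer.end().unwrap();
    out
}
```

The borrow in `Element<'a>` is what keeps the caller from touching the outer sequence while an element is still being written. The trade-off is that nothing forces the caller to actually write a value into each element serializer.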
Oh this could be interesting:

pub trait Serialize {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer;
}

pub trait Serializer: Sized {
    type Ok;
    type Error;
    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;
    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}

pub trait SerializeSeq: Sized {
    type Ok;
    type Error;
    type Element: Serializer<Ok = Self, Error = Self::Error>;
    fn element(self) -> Self::Element;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

impl<T> Serialize for Vec<T>
    where T: Serialize
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where S: Serializer
    {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for element in self {
            seq = element.serialize(seq.element())?;
        }
        seq.end()
    }
} |
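To check that the consuming variant (where an element serializer's `Ok` is the sequence itself) can actually be implemented without associated type constructors, here is a rough std-only sketch of one possible backend. It is an assumption-laden toy, not a proposal for serde: `Parent`, `Root`, `InSeq`, `Writer`, `Seq`, `root`, `serialize_u32` and `push_pairs` are all hypothetical names, and only `u32` and arrays are handled.

```rust
use std::convert::Infallible;

pub trait Serialize {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error>;
}
pub trait Serializer: Sized {
    type Ok;
    type Error;
    type SerializeSeq: SerializeSeq<Ok = Self::Ok, Error = Self::Error>;
    // Hypothetical scalar method, added so the sketch has a leaf value.
    fn serialize_u32(self, v: u32) -> Result<Self::Ok, Self::Error>;
    fn serialize_seq(self, len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error>;
}
pub trait SerializeSeq: Sized {
    type Ok;
    type Error;
    type Element: Serializer<Ok = Self, Error = Self::Error>;
    fn element(self) -> Self::Element;
    fn end(self) -> Result<Self::Ok, Self::Error>;
}

// Hypothetical helper: what a finished value hands control back to.
pub trait Parent {
    type Ok;
    fn resume(self, out: String) -> Self::Ok;
}
pub struct Root;        // top level: finishing yields the JSON text
pub struct InSeq<P>(P); // inside a sequence whose own parent is P
pub struct Writer<P> { out: String, parent: P }
pub struct Seq<P> { out: String, parent: P, first: bool }

impl Parent for Root {
    type Ok = String;
    fn resume(self, out: String) -> String { out }
}
impl<P: Parent> Parent for InSeq<P> {
    type Ok = Seq<P>;
    fn resume(self, out: String) -> Seq<P> { Seq { out, parent: self.0, first: false } }
}
impl<P: Parent> Serializer for Writer<P> {
    type Ok = P::Ok;
    type Error = Infallible;
    type SerializeSeq = Seq<P>;
    fn serialize_u32(mut self, v: u32) -> Result<P::Ok, Infallible> {
        self.out.push_str(&v.to_string());
        Ok(self.parent.resume(self.out))
    }
    fn serialize_seq(mut self, _len: Option<usize>) -> Result<Seq<P>, Infallible> {
        self.out.push('[');
        Ok(Seq { out: self.out, parent: self.parent, first: true })
    }
}
impl<P: Parent> SerializeSeq for Seq<P> {
    type Ok = P::Ok;
    type Error = Infallible;
    type Element = Writer<InSeq<P>>;
    fn element(mut self) -> Writer<InSeq<P>> {
        if !self.first { self.out.push(','); }
        Writer { out: self.out, parent: InSeq(self.parent) }
    }
    fn end(mut self) -> Result<P::Ok, Infallible> {
        self.out.push(']');
        Ok(self.parent.resume(self.out))
    }
}
impl Serialize for u32 {
    fn serialize<S: Serializer>(&self, s: S) -> Result<S::Ok, S::Error> { s.serialize_u32(*self) }
}
impl<T: Serialize> Serialize for Vec<T> {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        let mut seq = serializer.serialize_seq(Some(self.len()))?;
        for element in self {
            seq = element.serialize(seq.element())?;
        }
        seq.end()
    }
}

pub fn root() -> Writer<Root> { Writer { out: String::new(), parent: Root } }

// Push-based use: each pair is written as it arrives, never collected first.
pub fn push_pairs(pairs: impl IntoIterator<Item = (u32, u32)>) -> String {
    let mut outer = root().serialize_seq(None).unwrap();
    for (a, b) in pairs {
        let mut inner = outer.element().serialize_seq(Some(2)).unwrap();
        inner = inner.element().serialize_u32(a).unwrap();
        inner = inner.element().serialize_u32(b).unwrap();
        outer = inner.end().unwrap();
    }
    outer.end().unwrap()
}
```

Because `element(self)` consumes the sequence and only a finished element gives it back, the type system enforces that every element is completed before the next one starts. The cost in this sketch is that the nesting depth shows up in the types (`Writer<InSeq<InSeq<Root>>>` and so on).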
@dpc I don't plan to put much more time into this for Serde 1.0. Would you be interested in helping me prototype an approach that works for you? Here is some playground code of a simplified Serde 0.9 serializer API and two existing use cases to work off of. |
I'm happy to review and comment as much as I can. BTW. Isn't this issue exactly what #386 was? I think at the end things settled on having an explicit
Also, how are things with deserialization? I haven't really used it too much myself, so I'm not sure. |
I have always had trouble reasoning about trait where |
Here is a brief history of #386:
The playground code is identical to the 0.9 API so I'm glad it looks great to you. Note that it does not support push-based serialization of nested data structures. I am leaning toward leaving this as is for 1.0 and possibly building format-specific functionality to fill in any gaps. For serde_json this would just be a slightly higher level version of For deserialization I expect people will use |
After more thinking, I don't think this API will work for non-copy streaming in tokio the way I was thinking about. Basically, types implementing But I guess the deserialization story is more important here, since it's more common to have "half of a vector" as a byte representation that is being deserialized than half of a vector as a I'm sorry for not being much of a help here. I feel like I'm neither competent enough nor have enough time to help here. :D The optimistic part would be that even if serde has an API that might require some buffering, one can always resort to coroutines (as futures) if that would be a problem. |
It doesn't seem like this is going to happen. As discussed upthread, we can flesh out this functionality in format-specific ways, similar to what serde_json does with |
It may be worth reconsidering this, especially in the context of some libraries replacing |
Folks stumbling across this thread from a Google search may want to take a look at destream, which I wrote by modifying serde::Deserialize and serde::Serialize to support deserializing from and serializing into a Stream: https://docs.rs/destream/ This didn't require that many changes, so it might make sense to add traits like AsyncDeserialize and AsyncSerialize to Serde and have them forward to Deserialize and Serialize by default (if there's a pressing need for async functionality at all). |
Consider a program that receives a stream of integers:
and writes out a JSON array:
This is fine in Serde 0.9.
But if we want to write out pairs of numbers:
it is no longer possible to accomplish this in a streaming way with the current API. The program needs to buffer the entire content of each inner array before writing it. The problem is that SerializeSeq::serialize_element and similar functions are pull-based; they pull data out of the Serialize that you pass, and the entire data must be available.
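To make the buffering concrete, here is a small std-only model of the pull-based shape. It is a simplified stand-in, not serde's real traits: `SeqWriter`, `write_flat` and `write_pairs` are hypothetical names, and the only thing borrowed from serde is the idea that `serialize_element` takes a reference to a complete `Serialize` value.

```rust
// Simplified stand-in for a pull-based API: a value writes itself out in one go.
pub trait Serialize {
    fn serialize(&self, out: &mut String);
}

pub struct SeqWriter<'w> { out: &'w mut String, first: bool }

impl<'w> SeqWriter<'w> {
    pub fn begin(out: &'w mut String) -> Self {
        out.push('[');
        SeqWriter { out, first: true }
    }
    // Pull-based: the whole element must already exist as a value.
    pub fn serialize_element<T: Serialize>(&mut self, value: &T) {
        if !self.first { self.out.push(','); }
        self.first = false;
        value.serialize(&mut *self.out);
    }
    pub fn end(self) {
        self.out.push(']');
    }
}

impl Serialize for u32 {
    fn serialize(&self, out: &mut String) { out.push_str(&self.to_string()); }
}
impl<T: Serialize> Serialize for Vec<T> {
    fn serialize(&self, out: &mut String) {
        let mut seq = SeqWriter::begin(out);
        for v in self {
            seq.serialize_element(v);
        }
        seq.end();
    }
}

// Flat case: each integer is written as soon as it arrives.
pub fn write_flat(stream: impl Iterator<Item = u32>) -> String {
    let mut out = String::new();
    let mut seq = SeqWriter::begin(&mut out);
    for n in stream {
        seq.serialize_element(&n);
    }
    seq.end();
    out
}

// Nested case: every inner array has to be collected before it can be handed over.
pub fn write_pairs(mut stream: impl Iterator<Item = u32>) -> String {
    let mut out = String::new();
    let mut seq = SeqWriter::begin(&mut out);
    while let Some(a) = stream.next() {
        let mut pair = vec![a];     // the forced buffer
        pair.extend(stream.next()); // second half of the pair, if any
        seq.serialize_element(&pair);
    }
    seq.end();
    out
}
```

With pairs the buffer is tiny, but the same shape applies when the inner arrays are large or unbounded, which is where the pull-based model starts to hurt.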
Let's map out the advantages and possible disadvantages of moving to an entirely push-based serialization model.