Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserializer -> Deserialize Communication #1325

Closed
mitsuhiko opened this issue Jun 28, 2018 · 12 comments
Closed

Deserializer -> Deserialize Communication #1325

mitsuhiko opened this issue Jun 28, 2018 · 12 comments

Comments

@mitsuhiko
Copy link
Contributor

For some more complex parsing situations we're in need of being able to inject some data into some Deserialize calls with data the Deserializer has available. In particular we're wrapping a deserializer similar to serde-ignored does to track paths and to provide access to some data that the deserializer might have.

There is currently no way to exchange some state between the two however. I was thinking about this a bit and was wondering if it makes sense to extend the Deserializer trait to add two new methods to it: get_state<T>() -> Option<&T> which uses an AnyMap internally. That way our wrapping deserializer could have a type Info and then the Deserialize that are interested can call something like if let Some(info) = deserializer.get_state::<Info>() to access it:

use serde::de::{self, Deserialize, Deserializer};

impl<'de, T: Deserialize> Deserialize<'de> for Annotated<T> {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let path = if let Some(info) = deserializer.get_state::<Info>() {
            Some(info.get_path().to_owned())
        } else {
            None
        };

        Annotated {
            value: T::deserialize(deserializer)?,
            path: path
        }
    }
}

Sadly because deserializers are generally consumed on deserialization it means it can only be accessed before parsing. It also means that this data needs to be cloned upon use. So it probably means data needs to be stored in Rc and similar things.

The upside of this implementation would be that it is generally backwards compatible.

@mitsuhiko
Copy link
Contributor Author

I just realized that this does not work the moment buffering is involved as the deserializer wrapper runs ahead of the inner one making this facility effectively useless.

@mitsuhiko
Copy link
Contributor Author

It can be done by storing the state in the content.

@Arnavion
Copy link

Arnavion commented Jul 3, 2018

Why does DeserializeSeed not work for you?

extern crate serde;
#[macro_use] extern crate serde_derive;
extern crate serde_json;

use serde::{ Deserialize, Deserializer, de::DeserializeSeed };

#[derive(Debug)]
struct Annotated<T> {
    value: T,
    path: Option<String>,
}

struct Info {
    path: String,
}

struct InfoSeed<T>(Option<Info>, std::marker::PhantomData<T>);

impl<'de, T> DeserializeSeed<'de> for InfoSeed<T> where T: Deserialize<'de> {
    type Value = Annotated<T>;

    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error> where D: Deserializer<'de> {
        let path = self.0.as_ref().map(|info| info.path.to_owned());
        Ok(Annotated {
            value: Deserialize::deserialize(deserializer)?,
            path,
        })
    }
}

#[derive(Debug, Deserialize)]
struct Event {
    event_id: String,
}

fn main() {
    let json = r#"{
        "event_id": "864ee97977bf43ac96d74f7486d138ab"
    }"#;
    let mut deserializer = serde_json::Deserializer::from_str(json);

    let seed = InfoSeed(Some(Info { path: "foo".to_string() }), Default::default());
    let result: Annotated<Event> = seed.deserialize(&mut deserializer).unwrap();
    println!("{:#?}", result);
}
Annotated {
    value: Event {
        event_id: "864ee97977bf43ac96d74f7486d138ab"
    },
    path: Some(
        "foo"
    )
}

@mitsuhiko
Copy link
Contributor Author

Because DeserializeSeed is not Deserialize and Deserializer does not know about the seed. So your solution only works on one level. We have a tree of Annotated objects that want to use derive.

@quark-zju
Copy link

quark-zju commented Sep 1, 2018

I think I encountered a similar issue. I'm trying to get something similar to Thrift TCompact encoding. That is, using "field_id"s instead of "field_name"s to do efficient and compact serialization. It'd be ideal to use attributes to label field id for each field. But I cannot figure out a way to pass that "field id" information to the Serializer or Deserializer (one-level seems fine, recursively is problematic).

@xoac
Copy link

xoac commented Oct 28, 2018

I saw the serde extension here: https://github.com/Marwes/serde_state. Maybe this will be useful in your case. I would be more happy to support this extension officially or manage with state in other way.

@mitsuhiko
Copy link
Contributor Author

Sadly not. This crate while nice in theory does not really work in practice for the situations we tried. We BTW dropped serde's derives now and use our own traits. We just deserialize into a JSON value and then transmute the data after the fact. While not as efficient and pretty terrible in theory we already needed buffering due to serde-flatten and this approach turns out to be about as fast in practice.

@xoac
Copy link

xoac commented Oct 28, 2018

@mitsuhiko Do you have this implementation available public or it's closed software?

@mitsuhiko
Copy link
Contributor Author

@xoac the serde solution (using a fork for state) is at getsentry/marshal on github. The new one is currently on a private repo. I can open it later.

@dtolnay
Copy link
Member

dtolnay commented Jan 6, 2019

I am closing this as an acknowledged issue that I don't see getting fixed within the design of Serde. Other serialization libraries (like your Serde fork) that make different tradeoffs may be able to fit this use case better.

@dtolnay dtolnay closed this as completed Jan 6, 2019
@mitsuhiko
Copy link
Contributor Author

Does this imply that #1183 is also unlikely to be fixed now? They in a way were both related by the internal buffer.

@dtolnay
Copy link
Member

dtolnay commented Jan 6, 2019

I'll keep #1183 open and give it another attempt as associated type defaults (rust-lang/rust#29661) approaches stabilization. I believe the workaround without associated type defaults could have been made to work but would have been too complicated to maintain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants