Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal buffering disrupts format-specific deserialization features #1183

Open
mitsuhiko opened this issue Mar 18, 2018 · 9 comments
Open
Labels

Comments

@mitsuhiko
Copy link
Contributor

mitsuhiko commented Mar 18, 2018

Was not sure if I should file this against serde_json or here. As far as I can tell the bug is more likely in serde itself. While implementing #1179 I noticed that non string keys are not supported for flattening. However the same behavior happens if buffering happens for internally tagged enums:

Example that shows the problem:

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::collections::HashMap;

#[derive(Deserialize, Debug)]
#[serde(tag = "type")]
enum E {
    M(HashMap<u32, ()>),
}

fn main() {
    let j = r#" {"type": "M", "1": null} "#;
    println!("{}", serde_json::from_str::<E>(j).unwrap_err());
}

This currently fails with this error:

invalid type: string "1", expected u32

However a HashMap<u32, ()> is otherwise entirely permissible by serde_json.

@dtolnay
Copy link
Member

dtolnay commented Mar 19, 2018

There are some other JSON-specific assumptions baked into Content too. For example the representation of Serde enums happens to be what serde_json uses but not necessarily shared by other formats.

I suspect we will need a format-specific optional API on Deserializer for buffering data in a way that is consistent for that format. The default implementation would basically be equivalent to today's Content but formats could override it to use their own representation. Even serde_json may want to switch and use something like serde_json::Value for buffering rather than Content.

We should start by identifying all the ways Content is used in our generated code and generalize those into an API that can be provided by the deserializer.

@dtolnay dtolnay added the bug label Mar 28, 2018
@dtolnay dtolnay changed the title Internal buffering breaks non string types in serde_json Internal buffering breaks non string key types in serde_json Apr 19, 2018
@dtolnay
Copy link
Member

dtolnay commented May 23, 2018

This is complicated by associated type defaults being unstable rust-lang/rust#29661. I thought we might want a format-specific buffer type as a Deserializer associated type with the default being Content, but we will need to find a different way.

@dtolnay
Copy link
Member

dtolnay commented May 23, 2018

Here is a backdoor way to add an associated type.

struct Content;

trait Deserializer {
    // Suppose we want something like this but associated type defaults are unstable.
    type Buffer = Content;
}
trait Deserializer {
    // Instead write it as an associated function that forwards to a generic callback.
    fn deserialize_with_buffering<F>(callback: F) -> F::Value
    where
        F: DeserializeWithBuffer,
    {
        callback.run::<Content>()
    }
}

trait DeserializeWithBuffer {
    type Value;
    fn run<Buffer>(self) -> Self::Value;
}

@kardeiz
Copy link

kardeiz commented Jul 5, 2018

Is there any update on this?

@dtolnay
Copy link
Member

dtolnay commented Aug 11, 2018

Customizing the buffer type by Deserializer could provide a powerful alternative to #1327. The key constraint being satisfied in that approach (as compared to just stashing state in some thread local) was to accurately expose state during deserialization of untagged, internally tagged, and flattened members.

If a Deserializer can replace the buffer type, a library could provide a stateful Deserializer built on TLS by wrapping any existing Deserializer and wrapping that Deserializer's buffer type to correctly track the state.

@dtolnay
Copy link
Member

dtolnay commented Aug 11, 2018

I implemented a proof of concept in #1354.

@dtolnay dtolnay changed the title Internal buffering breaks non string key types in serde_json Internal buffering disrupts format-specific deserialization features Jan 7, 2019
@dtolnay
Copy link
Member

dtolnay commented Jan 7, 2019

Let's give this another attempt as associated type defaults (rust-lang/rust#29661) approaches stabilization. I believe the workaround without associated type defaults in #1354 could have been made to work but would have been too complicated to maintain.

@davepacheco
Copy link

davepacheco commented Aug 4, 2020

Is it possible that this issue explains the behavior described in nox/serde_urlencoded#66? In that case, the author is trying to deserialize the key-value pair bar=123 into a variant B { bar: i32 } in an untagged enum. It seems like this should work. Deserialization fails because "data did not match any variant of untagged enum Q2". When using a similar enum Q1 where the variant is B { bar: String }, it works as expected, with bar = "123".

@dtolnay
Copy link
Member

dtolnay commented Aug 4, 2020

Yeah that looks like a consequence of this issue. #[serde(with = "serde_with::rust::display_fromstr")] on the field (https://docs.rs/serde_with/1.4.0/serde_with/rust/display_fromstr/index.html) would work around it in that case but obviously is not ideal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

4 participants