Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal buffering disrupts format-specific deserialization features #1183

Open
mitsuhiko opened this issue Mar 18, 2018 · 17 comments
Open

Internal buffering disrupts format-specific deserialization features #1183

mitsuhiko opened this issue Mar 18, 2018 · 17 comments
Labels

Comments

@mitsuhiko
Copy link
Contributor

mitsuhiko commented Mar 18, 2018

Was not sure if I should file this against serde_json or here. As far as I can tell the bug is more likely in serde itself. While implementing #1179 I noticed that non string keys are not supported for flattening. However the same behavior happens if buffering happens for internally tagged enums:

Example that shows the problem:

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate serde_json;

use std::collections::HashMap;

#[derive(Deserialize, Debug)]
#[serde(tag = "type")]
enum E {
    M(HashMap<u32, ()>),
}

fn main() {
    let j = r#" {"type": "M", "1": null} "#;
    println!("{}", serde_json::from_str::<E>(j).unwrap_err());
}

This currently fails with this error:

invalid type: string "1", expected u32

However a HashMap<u32, ()> is otherwise entirely permissible by serde_json.

@dtolnay
Copy link
Member

dtolnay commented Mar 19, 2018

There are some other JSON-specific assumptions baked into Content too. For example the representation of Serde enums happens to be what serde_json uses but not necessarily shared by other formats.

I suspect we will need a format-specific optional API on Deserializer for buffering data in a way that is consistent for that format. The default implementation would basically be equivalent to today's Content but formats could override it to use their own representation. Even serde_json may want to switch and use something like serde_json::Value for buffering rather than Content.

We should start by identifying all the ways Content is used in our generated code and generalize those into an API that can be provided by the deserializer.

@dtolnay dtolnay added the bug label Mar 28, 2018
@dtolnay dtolnay changed the title Internal buffering breaks non string types in serde_json Internal buffering breaks non string key types in serde_json Apr 19, 2018
@dtolnay
Copy link
Member

dtolnay commented May 23, 2018

This is complicated by associated type defaults being unstable rust-lang/rust#29661. I thought we might want a format-specific buffer type as a Deserializer associated type with the default being Content, but we will need to find a different way.

@dtolnay
Copy link
Member

dtolnay commented May 23, 2018

Here is a backdoor way to add an associated type.

struct Content;

trait Deserializer {
    // Suppose we want something like this but associated type defaults are unstable.
    type Buffer = Content;
}
trait Deserializer {
    // Instead write it as an associated function that forwards to a generic callback.
    fn deserialize_with_buffering<F>(callback: F) -> F::Value
    where
        F: DeserializeWithBuffer,
    {
        callback.run::<Content>()
    }
}

trait DeserializeWithBuffer {
    type Value;
    fn run<Buffer>(self) -> Self::Value;
}

@kardeiz
Copy link

kardeiz commented Jul 5, 2018

Is there any update on this?

@dtolnay
Copy link
Member

dtolnay commented Aug 11, 2018

Customizing the buffer type by Deserializer could provide a powerful alternative to #1327. The key constraint being satisfied in that approach (as compared to just stashing state in some thread local) was to accurately expose state during deserialization of untagged, internally tagged, and flattened members.

If a Deserializer can replace the buffer type, a library could provide a stateful Deserializer built on TLS by wrapping any existing Deserializer and wrapping that Deserializer's buffer type to correctly track the state.

@dtolnay
Copy link
Member

dtolnay commented Aug 11, 2018

I implemented a proof of concept in #1354.

@dtolnay dtolnay changed the title Internal buffering breaks non string key types in serde_json Internal buffering disrupts format-specific deserialization features Jan 7, 2019
@dtolnay
Copy link
Member

dtolnay commented Jan 7, 2019

Let's give this another attempt as associated type defaults (rust-lang/rust#29661) approaches stabilization. I believe the workaround without associated type defaults in #1354 could have been made to work but would have been too complicated to maintain.

@davepacheco
Copy link

Is it possible that this issue explains the behavior described in nox/serde_urlencoded#66? In that case, the author is trying to deserialize the key-value pair bar=123 into a variant B { bar: i32 } in an untagged enum. It seems like this should work. Deserialization fails because "data did not match any variant of untagged enum Q2". When using a similar enum Q1 where the variant is B { bar: String }, it works as expected, with bar = "123".

@dtolnay
Copy link
Member

dtolnay commented Aug 4, 2020

Yeah that looks like a consequence of this issue. #[serde(with = "serde_with::rust::display_fromstr")] on the field (https://docs.rs/serde_with/1.4.0/serde_with/rust/display_fromstr/index.html) would work around it in that case but obviously is not ideal.

MaxFangX pushed a commit to lexe-app/lexe-public that referenced this issue Aug 4, 2023
It turns out that the `PaymentIndex` in e.g. `get_new_payments` was
silently getting dropped due to a `#[serde(flatten)]` + query string
limitation.

For query strings, using `#[serde(flatten)]` on a field requires that
all inner fields must be string-ish types (&str, String, Cow<'_, str>,
etc...) OR use `SerializeDisplay` and `DeserializeFromStr`.

This issue is due to a limitation in `serde`. See:
<serde-rs/serde#1183>

The fix here is to remove `#[serde(flatten)]` for querystring
serializeable types (actually just `PaymentIndex`) and instead use
`SerializeDisplay` and `DeserializeFromStr`
@espindola
Copy link

espindola commented Sep 5, 2023

This might be related: dtolnay/serde-yaml#388

Yaml can treat a value of 42 as a string. This works without flatten, but fails with it.

Stepping on the code in gdb, the interesting transition seems to be:

#0  serde::__private::de::content::{impl#15}::deserialize_string<serde_yaml::error::Error, serde::de::impls::StringVisitor> (self=..., visitor=...)
    at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/private/de.rs:1260                  
#1  0x000055555556516d in serde::de::impls::{impl#8}::deserialize<serde::__private::de::content::ContentDeserializer<serde_yaml::error::Error>> (deserializer=...)
    at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/de/impls.rs:582
#2  0x00005555555681f0 in serde::de::{impl#5}::deserialize<alloc::string::String, serde::__private::de::content::ContentDeserializer<serde_yaml::error::Error>> (
    deserializer=...) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/de/mod.rs:794
#3  0x0000555555571b68 in serde::__private::de::{impl#10}::next_value_seed<serde_yaml::error::Error, core::marker::PhantomData<alloc::string::String>> (
    self=0x7fffffffd340, seed=...) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/private/de.rs:2814
#4  0x00005555555716c6 in serde::de::MapAccess::next_value<serde::__private::de::FlatStructAccess<serde_yaml::error::Error>, alloc::string::String> (
    self=0x7fffffffd340) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/de/mod.rs:1865
#5  0x0000555555563867 in serde_yaml::_::{impl#0}::deserialize::{impl#2}::visit_map<serde::__private::de::FlatStructAccess<serde_yaml::error::Error>> (__map=...)
    at src/main.rs:3  
#6  0x0000555555571e31 in serde::__private::de::{impl#8}::deserialize_struct<serde_yaml::error::Error, serde_yaml::_::{impl#0}::deserialize::__Visitor> (self=..., 
    fields=&[&str](size=1) = {...}, visitor=...) at /home/espindola/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.188/src/private/de.rs:2674
#7  0x0000555555563300 in serde_yaml::_::{impl#0}::deserialize<serde::__private::de::FlatMapDeserializer<serde_yaml::error::Error>> (__deserializer=...)
    at src/main.rs:3
#8  0x0000555555564d80 in serde_yaml::_::{impl#0}::deserialize::{impl#2}::visit_map<&mut serde_yaml::de::MapAccess> (__map=0x7fffffffd6d0) at src/main.rs:8

We go from using deserialize_string from serde_yaml, to using one from serde, which doesn't have the required logic.

@juntyr
Copy link

juntyr commented Sep 9, 2023

Let's give this another attempt as associated type defaults (rust-lang/rust#29661) approaches stabilization. I believe the workaround without associated type defaults in #1354 could have been made to work but would have been too complicated to maintain.

It seems like rust-lang/rust#29661 still has a way to go, so in case #![feature(return_position_impl_trait_in_trait)] is stabilized earlier, perhaps something like the following could also work:

pub trait Deserializer<'de>: Sized {
    /* ... */

    fn deserialize_buffered(self) ->
        Result<impl Clone + Deserializer<'de>, Self::Error>
    {
        crate::__private::de::Content::deserialize(self)
    }
}

Are there other options that could perhaps be used to provide this feature earlier, since several serde-supporting formats have to battle buffering-based issues until then?

@juntyr
Copy link

juntyr commented Oct 15, 2023

@dtolnay Since return impl trait in traits will now be stabilised soon, do you think that using it to solve this issue would be an avenue worth pursuing?

@YuhanunCitgez
Copy link

Any update on this matter? Anything we can do to assist?

@docwilco
Copy link

I'll repeat the last comment:

Any update on this matter? Anything we can do to assist?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

10 participants