Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More permissive support for sequence -> struct deser #2432

Closed
alexheretic opened this issue Apr 25, 2023 · 5 comments
Closed

More permissive support for sequence -> struct deser #2432

alexheretic opened this issue Apr 25, 2023 · 5 comments

Comments

@alexheretic
Copy link

alexheretic commented Apr 25, 2023

It is possible to deserialize sequence data into a struct. E.g.

#[derive(Debug, Deserialize)]
struct A {
    number: i32,
    text: String,
    _dont: IgnoredAny,
    _really: IgnoredAny,
    _care: IgnoredAny,
}

let json = r#"[1234,"hello",[],345.234,true]"#;

let data: A = serde_json::from_str(json).unwrap();
// A { number: 1234, text: "hello", _dont: IgnoredAny, _really: IgnoredAny, _care: IgnoredAny }

However, the number of fields are required to match the sequence length. This has a couple of consequences:

  • Each sequence item I'm not interested in must be explicitly ignored.
  • If the sequence API changes and adds another item this will break deserialization.

Ideally I'd like a way of ignoring the tail.

#[derive(Debug, Deserialize)]
struct A {
    number: i32,
    text: String,
    _tail: IgnoredSequenceTail,
}

And json sequences [1234,"hello",[],345.234,true] & [1234,"hello",[],345.234,true,123] would work.

Example: Implementing the trait for this behaviour is fairly verbose
#[derive(Debug)]
struct A {
    number: i32,
    text: String,
}

impl<'de> Deserialize<'de> for A {
    fn deserialize<D>(deserializer: D) -> Result<A, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct SeqVisitor;

        impl<'de> Visitor<'de> for SeqVisitor {
            type Value = A;

            fn expecting(&self, formatter: &mut Formatter) -> std::fmt::Result {
                formatter.write_str("An array [number, text, ...]")
            }

            fn visit_seq<SeqAccessor>(
                self,
                mut seq: SeqAccessor,
            ) -> Result<Self::Value, SeqAccessor::Error>
            where
                SeqAccessor: SeqAccess<'de>,
            {
                let number = seq
                    .next_element::<i32>()?
                    .ok_or_else(|| serde::de::Error::missing_field("number"))?;
                let text = seq
                    .next_element::<String>()?
                    .ok_or_else(|| serde::de::Error::missing_field("text"))?;
                while seq.next_element::<IgnoredAny>()?.is_some() {}
                Ok(A { number, text })
            }
        }

        deserializer.deserialize_seq(SeqVisitor)
    }
}

Is there, or could there be, a way of achieving this without implementing Deserialize for A?

@Mingun
Copy link
Contributor

Mingun commented Apr 29, 2023

Your use-case could be solved by using #[serde(flatten)] for IgnoredAny:

#[derive(Debug, Deserialize)]
struct A {
    number: i32,
    text: String,

    #[serde(flatten)]
    _extra: IgnoredAny,
}

...if not for several obstacles:

  • IgnoredAny cannot be flatten:
    deserialize_ignored_any()

    forwarded to

    serde/serde/src/private/de.rs

    Lines 2704 to 2706 in f583401

    fn deserialize_other<V>() -> Result<V, E> {
    Err(Error::custom("can only flatten structs and maps"))
    }
  • Instead you could use HashMap<IgnoredAny, IgnoredAny>, but IgnoredAny doesn't implement neither Hash, PartialEq, Eq or Ord (so BTreeMap also cannot be used)
  • So you need to use HashMap<Any, IgnoredAny>, where
    #[derive(Debug, Deserialize, Clone, Copy)]
    #[serde(transparent)]
    struct Any(IgnoredAny);
    impl PartialEq for Any {
        fn eq(&self, _: &Any) -> bool { true }
    }
    impl Eq for Any {}
    impl Hash for Any {
        fn hash<H>(&self, h: &mut H) where H: Hasher { ().hash(h) }
    }
  • This is slightly inefficient because you'll forced to create a HashMap. You could implement you own Blackhole type as a copy of IgnoredAny, but use deserialize_map instead of deserialize_ignored_any so it can be flattened
  • Unfortunately, all this is useless because structs with flatten fields cannot be deserialized from sequences... because flatten support is badly designed and not properly tested.

Mingun added a commit to Mingun/serde that referenced this issue Apr 30, 2023
failures (6):
    regression::issue2432::deny_unknown_fields::seq
    regression::issue2432::deny_unknown_fields::tuple
    regression::issue2432::deny_unknown_fields::tuple_struct
    regression::issue2432::no_deny_unknown_fields::seq
    regression::issue2432::no_deny_unknown_fields::tuple
    regression::issue2432::no_deny_unknown_fields::tuple_struct
@dtolnay
Copy link
Member

dtolnay commented May 5, 2023

I think this use case would be a better fit for a helper library, as opposed to changing how permissive everybody's derived impls are. I've found usually the fact that some formats (e.g. serde_json, but not serde_yaml) allow deserializing serde structs from an array is already surprising for people, even with only matching lengths. I think accepting arbitrarily long sequences of other data would be worse / more often not what you want.

Depending on your use case, an appropriate implementation might look something like:

#[derive(Deserialize)]
#[serde(remote = "Self")]
struct A {
    number: i32,
    text: String,
}

impl<'de> Deserialize<'de> for A {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        ignore_tail(deserializer, Self::deserialize)
    }
}

or a derive macro which produces the above. And/or something you could use with serde(deserialize_with = "ignore_tail") inside of the surrounding data structure.

@dtolnay dtolnay closed this as completed May 5, 2023
@Mingun
Copy link
Contributor

Mingun commented May 5, 2023

I've found usually the fact that some formats (e.g. serde_json, but not serde_yaml) allow deserializing serde structs from an array is already surprising for people, even with only matching lengths.

Then maybe change Deserialize derive and do not allow deserializing from sequences for maps? First of all, why is it even allowed? If it makes sense, it's worth at least documenting it in the code

@dtolnay
Copy link
Member

dtolnay commented May 5, 2023

Whether it is allowed is a property of the format just as much as the Deserialize impl. Both need to support it. You can write Deserialize impls that uniformly only deserialize from a map, even in data formats that would ordinarily support deserializing structs from an array (like serde_json). And you can write data formats that uniformly do not allow deserializing any struct from an array (serde_yaml does this).

Regarding why derived Deserialize impls have that functionality and leave it up to the data format whether to use it: it's because formats like postcard, where field names do not exist, can only deserialize structs by feeding them a list of field data in order. If derived Deserialize impls did not support receiving a list of fields in order without field names, you could not deserialize them from postcard and other compact binary formats.

@alexheretic
Copy link
Author

Thanks for the suggestion. But I'm still not sure how I implement ignore_tail to work this way

impl<'de> Deserialize<'de> for A {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        ignore_tail(deserializer, Self::deserialize)
    }
}

Mingun added a commit to Mingun/serde that referenced this issue Jun 9, 2023
failures (6):
    regression::issue2432::deny_unknown_fields::seq
    regression::issue2432::deny_unknown_fields::tuple
    regression::issue2432::deny_unknown_fields::tuple_struct
    regression::issue2432::no_deny_unknown_fields::seq
    regression::issue2432::no_deny_unknown_fields::tuple
    regression::issue2432::no_deny_unknown_fields::tuple_struct
Mingun added a commit to Mingun/serde that referenced this issue Jul 23, 2023
failures (6):
    regression::issue2432::deny_unknown_fields::seq
    regression::issue2432::deny_unknown_fields::tuple
    regression::issue2432::deny_unknown_fields::tuple_struct
    regression::issue2432::no_deny_unknown_fields::seq
    regression::issue2432::no_deny_unknown_fields::tuple
    regression::issue2432::no_deny_unknown_fields::tuple_struct
Mingun added a commit to Mingun/serde that referenced this issue Jul 28, 2023
failures (6):
    regression::issue2432::deny_unknown_fields::seq
    regression::issue2432::deny_unknown_fields::tuple
    regression::issue2432::deny_unknown_fields::tuple_struct
    regression::issue2432::no_deny_unknown_fields::seq
    regression::issue2432::no_deny_unknown_fields::tuple
    regression::issue2432::no_deny_unknown_fields::tuple_struct
Mingun added a commit to Mingun/serde that referenced this issue Aug 4, 2023
failures (6):
    regression::issue2432::deny_unknown_fields::seq
    regression::issue2432::deny_unknown_fields::tuple
    regression::issue2432::deny_unknown_fields::tuple_struct
    regression::issue2432::no_deny_unknown_fields::seq
    regression::issue2432::no_deny_unknown_fields::tuple
    regression::issue2432::no_deny_unknown_fields::tuple_struct
Mingun added a commit to Mingun/serde that referenced this issue Jul 27, 2024
failures (6):
    regression::issue2432::deny_unknown_fields::seq
    regression::issue2432::deny_unknown_fields::tuple
    regression::issue2432::deny_unknown_fields::tuple_struct
    regression::issue2432::no_deny_unknown_fields::seq
    regression::issue2432::no_deny_unknown_fields::tuple
    regression::issue2432::no_deny_unknown_fields::tuple_struct
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants