Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix identifier deserialization from non-u32 #963

Merged
merged 1 commit into from
Sep 9, 2017

Conversation

sfackler
Copy link
Contributor

Closes #962

@clarfonthey
Copy link
Contributor

Maybe @dtolnay can talk about why u32 was decided over u64?

@erickt
Copy link
Member

erickt commented Jun 22, 2017

This looks good to me. If I recall correctly, I think the compiler represents c-like variants as u32, so we just were expecting that. I'm not sure if there's any real cost here to casting up to u64. If we really wanted to be fancy we could generate methods without the cast, but I bet the compiler can optimize all this away.

@dtolnay
Copy link
Member

dtolnay commented Jun 23, 2017

(Responding jointly to this and #961.) I don't have a strong opinion on these, but there are some downsides.

  • Together these PRs make it easy for a deserializer to accidentally support field indices, even though that may not be desirable in that format. For instance if we merge both PRs, suddenly serde_yaml can deserialize "{1:B,0:A}" into struct S { a: String, b: String }. I don't think YAML would want to support field indices the way CBOR does.

    Forcing deserializers to write code to opt out of field indices, rather than code to opt in, isn't great. I think CBOR's usage is unusual and should not leak into other formats so mindlessly.

  • There is some value in using a consistent type for variant indices across the serializer and deserializer APIs. The u32 that you pass to Serializer::serialize_struct_variant is the u32 that you can expect to get back in the visitor. People will see the visit_u64 in generated code, write deserializers that call visit_u64, and now variant indices are u32 on the way in and u64 on the way out which is gross.

pyfisch/cbor#37 is unfortunate but is that a real use case? I would just say packed CBOR does not support skip_serializing_if, as we do with Bincode.

@sfackler
Copy link
Contributor Author

serde_yaml can already deserialize [B,A] into struct S { a: String, b: String } - this isn't really any weirder than that IMO.

It's also gross for deserializers to have to remember that deserialize_identifier was called and turn u8/u16/u64 into u32 rather than just calling straight into visit_u8/visit_u16/visit_u64 like it normally would.

The only reason to use a map for the packed encoding is so that you can skip fields. Otherwise you'd just serialize the fields into an array.

@sfackler
Copy link
Contributor Author

The behavior here is also equivalent to how the deserializers for every numeric type work, right?

@Phaiax
Copy link

Phaiax commented Jul 5, 2017

There are binary formats that do not keep the size of an int.
https://github.com/3Hren/msgpack-rust/blob/master/rmp/src/encode/uint.rs#L138

@sfackler
Copy link
Contributor Author

@dtolnay ping - again, this is just making the behavior for identifiers match that of other integer types.

@sfackler
Copy link
Contributor Author

sfackler commented Sep 4, 2017

This is what's required to work around this in the case of CBOR, for reference:

macro_rules! forward_visit {
    ($name:ident, bound $bound:path) => {
        fn $name<A>(self, a: A) -> result::Result<V::Value, A::Error>
        where
            A: $bound,
        {
            (self.0).$name(a)
        }
    };
    ($name:ident, $ty:ty) => {
        fn $name<E>(self, v: $ty) -> result::Result<V::Value, E>
        where
            E: de::Error,
        {
            (self.0).$name(v)
        }
    };
    ($name:ident) => {
        fn $name<E>(self) -> result::Result<V::Value, E>
        where
            E: de::Error,
        {
            (self.0).$name()
        }
    }
}

macro_rules! cast_visit {
    ($name:ident, $ty:ty) => {
        fn $name<E>(self, v: $ty) -> result::Result<V::Value, E>
            where E: de::Error
        {
            self.visit_u32(v as u32)
        }
    }
}

// identifiers only deserialize from u32, but we compact down to u8/u16 during
// serialization
struct IdentifierVisitor<V>(V);

impl<'de, V> de::Visitor<'de> for IdentifierVisitor<V>
where
    V: de::Visitor<'de>,
{
    type Value = V::Value;

    fn expecting(&self, fmt: &mut fmt::Formatter) -> fmt::Result {
        self.0.expecting(fmt)
    }

    forward_visit!(visit_bool, bool);
    forward_visit!(visit_i8, i8);
    forward_visit!(visit_i16, i16);
    forward_visit!(visit_i32, i32);
    forward_visit!(visit_i64, i64);
    cast_visit!(visit_u8, u8);
    cast_visit!(visit_u16, u16);
    forward_visit!(visit_u32, u32);
    forward_visit!(visit_u64, u64);
    forward_visit!(visit_f32, f32);
    forward_visit!(visit_f64, f64);
    forward_visit!(visit_char, char);
    forward_visit!(visit_str, &str);
    forward_visit!(visit_borrowed_str, &'de str);
    forward_visit!(visit_string, String);
    forward_visit!(visit_bytes, &[u8]);
    forward_visit!(visit_borrowed_bytes, &'de [u8]);
    forward_visit!(visit_byte_buf, Vec<u8>);
    forward_visit!(visit_none);
    forward_visit!(visit_some, bound de::Deserializer<'de>);
    forward_visit!(visit_unit);
    forward_visit!(visit_newtype_struct, bound de::Deserializer<'de>);
    forward_visit!(visit_seq, bound de::SeqAccess<'de>);
    forward_visit!(visit_map, bound de::MapAccess<'de>);
    forward_visit!(visit_enum, bound de::EnumAccess<'de>);
}

@dtolnay dtolnay merged commit 8e86942 into serde-rs:master Sep 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants