Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record serialization is sensitive to order of fields in struct #47

Open
collosi opened this issue Jul 25, 2018 · 8 comments · Fixed by apache/avro#1650
Open

Record serialization is sensitive to order of fields in struct #47

collosi opened this issue Jul 25, 2018 · 8 comments · Fixed by apache/avro#1650

Comments

@collosi
Copy link
Contributor

collosi commented Jul 25, 2018

I'm not sure this is not intended behavior, but I just got tripped up by this so I thought I'd point it out. If you declare a schema like this:

{
  "type": "record",
  "name": "my_record",
  "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "string"},
  ]
}

and then try to de/serialize it into a rust struct of this type:

#[derive(Deserialize, Serialize)]
pub struct MyRecord {
  b: String,
  a: i64,
}

You will get an error, "value does not match schema". I'm not sure that should be an error, seems like there's an obvious way to translate from one to another.

The operation that is erroring for me is:

to_avro_datum(&record_scheme, to_value(record)?)?

@flavray
Copy link
Owner

flavray commented Jul 26, 2018

Hmm good point, thanks for raising that. I feel like it should still be an error, as for these two different schemas:

"fields": [
  {"name": "a", "type": "long"},
  {"name": "b", "type": "string"},
]

and

"fields": [
  {"name": "b", "type": "string"},
  {"name": "a", "type": "long"},
]

Avro will encode them differently, and schema resolution will not succeed if writing from one and reading from the other. 🙂


Another limitation is that we currently do not rely on the schema to transform a Serialize-able struct into a Value. This could (probably) be changed if it is really needed, but would make the code a bit less straightforward.

@collosi
Copy link
Contributor Author

collosi commented Jul 26, 2018

Yeah, I understand the point about two differing schema, but I think it may be confusing for people coming from the Rust community where I don't think order of fields on a struct ever matters. I don't know much about other implementations of Serde serializers, but if, for example, many other implementations where able to handle this case, it might be a sticking point.

Either way, I don't think its critical, once you understand what's going on, it's easy to handle. Maybe a note in the documentation somewhere?

@samschlegel
Copy link

samschlegel commented Oct 30, 2018

Avro fields are resolved by name not order, so reordering field order is completely legal according to the Schema Resolution rules. I haven't looked into avro-rs at all so I'm not sure how much work this would be, but this is how it's done in Java for reference. We generate a lot of our Avro schemas, so this is a crucial feature in practice.

EDIT: Also here are their parsing and resolution docs https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/io/parsing/doc-files/parsing.html

@Pritesh-Patel
Copy link

Pritesh-Patel commented Feb 23, 2019

I just ran into this - was pretty confusing, if the plan is to stick this - it would be helpful if it was mentioned somewhere.

@codehearts
Copy link
Contributor

I've changed the record validation loop to this just to remove the ordering constraint, although this implementation isn't the cleanest. This issue makes it difficult to enforce the presence of fields with a common Event struct when pre-existing schemas may not be in the same order, which is something I've been dealing with

            use std::collections::HashMap;
            if fields.len() == record_fields.len() {
                let data_fields: HashMap<String, Value> = record_fields.iter().cloned().collect();

                fields
                        .iter()
                        .try_for_each(|field| {
                            if let Some(value) = data_fields.get(&field.name) {
                                validate(value, &field.schema)
                            } else {
                                Err(Error::from(format!(
                                    "Expected field named {} but could not find it",
                                    field.name
                                )))
                            }
                        })
// …

@mirosval
Copy link

It appears that this is also a problem when you want to use #[serde(flatten)] as the Struct that gets spliced will mess up the order of the parent struct, fascinating.

@martin-g
Copy link

A PR with a fix at apache/avro#1650

@ctb
Copy link

ctb commented Jan 1, 2023

Looks like apache/avro#1650 was merged - should this be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants