Avro schema: Invalid record names #1269

Samrose-Ahmed · 2022-10-07T19:36:11Z

Currently, when mapping an Arrow schema to an Avro schema, a struct type is mapped to an Avro record with the empty string ("") as the record name.

See https://github.com/jorgecarleitao/arrow2/blob/main/src/io/avro/write/schema.rs#L54

This is incorrect. Record names in Avro cannot be empty and cannot be repeated across different records.

So, currently, Avro files written by arrow2 that contain struct types cannot be deserialized.

The schema conversion would have to be modified to use a real record name, probably the field name.

jorgecarleitao · 2022-10-12T04:38:10Z

Do you know what would be a good or standard name here?

Samrose-Ahmed · 2022-10-13T14:19:16Z

A simple way would be to just generate names with an auto-incrementing integer (like r1, r2, ..., r43). I don't think the Avro record names actually matter in the context of Arrow (e.g. in Avro, iirc, they can be used for stuff like aliasing, so they have to be unique).
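To make the counter idea concrete, here is a minimal sketch. `RecordNamer`, `ToyType`, and `assign_names` are hypothetical names for illustration only. They are not arrow2 APIs, and this is not necessarily how the eventual fix was written. `ToyType` is a stand-in for a nested Arrow data type, so only the traversal and the naming matter.

```rust
/// Hands out unique record names of the form r1, r2, ...
/// Hypothetical helper, not part of arrow2.
#[derive(Debug, Default)]
pub struct RecordNamer {
    counter: usize,
}

impl RecordNamer {
    /// Returns the next unused name.
    pub fn next_name(&mut self) -> String {
        self.counter += 1;
        format!("r{}", self.counter)
    }
}

/// Toy stand-in for a nested data type; only the shape matters here.
pub enum ToyType {
    Int,
    Struct(Vec<(String, ToyType)>),
}

/// Assigns one name per struct, depth first, parent before children,
/// and appends each assigned name to `out`.
pub fn assign_names(ty: &ToyType, namer: &mut RecordNamer, out: &mut Vec<String>) {
    if let ToyType::Struct(fields) = ty {
        out.push(namer.next_name());
        for (_field_name, child) in fields {
            assign_names(child, namer, out);
        }
    }
}
```

Because a single counter is threaded through the whole traversal, every record in one schema gets a distinct, non-empty name regardless of how the fields are named.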

Otherwise, a perhaps more 'proper' but more involved solution would be to take the full path of each nested record from its fields (a_b_c_d) and generate an Avro name from it. This would require some more work to sanitize the names, since Avro names can contain only alphanumeric characters and '_', and must start with a letter or '_' (e.g. see)
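A rough sketch of the path-based alternative, for comparison. `avro_name_from_path` is a hypothetical helper, not an arrow2 function. It assumes the name grammar from the Avro specification: ASCII letters, digits, and '_', with a first character that is a letter or '_'. Disallowed characters are replaced with '_', and a leading '_' is added when the first character would otherwise be invalid.

```rust
/// Builds an Avro-compatible record name from a path of nested field names.
/// Hypothetical helper, not part of arrow2.
pub fn avro_name_from_path(path: &[&str]) -> String {
    // Join the path segments, e.g. ["a", "b"] becomes "a_b".
    let joined = path.join("_");

    // Replace every character outside [A-Za-z0-9_] with '_'.
    let mut name: String = joined
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();

    // The first character must be a letter or '_'; this also covers the empty case.
    let starts_ok = name
        .chars()
        .next()
        .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
    if !starts_ok {
        name.insert(0, '_');
    }
    name
}
```

Note that sanitizing alone does not guarantee uniqueness: the paths `["a-b"]` and `["a_b"]` both map to `a_b`, so a real implementation along these lines would still need some deduplication step.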

Samrose-Ahmed · 2022-10-21T17:46:26Z

I took a stab at a PR following the first approach. Let me know what you think.

jorgecarleitao added the bug label Oct 12, 2022

Samrose-Ahmed mentioned this issue Oct 21, 2022

Added avro record names when converting arrow schema to avro #1279

Merged

jorgecarleitao closed this as completed in #1279 Nov 23, 2022
