This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
A simple way would be to just generate names with an auto-incrementing integer (like r1, r2, ..., r43). I don't think the Avro record names actually matter in the context of Arrow (in Avro, IIRC, they can be used for things like aliasing, which is why they have to be unique).
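A minimal sketch of the counter approach, using only the standard library. `RecordNamer` and `next_name` are hypothetical names chosen for illustration and are not part of arrow2; the assumption is that one namer is threaded through the whole schema conversion so that every struct gets a distinct name.

```rust
/// Hands out unique Avro record names of the form `r1`, `r2`, ...
/// Hypothetical helper for illustration; not part of arrow2.
#[derive(Debug, Default)]
pub struct RecordNamer {
    /// Number of names handed out so far.
    count: usize,
}

impl RecordNamer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns a fresh name on every call, so no two records share one.
    pub fn next_name(&mut self) -> String {
        self.count += 1;
        format!("r{}", self.count)
    }
}
```

Calling `next_name` once per struct type encountered while walking the Arrow schema would yield `r1`, `r2`, and so on, which are never empty and never repeat within one schema.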
Otherwise, a perhaps more 'proper' but more involved solution would be to take the full path of each nested record from its fields (a_b_c_d) and generate an Avro name from it. This would require some more work to sanitize the names, since Avro names can only contain alphanumeric characters and '_', and must start with a letter or '_' (e.g. see)
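The path-based approach could be sketched roughly as follows. `record_name_from_path` is a hypothetical function, not an arrow2 API, and the sketch makes two assumptions: disallowed characters are replaced with '_', and a name that does not begin with an ASCII letter gets an `r` prefix (a conservative choice that also covers the empty path).

```rust
/// Builds an Avro-compatible record name from the field path that leads to a
/// nested struct. Hypothetical helper for illustration; not part of arrow2.
pub fn record_name_from_path(path: &[&str]) -> String {
    // Join the path segments with '_' and replace every character that is
    // not an ASCII letter, digit, or '_' with '_'.
    let mut name: String = path
        .join("_")
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();

    // Conservatively require a leading ASCII letter: prefix with 'r' when the
    // name is empty or starts with anything else (a digit or '_').
    match name.chars().next() {
        Some(c) if c.is_ascii_alphabetic() => {}
        _ => name.insert(0, 'r'),
    }
    name
}
```

Sanitizing alone does not guarantee uniqueness: the paths `["a", "b"]` and `["a_b"]` both map to `a_b`. A real implementation would probably need to track the names already used and disambiguate collisions, for example with a numeric suffix.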
Currently, when mapping an Arrow schema to an Avro schema, a struct type is mapped to an Avro record with the empty string ("") as the Avro record name.
See https://github.com/jorgecarleitao/arrow2/blob/main/src/io/avro/write/schema.rs#L54
This is incorrect. Record names in Avro cannot be empty and cannot be repeated across different records.
So, currently, Avro files written by arrow2 cannot be deserialized.
The writer would have to be modified to use a real record name, probably the field name.