This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Avro schema: Invalid record names #1269

Closed
Samrose-Ahmed opened this issue Oct 7, 2022 · 3 comments · Fixed by #1279
Labels
bug Something isn't working

Comments

@Samrose-Ahmed
Contributor

Samrose-Ahmed commented Oct 7, 2022

Currently, when an Arrow schema is mapped to an Avro schema, a struct type is mapped to an Avro record whose name is the empty string ("").

See https://github.com/jorgecarleitao/arrow2/blob/main/src/io/avro/write/schema.rs#L54

This is incorrect. Record names in Avro cannot be empty, and the same name cannot be reused for different records.

As a result, Avro files currently written by arrow2 cannot be deserialized.
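To make the two rules concrete, here is a minimal Rust sketch of a checker, assuming the naming rules from the Avro specification: a name is non-empty, starts with `[A-Za-z_]`, and otherwise contains only `[A-Za-z0-9_]`. The functions `is_valid_avro_name` and `first_duplicate` are hypothetical helpers, not part of arrow2, and the duplicate check compares plain names only (it ignores namespaces).

```rust
use std::collections::HashSet;

/// Hypothetical helper (not arrow2 API): true if `name` is a valid plain
/// Avro name, i.e. non-empty, starting with [A-Za-z_], then only [A-Za-z0-9_].
pub fn is_valid_avro_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // The empty string is rejected: Avro records must be named.
        None => false,
        Some(first) if first.is_ascii_alphabetic() || first == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        // Any other leading character (digit, '-', '.', non-ASCII, ...) is invalid.
        Some(_) => false,
    }
}

/// Hypothetical helper: returns the first record name that occurs more than
/// once in `names`, or None if all names are distinct. Namespaces are ignored.
pub fn first_duplicate<'a>(names: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    // `insert` returns false when the name was already present.
    names.iter().copied().find(|n| !seen.insert(*n))
}
```

With these, a schema in which every struct became a record named `""` fails both checks: `is_valid_avro_name("")` is false, and two such records make `first_duplicate` return `Some("")`.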

This would have to be changed to use a real record name, probably the field name.

@jorgecarleitao jorgecarleitao added the bug Something isn't working label Oct 12, 2022
@jorgecarleitao
Owner

Do you know what would be a good or standard name here?

@Samrose-Ahmed
Contributor Author

A simple approach would be to generate names with an auto-incrementing integer (e.g. r1, r2, ..., r43). I don't think Avro record names actually matter in the context of Arrow (in Avro, IIRC, they can be used for things like aliasing, which is why they have to be unique).
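The auto-incrementing approach could look roughly like the sketch below. `Type`, `AvroSchema`, `NameCounter` and `to_avro` are hypothetical, simplified stand-ins, not arrow2's actual types and not the code in the linked PR; the sketch also treats the top-level schema as a struct. The point is only that one counter shared across the whole recursive walk hands out r1, r2, ... in depth-first order.

```rust
/// Hypothetical, simplified stand-in for an Arrow data type (not arrow2's `DataType`).
#[derive(Debug)]
pub enum Type {
    Int64,
    Utf8,
    Struct(Vec<(String, Type)>),
    List(Box<Type>),
}

/// Hypothetical, simplified Avro schema node.
#[derive(Debug, PartialEq)]
pub enum AvroSchema {
    Long,
    String,
    Array(Box<AvroSchema>),
    Record { name: String, fields: Vec<(String, AvroSchema)> },
}

/// Hands out "r1", "r2", ... so every record name is non-empty and unique.
pub struct NameCounter(usize);

impl NameCounter {
    pub fn new() -> Self {
        NameCounter(0)
    }

    pub fn next_name(&mut self) -> String {
        self.0 += 1;
        format!("r{}", self.0)
    }
}

/// Converts a type, threading the same counter through every nested struct.
pub fn to_avro(ty: &Type, names: &mut NameCounter) -> AvroSchema {
    match ty {
        Type::Int64 => AvroSchema::Long,
        Type::Utf8 => AvroSchema::String,
        Type::List(inner) => AvroSchema::Array(Box::new(to_avro(inner, names))),
        Type::Struct(fields) => {
            // Name this record before recursing, so outer records get smaller numbers.
            let name = names.next_name();
            let mut out = Vec::with_capacity(fields.len());
            for (field_name, field_ty) in fields {
                out.push((field_name.clone(), to_avro(field_ty, names)));
            }
            AvroSchema::Record { name, fields: out }
        }
    }
}
```

Because the counter is passed down as `&mut`, the names are deterministic for a given schema, and two structs (siblings or nested) can never end up with the same name.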

Alternatively, a perhaps more 'proper' but more involved solution would be to derive the full name of each nested record from its field path (e.g. a_b_c_d) and generate an Avro name from it. This would require extra work to sanitize the names, since Avro names may contain only alphanumeric characters and '_', and must start with a letter or '_' (e.g. see)
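A minimal sketch of the path-based alternative, assuming the Avro naming rule above: join the field path with '_', replace every disallowed character with '_', and prefix '_' if the result does not start with a letter or '_'. `path_to_avro_name` is a hypothetical helper, not arrow2 API.

```rust
/// Hypothetical helper: builds an Avro record name from the path of field
/// names leading to a nested struct, e.g. ["a", "b", "c", "d"] -> "a_b_c_d".
pub fn path_to_avro_name(path: &[&str]) -> String {
    let joined = path.join("_");
    // Replace every character outside [A-Za-z0-9_] with '_'.
    let mut name: String = joined
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    // Avro names must start with [A-Za-z_]; otherwise prefix an underscore
    // (this also turns an empty path into "_").
    match name.chars().next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => name.insert(0, '_'),
    }
    name
}
```

Sanitizing alone does not guarantee uniqueness: for example the paths `["a-b"]` and `["a_b"]` both become `"a_b"`, so the generated names would still need a collision check (or a numeric suffix), which is part of why this approach is more involved than a plain counter.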

@Samrose-Ahmed
Contributor Author

I took a stab at a PR following the first approach; let me know what you think.


2 participants