Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for parsing a list of schemas which may have cross dependencies #173

Merged
merged 12 commits into from
Jan 29, 2021

Conversation

batconjurer
Copy link
Contributor

The current library only allows the parsing of jsons that are completely specified in their definitions, i.e. it is not possible to specify two types, one of which depends on the other, and be able to parse this without first composing the jsons.

For example, there is currently no way to parse the following situation "out of the box"

{
 "name": "A",
 "type": "array",
 "items": "B"
},
{
  "name": "B",
  "type": "map",
  "values": "double"
}

This means that in real life, one must work with very deeply nested json strings which can be very unwieldy. Additionally, many users would naturally like to specify lots of types with cross dependencies on each other but do not want to do lots of json manipulation to be able to parse their types.

This also affect libraries downstream. I attempted a fix for this with rsgen-avro since I wanted this feature for code generation (necessary for my work). However, large amounts of logic necessary to do this replicated logic already present in this repository. Thus I agreed with the maintainer that this feature more naturally belonged here.

This adds a new parsing function Schema::parse_list which takes a list of json strings, which is allowed to have cross dependencies. An internal data structure Parser is created which does all the parsing and also maintains a map of parsed types to draw from as necessary. This data structure is dropped after parsing is finished so that the statelessness of the Schema type is maintained.

In the added tests, concrete examples can be found.

@poros
Copy link
Collaborator

poros commented Jan 25, 2021

Hello! As of today, I am stepping down as a maintainer of this library for a while (see #174). Please ping @flavray from now on.
Thanks again for your contribution :)

Copy link
Owner

@flavray flavray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really solid to me, and great test suite, thank you for the great contribution!
I've got a few comments and questions, but it should be mergeable really soon. 🙂

This feature has been in my backlog for a loong time, albeit I had another design in mind (a schema store, where schemas would be added one by one and referenced, but I actually like your stateless/do-everything-at-once approach!). So, thank you again!

src/schema.rs Outdated Show resolved Hide resolved
src/schema.rs Outdated Show resolved Hide resolved
src/schema.rs Outdated Show resolved Hide resolved
src/schema.rs Show resolved Hide resolved
src/schema.rs Outdated Show resolved Hide resolved
src/schema.rs Show resolved Hide resolved
tests/schema.rs Show resolved Hide resolved
@batconjurer
Copy link
Contributor Author

This looks really solid to me, and great test suite, thank you for the great contribution!
I've got a few comments and questions, but it should be mergeable really soon. slightly_smiling_face

This feature has been in my backlog for a loong time, albeit I had another design in mind (a schema store, where schemas would be added one by one and referenced, but I actually like your stateless/do-everything-at-once approach!). So, thank you again!

That's so much for your quick response and helpful review. I have tried to address your comments as best I can. The biggest question I still have involves only supporting name-able types. I think that makes sense, and I added to the doc-string something to that effect, but it is currently not enforced.

So the question is, do we return and error or simply discourage types other than name-ables, although in some cases it may work?

Copy link
Owner

@flavray flavray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thank you for addressing comments so quickly!
Just one small nit, but nothing blocking. 🙂

Regarding name-able types, I think the docstring is sufficient. The code will return an error at L432 if anyone attempts to use non-named schemas, so fine by me!

src/schema.rs Outdated Show resolved Hide resolved
src/schema.rs Show resolved Hide resolved
Base automatically changed from master to main January 29, 2021 17:47
@flavray
Copy link
Owner

flavray commented Jan 29, 2021

Would it also make sense to include a small example of how to use this in src/lib.rs (here)? This will then show up in the README and users will be able to find this more easily!

To update README.md once you've updated src/lib.rs, you can run:

cargo install cargo-readme
make readme

@batconjurer
Copy link
Contributor Author

I have updated the readme and src/lib.rs with an example. Hopefully that wraps this PR up. Thanks again for the prompt reviews.

Copy link
Owner

@flavray flavray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution!

@flavray flavray merged commit eef23fc into flavray:main Jan 29, 2021
@flavray
Copy link
Owner

flavray commented Jan 29, 2021

I've merged this, and will cut a new release tomorrow to have this available. 🙂

@flavray
Copy link
Owner

flavray commented Jan 29, 2021

v0.13.0 has just been released with this change. 🙂

@markfarnan
Copy link

This is a great feature, however one key question.

Once this structure is parsed, how can I USE it in the rest of the functions which only seem to take 'schema'. rather than Vec ?

I've been reading the code of Avro-Rust and this lib for ages, but can't see how to use this result when calling Reader, Writer, or in my case, to_avro_datum

If I just one one of the Schemas from the array (the root schema), it throws SchemResolutionError as it can't find the child schema for the referenced type.

Any advice greatly appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants