allow for complex schema to refer to previously named schema #53

stew · 2018-09-14T08:13:31Z

No description provided.

Instead of trying to resolve the references at parse time, we store references in the AST, and then use the SchemaParseContext not only for parsing but also for decoding. This allows us to support recursive type definitions. I believe this patch should also include some improvements in how we handle namespaces and aliases. Tests are still needed to prove aliases are working correctly. Tests are still needed for testing encoding.

stew · 2018-09-14T08:15:42Z

I believe this allow for recursive schemata which admits thing such as linked lists and trees.

flavray

First of all, thank you for taking a stab at this, this is massive work!

I was not able to go through all the changes yet, but I left a few comments already, on the overall design of the change that I feel are worth discussing.

The fact that avro allows recursive types personally feels like madness to me, but heh, it's in the spec so we have to consider adding support for it, so I am truly grateful that you are doing this. 🙂

flavray · 2018-09-29T09:32:24Z

src/de.rs

            Value::Union(ref inner) if inner.as_ref() == &Value::Null => visitor.visit_none(),
            Value::Union(ref inner) => visitor.visit_some(&mut Deserializer::new(inner)),
-            _ => Err(Error::custom("not a union")),
+            ref inner =>visitor.visit_some(&mut Deserializer::new(inner)),


nit: missing space after the =>

flavray · 2018-09-29T09:38:48Z

src/decode.rs

@@ -24,8 +25,8 @@ fn decode_len<R: Read>(reader: &mut R) -> Result<usize, Error> {
 }

 /// Decode a `Value` from avro format given its `Schema`.
-pub fn decode<R: Read>(schema: &Schema, reader: &mut R) -> Result<Value, Error> {
-    match *schema {
+pub fn decode<R: Read>(schema: &Arc<Schema>, reader: &mut R, context: &mut SchemaParseContext) -> Result<Value, Error> {


In order to keep backwards compatibility, would it make sense to have a new function (e.g: decode_with_context) with this signature?

That way, we can keep a decode function with the previous signature, that would just call decode_with_context(schema, reader, <empty context>)

flavray · 2018-09-29T09:38:54Z

src/de.rs

@@ -172,9 +172,10 @@ impl<'a, 'de> de::Deserializer<'de> for &'a mut Deserializer<'de> {
        V: Visitor<'de>,
    {
        match *self.input {
+            Value::Null => visitor.visit_none(),


Not sure I understand why this is needed now. Would you mind giving some details? 🙂
The deserializer code is responsible for decoding data that exactly matches the Rust data type (all sorts of transformations can then be done via Value::resolve)

(Same for the ref inner 3 lines below)

I'm not sure either, I tossed it in because it fixed a problem I was having deserializing an Option. If I can reproduce this, it deserves a unit test and to go into its own PR

flavray · 2018-09-29T09:41:22Z

src/decode.rs

@@ -24,8 +25,8 @@ fn decode_len<R: Read>(reader: &mut R) -> Result<usize, Error> {
 }

 /// Decode a `Value` from avro format given its `Schema`.
-pub fn decode<R: Read>(schema: &Schema, reader: &mut R) -> Result<Value, Error> {
-    match *schema {
+pub fn decode<R: Read>(schema: &Arc<Schema>, reader: &mut R, context: &mut SchemaParseContext) -> Result<Value, Error> {


It seems unclear to me why the &Schema had to be changed into Arc<Schema>, would you mind elaborating on this as well? 🙂

What I have done was to take extend the previous PR trying to accomplish this which made the transition from &Schema to Rc when we started temporarily additionally storing the schemata in the SchemaParseContext, and then I changed Rc to Arc because of the new requirement for Send to be implemented here:

https://gh/stew/avro-rs/blob/feature%2Frefernce-types/src/schema.rs#L1234

This is an area where my lack of experience with Rust is going to be obvious, so I welcome any advice, but it does seem to me that we should be able to go back to using references if we track the lifetimes all they way through so that we are proving that the additional references we are keeping only last for the lifetime of the decode operation. That was a battle I wasn't ready to wage with the typechecker when I was trying to get all this proved out, but I can take another stab at it.

flavray · 2018-09-29T09:42:59Z

src/decode.rs

@@ -152,5 +154,8 @@ pub fn decode<R: Read>(schema: &Schema, reader: &mut R) -> Result<Value, Error>
                Err(DecodeError::new("enum symbol not found").into())
            }
        },
+        Schema::TypeReference(ref name) => context.lookup_type(name, &context)
+            .map_or_else(|| Err(DecodeError::new("enum symbol not found").into()),


nit: maybe change enum into something else in the error message?

flavray · 2018-09-29T10:00:52Z

src/encode.rs

@@ -10,7 +11,11 @@ use util::{zig_i32, zig_i64};
 /// be valid with regards to the schema. Schema are needed only to guide the
 /// encoding for complex type values.
 pub fn encode(value: &Value, schema: &Schema, buffer: &mut Vec<u8>) {
-    encode_ref(&value, schema, buffer)
+    encode_ref_inner(&value, &Arc::new(schema.clone()), buffer, &mut SchemaParseContext::new())


Wouldn't cloning a schema every time a value is encoding end up being expensive?

We would have to run the benchmarks to see if there is a noticeable difference, but it seems like extra work is being done in this function

flavray · 2018-09-29T10:01:54Z

src/encode.rs

@@ -32,7 +37,7 @@ fn encode_int(i: i32, buffer: &mut Vec<u8>) {
 /// **NOTE** This will not perform schema validation. The value is assumed to
 /// be valid with regards to the schema. Schema are needed only to guide the
 /// encoding for complex type values.
-pub fn encode_ref(value: &Value, schema: &Schema, buffer: &mut Vec<u8>) {
+pub(crate)  fn encode_ref_inner(value: &Value, schema: &Arc<Schema>, buffer: &mut Vec<u8>, context: &mut SchemaParseContext) {


As for decode, would it make sense to keep encode_ref available somewhere, to avoid backwards-incompatible changes?

flavray · 2018-09-29T10:03:27Z

src/reader.rs

                return Ok(())
            },
-            Err(e) => if let ErrorKind::UnexpectedEof = e.downcast::<::std::io::Error>()?.kind() {
+            Err(e) =>{


nit: missing space

flavray · 2018-09-29T10:05:36Z

src/schema.rs

+    /// A reference to a type defined in this schema
+    TypeReference(NameRef),
+    /// A `array` Avro schema.
+    /// `Array` holds a counted reference (`Rc`) to the `Schema` of its items.


nit: Rc -> Arc

flavray · 2018-09-29T10:06:22Z

src/schema.rs

    /// A `map` Avro schema.
    /// `Map` holds a pointer to the `Schema` of its values, which must all be the same schema.
    /// `Map` keys are assumed to be `string`.
-    Map(Box<Schema>),
+    Map(Arc<Schema>),


I am also curious as to why Array and Map now need an Arc, instead of a simple Box. Would you mind elaborating a bit? 🙂

stew · 2018-10-01T04:26:13Z

Wow, thanks for all your feedback 😄

I'm going to try to address all the items you brought up, might take me a day or two. I'm pretty new to rust, so I'm likely making some simple mistakes, so thanks for bearing with me. I also am expecting that I've negatively impacted performance and I haven't yet taken a look at how bad, I wanted to get some feedback on the approach before getting too hung up on any details, and I've probably taken some shortcuts with some negative performance impacts (like using Arc), which I'm happy to try my best to figure out.

Of course, the very first set of types I'm needing to encode includes a Tree, which is naturally a recursive data structure, which is where my motivation is coming from. In trying to further my own project while waiting for feedback from you, I've tossed some bugfixes into this branch which don't necessarily fit with this PR, I can make some effort to separate those and rebase those against master to simplify this PR if that's wanted.

HarkonenBade · 2018-10-29T17:11:13Z

I'd be very interested in this as well. As while the schema i'm using does not define recursive types, it does define some custom types (like a type for UUID values) which are defined at first use and then afterwards only referenced by name in the schemata (which was generated using avro tools from a idl spec).

jackmead515 · 2019-06-07T16:45:14Z

Been a while. How's this pull going? Need some help? How far to completion?

stew · 2020-04-18T09:30:55Z

closing in favor of #99 ; I'm thrilled that someone else is making an effort here, (thanks @GregBowyer :) )

I apologize that the conditions that got me motivated to get this working didn't exist for long :(

Kyle Phelps and others added 6 commits June 26, 2018 21:17

Support referencing named types

61b5800

Merge branch 'master' into feature/refernce-types

0497ba7

union tests are passing

b9ea87e

fix warnings from clippy

a65d950

get rid of some println! I was using for debugging

1315db1

stew changed the title ~~allow for complex schemata to refer to~~ allow for complex schemata to refer to previously named schema Sep 14, 2018

stew changed the title ~~allow for complex schemata to refer to previously named schema~~ allow for complex schema to refer to previously named schema Sep 14, 2018

stew added 3 commits September 14, 2018 09:20

seems like I need this to deserialize options

1dc7c7d

hmm, also this

7a4ce6f

hmm

2545ca6

flavray self-requested a review September 15, 2018 08:08

stew added 2 commits September 18, 2018 15:58

suppport encoding of enums

2722f60

need to track current namespace while validating

f957411

stew force-pushed the feature/refernce-types branch from 0ba3ab5 to f957411 Compare September 25, 2018 18:48

make error type public since it is returned from a public api

cd4e5c8

flavray requested changes Sep 29, 2018

View reviewed changes

poros mentioned this pull request Oct 13, 2019

Properly support recursive types #92

Open

poros added the feature request label Oct 13, 2019

GregBowyer mentioned this pull request Oct 29, 2019

Allow the parsing of recursive types (#91) #99

Open

4 tasks

poros mentioned this pull request Apr 11, 2020

fix-for-reusing-enum-references #118

Closed

stew closed this Apr 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow for complex schema to refer to previously named schema #53

allow for complex schema to refer to previously named schema #53

stew commented Sep 14, 2018

stew commented Sep 14, 2018

flavray left a comment

flavray Sep 29, 2018

flavray Sep 29, 2018

flavray Sep 29, 2018

stew Oct 1, 2018

flavray Sep 29, 2018

stew Oct 1, 2018

flavray Sep 29, 2018

flavray Sep 29, 2018

flavray Sep 29, 2018

flavray Sep 29, 2018

flavray Sep 29, 2018

flavray Sep 29, 2018

stew commented Oct 1, 2018

HarkonenBade commented Oct 29, 2018

jackmead515 commented Jun 7, 2019

stew commented Apr 18, 2020

allow for complex schema to refer to previously named schema #53

allow for complex schema to refer to previously named schema #53

Conversation

stew commented Sep 14, 2018

stew commented Sep 14, 2018

flavray left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stew commented Oct 1, 2018

HarkonenBade commented Oct 29, 2018

jackmead515 commented Jun 7, 2019

stew commented Apr 18, 2020