Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Difficulty serializing recursive structure with unions of record types. #129

Closed
paulcoyle opened this issue Sep 15, 2017 · 3 comments
Closed

Comments

@paulcoyle
Copy link

paulcoyle commented Sep 15, 2017

First of all, thanks for this project; it's an extremely useful library.

I'm a bit new to Avro so my apologies if this problem is a function of my ignorance about it.

I'm trying to create a schema that describes a simple recursive tree structure where the contents of each node in the tree contain one of some known set of records (contained in value fields in example below). With just one type in this union (workingRecursiveNodeType below), serialization works fine. However, adding an additional type (failingRecursiveNodeType below) causes what appear to be schema violations.

Am I doing something wrong in declaring that a field can have a union of record types? If so, is there another way to accomplish this?

Example:

const avsc = require('avsc')

const nodeValueTypeA = avsc.Type.forSchema({
  type: 'record',
  name: 'A',
  fields: [
    { name: 'type', type: { type: 'enum', symbols: ['TYPE_A'] } },
    { name: 'x', type: 'int' }
  ]
})

const nodeValueTypeB = avsc.Type.forSchema({
  type: 'record',
  name: 'B',
  fields: [
    { name: 'type', type: { type: 'enum', symbols: ['TYPE_B'] } },
    { name: 'y', type: 'string' }
  ]
})

const workingRecursiveNodeType = avsc.Type.forSchema({
  type: 'record',
  name: 'R',
  fields: [
    { name: 'value', type: [nodeValueTypeA] },
    { name: 'children', type: { type: 'map', values: 'R' } }
  ]
})

const failingRecursiveNodeType = avsc.Type.forSchema({
  type: 'record',
  name: 'R',
  fields: [
    { name: 'value', type: [nodeValueTypeA, nodeValueTypeB] },
    { name: 'children', type: { type: 'map', values: 'R' } }
  ]
})

const nodeValueA = {
  type: 'TYPE_A',
  x: 123
}

const nodeValueB = {
  type: 'TYPE_B',
  y: 'abc'
}

// With a single "value" type union in the schema, and data with only that type,
// serializing works.
const workingRecursiveNodeData = {
  value: nodeValueA,
  children: {
    childOne: {
      value: nodeValueA,
      children: {}
    },
    childTwo: {
      value: nodeValueA,
      children: {}
    }
  }
}

workingRecursiveNodeType.toBuffer(workingRecursiveNodeData) //-> works
failingRecursiveNodeType.toBuffer(workingRecursiveNodeData) //-> fails with 'Error: invalid ["A","B"]: {"type":"TYPE_A","x":123}'

// The aim is to have the following types of structures:
const goalRecursiveNodeData = {
  value: nodeValueA,
  children: {
    childOne: {
      value: nodeValueB,
      children: {}
    },
    childTwo: {
      value: nodeValueA,
      children: {}
    }
  }
}
@mtth
Copy link
Owner

mtth commented Sep 16, 2017

Hey, good question, this is definitely one of the less intuitive parts of avsc. failingRecursiveNodeType doesn't behave as you expect because [nodeValueTypeA, nodeValueTypeB] is an ambiguous union.

avsc is able to infer which value comes from which type for most unions (for example ['null', 'float'] or ['int', 'string']), which lets us to use the more natural UnwrappedUnionType to represent them.

However, in this case, the union contains two different 'record' types and there is no general good way to efficiently perform the same inference in that case. So avsc falls back to using a WrappedUnionType. We can check that this is the case by tweaking the data to use wrapped values:

const wrappedRecursiveNodeData = {
  value: {A: nodeValueA}, // Wrapped here.
  children: {
    childOne: {
      value: {B: nodeValueB}, // And here.
      children: {}
    },
    childTwo: {
      value: {A: nodeValueA}, // And here.
      children: {}
    }
  }
};

failingRecursiveNodeType.toBuffer(wrappedRecursiveNodeData); // OK.

(You can control how avsc chooses union types via Type.forSchema's wrapUnions option. The method's API doc has information about the various modes available.)

At this point you might be wondering whether there is a way to avoid wrapping values since it's easy to tell which type they belong to from their definition (via their type field). We can do so by using a logical type, which will use the additional knowledge we have about the structure of the data:

class RecursiveType extends avsc.types.LogicalType {

  _fromValue(val) {
    // This is the easy case, we convert the wrapped value to our representation.
    return {
      value: val.value.unwrap(),
      children: val.children
    };
  }

  _toValue(any) {
    // Here we do a little more work to wrap the value based on its `type` field.
    return {
      value: {[any.value.type === 'TYPE_A' ? 'A' : 'B']: any.value},
      children: any.children
    };
  }
}

const recursiveNodeType = avsc.Type.forSchema({
  type: 'record',
  name: 'R',
  logicalType: 'recursive', // Apply our logical type, its name here...
  fields: [
    { name: 'value', type: [nodeValueTypeA, nodeValueTypeB] },
    { name: 'children', type: { type: 'map', values: 'R' } }
  ]
}, {logicalTypes: {recursive: RecursiveType}}); // ...must match the key here.

recursiveNodeType.toBuffer(goalRecursiveNodeData) // Works!

Would either of these solutions work out for your use-case?

@paulcoyle
Copy link
Author

Thank you very much for the excellent description and sorry for the delay in replying. It looks like Logical Types will be exactly what we need. Cheers!

@huttj
Copy link

huttj commented Jan 25, 2019

We can do so by using a logical type, which will use the additional knowledge we have about the structure of the data

This was very helpful, in my case. I think it would be great to include it in the documentation, somewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants