Avro does not respect default values defined in schema #416

Open
basimons opened this issue Dec 4, 2023 · 8 comments
basimons commented Dec 4, 2023

Hello,

I encountered something strange while doing some tests with Avro decoding.

Here is an example; it was run on version 2.16.0:

String avroWithDefault = """
    {
      "type": "record",
      "name": "Employee",
      "fields": [
        {"name": "name", "type": ["string", "null"], "default": "bram"},
        {"name": "age", "type": "int"},
        {"name": "emails", "type": {"type": "array", "items": "string"}},
        {"name": "boss", "type": ["Employee", "null"]}
      ]
    }
    """;

// Notice no name field
String employeeJson = """
{
    "age" : 26,
    "emails" : ["test@test.com"],
    "boss" : {
         "name" : "test",
         "age" : 33,
         "emails" : ["test@test.com"]
    }
}
""";

AvroMapper avroMapper = new AvroMapper();
AvroSchema schema = avroMapper.schemaFrom(avroWithDefault);
JsonNode jsonObject = new ObjectMapper().readTree(employeeJson);
byte[] objectAsBytes = avroMapper.writer().with(schema).writeValueAsBytes(jsonObject);

// Decode it again
JsonNode decodedObject = avroMapper.reader(schema).readTree(objectAsBytes);

System.out.println(decodedObject.toString());

If you look at the decoded object, you see that the default value is not filled in: name is just null, while all the other fields are filled exactly as expected. I also tried different schemas where the field is not a union with null but just a plain type with a default, but that resulted in a JsonMappingException.

Am I doing something wrong here, or is this just not supported? The documentation doesn't say that default values are unsupported, the way the Protobuf module's documentation does.

Thanks in advance

EDIT: It makes sense that this does not work, since you cannot write an Avro file without a value for a field, even if the field has a default; I think it should have thrown an error on writing. But the main question is why it doesn't work with a reader schema that has a default when the writer schema lacks the field. See my follow-up comments below.

@cowtowncoder (Member)

I think this is not supported, at least by Jackson's native Avro read implementation. The Apache Avro-library-backed variant, while slower, might handle default values correctly.

As for how to enable the Apache Avro library backend, I think there are unit tests that show that.

I agree, it'd be good to document this gap.

basimons commented Dec 5, 2023

Thanks for your response.

I tried looking for a unit test, but I couldn't find one. I did, however, find ApacheAvroParserImpl. I used it like this:

try (AvroParser parser = new ApacheAvroFactory(new AvroMapper()).createParser(payload)) {
    parser.setSchema(schema);

    TreeNode treeNode = parser.readValueAsTree();
    System.out.println(treeNode);
}

Unfortunately it does not work (as in, no default values are filled in). Am I doing this correctly, or should I also use a different codec?

basimons commented Dec 5, 2023

I made some changes, since the code I showed in my first message does not fully make sense: you cannot omit a value when writing, even if the field has a default. So I changed it to this:

String writingSchema = """
    {
      "type": "record",
      "name": "Employee",
      "fields": [
        {"name": "age", "type": "int"},
        {"name": "emails", "type": {"type": "array", "items": "string"}},
        {"name": "boss", "type": ["Employee", "null"]}
      ]
    }
    """;

String readingSchema = """
    {
      "type": "record",
      "name": "Employee",
      "fields": [
        {"name": "name", "type": ["string", "null"], "default": "bram"},
        {"name": "age", "type": "int"},
        {"name": "emails", "type": {"type": "array", "items": "string"}},
        {"name": "boss", "type": ["Employee", "null"]}
      ]
    }
    """;


        String employeeJson = """
            {
                "age" : 26,
                "emails" : ["test@test.com", "test@test.com"],
                "boss" : {
                    "age" : 33,
                    "emails" : ["test@test.blockbax.com"]
                }
            }
            """;

When I do this and then read the values back, I get the following exception: java.io.IOException: Invalid Union index (26); union only has 2 types. This is the same error reported in #164. (Presumably the bytes are being decoded against the reader schema directly: its first field is the union-typed name, so the encoding of age = 26 gets read as a union index.)

cowtowncoder commented Dec 5, 2023

The only other note I have is that this:

new ApacheAvroFactory(new AvroMapper()).

is the wrong way around: it should be

new AvroMapper(new ApacheAvroFactory())

to get the correct linking; then you should be able to create an ObjectReader / ObjectWriter through which you can assign the schema.

But I suspect that won't change things much: either way you have an ApacheAvroFactory that is using the Apache Avro library.
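
For illustration, a minimal sketch of that wiring (assuming ApacheAvroFactory has a no-argument constructor, and reusing the readingSchema and payload variables from the earlier comments):

    // Link the Apache-Avro-backed factory into the mapper (not the other way
    // around), then attach the schema through an ObjectReader.
    AvroMapper mapper = new AvroMapper(new ApacheAvroFactory());
    AvroSchema schema = mapper.schemaFrom(readingSchema);
    JsonNode decoded = mapper.readerFor(JsonNode.class)
            .with(schema)
            .readValue(payload);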

@basimons
Copy link
Author

basimons commented Dec 6, 2023

Ah thanks, I didn't know that. I tried it, but as you said, it did indeed not work.

What's weird is that I even tried decoding it with the Apache Avro library directly. I just used GenericDatumReader (and everything that comes with it), but I got exactly the same error. That doesn't make sense, right? I'm sure that what I'm doing is allowed by Avro (adding a field with a default to a reader schema when it is not in the writer schema), as I have done it many times in my Kafka cluster.

Do you happen to know what the difference might be? Do my Kafka clients do anything special for this?

basimons commented Dec 6, 2023

I finally get it. In a Kafka setup, the writer schema is saved along with the data. If you parse it like this:

// Parse the writer schema and pair it with the reader schema
Schema writerSchema = new Schema.Parser().parse(writingSchema);
Schema readerSchema = ((AvroSchema) schema).getAvroSchema();
GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(writerSchema, readerSchema);

BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(payload, null);
GenericRecord read = datumReader.read(null, binaryDecoder);

So with the specific writer schema supplied, it does work. Normally Kafka does this for you, but I don't think AvroMapper has a way to do it with two schemas.

cowtowncoder commented Dec 6, 2023

@basimons The Avro module does indeed allow a two-schema (reader/writer) configuration -- it's been a while, so I'll have to check how it was done. I think AvroMapper has methods to construct a Jackson AvroSchema from two separate schemas.

@cowtowncoder (Member)

Ah, close: AvroSchema has the method withReaderSchema(AvroSchema rs); you take both schema instances and call the method on the "writer schema" (the one used for writing records). From ArrayEvolutionTest:

        final AvroSchema srcSchema = MAPPER.schemaFrom(SCHEMA_XY_ARRAY_JSON);
        final AvroSchema dstSchema = MAPPER.schemaFrom(SCHEMA_XYZ_ARRAY_JSON);
        final AvroSchema xlate = srcSchema.withReaderSchema(dstSchema);

and then you construct ObjectReader as usual.
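
Putting it together with the writingSchema / readingSchema strings from earlier in this thread (a sketch; variable names are mine, not from the test):

    // Write with the writer schema only, then read back with the combined
    // writer+reader schema so the "name" default can be applied on read.
    AvroMapper mapper = new AvroMapper();
    AvroSchema writerSchema = mapper.schemaFrom(writingSchema);
    AvroSchema readerSchema = mapper.schemaFrom(readingSchema);
    AvroSchema xlate = writerSchema.withReaderSchema(readerSchema);

    byte[] encoded = mapper.writer().with(writerSchema)
            .writeValueAsBytes(new ObjectMapper().readTree(employeeJson));
    JsonNode decoded = mapper.readerFor(JsonNode.class).with(xlate)
            .readValue(encoded);
    System.out.println(decoded); // "name" should now get the default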
