Improve fast deserialization by avoiding field schema retrieval cost #49

volauvent · 2020-05-17T04:58:01Z

Improves fast deserialization speed by avoiding the cost of retrieving field schemas.
The generated code now uses each field's schema directly instead of registering it
in a HashMap and then retrieving it.
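
As a rough illustration of the idea (not the actual generated code, which is linked later in this thread), the sketch below contrasts the two shapes. `EnumSchemaStub`, `decodeWithMap`, and `decodeDirect` are hypothetical names, and the stub stands in for Avro's real `Schema` class so that the example runs on the JDK alone.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SchemaLookupSketch {
    // Hypothetical stand-in for an Avro enum schema; NOT org.apache.avro.Schema.
    static final class EnumSchemaStub {
        private final List<String> symbols;
        EnumSchemaStub(List<String> symbols) { this.symbols = symbols; }
        List<String> getEnumSymbols() { return symbols; }
    }

    // "Before" shape: the schema is registered under an id and fetched
    // from the map again for every element that is decoded.
    static String[] decodeWithMap(Map<Long, EnumSchemaStub> readerSchemaMap,
                                  long schemaId, int[] ordinals) {
        String[] out = new String[ordinals.length];
        for (int i = 0; i < ordinals.length; i++) {
            out[i] = readerSchemaMap.get(schemaId).getEnumSymbols().get(ordinals[i]);
        }
        return out;
    }

    // "After" shape: the field's schema is held directly, so the loop
    // performs no map lookup at all.
    static String[] decodeDirect(EnumSchemaStub fieldSchema, int[] ordinals) {
        String[] out = new String[ordinals.length];
        for (int i = 0; i < ordinals.length; i++) {
            out[i] = fieldSchema.getEnumSymbols().get(ordinals[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        EnumSchemaStub schema = new EnumSchemaStub(Arrays.asList("A", "B", "C"));
        Map<Long, EnumSchemaStub> readerSchemaMap = new HashMap<>();
        readerSchemaMap.put(1L, schema); // arbitrary id chosen for the example
        int[] ordinals = {2, 0, 1};
        System.out.println(Arrays.toString(decodeWithMap(readerSchemaMap, 1L, ordinals)));
        System.out.println(Arrays.toString(decodeDirect(schema, ordinals)));
    }
}
```

Both methods return the same symbols; the only difference is whether a hash lookup sits on the per-element path.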

JMH benchmark results of fast-deserialization time

Enum array with 200 elements

        | Avro 1.4 (ns) | Avro 1.8 (ns)
Before  | 7452          | 13101
After   | 5374          | 7858

Record array with 200 elements

        | Avro 1.4 (ns) | Avro 1.8 (ns)
Before  | 23068         | 24519
After   | 17854         | 18549
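
The relative gain is easier to read as a percentage. The following sketch computes the reduction in time from the figures above; the class and method names are made up for illustration.

```java
final class BenchmarkSpeedup {
    // Percentage reduction in time going from `before` to `after` (both in ns).
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {
            "Enum array, Avro 1.4", "Enum array, Avro 1.8",
            "Record array, Avro 1.4", "Record array, Avro 1.8"
        };
        // {before, after} pairs copied from the benchmark tables.
        double[][] beforeAfter = {
            {7452, 5374}, {13101, 7858}, {23068, 17854}, {24519, 18549}
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-24s %.1f%% less time%n",
                    labels[i], reductionPercent(beforeAfter[i][0], beforeAfter[i][1]));
        }
    }
}
```

With these numbers the enum array takes roughly 28% (Avro 1.4) and 40% (Avro 1.8) less time, and the record array roughly 23% and 24% less.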

@gaojieliu @FelixGV @radai-rosenblatt

gaojieliu

If I understand correctly, the code change looks good :D

Since the code gen part is quite tricky, Felix and I discussed asking for a generated serializer/de-serializer produced by the new code gen logic, so that we can see the effect of this change.
Could you share a sample?

volauvent · 2020-05-19T01:52:41Z

Here are deserializers of EnumArray schema:

Generated deserializer before this PR:
https://gist.github.com/volauvent/76e0f69131bf2ce5b9eeba6a72319d85

Generated deserializer after this PR:
https://gist.github.com/volauvent/13ca4ed55b48e761c8b8e474f0a5da4b

FelixGV · 2020-05-26T22:57:49Z

Nice results! The generated code LGTM. The code generator is a bit puzzling. I've seen nothing wrong but I'm not confident that I've looked at everything in-depth enough... in any case, if the tests pass and Gaojie thinks it's good, then I guess it's good enough.

BTW, this is off-topic, but do we have a grasp of why 1.8 would perform slower than 1.4?

gaojieliu · 2020-05-26T23:00:14Z

Sorry, I forgot to follow up, and the generated de-serializer looks good!

volauvent · 2020-05-27T04:18:46Z

> BTW, this is off-topic, but do we have a grasp of why 1.8 would perform slower than 1.4?

The previous fast-avro 1.8 EnumArray deserializer was much slower than the 1.4 one because Avro 1.8 constructs an EnumSymbol from both a Schema and a symbol, while Avro 1.4 uses only the symbol.

As a result, the 1.8 fast deserializer paid twice the cost of retrieving the schema from the HashMap, as shown below:

// deserializer in Avro 1.8
enumArray2.add(new org.apache.avro.generic.GenericData.EnumSymbol(
    readerSchemaMap.get(4483722390578694240L),
    readerSchemaMap.get(4483722390578694240L).getEnumSymbols().get((decoder.readEnum()))));

// deserializer in Avro 1.4
enumArray2.add(new org.apache.avro.generic.GenericData.EnumSymbol(
    readerSchemaMap.get(4483722390578694240L).getEnumSymbols().get((decoder.readEnum()))));
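
To make the 2X figure concrete, here is a small sketch that counts map lookups for a 200-element array in each shape. `CountingMap` and the two methods are hypothetical, and a plain `List<String>` of symbols stands in for the Avro schema. The sketch only counts `get()` calls and says nothing about real timings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

final class LookupCountSketch {
    // HashMap that counts get() calls, used only to make the lookups visible.
    static final class CountingMap extends HashMap<Long, List<String>> {
        int gets = 0;

        @Override
        public List<String> get(Object key) {
            gets++;
            return super.get(key);
        }
    }

    // Avro 1.8 shape: the constructor needs the schema and the symbol,
    // and each argument triggers its own map lookup.
    static int lookupsAvro18Shape(CountingMap map, long id, int elements) {
        map.gets = 0;
        for (int i = 0; i < elements; i++) {
            List<String> schema = map.get(id);   // lookup for the schema argument
            String symbol = map.get(id).get(0);  // lookup for the symbol argument
            if (schema == null || symbol == null) {
                throw new IllegalStateException("unexpected null");
            }
        }
        return map.gets;
    }

    // Avro 1.4 shape: only the symbol is needed, so one lookup per element.
    static int lookupsAvro14Shape(CountingMap map, long id, int elements) {
        map.gets = 0;
        for (int i = 0; i < elements; i++) {
            String symbol = map.get(id).get(0);
            if (symbol == null) {
                throw new IllegalStateException("unexpected null");
            }
        }
        return map.gets;
    }

    public static void main(String[] args) {
        CountingMap map = new CountingMap();
        map.put(1L, Arrays.asList("A", "B")); // arbitrary id chosen for the example
        System.out.println("1.8 shape lookups: " + lookupsAvro18Shape(map, 1L, 200));
        System.out.println("1.4 shape lookups: " + lookupsAvro14Shape(map, 1L, 200));
    }
}
```

For 200 elements the 1.8 shape performs 400 lookups and the 1.4 shape 200. Holding the schema directly removes these per-element lookups in both cases.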

@FelixGV

Improve fast deserialization speed by avoiding field schema retrieval…

70d7211

… cost Change to use fields' schema directly instead of registering and then retrieving them from HashMap.

volauvent mentioned this pull request May 18, 2020

Regression in EnumArray fast-avro serialization under Avro 1.4 #50

Closed

gaojieliu reviewed May 18, 2020

gaojieliu merged commit 0d79aa6 into linkedin:master May 26, 2020
