Improve fast deserialization by avoiding field schema retrieval cost #49

Merged


volauvent
Collaborator

Improved fast-deserialization speed by avoiding the cost of retrieving the current field's schema.
The generated code now uses each field's schema directly instead of registering it in a HashMap
and then retrieving it.
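
To illustrate the idea, here is a minimal sketch of the two shapes, not the actual generated code. `EnumSchema`, `MapBackedDeserializer`, and `DirectDeserializer` are hypothetical stand-ins introduced only for this example; the real deserializers work with Avro's own schema classes and a decoder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: every class here is a hypothetical stand-in, not fast-avro code.
final class DirectSchemaSketch {

    /** Minimal stand-in for an enum schema that exposes its symbols by ordinal. */
    static final class EnumSchema {
        private final List<String> symbols;

        EnumSchema(String... symbols) {
            this.symbols = Arrays.asList(symbols);
        }

        List<String> getEnumSymbols() {
            return symbols;
        }
    }

    /** Old shape: register the schema in a map, then look it up for every element. */
    static final class MapBackedDeserializer {
        private final Map<Long, EnumSchema> readerSchemaMap = new HashMap<>();
        private final long schemaId;

        MapBackedDeserializer(long schemaId, EnumSchema schema) {
            this.schemaId = schemaId;
            readerSchemaMap.put(schemaId, schema);
        }

        List<String> deserialize(int[] ordinals) {
            List<String> result = new ArrayList<>(ordinals.length);
            for (int ordinal : ordinals) {
                // One hash lookup (plus boxing of the long key) per element.
                result.add(readerSchemaMap.get(schemaId).getEnumSymbols().get(ordinal));
            }
            return result;
        }
    }

    /** New shape: hold the field's schema directly, so no lookup happens per element. */
    static final class DirectDeserializer {
        private final EnumSchema enumSchema;

        DirectDeserializer(EnumSchema enumSchema) {
            this.enumSchema = enumSchema;
        }

        List<String> deserialize(int[] ordinals) {
            List<String> result = new ArrayList<>(ordinals.length);
            for (int ordinal : ordinals) {
                result.add(enumSchema.getEnumSymbols().get(ordinal));
            }
            return result;
        }
    }
}
```

Both variants produce the same output; the second simply skips the map access inside the loop, which is where the per-element cost came from.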

JMH benchmark results of fast-deserialization time

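1. Enum array with 200 elements

|        | Avro 1.4 (ns) | Avro 1.8 (ns) |
|--------|---------------|---------------|
| Before | 7452          | 13101         |
| After  | 5374          | 7858          |

2. Record array with 200 elements

|        | Avro 1.4 (ns) | Avro 1.8 (ns) |
|--------|---------------|---------------|
| Before | 23068         | 24519         |
| After  | 17854         | 18549         |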

@gaojieliu @FelixGV @radai-rosenblatt

Collaborator

@gaojieliu gaojieliu left a comment


If I understand correctly, the code change looks good :D

Since the code gen part is quite tricky, Felix and I discussed asking for a sample serializer/deserializer generated with the new code gen logic, so that we can see the effect of this change.
Could you share one?

@volauvent
Collaborator Author

Here are deserializers of EnumArray schema:

Generated deserializer before this PR:
https://gist.github.com/volauvent/76e0f69131bf2ce5b9eeba6a72319d85

Generated deserializer after this PR:
https://gist.github.com/volauvent/13ca4ed55b48e761c8b8e474f0a5da4b

@FelixGV
Collaborator

FelixGV commented May 26, 2020

Nice results! The generated code LGTM. The code generator is a bit puzzling. I've seen nothing wrong but I'm not confident that I've looked at everything in-depth enough... in any case, if the tests pass and Gaojie thinks it's good, then I guess it's good enough.

BTW, this is off-topic, but do we have a grasp of why 1.8 would perform slower than 1.4?

@gaojieliu
Collaborator

Sorry, I forgot to follow up, and the generated de-serializer looks good!

@gaojieliu gaojieliu merged commit 0d79aa6 into linkedin:master May 26, 2020
@volauvent
Collaborator Author

> BTW, this is off-topic, but do we have a grasp of why 1.8 would perform slower than 1.4?

Before this PR, fast-avro EnumArray deserialization was much slower on Avro 1.8 than on 1.4 because Avro 1.8 constructs an EnumSymbol from both a Schema and a symbol, while Avro 1.4 uses only the symbol.

As a result, the 1.8 fast deserializer paid twice the cost of retrieving the schema from the HashMap, as shown below:

// deserializer in Avro 1.8
enumArray2.add(new org.apache.avro.generic.GenericData.EnumSymbol(
    readerSchemaMap.get(4483722390578694240L),
    readerSchemaMap.get(4483722390578694240L).getEnumSymbols().get((decoder.readEnum()))));

// deserializer in Avro 1.4
enumArray2.add(new org.apache.avro.generic.GenericData.EnumSymbol(
    readerSchemaMap.get(4483722390578694240L).getEnumSymbols().get((decoder.readEnum()))));
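
The difference in lookup counts can be checked with a small, self-contained sketch. It is only an approximation of the two snippets above: `CountingSchemaMap` is a hypothetical wrapper added for this example, and a `String[]` of symbols stands in for the Avro schema. It counts map accesses and does not measure time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative only: counts schema-map lookups for the two generated-code shapes.
final class LookupCountSketch {

    /** Hypothetical wrapper around a HashMap that counts get() calls. */
    static final class CountingSchemaMap {
        private final Map<Long, String[]> delegate = new HashMap<>();
        int gets = 0;

        void put(long id, String... enumSymbols) {
            delegate.put(id, enumSymbols);
        }

        String[] get(long id) {
            gets++;
            return delegate.get(id);
        }
    }

    /** Avro 1.8 shape: one lookup for the schema argument, one more for the symbol. */
    static int lookupsAvro18Style(CountingSchemaMap map, long id, int[] ordinals) {
        map.gets = 0;
        for (int ordinal : ordinals) {
            String[] schemaArg = map.get(id);
            String symbol = map.get(id)[ordinal];
            Objects.requireNonNull(schemaArg);
            Objects.requireNonNull(symbol);
        }
        return map.gets;
    }

    /** Avro 1.4 shape: a single lookup, used only to resolve the symbol. */
    static int lookupsAvro14Style(CountingSchemaMap map, long id, int[] ordinals) {
        map.gets = 0;
        for (int ordinal : ordinals) {
            String symbol = map.get(id)[ordinal];
            Objects.requireNonNull(symbol);
        }
        return map.gets;
    }
}
```

For a 200-element array this gives 400 lookups in the 1.8 shape and 200 in the 1.4 shape, which is consistent with the larger "Before" gap seen for the enum array benchmark.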

@FelixGV
