fix(NODE-3451): fix performance regression from v1 #451

Merged
merged 5 commits on Aug 18, 2021

dariakp
Contributor

@dariakp dariakp commented Aug 16, 2021

Description

NODE-3451 documents a performance regression in the Node.js driver v4 that is actually caused by a regression in the js-bson v4 deserialization method (compared to v1).

The notable culprits were:

  • Mandatory rewrapping of every input type, even an existing Buffer, into a Buffer instance via ensureBuffer
  • Looping over all string bytes via validateUtf8
  • Suboptimal loops over object keys to identify and handle DBRefs

What changed?

  • The entry point deserialize method now checks instanceof Buffer and skips the rewrapping when the input is already a Buffer; this is a temporary measure that only addresses performance for Node.js buffers
    • NOTE: deserializeStream was left untouched for scope reasons
  • The deserializeObject method now checks for potential DBRef keys as it reads each key, which removes the performance penalty for objects that contain no DBRef keys. Further optimization could eliminate the isDBRefLike check altogether, but since we expect DBRefs to be rare, that edge case didn't seem worth optimizing
  • The validateUtf8 method now runs only if the \uFFFD character is present in the decoded string. Technically, this makes performance worse for strings that do contain that character; for all other strings, however, looping over the decoded string with charCodeAt is faster than validating every byte. Unfortunately, not much else can be done to optimize string deserialization without losing the validation (short of doing our own decoding)
    • NOTE: the validateUtf8 call in DBPOINTER type was left untouched for scope reasons
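
For illustration, the first two changes might look roughly like the sketch below. This is not the code from this PR: toBuffer, buildObject, and DBREF_KEYS are hypothetical names, and the decoded elements are modelled as a plain array of key/value pairs rather than being read from BSON bytes.

```typescript
import { Buffer } from 'buffer';

// Hypothetical stand-in for the rewrapping step: always yields a Buffer.
function toBuffer(input: Uint8Array): Buffer {
  // Fast path: an existing Node.js Buffer is returned as-is, with no rewrapping.
  if (input instanceof Buffer) return input;
  // Other typed arrays are wrapped in a Buffer view over the same memory.
  return Buffer.from(input.buffer, input.byteOffset, input.byteLength);
}

// Keys that a DBRef-shaped document is allowed to start with '$' (assumed set).
const DBREF_KEYS = new Set(['$ref', '$id', '$db']);

// Hypothetical single-pass object builder: it tracks whether the document
// could still be a DBRef while the keys are being read, so that documents
// without '$' keys never pay for a second pass over their keys.
function buildObject(entries: Array<[string, unknown]>): {
  value: Record<string, unknown>;
  maybeDBRef: boolean;
} {
  const value: Record<string, unknown> = {};
  // null: no '$' key seen yet; false: ruled out; true: still a candidate.
  let candidate: boolean | null = null;
  for (const [key, element] of entries) {
    if (candidate !== false && key.startsWith('$')) {
      // Any '$' key outside the allowed set rules the document out for good.
      candidate = DBREF_KEYS.has(key);
    }
    value[key] = element;
  }
  return { value, maybeDBRef: candidate === true };
}
```

A caller would run the full DBRef shape check only when maybeDBRef is true, so the common case (no '$' keys at all) costs one string comparison per key.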

After these changes, there may still be a residual 5% performance degradation relative to v1 for the typical use case, attributable to the remaining buffer and string validation.
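
The string validation described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: decodeUtf8Checked is a hypothetical name, and the strict byte-level check is swapped for Node's TextDecoder in fatal mode rather than the library's own validateUtf8.

```typescript
import { Buffer } from 'buffer';
import { TextDecoder } from 'util';

const REPLACEMENT_CHAR = 0xfffd;
// Assumption: a fatal TextDecoder stands in for the strict byte-level validator.
const strictDecoder = new TextDecoder('utf-8', { fatal: true });

// Hypothetical helper: decode bytes[start, end) and validate only when needed.
function decodeUtf8Checked(bytes: Buffer, start: number, end: number): string {
  // Lenient decode: invalid byte sequences become U+FFFD instead of throwing.
  const text = bytes.toString('utf8', start, end);
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === REPLACEMENT_CHAR) {
      // Slow path: U+FFFD may be genuine input or a sign of invalid bytes,
      // so the original bytes are re-checked strictly, once.
      try {
        strictDecoder.decode(bytes.subarray(start, end));
      } catch {
        throw new Error('Invalid UTF-8 string in BSON document');
      }
      break;
    }
  }
  return text;
}
```

Strings without U+FFFD take only the charCodeAt scan; strings that contain it pay for both the scan and the strict check, which matches the trade-off noted in the description.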

@dariakp dariakp changed the title from "fix(NODE-3451): partially fix performance regression from v1" to "fix(NODE-3451): fix performance regression from v1" Aug 16, 2021
@dariakp dariakp requested a review from nbbeeken August 17, 2021 15:22
@dariakp dariakp added the "Primary Review" label (In Review with primary reviewer, not yet ready for team's eyes) Aug 17, 2021
@nbbeeken nbbeeken requested a review from emadum August 17, 2021 18:01
@nbbeeken nbbeeken added the "Team Review" label (Needs review from team) and removed the "Primary Review" label (In Review with primary reviewer, not yet ready for team's eyes) Aug 17, 2021
@nbbeeken nbbeeken marked this pull request as ready for review August 17, 2021 18:01
@dariakp dariakp merged commit 2330ab1 into master Aug 18, 2021
@dariakp dariakp deleted the NODE-3451/fix-performance-regression branch August 18, 2021 15:12
Labels
Team Review Needs review from team
Projects
None yet
3 participants