Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(NODE-5957): add BSON indexing API #654

Merged
merged 6 commits into from Mar 7, 2024
Merged

Conversation

nbbeeken
Copy link
Contributor

@nbbeeken nbbeeken commented Mar 4, 2024

Description

What is changing?

  • Add new parseToElements API for experimental use only
  • Breaks out the step of parsing BSON into its components by ONLY determining types, and offsets and lengths of element names and values
  • Adds a new BSONOffsetError, also experimental, that is specific to this API
Algorithm
  1. Defaults offset to 0
  2. Create an empty array of "elements"
  3. Always operates on a BSON document, so it asserts the leading 4 bytes are a positive int32 that is not larger than the length of the input minus the starting offset
    1. We also assert that the position in the input equal to the declared size minus 1 is 0.
  4. The first type of every document begins at offset=4 while that offset is less than the document size plus the starting offset we read a single byte from the input
  5. If that type is 0 and the current offset is equal to the document length then we have reached the end of the document
  6. If that type is 0 and the current offset is NOT equal to the document length, then the BSON is invalid
  7. If the type is any other number we attempt to determine the length of a null-terminated string from the input
    1. The search for null should never overrun the size of the document
  8. Depending on the type we either store a hardcoded length or read the input to determine its length
    1. If Double (1), Long (18), Date (9), Timestamp (17), then the length is 8
    2. If Int (16), then the length is 4
    3. If ObjectId (7), then the length is 12
    4. If Decimal128 (19), then the length is 16
    5. If Boolean (8), then the length is 1
    6. If Null (10), Undefined (6), MaxKey (127), MinKey (255), then the length is 0
    7. If Regexp (11)
      1. We search for a null terminator of the pattern string and use that length plus 1 to search for another null terminator of the flags string
    8. If Object (3), Array (4), CodeWithScope (15)
      1. We use the next 4 bytes as a positive int32 to calculate the length
    9. If String (2), Binary (5), DBpointer (12), Code (13), Symbol (14)
      1. We use the next 4 bytes as a positive int32 plus 4 bytes to calculate the length
      2. If Binary (5) we add 1 to the length for the subtype
      3. If DBpointer (12) we add 12 to the length for the ObjectId
    10. Else we throw an error that the type is unknown
  9. If the length calculated exceeds the documentSize then we throw an error (malformed int32)
  10. Push the type, nameOffset, nameLength, offset, and length to the array of elements
  11. Increment the offset by length
  12. When the loop finishes return elements
Is there new documentation needed for these changes?

There is TS doc marking the new APIs as @experimental

What is the motivation for this change?

This API is intended to be a building block for unlocking optimizations around lazily parsing BSON.

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken force-pushed the NODE-5957-parse-to-elements branch 3 times, most recently from ffb41a1 to 290b9ce Compare March 4, 2024 20:51
@nbbeeken nbbeeken force-pushed the NODE-5957-parse-to-elements branch from 290b9ce to 05ad18c Compare March 4, 2024 20:51
@nbbeeken nbbeeken changed the title feat(NODE-5957): add parse to elements API feat(NODE-5957): add BSON indexing API Mar 5, 2024
@nbbeeken nbbeeken marked this pull request as ready for review March 5, 2024 14:48
@W-A-James W-A-James self-assigned this Mar 5, 2024
@W-A-James W-A-James added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Mar 5, 2024
src/parser/on_demand/parse_to_elements.ts Outdated Show resolved Hide resolved
src/parser/on_demand/parse_to_elements.ts Outdated Show resolved Hide resolved
src/parser/on_demand/parse_to_elements.ts Show resolved Hide resolved
src/utils/number_utils.ts Outdated Show resolved Hide resolved
test/node/parser/on_demand/parse_to_elements.test.ts Outdated Show resolved Hide resolved
test/node/parser/on_demand/parse_to_elements.test.ts Outdated Show resolved Hide resolved
@nbbeeken nbbeeken requested a review from W-A-James March 6, 2024 18:48
W-A-James
W-A-James previously approved these changes Mar 6, 2024
@W-A-James W-A-James added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels Mar 6, 2024
Copy link
Contributor

@aditi-khare-mongoDB aditi-khare-mongoDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a few clarifying comments

test/node/error.test.ts Show resolved Hide resolved
src/utils/number_utils.ts Outdated Show resolved Hide resolved
src/parser/on_demand/parse_to_elements.ts Show resolved Hide resolved
src/parser/on_demand/parse_to_elements.ts Outdated Show resolved Hide resolved
src/parser/on_demand/parse_to_elements.ts Show resolved Hide resolved
W-A-James
W-A-James previously approved these changes Mar 7, 2024
Copy link
Contributor

@aditi-khare-mongoDB aditi-khare-mongoDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@W-A-James W-A-James dismissed baileympearson’s stale review March 7, 2024 22:57

Concerns addressed and go-ahead to merge given via slack

@W-A-James W-A-James merged commit 2ac17ec into main Mar 7, 2024
4 checks passed
@W-A-James W-A-James deleted the NODE-5957-parse-to-elements branch March 7, 2024 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team Review Needs review from team
Projects
None yet
4 participants