Skip to content

Conversation

patrickfreed
Copy link
Contributor

SWIFT-1070

This PR improves the performance of the BSONDecoder. The goal of this PR is simply to match the libbson-based BSON library's performance, so there are still more possible improvements to be done in the future.

We actually ended up improving upon the existing libbson-based BSONDecoder pretty significantly:

BSON to Native Benchmark Results

benchmark libbson based median time target time (1.25x) post-optimizations
flat BSON 6.002 7.503 2.072
deep BSON 3.463 4.329 2.683
full BSON 3.876 4.845 2.017

We also improved upon the improved Native to BSON benchmarks we originally saw in swift-bson:

Native to BSON Benchmark Results

benchmark libbson based median time target time (1.25x) unoptimized swift-bson post-optimizations
flat BSON 6.793 8.491 2.789 1.456
deep BSON 4.117 5.146 3.504 2.918
full BSON 6.399 7.999 4.479 2.195

return .array(doc.values)
var values: [BSON] = []
let it = doc.makeIterator()
while let (_, val) = try it.nextThrowing() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since BSONDocument.read doesn't validate anymore, we need to use nextThrowing to ensure we don't fatalError. Full validation will be performed in Array.validate if necessary.

public func map<T>(
_ transform: (Element) throws -> T
) rethrows -> [T] {
try AnySequence(self).map(transform)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was calling count a lot, which now requires a full walk over the document. To avoid any issues, I just did a naive manual implementation of map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL I was looking at the method body but not the actual context of what method you were implementing here and I was like "... why isn't he just calling map?" duh, because you are implementing map 🤦‍♀️

default:
try doc.set(key: self.keys[i], to: value.bson)
}
doc.append(key: self.keys[i], value: value.bson)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we manually handle overwriting keys in the keyed encoding, we can just append directly without checking prior bytes of the document. This means we don't have to throw in here anymore and can use the .bson computed property.

@patrickfreed patrickfreed marked this pull request as ready for review January 21, 2021 20:43
@patrickfreed patrickfreed requested a review from kmahar January 21, 2021 20:43
@patrickfreed
Copy link
Contributor Author

Oh I forgot to mention: I left off the improvements to BSONDocumentIterator that make use of the unsafe ByteBuffer functionality since they are primarily used to speed up withID / the insertion benchmarks. The next PR I put up for insertion benchmarks will include those changes.

Copy link
Contributor

@kmahar kmahar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good! just a couple questions


internal init(keyValuePairs: [(String, BSON)]) {
self.keySet = Set(keyValuePairs.map { $0.0 })
guard self.keySet.count == keyValuePairs.count else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we still be making this assertion, even if we don't store the key set? I think without this we might allow a user to do something like let doc: BSONDocument = ["a": 1, "a": 2] since the dictionary literal init delegates to this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. Since this is primarily a concern of dictionary literal initializers, I've moved the validation into them instead of here, since this initializer is also used elsewhere (on potentially un-validated documents in the future).

public func map<T>(
_ transform: (Element) throws -> T
) rethrows -> [T] {
try AnySequence(self).map(transform)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL I was looking at the method body but not the actual context of what method you were implementing here and I was like "... why isn't he just calling map?" duh, because you are implementing map 🤦‍♀️

Copy link
Contributor

@kmahar kmahar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! nice work, exciting to end up with better perf than we had with libbson.

@patrickfreed patrickfreed merged commit ac0e0ee into mongodb:master Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants