Skip to content

Conversation

patrickfreed
Copy link
Contributor

SWIFT-1072

This PR includes the BSON library portion of the insertion improvements. A subsequent PR on the driver repo will be opened after this one goes through.

benchmark libbson based median time target time (4x) unoptimized swift-bson post-optimizations
small doc bulk insert 0.096 0.12 4258.673 0.117
large doc bulk insert 0.332 0.415 59.637 0.454
small doc insertOne 3.258 4.0725 8.276 3.412
large doc insertOne 0.327 0.409 28.742 0.456

As noted in the design, we missed on the large doc benchmarks by a little, but those ones are still super fast compared to other MongoDB drivers, so it isn't a huge concern.

*
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
*/
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be the initializer that we use in the driver when dealing with BSON from libmongoc. It has to be a part of the public API since we use it in our other package, but I think that's okay because I can imagine users may find themselves in a similar situation.

I'm open to less verbose / different names for this by the way. I chose this one since I figured we might as well make it as explicit as possible if we don't want users to accidentally use this initializer wrongly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an idea for a better name at the moment. however, I wonder if it's worth considering combining this with the previous initializer and adding a validateElements or something parameter that defaults to true? I don't feel that strongly in either direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah that's also an interesting idea. I think I lean slightly towards a separate initializer here since the error outcomes and performance characteristics change a lot depending on the value of the parameter, and having separate initializers may help to communicate this more explicitly. In general though, adding the parameter instead of having a mouthful single parameter seems to make more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping them separate for that reason makes sense to me. I'm fine with keeping this as-is


internal init(fromUnsafeBSON storage: BSONDocumentStorage) {
internal init(fromBSONWithoutValidatingElements storage: BSONDocumentStorage) throws {
try storage.validateLength()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's basically free to validate the length and it makes the iterator logic a lot safer, it made sense to do it as a bare minimum validation.

internal convenience init(over doc: BSONDocument) {
self.init(over: doc.buffer)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the changes in this file are for making it safe to use an iterator with invalid BSON, defaulting to just returning early or nil when errors are encountered.

@patrickfreed patrickfreed marked this pull request as ready for review January 22, 2021 22:11
@patrickfreed patrickfreed requested a review from kmahar January 22, 2021 22:11
*
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
*/
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an idea for a better name at the moment. however, I wonder if it's worth considering combining this with the previous initializer and adding a validateElements or something parameter that defaults to true? I don't feel that strongly in either direction.

*/
internal mutating func set(key: String, to value: BSON?) throws {
guard let range = try BSONDocumentIterator.findByteRange(for: key, in: self) else {
internal mutating func set(key: String, to value: BSON?) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only call this method from the subscript setter, could we make it private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


/// Search for the value associated with the given key, returning its type if found and nil otherwise.
/// This moves the iterator right up to the first byte of the value.
internal func findValue(forKey key: String) -> BSONType? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we possibly match by accident on the right string but in the wrong place? like if a subdocument contains the same key, or a string value contains it, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, since we use skipNextValue between each attempt to read a key.

Copy link
Contributor

@kmahar kmahar Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I missed that 🤦‍♀️ I think I actually understand what this method is doing now.... I feel like the readWithUnsafeReadableBytes API is kind of confusing, IMO it's not obvious that its moving the reader index forward or that it's returning the second value in the returned tuple.
I guess the reader index moving anyway is kind of hidden by the fact that we call this on a mutable buffer owned by the iterator.

*
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
*/
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah that's also an interesting idea. I think I lean slightly towards a separate initializer here since the error outcomes and performance characteristics change a lot depending on the value of the parameter, and having separate initializers may help to communicate this more explicitly. In general though, adding the parameter instead of having a mouthful single parameter seems to make more sense.

*/
internal mutating func set(key: String, to value: BSON?) throws {
guard let range = try BSONDocumentIterator.findByteRange(for: key, in: self) else {
internal mutating func set(key: String, to value: BSON?) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


/// Search for the value associated with the given key, returning its type if found and nil otherwise.
/// This moves the iterator right up to the first byte of the value.
internal func findValue(forKey key: String) -> BSONType? {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, since we use skipNextValue between each attempt to read a key.

/// element.
internal static func findByteRange(for searchKey: String, in document: BSONDocument) throws -> Range<Int>? {
/// Returns nil if invalid BSON is encountered.
internal static func findByteRange(for searchKey: String, in document: BSONDocument) -> Range<Int>? {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this wasn't required by the benchmarks, I didn't originally update this to use the fast / unsafe findValue. It seems like a natural fit so I've updated it now.

*
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
*/
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that isn't reflected in this PR is that something like the following could panic:

let doc = try BSONDocument(fromBSONWithoutValidatingElements: bson)
for i in 0..<doc.count {
    print(doc[i])
}

This is because the index-based subscript doesn't return an optional since it just fatalErrors when an out-of-bounds index is provided. Given that this subscript already can fatalError pretty easily, it seemed okay to me to fatalError on invalid BSON too, even if count says its okay. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm. I'm not sure we really have any other option besides like returning some placeholder value, right? it seems OK to me, I honestly kind of doubt people are using this subscript much anyway


/// Search for the value associated with the given key, returning its type if found and nil otherwise.
/// This moves the iterator right up to the first byte of the value.
internal func findValue(forKey key: String) -> BSONType? {
Copy link
Contributor

@kmahar kmahar Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I missed that 🤦‍♀️ I think I actually understand what this method is doing now.... I feel like the readWithUnsafeReadableBytes API is kind of confusing, IMO it's not obvious that its moving the reader index forward or that it's returning the second value in the returned tuple.
I guess the reader index moving anyway is kind of hidden by the fact that we call this on a mutable buffer owned by the iterator.

*
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
*/
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping them separate for that reason makes sense to me. I'm fine with keeping this as-is

*
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
*/
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm. I'm not sure we really have any other option besides like returning some placeholder value, right? it seems OK to me, I honestly kind of doubt people are using this subscript much anyway

@patrickfreed patrickfreed merged commit cbb9662 into mongodb:master Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants