-
Notifications
You must be signed in to change notification settings - Fork 23
SWIFT-1072 Improve insertion performance #55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SWIFT-1072 Improve insertion performance #55
Conversation
* | ||
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length. | ||
*/ | ||
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be the initializer that we use in the driver when dealing with BSON from libmongoc. It has to be a part of the public API since we use it in our other package, but I think that's okay because I can imagine users may find themselves in a similar situation.
I'm open to less verbose / different names for this by the way. I chose this one since I figured we might as well make it as explicit as possible if we don't want users to accidentally use this initializer wrongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have an idea for a better name at the moment. however, I wonder if it's worth considering combining this with the previous initializer and adding a validateElements
or something parameter that defaults to true
? I don't feel that strongly in either direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yeah that's also an interesting idea. I think I lean slightly towards a separate initializer here since the error outcomes and performance characteristics change a lot depending on the value of the parameter, and having separate initializers may help to communicate this more explicitly. In general though, adding the parameter instead of having a mouthful single parameter seems to make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping them separate for that reason makes sense to me. I'm fine with keeping this as-is
|
||
internal init(fromUnsafeBSON storage: BSONDocumentStorage) { | ||
internal init(fromBSONWithoutValidatingElements storage: BSONDocumentStorage) throws { | ||
try storage.validateLength() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's basically free to validate the length and it makes the iterator logic a lot safer, it made sense to do it as a bare minimum validation.
internal convenience init(over doc: BSONDocument) { | ||
self.init(over: doc.buffer) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the changes in this file are for making it safe to use an iterator with invalid BSON, defaulting to just returning early or nil when errors are encountered.
* | ||
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length. | ||
*/ | ||
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have an idea for a better name at the moment. however, I wonder if it's worth considering combining this with the previous initializer and adding a validateElements
or something parameter that defaults to true
? I don't feel that strongly in either direction.
Sources/SwiftBSON/BSONDocument.swift
Outdated
*/ | ||
internal mutating func set(key: String, to value: BSON?) throws { | ||
guard let range = try BSONDocumentIterator.findByteRange(for: key, in: self) else { | ||
internal mutating func set(key: String, to value: BSON?) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only call this method from the subscript setter, could we make it private
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
/// Search for the value associated with the given key, returning its type if found and nil otherwise. | ||
/// This moves the iterator right up to the first byte of the value. | ||
internal func findValue(forKey key: String) -> BSONType? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we possibly match by accident on the right string but in the wrong place? like if a subdocument contains the same key, or a string value contains it, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, since we use skipNextValue
between each attempt to read a key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I missed that 🤦♀️ I think I actually understand what this method is doing now.... I feel like the readWithUnsafeReadableBytes
API is kind of confusing, IMO it's not obvious that its moving the reader index forward or that it's returning the second value in the returned tuple.
I guess the reader index moving anyway is kind of hidden by the fact that we call this on a mutable buffer owned by the iterator.
* | ||
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length. | ||
*/ | ||
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm yeah that's also an interesting idea. I think I lean slightly towards a separate initializer here since the error outcomes and performance characteristics change a lot depending on the value of the parameter, and having separate initializers may help to communicate this more explicitly. In general though, adding the parameter instead of having a mouthful single parameter seems to make more sense.
Sources/SwiftBSON/BSONDocument.swift
Outdated
*/ | ||
internal mutating func set(key: String, to value: BSON?) throws { | ||
guard let range = try BSONDocumentIterator.findByteRange(for: key, in: self) else { | ||
internal mutating func set(key: String, to value: BSON?) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
/// Search for the value associated with the given key, returning its type if found and nil otherwise. | ||
/// This moves the iterator right up to the first byte of the value. | ||
internal func findValue(forKey key: String) -> BSONType? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, since we use skipNextValue
between each attempt to read a key.
/// element. | ||
internal static func findByteRange(for searchKey: String, in document: BSONDocument) throws -> Range<Int>? { | ||
/// Returns nil if invalid BSON is encountered. | ||
internal static func findByteRange(for searchKey: String, in document: BSONDocument) -> Range<Int>? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this wasn't required by the benchmarks, I didn't originally update this to use the fast / unsafe findValue
. It seems like a natural fit so I've updated it now.
* | ||
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length. | ||
*/ | ||
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that isn't reflected in this PR is that something like the following could panic:
let doc = try BSONDocument(fromBSONWithoutValidatingElements: bson)
for i in 0..<doc.count {
print(doc[i])
}
This is because the index-based subscript doesn't return an optional since it just fatalErrors
when an out-of-bounds index is provided. Given that this subscript already can fatalError
pretty easily, it seemed okay to me to fatalError
on invalid BSON too, even if count
says its okay. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm. I'm not sure we really have any other option besides like returning some placeholder value, right? it seems OK to me, I honestly kind of doubt people are using this subscript much anyway
|
||
/// Search for the value associated with the given key, returning its type if found and nil otherwise. | ||
/// This moves the iterator right up to the first byte of the value. | ||
internal func findValue(forKey key: String) -> BSONType? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I missed that 🤦♀️ I think I actually understand what this method is doing now.... I feel like the readWithUnsafeReadableBytes
API is kind of confusing, IMO it's not obvious that its moving the reader index forward or that it's returning the second value in the returned tuple.
I guess the reader index moving anyway is kind of hidden by the fact that we call this on a mutable buffer owned by the iterator.
* | ||
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length. | ||
*/ | ||
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keeping them separate for that reason makes sense to me. I'm fine with keeping this as-is
* | ||
* - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length. | ||
*/ | ||
public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm. I'm not sure we really have any other option besides like returning some placeholder value, right? it seems OK to me, I honestly kind of doubt people are using this subscript much anyway
SWIFT-1072
This PR includes the BSON library portion of the insertion improvements. A subsequent PR on the driver repo will be opened after this one goes through.
As noted in the design, we missed on the large doc benchmarks by a little, but those ones are still super fast compared to other MongoDB drivers, so it isn't a huge concern.