SWIFT-1072 Improve insertion performance #55

patrickfreed · 2021-01-22T21:49:48Z

This PR includes the BSON library portion of the insertion improvements. A subsequent PR on the driver repo will be opened after this one goes through.

| benchmark | libbson-based median time | target time (4x) | unoptimized swift-bson | post-optimizations |
| --- | --- | --- | --- | --- |
| small doc bulk insert | 0.096 | 0.12 | 4258.673 | 0.117 |
| large doc bulk insert | 0.332 | 0.415 | 59.637 | 0.454 |
| small doc insertOne | 3.258 | 4.0725 | 8.276 | 3.412 |
| large doc insertOne | 0.327 | 0.409 | 28.742 | 0.456 |

As noted in the design, we narrowly missed the targets on the large doc benchmarks, but those are still very fast compared to other MongoDB drivers, so it isn't a huge concern.

patrickfreed · 2021-01-22T21:51:29Z

Sources/SwiftBSON/BSONDocument.swift

+     *
+     * - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
+     */
+    public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {


This will be the initializer that we use in the driver when dealing with BSON from libmongoc. It has to be part of the public API since we use it in our other package, but I think that's okay because I can imagine users finding themselves in a similar situation.

I'm open to less verbose or different names for this, by the way. I chose this one because, if we don't want users to accidentally misuse this initializer, we might as well make its name as explicit as possible.

I don't have an idea for a better name at the moment. However, I wonder if it's worth considering combining this with the previous initializer and adding a validateElements parameter (or something similar) that defaults to true? I don't feel strongly either way.

Hm, yeah, that's also an interesting idea. I lean slightly towards a separate initializer here, since the error outcomes and performance characteristics change a lot depending on the value of the parameter, and separate initializers may communicate this more explicitly. In general, though, adding a parameter instead of having a single parameter with a mouthful of a label does seem to make more sense.

Keeping them separate for that reason makes sense to me. I'm fine with keeping this as-is.

patrickfreed · 2021-01-22T21:52:09Z

Sources/SwiftBSON/BSONDocument.swift


-    internal init(fromUnsafeBSON storage: BSONDocumentStorage) {
+    internal init(fromBSONWithoutValidatingElements storage: BSONDocumentStorage) throws {
+        try storage.validateLength()


Since validating the length is essentially free and makes the iterator logic a lot safer, it made sense to do it as a bare-minimum validation.
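Here is a minimal sketch of what such a bare-minimum length check might look like, assuming a plain `[UInt8]` buffer in place of NIO's `ByteBuffer` so it runs standalone. `MiniDocument` and `LengthError` are hypothetical names, not swift-bson's types; the checks themselves follow the BSON spec's framing rules (a little-endian int32 total length that must match the buffer size, and a trailing `0x00` byte).

```swift
/// Hypothetical error thrown when a buffer's BSON framing is inconsistent.
struct LengthError: Error {
    let message: String
}

/// Hypothetical stand-in for a BSON document backed by raw bytes.
struct MiniDocument {
    let bytes: [UInt8]

    /// Checks only the framing (length prefix and trailing null byte), a constant-time check;
    /// the elements themselves are left unvalidated.
    init(fromBSONWithoutValidatingElements bytes: [UInt8]) throws {
        // The smallest valid BSON document is 5 bytes: an int32 length plus a 0x00 terminator.
        guard bytes.count >= 5 else {
            throw LengthError(message: "buffer too short: \(bytes.count) bytes")
        }
        // Decode the little-endian int32 length prefix.
        var raw: UInt32 = 0
        for i in 0..<4 {
            raw |= UInt32(bytes[i]) << (8 * i)
        }
        let declared = Int(Int32(bitPattern: raw))
        guard declared == bytes.count else {
            throw LengthError(message: "declared length \(declared) does not match buffer length \(bytes.count)")
        }
        guard bytes[bytes.count - 1] == 0 else {
            throw LengthError(message: "missing trailing null byte")
        }
        self.bytes = bytes
    }
}

// Usage: the element bytes (0xFF 0xFF) are garbage, but the framing is consistent,
// so this sketch accepts them; catching bad elements is left to later reads.
let unchecked = try MiniDocument(fromBSONWithoutValidatingElements: [7, 0, 0, 0, 0xFF, 0xFF, 0])
```

Once the framing is known to be consistent, later reads can rely on the declared length never exceeding the buffer, which is what keeps the iterator logic simple.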

patrickfreed · 2021-01-22T21:53:37Z

Sources/SwiftBSON/BSONDocumentIterator.swift

    internal convenience init(over doc: BSONDocument) {
        self.init(over: doc.buffer)
    }



A lot of the changes in this file are for making it safe to use an iterator over invalid BSON; when an error is encountered, the iterator simply returns early or returns nil.
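To illustrate the "return early or nil" style, here is a hypothetical element iterator over raw bytes. It is not swift-bson's `BSONDocumentIterator`: `MiniIterator` supports only a few BSON types (double, string, embedded document, array, boolean, null, int32, int64) and assumes the length prefix was already validated. Any unknown type or out-of-bounds read simply ends iteration instead of trapping.

```swift
/// Reads a little-endian Int32 at `offset`, or returns nil if it would run past the end.
func readInt32LE(_ bytes: [UInt8], at offset: Int) -> Int32? {
    guard offset >= 0, offset + 4 <= bytes.count else { return nil }
    var raw: UInt32 = 0
    for i in 0..<4 {
        raw |= UInt32(bytes[offset + i]) << (8 * i)
    }
    return Int32(bitPattern: raw)
}

/// Hypothetical iterator yielding (key, typeByte) pairs; returns nil on malformed input.
struct MiniIterator: IteratorProtocol {
    let bytes: [UInt8]
    var offset = 4 // skip the int32 length prefix

    init(_ bytes: [UInt8]) { self.bytes = bytes }

    /// Byte length of a value of `type` starting at `start`, or nil if unsupported or invalid.
    func valueLength(type: UInt8, at start: Int) -> Int? {
        switch type {
        case 0x01, 0x12: return 8          // double, int64
        case 0x10: return 4                // int32
        case 0x08: return 1                // boolean
        case 0x0A: return 0                // null
        case 0x02:                         // string: int32 length (incl. NUL) + bytes
            guard let n = readInt32LE(bytes, at: start), n >= 1 else { return nil }
            return 4 + Int(n)
        case 0x03, 0x04:                   // embedded document / array: length includes itself
            guard let n = readInt32LE(bytes, at: start), n >= 5 else { return nil }
            return Int(n)
        default: return nil                // unsupported or invalid type
        }
    }

    mutating func next() -> (key: String, type: UInt8)? {
        // The final byte of the document is the 0x00 terminator; stop before it.
        guard offset < bytes.count - 1 else { return nil }
        let type = bytes[offset]
        // The key is a NUL-terminated cstring right after the type byte.
        guard let keyEnd = bytes[(offset + 1)...].firstIndex(of: 0) else { return nil }
        let key = String(decoding: bytes[(offset + 1)..<keyEnd], as: UTF8.self)
        let valueStart = keyEnd + 1
        // Bail out early instead of reading past the end of the document.
        guard let len = valueLength(type: type, at: valueStart),
              valueStart + len <= bytes.count - 1 else { return nil }
        offset = valueStart + len
        return (key, type)
    }
}

// {a: 1 (int32), b: true}
var it = MiniIterator([16, 0, 0, 0, 0x10, 0x61, 0, 1, 0, 0, 0, 0x08, 0x62, 0, 1, 0])
var keys: [String] = []
while let element = it.next() { keys.append(element.key) }
```

Iterating the well-formed document above collects the keys "a" and "b"; a truncated buffer or an unknown type byte just makes next() return nil.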


kmahar · 2021-01-22T23:51:27Z

Sources/SwiftBSON/BSONDocument.swift

+     *
+     * - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
+     */
+    public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {


I don't have an idea for a better name at the moment. However, I wonder if it's worth considering combining this with the previous initializer and adding a validateElements parameter (or something similar) that defaults to true? I don't feel strongly either way.

kmahar · 2021-01-23T00:21:00Z

Sources/SwiftBSON/BSONDocument.swift

     */
-    internal mutating func set(key: String, to value: BSON?) throws {
-        guard let range = try BSONDocumentIterator.findByteRange(for: key, in: self) else {
+    internal mutating func set(key: String, to value: BSON?) {


I think we only call this method from the subscript setter; could we make it private?


kmahar · 2021-01-23T00:38:01Z

Sources/SwiftBSON/BSONDocumentIterator.swift


+    /// Search for the value associated with the given key, returning its type if found and nil otherwise.
+    /// This moves the iterator right up to the first byte of the value.
+    internal func findValue(forKey key: String) -> BSONType? {


Could we accidentally match the right string in the wrong place? For example, if a subdocument contains the same key, or a string value contains it.

I don't think so, since we call skipNextValue between attempts to read a key.

Oh, I missed that 🤦‍♀️ I think I actually understand what this method is doing now... I feel like the readWithUnsafeReadableBytes API is kind of confusing; IMO it's not obvious that it's moving the reader index forward or that it's returning the second value in the returned tuple.
I guess the fact that the reader index moves is somewhat hidden anyway, since we call this on a mutable buffer owned by the iterator.
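The confusing part of readWithUnsafeReadableBytes can be made concrete with a hypothetical stand-in. The variant discussed here takes a closure returning a tuple: the first element is how many bytes were consumed (the reader index advances by that amount), and the second is the value the method hands back. `MiniBuffer` only mimics that shape over a `[UInt8]` so it runs without SwiftNIO; it is not NIO's implementation, and its closure is non-throwing for simplicity.

```swift
/// Hypothetical byte buffer with a reader index, mimicking the tuple-returning read shape.
struct MiniBuffer {
    var bytes: [UInt8]
    private(set) var readerIndex = 0

    init(_ bytes: [UInt8]) { self.bytes = bytes }

    /// Calls `body` with the unread bytes. `body` returns (bytesConsumed, result):
    /// the reader index moves forward by bytesConsumed, and only `result` is returned.
    mutating func readWithUnsafeReadableBytes<T>(
        _ body: (UnsafeRawBufferPointer) -> (Int, T)
    ) -> T {
        let (consumed, result) = bytes[readerIndex...].withUnsafeBytes { body($0) }
        precondition(consumed >= 0 && readerIndex + consumed <= bytes.count, "consumed too many bytes")
        readerIndex += consumed
        return result
    }
}

extension MiniBuffer {
    /// Reads a NUL-terminated key (a BSON cstring), consuming the terminator too.
    /// Returns nil and consumes nothing if no terminator is found.
    mutating func readCString() -> String? {
        return readWithUnsafeReadableBytes { (ptr: UnsafeRawBufferPointer) -> (Int, String?) in
            guard let nul = ptr.firstIndex(of: 0) else { return (0, nil) }
            return (nul + 1, String(decoding: ptr[..<nul], as: UTF8.self))
        }
    }
}

var buffer = MiniBuffer(Array("ab\0cd\0".utf8))
let first = buffer.readCString()
```

After the first call, `first` is "ab" and the reader index has moved past both letters and the NUL, even though the call site never mentions the index; that implicit movement is the part that is easy to miss.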

patrickfreed · 2021-01-25T17:47:08Z

Sources/SwiftBSON/BSONDocument.swift

+     *
+     * - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
+     */
+    public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {


Hm, yeah, that's also an interesting idea. I lean slightly towards a separate initializer here, since the error outcomes and performance characteristics change a lot depending on the value of the parameter, and separate initializers may communicate this more explicitly. In general, though, adding a parameter instead of having a single parameter with a mouthful of a label does seem to make more sense.

patrickfreed · 2021-01-25T17:47:56Z

Sources/SwiftBSON/BSONDocument.swift

     */
-    internal mutating func set(key: String, to value: BSON?) throws {
-        guard let range = try BSONDocumentIterator.findByteRange(for: key, in: self) else {
+    internal mutating func set(key: String, to value: BSON?) {



patrickfreed · 2021-01-25T18:01:13Z

Sources/SwiftBSON/BSONDocumentIterator.swift


+    /// Search for the value associated with the given key, returning its type if found and nil otherwise.
+    /// This moves the iterator right up to the first byte of the value.
+    internal func findValue(forKey key: String) -> BSONType? {


I don't think so, since we call skipNextValue between attempts to read a key.
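A hypothetical sketch of why a find-then-skip search cannot match a key in the wrong place: when a key doesn't match, the whole value (including a nested document or a string) is skipped before the next key is read, so bytes inside values are never compared against the search key. `findValue(forKey:in:)` here is an illustrative free function that returns the value's type byte and offset rather than moving an iterator, and it supports only int32, boolean, string, and embedded-document values.

```swift
/// Reads a little-endian Int32 at `offset`, or returns nil if it would run past the end.
func readInt32LE(_ bytes: [UInt8], at offset: Int) -> Int32? {
    guard offset >= 0, offset + 4 <= bytes.count else { return nil }
    var raw: UInt32 = 0
    for i in 0..<4 { raw |= UInt32(bytes[offset + i]) << (8 * i) }
    return Int32(bitPattern: raw)
}

/// Byte length of a value of `type` starting at `start`, or nil if unsupported or invalid.
func valueLength(_ bytes: [UInt8], type: UInt8, at start: Int) -> Int? {
    switch type {
    case 0x10: return 4                          // int32
    case 0x08: return 1                          // boolean
    case 0x02:                                   // string: int32 length (incl. NUL) + bytes
        guard let n = readInt32LE(bytes, at: start), n >= 1 else { return nil }
        return 4 + Int(n)
    case 0x03:                                   // embedded document: length includes itself
        guard let n = readInt32LE(bytes, at: start), n >= 5 else { return nil }
        return Int(n)
    default: return nil
    }
}

/// Hypothetical find: type byte and start offset of the top-level value for `key`, or nil.
func findValue(forKey key: String, in bytes: [UInt8]) -> (type: UInt8, offset: Int)? {
    let target = Array(key.utf8)
    var offset = 4                               // skip the int32 length prefix
    while offset < bytes.count - 1 {             // the last byte is the 0x00 terminator
        let type = bytes[offset]
        // The key is a NUL-terminated cstring right after the type byte.
        guard let keyEnd = bytes[(offset + 1)...].firstIndex(of: 0) else { return nil }
        let valueStart = keyEnd + 1
        // Work out how far to skip; return nil on anything unparseable.
        guard let len = valueLength(bytes, type: type, at: valueStart),
              valueStart + len <= bytes.count - 1 else { return nil }
        if Array(bytes[(offset + 1)..<keyEnd]) == target {
            return (type, valueStart)
        }
        offset = valueStart + len                // skip the whole value, nested bytes included
    }
    return nil
}

// {a: {b: 1}, b: 2}: the nested "b" is skipped along with the value of "a".
let sample: [UInt8] = [27, 0, 0, 0,
                       0x03, 0x61, 0, 12, 0, 0, 0, 0x10, 0x62, 0, 1, 0, 0, 0, 0,
                       0x10, 0x62, 0, 2, 0, 0, 0,
                       0]
let found = findValue(forKey: "b", in: sample)
```

Searching for "b" lands on the top-level int32 at offset 22, not on the "b" inside the subdocument at offset 7, because that subdocument was jumped over as a single 12-byte value.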

patrickfreed · 2021-01-25T18:09:41Z

Sources/SwiftBSON/BSONDocumentIterator.swift

    /// element.
-    internal static func findByteRange(for searchKey: String, in document: BSONDocument) throws -> Range<Int>? {
+    /// Returns nil if invalid BSON is encountered.
+    internal static func findByteRange(for searchKey: String, in document: BSONDocument) -> Range<Int>? {


Since the benchmarks didn't require it, I didn't originally update this to use the fast, unsafe findValue. It seems like a natural fit, so I've updated it now.
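As a rough illustration, a range lookup can be layered on the same find-then-skip walk and return nil, rather than throwing, as soon as it hits something it can't parse. In this hypothetical sketch the range covers the whole element (type byte, key, and value); whether swift-bson's findByteRange uses exactly these boundaries is an assumption, and only int32, boolean, null, and string values are supported.

```swift
/// Reads a little-endian Int32 at `offset`, or returns nil if it would run past the end.
func readInt32LE(_ bytes: [UInt8], at offset: Int) -> Int32? {
    guard offset >= 0, offset + 4 <= bytes.count else { return nil }
    var raw: UInt32 = 0
    for i in 0..<4 { raw |= UInt32(bytes[offset + i]) << (8 * i) }
    return Int32(bitPattern: raw)
}

/// Byte length of a value of `type` starting at `start`, or nil if unsupported or invalid.
func valueLength(_ bytes: [UInt8], type: UInt8, at start: Int) -> Int? {
    switch type {
    case 0x10: return 4                          // int32
    case 0x08: return 1                          // boolean
    case 0x0A: return 0                          // null
    case 0x02:                                   // string: int32 length (incl. NUL) + bytes
        guard let n = readInt32LE(bytes, at: start), n >= 1 else { return nil }
        return 4 + Int(n)
    default: return nil
    }
}

/// Hypothetical: byte range of the whole element (type byte through end of value) for
/// `searchKey`. Returns nil if the key is absent or invalid BSON is hit before it is found.
func findByteRange(for searchKey: String, in bytes: [UInt8]) -> Range<Int>? {
    let target = Array(searchKey.utf8)
    var offset = 4                               // skip the int32 length prefix
    while offset < bytes.count - 1 {             // the last byte is the 0x00 terminator
        let type = bytes[offset]
        guard let keyEnd = bytes[(offset + 1)...].firstIndex(of: 0),
              let len = valueLength(bytes, type: type, at: keyEnd + 1) else { return nil }
        let elementEnd = keyEnd + 1 + len
        guard elementEnd <= bytes.count - 1 else { return nil }
        if Array(bytes[(offset + 1)..<keyEnd]) == target {
            return offset..<elementEnd
        }
        offset = elementEnd                      // skip to the next element
    }
    return nil
}

// {x: true, y: 7 (int32)}
let flat: [UInt8] = [16, 0, 0, 0, 0x08, 0x78, 0, 1, 0x10, 0x79, 0, 7, 0, 0, 0, 0]
let yRange = findByteRange(for: "y", in: flat)
```

Note that bytes after the matching element are never examined, so this style only detects invalid BSON that appears before the key being searched for.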

patrickfreed · 2021-01-25T18:12:10Z

Sources/SwiftBSON/BSONDocument.swift

+     *
+     * - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
+     */
+    public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {


One thing that isn't reflected in this PR is that something like the following could panic:

    let doc = try BSONDocument(fromBSONWithoutValidatingElements: bson)
    for i in 0..<doc.count {
        print(doc[i])
    }

This is because the index-based subscript doesn't return an optional; it simply calls fatalError when given an out-of-bounds index. Given that this subscript can already fatalError fairly easily, it seemed okay to me to fatalError on invalid BSON too, even when count says the index is in bounds. What do you think?

Hmm. I'm not sure we really have any other option besides returning some placeholder value, right? It seems OK to me; I honestly doubt people are using this subscript much anyway.
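The trade-off can be sketched with a hypothetical lazily decoded collection: a cheap count can report an element that later fails to decode, and a non-optional subscript then has no way to signal that except trapping, while an optional-returning accessor can report nil. `LazyElements`, `element(at:)`, and the `decode` closure are illustrative assumptions, not swift-bson API.

```swift
/// Hypothetical collection whose elements are decoded lazily from raw chunks.
struct LazyElements {
    let raw: [[UInt8]]                     // one raw chunk per element (stand-in for BSON bytes)
    let decode: ([UInt8]) -> String?       // returns nil if a chunk is invalid

    /// Cheap count computed without decoding anything, so it can include invalid elements.
    var count: Int { raw.count }

    /// Safe accessor: nil for out-of-bounds indices or undecodable elements.
    func element(at i: Int) -> String? {
        guard raw.indices.contains(i) else { return nil }
        return decode(raw[i])
    }

    /// Non-optional subscript: like Array's, it traps on bad input, including an
    /// invalid element at an index that `count` still reports as present.
    subscript(i: Int) -> String {
        guard let value = element(at: i) else {
            fatalError("index \(i) is out of bounds or its element is invalid")
        }
        return value
    }
}

// Hypothetical decoder: accepts ASCII-only chunks and rejects anything else.
let asciiOnly: ([UInt8]) -> String? = { bytes in
    bytes.allSatisfy { $0 < 0x80 } ? String(decoding: bytes, as: UTF8.self) : nil
}
let elements = LazyElements(raw: [Array("ok".utf8), [0xFF]], decode: asciiOnly)
```

Here elements.count is 2 and elements[0] is "ok", but elements[1] would trap even though 1 < count; only element(at:) lets a caller observe the failure without crashing.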

kmahar · 2021-01-25T18:34:31Z

Sources/SwiftBSON/BSONDocumentIterator.swift


+    /// Search for the value associated with the given key, returning its type if found and nil otherwise.
+    /// This moves the iterator right up to the first byte of the value.
+    internal func findValue(forKey key: String) -> BSONType? {


Oh, I missed that 🤦‍♀️ I think I actually understand what this method is doing now... I feel like the readWithUnsafeReadableBytes API is kind of confusing; IMO it's not obvious that it's moving the reader index forward or that it's returning the second value in the returned tuple.
I guess the fact that the reader index moves is somewhat hidden anyway, since we call this on a mutable buffer owned by the iterator.

kmahar · 2021-01-25T19:18:04Z

Sources/SwiftBSON/BSONDocument.swift

+     *
+     * - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
+     */
+    public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {


Keeping them separate for that reason makes sense to me. I'm fine with keeping this as-is.

kmahar · 2021-01-25T19:19:55Z

Sources/SwiftBSON/BSONDocument.swift

+     *
+     * - Throws: `BSONError.InvalidArgumentError` if the provided BSON's length does not match the encoded length.
+     */
+    public init(fromBSONWithoutValidatingElements bson: ByteBuffer) throws {


Hmm. I'm not sure we really have any other option besides returning some placeholder value, right? It seems OK to me; I honestly doubt people are using this subscript much anyway.

patrickfreed added 3 commits January 22, 2021 12:39

wip add unsafe initializer

2be5c65

add unsafe find, improve handling of invalid BSON

e5d7c0e

minor fixups

3f51435

patrickfreed commented Jan 22, 2021

patrickfreed marked this pull request as ready for review January 22, 2021 22:11

patrickfreed requested a review from kmahar January 22, 2021 22:11

kmahar reviewed Jan 23, 2021

patrickfreed added 3 commits January 25, 2021 12:48

make set private

d9a8a76

mention error return cases in docstrings

8d2df21

speed up findByteRange

fb1ed82

patrickfreed commented Jan 25, 2021

kmahar approved these changes Jan 26, 2021

patrickfreed merged commit cbb9662 into mongodb:master Jan 26, 2021
