Charlesmchen/batches2 #11

charlesmchen-signal · 2019-09-04T14:11:11Z

Make sure we always use batches while enumerating with YDB.

The changes are very mechanical and repetitive, and match similar changes we've already made.
I didn't bother applying batching to a few cases.
Some cases we never use, so I just removed them.

charlesmchen-signal · 2019-09-04T14:11:27Z

YapDatabase/Extensions/RTreeIndex/YapDatabaseRTreeIndexTransaction.h

- * YapDatabaseQuery *query =
- *   [YapDatabaseQuery queryWithFormat:@"WHERE minLon > 0 AND maxLat <= 10 AND rowid IN (?)", rowids];
- **/
- (NSDictionary<NSString*, NSNumber*> *)rowidsForKeys:(NSArray<NSString *> *)keys


I removed some methods we never use.

Hm, even though we're getting away from it, I'd think we would want to minimize our upstream diff wherever possible. e.g. there are some patches from yap/master I still might like to pull in and this will make it ever more difficult.

Okay, I've reverted these deletes. The intent was to be certain that all YDB enumerations were batched without converting the enumerations we don't use.

charlesmchen-signal · 2019-09-04T14:11:50Z

YapDatabase/YapDatabaseTransaction.m

@@ -26,7 +26,7 @@
 #pragma unused(ydbLogLevel)

 typedef BOOL (^YapBoolBlock)(void);
-const NSUInteger kDefaultChunkSize = 10 * 1000;
+const NSUInteger kDefaultBatchSize = 10 * 1000;


I renamed "chunk" to "batch" for clarity.

charlesmchen-signal · 2019-09-04T14:12:24Z

YapDatabase/YapDatabaseTransaction.m

+
+                           status = sqlite3_step(statement);
+                           return (BOOL) (status == SQLITE_ROW);
+                       } loopBlock:^{


To keep the diffs readable, I didn't bother re-indenting the contents of these loops.
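
For readers without the full diff, the loop shape in the excerpt above (a condition block that steps the statement, followed by a loop block) can be sketched in plain C. This is only an illustration, not the helper from this PR: the names `batched_while`, `fake_cursor`, and the three callbacks are hypothetical, function pointers stand in for Objective-C blocks, and the per-batch hook is an assumption about what a batch boundary is used for (for example, releasing temporaries).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types; the Objective-C code uses blocks instead. */
typedef bool (*condition_fn)(void *ctx); /* e.g. step the statement, report if a row is ready */
typedef void (*body_fn)(void *ctx);      /* process the current row */
typedef void (*batch_end_fn)(void *ctx); /* assumed per-batch cleanup hook */

/* Runs body while condition holds. Calls batch_end after every batch_size
 * iterations, and once more for a trailing partial batch.
 * Returns the total number of iterations. */
size_t batched_while(size_t batch_size, void *ctx,
                     condition_fn condition, body_fn body, batch_end_fn batch_end)
{
    assert(batch_size > 0);
    size_t total = 0;
    size_t in_batch = 0;
    while (condition(ctx)) {
        body(ctx);
        total++;
        if (++in_batch == batch_size) {
            batch_end(ctx);
            in_batch = 0;
        }
    }
    if (in_batch > 0) {
        batch_end(ctx);
    }
    return total;
}

/* Toy stand-in for a database cursor, so the sketch needs no SQLite. */
struct fake_cursor {
    size_t rows_left;
    size_t rows_seen;
    size_t batches_flushed;
};

bool fake_step(void *ctx)
{
    struct fake_cursor *c = ctx;
    if (c->rows_left == 0) {
        return false;
    }
    c->rows_left--;
    return true;
}

void fake_visit(void *ctx)
{
    struct fake_cursor *c = ctx;
    c->rows_seen++;
}

void fake_flush(void *ctx)
{
    struct fake_cursor *c = ctx;
    c->batches_flushed++;
}
```

With 25 rows and a batch size of 10, `batched_while(10, &cursor, fake_step, fake_visit, fake_flush)` visits every row and calls the batch hook three times: twice for full batches and once for the trailing five rows.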

charlesmchen-signal · 2019-09-04T14:13:56Z

Note: since there's a perf cost to these changes in the normal usage of the YDB-based app, we could hold off on merging this until we've committed to migrating to GRDB in the next release. However, any stress or correctness testing of the migration should definitely include these changes.

charlesmchen-signal · 2019-09-16T18:47:32Z

PTAL @michaelkirk-signal

michaelkirk-signal

It's a bit hard to see in the diff - but how is this different from the last patch you did to do batching in yap?

Could you smash them together so we can rebase more easily, or would that be unreasonable?

charlesmchen-signal · 2019-09-17T13:38:23Z

> It's a bit hard to see in the diff - but how is this different from the last patch you did to do batching in yap?

This is very similar; it just applies the batching everywhere else we use YDB enumerations.

Previously we:

Only batched a few narrow cases.

Prior to your review we:

Batched everywhere we used YDB enumerations.
Removed unused YDB enumerations.

The intent was to have 100% confidence we were batching everywhere, without doing the error-prone work of converting a bunch of unused enumerations to use batching.

Now we:

Batch everywhere we use YDB enumerations.
Leave unused YDB enumerations unbatched.

> Could you smash them together so we can rebase more easily, or would that be unreasonable?

The diff is much smaller now, and even smaller if you hide (the many) whitespace changes:

https://github.com/signalapp/YapDatabase/pull/11/files?w=1

charlesmchen-signal · 2019-09-17T13:39:10Z

PTAL @michaelkirk-signal

charlesmchen-signal · 2019-09-17T13:41:01Z

> Could you smash them together so we can rebase more easily, or would that be unreasonable?

To be clear, I'll squash this branch with the previous changes if you like, but I'm not sure how that'll make this more readable. This builds on the previous changes by:

applying them in more places (but in the exact same way we did before).
renaming "chunk" to "batch".

charlesmchen-signal · 2019-09-17T13:56:48Z

I'll add one more thing: the diff is very repetitive because:

We're doing the same thing (applying batching) in N places.
The YDB code is extremely repetitive.

So you should see a very similar set of changes applied again and again without any variation necessary.

michaelkirk-signal · 2019-09-17T14:39:30Z

YapDatabase/YapDatabaseTransaction.m

@@ -3180,7 +3206,6 @@ - (void)_enumerateKeysAndObjectsInAllCollectionsUsingBlock:
                           status = sqlite3_step(statement);
                           return (BOOL) (status == SQLITE_ROW);
                       } loopBlock:^{
-        __unsafe_unretained YapDatabaseConnection *connection = self.connection;


Where did this go? Was it just some unused variable in the original source?

Oh I see, you brought it up outside the block - why?

Ah, working through this... elsewhere where there were similar changes, we introduced a new scope (converting an actual while loop to this whileLoop... method). So we need to capture the connection in the block, and elsewhere you used this pattern, so you just applied the same pattern here for consistency, I guess?
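
The scoping point can be illustrated with a rough analogue in plain C. Objective-C blocks capture a local like `connection` automatically once it is declared outside the block; C has no closures, so this sketch passes the hoisted value through an explicit context struct instead. Everything here (`struct connection`, `loop_ctx`, `run_loop`, the counter field) is hypothetical and only shows the shape of the change: the value is fetched once, before the loop, rather than inside the loop body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection object. */
struct connection {
    int rows_handled;
};

/* State shared with the loop callbacks. The connection is looked up once,
 * outside the loop, and carried in here (the analogue of a block capture). */
struct loop_ctx {
    struct connection *connection;
    int remaining;
};

bool ctx_has_next(void *raw)
{
    struct loop_ctx *ctx = raw;
    if (ctx->remaining == 0) {
        return false;
    }
    ctx->remaining--;
    return true;
}

void ctx_visit(void *raw)
{
    struct loop_ctx *ctx = raw;
    /* Uses the hoisted connection instead of re-fetching it per iteration. */
    ctx->connection->rows_handled++;
}

/* Callback-driven replacement for a plain while loop. */
void run_loop(void *ctx, bool (*condition)(void *), void (*body)(void *))
{
    while (condition(ctx)) {
        body(ctx);
    }
}

int enumerate_with_hoisted_connection(struct connection *conn, int rows)
{
    assert(conn != NULL && rows >= 0);
    struct loop_ctx ctx = { conn, rows };
    run_loop(&ctx, ctx_has_next, ctx_visit);
    return conn->rows_handled;
}
```

Once the loop body becomes a callback, any local it needs has to be made visible to the new scope, which is why the declaration moves above it.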

michaelkirk-signal · 2019-09-17T14:46:47Z

YapDatabase/Extensions/View/YapDatabaseViewTransaction.m

-		if (stop || [parentConnection->mutatedGroups containsObject:group]) break;
-
-		pageOffset += pageMetadata->count;
+        @autoreleasepool {


I believe yap mostly uses tabs for indentation, which is why this diff looks goofy.

You mentioned that, for readability, you didn't bother re-indenting the places where we'd previously gotten this wrong, which is reasonable, but this chunk is a new file. Could you keep the tabs here, please? Again, just trying to minimize the upstream diff if we ever need to pull in a patch.

I guess it's all going to change anyway since we're introducing an autoreleasepool. 🤷‍♂ TIOLI

michaelkirk-signal · 2019-09-17T14:52:13Z

> To be clear, I'll squash this branch with the previous changes if you like, but I'm not sure how that'll make this more readable. This builds on the previous changes by:
>
> applying them in more places (but in the exact same way we did before).
> renaming "chunk" to "batch".

The thing I was trying to optimize for wasn't so much readability per se, but rather long term maintenance of the patch. Rebasing patches on upstream gets more painful the more we diverge from upstream.

When we have conceptually one feature, "introduce batching", in my experience it's easiest if that's one patch, rather than "introduce batching part 1" and "introduce batching part 2, which partially undoes part 1".

Specifically, the more intermediate commits it's broken into, the more opportunities you have for unnecessary merge conflicts.

It's kind of a small point, but I wanted to clarify what my goals were in case you found it more compelling that way.

michaelkirk-signal

LGTM!

charlesmchen-signal added 6 commits September 4, 2019 10:34

Enumerate using batches.

8ce3a8b

Enumerate using batches.

c6a39e2

Enumerate using batches.

63422b2

Enumerate using batches.

aa72f50

Enumerate using batches.

0587af6

Enumerate using batches.

27c8a3b

charlesmchen-signal commented Sep 4, 2019

charlesmchen-signal added 2 commits September 4, 2019 13:51

Enumerate using batches.

40c2110

Enumerate using batches.

418d833

michaelkirk-signal reviewed Sep 16, 2019

charlesmchen-signal added 3 commits September 17, 2019 10:31

Enumerate using batches.

c070e9b

Enumerate using batches (revert earlier changes).

5f06b6f

Enumerate using batches (revert earlier changes).

1297ecb

michaelkirk-signal reviewed Sep 17, 2019

michaelkirk-signal approved these changes Sep 17, 2019

charlesmchen-signal merged commit 1297ecb into signal-release Oct 1, 2019

charlesmchen-signal deleted the charlesmchen/batches2 branch October 1, 2019 14:40
