Skip to content
This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Charlesmchen/batches2 #11

Merged
merged 11 commits into from
Oct 1, 2019
Merged

Conversation

charlesmchen-signal
Copy link

Make sure we always use batches while enumerating with YDB.

  • The changes are very mechanical and repetitive, and match similar changes we've already made.
  • There were a few cases I didn't bother applying batching to.
  • There were some cases that we never use, I just removed them.

* YapDatabaseQuery *query =
* [YapDatabaseQuery queryWithFormat:@"WHERE minLon > 0 AND maxLat <= 10 AND rowid IN (?)", rowids];
**/
- (NSDictionary<NSString*, NSNumber*> *)rowidsForKeys:(NSArray<NSString *> *)keys
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed some methods we never use.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, even though we're getting away from it, I'd think we would want to minimize our upstream diff wherever possible. e.g. there are some patches from yap/master I still might like to pull in and this will make it ever more difficult.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've reverted these deletes. The intent was to be certain that all YDB enumerations were batched without converting the enumerations we don't use.

@@ -26,7 +26,7 @@
#pragma unused(ydbLogLevel)

typedef BOOL (^YapBoolBlock)(void);
const NSUInteger kDefaultChunkSize = 10 * 1000;
const NSUInteger kDefaultBatchSize = 10 * 1000;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed "chunk" to "batch" for clarity.


status = sqlite3_step(statement);
return (BOOL) (status == SQLITE_ROW);
} loopBlock:^{
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the diffs readable, I didn't bother re-indenting the contents of these loops.

@charlesmchen-signal
Copy link
Author

Note: since there's a perf cost to these changes in the normal usage of the YDB-based app, we could hold off on merging this until we've committed to migrating to GRDB in the next release. However any stress or correctness testing of the migration should definitely include these changes.

@charlesmchen-signal
Copy link
Author

PTAL @michaelkirk-signal

Copy link

@michaelkirk-signal michaelkirk-signal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard to see in the diff - but how is this different from the last patch you did to do batching in yap?

Could you smash them together so we can rebase more easily, or would that be unreasonable?

@charlesmchen-signal
Copy link
Author

It's a bit hard to see in the diff - but how is this different from the last patch you did to do batching in yap?

This is very similar, it just applies the batching everywhere else we use YDB enumerations.

Previously we:

  • Only batched a few narrow cases.

Prior to your review we:

  • Batched everywhere we used YDB enumerations.
  • Removed unused YDB enumerations.

The intent was to have 100% confidence we were batching everywhere, without doing the error-prone work of converting a bunch of unused enumerations to use batching.

Now we:

  • Batched everywhere we use YDB enumerations.
  • Unused YDB enumerations do not use batching.

Could you smash them together so we can rebase more easily, or would that be unreasonable?

The diff is much smaller now, and even smaller if you hide (the many) whitespace changes:

https://github.com/signalapp/YapDatabase/pull/11/files?w=1

@charlesmchen-signal
Copy link
Author

PTAL @michaelkirk-signal

@charlesmchen-signal
Copy link
Author

Could you smash them together so we can rebase more easily, or would that be unreasonable?

To be clear, I'll squash this branch with the previous changes if you like, but I'm not sure how that'll make this more readable. This builds on the previous changes by:

  • applying them in more places (but in the exact same way we did before).
  • renaming "chunk" to "batch".

@charlesmchen-signal
Copy link
Author

I'll add one more thing: the diff is very repetitive because:

  • We're doing the same thing (applying batching) in N places.
  • The YDB code is extremely repetitive.

So you should see a very similar set of changes applied again and again without any variation necessary.

@@ -3180,7 +3206,6 @@ - (void)_enumerateKeysAndObjectsInAllCollectionsUsingBlock:
status = sqlite3_step(statement);
return (BOOL) (status == SQLITE_ROW);
} loopBlock:^{
__unsafe_unretained YapDatabaseConnection *connection = self.connection;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this go? Was it just some unused variable in the original source?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, you brought it up outside the block - why?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, working through this... elsewhere where there were similar changes, we introduced a new scope (converting an actual while loop to this whileLoop... method. So we need to capture the connection in the block, and elsewhere you used this pattern, so you just applied the same pattern here for consistency I guess?

if (stop || [parentConnection->mutatedGroups containsObject:group]) break;

pageOffset += pageMetadata->count;
@autoreleasepool {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe yap mostly uses tabs for indentation, which is why this diff looks goofy.

You mentioned that, for readability, you didn't bother re-indenting the places where we'd previously gotten this wrong, which is reasonable, but this chunk is a new file, could you keep the tabs here please? Again, just trying to minimize the upstream diff if we ever need to pull in a patch.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's all going to change anyway since we're introducing an autoreleasepool. 🤷‍♂ TIOLI

@michaelkirk-signal
Copy link

To be clear, I'll squash this branch with the previous changes if you like, but I'm not sure how that'll make this more readable. This builds on the previous changes by:

applying them in more places (but in the exact same way we did before).
renaming "chunk" to "batch".

The thing I was trying to optimize for wasn't so much readability per se, but rather long term maintenance of the patch. Rebasing patches on upstream gets more painful the more we diverge from upstream.

When we have conceptually one feature, "introduce batching" in my experience it's easiest if that's one patch, rather than "introduce batching part 1" and "introduce batching part 2, which partially undoes part 1".

Specifically, the more intermediate commits it's broken into, the more opportunities you have for unnecessary merge conflicts.

It's kind of a small point, but wanted to clarify what my goals were in case you found it more compelling that way.

Copy link

@michaelkirk-signal michaelkirk-signal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@charlesmchen-signal charlesmchen-signal merged commit 1297ecb into signal-release Oct 1, 2019
@charlesmchen-signal charlesmchen-signal deleted the charlesmchen/batches2 branch October 1, 2019 14:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants