Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigtable): support BulkApply throttling #13124

Merged
merged 6 commits into from Nov 16, 2023

Conversation

dbolduc
Copy link
Member

@dbolduc dbolduc commented Nov 13, 2023

Part of the work for #12959

Add an option to enable throttling of BulkApply requests. The option deserves scrutiny!!!!

API discussion

I decided not to put the option in an experimental namespace.

The option is only a bool. We could have exposed the other parameters like kInitialPeriod, k{Min,Max}Period, k{Min,Max}Factor.

Of those, I think only kInitialPeriod (or its inverse, InitialQps is useful. We have a default initial QPS of 20, but not all queries are created equally. Applications might know more about their queries than us. They might find this too low, and that it takes unnecessarily long to ramp up to Bigtable's limits.

An alternative would be the more general, but potentially more verbose:

struct BulkApplyThrottlingParams {
  bool enabled;
  // grow as needed. e.g. double initial_rate;
};

struct BulkApplyThrottlingOption {
  using Type = BulkApplyThrottlingParams;
};

Tracing

Defer tracing because I made an oopsie in the internal helper, and the PR is large enough already. FTR, enabling tracing for the integration test will eventually look like this:

image

image

Note how Table::BulkApply() takes ~50ms, matching our kInitialQps.

No, I do not have an explanation for why a few of the requests end up taking 6.5s 🤷


This change is Reviewable

@product-auto-label product-auto-label bot added the api: bigtable Issues related to the Bigtable API. label Nov 13, 2023
@dbolduc dbolduc marked this pull request as ready for review November 13, 2023 23:31
@dbolduc dbolduc requested a review from a team as a code owner November 13, 2023 23:31
Copy link

codecov bot commented Nov 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (343c05b) 92.99% compared to head (b6aa428) 92.99%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #13124   +/-   ##
=======================================
  Coverage   92.99%   92.99%           
=======================================
  Files        2137     2137           
  Lines      185782   185814   +32     
=======================================
+ Hits       172774   172806   +32     
  Misses      13008    13008           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@devbww
Copy link
Collaborator

devbww commented Nov 14, 2023

The option is only a bool. We could have exposed the other parameters like kInitialPeriod, k{Min,Max}Period, k{Min,Max}Factor.

Of those, I think only kInitialPeriod is useful.

Then how about...

struct BulkApplyThrottlingOption {
  using Type = std::chrono::duration<Rep, Period>;  // initial period
};

If the option is set, you get a ThrottlingMutateRowsLimiter configured with the initial period.

If the option is not set, you get a NoopMutateRowsLimiter.

@dbolduc
Copy link
Member Author

dbolduc commented Nov 14, 2023

The option is only a bool. We could have exposed the other parameters like kInitialPeriod, k{Min,Max}Period, k{Min,Max}Factor.
Of those, I think only kInitialPeriod is useful.

Let me amend my statement: Of those I think only kInitialPeriod could be useful.

Then how about...

struct BulkApplyThrottlingOption {
  using Type = std::chrono::duration<Rep, Period>;  // initial period
};

I do like your suggestion.

We cannot template the Type, right? I think you are just illustrating the idea. I think std::chrono::milliseconds would be fine for the duration type.

It would be nice if the application could easily set the option, and let the library give it a default. This could mean:

  1. we document a suggested default
  2. we use Type = absl::optional<std::chrono::milliseconds>
  3. we say a value of duration::zero() means the library can pick the default. (I don't like this idea, because zero() can and should mean: start without any throttling).

I'll sit on it for a day. If anyone else reads this, please jump in with your opinions, however weakly held they may be.

@coryan
Copy link
Member

coryan commented Nov 14, 2023

I'll sit on it for a day. If anyone else reads this, please jump in with your opinions, however weakly held they may be.

I think this deserves a document. Should not be more than a couple of pages, maybe we can review in the team meeting.

@dbolduc dbolduc marked this pull request as draft November 15, 2023 22:02
@dbolduc
Copy link
Member Author

dbolduc commented Nov 15, 2023

We discussed this in the team meeting and decided not to expose the initial period. We also decided that bool options aren't exactly ON/OFF, we should use absl::monostate instead. We also decided to initially put this in an experimental namespace. We know of one internal Google team who wants to adopt this feature. We should get feedback from them on whether ON/OFF is sufficient before we decide to release the feature.

Googlers can see the discussion at: go/cloud-cxx:bulk-apply-throttling-api

* https://cloud.google.com/bigtable/docs/routing#single-cluster
*/
struct BulkApplyThrottlingOption {
using Type = absl::monostate;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include "absl/types/variant.h"

And did you consider absl::variant<absl::monostate>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include "absl/types/variant.h"

yep

And did you consider absl::variant<absl::monostate>?

Not really. I guess that lets us grow the variant to include a second type. Although if we end up with absl::variant<absl::monostate, std::chrono::microseconds>, we might have preferred absl::optional<std::chrono::microseconds>...

Hmm.. I changed it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that lets us grow the variant

Nevermind. We can't grow the variant. This code stops compiling if Type = absl::variant<absl::monostate, T>.

absl::variant<absl::monostate> v = options.get<BulkApplyThrottlingOption>();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devbww I reverted the type to bool. I will wait for your OK before submitting the PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm claiming things will compile unchanged (although I think they could if someone didn't care about new variants), but rather that the migration path is clear. But, I can't predict the future (and this wouldn't be high on the list if I could!), so OK. Thanks for taking a look.

TableTestEnvironment::instance_id(),
TableTestEnvironment::table_id()));

// This test will take at least 100 queries / (20 QPS) = 5s.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should be able to inject a "sleeper" via the options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. We could also pass MockConnections to Make*Connection(Options const& options) via the options for better testing of those functions.

Although, at some point we are going to need an integration test for the real implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but if tests start sleeping for 5s here and there, we're only hurting ourselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced from 100 -> 10 requests.

@dbolduc dbolduc marked this pull request as ready for review November 16, 2023 01:17
TableTestEnvironment::instance_id(),
TableTestEnvironment::table_id()));

// This test will take at least 100 queries / (20 QPS) = 5s.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but if tests start sleeping for 5s here and there, we're only hurting ourselves.

Comment on lines 132 to 133
MakeDataConnection(Options{}.set<experimental::BulkApplyThrottlingOption>(
absl::monostate{})),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q. Would this test fail without the option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but maybe something would break if we implemented the feature incorrectly?

The behavior of the feature is observable. We could time the test to make sure it takes at least the initial period * N requests. I am meh on adding that sort of check though.

The clock we would use in our test would be the same clock as is used in the limiter. So I do not think we would see time based errors/flakes.

Anecdotally from my own testing, the server takes a few requests before it sends any rate limiting information back. I don't want to rely on that behavior though. If the server tells us to speed up immediately, then the test may take < initial period * N requests, breaking the expectation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something would break if we implemented the feature incorrectly?

Maybe? I guess code coverage is a good thing, but we should probably try not to leave an impression that we're testing any behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment to leave no impression that we're testing any behavior.

@dbolduc dbolduc enabled auto-merge (squash) November 16, 2023 19:08
@dbolduc dbolduc merged commit 294d92e into googleapis:main Nov 16, 2023
59 checks passed
@dbolduc dbolduc deleted the bigtable-make-limiter branch November 16, 2023 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the Bigtable API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants