
rate limit auto-tuning #2899

Closed
wants to merge 7 commits into from

Conversation

ajkr
Contributor

@ajkr commented Sep 18, 2017

Dynamic adjustment of the rate limit according to demand for background I/O. The limit increases by a fixed factor when the limiter is drained too frequently, and decreases by the same factor when it is not drained frequently enough. The parameters for this behavior are fixed in GenericRateLimiter::Tune. Other changes:

  • make the rate limiter's Env* configurable for testing
  • track the number of drain intervals in RateLimiter so we don't have to rely on stats, which may be shared by a different set of DB instances than the ones sharing the RateLimiter

Test Plan:

  • new unit test, make check
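The watermark-based adjustment described above can be sketched roughly as follows. The constant names mirror the snippets discussed later in the review, but the values here are illustrative guesses, not the ones that shipped:

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative constants only -- the real values live in
// GenericRateLimiter::Tune and may differ.
constexpr int64_t kLowWatermarkPct = 50;
constexpr int64_t kHighWatermarkPct = 90;
constexpr int64_t kAdjustFactorPct = 5;
constexpr int64_t kAllowedRangeFactor = 20;

// One tuning step: raise the limit when the limiter was drained in most
// refill intervals of the last window, lower it when it rarely was, and
// keep the result within [max / kAllowedRangeFactor, max].
int64_t TuneStep(int64_t bytes_per_sec, int64_t max_bytes_per_sec,
                 int64_t drained_pct) {
  if (drained_pct > kHighWatermarkPct) {
    return std::min(max_bytes_per_sec,
                    bytes_per_sec * (100 + kAdjustFactorPct) / 100);
  }
  if (drained_pct < kLowWatermarkPct) {
    return std::max(max_bytes_per_sec / kAllowedRangeFactor,
                    bytes_per_sec * 100 / (100 + kAdjustFactorPct));
  }
  return bytes_per_sec;
}
```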

@facebook-github-bot
Contributor

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr requested a review from siying September 19, 2017 00:02
@siying
Contributor

siying commented Sep 19, 2017

Can you share your benchmark results?

@ajkr
Contributor Author

ajkr commented Sep 19, 2017

I will provide some db_bench numbers. But keep in mind the real benefit should come in variable write rate scenarios, which I think db_bench doesn't support.

@ajkr
Contributor Author

ajkr commented Sep 19, 2017

[chart: high-rate-limit]

```shell
./db_bench -benchmarks=fillrandom -compression_type=none -statistics -max_background_compactions=8 -write_buffer_size=8388608 -max_bytes_for_level_base=33554432 -benchmark_write_rate_limit=1048576 -rate_limiter_bytes_per_sec=5242880 -num=10000000 -wal_bytes_per_sync=1048576 -rate_limiter_auto_tuned={true,false} -target_file_size_base=8388608
```

@ajkr
Contributor Author

ajkr commented Sep 19, 2017

Result if we make a change to reset the rate limit whenever there's a period with too few rate limiter drains:

[chart: high-rate-limit-with-sharp-drops]

@facebook-github-bot
Contributor

@ajkr has updated the pull request. View: changes, changes since last import

@ajkr
Contributor Author

ajkr commented Sep 20, 2017

Results with latest parameters. Command is the same but tested with high benchmark write rate as well (with 20M keys).

[chart: low-write-rate]

[chart: high-write-rate]

@facebook-github-bot
Contributor

@ajkr has updated the pull request. View: changes, changes since last import

@facebook-github-bot
Contributor

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ajkr has updated the pull request. View: changes, changes since last import

@ajkr mentioned this pull request Sep 23, 2017
Contributor

@siying left a comment

Awesome!

```cpp
            std::chrono::microseconds(1)) /
        std::chrono::microseconds(refill_period_us_);
    int64_t drained_pct =
        (num_drains_ - prev_num_drains_) * 100 / elapsed_intervals;
```
Contributor

Either assert (elapsed_intervals > 0) or return if it is 0.

Contributor Author

sure
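The guard the reviewer asks for might look like this minimal sketch (the function name and signature here are hypothetical, not the code that landed):

```cpp
#include <cstdint>

// Hypothetical guard: skip tuning when no full refill interval has
// elapsed, so the drained_pct computation never divides by zero.
bool ComputeDrainedPct(int64_t num_drains, int64_t prev_num_drains,
                       int64_t elapsed_intervals, int64_t* drained_pct) {
  if (elapsed_intervals <= 0) {
    return false;  // too soon since the last tune; nothing to do
  }
  *drained_pct = (num_drains - prev_num_drains) * 100 / elapsed_intervals;
  return true;
}
```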

```cpp
  } else if (drained_pct < kLowWatermarkPct) {
    new_bytes_per_sec =
        std::max(max_bytes_per_sec_ / kAllowedRangeFactor,
                 prev_bytes_per_sec * 100 / (100 + kAdjustFactorPct));
```
Contributor

@siying commented Sep 28, 2017

In this function, we need to be careful about overflow in general. For example, prev_bytes_per_sec * 100 can technically overflow. It is possible that somewhere else the value of GetBytesPerSecond() has been sanitized so that it is nowhere near overflowing here, but that's not clear from reading this function alone. Can a user set a very high rate limit, like max_int, and get weird results here? This has caused problems for the rate limiter before: some users think that if they don't need to limit the rate, they can just set it to max_int. So I suggest handling it explicitly in the code here.
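One overflow-safe shape for the percentage scaling being discussed (a sketch under the reviewer's max_int scenario, not the code that landed):

```cpp
#include <cstdint>
#include <limits>

// Compute v * pct / 100 without letting the intermediate multiply
// overflow int64_t: if it would, divide first (small precision loss)
// and saturate at INT64_MAX when scaling up.
int64_t ScalePct(int64_t v, int64_t pct) {
  const int64_t kMax = std::numeric_limits<int64_t>::max();
  if (v > kMax / pct) {
    int64_t scaled = v / 100;  // divide first to stay in range
    return (pct > 100 && scaled > kMax / pct) ? kMax : scaled * pct;
  }
  return v * pct / 100;
}
```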

Contributor Author

@ajkr commented Oct 4, 2017

Sure, done.

@facebook-github-bot
Contributor

@ajkr has updated the pull request.

Contributor

@facebook-github-bot left a comment

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot left a comment

@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@lday0321

hi @ajkr, `prev_num_drains_` is always 0 and never changed. The logic for auto-tuning works, but according to the naming, `prev_num_drains_` should be the total number of drains before this round, `num_drains_` should be the total number of drains including this round, and the delta (`num_drains_ - prev_num_drains_`) should be the number of drains during this round.

I guess the update line at https://github.com/facebook/rocksdb/blob/main/util/rate_limiter.cc#L407 should be
`prev_num_drains_ = num_drains_;`, so that `prev_num_drains_` is updated to `num_drains_` after every round of tuning.

@ajkr
Contributor Author

ajkr commented Jan 25, 2022

Good point, thanks for pointing it out. I forgot we ever tried to use the delta in drain count for tuning, and none of the docs suggest we do that, even the original announcement post (http://rocksdb.org/blog/2017/12/18/17-auto-tuned-rate-limiter.html). So I think now we can adopt the bug as part of the feature by deleting prev_num_drains_.

ajkr added a commit to ajkr/rocksdb that referenced this pull request Feb 1, 2022
As reported in
facebook#2899 (comment),
`prev_num_drains_` is confusing as we never set it to nonzero. So this
PR removes it.

Test Plan: `make check -j24`
facebook-github-bot pushed a commit that referenced this pull request Feb 1, 2022
Summary:
As reported in
#2899 (comment),
`prev_num_drains_` is confusing as we never set it to nonzero. So this
PR removes it.

Pull Request resolved: #9484

Test Plan: `make check -j24`

Reviewed By: hx235

Differential Revision: D33923203

Pulled By: ajkr

fbshipit-source-id: 6277d50a198b90646583ee8094c2e6a1bbdadc7b

4 participants