Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partitioner: report more accurate metrics #3769

Merged
merged 10 commits into from
Dec 19, 2022

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented Dec 19, 2022

This commit fixes a partitioner metric and adds a new one

The partitioner currently exposes 4 counters

  • cortex_bucket_store_partitioner_requested_bytes_total
  • cortex_bucket_store_partitioner_requested_ranges_total
  • cortex_bucket_store_partitioner_expanded_bytes_total
  • cortex_bucket_store_partitioner_expanded_ranges_total

Currently, cortex_bucket_store_partitioner_requested_bytes_total
exposes the total sizes of all ranges independant of each other. This
represents their total size without merging any overlapping sizes.

This is not very useful information. This PR changes that metric to
report the number of bytes that would be fetched if the gap that the
partitioner fills is 0. This way you can use
cortex_bucket_store_partitioner_requested_bytes_total in combination
with cortex_bucket_store_partitioner_expanded_bytes_total to figure
out how much more data the partitioner is fetching that wasn't in the
original ranges.

This PR also adds cortex_bucket_store_partitioner_extended_ranges_total
This metric tracks the number of ranges that were merged together due to
the gap that the partitioner fills. This metric will help determine
whether the partitioner is mergning too many ranges and overfetching
small gaps very often. Or whether it is overfetching a single very big
range. This will help in tuning the default gap for the partitioner
(currently at 512KB)

Signed-off-by: Dimitar Dimitrov dimitar.dimitrov@grafana.com

This commit fixes a partitioner metric and adds a new one

The partitioner currently exposes 4 counters
* `cortex_bucket_store_partitioner_requested_bytes_total`
* `cortex_bucket_store_partitioner_requested_ranges_total`
* `cortex_bucket_store_partitioner_expanded_bytes_total`
* `cortex_bucket_store_partitioner_expanded_ranges_total`

Currently, `cortex_bucket_store_partitioner_requested_bytes_total`
exposes the total sizes of all ranges independant of each other. This
represents their total size without merging any overlapping sizes.

This is not very useful information. This PR changes that metric to
report the number of bytes that would be fetched if the gap that the
partitioner fills is 0. This way you can use
`cortex_bucket_store_partitioner_requested_bytes_total` in combination
with `cortex_bucket_store_partitioner_expanded_bytes_total` to figure
out how much more data the partitioner is fetching that wasn't in the
original ranges.

This PR also adds `cortex_bucket_store_partitioner_extended_ranges_total`
This metric tracks the number of ranges that were merged together due to
 the gap that the partitioner fills. This metric will help determine
 whether the partitioner is mergning too many ranges and overfetching
 small gaps very often. Or whether it is overfetching a single very big
 range. This will help in tuning the default gap for the partitioner
 (currently at 512KB)

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner December 19, 2022 16:13
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@pracucci pracucci self-requested a review December 19, 2022 16:21
CHANGELOG.md Outdated Show resolved Hide resolved
pkg/storegateway/partitioner.go Outdated Show resolved Hide resolved
pkg/storegateway/partitioner.go Outdated Show resolved Hide resolved
pkg/storegateway/partitioner.go Outdated Show resolved Hide resolved
pkg/storegateway/partitioner.go Outdated Show resolved Hide resolved
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
…es_total

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@@ -42,38 +43,41 @@ func newGapBasedPartitioner(maxGapBytes uint64, reg prometheus.Registerer) *gapB
Name: "cortex_bucket_store_partitioner_expanded_ranges_total",
Help: "Total number of byte ranges returned by the partitioner after they've been combined together to reduce the number of bucket API calls.",
}),
extendedRanges: promauto.With(reg).NewCounter(prometheus.CounterOpts{
Name: "cortex_bucket_store_partitioner_extended_ranges_total",
Help: "Total number of byte ranges that were not overlapping but were joined because they were closer than the configured maximum gap.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Is my edit correct?

Suggested change
Help: "Total number of byte ranges that were not overlapping but were joined because they were closer than the configured maximum gap.",
Help: "Total number of byte ranges that were not adjacent or overlapping but were joined because they were closer than the configured maximum gap.",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given [0, 10) and [10, 20) and gap size 0 the partitioner will merge these two

given [0, 10) and [11, 20) and gap size 1 the partitioner will merge these two

given [0, 10) and [12, 20) and gap size 1 the partitioner will not merge these two

i added a test case for the last two cases

pkg/storegateway/partitioner.go Outdated Show resolved Hide resolved
if p.End >= s {
// The start of the next range overlaps with the current range's end, so we can merge them.
// We count the extra bytes between the current range's end and the next one's end - that's what's been requested.
stats.requestedRangesTotal += e - p.End
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a bug in the case of the ranges [10, 20], [15, 18]? I haven't tried to execute the code with this case, but I think we subtract for the 2 range instead of "adding zero".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch. I will add a test case for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added one in f646b87

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov enabled auto-merge (squash) December 19, 2022 17:54
@dimitarvdimitrov dimitarvdimitrov merged commit 98f6f4d into main Dec 19, 2022
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/fix-partitioner-metrics branch December 19, 2022 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants