add additional queue dimensions to query scheduler queue duration histogram #6960

Merged


francoposa
Member

This is to allow us to observe the effects of turning on multidimensional queueing by breaking out the queue duration metric used in our Mimir / Reads Latency (Time in Queue) panel and in alerts.

Went back and forth on how to label this, but stuck with the idea that we shouldn't give the additional queue dimensions a specific meaning in Mimir, for example by taking the first additional queue dimension and assigning it to a label named query_component.

Instead, I am just concatenating the additional queue dimensions and shipping them as-is; we can use the alerts and dashboards to assign meanings to the label values.
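A minimal sketch of the label value this approach produces, assuming the dimensions arrive as a `[]string` and are joined with `":"` as in the scheduler change below. `dimensionsLabelValue` is a hypothetical helper, and `"ingester"` is only an illustrative dimension value. A nil slice joins to the empty string, and Prometheus treats a label with an empty value as equivalent to the label being absent.

```go
package main

import (
	"fmt"
	"strings"
)

// dimensionsLabelValue is a hypothetical helper: it concatenates the additional
// queue dimensions in their original order and assigns no meaning to any
// particular position.
func dimensionsLabelValue(dims []string) string {
	return strings.Join(dims, ":")
}

func main() {
	fmt.Printf("%q\n", dimensionsLabelValue(nil))                  // "" (no additional dimensions)
	fmt.Printf("%q\n", dimensionsLabelValue([]string{"ingester"})) // "ingester"
	fmt.Printf("%q\n", dimensionsLabelValue([]string{"A", "B"}))   // "A:B"
}
```

Interpreting a position (for example "the first dimension is the query component") is then left to the dashboards and alerts, as the description proposes.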

Open to feedback on that choice of approach!

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

francoposa requested a review from a team as a code owner on December 18, 2023 18:31
Contributor

56quarters left a comment


LGTM

 	Name:    "cortex_query_scheduler_queue_duration_seconds",
 	Help:    "Time spent by requests in queue before getting picked up by a querier.",
 	Buckets: prometheus.DefBuckets,
-})
+}, []string{"user", "additional_queue_dimensions"})
Contributor

Something shorter for this label, like "dimensions", would be my preference.

@@ -399,7 +400,8 @@ func (s *Scheduler) QuerierLoop(querier schedulerpb.SchedulerForQuerier_QuerierL
 	r := req.(*queue.SchedulerRequest)
 
 	queueTime := time.Since(r.EnqueueTime)
-	s.queueDuration.Observe(queueTime.Seconds())
+	additionalQueueDimensionLabels := strings.Join(r.AdditionalQueueDimensions, ":")
Contributor

Is there any guarantee that these are always in the same order?

Member Author

No, but we don't want to sort them either, because order does matter: the additional dimensions represent a path through the tree.

Sending additional queue dimensions ["A" "B"] would represent path root->tenantID->A->B, which is a different queue than additional queue dimensions ["B" "A"], which would represent path root->tenantID->B->A.

But for now we send either no additional dimensions (a nil slice) or one additional dimension representing the expected query component(s), making the path through the tree essentially root -> tenantID -> queryComponent.

The fact that we always have the first queuing dimension as tenant ID for now is also what makes me hesitant to remove "additional" from the variable and label naming.

francoposa merged commit 5cc586d into main on Dec 18, 2023
28 checks passed
francoposa deleted the francoposa/query-scheduler-observe-additional-queue-dimensions branch on December 18, 2023 21:13
2 participants