Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cortex kv panel improvements #371

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Cortex kv panel improvements #371

wants to merge 2 commits into from

Conversation

bboreham
Copy link
Contributor

@bboreham bboreham commented Aug 17, 2021

What this PR does:

Distributor uses multiple kv stores - for global limits and ha-tracker, as well as reading from the ingester ring - so we need to narrow the panel to just the one it says it is showing, e.g. kv_name="distributor-hatracker".

For consistency, do the same on the ingester panel, although currently ingesters only have one kv store.

Created as Draft because the renamed recording rule will mean that dashboards show no data for latency prior to the change. Do we want to cope with this? We could add an extra three queries on the latency panels with the old name, but it will get quite ugly in the code. Discuss.

Also, I added a couple of panels with more info on what the KV store is doing.

  • For HA-tracker, show which tenants are changing election.
  • For ingester, show how many are active, leaving, etc.

Checklist

  • CHANGELOG.md updated

Distributor uses multiple kv stores - for global limits and ha-tracker,
as well as reading from the ingester ring - so we need to narrow the
panel to just the one it says it is showing.

For consistency, do the same on the ingester panel, although currently
ingesters only have one kv store.

Note that the renamed recording rule will mean that dashboards show
no data for latency prior to the change.
For HA-tracker, show which tenants are changing election.
For ingester, show how many are active, leaving, etc.
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created as Draft because the renamed recording rule will mean that dashboards show no data for latency prior to the change. Do we want to cope with this? We could add an extra three queries on the latency panels with the old name, but it will get quite ugly in the code. Discuss.

If it was the read/write latency, I would have said definitely yes. In this case it's the KV latency so we can probably accept the fact we'll "loose the history" when watching this dashboard in the past (or over la large period), so to me it's OK.

.addPanel(
$.panel('Elected replica changes / min') +
$.queryPanel([
'max by(exported_cluster, user)(increase(cortex_ha_tracker_elected_replica_changes_total{%s}[1m])) >0' % $.jobMatcher($._config.job_names.distributor),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a pretty high cardinality query. For example, I just tested in a cluster with few thousands of tenants and running it over "Last 12h" took 25s on cold caches. What if we add a recording rule for that? I would expect the > 0 narrows down a lot the cardinality (other than squashing it by a "distributors replicas" number factor).

@CLAassistant
Copy link

CLAassistant commented Jun 15, 2022

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants