Extend Alertmanager dashboard with currently unused metrics. #313

Merged (4 commits), Jul 30, 2021

@stevesg (Contributor) commented May 25, 2021

What this PR does:
Metrics for general operation:

  • Added "Tenants" stat panel using:
    cortex_alertmanager_tenants_discovered

  • Added "Tenant Configuration Sync" row using:
    cortex_alertmanager_sync_configs_failed_total
    cortex_alertmanager_sync_configs_total
    cortex_alertmanager_ring_check_errors_total

Metrics specific to sharding operation:

  • Added "Sharding Initial State Sync" row using:
    cortex_alertmanager_state_initial_sync_total
    cortex_alertmanager_state_initial_sync_completed_total
    cortex_alertmanager_state_initial_sync_duration_seconds

  • Added "Sharding State Operations" row using:
    cortex_alertmanager_state_fetch_replica_state_total
    cortex_alertmanager_state_fetch_replica_state_failed_total
    cortex_alertmanager_state_replication_total
    cortex_alertmanager_state_replication_failed_total
    cortex_alertmanager_partial_state_merges_total
    cortex_alertmanager_partial_state_merges_failed_total
    cortex_alertmanager_state_persist_total
    cortex_alertmanager_state_persist_failed_total
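
The metrics in the "Sharding State Operations" row come in `_total` / `_failed_total` pairs, which suggests a failure-ratio style of panel. The dashboard's actual queries are not shown in this conversation, so the following is only a minimal sketch of how such PromQL expressions could be generated. The helper names, the `job` selector, and the 5m window are assumptions made for illustration, not taken from the mixin.

```python
# Hypothetical helper (not part of cortex-mixin): builds a PromQL
# failure-ratio expression from a "<name>_total" / "<name>_failed_total"
# counter pair.
def failure_ratio_query(total_metric, failed_metric,
                        selector='job=~".*alertmanager"', window="5m"):
    # The selector and window defaults are illustrative assumptions.
    failed = f"sum(rate({failed_metric}{{{selector}}}[{window}]))"
    total = f"sum(rate({total_metric}{{{selector}}}[{window}]))"
    return f"{failed} / {total}"


# Counter pairs listed for the "Sharding State Operations" row.
STATE_OPERATION_PAIRS = [
    ("cortex_alertmanager_state_fetch_replica_state_total",
     "cortex_alertmanager_state_fetch_replica_state_failed_total"),
    ("cortex_alertmanager_state_replication_total",
     "cortex_alertmanager_state_replication_failed_total"),
    ("cortex_alertmanager_partial_state_merges_total",
     "cortex_alertmanager_partial_state_merges_failed_total"),
    ("cortex_alertmanager_state_persist_total",
     "cortex_alertmanager_state_persist_failed_total"),
]


def build_queries(pairs=STATE_OPERATION_PAIRS):
    # Map each total counter to its failure-ratio expression.
    return {total: failure_ratio_query(total, failed)
            for total, failed in pairs}


if __name__ == "__main__":
    for metric, query in build_queries().items():
        print(f"{metric}:\n  {query}")
```

Note that a ratio like this evaluates to NaN in PromQL while the total rate is zero, so a real panel may prefer to plot the total and failed rates as separate series.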

I did not add a configuration to enable/disable the sharding-specific dashboards as the resulting jsonnet is somewhat messy, but I am happy to add it if deemed necessary.
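
For reference, the kind of toggle discussed above could be sketched as follows. This is Python standing in for the jsonnet, and the `sharding_enabled` flag is hypothetical; the merged dashboard does not necessarily work this way.

```python
# Hypothetical sketch: assemble dashboard rows, including the
# sharding-specific rows only when a (made-up) config flag is set.
GENERAL_ROWS = ["Tenant Configuration Sync"]
SHARDING_ROWS = ["Sharding Initial State Sync", "Sharding State Operations"]


def dashboard_rows(sharding_enabled):
    # Copy the general rows so the module-level list is never mutated.
    rows = list(GENERAL_ROWS)
    if sharding_enabled:
        rows.extend(SHARDING_ROWS)
    return rows


if __name__ == "__main__":
    print(dashboard_rows(sharding_enabled=False))
    print(dashboard_rows(sharding_enabled=True))
```

In jsonnet the same idea typically becomes a conditional array concatenation repeated for each row, which is where the messiness mentioned above would come from.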

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@stevesg stevesg marked this pull request as ready for review May 25, 2021 09:07
@stevesg stevesg requested a review from a team as a code owner May 25, 2021 09:07
@pracucci (Collaborator) left a comment

Very good job! It definitely improves the visibility over the new alertmanager sharding. I left a few comments; I would be glad if you could take a look at them.

Thanks!

cortex-mixin/dashboards/alertmanager.libsonnet: 9 review comments (outdated, resolved)
@pracucci (Collaborator) left a comment

Good job! I left a few final nits, but consider it already approved. Please test it in a dev env before merging. Thanks!

cortex-mixin/dashboards/alertmanager.libsonnet: 4 review comments (outdated, resolved)
CHANGELOG.md: 2 review comments (resolved)
@stevesg stevesg marked this pull request as draft July 30, 2021 09:43
@stevesg stevesg marked this pull request as ready for review July 30, 2021 10:46

@stevesg (Contributor, Author) commented Jul 30, 2021

Updated and tested.

@stevesg stevesg merged commit ee591ee into grafana:main Jul 30, 2021
simonswine pushed a commit to grafana/mimir that referenced this pull request Oct 18, 2021
…er-sharding

Extend Alertmanager dashboard with currently unused metrics.