ci: grafana-lint --strict fails on main (charon.json) — multiple pre-existing violations #315

@obchain

Problem

`.github/workflows/grafana-lint.yml` runs `dashboard-linter lint --strict deploy/grafana/charon.json` on every push and PR that touches `deploy/grafana/**`. The latest run on `main` fails (`workflow_run id 24902714948`). The same lint failed on the very PR that introduced the dashboard (`#54 feat/26-grafana-dashboard`) and has failed ever since: each subsequent fixup touched the dashboard JSON without addressing the strict-mode violations.

Concrete failures (full list in CI logs):

  1. Panel `Build info` has no `unit` defined.
  2. Every PromQL target is missing a `job=~"$job"` selector — the dashboard has no `job` template variable.
  3. Rate-based queries hardcode `[1m]` / `[5m]` instead of `$__rate_interval`.
  4. `instance` template `allValue` is `'.*'`; the linter expects `'.+'`.
  5. Dashboard top-level `editable: true` should be `false`.
  6. Missing top-level `job` template variable.
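
The failures above can be approximated locally with a few structural checks on the dashboard JSON. The following is a rough sketch, not how `dashboard-linter` implements its rules: `find_violations` is a hypothetical helper, the field layout (`templating.list`, `panels[].targets[].expr`, `fieldConfig.defaults.unit`) is assumed from Grafana's usual dashboard schema, and panels nested inside collapsed rows are not visited.

```python
import re

# Any literal range such as [1m] or [5m]; [$__rate_interval] does not match.
HARDCODED_RANGE = re.compile(r"\[\d+[smhdwy]\]")


def find_violations(dash):
    """Return human-readable problems mirroring the six failures listed above."""
    problems = []

    # Assumption: a missing "editable" key is treated as editable.
    if dash.get("editable", True):
        problems.append("dashboard: editable should be false")

    variables = {v.get("name"): v for v in dash.get("templating", {}).get("list", [])}
    if "job" not in variables:
        problems.append("dashboard: missing 'job' template variable")
    instance = variables.get("instance")
    if instance is not None and instance.get("allValue") != ".+":
        problems.append("variable 'instance': allValue should be '.+'")

    for panel in dash.get("panels", []):
        title = panel.get("title", "<untitled>")
        if not panel.get("fieldConfig", {}).get("defaults", {}).get("unit"):
            problems.append(f"panel '{title}': no unit defined")
        for target in panel.get("targets", []):
            expr = target.get("expr", "")
            if HARDCODED_RANGE.search(expr):
                problems.append(f"panel '{title}': hardcoded range in query")
            if 'job=~"$job"' not in expr:
                problems.append(f"panel '{title}': query lacks job=~\"$job\"")
    return problems
```

Running it over `json.load(open("deploy/grafana/charon.json"))` gives a quick pre-push signal, but the CI linter remains the source of truth.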

Impact

  • Every PR that touches `deploy/grafana/**` now arrives with a red CI tile, conditioning maintainers to merge despite a failing check.
  • Future regressions in the dashboard JSON are masked by the constant-failure baseline — a real `promtool check rules` regression on `alerts.yaml` would not stand out.
  • New contributors cannot tell whether a lint failure on their PR is theirs or pre-existing.

Fix

Single PR that simultaneously:

  1. Adds a `job` template variable with `label_values(charon_build_info, job)`, default `charon`.
  2. Adds `job=~"$job"` to every existing target query.
  3. Replaces hardcoded `[1m]` / `[5m]` with `$__rate_interval` on every `rate()` / `irate()` expression.
  4. Sets `unit: "short"` on `Build info` (`unit: "none"` would also work, since the value is always 1).
  5. Sets the `instance` template `allValue: ".+"`.
  6. Sets dashboard `editable: false`.
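
The six steps above could be applied mechanically rather than by hand. The sketch below is one possible approach under stated assumptions: `fix_expr` and `fix_dashboard` are hypothetical helpers, every metric is assumed to start with `charon_` (inferred from `charon_build_info`), label values are assumed not to contain braces, and the variable's field layout is a guess at Grafana's schema that should be checked against an exported dashboard.

```python
import copy
import re

RANGE_RE = re.compile(r"\[(?:1m|5m)\]")
# Assumption: all metrics share the charon_ prefix; optional {labels} follow.
METRIC_RE = re.compile(r"\b(charon_\w+)(?:\{([^{}]*)\})?")


def fix_expr(expr):
    """Use $__rate_interval and add job=~"$job" to each charon_ selector."""
    expr = RANGE_RE.sub("[$__rate_interval]", expr)

    def add_job(match):
        metric = match.group(1)
        labels = (match.group(2) or "").strip()
        if re.search(r"\bjob\s*=", labels):  # already has a job matcher
            return match.group(0)
        parts = ['job=~"$job"'] + ([labels] if labels else [])
        return metric + "{" + ", ".join(parts) + "}"

    return METRIC_RE.sub(add_job, expr)


def fix_dashboard(dash):
    """Return a copy of the dashboard dict with the six fixes applied."""
    dash = copy.deepcopy(dash)
    dash["editable"] = False

    variables = dash.setdefault("templating", {}).setdefault("list", [])
    for var in variables:
        if var.get("name") == "instance":
            var["allValue"] = ".+"
    if not any(v.get("name") == "job" for v in variables):
        variables.insert(0, {
            "name": "job",
            "type": "query",
            "query": "label_values(charon_build_info, job)",
            "current": {"text": "charon", "value": "charon"},  # default
        })

    # Only top-level panels are visited; nested row panels are not handled.
    for panel in dash.get("panels", []):
        for target in panel.get("targets", []):
            if "expr" in target:
                target["expr"] = fix_expr(target["expr"])
        if panel.get("title") == "Build info":
            defaults = panel.setdefault("fieldConfig", {}).setdefault("defaults", {})
            defaults["unit"] = "short"
    return dash
```

A regex rewrite of PromQL is fragile, so the resulting diff is worth reviewing query by query before committing; the linter run in CI is the real check.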

Once this lands, `grafana-lint` should turn green on `main` and surface real regressions.

Severity

Medium — CI noise that erodes signal. Not a runtime bug.
