Re-metric your stack - find waste in Prometheus, Grafana & Loki.
remetric is a read-only doctor for self-hosted Prometheus stacks. Point it at a
Prometheus server and it prints a ranked, actionable list of cardinality
problems with suggested metric_relabel_configs fixes.
Status: alpha. The cardinality, label-pattern, unused-metric, and alert-hygiene analyzers are wired up. JSON output, Grafana integration, the unified
remetric scan command, and HTML/Markdown reports have shipped.
The demo runs against the demo/ docker-compose stack
(Prometheus + Grafana + a deliberately misbehaving cardinality-bomb
service). Reproduce with cd demo && docker compose up -d.
curl -sSL https://remetric.dev/install.sh | sh
Installs the latest release into $HOME/.local/bin. Override with
REMETRIC_INSTALL_DIR=/usr/local/bin sh install.sh (may need sudo) or pin a
specific version via REMETRIC_VERSION=v0.1.0 sh install.sh.
brew install remetric-dev/tap/remetric
Or tap first, so later invocations are shorter:
brew tap remetric-dev/tap
brew install remetric
The formula is auto-published on each release from
remetric-dev/homebrew-tap.
Multi-arch image (linux/amd64, linux/arm64).
Pick the invocation that matches where Prometheus is running:
# Prometheus on the host (Linux) - share the host network
docker run --rm --net=host \
ghcr.io/remetric-dev/remetric:latest \
doctor --prometheus http://127.0.0.1:9090
# Prometheus on the host (macOS / Windows Docker Desktop) - use the magic DNS name
docker run --rm \
ghcr.io/remetric-dev/remetric:latest \
doctor --prometheus http://host.docker.internal:9090
# Prometheus reachable on the public internet or a corporate URL
docker run --rm \
ghcr.io/remetric-dev/remetric:latest \
doctor --prometheus https://prom.example.com
# Prometheus in the same docker network (compose / k8s) - use the service name
docker run --rm --network my-prom-net \
ghcr.io/remetric-dev/remetric:latest \
doctor --prometheus http://prometheus:9090
GitHub Releases ship signed tarballs and a checksums.txt. See
https://github.com/remetric-dev/remetric/releases.
Two paths, depending on what you have:
Option A - all-in-one demo (Docker only). Spins up a synthetic stack (Prometheus + Grafana + a misbehaving metric) and writes an HTML report:
git clone https://github.com/remetric-dev/remetric
cd remetric/demo
docker compose up
# wait ~45s, then:
open output/report.html # macOS
xdg-open output/report.html # Linux
See demo/README.md for what's in the stack and how to poke at it.
Option B - local binary against an e2e stack.
git clone https://github.com/remetric-dev/remetric
cd remetric
make build
# in another shell - spin up an e2e Prometheus stack
make e2e-up
sleep 10
./bin/remetric doctor --prometheus http://localhost:9090
./bin/remetric cardinality top --prometheus http://localhost:9090
make e2e-down
Find labels whose names look like unbounded identifiers (user_id, trace_id,
path, …) and rank them by uniqueness:
remetric cardinality suspicious \
--prometheus http://localhost:9090 \
--min-severity medium
Inspect the labels of a single metric, sorted by unique value count:
remetric cardinality labels \
--metric http_requests_total \
--prometheus http://localhost:9090
Both commands accept --output json for machine-readable output.
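The name-then-rank idea behind cardinality suspicious can be pictured with a small shell sketch. The rank_suspicious function, its "label count" input format, and the three-name pattern (taken only from the examples quoted above) are assumptions for illustration, not remetric's actual detection rules.

```shell
# Hypothetical sketch, not remetric's implementation.
# stdin: one "<label> <unique_value_count>" pair per line.
rank_suspicious() {
  while read -r label count; do
    case "$label" in
      # Only the names quoted in this README; the real pattern list is longer.
      user_id|trace_id|path) printf '%s %s\n' "$label" "$count" ;;
    esac
  done | sort -k2,2nr   # highest unique-value count first
}
```

Labels whose names do not match are dropped before ranking, so a high-cardinality label with an innocuous name would not show up in a check of this shape.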
Diff ingested metrics against everything Grafana, alert rules, and recording rules actually reference. Anything left over is a candidate to drop.
remetric metrics unused \
--prometheus http://localhost:9090 \
--grafana http://localhost:3000
--grafana-token TOKEN authenticates with a bearer token (service-account API key);
--grafana-basic-auth user:pass uses basic auth.
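Conceptually, the unused-metric diff is a set difference: ingested names minus referenced names. A minimal sketch, assuming both sets have already been exported to plain text files with one metric name per line (the unused_metrics helper and the file layout are hypothetical, not part of remetric):

```shell
# Hypothetical sketch of "ingested minus used" as a set difference.
# $1: file with one ingested metric name per line
# $2: file with one referenced metric name per line
unused_metrics() {
  a=$(mktemp) && b=$(mktemp) || return 1
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  # comm -23 keeps lines that appear only in the first (ingested) file.
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

Anything the "used" side fails to see (for example, dashboards when no Grafana URL is given) ends up on the unused side, which is why coverage of the referenced set matters.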
Run every analyzer in one shot:
remetric scan \
--prometheus http://localhost:9090 \
--grafana http://localhost:3000
scan emits a findings.Report (see spec §5.5) - combine with --output json
for CI.
remetric works against VictoriaMetrics out of the box. The Prometheus
HTTP API surface VM exposes is auto-detected on first call. Backend
detection is logged once; override via --backend=victoria (or
prometheus, auto) if needed.
# Single-binary VM (default port 8428)
remetric scan --prometheus http://vm:8428
# VM cluster fronted by vmauth, vmalert separate
remetric scan \
--prometheus https://vmauth.example.com:8427 \
--prom-token "$VMAUTH_TOKEN" \
--vmalert http://vmalert:8880
# Force VM mode (skip auto-detection)
remetric scan --prometheus http://vm:8428 --backend victoria
/api/v1/rules is served by vmalert, not vmselect. Without
--vmalert, metrics unused and scan warn with rules unavailable
and may report false-positives for metrics referenced only by recording
rules. Point --vmalert at the vmalert HTTP listener (default :8880)
to get full coverage. Auth flags --vmalert-token / --vmalert-basic-auth
exist for split-credential setups; if omitted, vmalert inherits auth from
--prom-token / --prom-basic-auth.
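The credential fallback described above can be mirrored in a wrapper script. This is a sketch only: the effective_vmalert_token helper is hypothetical, and it assumes the tokens are supplied through the documented REMETRIC_VMALERT_TOKEN and REMETRIC_PROMETHEUS_TOKEN environment variables.

```shell
# Hypothetical helper mirroring the documented fallback:
# use the vmalert token when set, otherwise reuse the Prometheus token.
effective_vmalert_token() {
  printf '%s\n' "${REMETRIC_VMALERT_TOKEN:-${REMETRIC_PROMETHEUS_TOKEN:-}}"
}
```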
- doctor shows retention: n/a - VM does not expose /api/v1/status/runtimeinfo.
- cardinality top derives numSeries by summing seriesCountByMetricName (VM does not return headStats).
- Cortex/Mimir-style multi-tenancy headers (X-Scope-OrgID) are not supported; URL-prefix-based tenant routing through vmauth works.
remetric inspects the ALERTS series via query_range to flag alerts that
either never fire or fire continuously (broken thresholds, alert noise).
# Alerts that did not fire in the last 7 days (default lookback)
remetric alerts unused \
--prometheus http://localhost:9090
# Alerts that fire >=95% of the lookback window
remetric alerts always-firing \
--prometheus http://localhost:9090 \
--lookback 24h \
--step 5m
Tune the sampling window with --lookback (default 168h) and --step
(default 1h). For VictoriaMetrics, point --vmalert at the vmalert API.
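To make the 95% rule concrete, the sketch below classifies an alert from the number of range-query samples in which it was firing. It assumes the denominator is the number of sample points in the window (a Prometheus range query returns one point per step, endpoints included); the classify_alert helper is hypothetical and remetric's real accounting may differ.

```shell
# Hypothetical sketch, not remetric's actual code; thresholds come from the text.
# $1: samples in which the alert was firing
# $2: lookback in seconds   $3: step in seconds
classify_alert() {
  firing=$1
  # One point per step, including both ends of the window.
  total=$(( $2 / $3 + 1 ))
  if [ "$firing" -eq 0 ]; then
    echo unused
  elif [ $(( firing * 100 )) -ge $(( total * 95 )) ]; then
    echo always-firing
  else
    echo ok
  fi
}
```

Under this assumption the defaults (168h lookback, 1h step) give 169 sample points, so an alert would need to be firing in at least 161 of them to count as always-firing. A coarser step means fewer points and a blunter estimate.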
remetric report runs every analyzer and emits a single document in
terminal, JSON, HTML, or Markdown format.
# Self-contained HTML report (opens in any browser, mobile-friendly)
remetric report --prometheus http://localhost:9090 \
--format html --out report.html
# Markdown for PR comments / inboxes
remetric report --prometheus http://localhost:9090 \
--format markdown > report.md
Formats: terminal (default), json, html, markdown. Use --out FILE
to write to a file, or - (the default) for stdout. The global --output
flag is ignored by report - use --format instead.
| Command | What it does |
|---|---|
| remetric doctor | Connectivity + version + permission self-check |
| remetric cardinality top | List the worst-offending high-cardinality metric/label pairs |
| remetric cardinality labels | Per-metric label inventory (unique counts + sample values) |
| remetric cardinality suspicious | Flag labels matching unbounded-identifier patterns |
| remetric metrics unused | Ingested ∖ used metrics (needs Grafana for dashboard coverage) |
| remetric alerts unused | Alerts that never fired in the lookback window |
| remetric alerts always-firing | Alerts firing >=95% of the lookback window |
| remetric report | Run every analyzer, render terminal/json/html/markdown |
| remetric scan | Run every available analyzer, emit a unified Report |
Global flags (subset; see --help for the full list):
- --prometheus URL - Prometheus base URL. Env: REMETRIC_PROMETHEUS_URL.
- --prom-token TOK - Bearer token. Env: REMETRIC_PROMETHEUS_TOKEN.
- --grafana URL - Grafana base URL. Env: REMETRIC_GRAFANA_URL.
- --grafana-token TOK - Grafana service-account API key. Env: REMETRIC_GRAFANA_TOKEN.
- --grafana-basic-auth USER:PASS - Basic auth for Grafana.
- --grafana-tls-skip-verify - Skip TLS verification for Grafana.
- --backend {auto|prometheus|victoria} - backend dialect. Env: REMETRIC_BACKEND.
- --vmalert URL - vmalert base URL for /api/v1/rules. Env: REMETRIC_VMALERT_URL.
- --vmalert-token TOK - Bearer for vmalert (inherits from --prom-token if unset). Env: REMETRIC_VMALERT_TOKEN.
- --vmalert-basic-auth USER:PASS - Basic auth for vmalert (inherits from --prom-basic-auth if unset).
- --vmalert-tls-skip-verify - Skip TLS verify for vmalert.
- --prom-basic-auth USER:PASS - Basic auth.
- --prom-max-in-flight N - Concurrency cap (default 5).
- --output FORMAT - terminal (default) or json.
- --fail-on SEV - Exit 3 if any finding is at or above this severity. Env: REMETRIC_FAIL_ON. Default none.
- --no-color - Disable colored output (NO_COLOR env also respected).
- --verbose - Debug-level slog logging on stderr.
Full reference at remetric.dev - one page per finding class with detection rules, fix snippets, and false-positive notes.
- No dashboard sprawl analyzer.
This lands in a subsequent release.
Pair any analyzer command with --fail-on=critical to fail the build when a
finding at or above the chosen severity is present. Default behaviour
(--fail-on=none) preserves zero-exit regardless of findings.
# Fail the build if any critical-severity finding is present
remetric scan --prometheus http://localhost:9090 --fail-on=critical
Exit codes:
| Code | Meaning |
|---|---|
| 0 | Clean exit (no findings ≥ threshold, or --fail-on=none). |
| 1 | Runtime or analyzer error. |
| 2 | Flag / usage error. |
| 3 | Findings at or above --fail-on threshold. |
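In a CI step it can help to log what an exit code means before propagating it. The helper below is a sketch: explain_exit is a hypothetical name, and the mapping is copied from the table above.

```shell
# Hypothetical CI helper: turn a remetric exit code into a log message.
explain_exit() {
  case "$1" in
    0) echo "clean: no findings at or above the threshold" ;;
    1) echo "runtime or analyzer error" ;;
    2) echo "flag or usage error" ;;
    3) echo "findings at or above the --fail-on threshold" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Example use in a CI step (assumes remetric is on PATH):
#   rc=0
#   remetric scan --prometheus http://localhost:9090 --fail-on=critical || rc=$?
#   explain_exit "$rc"; exit "$rc"
```

Capturing the code with `|| rc=$?` keeps the step alive under `set -e` long enough to print the message, and re-exiting with the same code preserves the distinction between "findings" (3) and "the tool itself broke" (1 or 2).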
Suppress findings that are known noise or out of scope. Patterns are
anchored full-match regexes: foo_.* matches foo_bar but not
xfoo_bar. Empty / whitespace-only patterns are silently ignored.
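Anchored full-match semantics can be tried out locally with grep -x, which requires the whole line to match. This is an approximation for experimenting with patterns: grep uses POSIX ERE, which may differ from remetric's regex dialect for advanced syntax, and full_match is a hypothetical helper.

```shell
# Hypothetical sketch of anchored full-match semantics using grep -x.
# $1: pattern, $2: candidate name.
full_match() {
  # -E: extended regex, -x: whole line must match, -q: status only.
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}
```

With this helper, `full_match 'foo_.*' foo_bar` succeeds while `full_match 'foo_.*' xfoo_bar` fails, matching the behaviour described above.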
Three target fields, each with its own flag (repeatable):
| Flag | Drops findings whose ... |
|---|---|
| --ignore-metric REGEX | metric name matches |
| --ignore-label REGEX | evidence label matches |
| --ignore-alert REGEX | alert name matches |
# Repeatable flag
remetric scan \
--prometheus http://localhost:9090 \
--ignore-metric='node_.*' \
--ignore-metric='go_.*' \
--ignore-alert='HighMemoryUsage'
# Environment (comma-separated lists)
REMETRIC_IGNORE_METRIC='node_.*,go_.*' \
REMETRIC_IGNORE_ALERT='HighMemoryUsage' \
remetric scan --prometheus http://localhost:9090
# YAML at ~/.config/remetric/.remetric.yaml or ./.remetric.yaml
# ignore:
# metric: ["node_.*", "go_.*"]
# label: ["pod"]
# alert: ["HighMemoryUsage"]
The dropped count surfaces in every output format. The filter runs BEFORE
--fail-on, so an ignored critical finding does not raise exit code 3.
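The ordering matters, so here is a small sketch of "filter first, gate second". Everything in it is illustrative: the rank and gate helpers, the "severity metric" line format, and the use of grep -Ex for full-match are assumptions, not remetric's implementation.

```shell
# Hypothetical sketch of the documented ordering, not remetric's code.
rank() {
  case "$1" in
    low) echo 1 ;; medium) echo 2 ;; high) echo 3 ;; critical) echo 4 ;;
    *) echo 0 ;;
  esac
}

# $1: ignore pattern for metric names (empty means "ignore nothing")
# $2: threshold severity, or "none"
# stdin: one "<severity> <metric>" finding per line
gate() {
  if [ "$2" = none ]; then return 0; fi
  min=$(rank "$2")
  while read -r sev metric; do
    # Step 1: drop findings whose metric fully matches the ignore pattern.
    if [ -n "$1" ] && printf '%s\n' "$metric" | grep -Eqx -- "$1"; then
      continue
    fi
    # Step 2: only surviving findings can trip the gate.
    if [ "$(rank "$sev")" -ge "$min" ]; then
      return 3
    fi
  done
  return 0
}
```

Feeding a single critical finding on node_cpu_seconds_total through gate with the pattern node_.* returns 0, while the same finding with an empty pattern returns 3.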
| Component | Minimum | Tested | Notes |
|---|---|---|---|
| Prometheus | 2.30 | 2.51.x, 2.53.x | TSDB stats API (/api/v1/status/tsdb) is the floor. Prometheus 3.x untested - file an issue if you hit something. |
| VictoriaMetrics | v1.93 | v1.108.x | Single-binary + cluster (via vmauth). vmalert required for alert + recording-rule coverage; pass --vmalert. |
| Grafana | 9.0 | 10.4.x, 11.x | Service-account API keys preferred (--grafana-token); basic auth supported. |
| Go (build) | 1.26 | 1.26.3 | Only needed if building from source; releases ship static binaries. |
| OS / arch | - | linux+amd64, linux+arm64, darwin+amd64, darwin+arm64, windows+amd64 | Static binary, no glibc dependency. |
| Docker | - | 24+ | For demo / e2e stacks via docker compose. |
Multi-tenant Cortex / Mimir / Thanos: URL-prefix tenant routing through
vmauth works; X-Scope-OrgID header style is not yet supported.
Does remetric modify my Prometheus or Grafana?
No. Remetric is strictly read-only. It issues only GET requests against the Prometheus HTTP API and Grafana's /api/search and /api/dashboards endpoints. Nothing is created, updated, or deleted.
What does each severity mean?
- critical - clear, large-impact problem (broken always-firing alert; metric with millions of series concentrated in one unbounded label).
- high - significant cardinality offender or unused metric responsible for >5% of total series.
- medium - notable but bounded (suspicious label pattern, never-fired alert).
- low - informational; below the default --min-severity=medium cutoff, surfaced via --min-severity=low.
Severity is computed per-analyzer from observed series counts, uniqueness ratios, and lookback windows. See internal/scoring/ for the exact rules.
How accurate is the series-reduction estimate?
It's an upper bound (estimation_method: "labeldrop_upper_bound"). The number assumes the offending label is fully dropped; in practice, partial relabel rules will reduce less. Treat it as "this much waste could go away if you fully suppress this label".
Does --ignore interact with --fail-on?
Ignored findings are dropped BEFORE the --fail-on gate. An ignored critical finding does not raise exit code 3. This is intentional: ignored == "known and accepted".
Why do I need --vmalert for VictoriaMetrics?
VictoriaMetrics serves /api/v1/rules from vmalert, not from vmselect. Without --vmalert, the alert-hygiene analyzer and unused-metric analyzer can't see rule definitions and will warn rules unavailable. Point --vmalert at the vmalert HTTP listener (default :8880).
scan vs report vs cardinality top - which one do I run?
- cardinality top (and other focused commands) - drill into one analyzer's output. Use for investigation.
- scan - run every analyzer, emit a JSON findings.Report. Use in CI / scripts (with --fail-on).
- report - same coverage as scan but renders to terminal (default), json, html, or markdown via --format. Use to share a snapshot.
How do I run remetric in CI?
Pair any command with --fail-on=critical (or stricter) so a regression breaks the build:
remetric scan --prometheus https://prom.internal --fail-on=critical --output json > scan.json
Exit codes: 0 clean, 1 runtime error, 2 flag/usage error, 3 findings at or above threshold. See the CI integration section.
Can I silence known-noisy metrics or alerts?
Yes, with --ignore-metric, --ignore-label, --ignore-alert (anchored regex, repeatable). Patterns can also come from REMETRIC_IGNORE_* env vars or .remetric.yaml. See the Ignoring findings section.
--ignore applies to commands that produce Findings: scan, report, cardinality top, cardinality suspicious, alerts unused, alerts always-firing, metrics unused. It does NOT apply to cardinality labels (which emits a label inventory, not findings) or doctor (connectivity self-check).
What about Loki / multi-tenant Cortex / Mimir?
Post-v0.1 roadmap. Today, single-tenant Prometheus + VictoriaMetrics. Tenant routing via vmauth URL prefixes works; X-Scope-OrgID is not yet wired.
make build # static binary at ./bin/remetric
make test # unit tests
make e2e-up # docker compose Prometheus + node-exporter
make e2e # e2e smoke tests
make e2e-down
make fmt vet lint vuln # tooling
Apache 2.0. See LICENSE.
