Metrics

Prometheus metrics are exposed at /metrics on the orchestrator's main listen address (-addr, 127.0.0.1:8443 by default), behind the same TLS/mTLS envelope as the rest of the API.

Source: internal/orchestrator/metrics/metrics.go.

Series

`fangs_orchestrator_info{version}`

Gauge, always 1. Labeled with the build version (set via -ldflags -X main.version=..., otherwise dev). Useful for stamping the version on dashboards.

fangs_orchestrator_info

`fangs_runners_registered`

Gauge. Number of runners that are currently registered and heartbeat-fresh. The pruner evicts a runner after 90s without a heartbeat; the gauge reflects the post-prune state.

# alert when no runners
fangs_runners_registered == 0
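To make "post-prune state" concrete, here is a minimal Python sketch of the counting rule. It is an illustration, not the orchestrator's code: the function name and the map of last-heartbeat timestamps are hypothetical, and treating a runner as stale at exactly 90s is an assumption about the boundary.

```python
# Illustrative only: mirrors the 90s heartbeat threshold described above.
HEARTBEAT_TTL_SECONDS = 90.0


def count_fresh_runners(last_heartbeat: dict, now: float,
                        ttl: float = HEARTBEAT_TTL_SECONDS) -> int:
    """Count runners whose last heartbeat is younger than ttl seconds.

    last_heartbeat maps a runner ID to the Unix time of its latest heartbeat.
    Runners at or past the ttl are assumed to have been pruned already.
    """
    return sum(1 for ts in last_heartbeat.values() if now - ts < ttl)
```

Under these assumptions, a runner that stops heartbeating disappears from the gauge roughly 90s later, so the `== 0` alert lags a total runner outage by about that long.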

`fangs_scans_queued_total`

Counter. Incremented every time SubmitScan enqueues a job.

# scan rate
rate(fangs_scans_queued_total[5m])

`fangs_events_received_total{type}`

Counter. Incremented per arrived event, labeled by event type (file_access, exec, net_connect, dns_query, tls_sni).

# events per second by type
sum by (type) (rate(fangs_events_received_total[1m]))

# total event throughput
sum(rate(fangs_events_received_total[1m]))

Cardinality: bounded — 5 types.

`fangs_events_dropped_total`

Counter. Lifetime sum of events_dropped from every ScanResult — incremented by ObserveEventsDropped on each result POST.

# alert on ringbuf overflow
rate(fangs_events_dropped_total[5m]) > 0

A non-zero rate means at least one recent run lost events to ringbuf overflow. See Sensor-Probes#drop-counter for tuning advice.
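The reason a non-zero rate implies a recent drop is that the counter only ever increases. The Python sketch below is a simplified stand-in for PromQL's `rate()`: it does no window extrapolation, and the function name and sample layout are hypothetical.

```python
def simple_rate(samples: list) -> float:
    """Per-second increase across (timestamp, value) samples, oldest first.

    A decrease between two samples is treated as a counter reset (for
    example a process restart), so the post-reset value counts as new
    increase. Unlike PromQL's rate(), nothing is extrapolated.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

A flat counter yields 0.0 however large its absolute value is, so the alert expression only fires while drops fall inside the lookback window.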

`fangs_deviations_written_total{severity}`

Counter. Incremented per Differ-emitted deviation, labeled by severity.

# deviation rate by severity
sum by (severity) (rate(fangs_deviations_written_total[1h]))

# cumulative count of critical deviations (raw counter value)
fangs_deviations_written_total{severity="critical"}

Cardinality: bounded — 4 standard severities + unknown.

`fangs_baseline_promoted_total{trigger}`

Counter. Tracks baseline promotions:

auto — Differ promoted a zero-deviation run
manual — operator ran fangs baseline promote

# ratio of human-triggered promotes (a high ratio means most
# releases need operator review — possible tuning opportunity)
sum(fangs_baseline_promoted_total{trigger="manual"})
  / sum(fangs_baseline_promoted_total)

Note: today this counter is only incremented on the Differ's auto-promote path. The CLI's fangs baseline promote writes to the DB directly, bypassing the orchestrator, so manual promotions are not counted and the ratio above does not yet reflect them. A fix is planned for v2.

`fangs_notifications_total{notifier, status}`

Counter. Incremented per delivery attempt by the Notifier.

Labels:

notifier — the target's name (bounded by your configured targets — typically <10)
status — sent | failed | permanent

# delivery success rate per notifier
sum by (notifier) (rate(fangs_notifications_total{status="sent"}[1h]))
  / sum by (notifier) (rate(fangs_notifications_total[1h]))

# alert on a notifier producing permanent failures
rate(fangs_notifications_total{status="permanent"}[5m]) > 0

Cardinality: bounded — # notifiers × 3.

Go runtime collectors

The metrics package also registers prometheus.NewGoCollector() and prometheus.NewProcessCollector(...), so you get for free:

go_goroutines
go_gc_duration_seconds
go_memstats_*
process_cpu_seconds_total
process_resident_memory_bytes
process_open_fds
...

Useful for orchestrator host monitoring without an extra agent.

Cardinality

Total series count for a healthy deployment is small:

| Source | Series |
| --- | --- |
| `fangs_*` counters/gauges | ~10 base + (5 event types) + (4 severities) + (3 statuses × N notifiers) ≈ 25-50 |
| Go runtime | ~30 |
| Process runtime | ~10 |

The total should stay comfortably under 100 series, which is trivial for Prometheus.
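The arithmetic behind the estimate can be checked with a short Python sketch. The function names are hypothetical, and the defaults are the approximate figures from the table above, not measured values.

```python
def fangs_series(notifiers: int, base: int = 10, event_types: int = 5,
                 severities: int = 4, statuses: int = 3) -> int:
    """Approximate fangs_* series count for a number of notifier targets."""
    return base + event_types + severities + statuses * notifiers


def total_series(notifiers: int, go_runtime: int = 30, process: int = 10) -> int:
    """Add the approximate Go runtime and process collector series."""
    return fangs_series(notifiers) + go_runtime + process
```

With 2 notifiers this gives 25 `fangs_*` series and with 10 it gives 49, matching the 25-50 range; the overall total for 10 notifiers is 89. Passing `severities=5` accounts for the extra `unknown` severity label.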

Sample scrape config

prometheus.yml:

scrape_configs:
  - job_name: fangs
    static_configs:
      - targets: ['fangs.internal:8443']
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/fangs-ca.crt
      cert_file: /etc/prometheus/fangs-client.crt
      key_file:  /etc/prometheus/fangs-client.key

For mTLS deployments, Prometheus needs a client cert signed by the orchestrator's -tls-client-ca to scrape. Issue one via docs/scripts/gen-tls.sh with RUNNER_ID=prom-scraper.
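Before pointing Prometheus at the endpoint, it can help to confirm that the client certificate works and that the expected series are present. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and the check is limited to the unlabeled series on this page on the assumption that labeled counters may not be exported until their first increment.

```python
import re
import ssl
import urllib.request

# Unlabeled series documented on this page (assumed present from startup).
EXPECTED = {
    "fangs_orchestrator_info",
    "fangs_runners_registered",
    "fangs_scans_queued_total",
    "fangs_events_dropped_total",
}

_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition: str) -> set:
    """Collect sample names from Prometheus text exposition output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP/TYPE lines
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_series(exposition: str) -> set:
    """Return the expected series names absent from the scrape output."""
    return EXPECTED - metric_names(exposition)


def fetch_metrics(url: str, ca_file: str, cert_file: str, key_file: str) -> str:
    """GET the metrics page over mTLS with the same files as the scrape config."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, `missing_series(fetch_metrics("https://fangs.internal:8443/metrics", ca, cert, key))` should return an empty set when run with the three file paths from the scrape config. A TLS handshake error at this step usually points at the certificate chain rather than at Prometheus.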

For plain-HTTP development:

scrape_configs:
  - job_name: fangs
    static_configs:
      - targets: ['127.0.0.1:8443']

Recommended alerts

groups:
  - name: fangs
    rules:
      - alert: FangsNoRunners
        expr: fangs_runners_registered == 0
        for: 2m
        labels: {severity: critical}
        annotations:
          summary: No FANGS runners registered for 2 minutes

      - alert: FangsRingbufOverflow
        expr: rate(fangs_events_dropped_total[5m]) > 0
        for: 5m
        labels: {severity: warning}
        annotations:
          summary: FANGS sensor ringbuf overflowing
          description: Events being dropped at probe time; tune ringbuf or reduce concurrent scans

      - alert: FangsNotifierFailing
        expr: rate(fangs_notifications_total{status="permanent"}[10m]) > 0
        labels: {severity: warning}
        annotations:
          summary: Notifier delivery permanently failing
          description: A configured webhook target is returning 4xx persistently

      - alert: FangsHighSeverityDeviation
        expr: increase(fangs_deviations_written_total{severity=~"high|critical"}[5m]) > 0
        labels: {severity: critical}
        annotations:
          summary: High/critical FANGS deviation
          description: Investigate the pending review queue

What's NOT instrumented

Some signals would be useful but aren't surfaced today:

Per-run differ duration (fangs_differ_duration_seconds) — histogram showing how long deltas take.
Per-package run frequency — high-cardinality (one label per watched package) so it's intentionally absent; the DB has the data.
Sandbox lifecycle latency — image pull duration, container exit code distribution. Useful for operators tuning sandbox limits.

Each is a small addition. PRs welcome.

Tracing

Not wired today. OpenTelemetry hooks would be most valuable in the per-run state machine and the per-target Notifier retry loops. This is a v2 item.
