Operating

Day-2 reference. Workflows, triage, troubleshooting.

Mental model

FANGS is a delta detector. It compares each run against the package's rolling baseline and surfaces what's new. Three states a run can be in:

| State | Meaning |
| --- | --- |
| First run for a package | Baseline gets seeded; zero deviations regardless |
| Subsequent zero-deviation run | Auto-promoted to baseline (`occurrence_count` bump only) |
| Subsequent any-deviation run | Lands in `fangs pending` for operator decision |
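
The three states reduce to a small decision function. The sketch below is illustrative only: `RunState`, `classify_run`, and the `prior_runs` argument are hypothetical names, not FANGS internals.

```python
from enum import Enum


class RunState(Enum):
    SEEDS_BASELINE = "first run: seeds baseline"
    AUTO_PROMOTED = "zero deviations: auto-promoted to baseline"
    PENDING = "deviations found: waits in `fangs pending`"


def classify_run(prior_runs: int, deviation_count: int) -> RunState:
    """Map a finished run onto the three states in the table above."""
    if prior_runs == 0:
        # No baseline exists yet, so there is nothing to deviate from.
        return RunState.SEEDS_BASELINE
    if deviation_count == 0:
        return RunState.AUTO_PROMOTED
    return RunState.PENDING
```

Only the last state needs an operator; the other two resolve on their own.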

Adding a package to the watch list

fangs package add lodash

What happens:

1. Validates lodash exists on registry.npmjs.org. Bogus names get rejected before the DB row lands.
2. Inserts into the packages table.
3. Records the current dist-tags.latest as last_seen_version so the next watcher poll doesn't re-flag it as "new."
4. Queues an immediate kickoff scan of that version. First run → seeds baseline.

Subsequent releases auto-trigger via the watcher (default 5min poll).
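
A minimal sketch of why recording last_seen_version matters, assuming the watcher does a plain equality check against the registry's `dist-tags.latest`. The function name `release_to_scan` is hypothetical, and the real watcher may compare versions differently.

```python
from typing import Optional


def release_to_scan(dist_tags: dict, last_seen_version: Optional[str]) -> Optional[str]:
    """Return the version a poll should scan, or None if nothing changed.

    `dist_tags` is the "dist-tags" object from the npm registry document
    for the package, e.g. {"latest": "4.17.21"}.
    """
    latest = dist_tags.get("latest")
    if latest is None or latest == last_seen_version:
        # Either the registry gave no answer or we already handled this version.
        return None
    return latest
```

If `package add` did not record the current version, the first poll would see `last_seen_version=None` and queue a second scan of the version the kickoff scan already covers.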

Useful subcommands:

fangs package watched       # current watch list
fangs package list          # packages with runs (post-scan summary view)
fangs package remove lodash # stop watching

Skip the kickoff scan if you want — useful for offline / batch flows:

fangs package add lodash -skip-initial-scan

One-off scans

fangs scan submit -package axios -version 1.7.7

Validates axios@1.7.7 exists on the registry, then POSTs /v1/scans to the orchestrator (default http://127.0.0.1:8443). Returns the assigned run_id + the watch URLs.

Useful flags:

| Flag | Purpose |
| --- | --- |
| `-orchestrator URL` | when the orchestrator isn't on localhost |
| `-runner ID` | target a specific runner (default: the operator's hostname) |
| `-duration 90s` | longer sandbox time for heavy installs |
| `-skip-registry-validate` | useful for offline tests against private fake-registries |
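
For callers that want to skip the CLI, the same submission can be sketched as a direct HTTP request. Only the endpoint (`POST /v1/scans`) and the default orchestrator address come from this page; the JSON field names (`package`, `version`, `duration`) are assumptions, so check them against the orchestrator's API before relying on this.

```python
import json
import urllib.request
from typing import Optional


def build_scan_request(
    package: str,
    version: str,
    orchestrator: str = "http://127.0.0.1:8443",
    duration: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a scan submission request."""
    body = {"package": package, "version": version}  # assumed field names
    if duration is not None:
        body["duration"] = duration  # assumed field name
    return urllib.request.Request(
        url=orchestrator.rstrip("/") + "/v1/scans",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen` to send it. Unlike `fangs scan submit`, this sketch does no registry validation.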

Reviewing pending findings

The triage queue lives in two places — one for browsing, one for scripting:

# Browser
http://127.0.0.1:8443/ui/pending

# CLI
fangs pending
fangs pending -package axios
fangs pending -min-severity high
fangs pending --json | jq .   # for piping into other tools

Each row shows the run, package, version, deviation count, max severity, last detected, and the literal fangs baseline promote … command you can copy-paste.
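
The JSON output is meant for scripting. The sketch below counts pending runs per package, assuming the output is a JSON array of objects with a `package` field; that shape is a guess, so inspect the real output with `jq` first.

```python
import json
from collections import Counter


def pending_per_package(pending_json: str) -> Counter:
    """Count pending runs per package from the CLI's JSON output."""
    rows = json.loads(pending_json)
    # "package" is an assumed field name; adjust to the actual output.
    return Counter(row["package"] for row in rows)
```

Feed it the captured stdout of `fangs pending --json`; `most_common()` on the result shows which packages dominate the queue.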

Triaging a deviation

Four steps:

1. Look at the package + version. Is it a known dependency? Was there a recent advisory? Sometimes the answer comes from context alone.
2. Look at the deviation list for the run. fangs deviation list -run <run_id> or click into the run on the UI.
3. Look at the evidence event. fangs deviation show <id> prints the full kernel-event JSON that triggered the finding. Or click "→ lineage" in the UI to see the process tree.
4. Decide. Three real options:
   - Promote: fangs baseline promote <run_id>. The whole run's fingerprints merge into baseline, deviation rows clear.
   - Allowlist the noise: fangs allow add with the appropriate kind. See below.
   - Investigate: pull the package tarball, do offline analysis. Nothing in FANGS changes until you act.

Allowlisting

Suppress known-benign noise before it becomes a deviation:

# Global — applies to every package
fangs allow add -kind cidr -value 10.0.0.0/8 -note "internal"

# Per-package — only applies to runs of `axios`
fangs allow add -kind sni -value telemetry.example -package axios

# Path exclusion — silence noisy file_access events
fangs allow add -kind path -value /opt/vendor/ -note "trusted vendored dir"

Three kinds map to three Differ categories:

| kind | suppresses | example |
| --- | --- | --- |
| `cidr` | `net_new_destination` | `10.0.0.0/8` |
| `path` | `fs_new_path_*` | `/opt/vendor/` |
| `sni` | `net_new_https_host` | `telemetry.example` |

The hardcoded CDN allowlist (Cloudflare/GitHub/Google/Fastly/CloudFront) applies underneath — entries here are additive.
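
A rough sketch of what each kind matches. CIDR containment is standard; treating `path` as a prefix match and `sni` as a case-insensitive exact match are assumptions, and `is_allowed` is a hypothetical name rather than the Differ's real function.

```python
import ipaddress


def is_allowed(kind: str, value: str, observed: str) -> bool:
    """Return True if an allowlist entry (kind, value) covers `observed`."""
    if kind == "cidr":
        # Is the observed destination IP inside the allowlisted network?
        return ipaddress.ip_address(observed) in ipaddress.ip_network(value)
    if kind == "path":
        # Assumption: path entries are directory prefixes.
        return observed.startswith(value)
    if kind == "sni":
        # Assumption: exact hostname match, ignoring case.
        return observed.lower() == value.lower()
    raise ValueError(f"unknown allowlist kind: {kind}")
```

Under the prefix assumption the trailing slash on `/opt/vendor/` matters: without it, the entry would also cover `/opt/vendorx/`.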

fangs allow list — show all entries. Config-managed ones (from config/orchestrator.yaml) have cfg…-prefixed IDs.

fangs allow remove <id-prefix> — git-style short ID match.
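
Git-style short ID matching usually means "unique prefix wins, ambiguous prefix is an error." A minimal sketch of that rule follows; `resolve_id_prefix` is a hypothetical helper, and the CLI's exact error behaviour may differ.

```python
from typing import Iterable


def resolve_id_prefix(prefix: str, ids: Iterable[str]) -> str:
    """Resolve a short ID prefix to exactly one full ID."""
    matches = [full_id for full_id in ids if full_id.startswith(prefix)]
    if not matches:
        raise LookupError(f"no ID starts with {prefix!r}")
    if len(matches) > 1:
        # Refuse to guess: the operator must type more characters.
        raise LookupError(f"prefix {prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]
```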

UI: /ui/allowlist.

Promoting a baseline manually

When a clean release should join baseline but the run had deviations you've accepted as legitimate:

fangs baseline list -package lodash    # see what's in baseline
fangs baseline promote <run-id-prefix>

Promote re-extracts the run's fingerprints (with allowlists applied), merges them into baseline_fingerprints, marks the run is_baseline=true, and clears any deviation rows for it.
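
Conceptually, the merge step is a counter update. The sketch below assumes the baseline can be modelled as a mapping from fingerprint to `occurrence_count`; the fingerprint strings and the function name are made up for illustration and do not reflect the real baseline_fingerprints schema.

```python
from collections import Counter
from typing import Iterable


def merge_into_baseline(baseline: Counter, run_fingerprints: Iterable[str]) -> Counter:
    """Return a new baseline with the run's fingerprints merged in."""
    merged = Counter(baseline)  # copy so the caller's baseline is untouched
    for fingerprint in set(run_fingerprints):
        # Known fingerprints get an occurrence bump; new ones start at 1.
        merged[fingerprint] += 1
    return merged
```

After the merge, what used to be a deviation is part of baseline, so the same behaviour in a later run no longer surfaces.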

Configuring notifiers

Without a notifier, deviations sit in the DB waiting for someone to refresh the UI. With one, every run with ≥1 deviation fires one webhook per configured + enabled target.

# Slack
fangs notifier add -name soc-slack \
  -url 'https://hooks.slack.com/services/T.../B.../...' \
  -template slack

# Discord
fangs notifier add -name soc-discord \
  -url 'https://discord.com/api/webhooks/.../...' \
  -template discord

# Generic — for SIEM / event bus / Lambda
fangs notifier add -name siem \
  -url 'https://intake.internal/fangs' \
  -template generic \
  -secret-env FANGS_HMAC \
  -min-severity high

Knobs:

| Flag | Purpose |
| --- | --- |
| `-template` | `slack` \| `discord` \| `generic` |
| `-secret-env ENV_VAR` | env-var name holding HMAC secret (generic targets only) |
| `-min-severity` | only fire when ≥1 deviation has severity ≥ threshold |
| `-headers JSON` | extra HTTP headers as JSON object |
| `-enabled=false` | disable without removing |
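
The `-min-severity` and `-enabled` knobs combine into a simple gate. In the sketch below the severity scale (`low` < `medium` < `high` < `critical`) is an assumption, since only `high` appears on this page, and `should_notify` is a hypothetical name.

```python
from typing import Sequence

# Assumed ordering; only "high" is confirmed by the examples on this page.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def should_notify(
    deviation_severities: Sequence[str],
    min_severity: str = "low",
    enabled: bool = True,
) -> bool:
    """Fire only if the target is enabled and at least one deviation meets the threshold."""
    if not enabled or not deviation_severities:
        return False
    threshold = SEVERITY_ORDER.index(min_severity)
    return any(SEVERITY_ORDER.index(s) >= threshold for s in deviation_severities)
```

The gate is per target, so one run can page the SIEM at `high` while staying silent on a chat channel set to `critical`.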

Verify wiring:

fangs notifier test soc-slack            # fires a synthetic message
fangs notifier list
fangs notifier history -run <run_id>     # delivery attempts for one run

UI: /ui/notifiers.

See Notifier for retry policy, template internals, HMAC details.
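
On the receiving side of a generic target, verifying the signature could look roughly like this. The sketch assumes HMAC-SHA256 over the raw request body with a hex-encoded digest; the actual algorithm, encoding, and header name are defined on the Notifier page and may differ.

```python
import hashlib
import hmac


def sign(body: bytes, secret: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, secret: bytes, received_signature: str) -> bool:
    """Constant-time comparison against the signature sent with the webhook."""
    return hmac.compare_digest(sign(body, secret), received_signature)
```

Verify against the bytes exactly as received. Re-serializing the JSON before hashing changes the digest.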

Inspecting the runtime

UI overview (/ui/) shows:

- Packages watched / packages ever tracked
- Runs total / runs on baseline
- Open deviations / packages affected
- Lifetime events dropped (sensor ringbuf overflow indicator)
- Runner pool with heartbeat freshness + active-run links
- Recent runs + recent deviations

Prometheus at /metrics — see Metrics for every series.

CLI:

fangs run list -package lodash -limit 20
fangs run show <run-id-prefix>
fangs release list -package lodash

Common failures

"No runners registered"

The orchestrator received a scan request but has no runner. Cause: no fangs-runner is running, or its heartbeat went stale (>90s) and it got pruned. Restart the runner.

Stale runners on `/ui/`

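The runner crashed without deregistering. The heartbeat pruner evicts it once its last heartbeat is more than 90 seconds old. The pruner ticks every 30s, so eviction can lag the 90-second mark by up to one tick.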
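
The interaction between the 90s staleness threshold and the 30s tick can be worked out with a small model. It assumes ticks fire at fixed multiples of the interval and that eviction happens on the first tick where the heartbeat age exceeds the threshold; the real pruner's scheduling may differ.

```python
import math


def seconds_until_evicted(last_heartbeat: float, ttl: float = 90.0, tick: float = 30.0) -> float:
    """Seconds from a runner's last heartbeat to the pruner tick that evicts it.

    Model: ticks at t = 0, tick, 2*tick, ...; a runner is evicted on the
    first tick where (tick time - last_heartbeat) > ttl.
    """
    first_stale_tick = (math.floor((last_heartbeat + ttl) / tick) + 1) * tick
    return first_stale_tick - last_heartbeat
```

Under this model a dead runner stays listed for somewhere between just over 90 and 120 seconds, depending on where its last heartbeat fell relative to the tick.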

`events_dropped` > 0 on the dashboard

The eBPF ringbuf overflowed during one or more runs. Causes:

- Very-high-throughput sandboxes (e.g. ESM-only packages that open thousands of files at install).
- Slow consumer: the runner's event-stream HTTP POST is backed up. Check runner.log for high batch_send_errors in event streamer closed lines.
- Orchestrator backpressure under heavy concurrent scans.

The ringbuf is 64 MB per probe (compile-time constant). Most cases resolve by tuning sandbox concurrency.

"container exit_code=137"

OOM kill. Exit code 137 is 128 + 9, meaning the container's process received SIGKILL. The sandbox hit its memory limit (Memory: 512 MB by default). For packages with heavy install steps (TypeScript, native bindings), bump memory in your scan submission's sandbox spec. Per-package memory override is a v2 item.

"container exit_code=125"

Docker daemon couldn't start the container. Usually a permission problem on /var/run/docker.sock or a Docker daemon misconfiguration. Check journalctl -u docker.
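
Both container exit codes follow general conventions: values above 128 mean "killed by signal (code - 128)", and 125 is the code `docker run` uses when the Docker daemon itself fails. A small decoder, as a sketch for reading run records (the function name is hypothetical):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable hint."""
    if code == 0:
        return "exited normally"
    if code == 125:
        return "docker daemon error: the container never started"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = f"signal {code - 128}"
        return f"killed by {name}"
    return f"process exited with status {code}"
```

`describe_exit_code(137)` reports SIGKILL. An OOM kill is the usual cause in a memory-limited sandbox, but any SIGKILL (for example a manual `docker kill`) produces the same code.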

"AddCgroup: cgroup id already registered"

Two scans tried to share the same cgroup_id — shouldn't happen in practice. Restart the runner.

Sensor probes failing to attach

Check the runner log for attach tracepoint lines. Common causes:

- tracefs not mounted. The runner auto-mounts at startup; manual fallback: sudo mount -t tracefs nodev /sys/kernel/tracing
- libssl not loadable. Debian/Ubuntu/Kali libssl is mode 644; fix: sudo chmod +x /usr/lib/x86_64-linux-gnu/libssl.so.3
- kprobe tcp_v4_connect attach failed. The symbol is absent on very old kernels; the sensor logs warn-and-continue, and io_uring TCP connects on this path go unobserved

Differ produces too many false positives

In order of preference:

1. Allowlist the recurring noise.
2. Promote a known-good run as baseline.
3. Tighten path normalization in internal/orchestrator/differ/normalize.go (source edit, rebuild).

Backup + restore

SQLite — copy var/lib/fangs/fangs.db while orchestrator isn't running, or use sqlite3 .backup for hot backups.

Postgres — standard pg_dump / pg_restore.

The DB is the only persistent state. Nothing else needs preserving between hosts.
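
For scripted hot backups of the SQLite file, Python's standard library exposes the same online-backup mechanism that `sqlite3 .backup` uses. A minimal sketch; the paths are whatever your deployment uses, and `hot_backup` is a hypothetical helper, not part of FANGS.

```python
import sqlite3


def hot_backup(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies a consistent snapshot even if the
        # source is being written to while the copy runs.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

This is safer than a plain file copy while the orchestrator is running, because a raw copy can catch the database mid-write.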

Stopping cleanly

# Orchestrator: SIGINT/SIGTERM cleanly stops the HTTP listener.
sudo systemctl stop fangs-orchestrator
# Runner: SIGTERM cleans up the per-run cgroup parent before exit.
sudo systemctl stop fangs-runner

Hard kill (SIGKILL) leaves orphan cgroups at /sys/fs/cgroup/.../fangs/<run_id>/. Remove them with sudo rmdir once the container has exited.

Retention

Raw events are auto-pruned after -retention-days (default 90). Pruner runs daily; first prune fires 30s after orchestrator startup so operators can verify it's wired.

Deviation-evidence events are PINNED forever — the "click for evidence" link on historical findings keeps working past the retention horizon.

Baselines never expire. They're the load-bearing state.

Disable pruning entirely: -retention-days=0.
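
The retention rules above amount to one filter: old, unpinned events go; everything else stays. A sketch under the assumption that each event carries a timestamp and a pinned flag (the dict keys `id`, `ts`, and `pinned` are illustrative, not the real schema):

```python
from datetime import datetime, timedelta
from typing import List


def prunable_event_ids(events: List[dict], now: datetime, retention_days: int = 90) -> List[int]:
    """Return IDs of raw events the daily pruner would delete."""
    if retention_days == 0:
        # -retention-days=0 disables pruning entirely.
        return []
    cutoff = now - timedelta(days=retention_days)
    return [
        event["id"]
        for event in events
        # Deviation-evidence events are pinned and survive past the cutoff.
        if event["ts"] < cutoff and not event["pinned"]
    ]
```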
