Architecture
The system design behind FANGS. Read this when you want to understand WHY the pieces fit together the way they do. For operator-facing material see Installation, Configuration, Operating.
FANGS is a delta detector, not a malware detector. We never classify "is this code malicious." We only answer "did this run behave differently from the last N versions of this same package."
Every component is shaped by that goal. The hard work that goes into a malware classifier (AV signatures, sandbox-evasion games, ML classification arms races) gets skipped; the easy work of comparing sequences of syscalls becomes the entire product.
```mermaid
flowchart TB
subgraph EXT["External"]
NPM["npm registry"]
WH["Webhook receivers<br/>Slack · Discord · generic"]
PROM["Prometheus scraper"]
end
subgraph CFG["Config / state"]
OCONF["config/orchestrator.yaml<br/>watched_paths · allow"]
DB[("storage<br/>sqlite · postgres")]
end
subgraph ORCH["fangs-orchestrator"]
direction TB
API["HTTP API<br/>scans · events · result · heartbeat"]
W["Watcher<br/>poll 5m"]
D["Differ<br/>fingerprint · diff · auto-promote"]
N["Notifier<br/>per-target retry · HMAC opt-in"]
P["Pruner<br/>retention · stale runners"]
UI["/ui/<br/>read-only dashboard"]
MET["/metrics<br/>Prometheus"]
end
subgraph RNNR["fangs-runner (Linux + Docker)"]
direction TB
AG["agent<br/>register · long-poll · heartbeat · result"]
SB["sandbox driver<br/>hardened Docker container"]
SEN["eBPF sensor<br/>persistent attach · cgroup ancestor walk"]
end
subgraph CLI["fangs (operator CLI)"]
FCLI["package · run · deviation · baseline<br/>notifier · allow · scan submit · pending"]
end
NPM -.poll.-> W
W -- submit job --> API
API -- long-poll --> AG
AG -- POST events / result --> API
SEN -- ringbuf --> AG
SB -- spawn container --> SEN
API --> D
D -- novel finding --> N
N --> WH
OCONF -.startup.-> ORCH
DB <--> ORCH
DB <--> FCLI
MET --> PROM
UI -.browser.-> DB
```
Two long-running processes (orchestrator + runner), one CLI, one optional config file, one storage backend.
| Component | Process | Trust |
|---|---|---|
| Orchestrator | `fangs-orchestrator` (long-running) | Trusted: operators run it |
| Runner | `fangs-runner` (long-running, one per Linux host) | Trusted but disposable: runs attacker code in sandboxes |
| Sandbox container | child of runner via Docker | Hostile: assume every byte is attacker-controlled |
| Operator CLI | `fangs` (short-lived) | Trusted: talks directly to the storage backend |
| eBPF sensor | runs in runner process; probes attached in kernel | Trusted; cannot be subverted by sandbox code |
The sensor lives in the runner process but its probes run in kernel
space attached to syscall tracepoints. The sandboxed code cannot
subvert what bpf_get_current_* reports about its own syscalls — the
kernel is the source of truth.
The orchestrator's HTTP API exposes these endpoints (plain HTTP by default; HTTPS/mTLS when the TLS flags are set):
| Method | Path | Purpose |
|---|---|---|
| GET | `/v1/health` | liveness probe |
| POST | `/v1/runners/register` | runner introduces itself |
| POST | `/v1/runners/{id}/heartbeat` | LastSeen refresh + active-run reporting |
| GET | `/v1/runners/{id}/jobs` | long-poll for the next job (25s server cap) |
| POST | `/v1/scans` | operator (or watcher) queues a scan |
| POST | `/v1/runs/{run_id}/events` | runner streams NDJSON event batches |
| POST | `/v1/runs/{run_id}/result` | runner posts final ScanResult |
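The endpoint table maps directly onto Go's method-qualified `net/http` routing patterns (Go 1.22+). A minimal sketch, assuming stub handlers: `newMux`, `do`, and the handler bodies are hypothetical, and whether the orchestrator actually routes with `http.ServeMux` is not stated on this page. The `//go:debug` line forces the Go 1.22 pattern syntax on regardless of module version.

```go
//go:debug httpmuxgo121=0

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux wires the endpoint table to placeholder handlers. Only the
// method + path patterns come from the table; the handlers are stubs.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	stub := func(name string) http.HandlerFunc {
		return func(w http.ResponseWriter, r *http.Request) {
			// Echo which route matched plus any path wildcards it captured.
			fmt.Fprintf(w, "%s id=%s run_id=%s", name, r.PathValue("id"), r.PathValue("run_id"))
		}
	}
	mux.HandleFunc("GET /v1/health", stub("health"))
	mux.HandleFunc("POST /v1/runners/register", stub("register"))
	mux.HandleFunc("POST /v1/runners/{id}/heartbeat", stub("heartbeat"))
	mux.HandleFunc("GET /v1/runners/{id}/jobs", stub("jobs"))
	mux.HandleFunc("POST /v1/scans", stub("submit"))
	mux.HandleFunc("POST /v1/runs/{run_id}/events", stub("events"))
	mux.HandleFunc("POST /v1/runs/{run_id}/result", stub("result"))
	return mux
}

// do sends one in-memory request through the mux and returns status + body.
func do(mux *http.ServeMux, method, path string) (int, string) {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux()
	fmt.Println(do(mux, "GET", "/v1/runners/r1/jobs"))
	fmt.Println(do(mux, "GET", "/v1/scans")) // wrong method: the mux answers 405
}
```

Because methods are part of the pattern, a `GET` against a POST-only path gets a 405 from the mux itself rather than from each handler.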
The handler for /v1/scans (SubmitScan) is shared by the HTTP path
and the in-process watcher. It injects the orchestrator's configured
default watched_paths onto jobs that arrived without their own list
— single source of truth for what every scan watches.
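A sketch of the default-stamping rule, assuming a trimmed-down `Job` type and a hypothetical `stampDefaults` helper; the real logic lives inside SubmitScan and may differ in detail.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down scan job; the real protocol type
// carries more fields.
type Job struct {
	Package      string
	Version      string
	WatchedPaths []string
}

// stampDefaults applies the rule described for SubmitScan: a job that
// arrives without its own watched_paths gets the orchestrator's configured
// defaults. A job that already lists paths is left alone.
func stampDefaults(job Job, defaults []string) Job {
	if len(job.WatchedPaths) == 0 {
		// Copy so later edits to the job can't mutate the shared config slice.
		job.WatchedPaths = append([]string(nil), defaults...)
	}
	return job
}

func main() {
	defaults := []string{"/root", "/tmp"} // hypothetical config values
	fmt.Println(stampDefaults(Job{Package: "axios", Version: "1.7.7"}, defaults).WatchedPaths)
	fmt.Println(stampDefaults(Job{Package: "axios", WatchedPaths: []string{"/etc"}}, defaults).WatchedPaths)
}
```

Doing this once at the shared entry point means the CLI and the Watcher never need their own copy of the default list.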
The Watcher polls registry.npmjs.org every 5 minutes (configurable) for each row
in the `packages` table. When it sees a new `dist-tags.latest`, it records a release
row and invokes SubmitScan to dispatch a job. It detects "new" by comparing
against `packages.last_seen_version`, so adding a package and then immediately
scanning it doesn't double-queue.
Per D31, v1 supports only the official npm registry; alternate registries (Verdaccio, Artifactory) are deferred to v2 behind a Registry interface.
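The Watcher's "is this new?" check reduces to reading `dist-tags.latest` from the npm registry document and comparing it with `packages.last_seen_version`. A minimal sketch, assuming the HTTP fetch and the storage lookup happen elsewhere; `packument` and `newRelease` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packument holds the one field of the npm registry document the check
// needs; the registry returns it under the "dist-tags" key.
type packument struct {
	DistTags struct {
		Latest string `json:"latest"`
	} `json:"dist-tags"`
}

// newRelease reports whether dist-tags.latest differs from lastSeen (the
// value stored in packages.last_seen_version). Loading lastSeen from
// storage is out of scope for this sketch.
func newRelease(body []byte, lastSeen string) (latest string, isNew bool, err error) {
	var p packument
	if err := json.Unmarshal(body, &p); err != nil {
		return "", false, err
	}
	if p.DistTags.Latest == "" {
		return "", false, fmt.Errorf("no dist-tags.latest in response")
	}
	return p.DistTags.Latest, p.DistTags.Latest != lastSeen, nil
}

func main() {
	body := []byte(`{"name":"lodash","dist-tags":{"latest":"4.18.2"}}`)
	fmt.Println(newRelease(body, "4.17.21")) // 4.18.2 true <nil>
	fmt.Println(newRelease(body, "4.18.2"))  // 4.18.2 false <nil>
}
```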
The Differ is triggered per run by a 2-second debounce after each event-batch arrival. Steps:

1. Load all events for the run plus the package's baseline fingerprints.
2. Build a filter from the merged (CDN allowlist + operator allowlist) entries.
3. Extract fingerprints from events. There are six categories; see Differ-Rules for details.
4. Apply normalization (PID/temp-file/cacache rewriting) so per-run noise doesn't masquerade as a deviation.
5. Apply the operator + hardcoded allowlists; matching fingerprints are suppressed.
6. Diff against the baseline:
   - First run for the package: seed the baseline and mark `is_baseline=true`. Zero deviations regardless of behavior.
   - Subsequent zero-deviation run: auto-promote to baseline.
   - Subsequent run with any deviation: stays unflagged (`is_baseline=false`). Each novel fingerprint becomes a row in the `deviations` table.

   Idempotent: `DeleteDeviationsForRun` clears prior findings before writing the fresh set.
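The core of steps 4 to 6 (normalize, suppress allowlisted entries, set-difference against the baseline, decide promotion) can be sketched as below. This is an illustration under stated assumptions, not the Differ's code: fingerprints are flattened to hypothetical `category:value` strings, the two normalization regexes are invented stand-ins for the rules in Differ-Rules, and a `nil` baseline stands for "first run for the package".

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Fingerprint is a hypothetical flattened form, e.g. "exec:/usr/bin/curl".
type Fingerprint string

// Invented normalization rules in the spirit of PID / temp-file rewriting.
var normalizers = []struct {
	re   *regexp.Regexp
	repl string
}{
	{regexp.MustCompile(`/proc/\d+/`), "/proc/<pid>/"},
	{regexp.MustCompile(`/tmp/[^/]+`), "/tmp/<tmp>"},
}

func normalize(f Fingerprint) Fingerprint {
	s := string(f)
	for _, n := range normalizers {
		s = n.re.ReplaceAllString(s, n.repl)
	}
	return Fingerprint(s)
}

// diffRun returns the run's novel fingerprints and whether the run should
// be promoted to baseline. A nil baseline means no prior run exists.
func diffRun(observed []Fingerprint, baseline, allow map[Fingerprint]bool) (novel []Fingerprint, promote bool) {
	if baseline == nil {
		return nil, true // first run: seed the baseline, report nothing
	}
	seen := map[Fingerprint]bool{}
	for _, f := range observed {
		f = normalize(f) // normalize first, then allowlist, as in the steps
		if allow[f] || baseline[f] || seen[f] {
			continue // suppressed, already known, or duplicate within this run
		}
		seen[f] = true
		novel = append(novel, f)
	}
	sort.Slice(novel, func(i, j int) bool { return novel[i] < novel[j] })
	// Zero deviations: auto-promote. Any deviation: stays unflagged.
	return novel, len(novel) == 0
}

func main() {
	baseline := map[Fingerprint]bool{"file:/proc/<pid>/status": true, "net:registry.npmjs.org:443": true}
	allow := map[Fingerprint]bool{"net:cdn.example:443": true}
	run := []Fingerprint{"file:/proc/4242/status", "net:registry.npmjs.org:443", "net:cdn.example:443", "exec:/usr/bin/curl"}
	fmt.Println(diffRun(run, baseline, allow)) // [exec:/usr/bin/curl] false
}
```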
The Notifier uses per-run-summary semantics: when the Differ writes ≥1 deviation, one webhook fires per configured and enabled target. Three templates:

- `slack`: Block Kit plus a mobile-friendly fallback `text`
- `discord`: embed with color-by-severity
- `generic`: FANGS-native envelope, the SIEM/intake target

Each delivery is logged in the `notifications` table for audit. Retry
policy: at most 5 attempts, exponential backoff (1s × 2^(n-1), ±25%
jitter); a 4xx other than 408 or 429 is permanent. HMAC signing is opt-in via the per-target
`secret_env`; Slack and Discord skip HMAC because the webhook URL is the secret.
See Notifier for template internals and delivery semantics.
The Dispatcher is an in-memory, per-runner FIFO queue with long-poll wake channels. It lives only as long as the orchestrator process; runs queued while a runner is offline are lost (acceptable for v1).
A heartbeat pruner sits beside it: it ticks every 30s and evicts runners
whose `LastSeen` is older than 90s.
Storage uses a dual backend (`sqlite/` + `postgres/`) behind one `Backend` interface.
Embedded migrations live under `migrations/{sqlite,postgres}/`. A shared
contract suite (`storagetest/`) runs against both backends in CI.
See Storage-Schema for every table and every column.
Metrics: counters and gauges are registered at startup and mounted at `/metrics` on the
same listener as the API. Cardinality stays bounded: labels are
enumerations, never operator-supplied strings.
See Metrics for every series.
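One way to keep label cardinality bounded is to funnel every label value through a closed enumeration before it reaches a counter. The sketch below uses plain maps in place of the Prometheus client library; the metric name `fangs_runs_total` and the `other` bucket are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// RunOutcome is a closed label enumeration (the terminal run states).
type RunOutcome string

var knownOutcomes = map[RunOutcome]bool{"done": true, "failed": true, "timeout": true}

// outcomeLabel folds anything outside the enumeration into "other", so an
// unexpected string can never mint a new series.
func outcomeLabel(s string) RunOutcome {
	if knownOutcomes[RunOutcome(s)] {
		return RunOutcome(s)
	}
	return "other"
}

// counterVec stands in for a one-label Prometheus counter vector.
type counterVec struct {
	mu sync.Mutex
	v  map[RunOutcome]uint64
}

func (c *counterVec) Inc(label RunOutcome) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.v == nil {
		c.v = map[RunOutcome]uint64{}
	}
	c.v[label]++
}

// Series renders exposition-style lines, sorted for stable output.
func (c *counterVec) Series() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []string
	for l, n := range c.v {
		out = append(out, fmt.Sprintf("fangs_runs_total{outcome=%q} %d", l, n))
	}
	sort.Strings(out)
	return out
}

func main() {
	var runs counterVec
	for _, s := range []string{"done", "done", "failed", "some-operator-string"} {
		runs.Inc(outcomeLabel(s))
	}
	for _, line := range runs.Series() {
		fmt.Println(line)
	}
}
```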
The `/ui/` dashboard is built from server-rendered Go templates. No JS framework, no build step. A tiny
vanilla JS file (`static/refresh.js`, ~80 lines) gives the dashboard
two niceties:

- Auto-refresh for time-changing pages: any element with `data-refresh-url` re-fetches on an interval, with an optional CSS-selector extract for partial swaps. The overview refreshes every 5s; the pending queue every 10s.
- In-place chip-filter navigation: clicks on `a.chip` inside a `data-dynamic-nav` container fetch the URL, extract the same container from the response, swap its `innerHTML`, and `pushState` the URL. Expanded `<details>` elements are snapshotted by `data-pid` before the swap and re-opened after it, so filters don't collapse the rows you were looking at.

Pages: `/ui/` (overview), `/ui/packages/{name}`, `/ui/runs/{id}`,
`/ui/runs/{id}/lineage` (indented process tree),
`/ui/deviations[/{id}]`, `/ui/events/{id}`, `/ui/pending` (triage
queue), `/ui/allowlist`, `/ui/notifiers`, `/ui/config`.
Read-only by design: every state change goes through the `fangs` CLI.
The runner agent owns the registration handshake, the long-poll job loop, the event streamer, the heartbeat ticker, and the final ScanResult POST. The event streamer batches events into NDJSON, flushing at 250 ms or 64 events (whichever comes first); sequence numbers are attached for future per-run dedup.
The sandbox driver spawns one container per scan using stdlib `net/http` against
`/var/run/docker.sock` (no Docker SDK). The HostConfig is hardened; see
Configuration for the full list.
The runner pre-creates /sys/fs/cgroup/.../fangs/<run_id>/, registers
its inode in the BPF CGMAP, then launches the container with
CgroupParent set to that path. The container's processes nest under
the registered cgroup, and the BPF lookup_cgroup walks ancestors so
events fire from the very first syscall in the container — no race
window between docker-start and sensor-attach.
The eBPF sensor's probes attach once, at runner startup (`Sensor.New`). Per job:
`AddCgroup` populates the in-kernel filter, the container runs nested
under that cgroup, events stream out via a 64 MB ringbuf, and
`RemoveCgroup` runs on teardown.
See Sensor-Probes for the full probe table + per-probe semantics.
The shared types come in two kinds: binary event types matching the BPF C structs (`OpenatEvent`,
`ExecEvent`, etc.) and JSON-encoded protocol types (`Job`, `ScanResult`,
`RunnerRegistration`, `Heartbeat`, `EventEnvelope`). The runner does
the parsing pass that adds human-readable string fields (`PathName`,
`CommStr`, `DestIP`, `SNI`, `QName`, `BinaryPathStr`) before streaming
events upstream.
```mermaid
sequenceDiagram
participant W as Watcher
participant API as Orchestrator API
participant DISP as Dispatcher
participant R as Runner
participant SEN as eBPF Sensor
participant DB as Storage
participant DIF as Differ
participant NOT as Notifier
participant OP as Operator
W->>API: GET registry.npmjs.org/lodash<br/>(every 5m)
API->>DB: UpdatePackageCheck, RecordRelease
W->>API: SubmitScan(lodash@4.18.2)
API->>DISP: QueueJob(target_runner, job)
R->>API: GET /v1/runners/r1/jobs (long-poll)
DISP-->>R: Job
R->>R: CreateParentCgroup, AddCgroup(sensor)
R->>R: docker run node:20-slim<br/>npm install lodash@4.18.2
SEN-->>R: events (file/exec/net/dns/tls)
R->>API: POST /v1/runs/<id>/events (NDJSON)
API->>DB: AppendEvents
R->>API: POST /v1/runs/<id>/result<br/>(ok, 312 emitted, 0 dropped)
API->>DB: RecordScanResult
Note over API,DIF: 2s debounce after last batch
API->>DIF: AnalyzeRun(run_id)
DIF->>DB: ListEventsByRun, LoadBaseline
DIF->>DIF: Extract fingerprints, apply allowlist, diff
DIF->>DB: WriteDeviations [3 rows]
DIF->>NOT: Trigger(run_id)
NOT->>NOT: Render per-target template
NOT->>OP: webhook (Slack) - "lodash@4.18.2: 3 findings"
OP->>API: GET /ui/pending (or fangs pending)
API->>OP: Run row + 3 findings + promote command
OP->>DB: fangs baseline promote <run> OR investigate
```
```mermaid
sequenceDiagram
participant OP as Operator
participant CLI as fangs CLI
participant NPM as npm registry
participant API as Orchestrator API
participant DB as Storage
OP->>CLI: fangs scan submit -package axios -version 1.7.7
CLI->>NPM: GET /axios (validate version exists)
NPM-->>CLI: 200 + metadata
CLI->>API: POST /v1/scans (job WITHOUT watched_paths)
API->>API: SubmitScan stamps default watched_paths from config
API->>DB: CreateRun (state=pending)
API-->>CLI: {queued: true, run_id: 18b2...}
CLI-->>OP: queued run_id=18b2089cca... watch: /ui/runs/18b2089cca...
```
```text
pending                  ← CreateRun on /v1/scans
   ↓
(runner picks up via long-poll, runs the sandbox)
   ↓
(events stream into /v1/runs/<id>/events; Differ runs 2s after last batch)
   ↓
─── if any deviation: deviation rows written, runs.is_baseline=false ───
─── if zero: is_baseline=true (auto-promote) ───
   ↓
done | failed | timeout  ← RecordScanResult finalizes the row
```
`done`, `failed`, and `timeout` are the terminal states; `failed`
carries a `failure_reason`. The runner POSTs the final state via
`/v1/runs/<id>/result`, so transitions are runner-driven, not inferred
from event-stream EOF.
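The terminal-state rule can be enforced with a small guard on the runner-posted result. A sketch with hypothetical `RunState`, `Run`, and `finalize` names; rejecting a `failed` result that lacks a `failure_reason` is an assumption layered on the text, which only says `failed` carries one.

```go
package main

import "fmt"

type RunState string

const (
	Pending RunState = "pending"
	Done    RunState = "done"
	Failed  RunState = "failed"
	Timeout RunState = "timeout"
)

// Terminal reports whether no further transition is allowed.
func (s RunState) Terminal() bool {
	return s == Done || s == Failed || s == Timeout
}

// Run is a hypothetical slice of the runs row.
type Run struct {
	State         RunState
	FailureReason string
}

// finalize applies the runner-posted result: only a non-terminal run can be
// finalized, only into a terminal state, and failed needs a reason.
func finalize(r *Run, to RunState, reason string) error {
	if r.State.Terminal() {
		return fmt.Errorf("run already %s", r.State)
	}
	if !to.Terminal() {
		return fmt.Errorf("%s is not a terminal state", to)
	}
	if to == Failed && reason == "" {
		return fmt.Errorf("failed needs a failure_reason")
	}
	r.State, r.FailureReason = to, reason
	return nil
}

func main() {
	r := Run{State: Pending}
	fmt.Println(finalize(&r, Failed, ""))           // failed needs a failure_reason
	fmt.Println(finalize(&r, Failed, "npm exit 1")) // <nil>
	fmt.Println(r.State, r.FailureReason)           // failed npm exit 1
	fmt.Println(finalize(&r, Done, ""))             // run already failed
}
```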
Non-goals:

- Not an in-product malware classifier. No signatures, no heuristics, no ML on event streams. The whole product is delta-vs-baseline. If you need malware classification, feed deviations into your existing AV pipeline as the "look-at-this" signal.
- Not a runtime-protection agent. FANGS observes sandboxed scans; it doesn't gate `npm install` on production hosts or block CI.
- Not a multi-tenant system. One DB, one watch list. Add tenant columns when team adoption forces the question.
- Not magic at first watch. The first run for a package becomes the baseline regardless of what it observes, so bringing your own trust at watch-add time is a load-bearing assumption. v2 may require K consecutive zero-deviation runs before auto-promotion.