API Reference
The orchestrator's HTTP surface. All endpoints under /v1/. JSON
bodies, JSON responses, plain HTTP by default (HTTPS + optional mTLS
when TLS flags set).
Source: internal/orchestrator/api/server.go.
Authentication in v1: none at the API layer itself. Two security envelopes are available:

- Localhost bind (default -addr 127.0.0.1:8443): the network is the trust boundary.
- mTLS (-tls-cert + -tls-key + -tls-client-ca): every request must present a client cert signed by the configured CA. See TLS-mTLS.

There is NO bearer-token auth mode today. API-key auth is a v2 item; for now, use mTLS or rely on network trust.
Liveness probe. Used by load balancers, Kubernetes readiness checks,
or just curl to sanity-check that the orchestrator is up.
Response:
{
"status": "ok",
"orchestrator_id": "fangs-orchestrator",
"version": "dev"
}

A runner introduces itself at startup. Idempotent: re-registering the
same runner_id replaces the prior registration record.
Request body:
{
"runner_id": "worker-1",
"hostname": "fangs-runner-01",
"capabilities": ["sensor", "sandbox.docker"],
"kernel_version": "6.17.0-azure",
"proto_version": 1
}

Response:
{
"ok": true,
"orchestrator_id": "fangs-orchestrator",
"job_poll_interval": 5000000000,
"heartbeat_interval": 30000000000
}

The intervals are nanoseconds (Go's time.Duration JSON encoding).
The runner uses these to drive its long-poll cadence and heartbeat
goroutine.
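
As a minimal sketch of the nanosecond encoding described above, the register response can be decoded straight into time.Duration fields, since time.Duration is an int64 count of nanoseconds. The struct and function names here are hypothetical; only the JSON field names come from the response shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// registerResponse is a hypothetical mirror of the register response body.
// Only the JSON field names are taken from the document.
type registerResponse struct {
	OK                bool          `json:"ok"`
	OrchestratorID    string        `json:"orchestrator_id"`
	JobPollInterval   time.Duration `json:"job_poll_interval"`
	HeartbeatInterval time.Duration `json:"heartbeat_interval"`
}

// parseRegisterResponse decodes the body. The integer nanosecond values
// unmarshal directly into time.Duration because it is an int64 underneath.
func parseRegisterResponse(body []byte) (registerResponse, error) {
	var r registerResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	body := []byte(`{"ok":true,"orchestrator_id":"fangs-orchestrator","job_poll_interval":5000000000,"heartbeat_interval":30000000000}`)
	r, err := parseRegisterResponse(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.JobPollInterval, r.HeartbeatInterval) // 5s 30s
}
```

Clients in other languages need to divide by 1e9 to get seconds.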
Periodic ping. The runner sends one every heartbeat_interval (30s
default). The orchestrator updates LastSeen + optional status
fields.
Request body (all fields optional):
{
"runner_id": "worker-1",
"active_run_id": "18b2089cca6d1186...",
"status": "running",
"events_queued": 5
}

Response on known runner:

{"ok": true}

Response on UNKNOWN runner (e.g. the orchestrator restarted and lost its in-memory registration):

{"ok": false, "unknown_runner": true}

The runner detects unknown_runner: true and re-registers
automatically.
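
The re-registration check can be sketched as follows. This is an illustration, not the runner's actual code: the type and function names are hypothetical, and it assumes the two response shapes shown above are the only ones the endpoint returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// heartbeatResponse is a hypothetical mirror of the heartbeat reply.
type heartbeatResponse struct {
	OK            bool `json:"ok"`
	UnknownRunner bool `json:"unknown_runner"`
}

// needsReregister reports whether the reply tells the runner that the
// orchestrator no longer knows it, so it should register again.
func needsReregister(body []byte) (bool, error) {
	var r heartbeatResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	return !r.OK && r.UnknownRunner, nil
}

func main() {
	for _, body := range []string{
		`{"ok": true}`,
		`{"ok": false, "unknown_runner": true}`,
	} {
		again, err := needsReregister([]byte(body))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> re-register: %v\n", body, again)
	}
}
```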
Long-poll for the next job assigned to this runner. Server-side wait caps at 25 seconds; the runner uses a 30s timeout to give the server time to win the race.
Responses:
| Status | Meaning |
|---|---|
| 200 OK | a Job is available; body is the Job JSON |
| 204 No Content | wait expired, no work; runner sleeps job_poll_interval then polls again |
| 404 Not Found | runner-id isn't registered; runner should re-register |
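
The status table above maps naturally onto a small dispatch function on the runner side. The sketch below is hypothetical (the pollAction type and its constants do not come from the source) and only illustrates how the three documented status codes might be handled.

```go
package main

import (
	"fmt"
	"net/http"
)

// pollAction is a hypothetical enum for what the runner does after a poll.
type pollAction int

const (
	actionRunJob       pollAction = iota // 200: decode the Job body and run it
	actionSleepAndPoll                   // 204: sleep job_poll_interval, then poll again
	actionReregister                     // 404: register again before polling
	actionUnexpected                     // anything else: surface as an error
)

// nextAction maps a long-poll HTTP status code to the runner's next step.
func nextAction(status int) pollAction {
	switch status {
	case http.StatusOK:
		return actionRunJob
	case http.StatusNoContent:
		return actionSleepAndPoll
	case http.StatusNotFound:
		return actionReregister
	default:
		return actionUnexpected
	}
}

func main() {
	for _, s := range []int{200, 204, 404, 500} {
		fmt.Println(s, nextAction(s))
	}
}
```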
Job body (sandbox_scan):
{
"run_id": [24, 178, 8, 156, 202, 190, 201, 128, 0, 0, 0, 0, 0, 0, 0, 0],
"kind": "sandbox_scan",
"package_name": "axios",
"version": "1.7.7",
"watched_paths": [
{"prefix": "/etc/"},
{"prefix": "/etc/shadow", "cred_tagged": true},
...
],
"duration": 60000000000,
"sandbox": {
"image": "node:20-slim",
"command": ["sh", "-c", "cd /tmp && mkdir -p t && cd t && npm init -y >/dev/null && npm install axios@1.7.7 2>&1 | tail; sleep 2"],
"network_mode": "bridge",
"pull_policy": "missing",
"user": "0:0",
"grace_period": 2000000000,
"cgroup_parent": "/fangs/<runid>"
},
"dispatched_at": "2026-05-22T01:23:45.678Z"
}

run_id is a 16-byte ULID-shaped identifier. The runner hex-encodes
it for use in subsequent endpoint URLs.
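
A minimal sketch of that hex encoding, using the bytes from the sample Job body above. The helper name hexRunID is hypothetical, and lowercase hex is assumed because the hex run_id values shown elsewhere in this document are lowercase.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexRunID renders a 16-byte run_id as a 32-character lowercase hex string.
func hexRunID(id [16]byte) string {
	return hex.EncodeToString(id[:])
}

func main() {
	// First eight bytes from the sample Job body; the rest are zero.
	id := [16]byte{24, 178, 8, 156, 202, 190, 201, 128}
	fmt.Println(hexRunID(id)) // 18b2089ccabec9800000000000000000
}
```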
Operator queues a scan. The orchestrator inserts a runs row with
state=pending and dispatches via the in-memory queue.
Request body:
{
"target_runner": "worker-1",
"Job": {
"kind": "sandbox_scan",
"package_name": "lodash",
"version": "4.18.1",
"duration": 60000000000,
"sandbox": { /* same shape as above */ }
}
}

Notably absent from Job: watched_paths. When the request omits
them, the orchestrator stamps in its configured defaults (from
config/orchestrator.yaml) so config edits take effect for every
caller without changes on the caller side. Pass an explicit
watched_paths to override.
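
One way a Go caller might build this request is sketched below. The struct names are hypothetical and the sandbox field is left out for brevity (whether the orchestrator accepts a request without it is not stated here). The point is the omitempty tag: a nil watched_paths slice is dropped from the JSON, so the orchestrator's defaults apply.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"time"
)

// watchedPath and job are hypothetical mirrors of the documented JSON shape.
type watchedPath struct {
	Prefix     string `json:"prefix"`
	CredTagged bool   `json:"cred_tagged,omitempty"`
}

type job struct {
	Kind         string        `json:"kind"`
	PackageName  string        `json:"package_name"`
	Version      string        `json:"version"`
	Duration     time.Duration `json:"duration"`
	WatchedPaths []watchedPath `json:"watched_paths,omitempty"` // omitted when nil or empty
}

type scanRequest struct {
	TargetRunner string `json:"target_runner"`
	Job          job    `json:"Job"`
}

// buildScanRequest marshals a scan request; pass nil for override to let the
// orchestrator stamp in its configured default watched paths.
func buildScanRequest(runner, pkg, version string, override []watchedPath) ([]byte, error) {
	return json.Marshal(scanRequest{
		TargetRunner: runner,
		Job: job{
			Kind:         "sandbox_scan",
			PackageName:  pkg,
			Version:      version,
			Duration:     60 * time.Second,
			WatchedPaths: override,
		},
	})
}

// hasWatchedPaths reports whether the encoded body carries a watched_paths key.
func hasWatchedPaths(body []byte) bool {
	return bytes.Contains(body, []byte(`"watched_paths"`))
}

func main() {
	body, err := buildScanRequest("worker-1", "lodash", "4.18.1", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println("has watched_paths:", hasWatchedPaths(body))
}
```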
Response:
{
"queued": true,
"run_id": "18b2089cca6d11860000000000000000"
}

run_id is hex-encoded so it's URL/log-friendly.
Errors:
| Status | Meaning |
|---|---|
| 400 Bad Request | malformed JSON / missing required fields |
| 404 Not Found | target_runner isn't registered |
| 500 Internal Server Error | storage write failed |
Runner streams NDJSON event batches to this endpoint. One EventBatch
per JSON object, separated by newlines.
EventBatch shape (one JSON object per line):
{
"seq": 1,
"events": [
{"type": 1, "payload": {"Header": {...}, "Path": [...], "PathName": "/etc/passwd", ...}},
{"type": 3, "payload": {...}}
]
}

type is the event-type discriminator (1=file_access, 2=exec,
3=net_connect, 4=dns_query, 5=tls_sni). payload is the typed
event with the runner's parsed string fields already filled in
(PathName, CommStr, DestIP, etc.).
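
The NDJSON framing can be sketched with Go's json.Encoder, whose Encode method writes one JSON value followed by a newline. The types below are hypothetical stand-ins for the real ones in internal/shared/proto/; payload is kept as raw JSON because its full shape is not shown here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// event and eventBatch are hypothetical mirrors of the documented shape.
type event struct {
	Type    int             `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

type eventBatch struct {
	Seq    int     `json:"seq"`
	Events []event `json:"events"`
}

// writeNDJSON writes one batch per line; Encode appends the newline itself.
func writeNDJSON(w io.Writer, batches []eventBatch) error {
	enc := json.NewEncoder(w)
	for _, b := range batches {
		if err := enc.Encode(b); err != nil {
			return err
		}
	}
	return nil
}

// ndjsonString is a convenience wrapper that returns the stream as a string.
func ndjsonString(batches []eventBatch) (string, error) {
	var buf bytes.Buffer
	err := writeNDJSON(&buf, batches)
	return buf.String(), err
}

func main() {
	out, err := ndjsonString([]eventBatch{
		{Seq: 1, Events: []event{{Type: 1, Payload: json.RawMessage(`{"PathName":"/etc/passwd"}`)}}},
		{Seq: 2, Events: []event{{Type: 3, Payload: json.RawMessage(`{}`)}}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

In a real runner the io.Writer would typically be the request body of a streaming HTTP POST rather than an in-memory buffer.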
The orchestrator persists each batch in a transaction and returns the running totals plus persistence status:
{
"received_batches": 3,
"received_events": 38,
"persisted": 38
}

The stream is kept open until the runner closes its side (EOF). On
close, the orchestrator schedules the Differ run for this run_id
via the 2-second debounce.
Runner posts the final scan result. Transitions the run's state to
done | failed | timeout, stamps finished_at, stores the
metrics.
Request body:
{
"run_id": [24, 178, 8, 156, ...],
"status": "ok",
"reason": "",
"events_emitted": 312,
"events_dropped": 0,
"duration": 4500000000
}

status values:
| Value | When |
|---|---|
| ok | container exited cleanly (exit_code=0) within duration |
| failed | non-zero exit, sandbox launch error, AddCgroup error |
| timeout | duration elapsed before container exit |
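
A sketch of how the status value might be validated and mapped to a run state. The function name is hypothetical, and the mapping of ok to done is an assumption inferred from the state list (done | failed | timeout) above; the source does not spell it out.

```go
package main

import "fmt"

// runState maps a result status to the run state it would produce.
// The second return value is false for any status outside the documented
// set, which the endpoint reports as 400 Bad Request.
// Assumption: "ok" corresponds to the "done" state.
func runState(status string) (string, bool) {
	switch status {
	case "ok":
		return "done", true
	case "failed":
		return "failed", true
	case "timeout":
		return "timeout", true
	default:
		return "", false
	}
}

func main() {
	for _, s := range []string{"ok", "failed", "timeout", "cancelled"} {
		state, valid := runState(s)
		fmt.Printf("status=%q state=%q valid=%v\n", s, state, valid)
	}
}
```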
Response:
{"recorded": true}

Errors:
| Status | Meaning |
|---|---|
| 400 Bad Request | bad status value, malformed body |
| 404 Not Found | run_id not in DB (runner sent result for a run the orchestrator never saw) |
| 500 Internal Server Error | storage write failed |
All structured types live in internal/shared/proto/. JSON encoding
follows Go's defaults (CamelCase fields, time.Duration as
nanoseconds, byte arrays as integer-arrays). The runner's parsed
string fields use snake_case via JSON tags on the wrapper types.
proto.CurrentProtoVersion = 1. The register endpoint accepts any
proto_version today, but a future v2 may reject mismatched runners.
No rate limits are enforced. When idle, each long-poll cycle lasts about 30 seconds (the 25s server-side wait plus the runner's 5s poll interval), which naturally caps query volume per runner at roughly 2 requests per minute.
When -tls-cert and -tls-key are set, every endpoint switches to HTTPS.
When -tls-client-ca is also set, every connection must present a client
cert signed by that CA; registered subjects then become the runner
identity. See TLS-mTLS for the full lifecycle.
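
For orientation, the client-cert requirement corresponds to tls.RequireAndVerifyClientCert in Go's crypto/tls. The sketch below shows how a server-side config of that kind can be built from a CA bundle; it is not taken from server.go, the function name is hypothetical, and the TLS 1.2 minimum is an assumption.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// mtlsServerConfig builds a TLS config that rejects any connection without
// a client certificate signed by one of the CAs in clientCAPEM.
func mtlsServerConfig(clientCAPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(clientCAPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	return &tls.Config{
		ClientCAs:  pool,
		ClientAuth: tls.RequireAndVerifyClientCert,
		MinVersion: tls.VersionTLS12, // assumption, not stated in the source
	}, nil
}

func main() {
	// Invalid input is rejected up front rather than producing an empty pool.
	_, err := mtlsServerConfig([]byte("not a certificate"))
	fmt.Println("error for invalid PEM:", err != nil)
}
```

The server's own certificate and key would still be supplied separately, for example through http.Server's ListenAndServeTLS.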