API Reference

cyb3rjerry edited this page May 23, 2026 · 1 revision

The orchestrator's HTTP surface. All endpoints live under /v1/ and exchange JSON request and response bodies. The server speaks plain HTTP by default; it switches to HTTPS, with optional mTLS, when the TLS flags are set.

Source: internal/orchestrator/api/server.go.

Authentication

v1 — none for the API layer itself. Two security envelopes available:

  • Localhost bind (default -addr 127.0.0.1:8443) — network is the trust boundary.
  • mTLS (-tls-cert + -tls-key + -tls-client-ca). Every request must present a client cert signed by the configured CA. See TLS-mTLS.

There is NO bearer-token auth mode today. If you need API-key auth, that's a v2 item; for now, use mTLS or rely on the network as the trust boundary.

GET /v1/health

Liveness probe. Used by load balancers, Kubernetes readiness checks, or a plain curl to sanity-check that the orchestrator is up.

Response:

{
  "status": "ok",
  "orchestrator_id": "fangs-orchestrator",
  "version": "dev"
}
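A minimal Go sketch of a health check client, assuming the orchestrator listens on the default 127.0.0.1:8443 over plain HTTP. The Health struct and the CheckHealth / ParseHealth helpers are hypothetical names for illustration; only the JSON keys come from the response above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Health mirrors the documented /v1/health response body.
// The struct itself is an assumption, not the orchestrator's real type.
type Health struct {
	Status         string `json:"status"`
	OrchestratorID string `json:"orchestrator_id"`
	Version        string `json:"version"`
}

// ParseHealth decodes a /v1/health response body.
func ParseHealth(body []byte) (Health, error) {
	var h Health
	err := json.Unmarshal(body, &h)
	return h, err
}

// CheckHealth GETs baseURL + "/v1/health" and decodes the result.
func CheckHealth(client *http.Client, baseURL string) (Health, error) {
	resp, err := client.Get(baseURL + "/v1/health")
	if err != nil {
		return Health{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return Health{}, fmt.Errorf("health: unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return Health{}, err
	}
	return ParseHealth(body)
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	h, err := CheckHealth(client, "http://127.0.0.1:8443")
	if err != nil {
		fmt.Println("orchestrator not reachable:", err)
		return
	}
	fmt.Println("status:", h.Status, "version:", h.Version)
}
```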

POST /v1/runners/register

A runner introduces itself at startup. Idempotent — re-registering the same runner_id replaces the prior registration record.

Request body:

{
  "runner_id": "worker-1",
  "hostname": "fangs-runner-01",
  "capabilities": ["sensor", "sandbox.docker"],
  "kernel_version": "6.17.0-azure",
  "proto_version": 1
}

Response:

{
  "ok": true,
  "orchestrator_id": "fangs-orchestrator",
  "job_poll_interval": 5000000000,
  "heartbeat_interval": 30000000000
}

The intervals are nanoseconds (Go's time.Duration JSON encoding). The runner uses these to drive its long-poll cadence + heartbeat goroutine.
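Because the intervals are integer nanoseconds, a Go client can decode them straight into time.Duration fields. The sketch below assumes a hypothetical RegisterResponse struct that only mirrors the JSON keys shown above; clients in other languages would divide by 1e9 to get seconds.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// RegisterResponse mirrors the documented register response.
// The struct name and layout are assumptions for illustration.
type RegisterResponse struct {
	OK                bool          `json:"ok"`
	OrchestratorID    string        `json:"orchestrator_id"`
	JobPollInterval   time.Duration `json:"job_poll_interval"`
	HeartbeatInterval time.Duration `json:"heartbeat_interval"`
}

// ParseRegisterResponse decodes the body; time.Duration is an int64 of
// nanoseconds, so the integer values unmarshal without any conversion.
func ParseRegisterResponse(body []byte) (RegisterResponse, error) {
	var r RegisterResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	body := []byte(`{"ok":true,"orchestrator_id":"fangs-orchestrator","job_poll_interval":5000000000,"heartbeat_interval":30000000000}`)
	r, err := ParseRegisterResponse(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.JobPollInterval, r.HeartbeatInterval) // prints "5s 30s"
}
```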

POST /v1/runners/{id}/heartbeat

Periodic ping. The runner sends one every heartbeat_interval (30s default). The orchestrator updates LastSeen + optional status fields.

Request body (all fields optional):

{
  "runner_id": "worker-1",
  "active_run_id": "18b2089cca6d1186...",
  "status": "running",
  "events_queued": 5
}

Response on known runner:

{"ok": true}

Response on UNKNOWN runner (e.g. orchestrator restarted + lost the in-memory registration):

{"ok": false, "unknown_runner": true}

The runner detects unknown_runner: true and re-registers automatically.
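The re-registration check can be sketched as a small decision function. HeartbeatResponse and NeedsReregister are hypothetical names; the actual runner code may structure this differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HeartbeatResponse mirrors the two documented response shapes.
type HeartbeatResponse struct {
	OK            bool `json:"ok"`
	UnknownRunner bool `json:"unknown_runner"`
}

// NeedsReregister reports whether the runner should call
// POST /v1/runners/register again before its next heartbeat.
func NeedsReregister(body []byte) (bool, error) {
	var r HeartbeatResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	return !r.OK && r.UnknownRunner, nil
}

func main() {
	for _, body := range []string{
		`{"ok": true}`,
		`{"ok": false, "unknown_runner": true}`,
	} {
		again, err := NeedsReregister([]byte(body))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> re-register: %v\n", body, again)
	}
}
```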

GET /v1/runners/{id}/jobs

Long-poll for the next job assigned to this runner. Server-side wait caps at 25 seconds; the runner uses a 30s timeout to give the server time to win the race.

Responses:

Status          Meaning
200 OK          a Job is available; body is the Job JSON
204 No Content  wait expired, no work; runner sleeps job_poll_interval then polls again
404 Not Found   runner-id isn't registered; runner should re-register
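One way a client might act on these three status codes is sketched below. PollAction, ClassifyPoll, and PollOnce are hypothetical names; the 30s client timeout follows the value quoted above, and everything else is an assumption rather than the runner's actual implementation.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// PollAction is what the caller should do after one long-poll round trip.
type PollAction string

const (
	RunJob        PollAction = "run_job"         // 200: body holds the Job JSON
	SleepThenPoll PollAction = "sleep_then_poll" // 204: sleep job_poll_interval, poll again
	Reregister    PollAction = "reregister"      // 404: runner is not registered
	Unexpected    PollAction = "unexpected"      // anything else
)

// ClassifyPoll maps the documented status codes to an action.
func ClassifyPoll(status int) PollAction {
	switch status {
	case http.StatusOK:
		return RunJob
	case http.StatusNoContent:
		return SleepThenPoll
	case http.StatusNotFound:
		return Reregister
	default:
		return Unexpected
	}
}

// PollOnce performs a single long-poll with a 30s client-side deadline,
// slightly longer than the 25s server-side wait.
func PollOnce(ctx context.Context, client *http.Client, baseURL, runnerID string) (PollAction, []byte, error) {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	endpoint := baseURL + "/v1/runners/" + url.PathEscape(runnerID) + "/jobs"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return Unexpected, nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return Unexpected, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return Unexpected, nil, err
	}
	return ClassifyPoll(resp.StatusCode), body, nil
}

func main() {
	for _, code := range []int{200, 204, 404} {
		fmt.Println(code, ClassifyPoll(code))
	}
}
```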

Job body (sandbox_scan):

{
  "run_id": [24, 178, 8, 156, 202, 190, 201, 128, 0, 0, 0, 0, 0, 0, 0, 0],
  "kind": "sandbox_scan",
  "package_name": "axios",
  "version": "1.7.7",
  "watched_paths": [
    {"prefix": "/etc/"},
    {"prefix": "/etc/shadow", "cred_tagged": true},
    ...
  ],
  "duration": 60000000000,
  "sandbox": {
    "image": "node:20-slim",
    "command": ["sh", "-c", "cd /tmp && mkdir -p t && cd t && npm init -y >/dev/null && npm install axios@1.7.7 2>&1 | tail; sleep 2"],
    "network_mode": "bridge",
    "pull_policy": "missing",
    "user": "0:0",
    "grace_period": 2000000000,
    "cgroup_parent": "/fangs/<runid>"
  },
  "dispatched_at": "2026-05-22T01:23:45.678Z"
}

run_id is a 16-byte ULID-shaped identifier. The runner hex-encodes it for use in subsequent endpoint URLs.
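The conversion between the integer-array form and the hex form can be sketched as follows. RunID is assumed here to be a fixed 16-byte array type (Go's encoding/json reads and writes such arrays as JSON arrays of integers); the type and method names are hypothetical.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// RunID is a hypothetical stand-in for the 16-byte run identifier.
type RunID [16]byte

// Hex returns the 32-character lowercase hex form used in URLs.
func (id RunID) Hex() string {
	return hex.EncodeToString(id[:])
}

// ParseRunIDHex converts the hex form back into a RunID.
func ParseRunIDHex(s string) (RunID, error) {
	var id RunID
	b, err := hex.DecodeString(s)
	if err != nil {
		return id, err
	}
	if len(b) != len(id) {
		return id, fmt.Errorf("run_id: want %d bytes, got %d", len(id), len(b))
	}
	copy(id[:], b)
	return id, nil
}

func main() {
	// Only the run_id field of the Job body is decoded here.
	var job struct {
		RunID RunID `json:"run_id"`
	}
	raw := []byte(`{"run_id": [24, 178, 8, 156, 202, 190, 201, 128, 0, 0, 0, 0, 0, 0, 0, 0]}`)
	if err := json.Unmarshal(raw, &job); err != nil {
		panic(err)
	}
	// prints "/v1/runs/18b2089ccabec9800000000000000000/events"
	fmt.Println("/v1/runs/" + job.RunID.Hex() + "/events")
}
```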

POST /v1/scans

Operator queues a scan. The orchestrator inserts a runs row with state=pending and dispatches via the in-memory queue.

Request body:

{
  "target_runner": "worker-1",
  "Job": {
    "kind": "sandbox_scan",
    "package_name": "lodash",
    "version": "4.18.1",
    "duration": 60000000000,
    "sandbox": { /* same shape as above */ }
  }
}

Notably absent from Job: watched_paths. When the request omits them, the orchestrator stamps in its configured defaults (from config/orchestrator.yaml) so config edits take effect for every caller without changes on the caller side. Pass an explicit watched_paths to override.

Response:

{
  "queued": true,
  "run_id": "18b2089cca6d11860000000000000000"
}

run_id is hex-encoded so it's URL/log-friendly.

Errors:

Status                     Meaning
400 Bad Request            malformed JSON / missing required fields
404 Not Found              target_runner isn't registered
500 Internal Server Error  storage write failed
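A sketch of building the POST /v1/scans body in Go. The struct definitions are assumptions that only mirror the documented JSON keys (note the untagged Job field, which marshals under the capitalized key "Job"), the sandbox is reduced to three fields, and the npm command is a simplified placeholder. Leaving WatchedPaths nil omits the key, so the orchestrator's configured defaults apply.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// WatchedPath, Sandbox, Job, and ScanRequest are hypothetical mirrors
// of the documented request body, not the real wire types.
type WatchedPath struct {
	Prefix     string `json:"prefix"`
	CredTagged bool   `json:"cred_tagged,omitempty"`
}

type Sandbox struct {
	Image       string   `json:"image"`
	Command     []string `json:"command"`
	NetworkMode string   `json:"network_mode"`
}

type Job struct {
	Kind         string        `json:"kind"`
	PackageName  string        `json:"package_name"`
	Version      string        `json:"version"`
	WatchedPaths []WatchedPath `json:"watched_paths,omitempty"` // nil: use orchestrator defaults
	Duration     time.Duration `json:"duration"`                // encoded as nanoseconds
	Sandbox      Sandbox       `json:"sandbox"`
}

type ScanRequest struct {
	TargetRunner string `json:"target_runner"`
	Job          Job    // no JSON tag, so the key is "Job"
}

// NewScanRequest builds a sandbox_scan request without watched_paths.
func NewScanRequest(runner, pkg, version string, d time.Duration) ScanRequest {
	return ScanRequest{
		TargetRunner: runner,
		Job: Job{
			Kind:        "sandbox_scan",
			PackageName: pkg,
			Version:     version,
			Duration:    d,
			Sandbox: Sandbox{
				Image:       "node:20-slim",
				Command:     []string{"sh", "-c", "npm install " + pkg + "@" + version},
				NetworkMode: "bridge",
			},
		},
	}
}

// EncodeScanRequest returns the JSON body to POST to /v1/scans.
func EncodeScanRequest(r ScanRequest) ([]byte, error) {
	return json.Marshal(r)
}

func main() {
	body, err := EncodeScanRequest(NewScanRequest("worker-1", "lodash", "4.18.1", 60*time.Second))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

To override the defaults, set Job.WatchedPaths explicitly before encoding.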

POST /v1/runs/{run_id}/events

Runner streams NDJSON event batches to this endpoint. One EventBatch per JSON object, separated by newlines.

EventBatch shape (one JSON object per line):

{
  "seq": 1,
  "events": [
    {"type": 1, "payload": {"Header": {...}, "Path": [...], "PathName": "/etc/passwd", ...}},
    {"type": 3, "payload": {...}}
  ]
}

type is the event-type discriminator (1=file_access, 2=exec, 3=net_connect, 4=dns_query, 5=tls_sni). payload is the typed event with the runner's parsed string fields already filled in (PathName, CommStr, DestIP, etc.).

The orchestrator persists each batch in a transaction, returns the running total + persistence status:

{
  "received_batches": 3,
  "received_events": 38,
  "persisted": 38
}

The stream is kept open until the runner closes its side (EOF). On close, the orchestrator schedules the Differ run for this run_id via the 2-second debounce.
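The NDJSON framing can be sketched with json.Encoder, which writes one JSON value followed by a newline per Encode call. Event and EventBatch here are simplified, hypothetical types (the payload is kept as raw JSON, and the sample payloads carry a single field each); in a real client the writer could be the write end of an io.Pipe whose read end is the POST body.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Event and EventBatch mirror the documented batch shape in simplified form.
type Event struct {
	Type    int             `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

type EventBatch struct {
	Seq    uint64  `json:"seq"`
	Events []Event `json:"events"`
}

// WriteNDJSON writes each batch as one JSON object per line.
func WriteNDJSON(w io.Writer, batches []EventBatch) error {
	enc := json.NewEncoder(w)
	for _, b := range batches {
		if err := enc.Encode(b); err != nil {
			return err
		}
	}
	return nil
}

// EncodeNDJSON is a convenience wrapper that returns the stream as a string.
func EncodeNDJSON(batches []EventBatch) (string, error) {
	var buf bytes.Buffer
	err := WriteNDJSON(&buf, batches)
	return buf.String(), err
}

// sampleBatches returns two illustrative batches with made-up payloads.
func sampleBatches() []EventBatch {
	return []EventBatch{
		{Seq: 1, Events: []Event{{Type: 1, Payload: json.RawMessage(`{"PathName":"/etc/passwd"}`)}}},
		{Seq: 2, Events: []Event{{Type: 3, Payload: json.RawMessage(`{"DestIP":"203.0.113.7"}`)}}},
	}
}

func main() {
	out, err := EncodeNDJSON(sampleBatches())
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```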

POST /v1/runs/{run_id}/result

Runner posts the final scan result. Transitions the run's state to done | failed | timeout, stamps finished_at, stores the metrics.

Request body:

{
  "run_id": [24, 178, 8, 156, ...],
  "status": "ok",
  "reason": "",
  "events_emitted": 312,
  "events_dropped": 0,
  "duration": 4500000000
}

status values:

Value    When
ok       container exited cleanly (exit_code=0) within duration
failed   non-zero exit, sandbox launch error, AddCgroup error
timeout  duration elapsed before container exit
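The table above can be read as a small decision function. The sketch below is an assumption about how a runner might derive the status; in particular, the precedence (launch error first, then timeout, then exit code) is a guess, and ResultStatus is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// errLaunch is a placeholder for a sandbox launch or AddCgroup failure.
var errLaunch = errors.New("sandbox launch failed")

// ResultStatus maps a run outcome to one of the documented status values.
// The ordering of the checks is an assumption, not confirmed behavior.
func ResultStatus(launchErr error, timedOut bool, exitCode int) string {
	switch {
	case launchErr != nil:
		return "failed" // sandbox launch error or AddCgroup error
	case timedOut:
		return "timeout" // duration elapsed before container exit
	case exitCode != 0:
		return "failed" // non-zero exit
	default:
		return "ok" // exit_code=0 within duration
	}
}

func main() {
	fmt.Println(ResultStatus(nil, false, 0))       // ok
	fmt.Println(ResultStatus(nil, false, 1))       // failed
	fmt.Println(ResultStatus(nil, true, 0))        // timeout
	fmt.Println(ResultStatus(errLaunch, false, 0)) // failed
}
```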

Response:

{"recorded": true}

Errors:

Status                     Meaning
400 Bad Request            bad status value, malformed body
404 Not Found              run_id not in DB (runner sent result for a run the orchestrator never saw)
500 Internal Server Error  storage write failed

Wire types

All structured types live in internal/shared/proto/. JSON encoding follows Go's defaults (CamelCase fields, time.Duration as nanoseconds, byte arrays as integer-arrays). The runner's parsed string fields use snake_case via JSON tags on the wrapper types.

Versioning

proto.CurrentProtoVersion = 1. The register endpoint accepts any proto_version today, but a future v2 may reject mismatched runners.

Rate limits

None enforced. When a runner is idle, each poll cycle consists of the long-poll endpoint's 25s server-side wait followed by the runner's 5s poll-interval sleep, so query volume is naturally capped at roughly 2 requests/min per runner.

TLS / mTLS

When -tls-cert + -tls-key are set, every endpoint switches to HTTPS. When -tls-client-ca is also set, every connection must present a client cert signed by that CA; the certificate's subject then becomes the runner identity. See TLS-mTLS for the full lifecycle.
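On the client side, presenting a certificate could look like the sketch below, which uses Go's standard crypto/tls and net/http packages. NewMTLSClient and the file names are hypothetical; the client certificate is assumed to be signed by the CA passed to -tls-client-ca, and serverCAPEM is assumed to be whichever CA signed the orchestrator's own certificate.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// NewMTLSClient builds an HTTP client that presents a client certificate
// and verifies the server against the given CA bundle.
func NewMTLSClient(certPEM, keyPEM, serverCAPEM []byte) (*http.Client, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("client key pair: %w", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(serverCAPEM) {
		return nil, errors.New("server CA: no certificates found in PEM")
	}
	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert},
				RootCAs:      roots,
				MinVersion:   tls.VersionTLS12,
			},
		},
	}, nil
}

func main() {
	// Hypothetical file names; adjust to your own PKI layout.
	var pems [3][]byte
	for i, path := range []string{"runner.crt", "runner.key", "server-ca.crt"} {
		b, err := os.ReadFile(path)
		if err != nil {
			fmt.Println("load PEM:", err)
			return
		}
		pems[i] = b
	}
	client, err := NewMTLSClient(pems[0], pems[1], pems[2])
	if err != nil {
		fmt.Println("mTLS setup:", err)
		return
	}
	resp, err := client.Get("https://127.0.0.1:8443/v1/health")
	if err != nil {
		fmt.Println("request:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```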
