From 5eb2f4e7bdb175713b9a452dd7ea1384480ce39a Mon Sep 17 00:00:00 2001
From: Cursor Agent
Date: Sat, 2 May 2026 11:33:15 +0000
Subject: [PATCH 1/2] docs(security): correct POST /v1/events trust boundary in http-api.md and SECURITY.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The auth table in docs/http-api.md labelled POST /v1/events as "loopback
only" with a footnote, which implied server-side enforcement. In reality,
server/routes/ingest.py has no host or token check; only POST /v1/promote
and POST /v1/rollback call _require_mutation_access in actions.py.

- Update the auth table to mark POST /v1/events as "open" with a footnote
  that explains no server-side gate exists; security relies on bind
  address + network topology.
- Update SECURITY.md to be explicit that ingest and diff are open to any
  caller that can reach the server, and that network-layer controls are
  needed when --host 0.0.0.0 is used.

Co-authored-by: Gottam Sai Bharath
---
 SECURITY.md      |  4 +++-
 docs/http-api.md | 10 +++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/SECURITY.md b/SECURITY.md
index 08307ee..50405de 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -36,4 +36,6 @@ See **[CONTRIBUTING.md](CONTRIBUTING.md)** for a pre-push checklist aligned with
 
 ## Local HTTP API (`flightdeck serve`)
 
-The bundled server is intended for **local development and demos**. **`POST /v1/promote`** and **`POST /v1/rollback`** are gated so that, with no token configured, only **loopback** clients can invoke them. If you set **`FLIGHTDECK_LOCAL_API_TOKEN`**, every mutation request must include **`Authorization: Bearer `**; use a strong random value and treat it like a local secret. Do not expose **`flightdeck serve`** on untrusted networks without understanding that **`POST /v1/events`** and **`POST /v1/diff`** are not behind the same Bearer gate (ingest and diff are still local-trust assumptions).
+The bundled server is intended for **local development and demos**. **`POST /v1/promote`** and **`POST /v1/rollback`** are gated in server code so that, with no token configured, only **loopback** clients (`127.0.0.1`, `::1`, `localhost`) can invoke them. If you set **`FLIGHTDECK_LOCAL_API_TOKEN`**, every mutation request must include **`Authorization: Bearer `**; use a strong random value and treat it like a local secret.
+
+**`POST /v1/events`** and **`POST /v1/diff`** have **no server-side host or token check**: their handlers in `server/routes/ingest.py` and `server/routes/actions.py` never call `_require_mutation_access`, so any caller that can reach the server can use them. When `flightdeck serve` binds to `127.0.0.1` (the default), only processes on the same machine can reach it; that restriction comes from network topology, not from application code. If you pass `--host 0.0.0.0` or bind to another non-loopback address, event ingest and diff become reachable from every client that can route to that address. Protect them at the network layer (for example, a firewall or an authenticating reverse proxy) if the server is exposed on a shared or public network.
diff --git a/docs/http-api.md b/docs/http-api.md
index 40babbd..232cbd8 100644
--- a/docs/http-api.md
+++ b/docs/http-api.md
@@ -23,13 +23,17 @@ Two access tiers:
 |-------|--------------------|---------------------------------|
 | `GET /health` | open | open |
 | `GET /v1/*` (reads) | open | open |
-| `POST /v1/events` | loopback only† | open (no Bearer required) |
+| `POST /v1/events` | open† | open (no Bearer required) |
 | `POST /v1/diff` | open | open |
 | `POST /v1/promote` | loopback only | `Authorization: Bearer ` required |
 | `POST /v1/rollback` | loopback only | `Authorization: Bearer ` required |
 
-†`POST /v1/events` is not behind the Bearer gate but the server only listens on loopback
-  by default, so it remains local-only unless `--host` is overridden.
+†`POST /v1/events` has **no server-side loopback or token gate** in code
+  (`server/routes/ingest.py`). Only `POST /v1/promote` and `POST /v1/rollback` call
+  `_require_mutation_access`. When the server binds to `127.0.0.1` (the default), ingest is
+  local-only because of network topology, not because of application enforcement. If you bind
+  `--host 0.0.0.0`, event ingest becomes reachable from any host that can route to the server.
+  Protect it at the network layer (for example, a firewall or reverse proxy) if that is a concern.
 
 ```bash
 export FLIGHTDECK_LOCAL_API_TOKEN="$(openssl rand -hex 32)"

From 9448b4d54e19ea384ea790f3d15c1811cbbaead1 Mon Sep 17 00:00:00 2001
From: Cursor Agent
Date: Sat, 2 May 2026 11:33:25 +0000
Subject: [PATCH 2/2] docs: expand developer and operations reference docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

DEVELOPMENT.md:
- Add a dedicated "What flightdeck-quickstart-verify does" subsection that
  documents the 11-step quickstart workflow run in CI, subprocess error
  handling, executable resolution strategy (PATH vs sys.executable
  fallback), and the isolated temp-dir approach.

docs/operations-and-policy.md:
- Add "Server initialization: lifespan vs. ensure_app_state" section under
  the architecture diagram: documents how create_app lifespan initializes
  app.state (cfg, storage, local_api_token), how ensure_app_state lazily
  re-initializes on first request when state is absent (e.g. in tests), the
  cwd-at-first-request implication for flightdeck.yaml resolution, and the
  "testclient" host in _LOCAL_CLIENT_HOSTS.
- Add "Operational runbook" section at the end with:
  - SQLite SQLITE_BUSY causes and remedies
  - Backup and restore procedure (offline copy and online backup while
    the server runs)
  - doctor failure interpretation table for each check with specific fix
    guidance
  - Inline sqlite3 query for audit_seq inspection

Co-authored-by: Gottam Sai Bharath
---
 DEVELOPMENT.md                | 25 ++++++++++
 docs/operations-and-policy.md | 92 +++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index e2c432f..aaa1a7f 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -56,6 +56,31 @@ Match **CI**’s CLI smoke: **`flightdeck --help`** must run successfully after
 
 Full command flags and exit codes: [README.md](https://github.com/flightdeckdev/flightdeck/blob/main/README.md). Cross-platform quickstart parity: **`flightdeck-quickstart-verify`** / **`python -m flightdeck.quickstart_smoke`** (also run in CI). HTTP API reference: **[docs/http-api.md](docs/http-api.md)**. Python SDK: **[docs/sdk.md](docs/sdk.md)**.
 
+### What `flightdeck-quickstart-verify` does
+
+`flightdeck-quickstart-verify` (entry point for `src/flightdeck/quickstart_smoke.py`) runs the full
+quickstart workflow end to end in an isolated temporary directory:
+
+1. `flightdeck init`
+2. Import both pricing tables from `examples/quickstart/`
+3. `flightdeck policy set`
+4. Register the baseline and candidate releases, capturing the `release_id` each prints to stdout
+5. Substitute the `__BASELINE_RELEASE_ID__` / `__CANDIDATE_RELEASE_ID__` placeholders in the
+   quickstart JSONL event files and write the results to the temporary directory
+6. `flightdeck runs ingest` for both event files
+7. `flightdeck release diff` (7-day window)
+8. `flightdeck release promote` baseline → `local`
+9. `flightdeck release history`
+10. `flightdeck release verify` (checksum check against the on-disk bundle)
+11. `flightdeck doctor`
+
+All subprocesses use `subprocess.run(..., check=True)`. Any non-zero exit prints the failing command's stderr and makes
+the verifier exit non-zero. On success it prints `quickstart_smoke: OK`.
+
+**Executable resolution:** prefers `flightdeck` on `PATH` (`shutil.which`) and falls back to
+`sys.executable -m flightdeck.cli.main`, so it also works in a bare `uv run` context where the
+console script is not installed.
+
 **JSON Schemas:** when **`src/flightdeck/`** models or **`scripts/generate_schemas.py`** change wire contracts, regenerate and match CI:
 
 ```bash
diff --git a/docs/operations-and-policy.md b/docs/operations-and-policy.md
index b9107dc..eefc9d3 100644
--- a/docs/operations-and-policy.md
+++ b/docs/operations-and-policy.md
@@ -34,6 +34,33 @@ The three primary functions:
 
 All raise `OperationError` (a `ValueError` subclass) for user-visible problems. The CLI maps these to `click.ClickException`; the HTTP layer maps them to HTTP 400.
 
+### Server initialization: lifespan vs. `ensure_app_state`
+
+`server/app.py` registers a FastAPI **lifespan** handler whose startup phase runs the following (excerpt; imports omitted):
+
+```python
+cfg = load_config()  # reads flightdeck.yaml from the current working directory
+storage = Storage(cfg.db_path)
+storage.migrate()
+app.state.cfg = cfg
+app.state.storage = storage
+app.state.local_api_token = os.environ.get("FLIGHTDECK_LOCAL_API_TOKEN")  # None when unset
+```
+
+Every request handler then calls `ensure_app_state(request)` from
+`server/routes/common.py`. That function returns `(cfg, storage)` immediately if
+`app.state.cfg` and `app.state.storage` are already set. If they are **not** set (for example, in
+tests that use Starlette's `TestClient` without a `with` block, so the lifespan never runs, or in
+unusual embedding scenarios), it re-runs the same load-and-migrate sequence and stores the results on
+`app.state`. This lazy fallback lets tests call routes without starting uvicorn, but
+it also means the working directory at **first request time**, not at process start, determines
+which `flightdeck.yaml` is loaded.
+
+`_require_mutation_access` (called by `POST /v1/promote` and `POST /v1/rollback`) reads
+`request.app.state.local_api_token`, which is set during the lifespan or by lazy init. The test client host
+`"testclient"` is included in `_LOCAL_CLIENT_HOSTS` alongside the loopback addresses so that
+integration tests can call mutation routes without a Bearer token.
+
 ---
 
 ## `compute_diff`
@@ -390,3 +417,68 @@ corresponding check in `test_schemas.py` (or `test_doctor.py`).
 | `Reason is required for promote/rollback actions` | Empty `--reason` flag | Provide a non-empty `--reason` |
 | `No promoted release exists for this agent/environment; nothing to roll back to` | Trying to roll back with no baseline | Promote a release first |
 | `Workspace config not found: flightdeck.yaml` | Missing `flightdeck.yaml` | `flightdeck init` |
+
+---
+
+## Operational runbook
+
+### SQLite `SQLITE_BUSY` errors
+
+FlightDeck uses WAL mode with a 5-second busy timeout (see [Storage connection settings](#storage-connection-settings)). `SQLITE_BUSY` is returned when a connection has waited 5 seconds for a lock that another connection still holds.
+
+**Typical causes:**
+
+- Another `flightdeck serve` process or CLI command is running a long `BEGIN IMMEDIATE` transaction.
+- The database file is on a network filesystem (NFS, SMB). WAL mode depends on shared memory between
+  processes on a single host, and SQLite does not support it over network filesystems.
+- OS-level antivirus or backup software has the file open.
+
+**Remedies:**
+
+1. Ensure only one writer is active at a time (the CLI and server share the same DB file).
+2. Move `db_path` to a local filesystem if you see persistent locking issues on NFS or SMB.
+3. For batch operations that hit the limit, reduce parallelism; FlightDeck is designed for
+   single-user local use, not concurrent writers.
+
+### Backup and restore
+
+The full FlightDeck state lives in two places:
+
+- `flightdeck.yaml`: workspace config (safe to version-control; contains no secrets)
+- `.flightdeck/flightdeck.db`: SQLite database (gitignored by default)
+
+**Backup** with the server and CLI stopped (if a `flightdeck.db-wal` file is still present, copy it alongside the database):
+
+```bash
+cp .flightdeck/flightdeck.db .flightdeck/flightdeck.db.bak
+```
+
+**Backup while the server is running** (a plain `cp` is not safe here, because a checkpoint can rewrite the file mid-copy; the `sqlite3` shell's `.backup` command uses SQLite's online backup API and produces a consistent copy):
+
+```bash
+sqlite3 .flightdeck/flightdeck.db ".backup '.flightdeck/flightdeck.db.bak'"
+sqlite3 .flightdeck/flightdeck.db.bak "PRAGMA integrity_check;"
+```
+
+**Restore:** stop the server, delete any leftover `-wal`/`-shm` files so stale WAL frames are not replayed onto the restored copy, replace `flightdeck.db` with the backup, and restart.
+
+```bash
+rm -f .flightdeck/flightdeck.db-wal .flightdeck/flightdeck.db-shm && cp .flightdeck/flightdeck.db.bak .flightdeck/flightdeck.db
+```
+
+After restoring, run `flightdeck doctor` to confirm that migrations, promoted pointers, and `audit_seq` are consistent.
+
+### Interpreting `flightdeck doctor` failures
+
+| Check | Failure message | Meaning | Fix |
+|-------|----------------|---------|-----|
+| `schema_migrations` | `migrations applied=[1, 2] but expected 1..3` | A newer migration has not run (the DB was created by an older version) | Run `flightdeck doctor` again (it calls `migrate()` at start); if it still fails, the DB file may come from a version with a different schema history |
+| `promoted_pointer::` | `release_id=rel_... not found in releases` | A promoted pointer references a deleted or never-registered release | Re-registering a release under the same ID is not supported; reset the pointer by promoting a known-good release |
+| `audit_seq` | `gap at seq=5` or `duplicate seq=3` | The `release_actions` table has a missing or duplicate `audit_seq` value | Usually a manual DB edit or an incomplete write; inspect the affected rows with `sqlite3`, then restore from backup |
+
+For the `audit_seq` gap case, you can inspect the table directly:
+
+```bash
+sqlite3 .flightdeck/flightdeck.db \
+  "SELECT audit_seq, action, release_id, created_at FROM release_actions ORDER BY audit_seq;"
+```
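
The access rule documented in the SECURITY.md and `docs/http-api.md` changes can be summarized as a single predicate. This is a hypothetical sketch, not the code in `server/routes/actions.py`: the names `mutation_allowed`, `GATED_ROUTES`, and `LOCAL_CLIENT_HOSTS` are illustrative, the host set copies the loopback names from SECURITY.md plus `"testclient"`, and it assumes that once a token is configured, a matching Bearer header is sufficient from any host.

```python
import hmac
from typing import Optional

# Hypothetical stand-in for _LOCAL_CLIENT_HOSTS: the loopback names from
# SECURITY.md plus "testclient", the client host reported by Starlette's TestClient.
LOCAL_CLIENT_HOSTS = frozenset({"127.0.0.1", "::1", "localhost", "testclient"})

# Only these two routes call the mutation gate; every other route is open.
GATED_ROUTES = frozenset({("POST", "/v1/promote"), ("POST", "/v1/rollback")})


def mutation_allowed(
    method: str,
    path: str,
    client_host: Optional[str],
    authorization: Optional[str],
    configured_token: Optional[str],
) -> bool:
    """Return True if a request passes the access rule described in the docs."""
    if (method, path) not in GATED_ROUTES:
        # POST /v1/events, POST /v1/diff and GET reads have no application-level check.
        return True
    if configured_token:
        # Assumption: with a token configured, a matching Bearer header is
        # required from every caller and is sufficient regardless of host.
        expected = f"Bearer {configured_token}"
        return authorization is not None and hmac.compare_digest(
            authorization.encode(), expected.encode()
        )
    # No token configured: only loopback clients (or the test client) may mutate.
    return client_host in LOCAL_CLIENT_HOSTS
```

Leaving the events and diff routes out of `GATED_ROUTES` is exactly the gap the SECURITY.md text warns about: on a non-loopback bind, nothing in this predicate stops a remote caller from ingesting events.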
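
Step 5 of `flightdeck-quickstart-verify` swaps the `__BASELINE_RELEASE_ID__` / `__CANDIDATE_RELEASE_ID__` placeholders for the release IDs captured in step 4. A minimal sketch of that substitution, assuming plain string replacement over the whole JSONL file; the helper name `write_substituted_events` is hypothetical and not taken from `quickstart_smoke.py`:

```python
from pathlib import Path
from typing import Mapping


def write_substituted_events(
    src: Path, dest: Path, release_ids: Mapping[str, str]
) -> None:
    """Copy a JSONL event file to `dest`, replacing each placeholder with its release ID."""
    text = src.read_text(encoding="utf-8")
    for placeholder, release_id in release_ids.items():
        # Plain substring replacement: every occurrence on every line is rewritten.
        text = text.replace(placeholder, release_id)
    dest.write_text(text, encoding="utf-8")
```

Writing to `dest` instead of editing `src` in place keeps `examples/quickstart/` untouched, in line with the isolated temp-directory approach.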
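
For scripted backups, Python's standard library exposes SQLite's online backup API (the mechanism behind the `sqlite3` shell's `.backup` command) as `sqlite3.Connection.backup`, available since Python 3.7. The sketch below copies a database that may still be open elsewhere and runs `PRAGMA integrity_check` on the copy; `backup_database` is a hypothetical helper, not part of FlightDeck.

```python
import sqlite3
from pathlib import Path


def backup_database(src: Path, dest: Path) -> str:
    """Copy a SQLite database that may be in use to `dest` via the online backup API.

    Returns the first row of PRAGMA integrity_check on the copy ("ok" when healthy).
    """
    if not src.exists():
        # sqlite3.connect would otherwise silently create an empty database at `src`.
        raise FileNotFoundError(src)
    dest.parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(src)
    try:
        target = sqlite3.connect(dest)
        try:
            # With the default pages=-1, all pages are copied in a single step
            # from one consistent snapshot of the source database.
            source.backup(target)
            (status,) = target.execute("PRAGMA integrity_check").fetchone()
        finally:
            target.close()
    finally:
        source.close()
    return status
```

A call such as `backup_database(Path(".flightdeck/flightdeck.db"), Path(".flightdeck/flightdeck.db.bak"))` mirrors the shell commands in the runbook.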
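
The `audit_seq` row of the doctor table describes two failure modes, a gap and a duplicate. A hypothetical re-implementation of that check over a list of sequence values (for example, the first column of the `sqlite3` query above) could look like this. It assumes the sequence should run 1, 2, 3 and so on up to its maximum without holes, which the table's examples suggest but the source does not state, and the message strings are only modeled on the table.

```python
from collections import Counter
from typing import Iterable, List


def audit_seq_problems(seqs: Iterable[int]) -> List[str]:
    """List duplicate and missing audit_seq values (hypothetical doctor-style check).

    Assumes the sequence should run 1, 2, 3, ... up to its maximum with no holes.
    """
    counts = Counter(seqs)
    # Any value seen more than once is a duplicate.
    problems = [f"duplicate seq={s}" for s in sorted(counts) if counts[s] > 1]
    if counts:
        # Any value between 1 and the maximum that never appears is a gap.
        problems += [f"gap at seq={s}" for s in range(1, max(counts) + 1) if s not in counts]
    return problems
```

To run it against a workspace, pass it `[row[0] for row in sqlite3.connect(".flightdeck/flightdeck.db").execute("SELECT audit_seq FROM release_actions")]`.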