25 changes: 25 additions & 0 deletions DEVELOPMENT.md
@@ -56,6 +56,31 @@ Match **CI**’s CLI smoke: **`flightdeck --help`** must run successfully after

Full command flags and exit codes: [README.md](https://github.com/flightdeckdev/flightdeck/blob/main/README.md). Cross-platform quickstart parity: **`flightdeck-quickstart-verify`** / **`python -m flightdeck.quickstart_smoke`** (also run in CI). HTTP API reference: **[docs/http-api.md](docs/http-api.md)**. Python SDK: **[docs/sdk.md](docs/sdk.md)**.

### What `flightdeck-quickstart-verify` does

`flightdeck-quickstart-verify` (entry point for `src/flightdeck/quickstart_smoke.py`) runs the full
quickstart workflow end-to-end in an isolated temp directory:

1. `flightdeck init`
2. Import both pricing tables from `examples/quickstart/`
3. `flightdeck policy set`
4. Register baseline and candidate releases — capture the `release_id` printed to stdout
5. Substitute `__BASELINE_RELEASE_ID__` / `__CANDIDATE_RELEASE_ID__` placeholders in the
quickstart JSONL event files and write them to the temp directory
6. `flightdeck runs ingest` for both event files
7. `flightdeck release diff` (7-day window)
8. `flightdeck release promote` baseline → `local`
9. `flightdeck release history`
10. `flightdeck release verify` (checksum check against the on-disk bundle)
11. `flightdeck doctor`
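
Step 5 above can be sketched as follows. This is a minimal illustration, not the code in `quickstart_smoke.py`: the helper names `fill_placeholders` and `write_events` are hypothetical, and only the two placeholder strings come from the list above.

```python
from pathlib import Path


def fill_placeholders(text: str, baseline_id: str, candidate_id: str) -> str:
    """Replace the quickstart placeholders with the captured release IDs."""
    text = text.replace("__BASELINE_RELEASE_ID__", baseline_id)
    return text.replace("__CANDIDATE_RELEASE_ID__", candidate_id)


def write_events(src: Path, dst: Path, baseline_id: str, candidate_id: str) -> None:
    """Copy one JSONL event file into the temp directory with IDs filled in."""
    raw = src.read_text(encoding="utf-8")
    dst.write_text(fill_placeholders(raw, baseline_id, candidate_id), encoding="utf-8")
```

Plain string replacement keeps each JSONL line byte-for-byte identical apart from the IDs, so the file does not need to be parsed and re-serialized.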

All subprocesses use `subprocess.run(..., check=True)`. Any non-zero exit prints stderr and causes
the verifier to exit non-zero. On success it prints `quickstart_smoke: OK`.

**Executable resolution:** prefers `flightdeck` on `PATH` (`shutil.which`); falls back to
`sys.executable -m flightdeck.cli.main` so it works inside a bare `uv run` context without a
console-scripts install.
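
The resolution order and the `check=True` behaviour described above might look roughly like this. It is a sketch under assumptions: `resolve_flightdeck_cmd` and `run_step` are hypothetical names, and the real verifier may structure its error handling differently.

```python
import shutil
import subprocess
import sys


def resolve_flightdeck_cmd() -> list[str]:
    """Prefer the console script on PATH; otherwise run the CLI module directly."""
    exe = shutil.which("flightdeck")
    if exe:
        return [exe]
    return [sys.executable, "-m", "flightdeck.cli.main"]


def run_step(args: list[str], cwd: str) -> str:
    """Run one CLI step; on a non-zero exit, print stderr and exit non-zero."""
    try:
        proc = subprocess.run(
            [*resolve_flightdeck_cmd(), *args],
            cwd=cwd,
            check=True,
            capture_output=True,
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        print(exc.stderr, file=sys.stderr)
        raise SystemExit(1)
    return proc.stdout
```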

**JSON Schemas:** when **`src/flightdeck/`** models or **`scripts/generate_schemas.py`** change wire contracts, regenerate and match CI:

```bash
4 changes: 3 additions & 1 deletion SECURITY.md
@@ -36,4 +36,6 @@ See **[CONTRIBUTING.md](CONTRIBUTING.md)** for a pre-push checklist aligned with

## Local HTTP API (`flightdeck serve`)

The bundled server is intended for **local development and demos**. **`POST /v1/promote`** and **`POST /v1/rollback`** are gated so that, with no token configured, only **loopback** clients can invoke them. If you set **`FLIGHTDECK_LOCAL_API_TOKEN`**, every mutation request must include **`Authorization: Bearer <that value>`**; use a strong random value and treat it like a local secret. Do not expose **`flightdeck serve`** on untrusted networks without understanding that **`POST /v1/events`** and **`POST /v1/diff`** are not behind the same Bearer gate (ingest and diff are still local-trust assumptions).
The bundled server is intended for **local development and demos**. **`POST /v1/promote`** and **`POST /v1/rollback`** are gated in server code so that, with no token configured, only **loopback** clients (`127.0.0.1`, `::1`, `localhost`) can invoke them. If you set **`FLIGHTDECK_LOCAL_API_TOKEN`**, every mutation request must include **`Authorization: Bearer <that value>`**; use a strong random value and treat it like a local secret.

**`POST /v1/events`** and **`POST /v1/diff`** have **no server-side host or token check** in `server/routes/ingest.py` and `server/routes/actions.py`. They are open to any caller that can reach the server. When `flightdeck serve` binds to `127.0.0.1` (the default), this is safe by network topology. If you use `--host 0.0.0.0` or bind to a non-loopback address, event ingest and diff become reachable from any client. Protect them at the network layer (firewall / reverse proxy) if the server is exposed on a shared or public network.
10 changes: 7 additions & 3 deletions docs/http-api.md
@@ -23,13 +23,17 @@ Two access tiers:
|-------|--------------------|---------------------------------|
| `GET /health` | open | open |
| `GET /v1/*` (reads) | open | open |
| `POST /v1/events` | loopback only† | open (no Bearer required) |
| `POST /v1/events` | open† | open (no Bearer required) |
| `POST /v1/diff` | open | open |
| `POST /v1/promote` | loopback only | `Authorization: Bearer <token>` required |
| `POST /v1/rollback` | loopback only | `Authorization: Bearer <token>` required |

†`POST /v1/events` is not behind the Bearer gate but the server only listens on loopback
by default, so it remains local-only unless `--host` is overridden.
†`POST /v1/events` has **no server-side loopback or token gate** in code
(`server/routes/ingest.py`). Only `POST /v1/promote` and `POST /v1/rollback` call
`_require_mutation_access`. When the server binds to `127.0.0.1` (the default), ingest is
effectively local-only by network topology, not by application enforcement. If you bind
`--host 0.0.0.0`, event ingest becomes reachable from any host. Protect it at the network
layer (firewall / reverse proxy) if that is a concern.

```bash
export FLIGHTDECK_LOCAL_API_TOKEN="$(openssl rand -hex 32)"
92 changes: 92 additions & 0 deletions docs/operations-and-policy.md
@@ -34,6 +34,33 @@ The three primary functions:
All raise `OperationError` (a `ValueError` subclass) for user-visible problems. The CLI
maps these to `click.ClickException`; the HTTP layer maps them to HTTP 400.

### Server initialization: lifespan vs. `ensure_app_state`

`server/app.py` registers a FastAPI **lifespan** handler that runs at startup:

```python
cfg = load_config() # reads flightdeck.yaml from cwd
storage = Storage(cfg.db_path)
storage.migrate()
app.state.cfg = cfg
app.state.storage = storage
app.state.local_api_token = os.environ.get("FLIGHTDECK_LOCAL_API_TOKEN")
```

Every request handler then calls `ensure_app_state(request)` from
`server/routes/common.py`. That function returns `(cfg, storage)` immediately if
`app.state.cfg` and `app.state.storage` are already set. If they are **not** set (e.g. in
tests that construct the app without going through the full lifespan, or in unusual embedding
scenarios), it re-runs the same load-and-migrate sequence and stores the results on
`app.state`. This lazy fallback lets tests call routes without starting uvicorn, but
when it triggers, the working directory at **first request time**, not at process start,
determines which `flightdeck.yaml` is loaded.
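
The return-early-or-initialize pattern described above can be sketched without FastAPI. The function below is illustrative only: `ensure_state` and its `load` callback are hypothetical stand-ins for `ensure_app_state` and the load-and-migrate sequence, and the real function takes a request object rather than a bare state object.

```python
from types import SimpleNamespace
from typing import Any, Callable, Tuple


def ensure_state(state: Any, load: Callable[[], Tuple[Any, Any]]) -> Tuple[Any, Any]:
    """Return (cfg, storage), initializing them on first use if lifespan did not."""
    cfg = getattr(state, "cfg", None)
    storage = getattr(state, "storage", None)
    if cfg is not None and storage is not None:
        return cfg, storage  # fast path: lifespan (or an earlier call) already ran
    cfg, storage = load()  # fallback: the current working directory matters here
    state.cfg, state.storage = cfg, storage
    return cfg, storage


# Usage: the loader runs once, and later calls reuse the cached objects.
state = SimpleNamespace()
calls = []
loader = lambda: (calls.append(1) or "cfg", "storage")
ensure_state(state, loader)
ensure_state(state, loader)
```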

`_require_mutation_access` (called by `POST /v1/promote` and `POST /v1/rollback`) reads
`request.app.state.local_api_token` set during lifespan or lazy init. The test client host
`"testclient"` is included in `_LOCAL_CLIENT_HOSTS` alongside loopback addresses so that
integration tests can call mutation routes without a Bearer token.
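
The access rule for mutation routes can be summarized as a small pure function. This is a sketch, not the body of `_require_mutation_access`: the name `mutation_allowed` and its signature are hypothetical, and the constant-time comparison via `hmac.compare_digest` is an assumption about good practice rather than something the source confirms.

```python
import hmac
from typing import Optional

# Loopback addresses plus the test client's synthetic host name.
LOCAL_CLIENT_HOSTS = frozenset({"127.0.0.1", "::1", "localhost", "testclient"})


def mutation_allowed(
    client_host: Optional[str],
    authorization: Optional[str],
    token: Optional[str],
) -> bool:
    """Decide whether a promote/rollback request may proceed."""
    if token:
        # Token configured: every caller, loopback included, must send the Bearer value.
        if authorization is None:
            return False
        expected = f"Bearer {token}".encode()
        return hmac.compare_digest(authorization.encode(), expected)
    # No token configured: only local clients are accepted.
    return client_host in LOCAL_CLIENT_HOSTS
```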

---

## `compute_diff`
@@ -390,3 +417,68 @@ corresponding check in `test_schemas.py` (or `test_doctor.py`).
| `Reason is required for promote/rollback actions` | Empty `--reason` flag | Provide a non-empty `--reason` |
| `No promoted release exists for this agent/environment; nothing to roll back to` | Trying to roll back with no baseline | Promote a release first |
| `Workspace config not found: flightdeck.yaml` | Missing `flightdeck.yaml` | `flightdeck init` |

---

## Operational runbook

### SQLite `SQLITE_BUSY` errors

FlightDeck uses WAL mode with a 5-second busy timeout (see [Storage connection settings](#storage-connection-settings)). `SQLITE_BUSY` is returned when a connection waits longer than 5 seconds for a lock held by another connection.

**Typical causes:**

- Another `flightdeck serve` or CLI command is running a long `BEGIN IMMEDIATE` transaction.
- The database file is on a network filesystem (NFS, SMB) where file locking is unreliable.
WAL mode also coordinates through a shared-memory `-shm` file, so every process that opens the database must run on the same host.
- OS-level anti-virus or backup software has the file open.

**Remedies:**

1. Ensure only one writer is active at a time (CLI and server share the same DB file).
2. Move `db_path` to a local filesystem if you see persistent locking issues on NFS or SMB.
3. For batch operations that hit the limit, reduce parallelism — FlightDeck is designed for
single-user local use, not concurrent writers.
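
To reproduce the locking behaviour locally, the following sketch opens a database in WAL mode with a busy timeout and wraps a write in `BEGIN IMMEDIATE`. It uses only the standard `sqlite3` module; `open_db` and `write_immediate` are hypothetical helpers and do not reflect how FlightDeck's `Storage` class is written.

```python
import sqlite3


def open_db(path: str, busy_timeout_s: float = 5.0) -> sqlite3.Connection:
    """Open a connection in autocommit mode with WAL and a busy timeout."""
    # isolation_level=None disables implicit transactions so BEGIN is explicit.
    conn = sqlite3.connect(path, timeout=busy_timeout_s, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def write_immediate(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> None:
    """Take the write lock up front, then commit or roll back."""
    # Raises sqlite3.OperationalError ("database is locked") if another
    # writer holds the lock for longer than the busy timeout.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(sql, params)
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
```

A second connection that calls `write_immediate` while the first is still inside its transaction waits up to the timeout and then fails, which is the condition the remedies above are meant to avoid.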

### Backup and restore

The full FlightDeck state lives in two places:

- `flightdeck.yaml` — workspace config (safe to version-control; contains no secrets)
- `.flightdeck/flightdeck.db` — SQLite database (gitignored by default)

**Backup** (safe copy while the server is not running):

```bash
cp .flightdeck/flightdeck.db .flightdeck/flightdeck.db.bak
```

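**Online backup** (safe while the server is running). A `wal_checkpoint` followed by `cp` is not atomic, because a write can land between the two commands. The `sqlite3` shell's `.backup` command uses SQLite's backup API and produces a consistent snapshot:

```bash
sqlite3 .flightdeck/flightdeck.db ".backup '.flightdeck/flightdeck.db.bak'"
```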
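
If the `sqlite3` shell is not installed, a consistent copy of a live database can also be taken from Python with the standard library's `Connection.backup`. This is a sketch: `backup_db` is a hypothetical helper, not a FlightDeck command.

```python
import sqlite3


def backup_db(src_path: str, dst_path: str) -> None:
    """Copy a SQLite database with the online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Copies all pages in one consistent snapshot, even while other
        # connections keep writing to the source database.
        src.backup(dst)
    finally:
        dst.close()
        src.close()


# Example call (paths as used in this section):
# backup_db(".flightdeck/flightdeck.db", ".flightdeck/flightdeck.db.bak")
```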

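**Restore:** stop the server, remove any leftover `-wal` and `-shm` files that belonged to the old database, replace `flightdeck.db` with the backup, then restart.

```bash
# Stale WAL/shared-memory files must not be paired with the restored database.
rm -f .flightdeck/flightdeck.db-wal .flightdeck/flightdeck.db-shm
cp .flightdeck/flightdeck.db.bak .flightdeck/flightdeck.db
```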

After restore, run `flightdeck doctor` to confirm integrity.

### Interpreting `flightdeck doctor` failures

| Check | Failure message | Meaning | Fix |
|-------|----------------|---------|-----|
| `schema_migrations` | `migrations applied=[1, 2] but expected 1..3` | A newer migration has not run (DB was created by an older version) | Run `flightdeck doctor` again (it calls `migrate()` at start); if it still fails, the DB file may be from a version with a different schema history |
| `promoted_pointer:<agent>:<env>` | `release_id=rel_... not found in releases` | A promoted pointer references a deleted or never-registered release | Re-registering a release under the same ID is not supported; reset the promoted pointer by promoting a known-good release |
| `audit_seq` | `gap at seq=5` or `duplicate seq=3` | The `release_actions` table has a missing or duplicate `audit_seq` | Indicates a manual DB edit or an incomplete write; inspect the affected rows with `sqlite3`, and restore from backup if the history cannot be trusted |

For the `audit_seq` gap case, you can inspect the table directly:

```bash
sqlite3 .flightdeck/flightdeck.db \
"SELECT audit_seq, action, release_id, created_at FROM release_actions ORDER BY audit_seq;"
```
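
The same gap and duplicate checks can be run over the query output in Python. This sketch assumes `audit_seq` is expected to run contiguously from 1, which the failure messages suggest but the source does not state; `audit_seq_problems` is a hypothetical helper, not the `doctor` implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def audit_seq_problems(seqs: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (missing, duplicated) sequence numbers, assuming a 1..max range."""
    counts = Counter(seqs)
    if not counts:
        return [], []
    duplicates = sorted(s for s, n in counts.items() if n > 1)
    gaps = [s for s in range(1, max(counts) + 1) if s not in counts]
    return gaps, duplicates


# Usage with the standard sqlite3 module:
#   rows = conn.execute("SELECT audit_seq FROM release_actions").fetchall()
#   gaps, duplicates = audit_seq_problems(r[0] for r in rows)
```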