v0.9.0

@junghan0611 junghan0611 released this 04 Jun 08:03
· 208 commits to main since this release

0.9.0 is the garden-native identity release. This is not just an Entwurf handle rename: the garden's own denote-style naming scheme is imported into the session layer so Entwurf sessions stop being treated as a separate species of worker artifact. Resident sessions, Entwurf children, and the later 1.0.0 meta-bridge direction all converge on the same garden session ontology — one durable sessionId, one human-readable and machine-parseable name surface, one rule that the session comes first and the transcript file is only its trace.

Changed (breaking — Entwurf public handle)

  • Entwurf public handle is now sessionId, not taskId (atomic migration, Phase 3b of #28 / 0.9.0). The garden-native session id YYYYMMDDTHHMMSS-[0-9a-f]{6} (= JSONL header id) replaces the old 8-hex taskId across the entire local Entwurf public surface in one slice — there is no compatibility shim and a saved-session handle from a pre-migration spawn will not resolve:
    • Spawn (runEntwurfSync + native async runEntwurfAsync): the parent generates the sessionId (generateSessionId), pre-checks for collision (assertSessionIdAvailableForSpawn), builds the denote-style session name (buildSessionName, tagged entwurf), and spawns with pi --session-id <id> --name <name>. The *_entwurf-<taskId>.jsonl filename species is gone — Pi names the file <created-at>_<sessionId>.jsonl.
    • Resume (runEntwurfResumeSync + async spawnEntwurfResumeAsync): looked up by header scan (findSessionFileById), child cwd forced to the saved header cwd, and continued with pi --session-id <id> (appends to the same session). Resume keeps the same durable sessionId; a per-process internal/diagnostic runId distinguishes resume runs (never a public handle).
    • Resume identity authority = first model_change. New readSessionIdentity reads the session's FIRST model_change (provider + modelId) as the locked model identity, not the last assistant message's model field. A later differing model_change (drift) or a corrupt session-name mirror (name sessionId/provider/model disagreeing with the header / first model_change) fails fast with SessionIdentityError.
    • Entwurf-resume is gated on the entwurf name tag. Since the *_entwurf-<taskId>.jsonl species is gone, "is this an Entwurf session?" is now answered by the session name's entwurf tag (requireEntwurf): a general pi session — no session_info name, a non-canonical name, or a canonical name without the entwurf tag — is refused at resume. No compatibility path.
    • Surfaces migrated together: EntwurfResult.sessionId, formatSyncSummary (Session ID:), native entwurf / entwurf_resume / entwurf_status tool schemas + result text, entwurf-async active map (keyed by sessionId) / ack / completion payloads, the spawn_async_resume control RPC, the MCP bridge entwurf / entwurf_resume Zod schema + descriptions, and the cross-cwd / compaction / async-resume / sentinel / MCP-test smokes.
  • Remote/SSH entwurf is out of scope and fails fast (#11). The garden-native sessionId collision pre-check and header-scan resume are local-filesystem only, so spawn/resume/status with a non-local host throws SessionIdentityError up front. Remote identity is a later phase.
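
The "first model_change wins" rule for resume identity can be illustrated with a minimal sketch. It is not the project's readSessionIdentity: the entry shape (`type`, `provider`, `modelId` as flat fields), the `sketchReadIdentity` name, and the `SketchIdentityError` class are assumptions made for illustration, and treating a session with no model_change at all as an error is likewise assumed rather than taken from these notes.

```typescript
// Assumed entry shape: the real session JSONL schema is not shown in these notes.
interface SessionEntry {
  type: string;
  provider?: string;
  modelId?: string;
}

interface ModelIdentity {
  provider: string;
  modelId: string;
}

// Hypothetical stand-in for the SessionIdentityError named in the notes.
class SketchIdentityError extends Error {}

// Locks the identity on the FIRST model_change and fails fast on later drift.
function sketchReadIdentity(entries: SessionEntry[]): ModelIdentity {
  let locked: ModelIdentity | undefined;
  for (const e of entries) {
    if (e.type !== "model_change") continue;
    if (e.provider === undefined || e.modelId === undefined) {
      throw new SketchIdentityError("model_change without provider/modelId");
    }
    if (locked === undefined) {
      // First model_change: this becomes the locked identity.
      locked = { provider: e.provider, modelId: e.modelId };
    } else if (e.provider !== locked.provider || e.modelId !== locked.modelId) {
      // A later, differing model_change is drift.
      throw new SketchIdentityError(
        `model drift: ${locked.provider}/${locked.modelId} -> ${e.provider}/${e.modelId}`
      );
    }
  }
  if (locked === undefined) {
    // Assumption: a session with no model_change has no resumable identity.
    throw new SketchIdentityError("no model_change entry");
  }
  return locked;
}
```

A repeated model_change with the same provider and modelId passes in this sketch; only a differing one is treated as drift. The session-name mirror check described above is left out because the name format is not spelled out here.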

Changed (breaking — resident --entwurf-control session must be garden-native)

  • Every --entwurf-control session is now garden-native or it hard-exits (#28 / 0.9.0; operator session, not just Entwurf children). Garden identity closes over the operator's own session too: when --entwurf-control is enabled, the session header id MUST be a garden sessionId (YYYYMMDDTHHMMSS-[0-9a-f]{6}). pi mints a uuidv7 when the launcher did not pass --session-id, so a non-garden id means the session was not born through the garden launcher. entwurf-control refuses it at session_start and calls process.exit(1) before any model turn. No uuid / back-compat path. (A bare throw or ctx.shutdown() in a session_start handler is swallowed by pi's extension runner: a live run showed that the model turn still ran and leaked 26k tokens, so the guard hard-exits.)
    • Garden launcher. Launch resident sessions through the launcher so the id is injected up front: pi --session-id "$(<repo>/run.sh new-session-id)" --entwurf-control …. run.sh new-session-id prints one fresh garden sessionId from the generateSessionId SSOT (no shell-side format duplication). See README §Garden launcher.
    • /gnew (/garden-new) starts a fresh garden session in the same terminal. Builtin /new remains blocked because pi's ctx.newSession() mints a uuid before any extension can re-stamp it. /gnew uses the safe path instead: a fail-closed writer pre-creates a valid garden session JSONL header, then the command calls ctx.switchSession(file), whose SessionManager.open() reads that garden id before session_start. Header, control socket, backend stream sessionId, and MCP-child PI_SESSION_ID therefore all bind to the new garden id with no torn uuid moment. If the operator quits before the first turn, the empty session remains visible with message count 0; it is a legitimate resident session, not an orphan.
    • Status label is the screwdriver 🪛, not the word "entwurf". The resident status reads 🪛 ready before the first assistant turn (session file not yet on disk — model still changeable) and 🪛 <gardenId> after (file written = model locked). The id's presence is the model-lock lifecycle signal. The status label is decoupled from the session-name tag (the word "entwurf" no longer appears in the status bar, so it can't be misread as "talking to an entwurf'd session").
    • Resident session name is lazy and tagged control, never entwurf. On the first turn (model now locked) entwurf-control sets a garden name via pi.setSessionName(buildGardenSessionName(...)) with the control tag and the cwd basename as title. buildGardenSessionName is registry-FREE (a native model like deepseek/deepseek-v4-pro that is not an Entwurf spawn target passes, where the child buildSessionName would throw) and FORBIDS the entwurf tag — so a resident session is never resumable as an Entwurf child via entwurf_resume (the entwurf tag is that resume marker).
    • Coverage: deterministic check-entwurf-session-identity (now 158 assertions) covers assertGardenNativeSessionId (uuid→throw / garden→pass), buildGardenSessionName (registry-free native model, entwurf tag forbidden, round-trip), computeResidentStatusLabel (🪛 ready / 🪛 id), the regression that a control session is NOT entwurf_resume-able, and the /gnew writer's fail-closed guarantees (wx, collision refusal, full read-back, guarded orphan cleanup). Live smoke-resident-garden-guard proves the negative (raw uuid → nonzero exit, no turn, no socket, 0 tokens), replacement safety (builtin /new / /clone cancelled, not hard-exit), /gnew 0-token E2E (new garden id, socket rebound, no uuid leak), and, opt-in, backend identity after /gnew (entwurf_self reports the new garden id).
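
The guard and the status-label rule above can be sketched as two small pure functions. The regular expression follows the YYYYMMDDTHHMMSS-[0-9a-f]{6} format quoted in these notes; the function names, the boolean `sessionFileOnDisk` parameter, and the choice to throw are assumptions for illustration. The real guard calls process.exit(1) at session_start, because a throw there is swallowed.

```typescript
// Garden sessionId format as quoted in the notes: YYYYMMDDTHHMMSS-[0-9a-f]{6}.
const GARDEN_ID = /^\d{8}T\d{6}-[0-9a-f]{6}$/;

// Hypothetical sketch of the uuid->throw / garden->pass check.
function sketchAssertGardenNative(headerId: string): void {
  if (!GARDEN_ID.test(headerId)) {
    throw new Error(`not a garden sessionId: ${headerId}`);
  }
}

// Hypothetical sketch of the resident status label:
// "ready" until the session file is written, then the id itself.
function sketchStatusLabel(gardenId: string, sessionFileOnDisk: boolean): string {
  return sessionFileOnDisk ? `🪛 ${gardenId}` : "🪛 ready";
}
```

Because pi writes the session file only at the first assistant turn-end, the switch from 🪛 ready to 🪛 <gardenId> in this sketch doubles as the model-lock signal described above.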

Changed (release-gate + test harness)

  • release-gate now runs the two garden-native identity gates first. smoke-session-id-name (Phase 3a — Pi --session-id/--name substrate through the bridge) and smoke-resident-garden-guard (Phase 3c — the resident --entwurf-control guard, NEGATIVE 0-token path) run before the Entwurf live gates so an identity-foundation break fails fast instead of surfacing as confusing downstream failures. Both take no project arg and are exempt from the scratch-isolation concern by construction: the substrate smoke runs every pi turn under its own os.tmpdir() agent dir + cwds (mkdtemp, cleaned up), and the guard's negative path writes no session file at all.
  • smoke-async-resume completion detection hardened against a lazy-persist false-negative. pi persists a parent session file only at the first assistant turn-end, and a slow orchestrator can still be mid-turn long after the resume child finished — so the previous single find_parent_session_file lookup at completion-check time could miss a parent JSONL that was about to appear, recording FAIL even though entwurf-async had already delivered+persisted the entwurf-complete (🏁) CustomMessage. The completion phase now re-resolves the parent file every tick and polls its persisted entwurf-complete count (tmux pane is the secondary fast-path channel); fail-closed is preserved (no detected completion → FAIL). Removed the now-unused wait_jsonl_count_gt helper. Test-harness only; no runtime behavior change. Product was already correct — verified by RESUME_OK in every resume child plus the persisted 🏁 in every parent across all three backends.
  • check-native-async exercises a LOCAL async spawn instead of a bogus remote host. The native async spawn smoke used host="__native_async_smoke_bogus__" to enter runEntwurfAsync cheaply, but the 0.9.0 remote-out-of-scope fail-fast (#11) now rejects any non-local host before runEntwurfAsync runs — so the bogus-host call no longer exercised the async path at all (and failed the gate). The smoke now spawns a local async entwurf, which both matches the 0.9.0 scope and actually drives runEntwurfAsync for the stale-explicitExtensions ReferenceError guard it exists to catch.
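
The smoke-async-resume repair (re-resolve the parent file on every tick, stay fail-closed) can be sketched as follows. The actual harness is not shown in these notes and appears to be shell-based, so this TypeScript version is only an illustration of the logic: `Probe`, `sketchDetectCompletion`, and the `baseline` / `maxTicks` parameters are hypothetical, and the sleep between ticks is omitted.

```typescript
// Hypothetical probes standing in for the harness's file lookup and JSONL count.
interface Probe {
  // Re-resolves the parent session file; undefined while pi has not persisted it yet.
  findParentFile: () => string | undefined;
  // Counts persisted entwurf-complete messages in that file.
  countCompletions: (file: string) => number;
}

// Returns true once the persisted completion count exceeds the baseline.
function sketchDetectCompletion(probe: Probe, baseline: number, maxTicks: number): boolean {
  for (let tick = 0; tick < maxTicks; tick++) {
    // Re-resolve every tick: a lazily persisted parent file may appear late.
    const file = probe.findParentFile();
    if (file !== undefined && probe.countCompletions(file) > baseline) {
      return true;
    }
    // A real harness would sleep between ticks here.
  }
  // Fail-closed: no detected completion means FAIL.
  return false;
}
```

The earlier false negative corresponds to calling `findParentFile` once and giving up when it returned nothing; moving the lookup inside the loop is the whole fix in this sketch.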

Verification

  • The authoritative /gnew-inclusive 0.9.0 release-gate is green and recorded in BASELINE.md: a cut-time pi-session ./run.sh release-gate run, 17 PASS / 0 FAIL / 0 SKIP (Gemini present, no --allow-skip-gemini), with the resident garden guard at 31/0 (negative + replacement + /gnew 0-token E2E + positive/T3) and check-entwurf-session-identity at 158 assertions. It supersedes the earlier pre-/gnew Claude Code sweep that /gnew had invalidated.
  • The async-resume repair was confirmed in isolation (6 PASS / 0 FAIL across Claude/Codex/Gemini + direct-stdio + external negative paths) before the full gate cycle; /gnew adds its own deterministic writer coverage plus live resident-guard smoke coverage.
  • Backend-axis note (Hard Rule #7): /gnew T3 backend identity was live-measured on the release-gate default Claude lane (claude-sonnet-4-6) only. Codex/Gemini /gnew T3 runs are carried forward in NEXT.md; the general runtime matrix still remains covered by smoke-all across all three backends.