fix: harden agentty session lifecycle by pinion05 · Pull Request #10 · pinion05/agentty

pinion05 · 2026-03-06T09:04:22Z

Summary

- Preserve CLI argv boundaries for `agentty start` and wait for PTY readiness before reporting success
- Serialize shared session state updates and tighten attach/kill session lifecycle checks
- Add regression coverage for argv preservation, kill timeout handling, state concurrency, and the updated Vim flow

Issues

Closes bug(start): preserve argv boundaries instead of re-shelling joined command #5
Closes bug(state): prevent session metadata loss across concurrent CLI invocations #6
Closes bug(start): report readiness only after PTY startup and stable session metadata #7
Closes bug(kill): do not report success before the session has actually exited #8
Closes bug(attach): validate target session before writing active pointer #9

Test Plan

bun run test

Summary by CodeRabbit

Release Notes

New Features
- Enhanced command-line argument parsing with improved preservation and display
- Added validation to ensure only running sessions can be attached
- Implemented timeout enforcement for session termination operations
Bug Fixes
- Prevented session state loss during concurrent invocations through state locking
Tests
- Extended test coverage for argument preservation and concurrent state management

qodo-code-review · 2026-03-06T09:04:37Z

Review Summary by Qodo

Harden agentty session lifecycle with argv preservation, PTY readiness checks, and state serialization

🐞 Bug fix ✨ Enhancement 🧪 Tests

Walkthroughs

Description

• Preserve CLI argv boundaries by splitting command into executable and args array
• Wait for PTY readiness before reporting session start success
• Serialize shared session state updates with file-based locking to prevent concurrent loss
• Validate attach targets and enforce kill timeout with explicit error handling
• Add regression tests for argv preservation, kill timeout, state concurrency, and Vim flow
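The argv-preservation bullet above can be illustrated with a minimal sketch. The helper names `splitCommand` and `quoteForDisplay` are hypothetical and are not taken from the PR (which names its helpers `parseStartOptions` and `formatCommandForDisplay`); the quoting rule shown is an assumption made for display purposes only.

```typescript
// Hypothetical helpers for illustration; the PR's real parseStartOptions and
// formatCommandForDisplay may behave differently.
interface ParsedCommand {
  command: string;
  args: string[];
  displayCommand: string;
}

// Quote a part for display only; the result is never re-parsed by a shell.
function quoteForDisplay(part: string): string {
  if (part === '') return "''";
  if (/^[A-Za-z0-9_\/.:=@%+,-]+$/.test(part)) return part;
  return `'${part.replace(/'/g, "'\\''")}'`;
}

// Keep argv as an array so boundaries survive all the way to spawn().
function splitCommand(argv: string[]): ParsedCommand {
  const command = argv[0];
  if (command === undefined) {
    throw new Error('start requires a command');
  }
  return {
    command,
    args: argv.slice(1),
    displayCommand: argv.map(quoteForDisplay).join(' '),
  };
}
```

The point of the design is that `command` and `args` are handed to the spawn call unchanged, while `displayCommand` exists only for session listings and logs, so an empty or space-containing argument is never merged or dropped by a second round of shell parsing.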

Diagram

flowchart LR
  CLI["CLI start command"]
  Parse["Parse argv into command + args"]
  Worker["Spawn worker process"]
  PTY["Wait for PTY readiness"]
  Lock["Acquire state lock"]
  Serialize["Serialize session state"]
  Attach["Validate attach target"]
  Kill["Enforce kill timeout"]
  
  CLI --> Parse
  Parse --> Worker
  Worker --> PTY
  PTY --> Lock
  Lock --> Serialize
  Serialize --> Attach
  Attach --> Kill

File Changes

1. src/index.ts ✨ Enhancement +19/-3

Split command into executable and args array

2. src/sessionRuntime.ts ✨ Enhancement +67/-11

Add PTY readiness handshake and kill timeout validation

3. src/state.ts ✨ Enhancement +93/-27

Add file-based locking for concurrent state mutations

4. src/resolveSession.ts 🐞 Bug fix +10/-1

Validate attach target is running before attaching

5. src/worker.ts ✨ Enhancement +20/-8

Support args array and display command in worker spec

6. tests/attach.test.ts 🧪 Tests +42/-3

Add tests for attach validation and error cases

7. tests/e2e.start-argv.test.ts 🧪 Tests +108/-0

New test for argv preservation across quoted and empty entries

8. tests/sessionRuntime.kill-timeout.test.ts 🧪 Tests +43/-0

New test for kill timeout error handling

9. tests/sessionRuntime.start.test.ts 🧪 Tests +4/-0

Add assertions for PTY pid vs worker pid separation

10. tests/state.concurrent.test.ts 🧪 Tests +52/-0

New test for concurrent session state preservation

11. tests/e2e.vim.test.ts 🧪 Tests +3/-11

Update vim test to use proper shell command syntax

12. docs/plans/2026-03-06-agentty-reliability-implementation.md 📝 Documentation +67/-0

Document reliability implementation plan and tasks

qodo-code-review · 2026-03-06T09:04:38Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. Orphan PTY on lock failure 🐞 Bug ⛯ Reliability

Description

If session persistence during worker startup fails (e.g., state lock timeout) or the parent SIGTERMs
the worker during the new readiness wait, the worker can exit before installing SIGTERM/SIGINT
handlers and without calling requestKill(), potentially leaving the PTY process running orphaned.

Code

src/sessionRuntime.ts[R272-285]

+  const readySession = await waitForSessionReady(sessionId, child.pid, () => workerExited);
+
+  if (!readySession) {
+    try {
+      process.kill(child.pid, 'SIGTERM');
+    } catch {
+      // ignore cleanup errors
+    }
+
+    await markSessionExited(sessionId, workerExitCode);
+
+    throw new Error(
+      `session worker failed to become ready within ${START_READY_TIMEOUT_MS}ms: ${socketPath}. Check worker log: ${logFilePath}`,
    );

Evidence

startSession now kills the detached worker when readiness isn’t observed; meanwhile the worker
spawns the PTY before awaiting persistRunning(), and only installs SIGTERM/SIGINT handlers after
that await. Because upsertSession is now guarded by a lock that can time out, persistRunning() can
reject and drive the worker into its top-level catch, which only cleans up the socket and exits
without killing the PTY.

src/sessionRuntime.ts[272-286]
src/worker.ts[330-395]
src/worker.ts[397-408]
src/worker.ts[88-101]
src/state.ts[79-111]
src/state.ts[174-190]
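One way to close the window described in the evidence above is to register PTY cleanup before the first `await` that can reject. The following is a minimal sketch under stated assumptions: `startWorker`, `KillablePty`, and the injected `onSignal` callback are hypothetical names, and the real worker in `src/worker.ts` is structured differently.

```typescript
// Hypothetical shapes; the real worker uses a PTY library and its own persistRunning().
interface KillablePty {
  kill(signal?: string): void;
}

async function startWorker(
  spawnPty: () => KillablePty,
  persistRunning: () => Promise<void>,
  onSignal: (handler: () => void) => void,
): Promise<KillablePty> {
  const pty = spawnPty();
  let killed = false;

  // Idempotent cleanup so signal handlers and the catch path can both call it.
  const killPty = (): void => {
    if (killed) return;
    killed = true;
    try {
      pty.kill('SIGTERM');
    } catch {
      // the PTY may already be gone
    }
  };

  // Install termination handling before the first await that can reject.
  onSignal(killPty);

  try {
    await persistRunning();
  } catch (error) {
    killPty(); // do not leave the PTY running if persistence fails
    throw error;
  }
  return pty;
}
```

In a real worker, `onSignal` would presumably wrap `process.on('SIGTERM', ...)` and `process.on('SIGINT', ...)`; it is injected here only so the ordering can be exercised without sending signals.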

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The worker can spawn a PTY and then fail/exit before it installs SIGTERM handlers or calls `requestKill()`. With the new state-lock timeouts and the new `startSession()` readiness timeout that SIGTERMs the worker, this can leave orphan PTY processes.

## Issue Context
- Worker spawns PTY before awaiting `persistRunning()`.
- State persistence now can fail due to `withStateLock()` timeout.
- Worker top-level catch exits without killing PTY.

## Fix Focus Areas
- src/worker.ts[330-408]
- src/state.ts[79-125]
- src/sessionRuntime.ts[272-286]


2. Lock stale ownership race 🐞 Bug ⛯ Reliability

Description

The state lock can be removed as “stale” based only on directory mtime, and releaseStateLock()
unconditionally deletes the lock path, which can delete another process’s lock and allow overlapping
writers.

Code

src/state.ts[R95-115]

+      try {
+        const lockStat = await stat(lockPath);
+
+        if (Date.now() - lockStat.mtimeMs > STATE_LOCK_STALE_MS) {
+          await rm(lockPath, { recursive: true, force: true });
+          continue;
+        }
+      } catch {
+        // ignore stale check failures and retry
+      }
+
+      await new Promise((resolve) => setTimeout(resolve, STATE_LOCK_RETRY_MS));
+    }
+  }
+
+  throw new Error(`Timed out waiting for state lock: ${lockPath}`);
+}
+
+async function releaseStateLock(): Promise<void> {
+  await rm(getStateLockPath(), { recursive: true, force: true });
+}

Evidence

The lock has no ownership token; if Process A holds the lock longer than STATE_LOCK_STALE_MS (or
gets paused), Process B can delete and reacquire it. When A later runs the finally-block,
releaseStateLock() rm()s the lock directory without verifying ownership, potentially deleting B’s
lock and breaking mutual exclusion.

src/state.ts[95-101]
src/state.ts[113-115]
src/state.ts[117-125]
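The ownership problem described above can be sketched with a token written inside the lock directory. This is an illustrative sketch, not the PR's code: `tryAcquire`, `release`, and the `owner` file name are assumptions, and a small check-then-remove window still remains inside `release`.

```typescript
import { randomUUID } from 'node:crypto';
import { mkdir, readFile, rm, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical owner-file name; not part of the PR.
const OWNER_FILE = 'owner';

// Try once to take the lock; returns an ownership token, or null if it is held.
async function tryAcquire(lockPath: string): Promise<string | null> {
  try {
    await mkdir(lockPath); // atomic: fails with EEXIST if the lock is held
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === 'EEXIST') return null;
    throw error;
  }
  const token = randomUUID();
  await writeFile(join(lockPath, OWNER_FILE), token, 'utf8');
  return token;
}

// Release only if the lock on disk still carries our token.
async function release(lockPath: string, token: string): Promise<boolean> {
  let current: string;
  try {
    current = await readFile(join(lockPath, OWNER_FILE), 'utf8');
  } catch {
    return false; // lock was evicted or never fully written
  }
  if (current !== token) return false; // another process owns it now
  await rm(lockPath, { recursive: true, force: true });
  return true;
}
```

With a token in place, a process whose lock was evicted as stale finds a different token at release time and leaves the new owner's lock alone. Stale detection would still need its own safeguard, since mtime alone cannot tell a slow holder from a dead one.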

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The directory-based lock can be evicted and then removed by a different owner because there is no ownership token and `releaseStateLock()` deletes the lock path unconditionally.

## Issue Context
This undermines the mutual exclusion guarantee the lock is supposed to provide and can reintroduce concurrent writer corruption.

## Fix Focus Areas
- src/state.ts[79-125]


gemini-code-assist · 2026-03-06T09:04:50Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and robustness of the agentty session management. It focuses on hardening the session lifecycle by ensuring CLI arguments are correctly handled, session startup is more resilient with PTY readiness checks, and concurrent state updates are properly serialized to prevent data loss. Additionally, session attachment and termination processes are made more stringent with explicit validation and timeout mechanisms, leading to a more stable and predictable user experience.

Highlights

- CLI Argument Preservation and PTY Readiness: The agentty start command now accurately preserves CLI argument boundaries, including quoted and empty entries. Session startup also waits for PTY readiness before reporting success, ensuring a more robust session initiation.
- Serialized Session State Updates: Shared session state updates are now serialized using a file-system based locking mechanism, preventing data loss and race conditions during concurrent CLI invocations. This tightens the attach and kill session lifecycle checks.
- Enhanced Session Lifecycle Checks: The attach command now validates that the target session is actively running, rejecting attempts to attach to non-existent or exited sessions. The kill command has improved semantics, explicitly failing if a session does not exit within a defined timeout.
- Comprehensive Regression Coverage: New regression tests have been added to cover argv preservation, kill timeout handling, state concurrency, and an updated Vim flow, ensuring the reliability improvements are well-tested.

Changelog

docs/plans/2026-03-06-agentty-reliability-implementation.md
- Added a detailed implementation plan outlining the goals, architecture, tech stack, and specific tasks for improving agentty reliability, covering argv preservation, PTY readiness, state serialization, and attach/kill semantics.
src/index.ts
- Modified the StartOptionResult interface to include args and displayCommand for better command representation.
- Introduced formatCommandForDisplay to properly format command parts for display, handling spaces and special characters.
- Updated parseStartOptions to extract command, args, and displayCommand from CLI arguments.
- Adjusted runCli to pass the new args and displayCommand to startSession.
src/resolveSession.ts
- Implemented ensureRunningSession to validate that a session exists and is in a 'running' state before proceeding.
- Integrated ensureRunningSession into attachSession to prevent attaching to non-existent or exited sessions.
src/sessionRuntime.ts
- Extended StartSessionInput to include args and displayCommand.
- Increased START_READY_TIMEOUT_MS from 1 second to 10 seconds for more robust PTY readiness waiting.
- Modified waitForExited to return a boolean indicating if the session exited within the timeout.
- Introduced waitForSessionReady to explicitly wait for the PTY process to become ready and update its PID.
- Updated startSession to pass args and displayCommand to the worker, and to use waitForSessionReady for PTY readiness.
- Enhanced killSession to throw an error if the session does not exit within the KILL_WAIT_TIMEOUT_MS.
src/state.ts
- Imported stat from node:fs/promises for file status checks.
- Defined new constants for state locking (STATE_LOCK_DIR, STATE_LOCK_TIMEOUT_MS, STATE_LOCK_RETRY_MS, STATE_LOCK_STALE_MS).
- Added getStateLockPath to determine the path for the state lock directory.
- Implemented acquireStateLock and releaseStateLock for managing a file-system based lock to serialize state updates.
- Introduced withStateLock as a utility function to wrap state-modifying operations with lock acquisition and release.
- Created writeSessionsUnlocked to handle the actual writing of session data, to be used within withStateLock.
- Wrapped writeSessions, upsertSession, and writeActiveSessionId with withStateLock to ensure atomic updates to shared state.
src/worker.ts
- Updated WorkerSpec interface to include args and displayCommand.
- Modified toSessionRecord to use displayCommand if available, otherwise fallback to command.
- Adjusted the ptyProcess spawn logic to conditionally use spawn(spec.command, spec.args, ...) when args are provided, otherwise falling back to spawn(shell, ['-lc', spec.command], ...).
tests/attach.test.ts
- Imported upsertSession and writeActiveSessionId for test setup.
- Added upsertSession calls before attachSession to ensure the session exists and is running for successful attachment tests.
- Updated resolveTargetSessionId tests to use writeActiveSessionId for setting the active session.
- Added new tests to verify that attachSession rejects nonexistent or exited sessions.
tests/e2e.start-argv.test.ts
- Added a new end-to-end test to verify that agentty start correctly preserves quoted and empty command-line arguments passed to the spawned process.
tests/e2e.vim.test.ts
- Modified the start command arguments for the Vim e2e test to use sh -lc for more robust command execution.
tests/sessionRuntime.kill-timeout.test.ts
- Added a new test to verify that killSession correctly rejects when a session does not exit within the specified timeout period.
tests/sessionRuntime.start.test.ts
- Updated startSession tests to assert that the session's pid (PTY process) is distinct from its workerPid (worker process) and that both are correctly recorded.
tests/state.concurrent.test.ts
- Added a new test to ensure that concurrent upsertSession calls correctly preserve all session records, demonstrating the effectiveness of the new state locking mechanism.
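The changelog entries for `waitForExited` and `killSession` describe a poll that reports whether the condition was met, and a kill that rejects when it was not. A minimal sketch of that pattern follows; `waitUntil` and `killWithTimeout` are hypothetical names, and the intervals are arbitrary rather than the PR's constants.

```typescript
// Hypothetical polling helper; names and defaults are illustrative only.
async function waitUntil(
  predicate: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await predicate()) return true; // condition met in time
    if (Date.now() >= deadline) return false; // report the timeout to the caller
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Send the kill request, then fail loudly if the exit is never observed.
async function killWithTimeout(
  sendKill: () => void,
  hasExited: () => boolean,
  timeoutMs: number,
): Promise<void> {
  sendKill();
  const exited = await waitUntil(hasExited, timeoutMs, 10);
  if (!exited) {
    throw new Error(`session did not exit within ${timeoutMs}ms`);
  }
}
```

Returning a boolean from the wait, rather than resolving silently, is what lets the caller distinguish "exited" from "gave up waiting" and avoid reporting success prematurely.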


coderabbitai · 2026-03-06T09:05:14Z

📝 Walkthrough

Walkthrough

Implements a reliability improvement plan for agentty by adding argument preservation, session readiness detection, attachment validation, and concurrent state management locking. Changes span command parsing, session initialization, worker invocation, state persistence, and comprehensive test coverage.

Changes

Cohort / File(s)	Summary
Planning & Documentation `docs/plans/2026-03-06-agentty-reliability-implementation.md`	Detailed reliability implementation plan outlining four tasks: preserving argv boundaries, preventing session state loss across concurrent invocations, tightening attach validation and kill semantics, and verifying the full test suite.
Command Parsing & CLI `src/index.ts`	Extended `StartOptionResult` interface with optional `args?: string[]` and new `displayCommand: string` fields. Added `formatCommandForDisplay` helper to quote command parts containing spaces. Updated start command to pass parsed args and display command to session startup.
Session Attachment Validation `src/resolveSession.ts`	Added `ensureRunningSession` helper that loads and validates sessions, ensuring only running sessions can be attached. Updated `attachSession` to enforce running-session requirement before allowing attachment.
Session Initialization & Readiness `src/sessionRuntime.ts`	Added args and displayCommand support to `StartSessionInput`. Introduced `waitForSessionReady` helper to detect worker readiness and handle early worker exit. Updated startup flow to wait for readiness before returning session. Enhanced timeout handling with configurable polling intervals and improved cleanup on failure. Updated `waitForExited` to return boolean and `killSession` to validate exit within timeout.
Concurrent State Management `src/state.ts`	Implemented file-based state locking mechanism via `withStateLock` to serialize critical mutations. Applied locking to `writeSessions`, `upsertSession`, and `writeActiveSessionId`. Added stale-lock detection and timeout error handling.
Worker Process Management `src/worker.ts`	Extended `WorkerSpec` interface with optional `args?: string[]` and `displayCommand?: string` fields. Updated spawn logic to use direct spawning with args when provided, otherwise shell invocation. Modified `toSessionRecord` to use `displayCommand` when available.
Attachment & State Tests `tests/attach.test.ts`	Added explicit state setup helpers (`upsertSession`, `writeActiveSessionId`). New tests verify error handling for nonexistent and exited sessions, ensuring failures clear active session state.
End-to-End Argument Tests `tests/e2e.start-argv.test.ts`	New e2e test suite validating argv preservation through session lifecycle. Tests spawn CLI with scripts containing quoted, empty, and literal quote arguments, polling output to verify correct argument reconstruction.
End-to-End Vim Test Simplification `tests/e2e.vim.test.ts`	Simplified Vim startup from multi-part shell sequence to single `sh -lc` invocation, delegating vim/vi fallback logic to shell execution.
Kill Timeout Tests `tests/sessionRuntime.kill-timeout.test.ts`	New test validating `killSession` timeout enforcement. Verifies rejection when session does not exit within 3000ms and confirms active session ID is not cleared on timeout.
Session Runtime & Worker PID Tests `tests/sessionRuntime.start.test.ts`	Extended session object with `workerPid: number` field. Added assertions verifying `workerPid` is present, numeric, and distinct from `pid` in both returned and persisted session records.
Concurrent State Mutation Tests `tests/state.concurrent.test.ts`	New concurrency test spawning 24 parallel `upsertSession` calls. Validates all records persist without loss and confirms state locking prevents data corruption under concurrent mutation.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (start)
    participant Index as index.ts
    participant SessionRT as sessionRuntime.ts
    participant Worker as worker.ts
    participant IPC as IPC Manager
    participant State as state.ts

    CLI->>Index: start command with args
    Index->>Index: parseStartOptions()
    Note over Index: Extract command, args,<br/>displayCommand
    Index->>SessionRT: startSession({command,<br/>args, displayCommand})
    SessionRT->>State: withStateLock()
    State->>SessionRT: Locked access acquired
    SessionRT->>Worker: Spawn with args or shell
    Worker->>IPC: Initialize socket
    IPC-->>Worker: Socket path
    Worker-->>SessionRT: worker spawned
    SessionRT->>SessionRT: waitForSessionReady()
    Note over SessionRT: Poll for worker pid match<br/>& socket readiness
    loop Until ready or timeout
        SessionRT->>IPC: Check socket readiness
        IPC-->>SessionRT: Ready/not ready
        SessionRT->>SessionRT: pid === workerPid?
    end
    SessionRT->>State: upsertSession(ready session)
    State-->>SessionRT: Locked write complete
    SessionRT-->>CLI: Return session with<br/>workerPid & pid

sequenceDiagram
    participant CLI as CLI (attach)
    participant Resolve as resolveSession.ts
    participant State as state.ts
    participant IPC as IPC Manager

    CLI->>Resolve: attachSession(sessionId)
    Resolve->>State: withStateLock()
    State->>Resolve: Locked access acquired
    Resolve->>Resolve: ensureRunningSession(id)
    Resolve->>State: readSessionById(id)
    State-->>Resolve: session object
    alt session not found or<br/>status !== 'running'
        Resolve-->>CLI: Error (invalid session)
        Resolve->>State: releaseStateLock()
    else session exists & running
        Resolve->>IPC: writeActiveSessionId(id)
        IPC-->>Resolve: Active ID set
        Resolve->>State: releaseStateLock()
        Resolve-->>CLI: Success
    end
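The attach flow in the diagram above amounts to "validate first, write the active pointer second". A minimal sketch under assumptions: `SessionStore`, `SessionRecord`, and `attach` are hypothetical stand-ins, not the PR's `ensureRunningSession` or `attachSession`.

```typescript
// Hypothetical record shape and store; the PR's real types live in src/state.ts.
type SessionStatus = 'running' | 'exited';

interface SessionRecord {
  id: string;
  status: SessionStatus;
}

interface SessionStore {
  read(id: string): Promise<SessionRecord | undefined>;
  writeActive(id: string): Promise<void>;
}

async function attach(store: SessionStore, id: string): Promise<SessionRecord> {
  const session = await store.read(id);
  if (!session) {
    throw new Error(`session not found: ${id}`);
  }
  if (session.status !== 'running') {
    throw new Error(`session is not running: ${id} (${session.status})`);
  }
  await store.writeActive(id); // only reached for a validated target
  return session;
}
```

Because the write happens after both checks, a failed attach leaves the active pointer untouched.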

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop, hop, the args now parse with care,
Each token kept, no merged despair,
Locks guard the state through concurrent dance,
Readiness awaits before we prance,
Sessions validated, workers synchronized—
Agentty's heart, now stabilized! 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: harden agentty session lifecycle' directly addresses the main objective: hardening session lifecycle management through argv preservation, PTY readiness checks, state serialization, and tighter validation.


gemini-code-assist

Code Review

This pull request significantly hardens the agentty session lifecycle by introducing file-based locking to prevent race conditions, improving command-line argument handling for agentty start, ensuring session startup waits for the PTY, and making attach and kill commands more robust. While these changes are well-supported by new and updated tests, the implementation of the state lock and temporary file handling in src/state.ts introduces potential security risks. The use of predictable temporary file names and lack of restrictive permissions on the lock directory could lead to local symbolic link attacks or denial-of-service, particularly when used in shared or world-writable directories (e.g., when AGENTTY_HOME is set to /tmp). Addressing these by using secure temporary file creation and enforcing strict directory permissions will further harden the application's security posture. Additionally, there is one suggestion to reduce code duplication.

gemini-code-assist · 2026-03-06T09:07:57Z

+  const tempPath = `${sessionsPath}.${process.pid}.${Date.now()}.tmp`;
+  const serialized = JSON.stringify(sessions, null, 2);
+
+  await writeFile(tempPath, serialized, 'utf8');
+  await rename(tempPath, sessionsPath);


The use of a predictable temporary file name (`${sessionsPath}.${process.pid}.${Date.now()}.tmp`) is vulnerable to symbolic link attacks. If the state directory (e.g., via AGENTTY_HOME) is set to a world-writable location like /tmp, a local attacker can predict the temporary file name and create a symbolic link at that path pointing to a sensitive file. When agentty attempts to write the sessions file, it will follow the link and overwrite the target file with the user's privileges. It is recommended to use a secure temporary file creation method that ensures a unique, non-predictable name and restrictive permissions (e.g., using fs.mkdtemp or a dedicated library).
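One possible shape for the recommendation above is an exclusive-create temp file with a random suffix. This is a sketch, not the PR's code: `writeFileAtomic` is a hypothetical name, and it uses exclusive creation instead of `fs.mkdtemp` so that the temp file stays in the same directory as the rename target.

```typescript
import { randomBytes } from 'node:crypto';
import { open, rename, rm } from 'node:fs/promises';

// Hypothetical replacement for the temp-file write; not the PR's code.
async function writeFileAtomic(targetPath: string, contents: string): Promise<void> {
  const tempPath = `${targetPath}.${randomBytes(8).toString('hex')}.tmp`;
  // 'wx' maps to O_CREAT | O_EXCL: creation fails if anything, including a
  // symlink, already exists at tempPath. Mode 0o600 keeps it owner-only.
  const handle = await open(tempPath, 'wx', 0o600);
  try {
    try {
      await handle.writeFile(contents, 'utf8');
    } finally {
      await handle.close();
    }
    await rename(tempPath, targetPath); // atomic replace on the same filesystem
  } catch (error) {
    await rm(tempPath, { force: true }); // best-effort cleanup of the temp file
    throw error;
  }
}
```

The random suffix makes the name hard to predict, and the exclusive flag means that even a correctly guessed name cannot redirect the write through a pre-planted link.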

gemini-code-assist · 2026-03-06T09:07:57Z

+      await mkdir(lockPath);
+      return;


The lock directory is created without ensuring restrictive permissions. If the state directory is located in a shared or world-writable location (like /tmp), a local attacker could pre-create the lock directory with loose permissions or use it to perform a denial-of-service attack by preventing the user from acquiring the lock. Additionally, the stale lock removal logic (lines 98-101) contains a time-of-check to time-of-use (TOCTOU) race condition where a fresh lock from another process could be deleted if it is misidentified as stale. It is recommended to create the lock directory with restrictive permissions (e.g., 0700) and verify its ownership and permissions if it already exists.
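The permission recommendation above could be applied to the directory that contains the lock, so that other users cannot pre-create or remove entries in it. A minimal sketch, assuming a POSIX system; `ensurePrivateDir` is a hypothetical helper and does not appear in the PR.

```typescript
import { lstat, mkdir } from 'node:fs/promises';

// Hypothetical helper: create the state directory owner-only, then verify it.
async function ensurePrivateDir(dirPath: string): Promise<void> {
  // Mode applies only when the directory is created; umask can only narrow it.
  await mkdir(dirPath, { recursive: true, mode: 0o700 });

  // lstat does not follow symlinks, so a planted link is rejected below.
  const info = await lstat(dirPath);
  if (!info.isDirectory()) {
    throw new Error(`not a directory: ${dirPath}`);
  }

  // process.getuid is not available on Windows; skip the POSIX checks there.
  const uid = typeof process.getuid === 'function' ? process.getuid() : undefined;
  if (uid === undefined) return;

  if (info.uid !== uid) {
    throw new Error(`state directory is owned by another user: ${dirPath}`);
  }
  if ((info.mode & 0o077) !== 0) {
    throw new Error(`state directory is group/world accessible: ${dirPath}`);
  }
}
```

Calling this once before acquiring the lock would reject a pre-existing directory with loose permissions instead of silently reusing it. It does not address the stale-lock TOCTOU race mentioned in the same comment.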

gemini-code-assist · 2026-03-06T09:07:57Z

+  ptyProcess = Array.isArray(spec.args)
+    ? spawn(spec.command, spec.args, {
+        cwd: spec.cwd,
+        env: process.env,
+        name: 'xterm-256color',
+        cols: 80,
+        rows: 24,
+      })
+    : spawn(shell, ['-lc', spec.command], {
+        cwd: spec.cwd,
+        env: process.env,
+        name: 'xterm-256color',
+        cols: 80,
+        rows: 24,
+      });


To improve readability and reduce code duplication, you could extract the common pty options into a separate constant. This makes the code more maintainable, as any future changes to the options would only need to be made in one place.

const ptyOptions = {
  cwd: spec.cwd,
  env: process.env,
  name: 'xterm-256color',
  cols: 80,
  rows: 24,
};

ptyProcess = Array.isArray(spec.args)
  ? spawn(spec.command, spec.args, ptyOptions)
  : spawn(shell, ['-lc', spec.command], ptyOptions);

qodo-code-review · 2026-03-06T09:09:56Z

+  const readySession = await waitForSessionReady(sessionId, child.pid, () => workerExited);
+
+  if (!readySession) {
+    try {
+      process.kill(child.pid, 'SIGTERM');
+    } catch {
+      // ignore cleanup errors
+    }
+
+    await markSessionExited(sessionId, workerExitCode);
+
+    throw new Error(
+      `session worker failed to become ready within ${START_READY_TIMEOUT_MS}ms: ${socketPath}. Check worker log: ${logFilePath}`,
    );


1. Orphan pty on lock failure 🐞 Bug ⛯ Reliability

If session persistence during worker startup fails (e.g., state lock timeout) or the parent SIGTERMs the worker during the new readiness wait, the worker can exit before installing SIGTERM/SIGINT handlers and without calling requestKill(), potentially leaving the PTY process running orphaned.

Agent Prompt

## Issue description
The worker can spawn a PTY and then fail/exit before it installs SIGTERM handlers or calls `requestKill()`. With the new state-lock timeouts and the new `startSession()` readiness timeout that SIGTERMs the worker, this can leave orphan PTY processes.

## Issue Context
- Worker spawns PTY before awaiting `persistRunning()`.
- State persistence now can fail due to `withStateLock()` timeout.
- Worker top-level catch exits without killing PTY.

## Fix Focus Areas
- src/worker.ts[330-408]
- src/state.ts[79-125]
- src/sessionRuntime.ts[272-286]


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/e2e.start-argv.test.ts (1)

38-70: Keep the polling helper’s per-call timeout below its overall deadline.

waitForSnapshotContains() says it times out after 8s, but each runCommand() call can block for 10s. A single slow get can therefore overshoot the advertised deadline and make failures much harder to diagnose.

♻️ Proposed fix

-  async function runCommand(args: string[]): Promise<CommandResult> {
+  async function runCommand(args: string[], timeout = 10_000): Promise<CommandResult> {
     const result = await execa('node', ['dist/index.js', ...args], {
       env: {
         ...process.env,
         AGENTTY_HOME: tempHome,
       },
       reject: false,
-      timeout: 10_000,
+      timeout,
     });
@@
   async function waitForSnapshotContains(sessionId: string, needle: string): Promise<string> {
     const timeoutMs = 8_000;
     const intervalMs = 50;
     const startedAt = Date.now();
 
     while (Date.now() - startedAt < timeoutMs) {
-      const result = await runCommand(['get', '--session', sessionId, '--lines', '200']);
+      const elapsedMs = Date.now() - startedAt;
+      const remainingMs = timeoutMs - elapsedMs;
+      const result = await runCommand(
+        ['get', '--session', sessionId, '--lines', '200'],
+        Math.min(remainingMs, 1_000),
+      );
 
       if (result.exitCode === 0 && result.stdout.includes(needle)) {
         return result.stdout;
       }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/e2e.start-argv.test.ts` around lines 38 - 70, The polling helper
waitForSnapshotContains has an overall timeoutMs = 8_000 but calls runCommand
which currently uses a hardcoded execa timeout of 10_000; change runCommand to
accept an optional perCallTimeout parameter (or use a default shorter timeout)
and use that value for the execa timeout option, then call runCommand from
waitForSnapshotContains with a per-call timeout safely smaller than timeoutMs
(for example Math.min(2000, timeoutMs - elapsed - 100) or a fixed 2000ms) so no
single runCommand invocation can block past the polling deadline; update the
runCommand signature and the single call sites (notably the call inside
waitForSnapshotContains) to pass the new timeout.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/plans/2026-03-06-agentty-reliability-implementation.md`:
- Line 13: The document jumps heading levels from H1 to H3; change the "### Task
1: Preserve argv boundaries and delay `start` success until PTY readiness"
heading to H2 (i.e., "## Task 1...") to restore proper incremental heading
levels, and apply the same change to the headings for Task 2, Task 3, and Task 4
(use the exact heading texts "Task 2:", "Task 3:", "Task 4:" in the file) so
each task heading is H2 and the outline increments by one level at a time.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4430c00e-773d-4ea4-bc96-620bdb5e34fb

📥 Commits

Reviewing files that changed from the base of the PR and between 447d842 and 0511ef4.

📒 Files selected for processing (12)

docs/plans/2026-03-06-agentty-reliability-implementation.md
src/index.ts
src/resolveSession.ts
src/sessionRuntime.ts
src/state.ts
src/worker.ts
tests/attach.test.ts
tests/e2e.start-argv.test.ts
tests/e2e.vim.test.ts
tests/sessionRuntime.kill-timeout.test.ts
tests/sessionRuntime.start.test.ts
tests/state.concurrent.test.ts

coderabbitai · 2026-03-06T09:11:09Z

+
+---
+
+### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness


⚠️ Potential issue | 🟡 Minor

Fix heading level increment.

The static analysis tool flagged that heading levels should increment by one level at a time. The document jumps from # (h1) to ### (h3), skipping ## (h2).

📝 Proposed fix

-### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness
+## Task 1: Preserve argv boundaries and delay `start` success until PTY readiness

Apply the same change to Tasks 2, 3, and 4 (lines 31, 43, 59).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness
+## Task 1: Preserve argv boundaries and delay `start` success until PTY readiness

🧰 Tools

🪛 markdownlint-cli2 (0.21.0)

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)
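
For reference, a simplified sketch of the heading-increment check, assuming ATX-style (`#`) headings only. This is not markdownlint's implementation; `findHeadingJumps` is a hypothetical helper that reports the 1-based line numbers of headings that skip a level.

```typescript
// Returns 1-based line numbers of ATX headings that jump more than one level deeper.
function findHeadingJumps(markdown: string): number[] {
  const jumps: number[] = [];
  let previousLevel = 0; // 0 means no heading seen yet
  let inFence = false;
  markdown.split("\n").forEach((line, index) => {
    // Ignore '#' lines inside fenced code blocks (naive fence toggle).
    if (/^\s{0,3}(```|~~~)/.test(line)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return;
    const match = /^ {0,3}(#{1,6})(\s|$)/.exec(line);
    if (!match) return;
    const level = (match[1] ?? "").length;
    // The first heading is never flagged; later ones may go at most one level deeper.
    if (previousLevel > 0 && level > previousLevel + 1) jumps.push(index + 1);
    previousLevel = level;
  });
  return jumps;
}

// An h1 followed directly by an h3 is reported at the h3's line.
console.log(findHeadingJumps("# Title\n\n### Task 1\n\n### Task 2")); // → [ 3 ]
```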

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/plans/2026-03-06-agentty-reliability-implementation.md` at line 13, The document jumps heading levels from H1 to H3; change the "### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness" heading to H2 (i.e., "## Task 1...") to restore proper incremental heading levels, and apply the same change to the headings for Task 2, Task 3, and Task 4 (use the exact heading texts "Task 2:", "Task 3:", "Task 4:" in the file) so each task heading is H2 and the outline increments by one level at a time.

fix: tighten agentty session lifecycle contracts

0511ef4

