
fix: harden agentty session lifecycle#10

Open
pinion05 wants to merge 1 commit into main from fix-agentty-session-reliability

Conversation

@pinion05
Owner

@pinion05 pinion05 commented Mar 6, 2026

Summary

  • preserve CLI argv boundaries for agentty start and wait for PTY readiness before reporting success
  • serialize shared session state updates and tighten attach/kill session lifecycle checks
  • add regression coverage for argv preservation, kill timeout handling, state concurrency, and the updated Vim flow
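The first bullet can be illustrated with a minimal sketch. It assumes the CLI already holds the command as an array of argv entries; the helper names `splitStartArgv` and `quoteForDisplay` and the quoting rule are hypothetical and not taken from this PR. The point is that the array is never joined and re-split, so empty and quoted entries survive, while a separate display string is built only for humans.

```typescript
// Hypothetical helpers: the names and the quoting rule are assumptions,
// not the PR's actual implementation.
interface StartCommand {
  command: string;        // executable, handed to the spawner unchanged
  args: string[];         // remaining argv entries with boundaries intact
  displayCommand: string; // human-readable form, used only for listings
}

// Quote a part for display when it is empty or contains anything other
// than a conservative set of "plain" characters.
function quoteForDisplay(part: string): string {
  if (/^[A-Za-z0-9_\/.:=@%+-]+$/.test(part)) {
    return part;
  }
  // POSIX-style single quoting: close the quote, emit \', reopen.
  return `'${part.replace(/'/g, `'\\''`)}'`;
}

function splitStartArgv(argv: string[]): StartCommand {
  const [command, ...args] = argv;
  if (command === undefined) {
    throw new Error("start requires a command");
  }
  return { command, args, displayCommand: argv.map(quoteForDisplay).join(" ") };
}

const parsed = splitStartArgv(["echo", "two words", "", "it's"]);
console.log(parsed.args.length);    // 3: the empty argument is preserved
console.log(parsed.displayCommand); // echo 'two words' '' 'it'\''s'
```

The display string is lossy by design and should not be fed back into a shell; only `command` and `args` are meant for spawning.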

Issues

Test Plan

  • bun run test

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced command-line argument parsing with improved preservation and display
    • Added validation to ensure only running sessions can be attached
    • Implemented timeout enforcement for session termination operations
  • Bug Fixes

    • Prevented session state loss during concurrent invocations through state locking
  • Tests

    • Extended test coverage for argument preservation and concurrent state management

@qodo-code-review

Review Summary by Qodo

Harden agentty session lifecycle with argv preservation, PTY readiness checks, and state serialization

🐞 Bug fix ✨ Enhancement 🧪 Tests


Walkthroughs

Description
• Preserve CLI argv boundaries by splitting command into executable and args array
• Wait for PTY readiness before reporting session start success
• Serialize shared session state updates with file-based locking to prevent concurrent loss
• Validate attach targets and enforce kill timeout with explicit error handling
• Add regression tests for argv preservation, kill timeout, state concurrency, and Vim flow
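Both the readiness wait and the kill timeout in this description come down to polling a condition against a deadline and reporting whether it was met. A minimal sketch of that pattern follows; `waitUntil` and `killWithTimeout` are illustrative names, and the callbacks stand in for the real signal delivery and state lookup, which are not shown in this PR page.

```typescript
// Poll a predicate until it returns true or the deadline passes.
// Returns true on success and false on timeout, so callers can decide
// whether a timeout is an error.
async function waitUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs = 25,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical kill flow: send the signal, then fail loudly if the
// session has not exited by the deadline instead of reporting success.
async function killWithTimeout(
  sendSignal: () => void,
  hasExited: () => boolean | Promise<boolean>,
  timeoutMs: number,
): Promise<void> {
  sendSignal();
  const exited = await waitUntil(hasExited, timeoutMs);
  if (!exited) {
    throw new Error(`session did not exit within ${timeoutMs}ms`);
  }
}
```

Returning a boolean from the wait helper keeps the policy (throw, retry, escalate to a stronger signal) in the caller.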
Diagram
flowchart LR
  CLI["CLI start command"]
  Parse["Parse argv into command + args"]
  Worker["Spawn worker process"]
  PTY["Wait for PTY readiness"]
  Lock["Acquire state lock"]
  Serialize["Serialize session state"]
  Attach["Validate attach target"]
  Kill["Enforce kill timeout"]
  
  CLI --> Parse
  Parse --> Worker
  Worker --> PTY
  PTY --> Lock
  Lock --> Serialize
  Serialize --> Attach
  Attach --> Kill

File Changes

1. src/index.ts ✨ Enhancement +19/-3
   Split command into executable and args array

2. src/sessionRuntime.ts ✨ Enhancement +67/-11
   Add PTY readiness handshake and kill timeout validation

3. src/state.ts ✨ Enhancement +93/-27
   Add file-based locking for concurrent state mutations

4. src/resolveSession.ts 🐞 Bug fix +10/-1
   Validate attach target is running before attaching

5. src/worker.ts ✨ Enhancement +20/-8
   Support args array and display command in worker spec

6. tests/attach.test.ts 🧪 Tests +42/-3
   Add tests for attach validation and error cases

7. tests/e2e.start-argv.test.ts 🧪 Tests +108/-0
   New test for argv preservation across quoted and empty entries

8. tests/sessionRuntime.kill-timeout.test.ts 🧪 Tests +43/-0
   New test for kill timeout error handling

9. tests/sessionRuntime.start.test.ts 🧪 Tests +4/-0
   Add assertions for PTY pid vs worker pid separation

10. tests/state.concurrent.test.ts 🧪 Tests +52/-0
    New test for concurrent session state preservation

11. tests/e2e.vim.test.ts 🧪 Tests +3/-11
    Update vim test to use proper shell command syntax

12. docs/plans/2026-03-06-agentty-reliability-implementation.md 📝 Documentation +67/-0
    Document reliability implementation plan and tasks



@qodo-code-review

qodo-code-review Bot commented Mar 6, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Orphan PTY on lock failure 🐞 Bug ⛯ Reliability
Description
If session persistence during worker startup fails (e.g., state lock timeout) or the parent SIGTERMs
the worker during the new readiness wait, the worker can exit before installing SIGTERM/SIGINT
handlers and without calling requestKill(), potentially leaving the PTY process running orphaned.
Code

src/sessionRuntime.ts[R272-285]

+  const readySession = await waitForSessionReady(sessionId, child.pid, () => workerExited);
+
+  if (!readySession) {
+    try {
+      process.kill(child.pid, 'SIGTERM');
+    } catch {
+      // ignore cleanup errors
+    }
+
+    await markSessionExited(sessionId, workerExitCode);
+
+    throw new Error(
+      `session worker failed to become ready within ${START_READY_TIMEOUT_MS}ms: ${socketPath}. Check worker log: ${logFilePath}`,
    );
Evidence
startSession now kills the detached worker when readiness isn’t observed; meanwhile the worker
spawns the PTY before awaiting persistRunning(), and only installs SIGTERM/SIGINT handlers after
that await. Because upsertSession is now guarded by a lock that can time out, persistRunning() can
reject and drive the worker into its top-level catch, which only cleans up the socket and exits
without killing the PTY.

src/sessionRuntime.ts[272-286]
src/worker.ts[330-395]
src/worker.ts[397-408]
src/worker.ts[88-101]
src/state.ts[79-111]
src/state.ts[174-190]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The worker can spawn a PTY and then fail/exit before it installs SIGTERM handlers or calls `requestKill()`. With the new state-lock timeouts and the new `startSession()` readiness timeout that SIGTERMs the worker, this can leave orphan PTY processes.

## Issue Context
- Worker spawns PTY before awaiting `persistRunning()`.
- State persistence now can fail due to `withStateLock()` timeout.
- Worker top-level catch exits without killing PTY.

## Fix Focus Areas
- src/worker.ts[330-408]
- src/state.ts[79-125]
- src/sessionRuntime.ts[272-286]
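One possible shape for the remediation, as a sketch only: register termination handling before the first `await`, and tear the PTY down in the failure path of any fallible startup step. The `KillablePty` interface and the `startWorker` wrapper are stand-ins invented for illustration; the real worker's PTY handle, `persistRunning()`, and `requestKill()` signatures are not visible here and may differ.

```typescript
// Minimal stand-in for the PTY handle; the real worker uses a PTY library.
interface KillablePty {
  pid: number;
  kill(signal?: string): void;
}

// Hypothetical startup wrapper. The injected callbacks make the ordering
// explicit: spawn, install signal handling, then run the fallible step.
async function startWorker(
  spawnPty: () => KillablePty,
  persistRunning: (pid: number) => Promise<void>,
  onTerminate: (handler: () => void) => void,
): Promise<KillablePty> {
  const pty = spawnPty();
  // Installed before the first await, so a SIGTERM that arrives while
  // persistence is still pending can reach the PTY.
  onTerminate(() => pty.kill("SIGTERM"));
  try {
    await persistRunning(pty.pid);
    return pty;
  } catch (error) {
    // Persistence failed (for example, a state lock timeout): do not
    // leave the PTY running without an owner.
    pty.kill("SIGTERM");
    throw error;
  }
}
```

In a real worker, `onTerminate` would typically wrap `process.on('SIGTERM', ...)` and `process.on('SIGINT', ...)`.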




Remediation recommended

2. Lock stale ownership race 🐞 Bug ⛯ Reliability
Description
The state lock can be removed as “stale” based only on directory mtime, and releaseStateLock()
unconditionally deletes the lock path, which can delete another process’s lock and allow overlapping
writers.
Code

src/state.ts[R95-115]

+      try {
+        const lockStat = await stat(lockPath);
+
+        if (Date.now() - lockStat.mtimeMs > STATE_LOCK_STALE_MS) {
+          await rm(lockPath, { recursive: true, force: true });
+          continue;
+        }
+      } catch {
+        // ignore stale check failures and retry
+      }
+
+      await new Promise((resolve) => setTimeout(resolve, STATE_LOCK_RETRY_MS));
+    }
+  }
+
+  throw new Error(`Timed out waiting for state lock: ${lockPath}`);
+}
+
+async function releaseStateLock(): Promise<void> {
+  await rm(getStateLockPath(), { recursive: true, force: true });
+}
Evidence
The lock has no ownership token; if Process A holds the lock longer than STATE_LOCK_STALE_MS (or
gets paused), Process B can delete and reacquire it. When A later runs the finally-block,
releaseStateLock() rm()s the lock directory without verifying ownership, potentially deleting B’s
lock and breaking mutual exclusion.

src/state.ts[95-101]
src/state.ts[113-115]
src/state.ts[117-125]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The directory-based lock can be evicted and then removed by a different owner because there is no ownership token and `releaseStateLock()` deletes the lock path unconditionally.

## Issue Context
This undermines the mutual exclusion guarantee the lock is supposed to provide and can reintroduce concurrent writer corruption.

## Fix Focus Areas
- src/state.ts[79-125]
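A sketch of the ownership-token idea, assuming the lock stays a directory created with `mkdir`: the holder writes a random token into the lock and releases only if that token is still present. The file name `owner` and the function names are hypothetical. This narrows the race described above but does not remove it entirely, since the compare-and-remove in `release` is still two separate filesystem operations.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Try to take the lock once. Returns an ownership token, or null if the
// lock directory already exists.
function tryAcquire(lockPath: string): string | null {
  try {
    mkdirSync(lockPath); // fails with EEXIST while someone holds the lock
  } catch (error) {
    if ((error as { code?: string }).code === "EEXIST") return null;
    throw error;
  }
  const token = randomUUID();
  writeFileSync(join(lockPath, "owner"), token, "utf8");
  return token;
}

// Release only if the lock on disk still carries our token.
function release(lockPath: string, token: string): boolean {
  let current: string;
  try {
    current = readFileSync(join(lockPath, "owner"), "utf8");
  } catch {
    return false; // lock vanished or has no owner file yet
  }
  if (current !== token) return false; // someone else owns it now
  rmSync(lockPath, { recursive: true, force: true });
  return true;
}

// Demo: process A is evicted as "stale", B reacquires, A's late release
// must not delete B's lock.
const demoDir = mkdtempSync(join(tmpdir(), "agentty-owner-demo-"));
const demoLock = join(demoDir, "state.lock");
const tokenA = tryAcquire(demoLock) as string;
rmSync(demoLock, { recursive: true, force: true }); // stale eviction by B
const tokenB = tryAcquire(demoLock) as string;
console.log(release(demoLock, tokenA)); // false
console.log(release(demoLock, tokenB)); // true
```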


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and robustness of the agentty session management. It focuses on hardening the session lifecycle by ensuring CLI arguments are correctly handled, session startup is more resilient with PTY readiness checks, and concurrent state updates are properly serialized to prevent data loss. Additionally, session attachment and termination processes are made more stringent with explicit validation and timeout mechanisms, leading to a more stable and predictable user experience.

Highlights

  • CLI Argument Preservation and PTY Readiness: The agentty start command now accurately preserves CLI argument boundaries, including quoted and empty entries. Session startup also waits for PTY readiness before reporting success, ensuring a more robust session initiation.
  • Serialized Session State Updates: Shared session state updates are now serialized using a file-system based locking mechanism, preventing data loss and race conditions during concurrent CLI invocations. This tightens the attach and kill session lifecycle checks.
  • Enhanced Session Lifecycle Checks: The attach command now validates that the target session is actively running, rejecting attempts to attach to non-existent or exited sessions. The kill command has improved semantics, explicitly failing if a session does not exit within a defined timeout.
  • Comprehensive Regression Coverage: New regression tests have been added to cover argv preservation, kill timeout handling, state concurrency, and an updated Vim flow, ensuring the reliability improvements are well-tested.
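The attach validation in the third highlight can be sketched as a guard that runs before the active session is changed. The record shape below is an assumption (only the `'running'` status and the existence check are described on this page), and `ensureRunningSession` here takes the session list as a parameter purely to keep the example self-contained.

```typescript
// Assumed record shape; the real session record has more fields.
type SessionStatus = "running" | "exited";

interface SessionRecord {
  id: string;
  status: SessionStatus;
}

// Return the session only if it exists and is still running; otherwise
// throw an error that names the reason.
function ensureRunningSession(sessions: SessionRecord[], id: string): SessionRecord {
  const session = sessions.find((candidate) => candidate.id === id);
  if (!session) {
    throw new Error(`session not found: ${id}`);
  }
  if (session.status !== "running") {
    throw new Error(`session is not running: ${id} (status: ${session.status})`);
  }
  return session;
}

const knownSessions: SessionRecord[] = [
  { id: "alpha", status: "running" },
  { id: "beta", status: "exited" },
];
console.log(ensureRunningSession(knownSessions, "alpha").id); // alpha
```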
Changelog
  • docs/plans/2026-03-06-agentty-reliability-implementation.md
    • Added a detailed implementation plan outlining the goals, architecture, tech stack, and specific tasks for improving agentty reliability, covering argv preservation, PTY readiness, state serialization, and attach/kill semantics.
  • src/index.ts
    • Modified the StartOptionResult interface to include args and displayCommand for better command representation.
    • Introduced formatCommandForDisplay to properly format command parts for display, handling spaces and special characters.
    • Updated parseStartOptions to extract command, args, and displayCommand from CLI arguments.
    • Adjusted runCli to pass the new args and displayCommand to startSession.
  • src/resolveSession.ts
    • Implemented ensureRunningSession to validate that a session exists and is in a 'running' state before proceeding.
    • Integrated ensureRunningSession into attachSession to prevent attaching to non-existent or exited sessions.
  • src/sessionRuntime.ts
    • Extended StartSessionInput to include args and displayCommand.
    • Increased START_READY_TIMEOUT_MS from 1 second to 10 seconds for more robust PTY readiness waiting.
    • Modified waitForExited to return a boolean indicating if the session exited within the timeout.
    • Introduced waitForSessionReady to explicitly wait for the PTY process to become ready and update its PID.
    • Updated startSession to pass args and displayCommand to the worker, and to use waitForSessionReady for PTY readiness.
    • Enhanced killSession to throw an error if the session does not exit within the KILL_WAIT_TIMEOUT_MS.
  • src/state.ts
    • Imported stat from node:fs/promises for file status checks.
    • Defined new constants for state locking (STATE_LOCK_DIR, STATE_LOCK_TIMEOUT_MS, STATE_LOCK_RETRY_MS, STATE_LOCK_STALE_MS).
    • Added getStateLockPath to determine the path for the state lock directory.
    • Implemented acquireStateLock and releaseStateLock for managing a file-system based lock to serialize state updates.
    • Introduced withStateLock as a utility function to wrap state-modifying operations with lock acquisition and release.
    • Created writeSessionsUnlocked to handle the actual writing of session data, to be used within withStateLock.
    • Wrapped writeSessions, upsertSession, and writeActiveSessionId with withStateLock to ensure atomic updates to shared state.
  • src/worker.ts
    • Updated WorkerSpec interface to include args and displayCommand.
    • Modified toSessionRecord to use displayCommand if available, otherwise fallback to command.
    • Adjusted the ptyProcess spawn logic to conditionally use spawn(spec.command, spec.args, ...) when args are provided, otherwise falling back to spawn(shell, ['-lc', spec.command], ...).
  • tests/attach.test.ts
    • Imported upsertSession and writeActiveSessionId for test setup.
    • Added upsertSession calls before attachSession to ensure the session exists and is running for successful attachment tests.
    • Updated resolveTargetSessionId tests to use writeActiveSessionId for setting the active session.
    • Added new tests to verify that attachSession rejects nonexistent or exited sessions.
  • tests/e2e.start-argv.test.ts
    • Added a new end-to-end test to verify that agentty start correctly preserves quoted and empty command-line arguments passed to the spawned process.
  • tests/e2e.vim.test.ts
    • Modified the start command arguments for the Vim e2e test to use sh -lc for more robust command execution.
  • tests/sessionRuntime.kill-timeout.test.ts
    • Added a new test to verify that killSession correctly rejects when a session does not exit within the specified timeout period.
  • tests/sessionRuntime.start.test.ts
    • Updated startSession tests to assert that the session's pid (PTY process) is distinct from its workerPid (worker process) and that both are correctly recorded.
  • tests/state.concurrent.test.ts
    • Added a new test to ensure that concurrent upsertSession calls correctly preserve all session records, demonstrating the effectiveness of the new state locking mechanism.
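The serialization described for `src/state.ts` and exercised by the concurrency test can be approximated with a small directory-based lock. This is a sketch under assumptions: `withDirLock` and `demo` are hypothetical names, the state file is reduced to a JSON array of strings, and stale-lock handling is left out.

```typescript
import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Run fn while holding a lock directory; mkdir fails with EEXIST while
// another holder exists, so only one read-modify-write runs at a time.
async function withDirLock<T>(lockPath: string, timeoutMs: number, fn: () => Promise<T>): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      await mkdir(lockPath);
      break;
    } catch (error) {
      if ((error as { code?: string }).code !== "EEXIST") throw error;
      if (Date.now() >= deadline) {
        throw new Error(`Timed out waiting for state lock: ${lockPath}`);
      }
      await new Promise<void>((resolve) => setTimeout(resolve, 5));
    }
  }
  try {
    return await fn();
  } finally {
    await rm(lockPath, { recursive: true, force: true });
  }
}

// Fire `writers` concurrent read-modify-write cycles against one file
// and report how many records survived.
async function demo(writers: number): Promise<number> {
  const dir = await mkdtemp(join(tmpdir(), "agentty-lock-demo-"));
  const statePath = join(dir, "sessions.json");
  const lockPath = join(dir, "state.lock");
  await writeFile(statePath, "[]", "utf8");

  await Promise.all(
    Array.from({ length: writers }, (_, index) =>
      withDirLock(lockPath, 10_000, async () => {
        const sessions = JSON.parse(await readFile(statePath, "utf8")) as string[];
        sessions.push(`session-${index}`);
        await writeFile(statePath, JSON.stringify(sessions), "utf8");
      }),
    ),
  );

  const finalSessions = JSON.parse(await readFile(statePath, "utf8")) as string[];
  await rm(dir, { recursive: true, force: true });
  return finalSessions.length;
}
```

Calling `demo(24)` mirrors the 24 parallel writers mentioned for the concurrency test and should resolve to 24; without the lock, interleaved reads and writes can drop records.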

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai Bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

Implements a reliability improvement plan for agentty by adding argument preservation, session readiness detection, attachment validation, and concurrent state management locking. Changes span command parsing, session initialization, worker invocation, state persistence, and comprehensive test coverage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Planning & Documentation: `docs/plans/2026-03-06-agentty-reliability-implementation.md` | Detailed reliability implementation plan outlining four tasks: preserving argv boundaries, preventing session state loss across concurrent invocations, tightening attach validation and kill semantics, and verifying the full test suite. |
| Command Parsing & CLI: `src/index.ts` | Extended StartOptionResult interface with optional args?: string[] and new displayCommand: string fields. Added formatCommandForDisplay helper to quote command parts containing spaces. Updated start command to pass parsed args and display command to session startup. |
| Session Attachment Validation: `src/resolveSession.ts` | Added ensureRunningSession helper that loads and validates sessions, ensuring only running sessions can be attached. Updated attachSession to enforce running-session requirement before allowing attachment. |
| Session Initialization & Readiness: `src/sessionRuntime.ts` | Added args and displayCommand support to StartSessionInput. Introduced waitForSessionReady helper to detect worker readiness and handle early worker exit. Updated startup flow to wait for readiness before returning session. Enhanced timeout handling with configurable polling intervals and improved cleanup on failure. Updated waitForExited to return boolean and killSession to validate exit within timeout. |
| Concurrent State Management: `src/state.ts` | Implemented file-based state locking mechanism via withStateLock to serialize critical mutations. Applied locking to writeSessions, upsertSession, and writeActiveSessionId. Added stale-lock detection and timeout error handling. |
| Worker Process Management: `src/worker.ts` | Extended WorkerSpec interface with optional args?: string[] and displayCommand?: string fields. Updated spawn logic to use direct spawning with args when provided, otherwise shell invocation. Modified toSessionRecord to use displayCommand when available. |
| Attachment & State Tests: `tests/attach.test.ts` | Added explicit state setup helpers (upsertSession, writeActiveSessionId). New tests verify error handling for nonexistent and exited sessions, ensuring failures clear active session state. |
| End-to-End Argument Tests: `tests/e2e.start-argv.test.ts` | New e2e test suite validating argv preservation through session lifecycle. Tests spawn CLI with scripts containing quoted, empty, and literal quote arguments, polling output to verify correct argument reconstruction. |
| End-to-End Vim Test Simplification: `tests/e2e.vim.test.ts` | Simplified Vim startup from multi-part shell sequence to single sh -lc invocation, delegating vim/vi fallback logic to shell execution. |
| Kill Timeout Tests: `tests/sessionRuntime.kill-timeout.test.ts` | New test validating killSession timeout enforcement. Verifies rejection when session does not exit within 3000ms and confirms active session ID is not cleared on timeout. |
| Session Runtime & Worker PID Tests: `tests/sessionRuntime.start.test.ts` | Extended session object with workerPid: number field. Added assertions verifying workerPid is present, numeric, and distinct from pid in both returned and persisted session records. |
| Concurrent State Mutation Tests: `tests/state.concurrent.test.ts` | New concurrency test spawning 24 parallel upsertSession calls. Validates all records persist without loss and confirms state locking prevents data corruption under concurrent mutation. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (start)
    participant Index as index.ts
    participant SessionRT as sessionRuntime.ts
    participant Worker as worker.ts
    participant IPC as IPC Manager
    participant State as state.ts

    CLI->>Index: start command with args
    Index->>Index: parseStartOptions()
    Note over Index: Extract command, args,<br/>displayCommand
    Index->>SessionRT: startSession({command,<br/>args, displayCommand})
    SessionRT->>State: withStateLock()
    State->>SessionRT: Locked access acquired
    SessionRT->>Worker: Spawn with args or shell
    Worker->>IPC: Initialize socket
    IPC-->>Worker: Socket path
    Worker-->>SessionRT: worker spawned
    SessionRT->>SessionRT: waitForSessionReady()
    Note over SessionRT: Poll for worker pid match<br/>& socket readiness
    loop Until ready or timeout
        SessionRT->>IPC: Check socket readiness
        IPC-->>SessionRT: Ready/not ready
        SessionRT->>SessionRT: pid === workerPid?
    end
    SessionRT->>State: upsertSession(ready session)
    State-->>SessionRT: Locked write complete
    SessionRT-->>CLI: Return session with<br/>workerPid & pid
sequenceDiagram
    participant CLI as CLI (attach)
    participant Resolve as resolveSession.ts
    participant State as state.ts
    participant IPC as IPC Manager

    CLI->>Resolve: attachSession(sessionId)
    Resolve->>State: withStateLock()
    State->>Resolve: Locked access acquired
    Resolve->>Resolve: ensureRunningSession(id)
    Resolve->>State: readSessionById(id)
    State-->>Resolve: session object
    alt session not found or<br/>status !== 'running'
        Resolve-->>CLI: Error (invalid session)
        Resolve->>State: releaseStateLock()
    else session exists & running
        Resolve->>IPC: writeActiveSessionId(id)
        IPC-->>Resolve: Active ID set
        Resolve->>State: releaseStateLock()
        Resolve-->>CLI: Success
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop, hop, the args now parse with care,
Each token kept, no merged despair,
Locks guard the state through concurrent dance,
Readiness awaits before we prance,
Sessions validated, workers synchronized—
Agentty's heart, now stabilized! 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: harden agentty session lifecycle' directly addresses the main objective: hardening session lifecycle management through argv preservation, PTY readiness checks, state serialization, and tighter validation. |



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request significantly hardens the agentty session lifecycle by introducing file-based locking to prevent race conditions, improving command-line argument handling for agentty start, ensuring session startup waits for the PTY, and making attach and kill commands more robust. While these changes are well-supported by new and updated tests, the implementation of the state lock and temporary file handling in src/state.ts introduces potential security risks. The use of predictable temporary file names and lack of restrictive permissions on the lock directory could lead to local symbolic link attacks or denial-of-service, particularly when used in shared or world-writable directories (e.g., when AGENTTY_HOME is set to /tmp). Addressing these by using secure temporary file creation and enforcing strict directory permissions will further harden the application's security posture. Additionally, there is one suggestion to reduce code duplication.

Comment thread src/state.ts
Comment on lines +131 to +135
const tempPath = `${sessionsPath}.${process.pid}.${Date.now()}.tmp`;
const serialized = JSON.stringify(sessions, null, 2);

await writeFile(tempPath, serialized, 'utf8');
await rename(tempPath, sessionsPath);

security-medium medium

The use of a predictable temporary file name (`${sessionsPath}.${process.pid}.${Date.now()}.tmp`) is vulnerable to symbolic link attacks. If the state directory (e.g., via AGENTTY_HOME) is set to a world-writable location like /tmp, a local attacker can predict the temporary file name and create a symbolic link at that path pointing to a sensitive file. When agentty attempts to write the sessions file, it will follow the link and overwrite the target file with the user's privileges. It is recommended to use a secure temporary file creation method that ensures a unique, non-predictable name and restrictive permissions (e.g., using fs.mkdtemp or a dedicated library).
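One way to address this, sketched here with exclusive creation rather than the `fs.mkdtemp` route the comment mentions: give the temporary file a random suffix and open it with the `wx` flag and mode `0o600`, so the open fails if anything (including a symlink) already exists at that path. The function name `writeFileAtomically` and the temp-name pattern are hypothetical.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, renameSync, rmSync, writeSync } from "node:fs";
import { randomBytes } from "node:crypto";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Write contents to a fresh, unpredictable temp file next to the target,
// then rename it into place.
function writeFileAtomically(targetPath: string, contents: string): void {
  const suffix = randomBytes(16).toString("hex");
  const tempPath = join(dirname(targetPath), `.sessions.${suffix}.tmp`);
  // "wx" fails if the path already exists instead of following or
  // truncating it; 0o600 restricts the file to the current user.
  const fd = openSync(tempPath, "wx", 0o600);
  try {
    writeSync(fd, contents);
  } catch (error) {
    closeSync(fd);
    rmSync(tempPath, { force: true }); // do not leave partial temp files
    throw error;
  }
  closeSync(fd);
  renameSync(tempPath, targetPath);
}

const stateDir = mkdtempSync(join(tmpdir(), "agentty-tmp-demo-"));
const sessionsFile = join(stateDir, "sessions.json");
writeFileAtomically(sessionsFile, JSON.stringify([{ id: "demo" }]));
console.log(readFileSync(sessionsFile, "utf8")); // [{"id":"demo"}]
```

Keeping the temp file in the same directory as the target matters because `rename` is only atomic within one filesystem.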

Comment thread src/state.ts
Comment on lines +86 to +87
await mkdir(lockPath);
return;

security-medium medium

The lock directory is created without ensuring restrictive permissions. If the state directory is located in a shared or world-writable location (like /tmp), a local attacker could pre-create the lock directory with loose permissions or use it to perform a denial-of-service attack by preventing the user from acquiring the lock. Additionally, the stale lock removal logic (lines 98-101) contains a time-of-check to time-of-use (TOCTOU) race condition where a fresh lock from another process could be deleted if it is misidentified as stale. It is recommended to create the lock directory with restrictive permissions (e.g., 0700) and verify its ownership and permissions if it already exists.
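The permission part of this recommendation could look roughly like the following on POSIX systems. It is a sketch under assumptions: `createPrivateLockDir` is a hypothetical name, the check relies on `process.getuid`, which is not available on Windows, and it does not address the TOCTOU race in stale-lock removal.

```typescript
import { chmodSync, lstatSync, mkdirSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Create the lock directory with owner-only permissions. Returns true if
// this call created it, false if a trustworthy one already existed, and
// throws if an existing path looks like it was planted by someone else.
function createPrivateLockDir(lockPath: string): boolean {
  try {
    mkdirSync(lockPath, { mode: 0o700 });
    return true;
  } catch (error) {
    if ((error as { code?: string }).code !== "EEXIST") throw error;
  }
  // lstat (not stat) so a symlink at lockPath is seen as a symlink.
  const info = lstatSync(lockPath);
  const uid = typeof process.getuid === "function" ? process.getuid() : undefined;
  const groupOrOtherBits = info.mode & 0o077;
  if (!info.isDirectory() || (uid !== undefined && info.uid !== uid) || groupOrOtherBits !== 0) {
    throw new Error(`refusing to use untrusted lock path: ${lockPath}`);
  }
  return false;
}

const lockHome = mkdtempSync(join(tmpdir(), "agentty-perm-demo-"));
const privateLock = join(lockHome, "state.lock");
console.log(createPrivateLockDir(privateLock)); // true: created by this call
console.log(createPrivateLockDir(privateLock)); // false: exists and is ours
rmSync(lockHome, { recursive: true, force: true });
```

A `false` result only says the existing directory is private to the current user; whether the lock is currently held is a separate question for the acquire loop.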

Comment thread src/worker.ts
Comment on lines +349 to +363
  ptyProcess = Array.isArray(spec.args)
    ? spawn(spec.command, spec.args, {
        cwd: spec.cwd,
        env: process.env,
        name: 'xterm-256color',
        cols: 80,
        rows: 24,
      })
    : spawn(shell, ['-lc', spec.command], {
        cwd: spec.cwd,
        env: process.env,
        name: 'xterm-256color',
        cols: 80,
        rows: 24,
      });

medium

To improve readability and reduce code duplication, you could extract the common pty options into a separate constant. This makes the code more maintainable, as any future changes to the options would only need to be made in one place.

  const ptyOptions = {
    cwd: spec.cwd,
    env: process.env,
    name: 'xterm-256color',
    cols: 80,
    rows: 24,
  };

  ptyProcess = Array.isArray(spec.args)
    ? spawn(spec.command, spec.args, ptyOptions)
    : spawn(shell, ['-lc', spec.command], ptyOptions);


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/e2e.start-argv.test.ts (1)

38-70: Keep the polling helper’s per-call timeout below its overall deadline.

waitForSnapshotContains() says it times out after 8s, but each runCommand() call can block for 10s. A single slow get can therefore overshoot the advertised deadline and make failures much harder to diagnose.

♻️ Proposed fix
-  async function runCommand(args: string[]): Promise<CommandResult> {
+  async function runCommand(args: string[], timeout = 10_000): Promise<CommandResult> {
     const result = await execa('node', ['dist/index.js', ...args], {
       env: {
         ...process.env,
         AGENTTY_HOME: tempHome,
       },
       reject: false,
-      timeout: 10_000,
+      timeout,
     });
@@
   async function waitForSnapshotContains(sessionId: string, needle: string): Promise<string> {
     const timeoutMs = 8_000;
     const intervalMs = 50;
     const startedAt = Date.now();
 
     while (Date.now() - startedAt < timeoutMs) {
-      const result = await runCommand(['get', '--session', sessionId, '--lines', '200']);
+      const elapsedMs = Date.now() - startedAt;
+      const remainingMs = timeoutMs - elapsedMs;
+      const result = await runCommand(
+        ['get', '--session', sessionId, '--lines', '200'],
+        Math.min(remainingMs, 1_000),
+      );
 
       if (result.exitCode === 0 && result.stdout.includes(needle)) {
         return result.stdout;
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e.start-argv.test.ts` around lines 38 - 70, The polling helper
waitForSnapshotContains has an overall timeoutMs = 8_000 but calls runCommand
which currently uses a hardcoded execa timeout of 10_000; change runCommand to
accept an optional perCallTimeout parameter (or use a default shorter timeout)
and use that value for the execa timeout option, then call runCommand from
waitForSnapshotContains with a per-call timeout safely smaller than timeoutMs
(for example Math.min(2000, timeoutMs - elapsed - 100) or a fixed 2000ms) so no
single runCommand invocation can block past the polling deadline; update the
runCommand signature and the single call sites (notably the call inside
waitForSnapshotContains) to pass the new timeout.
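
The deadline arithmetic behind this suggestion can be separated from the polling loop. The following is a minimal sketch, not code from the PR: `perCallBudget` and `pollWithDeadline` are hypothetical names, and the 8s, 50ms, and 1s defaults are taken from the proposed fix above. It treats an exhausted budget as a timeout rather than passing `0` on, on the assumption that the underlying runner (execa here) reads a `timeout` of `0` as "no timeout".

```typescript
// Illustrative only: `perCallBudget` and `pollWithDeadline` are hypothetical
// names, not code from this PR.

/** Time budget for the next attempt, or 0 once the overall deadline has passed. */
function perCallBudget(elapsedMs: number, timeoutMs: number, capMs: number): number {
  const remainingMs = timeoutMs - elapsedMs;
  if (remainingMs <= 0) return 0;
  // Never grant a single attempt more time than is left overall.
  return Math.min(remainingMs, capMs);
}

/** Polls `attempt` until it yields a value or the overall deadline expires. */
async function pollWithDeadline<T>(
  attempt: (budgetMs: number) => Promise<T | undefined>,
  timeoutMs = 8_000,
  intervalMs = 50,
  capMs = 1_000,
): Promise<T> {
  const startedAt = Date.now();

  for (;;) {
    const budgetMs = perCallBudget(Date.now() - startedAt, timeoutMs, capMs);
    if (budgetMs === 0) {
      throw new Error(`Timed out after ${timeoutMs}ms`);
    }

    // The attempt is expected to enforce `budgetMs` itself,
    // for example by forwarding it as a subprocess timeout.
    const value = await attempt(budgetMs);
    if (value !== undefined) return value;

    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With these defaults, the worst-case overshoot is bounded by one polling interval instead of by a full 10s subprocess timeout.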

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4430c00e-773d-4ea4-bc96-620bdb5e34fb

📥 Commits

Reviewing files that changed from the base of the PR and between 447d842 and 0511ef4.

📒 Files selected for processing (12)
  • docs/plans/2026-03-06-agentty-reliability-implementation.md
  • src/index.ts
  • src/resolveSession.ts
  • src/sessionRuntime.ts
  • src/state.ts
  • src/worker.ts
  • tests/attach.test.ts
  • tests/e2e.start-argv.test.ts
  • tests/e2e.vim.test.ts
  • tests/sessionRuntime.kill-timeout.test.ts
  • tests/sessionRuntime.start.test.ts
  • tests/state.concurrent.test.ts


---

### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness

⚠️ Potential issue | 🟡 Minor

Fix heading level increment.

The static analysis tool flagged that heading levels should increment by one level at a time. The document jumps from # (h1) to ### (h3), skipping ## (h2).

📝 Proposed fix
-### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness
+## Task 1: Preserve argv boundaries and delay `start` success until PTY readiness

Apply the same change to Tasks 2, 3, and 4 (lines 31, 43, 59).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness
+## Task 1: Preserve argv boundaries and delay `start` success until PTY readiness
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-03-06-agentty-reliability-implementation.md` at line 13, The
document jumps heading levels from H1 to H3; change the "### Task 1: Preserve
argv boundaries and delay `start` success until PTY readiness" heading to H2
(i.e., "## Task 1...") to restore proper incremental heading levels, and apply
the same change to the headings for Task 2, Task 3, and Task 4 (use the exact
heading texts "Task 2:", "Task 3:", "Task 4:" in the file) so each task heading
is H2 and the outline increments by one level at a time.
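
For reference, the heading-increment rule can be approximated in a few lines. This is a rough sketch of an MD001-style check, not markdownlint's implementation: `findHeadingJumps` and `HeadingJump` are hypothetical names, and the sketch only handles unindented ATX (`#`) headings outside fenced code blocks.

```typescript
// Illustrative sketch of an MD001-style check; `findHeadingJumps` and
// `HeadingJump` are hypothetical names, not part of markdownlint's API.
interface HeadingJump {
  line: number;     // 1-based line number of the offending heading
  expected: number; // deepest level allowed at this point
  actual: number;   // level that was found
}

function findHeadingJumps(markdown: string): HeadingJump[] {
  const jumps: HeadingJump[] = [];
  let previousLevel = 0; // 0 means no heading has been seen yet
  let inFence = false;

  markdown.split("\n").forEach((text, index) => {
    // Skip fenced code blocks so '#' comments are not read as headings.
    if (/^\s*(```|~~~)/.test(text)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return;

    const match = /^(#{1,6})\s+\S/.exec(text);
    if (!match) return;

    const level = match[1].length;
    // A heading may go at most one level deeper than the previous heading.
    if (previousLevel > 0 && level > previousLevel + 1) {
      jumps.push({ line: index + 1, expected: previousLevel + 1, actual: level });
    }
    previousLevel = level;
  });

  return jumps;
}
```

A check of this kind reports only the first `###` that follows the `#` title, because later `###` headings follow another `###`. That is consistent with the linter flagging just line 13 while the review asks for all four task headings to be changed.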
