Conversation
Review Summary by QodoHarden agentty session lifecycle with argv preservation, PTY readiness checks, and state serialization
WalkthroughsDescription• Preserve CLI argv boundaries by splitting command into executable and args array • Wait for PTY readiness before reporting session start success • Serialize shared session state updates with file-based locking to prevent concurrent loss • Validate attach targets and enforce kill timeout with explicit error handling • Add regression tests for argv preservation, kill timeout, state concurrency, and Vim flow Diagramflowchart LR
CLI["CLI start command"]
Parse["Parse argv into command + args"]
Worker["Spawn worker process"]
PTY["Wait for PTY readiness"]
Lock["Acquire state lock"]
Serialize["Serialize session state"]
Attach["Validate attach target"]
Kill["Enforce kill timeout"]
CLI --> Parse
Parse --> Worker
Worker --> PTY
PTY --> Lock
Lock --> Serialize
Serialize --> Attach
Attach --> Kill
File Changes2. src/sessionRuntime.ts
|
Code Review by Qodo
1. Orphan PTY on lock failure
|
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the reliability and robustness of the Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthroughImplements a reliability improvement plan for agentty by adding argument preservation, session readiness detection, attachment validation, and concurrent state management locking. Changes span command parsing, session initialization, worker invocation, state persistence, and comprehensive test coverage. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI (start)
participant Index as index.ts
participant SessionRT as sessionRuntime.ts
participant Worker as worker.ts
participant IPC as IPC Manager
participant State as state.ts
CLI->>Index: start command with args
Index->>Index: parseStartOptions()
Note over Index: Extract command, args,<br/>displayCommand
Index->>SessionRT: startSession({command,<br/>args, displayCommand})
SessionRT->>State: withStateLock()
State->>SessionRT: Locked access acquired
SessionRT->>Worker: Spawn with args or shell
Worker->>IPC: Initialize socket
IPC-->>Worker: Socket path
Worker-->>SessionRT: worker spawned
SessionRT->>SessionRT: waitForSessionReady()
Note over SessionRT: Poll for worker pid match<br/>& socket readiness
loop Until ready or timeout
SessionRT->>IPC: Check socket readiness
IPC-->>SessionRT: Ready/not ready
SessionRT->>SessionRT: pid === workerPid?
end
SessionRT->>State: upsertSession(ready session)
State-->>SessionRT: Locked write complete
SessionRT-->>CLI: Return session with<br/>workerPid & pid
sequenceDiagram
participant CLI as CLI (attach)
participant Resolve as resolveSession.ts
participant State as state.ts
participant IPC as IPC Manager
CLI->>Resolve: attachSession(sessionId)
Resolve->>State: withStateLock()
State->>Resolve: Locked access acquired
Resolve->>Resolve: ensureRunningSession(id)
Resolve->>State: readSessionById(id)
State-->>Resolve: session object
alt session not found or<br/>status !== 'running'
Resolve-->>CLI: Error (invalid session)
Resolve->>State: releaseStateLock()
else session exists & running
Resolve->>IPC: writeActiveSessionId(id)
IPC-->>Resolve: Active ID set
Resolve->>State: releaseStateLock()
Resolve-->>CLI: Success
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request significantly hardens the agentty session lifecycle by introducing file-based locking to prevent race conditions, improving command-line argument handling for agentty start, ensuring session startup waits for the PTY, and making attach and kill commands more robust. While these changes are well-supported by new and updated tests, the implementation of the state lock and temporary file handling in src/state.ts introduces potential security risks. The use of predictable temporary file names and lack of restrictive permissions on the lock directory could lead to local symbolic link attacks or denial-of-service, particularly when used in shared or world-writable directories (e.g., when AGENTTY_HOME is set to /tmp). Addressing these by using secure temporary file creation and enforcing strict directory permissions will further harden the application's security posture. Additionally, there is one suggestion to reduce code duplication.
| const tempPath = `${sessionsPath}.${process.pid}.${Date.now()}.tmp`; | ||
| const serialized = JSON.stringify(sessions, null, 2); | ||
|
|
||
| await writeFile(tempPath, serialized, 'utf8'); | ||
| await rename(tempPath, sessionsPath); |
There was a problem hiding this comment.
The use of a predictable temporary file name (sessionsPath.process.pid.Date.now().tmp) is vulnerable to symbolic link attacks. If the state directory (e.g., via AGENTTY_HOME) is set to a world-writable location like /tmp, a local attacker can predict the temporary file name and create a symbolic link at that path pointing to a sensitive file. When agentty attempts to write the sessions file, it will follow the link and overwrite the target file with the user's privileges. It is recommended to use a secure temporary file creation method that ensures a unique, non-predictable name and restrictive permissions (e.g., using fs.mkdtemp or a dedicated library).
| await mkdir(lockPath); | ||
| return; |
There was a problem hiding this comment.
The lock directory is created without ensuring restrictive permissions. If the state directory is located in a shared or world-writable location (like /tmp), a local attacker could pre-create the lock directory with loose permissions or use it to perform a denial-of-service attack by preventing the user from acquiring the lock. Additionally, the stale lock removal logic (lines 98-101) contains a time-of-check to time-of-use (TOCTOU) race condition where a fresh lock from another process could be deleted if it is misidentified as stale. It is recommended to create the lock directory with restrictive permissions (e.g., 0700) and verify its ownership and permissions if it already exists.
| ptyProcess = Array.isArray(spec.args) | ||
| ? spawn(spec.command, spec.args, { | ||
| cwd: spec.cwd, | ||
| env: process.env, | ||
| name: 'xterm-256color', | ||
| cols: 80, | ||
| rows: 24, | ||
| }) | ||
| : spawn(shell, ['-lc', spec.command], { | ||
| cwd: spec.cwd, | ||
| env: process.env, | ||
| name: 'xterm-256color', | ||
| cols: 80, | ||
| rows: 24, | ||
| }); |
There was a problem hiding this comment.
To improve readability and reduce code duplication, you could extract the common pty options into a separate constant. This makes the code more maintainable, as any future changes to the options would only need to be made in one place.
const ptyOptions = {
cwd: spec.cwd,
env: process.env,
name: 'xterm-256color',
cols: 80,
rows: 24,
};
ptyProcess = Array.isArray(spec.args)
? spawn(spec.command, spec.args, ptyOptions)
: spawn(shell, ['-lc', spec.command], ptyOptions);| const readySession = await waitForSessionReady(sessionId, child.pid, () => workerExited); | ||
|
|
||
| if (!readySession) { | ||
| try { | ||
| process.kill(child.pid, 'SIGTERM'); | ||
| } catch { | ||
| // ignore cleanup errors | ||
| } | ||
|
|
||
| await markSessionExited(sessionId, workerExitCode); | ||
|
|
||
| throw new Error( | ||
| `session worker failed to become ready within ${START_READY_TIMEOUT_MS}ms: ${socketPath}. Check worker log: ${logFilePath}`, | ||
| ); |
There was a problem hiding this comment.
1. Orphan pty on lock failure 🐞 Bug ⛯ Reliability
If session persistence during worker startup fails (e.g., state lock timeout) or the parent SIGTERMs the worker during the new readiness wait, the worker can exit before installing SIGTERM/SIGINT handlers and without calling requestKill(), potentially leaving the PTY process running orphaned.
Agent Prompt
## Issue description
The worker can spawn a PTY and then fail/exit before it installs SIGTERM handlers or calls `requestKill()`. With the new state-lock timeouts and the new `startSession()` readiness timeout that SIGTERMs the worker, this can leave orphan PTY processes.
## Issue Context
- Worker spawns PTY before awaiting `persistRunning()`.
- State persistence now can fail due to `withStateLock()` timeout.
- Worker top-level catch exits without killing PTY.
## Fix Focus Areas
- src/worker.ts[330-408]
- src/state.ts[79-125]
- src/sessionRuntime.ts[272-286]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/e2e.start-argv.test.ts (1)
38-70: Keep the polling helper’s per-call timeout below its overall deadline.
waitForSnapshotContains()says it times out after 8s, but eachrunCommand()call can block for 10s. A single slowgetcan therefore overshoot the advertised deadline and make failures much harder to diagnose.♻️ Proposed fix
- async function runCommand(args: string[]): Promise<CommandResult> { + async function runCommand(args: string[], timeout = 10_000): Promise<CommandResult> { const result = await execa('node', ['dist/index.js', ...args], { env: { ...process.env, AGENTTY_HOME: tempHome, }, reject: false, - timeout: 10_000, + timeout, }); @@ async function waitForSnapshotContains(sessionId: string, needle: string): Promise<string> { const timeoutMs = 8_000; const intervalMs = 50; const startedAt = Date.now(); while (Date.now() - startedAt < timeoutMs) { - const result = await runCommand(['get', '--session', sessionId, '--lines', '200']); + const elapsedMs = Date.now() - startedAt; + const remainingMs = timeoutMs - elapsedMs; + const result = await runCommand( + ['get', '--session', sessionId, '--lines', '200'], + Math.min(remainingMs, 1_000), + ); if (result.exitCode === 0 && result.stdout.includes(needle)) { return result.stdout; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e.start-argv.test.ts` around lines 38 - 70, The polling helper waitForSnapshotContains has an overall timeoutMs = 8_000 but calls runCommand which currently uses a hardcoded execa timeout of 10_000; change runCommand to accept an optional perCallTimeout parameter (or use a default shorter timeout) and use that value for the execa timeout option, then call runCommand from waitForSnapshotContains with a per-call timeout safely smaller than timeoutMs (for example Math.min(2000, timeoutMs - elapsed - 100) or a fixed 2000ms) so no single runCommand invocation can block past the polling deadline; update the runCommand signature and the single call sites (notably the call inside waitForSnapshotContains) to pass the new timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/plans/2026-03-06-agentty-reliability-implementation.md`:
- Line 13: The document jumps heading levels from H1 to H3; change the "### Task
1: Preserve argv boundaries and delay `start` success until PTY readiness"
heading to H2 (i.e., "## Task 1...") to restore proper incremental heading
levels, and apply the same change to the headings for Task 2, Task 3, and Task 4
(use the exact heading texts "Task 2:", "Task 3:", "Task 4:" in the file) so
each task heading is H2 and the outline increments by one level at a time.
---
Nitpick comments:
In `@tests/e2e.start-argv.test.ts`:
- Around line 38-70: The polling helper waitForSnapshotContains has an overall
timeoutMs = 8_000 but calls runCommand which currently uses a hardcoded execa
timeout of 10_000; change runCommand to accept an optional perCallTimeout
parameter (or use a default shorter timeout) and use that value for the execa
timeout option, then call runCommand from waitForSnapshotContains with a
per-call timeout safely smaller than timeoutMs (for example Math.min(2000,
timeoutMs - elapsed - 100) or a fixed 2000ms) so no single runCommand invocation
can block past the polling deadline; update the runCommand signature and the
single call sites (notably the call inside waitForSnapshotContains) to pass the
new timeout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4430c00e-773d-4ea4-bc96-620bdb5e34fb
📒 Files selected for processing (12)
docs/plans/2026-03-06-agentty-reliability-implementation.mdsrc/index.tssrc/resolveSession.tssrc/sessionRuntime.tssrc/state.tssrc/worker.tstests/attach.test.tstests/e2e.start-argv.test.tstests/e2e.vim.test.tstests/sessionRuntime.kill-timeout.test.tstests/sessionRuntime.start.test.tstests/state.concurrent.test.ts
|
|
||
| --- | ||
|
|
||
| ### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness |
There was a problem hiding this comment.
Fix heading level increment.
The static analysis tool flagged that heading levels should increment by one level at a time. The document jumps from # (h1) to ### (h3), skipping ## (h2).
📝 Proposed fix
-### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness
+## Task 1: Preserve argv boundaries and delay `start` success until PTY readinessApply the same change to Tasks 2, 3, and 4 (lines 31, 43, 59).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ### Task 1: Preserve argv boundaries and delay `start` success until PTY readiness | |
| ## Task 1: Preserve argv boundaries and delay `start` success until PTY readiness |
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/plans/2026-03-06-agentty-reliability-implementation.md` at line 13, The
document jumps heading levels from H1 to H3; change the "### Task 1: Preserve
argv boundaries and delay `start` success until PTY readiness" heading to H2
(i.e., "## Task 1...") to restore proper incremental heading levels, and apply
the same change to the headings for Task 2, Task 3, and Task 4 (use the exact
heading texts "Task 2:", "Task 3:", "Task 4:" in the file) so each task heading
is H2 and the outline increments by one level at a time.
Summary
agentty startand wait for PTY readiness before reporting successattach/killsession lifecycle checksIssues
Test Plan
bun run testSummary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests