feat(harness): FileSnapshotStore takes top-level dir, auto-gitignore #123

Open
minpeter wants to merge 1 commit into main from feat/file-snapshot-auto-gitignore

@minpeter minpeter commented Apr 24, 2026

Summary

  • Reshape FileSnapshotStore so it owns its on-disk layout: pass a top-level directory (.plugsuits, .minimal-agent, ...) instead of a pre-resolved /sessions path. The store manages <root>/sessions/*.jsonl itself and exposes rootDir / sessionsDir getters for consumers that want to co-locate related files (e.g. session memory).
  • Auto-gitignore: when the root dir lives inside a git worktree, the store appends the top-level dir to that worktree's .gitignore if not already listed. Opt out with { autoGitignore: false }.
  • Migrate all in-repo consumers (cea, minimal-agent, tgbot) to the new contract. No backward compatibility is kept for the legacy session-file path or the old env var names.
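
The auto-gitignore step described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: `findWorktreeRoot` and `ignoreEntryFor` are hypothetical helper names, and the leading-slash, trailing-slash entry format is an assumption about what gets appended.

```typescript
import { existsSync, mkdirSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join, relative, sep } from "node:path";

// Hypothetical helper: walk up from `startDir` and return the first directory
// that contains a `.git` entry (a directory in a normal clone, a file in a
// linked worktree). Returns null when no worktree encloses `startDir`.
function findWorktreeRoot(startDir: string): string | null {
  let dir = startDir;
  for (;;) {
    if (existsSync(join(dir, ".git"))) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Hypothetical helper: build the ignore entry for `targetDir`, anchored to the
// worktree root (leading slash) and marked as a directory (trailing slash).
// The exact entry format used by the PR is an assumption here.
function ignoreEntryFor(worktreeRoot: string, targetDir: string): string {
  const rel = relative(worktreeRoot, targetDir).split(sep).join("/");
  return `/${rel}/`;
}

// Usage: build a fake worktree in a temp dir and resolve the entry for its state dir.
const base = mkdtempSync(join(tmpdir(), "worktree-sketch-"));
const repoRoot = join(base, "repo");
mkdirSync(join(repoRoot, ".git"), { recursive: true });
const stateDir = join(repoRoot, ".plugsuits");
const detectedRoot = findWorktreeRoot(stateDir);
console.log(detectedRoot === null ? "no worktree, skip" : ignoreEntryFor(detectedRoot, stateDir));
```

When no `.git` marker is found on the way up, the sketch returns null, which corresponds to the skip path for roots outside any worktree.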

Motivation

Every consumer was reimplementing the same few lines: resolve a root dir, mkdirSync(.../sessions), pass .../sessions into the store, and then separately remember to add the state dir to .gitignore. The last step was usually forgotten, and agent state directories leaked into commits.

This PR pushes both responsibilities into the store:

  1. Convention over configuration — consumers stop thinking about the /sessions subpath. They hand over a top-level dir; the store lays out its files.
  2. Safe-by-default persistence — state dirs are auto-ignored on the way in, so .plugsuits/, .minimal-agent/, <tmpdir>/tgbot/ can't be committed by accident.

Changes

FileSnapshotStore (core)

  • Constructor signature: new FileSnapshotStore(rootDir, options?). rootDir is the top-level dir; sessions live at <rootDir>/sessions/*.jsonl.
  • Public getters: rootDir, sessionsDir (resolved absolute paths).
  • New options type: FileSnapshotStoreOptions { autoGitignore?: boolean } (defaults to true).
  • Removed the undocumented getFilePath fallback for unencoded session filenames. Files are always encoded via encodeSessionId.
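
To make the new layout concrete, here is a small stand-in that models only the directory contract. `LayoutSketch` is a hypothetical class, not the real `FileSnapshotStore`; it uses readonly fields rather than getters, `encodeURIComponent` is an assumed substitute for `encodeSessionId`, and the session-memory file name is invented for illustration.

```typescript
import { existsSync, mkdirSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, resolve } from "node:path";

// Hypothetical stand-in for FileSnapshotStore; only the on-disk layout is modeled.
class LayoutSketch {
  readonly rootDir: string; // resolved absolute top-level dir
  readonly sessionsDir: string; // always <rootDir>/sessions

  constructor(rootDir: string) {
    this.rootDir = resolve(rootDir);
    this.sessionsDir = join(this.rootDir, "sessions");
    // The store, not the consumer, creates the sessions subdirectory.
    mkdirSync(this.sessionsDir, { recursive: true });
  }

  // Assumption: encodeURIComponent stands in for the real encodeSessionId.
  filePathFor(sessionId: string): string {
    return join(this.sessionsDir, `${encodeURIComponent(sessionId)}.jsonl`);
  }
}

// Usage: the consumer hands over a top-level dir and derives related paths from the store.
const layoutRoot = mkdtempSync(join(tmpdir(), "layout-sketch-"));
const layoutStore = new LayoutSketch(join(layoutRoot, ".plugsuits"));
const memoryPath = join(layoutStore.sessionsDir, "session-memory.md"); // hypothetical file name
console.log(layoutStore.filePathFor("chat/42"), memoryPath, existsSync(layoutStore.sessionsDir));
```

The point of the sketch is that consumers never spell out the `sessions` subpath themselves; anything co-located is derived from `sessionsDir`.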

gitignore-sync (new module)

Implemented in packages/harness/src/gitignore-sync.ts, with accompanying tests. Exported from both the package root and @ai-sdk-tool/harness/sessions:

  • ensureDirIgnoredByGit, ensureGitignoreEntry, findNearestGitignore, gitignoreEntryForDir.

Design constraints:

| Risk | Mitigation |
| --- | --- |
| Concurrent writers corrupt .gitignore | .gitignore.lock via openSync(path, "wx") serializes writers |
| Crashed writer wedges the next caller | Stale locks older than 30s are reclaimed |
| Partial write leaves .gitignore truncated | Temp-file + rename swap (atomic on same fs) |
| Mixes LF/CRLF with existing file | Detects and preserves the file's line-ending convention |
| Accidentally modifies parent repo's or home-level .gitignore | Refuses to touch any ancestor .gitignore that is not at a verified worktree root (sibling .git marker) |
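
The first four mitigations listed above could be combined along these lines. This is a sketch under stated assumptions rather than the module's actual code: `tryAcquireLock`, `appendEntryAtomically`, and `ensureEntry` are hypothetical names, the temp-file naming is invented, and the retry-with-sleep loop around a busy lock is left out.

```typescript
import {
  closeSync, existsSync, mkdtempSync, openSync, readFileSync,
  renameSync, rmSync, statSync, writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const STALE_LOCK_MS = 30_000; // stale-lock threshold named in the PR description

// Try to take the lock; reclaim it once if it is older than STALE_LOCK_MS.
function tryAcquireLock(lockPath: string): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      closeSync(openSync(lockPath, "wx")); // "wx" fails with EEXIST if the file exists
      return true;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
    }
    const stats = statSync(lockPath, { throwIfNoEntry: false });
    if (stats && Date.now() - stats.mtimeMs <= STALE_LOCK_MS) return false; // live holder
    rmSync(lockPath, { force: true }); // stale or already gone: clear it and retry once
  }
  return false;
}

// Append `entry` unless it is already listed, preserving the file's line endings.
function appendEntryAtomically(gitignorePath: string, entry: string): boolean {
  const current = existsSync(gitignorePath) ? readFileSync(gitignorePath, "utf8") : "";
  const eol = current.includes("\r\n") ? "\r\n" : "\n";
  if (current.split(/\r?\n/).some((line) => line.trim() === entry)) return false;
  const separator = current === "" || current.endsWith("\n") ? "" : eol;
  const tempPath = `${gitignorePath}.${process.pid}.tmp`; // hypothetical temp name
  writeFileSync(tempPath, `${current}${separator}${entry}${eol}`);
  renameSync(tempPath, gitignorePath); // atomic when both paths share a filesystem
  return true;
}

// Serialize writers through the lock file, then do the atomic append.
function ensureEntry(gitignorePath: string, entry: string): "added" | "present" | "busy" {
  const lockPath = `${gitignorePath}.lock`;
  if (!tryAcquireLock(lockPath)) return "busy"; // a real caller would sleep and retry
  try {
    return appendEntryAtomically(gitignorePath, entry) ? "added" : "present";
  } finally {
    rmSync(lockPath, { force: true });
  }
}

// Usage: a CRLF file keeps its CRLF endings after the append.
const sketchDir = mkdtempSync(join(tmpdir(), "gitignore-sketch-"));
const sketchIgnore = join(sketchDir, ".gitignore");
writeFileSync(sketchIgnore, "node_modules\r\n");
console.log(ensureEntry(sketchIgnore, "/.plugsuits/"), JSON.stringify(readFileSync(sketchIgnore, "utf8")));
```

Note that reclaiming a stale lock is itself racy if two processes reclaim at the same moment; the sketch does not attempt to close that window.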

Consumer migration (no backward compat)

| Package | Before | After |
| --- | --- | --- |
| cea | hardcoded .plugsuits/sessions + explicit mkdirSync + manual session-memory path | new FileSnapshotStore(".plugsuits"); session-memory path derived from store.sessionsDir |
| minimal-agent | SESSION_DIR (default .minimal-agent/sessions) | MINIMAL_AGENT_DIR (default .minimal-agent) |
| tgbot | SESSION_DIR (default <tmpdir>/tgbot-sessions) | TGBOT_DIR (default <tmpdir>/tgbot) |

Tests

  • Existing FileSnapshotStore suite passes { autoGitignore: false } to stay hermetic.
  • New cases: rootDir / sessionsDir getter exposure, <root>/sessions/ layout, auto-gitignore inside a fake worktree, and the skip path when the root is outside any worktree.
  • Standalone gitignore-sync.test.ts covers concurrency (via a worker mjs), stale-lock reclaim, LF/CRLF preservation, and the worktree-root guard.

Docs

  • AGENTS.md — updated FileSnapshotStore example and auto-gitignore semantics.
  • packages/harness/README.md — sample path updated.
  • packages/minimal-agent/README.md — documents the new MINIMAL_AGENT_DIR.

Changeset

patch bump for @ai-sdk-tool/harness, @plugsuits/minimal-agent, @plugsuits/tgbot, plugsuits.

Verification

  • pnpm run typecheck — all 6 packages green (full turbo cache).
  • pnpm --filter @ai-sdk-tool/harness test — 727/727 passing (47 files).

Migration notes for downstream consumers

If you were constructing FileSnapshotStore directly:

- new FileSnapshotStore(".plugsuits/sessions")
+ new FileSnapshotStore(".plugsuits")

If you relied on SESSION_DIR:

  • minimal-agent: set MINIMAL_AGENT_DIR instead.
  • tgbot: set TGBOT_DIR instead.

To disable the auto-gitignore behavior (e.g. in tests or non-git environments):

new FileSnapshotStore(dir, { autoGitignore: false })

Summary by cubic

FileSnapshotStore now takes a top-level state directory and manages <root>/sessions/*.jsonl itself. It also auto-adds the state dir to the repo .gitignore (safe and atomic), which you can disable.

  • New Features

    • @ai-sdk-tool/harness: new FileSnapshotStore(rootDir, { autoGitignore = true }); exposes rootDir and sessionsDir; session files always use encodeSessionId.
    • Auto-gitignore inside a git worktree; concurrency-safe with a lock, atomic writes, preserves LF/CRLF, and only writes at a verified worktree root.
    • New utilities exported: ensureDirIgnoredByGit, ensureGitignoreEntry, findNearestGitignore, gitignoreEntryForDir.
  • Migration

    • Replace new FileSnapshotStore("<dir>/sessions") with new FileSnapshotStore("<dir>"), and use store.sessionsDir for co-located files.
    • @plugsuits/minimal-agent: SESSION_DIR → MINIMAL_AGENT_DIR (default .minimal-agent).
    • @plugsuits/tgbot: SESSION_DIR → TGBOT_DIR (default <tmpdir>/tgbot).
    • To opt out of auto-gitignore: new FileSnapshotStore(dir, { autoGitignore: false }).

Written for commit b81ba25. Summary will update on new commits.

Reshape FileSnapshotStore to own its on-disk layout and keep agent state
directories out of git by default.

Constructor now takes a top-level directory (e.g. `.plugsuits`,
`.minimal-agent`) instead of a pre-resolved `/sessions` path. The store
manages `<root>/sessions/*.jsonl` itself and exposes `rootDir` /
`sessionsDir` getters so consumers can co-locate related files
(e.g. session memory).

When the root dir lives inside a git worktree, the store appends the
top-level dir to that worktree's `.gitignore` if not already listed.
The update is concurrency-safe (exclusive `.gitignore.lock` via
`openSync(path, "wx")`, stale-lock reclaim after 30s) and atomic
(temp-file + rename). Existing LF/CRLF line endings are preserved, and
any ancestor `.gitignore` that is not at a verified worktree root is
refused — so a parent repo's or home-level ignore file cannot be
touched accidentally. Opt out with `{ autoGitignore: false }`.

No backward compatibility:
- The undocumented unencoded-filename `getFilePath` fallback is gone;
  session files always live at
  `<sessionsDir>/<encodeSessionId(sessionId)>.jsonl`.
- `minimal-agent`: `SESSION_DIR` -> `MINIMAL_AGENT_DIR` (default
  `.minimal-agent`).
- `tgbot`: `SESSION_DIR` -> `TGBOT_DIR` (default `<tmpdir>/tgbot`).
- CEA now constructs its store with `.plugsuits` as the top-level dir
  and derives session-memory paths from `store.sessionsDir`.

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Auto reviews are disabled on this repository. To trigger a review, include @crb review in the PR description. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d2be9dc-d3fb-4b8f-ba9b-90bb58584b98

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the FileSnapshotStore to manage its own internal directory layout, moving session snapshots into a sessions subdirectory under a specified root. It introduces a new gitignore-sync utility that automatically and atomically appends the storage directory to the nearest .gitignore file when a git worktree is detected. Additionally, environment variables for minimal-agent and tgbot have been migrated to support this new structure. Feedback was provided regarding the sleepSync implementation in the gitignore utility, which currently uses a busy-wait loop that blocks the Node.js event loop and consumes excessive CPU cycles.

Comment on lines +89 to +95
function sleepSync(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait is acceptable for sub-100ms lock contention. The lock window
    // is bounded by a single small file rename, so spin time is negligible.
  }
}

Priority: high

The sleepSync function uses a busy-wait loop, which consumes 100% CPU on the thread while waiting. In Node.js, this blocks the event loop and is highly inefficient. Since this utility is used during session initialization (including in long-running processes like tgbot), this can lead to significant performance degradation and resource exhaustion under lock contention. A better approach for a synchronous sleep in Node.js is to use Atomics.wait with a SharedArrayBuffer, which suspends the thread without burning CPU cycles.

function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}
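
For context, the suggested `sleepSync` would typically sit inside a bounded acquire loop. The following is a minimal sketch of that shape; `retryUntil` and its parameters are hypothetical and are not taken from gitignore-sync.ts.

```typescript
// Synchronous sleep that parks the thread instead of spinning. Atomics.wait
// returns "timed-out" after `ms` because nothing ever notifies this buffer.
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical retry loop: call `tryOnce` until it succeeds or `timeoutMs` elapses,
// sleeping `pollMs` between attempts without burning CPU.
function retryUntil(tryOnce: () => boolean, timeoutMs: number, pollMs: number): boolean {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (tryOnce()) return true;
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false;
    sleepSync(Math.min(pollMs, remaining));
  }
}

// Usage: succeed on the third attempt, polling every 5 ms for at most 1 s.
let attempts = 0;
const acquired = retryUntil(() => ++attempts >= 3, 1000, 5);
console.log(acquired, attempts);
```

The thread is still blocked for the duration of each sleep, so the event loop does not run; the difference from the busy-wait is only that the CPU is idle while waiting.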

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 16 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/harness/src/gitignore-sync.ts">

<violation number="1" location="packages/harness/src/gitignore-sync.ts:89">
P2: `sleepSync` uses a busy-wait loop that burns 100% CPU for the sleep duration. Under lock contention the acquire loop can spin for up to 5 seconds (`LOCK_ACQUIRE_TIMEOUT_MS`). Use `Atomics.wait` instead, which suspends the thread without consuming CPU cycles:
```ts
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}
```</violation>
</file>

Comment on lines +89 to +94
function sleepSync(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait is acceptable for sub-100ms lock contention. The lock window
    // is bounded by a single small file rename, so spin time is negligible.
  }

@cubic-dev-ai cubic-dev-ai Bot Apr 24, 2026

P2: sleepSync uses a busy-wait loop that burns 100% CPU for the sleep duration. Under lock contention the acquire loop can spin for up to 5 seconds (LOCK_ACQUIRE_TIMEOUT_MS). Use Atomics.wait instead, which suspends the thread without consuming CPU cycles:

function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}
Suggested change
- function sleepSync(ms: number): void {
-   const end = Date.now() + ms;
-   while (Date.now() < end) {
-     // Busy-wait is acceptable for sub-100ms lock contention. The lock window
-     // is bounded by a single small file rename, so spin time is negligible.
-   }
+ function sleepSync(ms: number): void {
+   Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
+ }