fix(cli): add --data-dir flag + AGENTMEMORY_DATA_DIR so engine state lives outside repos #314

Open
mvanhorn wants to merge 1 commit into rohitg00:main from mvanhorn:fix/303-data-cwd-pollution

Conversation

Contributor

@mvanhorn mvanhorn commented May 12, 2026

Summary

Adds a --data-dir flag and AGENTMEMORY_DATA_DIR env var so the engine writes its state outside the caller's cwd. Defaults to a platform-appropriate user dir, so the documented npx @agentmemory/agentmemory flow no longer drops a data/ tree into whatever repo you happen to be in.

What changed

  • src/cli-data-dir.ts — new resolver: flag > env > platform default. ~ expansion, ${VAR} interpolation, . and ./data honored as explicit opt-ins for repo-local state.
  • src/cli.ts — --data-dir <path> parsed alongside --port; writes a runtime iii-config.yaml into the resolved dir so state_store.db and stream_store live there too. --help documents the flag.
  • docker-compose.yml — fallback mount becomes ${AGENTMEMORY_DATA_DIR:-iii-data} to /data.
  • README.md — short section on the new flag and env var.
  • test/cli-data-dir.test.ts — covers flag-wins, env-fallback, default-not-cwd, repo-local opt-in, ~ expansion.
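
For readers who want a feel for the path handling described in the first bullet, here is a minimal sketch of ~ expansion, ${VAR} interpolation, and repo-local opt-in. It is not the code from src/cli-data-dir.ts: the name expandPath, its parameters, and the choice to expand unset variables to an empty string are all assumptions, and it uses POSIX path semantics only to stay deterministic.

```typescript
import { posix as path } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper (not from the PR): expands ${VAR} and a leading "~",
// then resolves relative paths against cwd.
export function expandPath(raw: string, env: Env, home: string, cwd: string): string {
  // Replace each ${VAR} with its value; unset variables become "" (an assumption).
  let out = raw.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => env[name] ?? "",
  );
  // Expand "~" only when it is the whole path or is followed by "/".
  if (out === "~") {
    out = home;
  } else if (out.startsWith("~/")) {
    out = path.join(home, out.slice(2));
  }
  // "." and "./data" resolve against cwd, which is the explicit repo-local opt-in.
  return path.resolve(cwd, out);
}
```

With home set to /home/alex and cwd set to /repo, "~/am" resolves under the home directory while "./data" stays inside the repo, which matches the opt-in behavior the bullet describes.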

Default by platform:

  • macOS: ~/Library/Application Support/agentmemory
  • Linux: $XDG_DATA_HOME/agentmemory (falls back to ~/.local/share/agentmemory)
  • Windows: ~/AppData/Local/agentmemory
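
The precedence and the defaults listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: platformDefault and pickDataDir are hypothetical names, the "flag" and "env" source labels are guesses (only "default" appears in the reviewed code), and no path expansion is applied here.

```typescript
import { posix, win32 } from "node:path";

type Env = Record<string, string | undefined>;
type Source = "flag" | "env" | "default"; // "flag" and "env" are assumed labels

// Hypothetical: mirrors the per-platform defaults listed in the description.
export function platformDefault(platform: string, env: Env, home: string): string {
  if (platform === "darwin") {
    return posix.join(home, "Library", "Application Support", "agentmemory");
  }
  if (platform === "win32") {
    return win32.join(home, "AppData", "Local", "agentmemory");
  }
  // Linux and everything else: XDG_DATA_HOME, else ~/.local/share.
  const xdg = env["XDG_DATA_HOME"];
  const base = xdg !== undefined && xdg.length > 0 ? xdg : posix.join(home, ".local", "share");
  return posix.join(base, "agentmemory");
}

// Hypothetical: flag wins over env, env wins over the platform default.
export function pickDataDir(
  flag: string | undefined,
  env: Env,
  platform: string,
  home: string,
): { dataDir: string; source: Source } {
  if (flag !== undefined && flag.length > 0) return { dataDir: flag, source: "flag" };
  const fromEnv = env["AGENTMEMORY_DATA_DIR"];
  if (fromEnv !== undefined && fromEnv.length > 0) return { dataDir: fromEnv, source: "env" };
  return { dataDir: platformDefault(platform, env, home), source: "default" };
}
```

Passing platform, env, and home as arguments instead of reading process.platform and process.env directly keeps the function easy to test, which is the same shape the example test in the review uses.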

Testing

npm test and npm run build couldn't run locally because vitest / tsdown aren't installed in this sandbox (npm install blocked by network). The new tests are written against the existing vitest shape used elsewhere in test/; CI / CodeRabbit will exercise them.

Fixes #303

Signed-off-by: Matt Van Horn mvanhorn@gmail.com

Summary by CodeRabbit

  • New Features

    • Users can now configure where the application stores persistent data using the --data-dir command-line flag or AGENTMEMORY_DATA_DIR environment variable. Platform-specific default storage locations are supported.
  • Documentation

    • README documentation now includes data directory configuration options, default paths for macOS, Linux, and Windows, and usage examples.

…lives outside repos (rohitg00#303)

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

vercel Bot commented May 12, 2026

@mvanhorn is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

This PR adds configurable data directory support to agentmemory to prevent the engine from writing state into user project directories when started from within a git repository. It introduces a new CLI module to resolve data directories with precedence (flag > env var > platform default), integrates it into the CLI startup and runtime config generation, updates Docker Compose to use the resolved path, and documents the feature.

Changes

Configurable data directory to prevent repo pollution

| Layer / File(s) | Summary |
| --- | --- |
| Data directory resolution module: src/cli-data-dir.ts, test/cli-data-dir.test.ts | Exports resolveDataDir() to select data directory from CLI flag, environment variable, or platform-specific defaults (macOS Application Support, Linux XDG_DATA_HOME, Windows APPDATA). When using defaults, detects git worktree ancestors and optionally relocates the directory outside the repository. Tests verify flag/env precedence and platform-specific paths. |
| CLI integration and runtime config generation: src/cli.ts | CLI calls resolveDataDir() early and sets AGENTMEMORY_DATA_DIR environment variable. New helpers render a runtime iii-config.yaml with file paths rewritten to the resolved data directory. startEngine() writes the rendered config, warns if the default was relocated from a git worktree, and ensures the data directory exists. Help text documents the new environment variable, and import-jsonl argument parsing treats --data-dir as a value-taking flag. |
| Docker Compose volume mapping: docker-compose.yml | Both iii-init and iii-engine services now map volumes using ${AGENTMEMORY_DATA_DIR:-iii-data}:/data, allowing the CLI to override the volume path via environment variable while preserving the named volume fallback. |
| User documentation: README.md | Explains default data directory locations across platforms and how to override using --data-dir flag or AGENTMEMORY_DATA_DIR environment variable, with example usage. |
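
To illustrate what "treats --data-dir as a value-taking flag" means for argument parsing, here is a small sketch. It is not taken from src/cli.ts: extractDataDirFlag is a hypothetical name, and support for the --data-dir=<path> form is an assumption, since the PR only documents --data-dir <path>.

```typescript
// Hypothetical parser (not from the PR): pulls --data-dir out of argv and
// returns the remaining tokens, so the path is never read as a positional.
export function extractDataDirFlag(
  args: readonly string[],
): { dataDir: string | undefined; rest: string[] } {
  const rest: string[] = [];
  let dataDir: string | undefined;
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === "--data-dir") {
      // Consume the next token as the value.
      const value = args[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error("--data-dir requires a path");
      }
      dataDir = value;
      i++; // skip the value we just consumed
    } else if (arg.startsWith("--data-dir=")) {
      // Assumed convenience form; the PR may not accept it.
      dataDir = arg.slice("--data-dir=".length);
    } else {
      rest.push(arg);
    }
  }
  return { dataDir, rest };
}
```

Without the value-taking branch, a command such as import-jsonl --data-dir /tmp/am file.jsonl would see /tmp/am as the first positional argument instead of file.jsonl.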

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A worktree shall not be cursed,

With data/ scattered and immersed!

Flag and env now take the lead,

XDG helpers plant the seed.

From ~/.local/ state runs free,

Your repos clean as they should be! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a --data-dir flag and AGENTMEMORY_DATA_DIR to resolve the issue of engine state being written to the caller's working directory. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses all coding requirements from issue #303: implements --data-dir flag, AGENTMEMORY_DATA_DIR env var, platform-specific defaults, auto-relocation from git repos, and includes tests and documentation. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to implementing the data directory resolution feature: CLI parsing, data directory resolution logic, Docker Compose config, README documentation, and tests. No unrelated modifications detected. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/cli-data-dir.test.ts (1)

1-73: ⚡ Quick win

Add test coverage for git relocation behavior.

The test suite doesn't verify the relocatedFrom field that's returned when the default data directory is inside a git worktree. Adding tests for this behavior would have caught the logic bug in src/cli-data-dir.ts lines 99-110.

📝 Example test to add
it("relocates default when cwd is inside a git repo", () => {
  const cwd = "/repo/project";
  const home = "/home/alex";
  const resolved = resolveDataDir({
    args: [],
    env: {},
    cwd,
    home,
    platform: "linux",
  });

  // Should relocate away from XDG default when inside a git repo
  expect(resolved.source).toBe("default");
  expect(resolved.relocatedFrom).toBeDefined();
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cli-data-dir.test.ts` around lines 1 - 73, The tests are missing
coverage for the git relocation behavior: add a test calling resolveDataDir with
no args/env and cwd inside a git worktree to assert source === "default" and
that relocatedFrom is defined; if that test fails, fix the relocation logic
inside resolveDataDir so that when the computed default dataDir would live
inside the current cwd (or git worktree) the function moves it out and sets the
relocatedFrom property to the original path (ensure the branch that computes the
alternate dataDir assigns relocatedFrom before returning). Target the
resolveDataDir function and the relocatedFrom field to locate and update the
code and add the corresponding test case to the test suite.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e47788fc-09f1-41a2-a834-a3ab49a9e16e

📥 Commits

Reviewing files that changed from the base of the PR and between 48b11e8 and fb387c8.

📒 Files selected for processing (5)
  • README.md
  • docker-compose.yml
  • src/cli-data-dir.ts
  • src/cli.ts
  • test/cli-data-dir.test.ts

Comment thread src/cli-data-dir.ts
Comment on lines +99 to +110
  const defaultDir = platformDefaultDataDir(env, home, nodePlatform);
  const gitParent = nearestGitParent(defaultDir);
  if (gitParent) {
    const relocated = platformDefaultDataDir(env, home, nodePlatform, false);
    if (relocated !== defaultDir) {
      return {
        dataDir: relocated,
        source: "default",
        relocatedFrom: defaultDir,
      };
    }
  }

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Git relocation logic checks the wrong path.

Lines 100-110 check whether the platform default data directory is inside a git repo (nearestGitParent(defaultDir)), but the PR objective states "auto-relocate when started inside a git repo" — meaning when the user's current working directory is inside a git repo.

The logic should check nearestGitParent(cwd) instead of nearestGitParent(defaultDir). Currently, relocation only happens if the data directory default (e.g., ~/.local/share/agentmemory) itself is inside a git repo, which is almost never the case.

🐛 Proposed fix
   const defaultDir = platformDefaultDataDir(env, home, nodePlatform);
-  const gitParent = nearestGitParent(defaultDir);
+  const gitParent = nearestGitParent(cwd);
   if (gitParent) {
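
To make the difference between the two arguments concrete, here is a sketch of how a helper like nearestGitParent might walk up from a starting directory. The PR's real helper is not shown in this thread, so the signature below is an assumption; in particular, the injected exists callback is hypothetical and is there only so the walk can run without touching the filesystem.

```typescript
import { posix as path } from "node:path";

// Hypothetical stand-in for the PR's nearestGitParent; the real one may differ.
// Returns the closest ancestor (or the directory itself) that contains ".git".
export function nearestGitParent(
  start: string,
  exists: (p: string) => boolean,
): string | undefined {
  let dir = path.resolve(start);
  for (;;) {
    if (exists(path.join(dir, ".git"))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}

// Fake filesystem for illustration: only /repo is a git checkout.
const onlyRepoHasGit = (p: string): boolean => p === "/repo/.git";

// Starting from the cwd finds the repo the user is standing in.
export const fromCwd = nearestGitParent("/repo/project", onlyRepoHasGit);
// Starting from the platform default finds nothing, so relocation never fires.
export const fromDefault = nearestGitParent(
  "/home/alex/.local/share/agentmemory",
  onlyRepoHasGit,
);
```

In real use the callback could be existsSync from node:fs, which returns true whether .git is a directory or a file, so linked worktrees would be detected as well.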
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli-data-dir.ts` around lines 99 - 110, The relocation check is using
nearestGitParent(defaultDir) but should detect if the current working directory
is inside a git repo; change the condition to call nearestGitParent(cwd) instead
of nearestGitParent(defaultDir) so that the block which computes relocated via
platformDefaultDataDir(env, home, nodePlatform, false) and returns { dataDir:
relocated, source: "default", relocatedFrom: defaultDir } executes when the
process is started inside a git repository; update any variable references
around that if needed to keep the existing logic (defaultDir, relocated) intact.

Development

Successfully merging this pull request may close these issues.

Engine writes data/ to caller's cwd — pollutes user repos when started from project root
