Skip to content

fix: #12 daemon OS keychain support (multi-platform, wallet-based namespacing)#24

Merged
hanwencheng merged 12 commits intomainfrom
fix/issue-12
Apr 15, 2026
Merged

fix: #12 daemon OS keychain support (multi-platform, wallet-based namespacing)#24
hanwencheng merged 12 commits intomainfrom
fix/issue-12

Conversation

@hanwencheng
Copy link
Copy Markdown
Member

@hanwencheng hanwencheng commented Apr 14, 2026

Summary

Daemon now uses the OS keychain when available (macOS Keychain, Linux secret-service via keyring crate) with file fallback. Multiple daemons on one machine no longer overwrite each other.

What changed

Shared session_store in agentkeys-core

  • session_store moved from agentkeys-cli to agentkeys-core.
  • API: save_session(session, session_id), load_session(session_id), clear_session(session_id).
  • Keychain: service="agentkeys", account=<session_id>.
  • File fallback: ~/.agentkeys/<session_id>/session.json (0600).

CLI

  • Uses session_id = "master".

Daemon

  • Pre-pair: session_id = "daemon-pending".
  • Post-pair / recover: session_id = format!("daemon-{}", wallet.0).

Env override

  • AGENTKEYS_SESSION_STORE=file still forces file-only (CI/Docker/headless).

Test plan

  • cargo test -p agentkeys-core -- --test-threads=1 → 10 passed (includes 4 new session_store tests)
  • cargo test -p agentkeys-cli → 14 passed
  • cargo test -p agentkeys-daemon → 28 passed (13 + 15)
  • cargo test -p agentkeys-mock-server → 37 passed
  • Known nit: session_store tests are not parallel-safe due to shared keychain state; run with --test-threads=1 or add serial_test as a follow-up.
  • Claude reviewer: running at PR-open
  • Codex: deferred (codex exec issue)

Follow-ups

  • Add serial_test crate to isolate keychain-touching tests for parallel execution.
  • Update docs/spec/architecture.md + wiki/key-security.md storage table (separate docs-only PR).

Issue

Closes #12.

🤖 Generated with Claude Code


Codex review (2026-04-14): ⚠️ Reviewed + fixed in 2 rounds.

  • Round 1 (commit 6553041): daemon-pending sentinel removed + list_fallback_session_ids + legacy-master helper.
  • Round 2 (commit 9ef328e): discovery-marker files for keyring-only sessions + ctx.load_session() wired to legacy fallback.
    52 tests across cli/core/daemon crates all pass.

@hanwencheng
Copy link
Copy Markdown
Member Author

Codex review (via gstack /codex skill, GPT-5.4 codex-high reasoning against origin/main)

Verdict

The daemon's new startup logic cannot reliably resume persisted sessions and can get stuck on a synthetic pending entry after failed pairing attempts. The storage migration also drops compatibility with existing CLI sessions, so the patch introduces user-visible regressions.

Full review comments:

  • [P1] Reuse the saved daemon session on default startup — crates/agentkeys-daemon/src/main.rs:105-108
    When the daemon starts without --session-id, this code only probes daemon-pending. A successful pair/recover stores the real session under daemon-<wallet>, so the next restart never finds it and falls back into pairing again unless the caller manually supplies the derived ID every time. That breaks the default persistence path for the new wallet-namespaced storage.

  • [P1] Clear the pending sentinel when pairing fails — crates/agentkeys-daemon/src/main.rs:115-120
    This writes a synthetic daemon-pending session before pairing, but it is only removed on the success path. If run_pair_flow() times out, is rejected, or otherwise errors, the next startup will successfully load that fake token="pending" session at line 108 and skip pairing entirely, leaving the daemon unusable until the user manually deletes the entry.

  • [P2] Fall back to legacy master session locations — crates/agentkeys-core/src/session_store.rs:42-48
    load_session("master") now only checks the new master namespace. Existing installs store the CLI session under the old keyring account/path (account=session or ~/.agentkeys/session.json), so after upgrading, commands that rely on ctx.load_session() behave as logged out until the user re-runs init or recover. A compatibility fallback or one-time migration is needed to avoid invalidating current logins.

— codex review --base main + -c 'model_reasoning_effort="high"'.

hanwencheng added a commit that referenced this pull request Apr 14, 2026
Three findings from the combined codex+claude review on PR #24, applied
in-branch per human design call:

**P1 — Nondeterministic daemon selection (codex)**
`daemon/main.rs`: when no --session-id is given and >1 `daemon-*`
session directory exists, error with the candidate list instead of
silently restoring an arbitrary wallet (cross-wallet credential
mix-up risk on multi-daemon hosts). Single-session and explicit
--session-id paths are unchanged.
`session_store.rs::list_fallback_session_ids` now sorts results
alphabetically so the enumeration order is deterministic across
runs and filesystems.

**P1 — Windows/Linux keychain account-name sanitization (claude)**
New `sanitize_for_keyring(session_id)` helper:
- ASCII alnum + `-_.` preserved as-is
- other chars (including null bytes, slashes, non-UTF8) replaced with `_`
- if sanitization changes the value OR length >128, append an 8-char
  hash of the original to preserve uniqueness across alias collisions
Applied in all three `try_keyring_*` helpers so Windows
Credential Manager (rejects null bytes) and Linux secret-service
(rejects non-UTF8) don't crash or silently truncate-and-collide.

**P2 — Marker file overwriting real file fallback (both reviewers)**
Keyring-managed sessions now drop an empty `.keyring_managed` marker
file instead of overwriting `session.json` with a JSON stub. The
fallback invariant is preserved: switching `AGENTKEYS_SESSION_STORE`
between `file` and keyring no longer destroys the real file-mode
credential. `list_fallback_session_ids` matches either `session.json`
or `.keyring_managed`. `clear_session` removes both.

Tests (5 new in session_store):
- list_fallback_session_ids_is_sorted
- sanitize_for_keyring_preserves_ascii_alnum_and_safe_punctuation
- sanitize_for_keyring_replaces_unsafe_chars_and_appends_hash
- sanitize_for_keyring_truncates_oversized_input
- file_mode_save_writes_session_json_not_marker
- list_fallback_session_ids_finds_marker_only_directories

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 16 passed; daemon: 13+15 passed; cli: 14 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hanwencheng
Copy link
Copy Markdown
Member Author

Combined codex+claude review fixes landed (d5a40de)

Per the design call, three findings fixed in-branch. The "upgrade from legacy pre-#12 users" finding was dismissed (no users yet).

P1 — Nondeterministic daemon selection (daemon/main.rs)
When no --session-id is given and >1 daemon-* directory exists, the daemon now errors with the candidate list instead of silently restoring an arbitrary wallet. Single-session and explicit --session-id paths unchanged. list_fallback_session_ids also sorts alphabetically so enumeration is deterministic.

P1 — Windows/Linux keychain account-name sanitization (session_store.rs)
New sanitize_for_keyring(session_id) helper: ASCII alnum + -_. preserved; unsafe chars (null bytes, slashes, non-UTF8) replaced with _; if sanitization changes the value OR length >128, append an 8-char hash of the original for uniqueness. Applied to all three try_keyring_* helpers so Windows Credential Manager and Linux secret-service don't crash or silently truncate-and-collide.

P2 — Marker file overwriting real file fallback (session_store.rs)
Keyring-managed sessions now drop an empty .keyring_managed marker file instead of overwriting session.json with a JSON stub. Toggling AGENTKEYS_SESSION_STORE=file ↔ keyring no longer destroys the real file-mode credential. list_fallback_session_ids matches either session.json or .keyring_managed. clear_session removes both.

6 new tests in session_store:

  • list_fallback_session_ids_is_sorted
  • sanitize_for_keyring_preserves_ascii_alnum_and_safe_punctuation
  • sanitize_for_keyring_replaces_unsafe_chars_and_appends_hash
  • sanitize_for_keyring_truncates_oversized_input
  • file_mode_save_writes_session_json_not_marker
  • list_fallback_session_ids_finds_marker_only_directories
cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 16 passed; daemon: 13+15 passed; cli: 14 passed

Re-running codex review --base main to confirm.

hanwencheng added a commit that referenced this pull request Apr 14, 2026
…r propagation

Two findings from codex re-review of PR #24:

**P2 — Path traversal via --session-id**
`fallback_path` and `marker_path` now route `session_id` through
`sanitize_for_keyring` before joining into the filesystem path. Inputs
like `../escape`, `foo/bar`, or absolute paths can no longer escape
~/.agentkeys. Sanitization is idempotent, so callers that enumerate
via `list_fallback_session_ids` and feed ids back into save/load/clear
still match exactly.

**P2 — Marker-file write failure silently swallowed**
`save_session` now propagates errors from `create_dir_all` on the
marker's parent and from `File::create` on the marker itself. If the
keyring save succeeds but the marker can't land (read-only HOME,
permission error), the caller now sees the error instead of a false
Ok(()) — the session would otherwise be in the keyring but
undiscoverable on restart.

The "preserve pre-#12 daemon session on upgrade" finding remains
dismissed per human design call on PR #24 (no users yet).

Tests (2 new):
- save_session_does_not_escape_agentkeys_dir_on_path_traversal
- save_session_rejects_forward_slash_in_session_id

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 18 passed; daemon: 13+15 passed; cli: 14 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hanwencheng
Copy link
Copy Markdown
Member Author

Codex v2 P2s landed (8a84480)

Two more legitimate findings from the re-review, both fixed in-branch:

P2 — Path traversal via --session-id (session_store.rs)
fallback_path and marker_path now route session_id through sanitize_for_keyring before joining into the filesystem path. Inputs like ../escape or foo/bar get normalised to _ + hash suffix so they can't escape ~/.agentkeys. Sanitization is idempotent, so the list → save/load round-trip stays consistent.

P2 — Marker-file write failure silently swallowed (session_store.rs)
save_session now propagates errors from create_dir_all(parent) and File::create(marker). If keyring save succeeds but the marker can't land (read-only HOME, permission error), the caller sees the error instead of a false Ok(()) — without the marker the session would be in the keyring but undiscoverable on restart.

The third finding ("preserve pre-#12 daemon session on upgrade") remains dismissed — no users yet, so there's no legacy state to preserve.

2 new regression tests:

  • save_session_does_not_escape_agentkeys_dir_on_path_traversal
  • save_session_rejects_forward_slash_in_session_id
cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 18 passed; daemon: 13+15 passed; cli: 14 passed

Re-running codex review --base main to confirm.

hanwencheng added a commit that referenced this pull request Apr 14, 2026
…h suffix

`std::collections::hash_map::DefaultHasher` is explicitly NOT guaranteed
stable across Rust/toolchain versions. Using it as a persistence key
would make sanitized session_ids unreachable after a compiler upgrade:
the directory / keyring account name would silently change and lookups
would miss.

Swap to SHA-256 via the `sha2` crate (already in-tree) and take the
first 4 bytes as an 8-char hex suffix. `hex` is already a declared
dep. Added a pinned-value regression test
(`sanitize_for_keyring_uses_stable_sha256_suffix`) that asserts
sanitize_for_keyring("foo/bar") == "foo_bar-cc5d46bd" so any future
change to the hash pipeline fails loudly in CI.

The codex v3 P1 "preserve legacy daemon session file" finding remains
dismissed per the human design call on PR #24 — no users yet, no
legacy state to preserve.

Test: cargo test -p agentkeys-core session_store -- --test-threads=1
13 passed; 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hanwencheng
Copy link
Copy Markdown
Member Author

Codex v3 P2 landed (1c930c5) — P1 dismissed (no-users rationale)

P2 — DefaultHasher not stable across Rust versions (session_store.rs)
Legitimate — std::collections::hash_map::DefaultHasher is documented as NOT a stable persistence format. Swapped to SHA-256 (first 4 bytes → 8 hex chars) via the sha2 crate already in-tree. hex is already a declared dep. Added a pinned-value regression test so any future change to the hash pipeline fails loudly in CI:

assert_eq!(sanitize_for_keyring("foo/bar"), "foo_bar-cc5d46bd");

P1 — Preserve legacy daemon session file on upgrade — DISMISSED again
This is the same finding from the v1 review. The project has no users yet (pre-launch), so there's no legacy ~/.agentkeys/session state to preserve. Adding a legacy-fallback codepath to the daemon would be speculative work for a scenario that can't exist. If/when we ship and need to migrate existing installs, we'll add it as a dedicated one-shot migration commit with a CHANGELOG entry.

cargo test -p agentkeys-core session_store -- --test-threads=1
13 passed

Re-running codex review; expect no new actionable findings.

@hanwencheng
Copy link
Copy Markdown
Member Author

Codex v4 P2s landed (b9e1621)

P2 — Marker best-effort (session_store.rs)
v5 over-corrected by propagating marker-write errors, which would fail agentkeys init / recover on hosts with a working keychain but a read-only HOME. Reverted to best-effort: if marker fails after a successful keyring save, print a warning to stderr and return Ok. Direct-load callers (known session_id like master) are unaffected; only prefix-scan discovery (daemon restart) degrades.

P2 — Candidate validation before ambiguity (daemon/main.rs)
Previously list.len() > 1 aborted before any load was attempted — one stale .keyring_managed directory could block startup even with exactly one valid session. New flow: enumerate → try to load each → ambiguity error only if >1 loadable sessions. Message unchanged for the genuinely-ambiguous case.

P1 — Preserve legacy daemon session — still dismissed (no users yet, pre-launch). Recorded in commit message for future reference; if/when we ship, we'll add a dedicated migration commit with CHANGELOG.

cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 19 passed; daemon: 13+15 passed; cli: 14 passed

Re-running codex; expect only the already-dismissed P1.

@hanwencheng
Copy link
Copy Markdown
Member Author

Codex v5 P2s landed (a3a4218)

P2 — cmd_init/cmd_recover honored ctx.session_id
Replaced the hardcoded "master" save with &ctx.session_id. load_session already used ctx.session_id; init/recover now match, so any future multi-session CLI caller gets consistent save/load round-trips.

P2 — Rediscovery hint for custom --session-id sessions
On daemon restart without --session-id and no daemon-* matches, emit a stderr hint listing any non-daemon- non-master sessions under ~/.agentkeys/: "no daemon-* sessions found, but these custom session IDs exist: ...; pass --session-id to resume one". Pair flow still proceeds normally — the hint just nudges the user. No persistence pointer added (would grow scope).

The codex P1 "preserve legacy daemon session" remains dismissed per human call (no users).

cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 19 passed; daemon: 13+15 passed; cli: 14 passed

Expect this codex pass (v6) to only surface the already-dismissed P1.

@hanwencheng
Copy link
Copy Markdown
Member Author

Codex v6 P1+P2 landed (74dc814)

P1 — Reserved path components escaping ~/.agentkeys
sanitize_for_keyring previously returned "." / ".." / "" unchanged (. is in the whitelist). That let --session-id .. resolve to $HOME/session.json (outside the namespaced store) and empty session_id to the legacy root file. Now those three values are force-rewritten under the reserved __agk_ namespace.

P2 — Rewrite / user-id collision
Previously sanitize_for_keyring("foo/bar") == "foo_bar-cc5d46bd", and a user could pass --session-id foo_bar-cc5d46bd to land on the same keyring account. Introduced a __agk_ reserved prefix: all rewrites live under it, and user inputs that start with __agk_ are themselves pushed through the rewrite path so they end up at __agk___agk_foo_bar-cc5d46bd-<nested-hash> — distinct from the original rewrite.

The reserved prefix uses only characters that are valid across Windows Credential Manager, Linux secret-service, and macOS Keychain, so the output remains a legal account name + filesystem directory on all three platforms.

3 new/updated regression tests:

  • sanitize_for_keyring_rejects_reserved_path_components
  • sanitize_for_keyring_rewrites_prefix_collisions
  • sanitize_for_keyring_uses_stable_sha256_suffix pin updated to __agk_foo_bar-cc5d46bd
cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 21 passed; daemon: 13+15 passed; cli: 14 passed

Re-running codex.

@hanwencheng
Copy link
Copy Markdown
Member Author

Codex v7 P2 landed (d150d87) — closing out review cycle

P2 — Hint advertised unresumeable sanitized IDs
After v9 introduced the __agk_ rewrite prefix, the restart hint at daemon/main.rs:143-154 could print directory names like __agk_foo_bar-cc5d46bd as resumable --session-id candidates. Feeding that string back triggers another rewrite (any input starting with __agk_ is itself pushed through), landing on a different storage key. Filtered rewrite-prefixed entries out of the hint set. Non-sanitized custom IDs (e.g., --session-id WORK) still appear and are resumable as before.

Remaining codex finding: "preserve pre-#12 legacy daemon session file on upgrade" — dismissed across 3 codex passes. No users yet (pre-launch project), so no legacy ~/.agentkeys/session state exists. If/when we ship and need to migrate existing installs, we'll add a one-shot migration commit with CHANGELOG.

Summary of the PR #24 review cycle

7 codex passes, 6 with new legitimate findings, all fixed in-branch:

v Finding Resolution
v1 P1 daemon startup legacy + nondeterministic selection + marker overwrite 2 P1 fixed in v5; 1 dismissed (no users)
v2 P2 path traversal + marker propagation + legacy 2 P2 fixed in v5; legacy dismissed
v3 P2 DefaultHasher not stable across Rust versions Fixed in v6: SHA-256 via sha2 crate
v4 P2 marker-write too strict + stale markers block startup Fixed in v7: best-effort marker + candidate validation
v5 P2 cmd_init/cmd_recover ignored ctx.session_id + rediscovery hint Fixed in v8
v6 P1 reserved path components + P2 rewrite collision Fixed in v9: __agk_ reserved prefix + reject "."/".."/""
v7 P2 hint shows unresumeable sanitized IDs Fixed in v10 (this commit)
cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 21 passed; daemon: 13+15 passed; cli: 14 passed

PR is ready for merge.

hanwencheng and others added 11 commits April 15, 2026 02:56
…espacing)

Closes #12.

Moves `session_store` from agentkeys-cli to agentkeys-core as a shared
module. Adds a `session_id` parameter so both master CLI sessions and
per-daemon sessions get their own keychain/file slot.

- Master CLI: `session_id = "master"` (keychain entry: service=agentkeys,
  account=master; file fallback: ~/.agentkeys/master/session.json).
- Daemon: `session_id = format!("daemon-{}", wallet)` after pair/recover,
  "daemon-pending" before. Two daemons on one machine no longer collide.
- `AGENTKEYS_SESSION_STORE=file` still forces file-only for CI/Docker.

Tests: 4 new session_store tests (save/load roundtrip master, daemon-wallet,
two-daemons-no-collision, clear-session-per-id). Full suite 89 tests green
under `--test-threads=1` (keyring access is not parallel-safe by design).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex P1 (daemon can't resume saved session): default startup only
probed 'daemon-pending', so successful pair/recover that saved under
'daemon-<wallet>' was never loaded on next start. Now scans all
'daemon-*' fallback entries via new session_store::list_fallback_session_ids
and tries each.

Codex P1 (pending sentinel sticks on failed pair): old code wrote a
fake session with token='pending' before pair attempt; if pair failed,
the next startup 'successfully' loaded the fake and skipped pairing
entirely. Removed the sentinel entirely — save only after pair/recover
succeeds and the wallet is known.

Codex P2 (legacy master fallback): added
session_store::load_session_with_legacy_fallback() that tries the new
namespace first, then the pre-#12 'session' keyring account /
~/.agentkeys/session.json file. Existing installs stay logged in
across the upgrade. (Wire-up of CLI's CommandContext to use this
helper deferred to a follow-up — the helper is available and tested.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex v2 P1 (keyring-only sessions invisible): list_fallback_session_ids
only scanned the file-fallback directory, so daemon sessions saved via
the OS keyring (default macOS) were invisible on restart. save_session
now drops a tiny discovery-marker file (`session.json` containing
`{"_keyring_managed":true}`) when the keyring path was used, so
list_fallback_session_ids finds the session_id and load_session can
then go to the keyring for the real credential.

Codex v2 P2 (legacy master not loaded): CommandContext::load_session
now calls session_store::load_session_with_legacy_fallback so pre-#12
installs (keyring account=session or ~/.agentkeys/session.json) stay
logged in after the wallet-namespaced layout upgrade.

52 tests green across cli/core/daemon crates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three findings from the combined codex+claude review on PR #24, applied
in-branch per human design call:

**P1 — Nondeterministic daemon selection (codex)**
`daemon/main.rs`: when no --session-id is given and >1 `daemon-*`
session directory exists, error with the candidate list instead of
silently restoring an arbitrary wallet (cross-wallet credential
mix-up risk on multi-daemon hosts). Single-session and explicit
--session-id paths are unchanged.
`session_store.rs::list_fallback_session_ids` now sorts results
alphabetically so the enumeration order is deterministic across
runs and filesystems.

**P1 — Windows/Linux keychain account-name sanitization (claude)**
New `sanitize_for_keyring(session_id)` helper:
- ASCII alnum + `-_.` preserved as-is
- other chars (including null bytes, slashes, non-UTF8) replaced with `_`
- if sanitization changes the value OR length >128, append an 8-char
  hash of the original to preserve uniqueness across alias collisions
Applied in all three `try_keyring_*` helpers so Windows
Credential Manager (rejects null bytes) and Linux secret-service
(rejects non-UTF8) don't crash or silently truncate-and-collide.

**P2 — Marker file overwriting real file fallback (both reviewers)**
Keyring-managed sessions now drop an empty `.keyring_managed` marker
file instead of overwriting `session.json` with a JSON stub. The
fallback invariant is preserved: switching `AGENTKEYS_SESSION_STORE`
between `file` and keyring no longer destroys the real file-mode
credential. `list_fallback_session_ids` matches either `session.json`
or `.keyring_managed`. `clear_session` removes both.

Tests (5 new in session_store):
- list_fallback_session_ids_is_sorted
- sanitize_for_keyring_preserves_ascii_alnum_and_safe_punctuation
- sanitize_for_keyring_replaces_unsafe_chars_and_appends_hash
- sanitize_for_keyring_truncates_oversized_input
- file_mode_save_writes_session_json_not_marker
- list_fallback_session_ids_finds_marker_only_directories

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 16 passed; daemon: 13+15 passed; cli: 14 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r propagation

Two findings from codex re-review of PR #24:

**P2 — Path traversal via --session-id**
`fallback_path` and `marker_path` now route `session_id` through
`sanitize_for_keyring` before joining into the filesystem path. Inputs
like `../escape`, `foo/bar`, or absolute paths can no longer escape
~/.agentkeys. Sanitization is idempotent, so callers that enumerate
via `list_fallback_session_ids` and feed ids back into save/load/clear
still match exactly.

**P2 — Marker-file write failure silently swallowed**
`save_session` now propagates errors from `create_dir_all` on the
marker's parent and from `File::create` on the marker itself. If the
keyring save succeeds but the marker can't land (read-only HOME,
permission error), the caller now sees the error instead of a false
Ok(()) — the session would otherwise be in the keyring but
undiscoverable on restart.

The "preserve pre-#12 daemon session on upgrade" finding remains
dismissed per human design call on PR #24 (no users yet).

Tests (2 new):
- save_session_does_not_escape_agentkeys_dir_on_path_traversal
- save_session_rejects_forward_slash_in_session_id

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 18 passed; daemon: 13+15 passed; cli: 14 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…h suffix

`std::collections::hash_map::DefaultHasher` is explicitly NOT guaranteed
stable across Rust/toolchain versions. Using it as a persistence key
would make sanitized session_ids unreachable after a compiler upgrade:
the directory / keyring account name would silently change and lookups
would miss.

Swap to SHA-256 via the `sha2` crate (already in-tree) and take the
first 4 bytes as an 8-char hex suffix. `hex` is already a declared
dep. Added a pinned-value regression test
(`sanitize_for_keyring_uses_stable_sha256_suffix`) that asserts
sanitize_for_keyring("foo/bar") == "foo_bar-cc5d46bd" so any future
change to the hash pipeline fails loudly in CI.

The codex v3 P1 "preserve legacy daemon session file" finding remains
dismissed per the human design call on PR #24 — no users yet, no
legacy state to preserve.

Test: cargo test -p agentkeys-core session_store -- --test-threads=1
13 passed; 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ate validation

Two findings from codex v4 review:

**P2 — Marker-file failure too strict**
v5 propagated marker write errors, which regressed direct-load callers
(`agentkeys init`/`recover` saving `master`). Those paths don't rely
on prefix-scan discovery; they look up by known session_id. Revert to
best-effort: if `create_dir_all` or `File::create` fails after a
successful keyring save, print a warning to stderr and return Ok.
Keyring entry is authoritative for direct loads; only prefix-scan
discovery (daemon restart) is degraded.

**P2 — Stale markers block daemon startup**
The `list.len() > 1` ambiguity check fired before load validation, so
a single stale `.keyring_managed` directory (e.g., keyring entry
deleted out-of-band) could block startup even when exactly one real
session still loaded. New flow:
  1. Enumerate candidates via list_fallback_session_ids (sorted).
  2. Attempt load_session for each; keep only successes.
  3. 0 loadable → pair flow. 1 loadable → use it. >1 loadable → error.
The ambiguity error message is unchanged for the genuinely-ambiguous
case ("multiple loadable daemon sessions found... Candidates: ...").

The codex v4 P1 "preserve legacy daemon session file" remains dismissed
per human design call (no users yet).

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 19 passed; daemon: 13+15 passed; cli: 14 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…covery hint

**P2 — cmd_init/cmd_recover ignored ctx.session_id**
CommandContext carries `session_id` (defaults to "master", future-proofed
for multi-session CLI), and `load_session` already reads from that
namespace — but `cmd_init` and `cmd_recover` hardcoded "master" for the
save. Any caller overriding `session_id` would succeed init but the
next command (which reads from the overridden namespace) would miss the
write. Replace the hardcoded string with `&ctx.session_id`.

**P2 — Custom daemon session rediscovery hint**
If a user pairs with `--session-id WORK`, restarts without the flag,
the `daemon-*` scan misses the stored session and the daemon silently
re-pairs (new wallet, previously stored credentials become unreachable
from this daemon). Not fixable without a persistence pointer that
would grow the scope, but we now print a hint on stderr when the scan
returns zero `daemon-*` entries AND non-`daemon-` / non-`master`
sessions exist under `~/.agentkeys/`: "no daemon-* sessions found, but
these custom session IDs exist: ...; pass --session-id <name> to
resume one". Pair flow proceeds unchanged; the hint just nudges the
user toward the right command.

The codex v5 P1 "preserve legacy daemon session file" remains dismissed
per human design call (no users yet).

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 19 passed; daemon: 13+15 passed; cli: 14 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**P1 — Reserved path components escaping ~/.agentkeys**
sanitize_for_keyring previously returned "." / ".." / "" unchanged
because "." is allowed in the whitelist. That let `--session-id ..`
resolve to `$HOME/session.json` (outside the namespaced store), and
an empty session_id resolve to the legacy root file. Now the
accept-as-is rule explicitly rejects the empty string, ".", and ".."
and force-rewrites them into the reserved `__agk_` namespace.

**P2 — Rewritten IDs collide with user-supplied safe IDs**
Previously `sanitize_for_keyring("foo/bar") == "foo_bar-cc5d46bd"` but
a user could pass `--session-id foo_bar-cc5d46bd` and land on the
same keyring account + fallback directory. Introduce a `__agk_`
reserved prefix — all rewrites live under it, and any user-supplied
id that starts with `__agk_` is itself pushed through the rewrite
path so the output is `__agk___agk_foo_bar-cc5d46bd-<nested-hash>`,
distinct from the original rewrite.

The reserved prefix uses only characters that are valid across Windows
Credential Manager, Linux secret-service, and macOS Keychain, so the
output remains a legal account name and filesystem directory on all
three platforms.

3 new/updated regression tests:
- sanitize_for_keyring_rejects_reserved_path_components
- sanitize_for_keyring_rewrites_prefix_collisions
- existing `sanitize_for_keyring_uses_stable_sha256_suffix` pin now
  expects `__agk_foo_bar-cc5d46bd`

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 21 passed; daemon: 13+15 passed; cli: 14 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d IDs

After v9 introduced the `__agk_` rewrite prefix, the daemon restart
hint at main.rs:143-154 could print directory names like
`__agk_foo_bar-cc5d46bd` as resumable `--session-id` candidates. But
feeding that string back into load_session triggers another rewrite
(because sanitize_for_keyring re-rewrites any input starting with
`__agk_`), landing on a DIFFERENT storage key. The hint was
advertising unresumeable candidates.

Filter rewrite-prefixed entries out of the hint set. They're only
reachable via the original unsanitized ID that produced them, which
the user must remember on their own — the daemon has no way to
invert the hash. Users with non-sanitized custom IDs (e.g.
`--session-id WORK`) still see the hint and can resume as before.

The codex v7 P1 "preserve legacy daemon session" remains dismissed
(no users yet). Closing out this review cycle — 7 codex passes, 6
pass with new legit findings all fixed in-branch, pass 7 only
surfaces the already-dismissed P1 plus this filter tweak.

Test: cargo test -p agentkeys-daemon -- --test-threads=1
daemon: 13+15 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rebase of fix/issue-12 onto current main (post-PR #18 #19 #21 #23 merges)
surfaced two concrete integration issues, fixed in this patch:

**1. Expose session_store helpers for test reuse**
- Make `agentkeys_core::session_store::fallback_path(session_id)` public
  so CLI tests can assert on the on-disk path layout.
- Re-export `agentkeys_core::session_store` at `agentkeys_cli::session_store`
  so existing test imports (`use agentkeys_cli::session_store;`) keep
  working after the old cli-local `session_store.rs` was deleted upstream.

**2. cmd_revoke clear_session calls (post-PR #18 self-revoke merge)**
PR #18 added `session_store::clear_session()` calls on self-revoke paths.
The new signature takes `session_id`, not 0 args. Pass `&ctx.session_id`
so the revoke wipes the correct namespace for any future multi-session
CLI caller.

**3. Integration tests: seed identity_links for Recover**
PR #21 tightened `resolve_identity_typed` to require the identity to
exist in `identity_links` before resolution. Two pair_tests
(`recover_full_loop`, `recover_credentials_intact`) opened Recover
requests with aliases that were never linked, so they now 404.
Added `InProcessBackend::link_identity_for_tests(identity_type,
identity_value, wallet)` that inserts directly into the mock DB's
`identity_links` table. The two failing tests now seed the alias link
before the Recover flow.

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -p agentkeys-mock-server -- --test-threads=1
core: 21; daemon: 13 + 15; cli: 30; mock-server: 56 — all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r_session

`try_keyring_delete` was fire-and-forget: spawned a detached thread,
didn't wait, `clear_session` returned Ok before the delete ran. On
keyring-backed installs `cmd_revoke`/self-revoke would report "session
wiped", the process would exit, and the next command could still
load the stale session from the OS keychain.

Mirror the `try_keyring_save`/`try_keyring_load` pattern: spawn the
thread, wait up to 2 seconds via `recv_timeout`, return a bool.
`keyring::Error::NoEntry` is treated as success (the intent is
"nothing of this name should remain"). On timeout or any other
error, `clear_session` now returns an anyhow error so the caller
surfaces a clear message instead of a false success.

Test: `clear_session_is_synchronous_in_file_mode` asserts the delete
happens synchronously — immediately after clear_session returns,
load_session must fail without any sleep/retry.

The codex v8 P2 "preserve pre-#12 legacy daemon session" remains
dismissed per the design call (no users yet).

Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 22 passed; daemon: 13+15 passed; cli: 30 passed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hanwencheng
Copy link
Copy Markdown
Member Author

Codex v8 P1 landed (0e2cc5f)

P1 — try_keyring_delete was fire-and-forget
Thread was detached with no timeout/join. clear_session returned Ok before the delete ran, so on keyring-backed installs cmd_revoke would report "session wiped" but the next command could still load the stale session from the OS keychain.

Fix mirrors try_keyring_save / try_keyring_load: spawn → recv_timeout(2s) → return bool. keyring::Error::NoEntry is treated as success (intent is "nothing of this name should remain"). On timeout or any other error, clear_session returns an anyhow error so the caller surfaces a clear failure instead of a false success.

Test: clear_session_is_synchronous_in_file_mode — immediately after clear_session returns, load_session must fail without any sleep/retry.

P2 — legacy daemon session remains dismissed (no users yet).

cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 22 passed; daemon: 13+15 passed; cli: 30 passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Daemon should use OS keychain when available (Mac mini, MacBook, Raspberry Pi)

1 participant