fix: #12 daemon OS keychain support (multi-platform, wallet-based namespacing)#24
fix: #12 daemon OS keychain support (multi-platform, wallet-based namespacing)#24hanwencheng merged 12 commits intomainfrom
Conversation
|
Codex review (via gstack VerdictThe daemon's new startup logic cannot reliably resume persisted sessions and can get stuck on a synthetic pending entry after failed pairing attempts. The storage migration also drops compatibility with existing CLI sessions, so the patch introduces user-visible regressions. Full review comments:
— codex |
Three findings from the combined codex+claude review on PR #24, applied in-branch per human design call: **P1 — Nondeterministic daemon selection (codex)** `daemon/main.rs`: when no --session-id is given and >1 `daemon-*` session directory exists, error with the candidate list instead of silently restoring an arbitrary wallet (cross-wallet credential mix-up risk on multi-daemon hosts). Single-session and explicit --session-id paths are unchanged. `session_store.rs::list_fallback_session_ids` now sorts results alphabetically so the enumeration order is deterministic across runs and filesystems. **P1 — Windows/Linux keychain account-name sanitization (claude)** New `sanitize_for_keyring(session_id)` helper: - ASCII alnum + `-_.` preserved as-is - other chars (including null bytes, slashes, non-UTF8) replaced with `_` - if sanitization changes the value OR length >128, append an 8-char hash of the original to preserve uniqueness across alias collisions Applied in all three `try_keyring_*` helpers so Windows Credential Manager (rejects null bytes) and Linux secret-service (rejects non-UTF8) don't crash or silently truncate-and-collide. **P2 — Marker file overwriting real file fallback (both reviewers)** Keyring-managed sessions now drop an empty `.keyring_managed` marker file instead of overwriting `session.json` with a JSON stub. The fallback invariant is preserved: switching `AGENTKEYS_SESSION_STORE` between `file` and keyring no longer destroys the real file-mode credential. `list_fallback_session_ids` matches either `session.json` or `.keyring_managed`. `clear_session` removes both. Tests (5 new in session_store): - list_fallback_session_ids_is_sorted - sanitize_for_keyring_preserves_ascii_alnum_and_safe_punctuation - sanitize_for_keyring_replaces_unsafe_chars_and_appends_hash - sanitize_for_keyring_truncates_oversized_input - file_mode_save_writes_session_json_not_marker - list_fallback_session_ids_finds_marker_only_directories Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1 core: 16 passed; daemon: 13+15 passed; cli: 14 passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Combined codex+claude review fixes landed (d5a40de)Per the design call, three findings fixed in-branch. The "upgrade from legacy pre-#12 users" finding was dismissed (no users yet). P1 — Nondeterministic daemon selection ( P1 — Windows/Linux keychain account-name sanitization ( P2 — Marker file overwriting real file fallback ( 6 new tests in
Re-running |
…r propagation Two findings from codex re-review of PR #24: **P2 — Path traversal via --session-id** `fallback_path` and `marker_path` now route `session_id` through `sanitize_for_keyring` before joining into the filesystem path. Inputs like `../escape`, `foo/bar`, or absolute paths can no longer escape ~/.agentkeys. Sanitization is idempotent, so callers that enumerate via `list_fallback_session_ids` and feed ids back into save/load/clear still match exactly. **P2 — Marker-file write failure silently swallowed** `save_session` now propagates errors from `create_dir_all` on the marker's parent and from `File::create` on the marker itself. If the keyring save succeeds but the marker can't land (read-only HOME, permission error), the caller now sees the error instead of a false Ok(()) — the session would otherwise be in the keyring but undiscoverable on restart. The "preserve pre-#12 daemon session on upgrade" finding remains dismissed per human design call on PR #24 (no users yet). Tests (2 new): - save_session_does_not_escape_agentkeys_dir_on_path_traversal - save_session_rejects_forward_slash_in_session_id Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1 core: 18 passed; daemon: 13+15 passed; cli: 14 passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex v2 P2s landed (8a84480)Two more legitimate findings from the re-review, both fixed in-branch: P2 — Path traversal via P2 — Marker-file write failure silently swallowed ( The third finding ("preserve pre-#12 daemon session on upgrade") remains dismissed — no users yet, so there's no legacy state to preserve. 2 new regression tests:
Re-running |
…h suffix
`std::collections::hash_map::DefaultHasher` is explicitly NOT guaranteed
stable across Rust/toolchain versions. Using it as a persistence key
would make sanitized session_ids unreachable after a compiler upgrade:
the directory / keyring account name would silently change and lookups
would miss.
Swap to SHA-256 via the `sha2` crate (already in-tree) and take the
first 4 bytes as an 8-char hex suffix. `hex` is already a declared
dep. Added a pinned-value regression test
(`sanitize_for_keyring_uses_stable_sha256_suffix`) that asserts
sanitize_for_keyring("foo/bar") == "foo_bar-cc5d46bd" so any future
change to the hash pipeline fails loudly in CI.
The codex v3 P1 "preserve legacy daemon session file" finding remains
dismissed per the human design call on PR #24 — no users yet, no
legacy state to preserve.
Test: cargo test -p agentkeys-core session_store -- --test-threads=1
13 passed; 0 failed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex v3 P2 landed (1c930c5) — P1 dismissed (no-users rationale)P2 — DefaultHasher not stable across Rust versions ( assert_eq!(sanitize_for_keyring("foo/bar"), "foo_bar-cc5d46bd");P1 — Preserve legacy daemon session file on upgrade — DISMISSED again Re-running codex review; expect no new actionable findings. |
Codex v4 P2s landed (b9e1621)P2 — Marker best-effort ( P2 — Candidate validation before ambiguity ( P1 — Preserve legacy daemon session — still dismissed (no users yet, pre-launch). Recorded in commit message for future reference; if/when we ship, we'll add a dedicated migration commit with CHANGELOG. Re-running codex; expect only the already-dismissed P1. |
Codex v5 P2s landed (a3a4218)P2 — P2 — Rediscovery hint for custom The codex P1 "preserve legacy daemon session" remains dismissed per human call (no users). Expect this codex pass (v6) to only surface the already-dismissed P1. |
Codex v6 P1+P2 landed (74dc814)P1 — Reserved path components escaping P2 — Rewrite / user-id collision The reserved prefix uses only characters that are valid across Windows Credential Manager, Linux secret-service, and macOS Keychain, so the output remains a legal account name + filesystem directory on all three platforms. 3 new/updated regression tests:
Re-running codex. |
Codex v7 P2 landed (d150d87) — closing out review cycleP2 — Hint advertised unresumeable sanitized IDs Remaining codex finding: "preserve pre-#12 legacy daemon session file on upgrade" — dismissed across 3 codex passes. No users yet (pre-launch project), so no legacy Summary of the PR #24 review cycle7 codex passes, 6 with new legitimate findings, all fixed in-branch:
PR is ready for merge. |
…espacing) Closes #12. Moves `session_store` from agentkeys-cli to agentkeys-core as a shared module. Adds a `session_id` parameter so both master CLI sessions and per-daemon sessions get their own keychain/file slot. - Master CLI: `session_id = "master"` (keychain entry: service=agentkeys, account=master; file fallback: ~/.agentkeys/master/session.json). - Daemon: `session_id = format!("daemon-{}", wallet)` after pair/recover, "daemon-pending" before. Two daemons on one machine no longer collide. - `AGENTKEYS_SESSION_STORE=file` still forces file-only for CI/Docker. Tests: 4 new session_store tests (save/load roundtrip master, daemon-wallet, two-daemons-no-collision, clear-session-per-id). Full suite 89 tests green under `--test-threads=1` (keyring access is not parallel-safe by design). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex P1 (daemon can't resume saved session): default startup only probed 'daemon-pending', so successful pair/recover that saved under 'daemon-<wallet>' was never loaded on next start. Now scans all 'daemon-*' fallback entries via new session_store::list_fallback_session_ids and tries each. Codex P1 (pending sentinel sticks on failed pair): old code wrote a fake session with token='pending' before pair attempt; if pair failed, the next startup 'successfully' loaded the fake and skipped pairing entirely. Removed the sentinel entirely — save only after pair/recover succeeds and the wallet is known. Codex P2 (legacy master fallback): added session_store::load_session_with_legacy_fallback() that tries the new namespace first, then the pre-#12 'session' keyring account / ~/.agentkeys/session.json file. Existing installs stay logged in across the upgrade. (Wire-up of CLI's CommandContext to use this helper deferred to a follow-up — the helper is available and tested.) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex v2 P1 (keyring-only sessions invisible): list_fallback_session_ids
only scanned the file-fallback directory, so daemon sessions saved via
the OS keyring (default macOS) were invisible on restart. save_session
now drops a tiny discovery-marker file (`session.json` containing
`{"_keyring_managed":true}`) when the keyring path was used, so
list_fallback_session_ids finds the session_id and load_session can
then go to the keyring for the real credential.
Codex v2 P2 (legacy master not loaded): CommandContext::load_session
now calls session_store::load_session_with_legacy_fallback so pre-#12
installs (keyring account=session or ~/.agentkeys/session.json) stay
logged in after the wallet-namespaced layout upgrade.
52 tests green across cli/core/daemon crates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three findings from the combined codex+claude review on PR #24, applied in-branch per human design call: **P1 — Nondeterministic daemon selection (codex)** `daemon/main.rs`: when no --session-id is given and >1 `daemon-*` session directory exists, error with the candidate list instead of silently restoring an arbitrary wallet (cross-wallet credential mix-up risk on multi-daemon hosts). Single-session and explicit --session-id paths are unchanged. `session_store.rs::list_fallback_session_ids` now sorts results alphabetically so the enumeration order is deterministic across runs and filesystems. **P1 — Windows/Linux keychain account-name sanitization (claude)** New `sanitize_for_keyring(session_id)` helper: - ASCII alnum + `-_.` preserved as-is - other chars (including null bytes, slashes, non-UTF8) replaced with `_` - if sanitization changes the value OR length >128, append an 8-char hash of the original to preserve uniqueness across alias collisions Applied in all three `try_keyring_*` helpers so Windows Credential Manager (rejects null bytes) and Linux secret-service (rejects non-UTF8) don't crash or silently truncate-and-collide. **P2 — Marker file overwriting real file fallback (both reviewers)** Keyring-managed sessions now drop an empty `.keyring_managed` marker file instead of overwriting `session.json` with a JSON stub. The fallback invariant is preserved: switching `AGENTKEYS_SESSION_STORE` between `file` and keyring no longer destroys the real file-mode credential. `list_fallback_session_ids` matches either `session.json` or `.keyring_managed`. `clear_session` removes both. Tests (5 new in session_store): - list_fallback_session_ids_is_sorted - sanitize_for_keyring_preserves_ascii_alnum_and_safe_punctuation - sanitize_for_keyring_replaces_unsafe_chars_and_appends_hash - sanitize_for_keyring_truncates_oversized_input - file_mode_save_writes_session_json_not_marker - list_fallback_session_ids_finds_marker_only_directories Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1 core: 16 passed; daemon: 13+15 passed; cli: 14 passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r propagation Two findings from codex re-review of PR #24: **P2 — Path traversal via --session-id** `fallback_path` and `marker_path` now route `session_id` through `sanitize_for_keyring` before joining into the filesystem path. Inputs like `../escape`, `foo/bar`, or absolute paths can no longer escape ~/.agentkeys. Sanitization is idempotent, so callers that enumerate via `list_fallback_session_ids` and feed ids back into save/load/clear still match exactly. **P2 — Marker-file write failure silently swallowed** `save_session` now propagates errors from `create_dir_all` on the marker's parent and from `File::create` on the marker itself. If the keyring save succeeds but the marker can't land (read-only HOME, permission error), the caller now sees the error instead of a false Ok(()) — the session would otherwise be in the keyring but undiscoverable on restart. The "preserve pre-#12 daemon session on upgrade" finding remains dismissed per human design call on PR #24 (no users yet). Tests (2 new): - save_session_does_not_escape_agentkeys_dir_on_path_traversal - save_session_rejects_forward_slash_in_session_id Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1 core: 18 passed; daemon: 13+15 passed; cli: 14 passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…h suffix
`std::collections::hash_map::DefaultHasher` is explicitly NOT guaranteed
stable across Rust/toolchain versions. Using it as a persistence key
would make sanitized session_ids unreachable after a compiler upgrade:
the directory / keyring account name would silently change and lookups
would miss.
Swap to SHA-256 via the `sha2` crate (already in-tree) and take the
first 4 bytes as an 8-char hex suffix. `hex` is already a declared
dep. Added a pinned-value regression test
(`sanitize_for_keyring_uses_stable_sha256_suffix`) that asserts
sanitize_for_keyring("foo/bar") == "foo_bar-cc5d46bd" so any future
change to the hash pipeline fails loudly in CI.
The codex v3 P1 "preserve legacy daemon session file" finding remains
dismissed per the human design call on PR #24 — no users yet, no
legacy state to preserve.
Test: cargo test -p agentkeys-core session_store -- --test-threads=1
13 passed; 0 failed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ate validation
Two findings from codex v4 review:
**P2 — Marker-file failure too strict**
v5 propagated marker write errors, which regressed direct-load callers
(`agentkeys init`/`recover` saving `master`). Those paths don't rely
on prefix-scan discovery; they look up by known session_id. Revert to
best-effort: if `create_dir_all` or `File::create` fails after a
successful keyring save, print a warning to stderr and return Ok.
Keyring entry is authoritative for direct loads; only prefix-scan
discovery (daemon restart) is degraded.
**P2 — Stale markers block daemon startup**
The `list.len() > 1` ambiguity check fired before load validation, so
a single stale `.keyring_managed` directory (e.g., keyring entry
deleted out-of-band) could block startup even when exactly one real
session still loaded. New flow:
1. Enumerate candidates via list_fallback_session_ids (sorted).
2. Attempt load_session for each; keep only successes.
3. 0 loadable → pair flow. 1 loadable → use it. >1 loadable → error.
The ambiguity error message is unchanged for the genuinely-ambiguous
case ("multiple loadable daemon sessions found... Candidates: ...").
The codex v4 P1 "preserve legacy daemon session file" remains dismissed
per human design call (no users yet).
Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 19 passed; daemon: 13+15 passed; cli: 14 passed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…covery hint **P2 — cmd_init/cmd_recover ignored ctx.session_id** CommandContext carries `session_id` (defaults to "master", future-proofed for multi-session CLI), and `load_session` already reads from that namespace — but `cmd_init` and `cmd_recover` hardcoded "master" for the save. Any caller overriding `session_id` would succeed init but the next command (which reads from the overridden namespace) would miss the write. Replace the hardcoded string with `&ctx.session_id`. **P2 — Custom daemon session rediscovery hint** If a user pairs with `--session-id WORK`, restarts without the flag, the `daemon-*` scan misses the stored session and the daemon silently re-pairs (new wallet, previously stored credentials become unreachable from this daemon). Not fixable without a persistence pointer that would grow the scope, but we now print a hint on stderr when the scan returns zero `daemon-*` entries AND non-`daemon-` / non-`master` sessions exist under `~/.agentkeys/`: "no daemon-* sessions found, but these custom session IDs exist: ...; pass --session-id <name> to resume one". Pair flow proceeds unchanged; the hint just nudges the user toward the right command. The codex v5 P1 "preserve legacy daemon session file" remains dismissed per human design call (no users yet). Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1 core: 19 passed; daemon: 13+15 passed; cli: 14 passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**P1 — Reserved path components escaping ~/.agentkeys**
sanitize_for_keyring previously returned "." / ".." / "" unchanged
because "." is allowed in the whitelist. That let `--session-id ..`
resolve to `$HOME/session.json` (outside the namespaced store), and
an empty session_id resolve to the legacy root file. Now the
accept-as-is rule explicitly rejects the empty string, ".", and ".."
and force-rewrites them into the reserved `__agk_` namespace.
**P2 — Rewritten IDs collide with user-supplied safe IDs**
Previously `sanitize_for_keyring("foo/bar") == "foo_bar-cc5d46bd"` but
a user could pass `--session-id foo_bar-cc5d46bd` and land on the
same keyring account + fallback directory. Introduce a `__agk_`
reserved prefix — all rewrites live under it, and any user-supplied
id that starts with `__agk_` is itself pushed through the rewrite
path so the output is `__agk___agk_foo_bar-cc5d46bd-<nested-hash>`,
distinct from the original rewrite.
The reserved prefix uses only characters that are valid across Windows
Credential Manager, Linux secret-service, and macOS Keychain, so the
output remains a legal account name and filesystem directory on all
three platforms.
3 new/updated regression tests:
- sanitize_for_keyring_rejects_reserved_path_components
- sanitize_for_keyring_rewrites_prefix_collisions
- existing `sanitize_for_keyring_uses_stable_sha256_suffix` pin now
expects `__agk_foo_bar-cc5d46bd`
Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1
core: 21 passed; daemon: 13+15 passed; cli: 14 passed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d IDs After v9 introduced the `__agk_` rewrite prefix, the daemon restart hint at main.rs:143-154 could print directory names like `__agk_foo_bar-cc5d46bd` as resumable `--session-id` candidates. But feeding that string back into load_session triggers another rewrite (because sanitize_for_keyring re-rewrites any input starting with `__agk_`), landing on a DIFFERENT storage key. The hint was advertising unresumeable candidates. Filter rewrite-prefixed entries out of the hint set. They're only reachable via the original unsanitized ID that produced them, which the user must remember on their own — the daemon has no way to invert the hash. Users with non-sanitized custom IDs (e.g. `--session-id WORK`) still see the hint and can resume as before. The codex v7 P1 "preserve legacy daemon session" remains dismissed (no users yet). Closing out this review cycle — 7 codex passes, 6 pass with new legit findings all fixed in-branch, pass 7 only surfaces the already-dismissed P1 plus this filter tweak. Test: cargo test -p agentkeys-daemon -- --test-threads=1 daemon: 13+15 passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rebase of fix/issue-12 onto current main (post-PR #18 #19 #21 #23 merges) surfaced two concrete integration issues, fixed in this patch: **1. Expose session_store helpers for test reuse** - Make `agentkeys_core::session_store::fallback_path(session_id)` public so CLI tests can assert on the on-disk path layout. - Re-export `agentkeys_core::session_store` at `agentkeys_cli::session_store` so existing test imports (`use agentkeys_cli::session_store;`) keep working after the old cli-local `session_store.rs` was deleted upstream. **2. cmd_revoke clear_session calls (post-PR #18 self-revoke merge)** PR #18 added `session_store::clear_session()` calls on self-revoke paths. The new signature takes `session_id`, not 0 args. Pass `&ctx.session_id` so the revoke wipes the correct namespace for any future multi-session CLI caller. **3. Integration tests: seed identity_links for Recover** PR #21 tightened `resolve_identity_typed` to require the identity to exist in `identity_links` before resolution. Two pair_tests (`recover_full_loop`, `recover_credentials_intact`) opened Recover requests with aliases that were never linked, so they now 404. Added `InProcessBackend::link_identity_for_tests(identity_type, identity_value, wallet)` that inserts directly into the mock DB's `identity_links` table. The two failing tests now seed the alias link before the Recover flow. Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -p agentkeys-mock-server -- --test-threads=1 core: 21; daemon: 13 + 15; cli: 30; mock-server: 56 — all passing Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d150d87 to
7f33f8f
Compare
…r_session `try_keyring_delete` was fire-and-forget: spawned a detached thread, didn't wait, `clear_session` returned Ok before the delete ran. On keyring-backed installs `cmd_revoke`/self-revoke would report "session wiped", the process would exit, and the next command could still load the stale session from the OS keychain. Mirror the `try_keyring_save`/`try_keyring_load` pattern: spawn the thread, wait up to 2 seconds via `recv_timeout`, return a bool. `keyring::Error::NoEntry` is treated as success (the intent is "nothing of this name should remain"). On timeout or any other error, `clear_session` now returns an anyhow error so the caller surfaces a clear message instead of a false success. Test: `clear_session_is_synchronous_in_file_mode` asserts the delete happens synchronously — immediately after clear_session returns, load_session must fail without any sleep/retry. The codex v8 P2 "preserve pre-#12 legacy daemon session" remains dismissed per the design call (no users yet). Test: cargo test -p agentkeys-core -p agentkeys-daemon -p agentkeys-cli -- --test-threads=1 core: 22 passed; daemon: 13+15 passed; cli: 30 passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex v8 P1 landed (0e2cc5f)P1 — Fix mirrors Test: P2 — legacy daemon session remains dismissed (no users yet). |
Summary
Daemon now uses the OS keychain when available (macOS Keychain, Linux secret-service via
keyringcrate) with file fallback. Multiple daemons on one machine no longer overwrite each other.What changed
Shared session_store in agentkeys-core
session_storemoved fromagentkeys-clitoagentkeys-core.save_session(session, session_id),load_session(session_id),clear_session(session_id).service="agentkeys", account=<session_id>.~/.agentkeys/<session_id>/session.json(0600).CLI
session_id = "master".Daemon
session_id = "daemon-pending".session_id = format!("daemon-{}", wallet.0).Env override
AGENTKEYS_SESSION_STORE=filestill forces file-only (CI/Docker/headless).Test plan
cargo test -p agentkeys-core -- --test-threads=1→ 10 passed (includes 4 new session_store tests)cargo test -p agentkeys-cli→ 14 passedcargo test -p agentkeys-daemon→ 28 passed (13 + 15)cargo test -p agentkeys-mock-server→ 37 passed--test-threads=1or addserial_testas a follow-up.Follow-ups
serial_testcrate to isolate keychain-touching tests for parallel execution.docs/spec/architecture.md+wiki/key-security.mdstorage table (separate docs-only PR).Issue
Closes #12.
🤖 Generated with Claude Code
Codex review (2026-04-14):⚠️ Reviewed + fixed in 2 rounds.
52 tests across cli/core/daemon crates all pass.