Skip to content

fix: auth store data-integrity improvements#377

Merged
streamer45 merged 2 commits intomainfrom
devin/1777128717-fix-auth-data-integrity
Apr 25, 2026
Merged

fix: auth store data-integrity improvements#377
streamer45 merged 2 commits intomainfrom
devin/1777128717-fix-auth-data-integrity

Conversation

@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor

@staging-devin-ai-integration staging-devin-ai-integration Bot commented Apr 25, 2026

Summary

Fixes four auth data-integrity issues in FileTokenMetadataStore, FileRevocationStore, and FileKeyProvider (apps/skit/src/auth/stores/file.rs):

1. Double write-lock in revoke() (fixes #345)

FileRevocationStore::revoke() acquired the RwLock write guard twice — once for insert and again for prune_expired_locked. Combined into a single lock acquisition so both operations execute atomically under one guard.

2. fsync after write_secure for crash durability (fixes #331)

FileKeyProvider::write_secure called flush() but not sync_all(), so data could be lost on crash before the OS flushed dirty pages. Now calls file.sync_all() before the atomic rename and fsyncs the parent directory afterwards, ensuring durability for admin.token, auth.jwk, jwks.json, tokens.json, and revoked.json.

3. flock-based file locking for concurrent access (fixes #329)

Added cross-process advisory file locking (flock via the fs2 crate) around all read-modify-write operations on tokens.json, revoked.json, auth.jwk, and jwks.json. Each mutation now: acquires an exclusive flock → re-reads from disk (to pick up changes from other processes) → applies the modification → persists → releases the lock. This prevents the CLI and server from clobbering each other's writes when running concurrently (e.g. during rotate-key).

This covers:

  • FileRevocationStore::revoke() and reload()
  • FileTokenMetadataStore::store(), mark_revoked(), and reload()
  • FileKeyProvider::rotate() and reload()

New dependency: fs2 (0.4.3) — small, well-maintained crate wrapping platform flock/LockFileEx.

Review & Testing Checklist for Human

  • Verify the revoke() single-guard change is correct: the RwLock write guard must cover both insert and prune_expired_locked to maintain atomicity — confirm the #[allow(clippy::significant_drop_tightening)] is justified
  • Verify sync_all() placement in write_secure: called after flush() but before the file handle is dropped and the atomic rename() — this is the standard durable-write pattern
  • Verify flock interaction with write_secure's atomic rename: the flock is on a companion .lock file (not the data file itself), so rename doesn't break the lock
  • Test concurrent CLI + server scenario: run skit server, then use streamkit-client rotate-key concurrently — verify tokens.json/revoked.json/jwks.json are not corrupted
  • Check that the .lock files don't cause issues with permission checks (they are separate from the data files and not subject to verify_permissions)

Notes

  • The plugin_integration_test failures in CI are pre-existing and unrelated to this change
  • All 18 unit tests in auth::stores::file::tests pass, including 8 new regression tests covering all four fixes

Closes #329
Closes #331
Closes #345

Link to Devin session: https://staging.itsdev.in/sessions/cc069cb4915148c9a6e1938684e08d32
Requested by: @streamer45

- Combine double write-lock acquisition in FileRevocationStore::revoke()
  into a single guard (fixes #345)
- Add File::sync_all() in write_secure before the atomic rename and
  fsync the parent directory afterwards for crash durability (fixes #331)
- Add flock-based advisory file locking around all tokens.json and
  revoked.json read-modify-write operations so CLI and server cannot
  clobber each other's writes (fixes #329)
- Add regression tests for all three fixes

Closes #329
Closes #331
Closes #345

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor Author

staging-devin-ai-integration Bot commented Apr 25, 2026

✅ Reviewed on df97612

View review

@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

staging-devin-ai-integration[bot]

This comment was marked as resolved.

@staging-devin-ai-integration
Copy link
Copy Markdown
Contributor Author

Review

Overall: Solid implementation. The RAII FileLockGuard pattern is clean, the read-modify-write cycle (acquire flock → reload from disk → modify in-memory → persist → release) is correct, and the sync_all() + parent dir fsync follows standard durable-write patterns. Tests are good — the concurrent stores test with two independent FileTokenMetadataStore instances validates the real-world CLI+server scenario.

Must fix before merge

  1. Unrelated CI commits in branch. This branch includes d3acb21 (PR ci: enable sccache GHA backend and optimize rust-cache keys #372) and 30d0b28 (PR ci: remove SCCACHE_GHA_ENABLED to stop cache entry explosion #373), which add sccache/rust-cache config changes that haven't landed on main. These are ~116 lines of unrelated diff. Please rebase onto main to drop them — they should land via their own PRs.

Should address (or file tracking issue)

  1. FileKeyProvider::rotate() lacks flock (Devin Review also flagged this). rotate() writes to auth.jwk and jwks.json via write_secure without cross-process file locking. The same race condition this PR fixes for tokens.json/revoked.json applies here — concurrent CLI rotate-key + server reload-keys could clobber. Rotation is rarer, so this doesn't block this PR, but a tracking issue should be filed.

Nits (non-blocking)

  1. OpenOptions::new().create(true).truncate(true).write(true) on the .lock file — truncate(true) is harmless but redundant since the file is never written to. create(true).write(true) alone suffices.

  2. The #[allow(clippy::significant_drop_tightening)] on revoke() is justified and the comment explains why — good.

Extends the cross-process file locking from #329 to cover key rotation
and reload operations on auth.jwk/jwks.json. Without this, concurrent
CLI rotate-key + server reload-keys could clobber each other's writes.

Adds a regression test that runs two KeyProvider instances rotating
concurrently and verifies all generated keys survive in the final JWKS.

Signed-off-by: StreamKit Devin <devin@streamkit.dev>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
@streamer45 streamer45 merged commit 3e3a46b into main Apr 25, 2026
17 checks passed
@streamer45 streamer45 deleted the devin/1777128717-fix-auth-data-integrity branch April 25, 2026 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants