fix(cache): add missing token cache invalidation and access token expiry jitter#136
Merged
fix(cache): add missing token cache invalidation and access token expiry jitter#136
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
…iry jitter Fix token cache invalidation gaps, add access token expiry jitter, and align all token/cache defaults for a workday-optimized deployment. ## Problem Two issues identified during code review: 1. Cache invalidation bug: RevokeUserAuthorization() and RevokeAllApplicationTokens() revoked tokens in DB but did NOT invalidate the token cache. With Redis Aside Cache enabled, revoked tokens could pass validation for up to the cache TTL. 2. Thundering herd on token refresh: When many users log in at the same time (e.g., 9:00 AM), all tokens expire simultaneously, causing a spike of refresh requests. ## Solution: Redis Aside Cache as Primary Strategy After comparing two approaches with 20,000 concurrent users, we chose Redis Aside Cache over short-lived stateless tokens: - Stateless (5min token): ~133 DB ops/s, up to 5min revocation delay - Redis Aside (10h token): ~1.4 DB ops/s, ~millisecond revocation Redis Aside Cache wins by 95x on DB load while achieving near-instant revocation via RESP3 push invalidation. ## Changes ### 1. Cache Invalidation Fix (Bug) - Add GetActiveTokenHashesByAuthorizationID and GetActiveTokenHashesByClientID store methods - Export InvalidateTokenCacheByHashes on TokenService for cross-service use - Inject TokenService into AuthorizationService - Collect token hashes before revocation, invalidate cache after - Log errors on hash collection failures ### 2. Access Token Expiry Jitter - Add JWT_EXPIRATION_JITTER env var (default: 30m) - Additive jitter: token lifetime = [expiry, expiry+jitter) - Only applies to access tokens, not refresh or client credentials - Uses math/rand/v2 (Go 1.22+, concurrency-safe) ### 3. Production-Ready Defaults - JWT_EXPIRATION: 1h -> 10h (one login covers a full workday) - JWT_EXPIRATION_JITTER: 0 -> 30m (spread refresh over 30min window) - TOKEN_CACHE_TTL: 5m -> 10h (cache = token lifetime; RESP3 handles invalidation) - TOKEN_CACHE_CLIENT_TTL: 30s -> 1h (RESP3 handles real-time; TTL is fallback) ### 4. Documentation - Update .env.example, CLAUDE.md, docs/CONFIGURATION.md with new defaults - Update ARCHITECTURE.md mermaid diagram and expiry description - Update JWT_VERIFICATION.md key rotation timeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix token cache invalidation gaps, add access token expiry jitter, and align all token/cache defaults for a workday-optimized deployment.
Problem
Two issues identified during code review:
Cache invalidation bug:
RevokeUserAuthorization()andRevokeAllApplicationTokens()revoked tokens in DB but did NOT invalidate the token cache. With Redis Aside Cache enabled, revoked tokens could pass validation for up to the cache TTL.Thundering herd on token refresh: When many users log in at the same time (e.g., 9:00 AM), all tokens expire simultaneously, causing a spike of refresh requests.
Solution: Redis Aside Cache as Primary Strategy
After comparing two approaches with 20,000 concurrent users, we chose Redis Aside Cache over short-lived stateless tokens:
Redis Aside Cache wins by 95x on DB load while achieving near-instant revocation.
How Revocation Works
Cache TTL Design
Two-layer cache with TTLs serving as fallback only (RESP3 handles real-time invalidation):
Changes
1. Cache Invalidation Fix (Bug)
GetActiveTokenHashesByAuthorizationIDandGetActiveTokenHashesByClientIDstore methodsInvalidateTokenCacheByHasheson TokenService for cross-service use2. Access Token Expiry Jitter
New
JWT_EXPIRATION_JITTERenv var adds random offset to access token expiry:JWT_EXPIRATION=10h JWT_EXPIRATION_JITTER=30m # token lifetime: [10h, 10h30m)math/rand/v2(Go 1.22+, concurrency-safe)3. Production-Ready Defaults
All defaults aligned for a typical workday scenario:
JWT_EXPIRATIONJWT_EXPIRATION_JITTERTOKEN_CACHE_TTLTOKEN_CACHE_CLIENT_TTLRecommended Production Configuration
Files Changed (20 files, +538/-263)
core/store.go,store/token.goGetActiveTokenHashesBy*methodsservices/token.go,services/authorization.gobootstrap/services.goconfig/config.gotoken/local.goGenerateToken().env.example,CLAUDE.md,docs/CONFIGURATION.mdTest Plan
NewAuthorizationServicecall sites (5 test files)make test-- all tests passmake lint-- 0 issuesGenerated with Claude Code