fix(cache): add missing token cache invalidation and access token expiry jitter #136

Merged: appleboy merged 1 commit into main from worktree-auth on Mar 29, 2026

@appleboy (Member) commented Mar 28, 2026

Summary

Fix token cache invalidation gaps, add access token expiry jitter, and align all token/cache defaults for a workday-optimized deployment.


Problem

Two issues were identified during code review:

  1. Cache invalidation bug: RevokeUserAuthorization() and RevokeAllApplicationTokens() revoked tokens in DB but did NOT invalidate the token cache. With Redis Aside Cache enabled, revoked tokens could pass validation for up to the cache TTL.

  2. Thundering herd on token refresh: When many users log in at the same time (e.g., 9:00 AM), all tokens expire simultaneously, causing a spike of refresh requests.

Solution: Redis Aside Cache as Primary Strategy

After comparing two approaches with 20,000 concurrent users, we chose Redis Aside Cache over short-lived stateless tokens:

| Metric | Stateless (5min token) | Redis Aside Cache (10h token) |
| --- | --- | --- |
| DB operations/sec | ~133 (frequent refresh) | ~1.4 |
| Token revocation latency | Up to 5 minutes | ~milliseconds (RESP3 push) |
| Frontend complexity | Frequent refresh + retry | Low (10h refresh cycle) |

Redis Aside Cache cuts DB load by roughly 95x (~133 vs. ~1.4 operations/sec) while achieving near-instant revocation.

How Revocation Works

Admin revokes token
  1. DB: UPDATE status = "revoked"
  2. Redis: DEL cache key
  3. Redis RESP3 pushes invalidation to all Pods
  4. Each Pod's client-side cache evicts the token
  5. Next ValidateToken: cache miss -> DB -> "revoked" -> rejected
  Total latency: ~milliseconds

Cache TTL Design

Two-layer cache with TTLs serving as fallback only (RESP3 handles real-time invalidation):

| Layer | TTL | Purpose |
| --- | --- | --- |
| Redis server cache | 10h (= JWT_EXPIRATION) | Token lives in cache for its entire lifetime; zero DB re-queries |
| Pod client-side cache | 1h | Fallback for missed RESP3 notifications; ~5.6 Redis ops/s for 20K users |

Changes

1. Cache Invalidation Fix (Bug)

  • Add GetActiveTokenHashesByAuthorizationID and GetActiveTokenHashesByClientID store methods
  • Export InvalidateTokenCacheByHashes on TokenService for cross-service use
  • Inject TokenService into AuthorizationService
  • Collect token hashes before revocation, invalidate cache after
  • Log errors on hash collection failures (consistent with existing patterns)
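
The ordering matters because a query for active hashes returns nothing once the tokens are revoked, so the hashes have to be collected first. A minimal sketch of that flow follows. The method names `GetActiveTokenHashesByAuthorizationID` and `InvalidateTokenCacheByHashes` come from this PR, but their signatures here are guesses; `RevokeTokensByAuthorizationID`, `revokeAuthorization`, and the in-memory types are hypothetical.

```go
package main

import (
	"fmt"
	"log"
)

// tokenStore and cacheInvalidator use assumed signatures for this sketch.
type tokenStore interface {
	GetActiveTokenHashesByAuthorizationID(authID string) ([]string, error)
	RevokeTokensByAuthorizationID(authID string) error // hypothetical name
}

type cacheInvalidator interface {
	InvalidateTokenCacheByHashes(hashes []string)
}

// revokeAuthorization collects hashes, revokes in the store, then invalidates.
func revokeAuthorization(store tokenStore, cache cacheInvalidator, authID string) error {
	hashes, err := store.GetActiveTokenHashesByAuthorizationID(authID)
	if err != nil {
		// Log and continue: the revocation itself should still go through.
		log.Printf("collect token hashes for %s: %v", authID, err)
	}
	if err := store.RevokeTokensByAuthorizationID(authID); err != nil {
		return fmt.Errorf("revoke tokens: %w", err)
	}
	if len(hashes) > 0 {
		cache.InvalidateTokenCacheByHashes(hashes)
	}
	return nil
}

// token, memStore, and memCache are in-memory stand-ins.
type token struct {
	authID string
	active bool
}

type memStore struct{ tokens map[string]*token }

func (s *memStore) GetActiveTokenHashesByAuthorizationID(authID string) ([]string, error) {
	var hashes []string
	for hash, t := range s.tokens {
		if t.authID == authID && t.active {
			hashes = append(hashes, hash)
		}
	}
	return hashes, nil
}

func (s *memStore) RevokeTokensByAuthorizationID(authID string) error {
	for _, t := range s.tokens {
		if t.authID == authID {
			t.active = false
		}
	}
	return nil
}

type memCache struct{ entries map[string]bool }

func (c *memCache) InvalidateTokenCacheByHashes(hashes []string) {
	for _, h := range hashes {
		delete(c.entries, h)
	}
}

func main() {
	store := &memStore{tokens: map[string]*token{
		"h1": {authID: "auth-1", active: true},
		"h2": {authID: "auth-1", active: true},
		"h3": {authID: "auth-2", active: true},
	}}
	cache := &memCache{entries: map[string]bool{"h1": true, "h2": true, "h3": true}}
	if err := revokeAuthorization(store, cache, "auth-1"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("cached entries left:", len(cache.entries))
}
```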

2. Access Token Expiry Jitter

The new JWT_EXPIRATION_JITTER env var adds a random offset to access token expiry:

JWT_EXPIRATION=10h
JWT_EXPIRATION_JITTER=30m  # token lifetime: [10h, 10h30m)

  • Additive jitter (the configured expiry is the minimum lifetime)
  • Only applies to access tokens, not refresh or client credentials tokens
  • Uses math/rand/v2 (Go 1.22+, concurrency-safe)

3. Production-Ready Defaults

All defaults aligned for a typical workday scenario:

| Setting | Old Default | New Default | Rationale |
| --- | --- | --- | --- |
| JWT_EXPIRATION | 1h | 10h | One login covers a full workday |
| JWT_EXPIRATION_JITTER | 0 (disabled) | 30m | Spread refresh requests over 30min window |
| TOKEN_CACHE_TTL | 5m | 10h | Cache = token lifetime; RESP3 handles invalidation |
| TOKEN_CACHE_CLIENT_TTL | 30s | 1h | RESP3 handles real-time; TTL is fallback only |

Recommended Production Configuration

# Token settings (defaults are production-ready)
JWT_EXPIRATION=10h
JWT_EXPIRATION_JITTER=30m

# Refresh token with rotation
ENABLE_REFRESH_TOKENS=true
REFRESH_TOKEN_EXPIRATION=720h
ENABLE_TOKEN_ROTATION=true

# Token cache (Redis Aside)
TOKEN_CACHE_ENABLED=true
TOKEN_CACHE_TYPE=redis-aside
TOKEN_CACHE_TTL=10h
TOKEN_CACHE_CLIENT_TTL=1h

Files Changed (20 files, +538/-263)

| Area | Files | Changes |
| --- | --- | --- |
| Store | core/store.go, store/token.go | 2 new GetActiveTokenHashesBy* methods |
| Services | services/token.go, services/authorization.go | Cache invalidation fix + TokenService injection |
| Bootstrap | bootstrap/services.go | Wire TokenService into AuthorizationService |
| Config | config/config.go | New defaults for JWT/cache TTLs + jitter field |
| Token | token/local.go | Jitter in GenerateToken() |
| Docs | .env.example, CLAUDE.md, docs/CONFIGURATION.md | Document all new settings and defaults |
| Tests | 8 test files | Store tests, jitter tests, config validation, constructor updates |

Test Plan

  • Store method tests: hash collection by authorization ID and client ID
  • Jitter config validation: negative, equal to expiry, greater than expiry
  • Jitter functional tests: range verification, variation check, refresh token unaffected
  • Updated all NewAuthorizationService call sites (5 test files)
  • make test -- all tests pass
  • make lint -- 0 issues

Generated with Claude Code

codecov bot commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 36.66667% with 57 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/mocks/mock_store.go | 0.00% | 36 Missing ⚠️ |
| internal/services/authorization.go | 14.28% | 13 Missing and 5 partials ⚠️ |
| internal/services/token.go | 0.00% | 2 Missing ⚠️ |
| internal/bootstrap/services.go | 0.00% | 1 Missing ⚠️ |


…iry jitter

Fix token cache invalidation gaps, add access token expiry jitter, and
align all token/cache defaults for a workday-optimized deployment.

## Problem

Two issues identified during code review:

1. Cache invalidation bug: RevokeUserAuthorization() and
   RevokeAllApplicationTokens() revoked tokens in DB but did NOT
   invalidate the token cache. With Redis Aside Cache enabled, revoked
   tokens could pass validation for up to the cache TTL.

2. Thundering herd on token refresh: When many users log in at the same
   time (e.g., 9:00 AM), all tokens expire simultaneously, causing a
   spike of refresh requests.

## Solution: Redis Aside Cache as Primary Strategy

After comparing two approaches with 20,000 concurrent users, we chose
Redis Aside Cache over short-lived stateless tokens:

- Stateless (5min token): ~133 DB ops/s, up to 5min revocation delay
- Redis Aside (10h token): ~1.4 DB ops/s, ~millisecond revocation

Redis Aside Cache wins by 95x on DB load while achieving near-instant
revocation via RESP3 push invalidation.

## Changes

### 1. Cache Invalidation Fix (Bug)

- Add GetActiveTokenHashesByAuthorizationID and
  GetActiveTokenHashesByClientID store methods
- Export InvalidateTokenCacheByHashes on TokenService for cross-service use
- Inject TokenService into AuthorizationService
- Collect token hashes before revocation, invalidate cache after
- Log errors on hash collection failures

### 2. Access Token Expiry Jitter

- Add JWT_EXPIRATION_JITTER env var (default: 30m)
- Additive jitter: token lifetime = [expiry, expiry+jitter)
- Only applies to access tokens, not refresh or client credentials
- Uses math/rand/v2 (Go 1.22+, concurrency-safe)

### 3. Production-Ready Defaults

- JWT_EXPIRATION: 1h -> 10h (one login covers a full workday)
- JWT_EXPIRATION_JITTER: 0 -> 30m (spread refresh over 30min window)
- TOKEN_CACHE_TTL: 5m -> 10h (cache = token lifetime; RESP3 handles invalidation)
- TOKEN_CACHE_CLIENT_TTL: 30s -> 1h (RESP3 handles real-time; TTL is fallback)

### 4. Documentation

- Update .env.example, CLAUDE.md, docs/CONFIGURATION.md with new defaults
- Update ARCHITECTURE.md mermaid diagram and expiry description
- Update JWT_VERIFICATION.md key rotation timeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@appleboy merged commit 9111d1f into main on Mar 29, 2026 (16 of 17 checks passed)
@appleboy deleted the worktree-auth branch on March 29, 2026, 03:06