Skip to content

fix(lru_cache): gc() must walk full list, not stop at tail#2

Merged
jjacke13 merged 1 commit intojjacke13:mainfrom
lukeburns:fix/lru-cache-gc-walks-full-list
Apr 26, 2026
Merged

fix(lru_cache): gc() must walk full list, not stop at tail#2
jjacke13 merged 1 commit intojjacke13:mainfrom
lukeburns:fix/lru-cache-gc-walks-full-list

Conversation

@lukeburns
Copy link
Copy Markdown
Contributor

@lukeburns lukeburns commented Apr 26, 2026

Summary

LruCache::gc (in include/hyperdht/lru_cache.hpp) walks the list from the tail ("oldest") and breaks on the first entry whose `created_at` is within the TTL. That works only if tail order tracks `created_at` order — but it doesn't, because `get()` promotes accessed entries to the front via splice without touching `created_at`.

After any `get()`, list order = LRU access order, while expiry is still keyed on `created_at`. The two diverge, and the early `break` then leaks expired entries indefinitely.

Repro

put(\"old\",   t=1000)     list: [old]
put(\"fresh\", t=5000)     list: [fresh, old]
get(\"old\")               list: [old, fresh]   // fresh is now at the tail
gc(now=4000, ttl=2000)
  -> walks back, sees fresh (created_at=5000), breaks
  -> \"old\" (age 3000, expired) is leaked forever

This LruCache backs the DHT's user-facing `mutables_` / `immutables_` storage caches in `rpc_handlers.cpp`, so a missed eviction can return key/value records past their declared TTL.

Fix

Walk every entry. The caches in question are bounded (default 1024 entries) and `gc` is invoked on a low-frequency timer, so the constant-factor cost is negligible — and it removes the chronological-order assumption that `get()` already broke.

I considered two alternatives:

  1. Have `get()` refresh `created_at`. Closer to JS xache semantics, but a behaviour change that may affect callers relying on "insertion-time TTL". Out of scope here.
  2. Keep a separate min-heap by `created_at`. More complex; not justified at current call volumes.

Test plan

  • Adds `LruCache.GcWalksFullListAfterGetPromotion` reproducing the leak.
  • Verified the new test fails on `main` (without the header change, just the test): `Expected get("old") to be null; size() == 1` — both fail as predicted.
  • Verified the new test passes after the fix.
  • All 9 `./test_lru_cache` tests pass.

Made with Cursor

`LruCache::gc` walked from the list tail (oldest by insertion order)
and broke on the first entry whose `created_at` was within the TTL.
That assumed tail order tracked `created_at` order — but it doesn't,
because `get()` promotes accessed entries to the front via splice
WITHOUT touching `created_at`.

Repro:
  put("old",   t=1000)   list: [old]
  put("fresh", t=5000)   list: [fresh, old]
  get("old")             list: [old, fresh]   // fresh is now at tail
  gc(now=4000, ttl=2000)
    -> walks back, sees fresh (created_at=5000), breaks
    -> "old" (age 3000, expired) is leaked forever

Fix: walk every entry. The list is small enough (mutables /
immutables / connection caches) that this is fine, and it removes
the order assumption that `get()` already broke.

This LruCache backs the DHT's user-facing `mutables_` /
`immutables_` storage caches in `rpc_handlers.cpp`, so a missed
eviction can return values past their declared TTL.

Adds a regression test (`LruCache.GcWalksFullListAfterGetPromotion`)
that fails on the old `gc` and passes on the new one. All 9 LruCache
tests pass.

Made-with: Cursor
@jjacke13 jjacke13 merged commit 2f90cad into jjacke13:main Apr 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants