cxo: fix Cache.cleanDown eviction bug and skip re-encoding unchanged sub-trees#2420
Merged
0pcom merged 2 commits intoskycoin:developfrom May 3, 2026
Merged
Conversation
…lling item The filter loop's "skip wanted" / "skip filling" comments said skip, but the code returned, aborting the entire ranking on the first non-evictable entry encountered during map iteration. On a busy visor there is virtually always something filling or being awaited, so the cache never evicted anything: putItem invoked cleanDown once the budget was exceeded, cleanDown built a len(c.is)-sized rank slice and immediately discarded it, and putItem then inserted the new item on top of an already-over-budget cache. The discarded slice is what the alloc profile pinned at 95% of total bytes allocated under cleanDown, and the un-evicted backlog is what pinned the live heap under putItem. continue, not return. While here, switch the rank slice from []*rankItem to []rankItem so we don't allocate a tiny heap object per cache entry.
Publisher.publishRoot re-walked the entire in-memory tree on every dirty batch and called encoder.Serialize + entry.Sub.SetValue at every level, even when only a single leaf had changed. With the visor stats publisher running at 1-min cadence this dominated CPU (via GC pressure from the Serialize churn) and live heap (the encoded TreeNode bytes piling up in Cache.putItem). Add a per-memNode encode-cache (pubHash + cached) that is populated after each successful Save and invalidated by putAt / deleteAt / pruneAt along the touched path. encodeNode now places the cached hash directly into the parent's TreeEntry for any unchanged sub-node and skips the recursive walk plus the encoder.Serialize / Cache.Set chain entirely. Container.Save's reference walk increments the rc on the cached hash without descending into it (skyobject/unpack.go:156-167), so the underlying CXO objects stay alive across publishes as long as each successive Root references the same hash — exactly the content-addressed behavior the package doc already promised. Cache promotion is deferred until after Save returns nil so an aborted publish doesn't leave the cache pointing at hashes that up.Close has rolled back. Adds TestEncodeCacheInvalidatesOnlyAffectedPath to pin the sibling-preservation contract.
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two CXO fixes that together drop a public visor's GC CPU share from ~60% to baseline and substantially reduce live heap. Found via pprof on a public visor running the user-feeds + visor-stats publishers; both fixes apply to every CXO node in the network, not just the visor.
1.
cxo/skyobject:Cache.cleanDownnever actually evicted anythingpkg/cxo/skyobject/cache.go:341,345— the filter loop's "skip wanted" / "skip filling" comments said skip, but the codereturned, aborting the entire ranking on the first non-evictable entry encountered during map iteration. On a busy node there is virtually always something filling or being awaited, so the cache never evicted:putIteminvokedcleanDownonce the budget was exceeded,cleanDownbuilt alen(c.is)-sized rank slice and immediately discarded it, andputItemthen inserted the new item on top of an already-over-budget cache.The discarded slice was 95.6% of total bytes allocated under
cleanDownin the alloc profile; the un-evicted backlog was the dominant share of live heap underputItem. Fix:continue, notreturn. Also switched the rank slice to[]rankItem(value-slice) to drop a per-entry heap allocation.This is a pure bug fix and benefits every
skyobject.Container— every CXO publisher, subscriber, the TPD aggregator, and the standalonecxodaemon.2.
cxo/treestore: cache encoded sub-node hashes across publishesPublisher.publishRootre-walked the entire in-memory tree on every dirty batch and ranencoder.Serialize+entry.Sub.SetValueat every level, even when only a single leaf had changed. The package doc already promised content-addressed re-use of unchanged sub-trees ("only the O(depth) chain of objects from the modified leaf up to the root gets re-encoded"); the implementation didn't match.Adds a per-
memNodeencode-cache (pubHash+cached) that's populated after each successfulSaveand invalidated byputAt/deleteAt/pruneAtalong the touched path.encodeNodenow places the cached hash directly into the parent'sTreeEntryfor any unchanged sub-node, skipping the recursive walk and theencoder.Serialize/Cache.Setchain entirely.Container.Save's reference walk increments the rc on the cached hash without descending into it (skyobject/unpack.go:156-167), so the underlying CXO objects stay alive across publishes as long as each successive Root references the same hash.Cache promotion is deferred until after
Savereturns nil so an aborted publish doesn't leave the cache pointing at hashes thatup.Closehas rolled back.Benefits every
treestore.Publisher—pkg/visor/init_stats.go,pkg/visor/cxo_user_feeds.go,cmd/apps/skychat/pairing/pair.go.Test plan
go test ./pkg/cxo/...(treestore, skyobject, node, data, idxdb, cxds — all green)go build ./...TestEncodeCacheInvalidatesOnlyAffectedPathpins the sibling-preservation contract: a Put under one branch must not change pubHashes on unrelated subtrees, andcachedmust clear only on the touched path.TestPublisherUnchangedSubtreeKeepsHashstill passes — the doc-comment guarantee is now actively enforced by the cache fast path rather than only emergent from CXO content-addressing.Cache.putItemlive heap shrinks.