Skip to content

fix: persist snapshot once for avoiding badger vlog#3153

Merged
julienrbrt merged 6 commits intomainfrom
julien/cache
Mar 10, 2026
Merged

fix: persist snapshot once for avoiding badger vlog#3153
julienrbrt merged 6 commits intomainfrom
julien/cache

Conversation

@julienrbrt
Copy link
Member

@julienrbrt julienrbrt commented Mar 10, 2026

Overview

Badger was seeing an increasing amount of big vlog entry due to this optimistic save (small win for crash recovery).
This avoids badger to go crazy.

Summary by CodeRabbit

  • Bug Fixes

    • Cache persistence now only occurs at shutdown to avoid excessive disk growth.
  • Tests

    • Tests updated to explicitly persist state during test flows to ensure reliable restore behavior.
  • Chores

    • CI workflow action version adjusted; local module resolutions enabled for some apps to support local builds.
  • Documentation

    • CHANGELOG updated with the fixed persistence behavior.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Mutations to the cache no longer persist snapshots on the hot path; snapshot persistence is now explicit via SaveToStore. Tests and a submitter test were updated to call SaveToStore where they previously relied on implicit persistence. CHANGELOG and a few workflow and module files were also adjusted.

Changes

Cohort / File(s) Summary
CHANGELOG
CHANGELOG.md
Added "Fixed" entry: persist cache snapshot only once at shutdown to avoid Badger vlog increase.
Cache Persistence Logic
block/internal/cache/generic_cache.go
Removed implicit calls to persistSnapshot from setDAIncluded, removeDAIncluded, and deleteAllForHeight; these now update in-memory state only. persistSnapshot doc clarifies explicit use via SaveToStore.
Cache Tests
block/internal/cache/generic_cache_test.go
Tests updated to call SaveToStore explicitly after cache mutations (several tests: round-trip, clear, placeholder/refire, restart idempotency).
Submitter Test
block/internal/submitting/submitter_test.go
Added SaveToStore call to persist DA-included mark before restart in pre-restart setup.
Workflows
.github/workflows/docker-build-push.yml, .github/workflows/release-apps.yml
Changed GHCR login action reference from docker/login-action@v4 to docker/login-action@v3.
App module replacements
apps/evm/go.mod, apps/testapp/go.mod
Uncommented local replace directives so local module paths are used for github.com/evstack/ev-node during builds.
Minor test control-flow tweaks
node/execution_test.go, pkg/da/types/namespace_test.go
Added explicit return nil / early returns after t.Fatal nil-checks in tests (no behavioral change in normal test runs).

Sequence Diagram(s)

sequenceDiagram
    participant Cache as Cache (in-memory)
    participant Saver as SaveToStore
    participant Store as Persistent Store
    Cache->>Cache: mutate state (set/remove/delete)
    Note right of Cache: No implicit persist
    Cache->>Saver: SaveToStore() called explicitly (e.g., shutdown/test)
    Saver->>Store: serialize snapshot & write
    Store-->>Saver: ack write
    Saver-->>Cache: return
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • chore: minor deduplication #3139 — touches DA-inclusion persistence/read-write paths and timing for DAIncludedHeight persistence; strongly related to snapshot persistence changes.

Suggested reviewers

  • tac0turtle
  • Manav-Aggarwal
  • tzdybal

Poem

🐰 I write once and only once tonight,
No vlog mountains left in sight,
SaveToStore hums, the snapshot sleeps,
Tests applaud in careful heaps,
Hoppity hops — the cache is light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: snapshot persistence is now limited to occur once at shutdown to prevent Badger vlog growth, which aligns with the core fix across the codebase.
Description check ✅ Passed The PR description provides clear context about the issue (Badger vlog growth from optimistic saves) and the solution (persist snapshot only once), meeting the template requirements despite being concise.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch julien/cache

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedMar 10, 2026, 5:04 PM

@claude
Copy link
Contributor

claude bot commented Mar 10, 2026

Claude finished @julienrbrt's task in 1m 31s —— View job


Code Review

  • Gather context and understand the PR
  • Review cache persistence logic changes
  • Review test changes
  • Review shutdown error handling (CodeRabbit concern)
  • Compile final review

Summary

The change is well-targeted and correct. Removing the implicit persistSnapshot calls from hot-path mutations (setDAIncluded, removeDAIncluded, deleteAllForHeight) eliminates repeated small Badger writes and concentrates durability at a single shutdown checkpoint. The logic and test coverage are solid overall.


Issues Found

1. Stale docstring on SaveToStoremanager.go:331

The comment still describes the old behavior:

// SaveToStore flushes the DA inclusion snapshot to the store.
// Mutations already persist on every call, so this is mainly a final flush.

After this PR, mutations no longer persist automatically. The second sentence is now incorrect and should be removed or replaced (e.g., "This is the only durable write and must be called at shutdown.").

Fix this →

2. Stale docstring on NewCachegeneric_cache.go:69

// NewCache creates a Cache. When store and keyPrefix are set, mutations
// persist a snapshot so RestoreFromStore can recover in-flight state.

Mutations no longer persist automatically. The second sentence is now incorrect.

Fix this →

3. Typo in warning comment — generic_cache.go:203

// NEVER CALL IT ON HOT-PATH TO AVOID BAGER WRITE AMPLIFICATION.

BAGERBADGER

Fix this →


Addressing the CodeRabbit Concern

"SaveToStore is now the only durable checkpoint but the shutdown path … still just logs that error and returns success."

This concern is incorrect. Looking at block/components.go:126-129, the error from Cache.SaveToStore() is properly collected and returned via errors.Join:

if bc.Cache != nil {
    if err := bc.Cache.SaveToStore(); err != nil {
        errs = errors.Join(errs, fmt.Errorf("failed to save caches: %w", err))
    }
}
return errs

The error is propagated to the caller. No action needed here.


Minor Observations (Non-blocking)

  • removeDAIncluded leaves a dangling hashByHeight entry (generic_cache.go:157): removing a hash from daIncluded does not clean up the corresponding hashByHeight[blockHeight] pointer. This is harmless today because persistSnapshot only saves entries present in both maps, and deleteAllForHeight cleans both. But it means getDAIncludedByHeight will miss for a height whose hash was removed via removeDAIncluded directly. This is pre-existing and not introduced here, just worth noting.

  • setMaxDAHeight CAS loop (generic_cache.go:168): the for range 1_000 retry bound is pragmatic but unusual. An unbounded CAS loop with continue is the standard Go pattern and cannot live-lock here since the condition is monotonic. Pre-existing, not a blocker.


Verdict

The core change is correct and well-tested. The only actionable items are two stale docstrings and a typo, all easily fixed.

@codecov
Copy link

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.22%. Comparing base (ba25ea8) to head (662c81e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3153      +/-   ##
==========================================
- Coverage   60.23%   60.22%   -0.02%     
==========================================
  Files         115      115              
  Lines       11899    11896       -3     
==========================================
- Hits         7167     7164       -3     
+ Misses       3922     3921       -1     
- Partials      810      811       +1     
Flag Coverage Δ
combined 60.22% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@block/internal/cache/generic_cache.go`:
- Around line 202-203: SaveToStore (which calls persistSnapshot) is now the only
durable checkpoint but the shutdown path that invokes SaveToStore treats
failures as informational; change that shutdown caller to return (or propagate)
the error instead of just logging it, or implement a bounded retry/backoff loop
before giving up, so a failed SaveToStore is not silently ignored. Locate the
shutdown/stop routine that calls SaveToStore and replace the log-only handling
with proper error propagation (return the error) or a retry mechanism, ensuring
callers handle the returned error up the stack.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d534481-293e-478a-8805-35e71d7c34a4

📥 Commits

Reviewing files that changed from the base of the PR and between ba25ea8 and 3fbdfae.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • block/internal/cache/generic_cache.go
  • block/internal/cache/generic_cache_test.go
  • block/internal/submitting/submitter_test.go

Comment on lines +202 to +203
// persistSnapshot writes all current in-flight [blockHeight, daHeight] pairs to the store under a single key.
// Only called explicitly via SaveToStore. NEVER CALL IT ON HOT-PATH TO AVOID BAGER WRITE AMPLIFICATION.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't leave the new shutdown flush as best-effort.

These lines make SaveToStore the only durable snapshot write, but the production shutdown path in block/components.go:120-135 still just logs that error and returns success. A failed flush now silently loses the only restart checkpoint. Please propagate the shutdown error (or retry it) instead of treating it as informational.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/cache/generic_cache.go` around lines 202 - 203, SaveToStore
(which calls persistSnapshot) is now the only durable checkpoint but the
shutdown path that invokes SaveToStore treats failures as informational; change
that shutdown caller to return (or propagate) the error instead of just logging
it, or implement a bounded retry/backoff loop before giving up, so a failed
SaveToStore is not silently ignored. Locate the shutdown/stop routine that calls
SaveToStore and replace the log-only handling with proper error propagation
(return the error) or a retry mechanism, ensuring callers handle the returned
error up the stack.


- name: Log in to GHCR
uses: docker/login-action@v4
uses: docker/login-action@v3

Check warning

Code scanning / CodeQL

Unpinned tag for a non-immutable Action in workflow Medium

Unpinned 3rd party Action 'Apps release' step
Uses Step
uses 'docker/login-action' with ref 'v3', not a pinned commit hash
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/docker-build-push.yml (1)

36-41: ⚠️ Potential issue | 🟠 Major

Pin this GHCR login action to an immutable commit SHA and document the downgrade reason.

Line 37 uses a mutable tag docker/login-action@v3 in an authenticated publish job. Pin it to a commit SHA for supply chain security. Additionally, if the v4 → v3 downgrade is intentional, document the reason in a comment (e.g., self-hosted runners or GHES with Actions Runner < v2.327.1 cannot run v4's Node 24 runtime). Use commit SHA c94ce9fb468520275223c153574b00df6fe4bcc9 for v3 or b45d80f862d83dbcd57f89517bcf500b2ab88fb2 for v4.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/docker-build-push.yml around lines 36 - 41, Replace the
mutable action reference "docker/login-action@v3" in the "Log in to GHCR" step
with the provided immutable commit SHA (e.g., use
c94ce9fb468520275223c153574b00df6fe4bcc9 for v3 or
b45d80f862d83dbcd57f89517bcf500b2ab88fb2 for v4) to pin the action for
supply-chain security, and add a brief comment above that step explaining why v3
is used (if you intentionally downgraded from v4 — e.g., compatibility with
self-hosted runners or GHES Actions Runner < v2.327.1) so the rationale is
recorded.
🧹 Nitpick comments (1)
pkg/da/types/namespace_test.go (1)

39-42: Unreachable return after t.Fatal.

t.Fatal already terminates the test by calling runtime.Goexit(), making the subsequent return statements unreachable. While harmless, this adds dead code.

That said, some static analyzers don't recognize t.Fatal as a terminating call, so adding return can silence nil-dereference warnings for the code that follows. If that's the intent here, consider adding a brief comment to clarify (e.g., // appease static analysis).

Also applies to: 121-124

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/da/types/namespace_test.go` around lines 39 - 42, The test contains
unreachable return statements immediately after t.Fatal (e.g., in the nil check
for ns where code calls t.Fatal("expected non-nil namespace but got nil") then
`return`); remove the redundant `return` or, if the `return` was added to
appease static analyzers, keep it but add a short clarifying comment like `//
appease static analysis` next to the `return`; apply the same change to the
other occurrence around the ns-related check later in the file (the block at
lines similar to 121-124).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/evm/go.mod`:
- Around line 5-8: The apps/evm module contains local replace directives
("replace github.com/evstack/ev-node => ../../" and "replace
github.com/evstack/ev-node/execution/evm => ../../execution/evm") that break
standalone releases; remove these replace lines from the apps/evm go.mod and
instead add equivalent replace entries into a repo-root go.work workspace file
(using Go workspaces supported in Go 1.25.6) so local monorepo development works
but published modules remain usable via go get.

In `@apps/testapp/go.mod`:
- Line 5: The go.mod in apps/testapp contains a relative replace directive
("replace github.com/evstack/ev-node => ../../.") which must be removed because
apps/testapp is a standalone module; delete that replace line from the module's
go.mod and instead wire local module replacements via a workspace file (go.work)
at the monorepo root when developing locally so published/checked-out versions
of github.com/evstack/ev-node/apps/testapp do not depend on a relative path.

---

Outside diff comments:
In @.github/workflows/docker-build-push.yml:
- Around line 36-41: Replace the mutable action reference
"docker/login-action@v3" in the "Log in to GHCR" step with the provided
immutable commit SHA (e.g., use c94ce9fb468520275223c153574b00df6fe4bcc9 for v3
or b45d80f862d83dbcd57f89517bcf500b2ab88fb2 for v4) to pin the action for
supply-chain security, and add a brief comment above that step explaining why v3
is used (if you intentionally downgraded from v4 — e.g., compatibility with
self-hosted runners or GHES Actions Runner < v2.327.1) so the rationale is
recorded.

---

Nitpick comments:
In `@pkg/da/types/namespace_test.go`:
- Around line 39-42: The test contains unreachable return statements immediately
after t.Fatal (e.g., in the nil check for ns where code calls t.Fatal("expected
non-nil namespace but got nil") then `return`); remove the redundant `return`
or, if the `return` was added to appease static analyzers, keep it but add a
short clarifying comment like `// appease static analysis` next to the `return`;
apply the same change to the other occurrence around the ns-related check later
in the file (the block at lines similar to 121-124).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c976c110-dc88-455d-ab2c-185b8095ee7a

📥 Commits

Reviewing files that changed from the base of the PR and between 3fbdfae and 662c81e.

⛔ Files ignored due to path filters (2)
  • apps/evm/go.sum is excluded by !**/*.sum
  • apps/testapp/go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • .github/workflows/docker-build-push.yml
  • .github/workflows/release-apps.yml
  • apps/evm/go.mod
  • apps/testapp/go.mod
  • node/execution_test.go
  • pkg/da/types/namespace_test.go

Comment on lines +5 to +8
replace (
github.com/evstack/ev-node => ../../
github.com/evstack/ev-node/execution/evm => ../../execution/evm
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
from pathlib import Path

mod = Path("apps/evm/go.mod")
module_dir = mod.parent.resolve()

for raw in mod.read_text().splitlines():
    line = raw.strip()
    if "=>" not in line or line.startswith("//"):
        continue

    left, right = [part.strip() for part in line.split("=>", 1)]
    if left.startswith("replace "):
        target = right.split()[0]
    elif left.startswith("github.com/"):
        target = right.split()[0]
    else:
        continue

    if target.startswith("."):
        resolved = (module_dir / target).resolve()
        escapes = module_dir not in (resolved, *resolved.parents)
        print(f"{mod}: {target} -> {resolved}")
        print(f"escapes module dir: {escapes}")
PY

Repository: evstack/ev-node

Length of output: 227


🏁 Script executed:

# Check if go.work exists in repo root
ls -la go.work 2>/dev/null || echo "go.work not found"

# Check the actual module path in apps/evm/go.mod
head -5 apps/evm/go.mod

# Check Go version requirement
head -3 go.mod

Repository: evstack/ev-node

Length of output: 186


🏁 Script executed:

# Search for any references to go.work or workspace setup
rg "go.work" --type md --type txt --type sh .

# Check if there are any build/CI configurations that might show intended usage
fd -t f -e yml -e yaml -e Makefile | head -20

# Look for any documentation about building apps/evm standalone
rg "apps/evm" --type md . -A 3

Repository: evstack/ev-node

Length of output: 1538


These local replace directives break standalone releases of apps/evm.

The RELEASE.md documents apps/evm as a standalone module released via go get github.com/evstack/ev-node/apps/evm@v0.3.0. When users obtain this module, the ../../ and ../../execution/evm paths will not exist, making the module unusable.

Move these replacements to a go.work file in the repository root to manage local monorepo development without shipping broken module files. Go 1.25.6 fully supports workspaces.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/evm/go.mod` around lines 5 - 8, The apps/evm module contains local
replace directives ("replace github.com/evstack/ev-node => ../../" and "replace
github.com/evstack/ev-node/execution/evm => ../../execution/evm") that break
standalone releases; remove these replace lines from the apps/evm go.mod and
instead add equivalent replace entries into a repo-root go.work workspace file
(using Go workspaces supported in Go 1.25.6) so local monorepo development works
but published modules remain usable via go get.

go 1.25.6

//replace github.com/evstack/ev-node => ../../.
replace github.com/evstack/ev-node => ../../.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
from pathlib import Path

mod = Path("apps/testapp/go.mod")
module_dir = mod.parent.resolve()

for raw in mod.read_text().splitlines():
    line = raw.strip()
    if "=>" not in line or line.startswith("//"):
        continue

    left, right = [part.strip() for part in line.split("=>", 1)]
    if left.startswith("replace "):
        target = right.split()[0]
    elif left.startswith("github.com/"):
        target = right.split()[0]
    else:
        continue

    if target.startswith("."):
        resolved = (module_dir / target).resolve()
        escapes = module_dir not in (resolved, *resolved.parents)
        print(f"{mod}: {target} -> {resolved}")
        print(f"escapes module dir: {escapes}")
PY

Repository: evstack/ev-node

Length of output: 134


Do not commit a relative replace in this nested module.

apps/testapp is its own module. A versioned checkout of github.com/evstack/ev-node/apps/testapp will not contain the target at ../../., breaking standalone builds of this module outside a full monorepo checkout. Use go.work for local wiring instead of shipping relative replace directives in published go.mod files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/go.mod` at line 5, The go.mod in apps/testapp contains a
relative replace directive ("replace github.com/evstack/ev-node => ../../.")
which must be removed because apps/testapp is a standalone module; delete that
replace line from the module's go.mod and instead wire local module replacements
via a workspace file (go.work) at the monorepo root when developing locally so
published/checked-out versions of github.com/evstack/ev-node/apps/testapp do not
depend on a relative path.

@julienrbrt julienrbrt added this pull request to the merge queue Mar 10, 2026
Merged via the queue into main with commit 7c829ca Mar 10, 2026
33 of 34 checks passed
@julienrbrt julienrbrt deleted the julien/cache branch March 10, 2026 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants