execution, cl: persist initial-cycle lifecycle by block TTL by JkLondon · Pull Request #20895 · erigontech/erigon

JkLondon · 2026-04-28T23:03:16Z

closes #20854

Summary

Fix IsInitialCycle startup semantics so a restarted synced node no longer looks like a fresh initial sync just because the process restarted.

This replaces the process-local startup flag and duration heuristic with persisted tip-reached state measured in blocks.

Changes

Add --sync.initial-cycle-block-ttl / Sync.InitialCycleBlockTTL with default 1024.
Persist the last block where the node reached tip in kv.DatabaseInfo.
Add rawdb accessors for reading, writing, and deleting the tip-reached marker.
Add stagedsync lifecycle helpers that compute initial-cycle from:
- persisted tip-reached marker
- headers progress
- frozen blocks
- optional known-tip hints
- Finish progress
- block TTL
Use the DB-backed lifecycle decision in staged-loop startup, staged-loop iterations, and execmodule startup processing.
Update the tip-reached marker after successful sync/prune when Finish has caught known tip.
Clear the marker from reset paths.
Set header downloader initial mode from the same lifecycle decision.
Propagate embedded Caplin known-tip hints into execmodule startup processing.
Fetch a best-effort CL head execution block hint from checkpoint sync endpoints so the first startup cycle sees the actual head, not only finalized checkpoint state.
Preserve FCU request-local limitedBigJump behavior while allowing lifecycle initial-cycle to participate in FCU pipeline mode.
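The lifecycle decision described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the struct, its field names, and the exact rule (compare the best known tip against the persisted marker and the block TTL) are all hypothetical.

```go
package main

import "fmt"

// lifecycleInput groups the inputs listed in the PR description.
// All names here are hypothetical; the real helper signatures are not shown.
type lifecycleInput struct {
	tipReached    uint64 // persisted tip-reached marker
	hasTipReached bool   // false when no marker is stored
	headers       uint64 // headers stage progress
	frozenBlocks  uint64 // highest block in frozen snapshots
	knownTipHint  uint64 // optional hint, 0 when absent
	finish        uint64 // Finish stage progress
	blockTTL      uint64 // e.g. the default of 1024
}

// isInitialCycle applies an assumed rule: the node is in its initial cycle
// if it never reached tip, if Finish progress is zero, or if the best known
// tip has moved more than blockTTL blocks past the persisted marker.
func isInitialCycle(in lifecycleInput) bool {
	if !in.hasTipReached || in.finish == 0 {
		return true
	}
	tip := max(in.headers, in.frozenBlocks, in.knownTipHint, in.finish)
	return tip > in.tipReached && tip-in.tipReached > in.blockTTL
}

func main() {
	restarted := lifecycleInput{
		tipReached: 1000, hasTipReached: true,
		headers: 1500, finish: 1000, blockTTL: 1024,
	}
	fmt.Println("brief restart:", isInitialCycle(restarted))

	restarted.knownTipHint = 3000 // chain moved far ahead while the node was down
	fmt.Println("long downtime:", isInitialCycle(restarted))
}
```

Under this assumed rule a brief restart stays out of initial-cycle mode, while a gap larger than the TTL re-enters it.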


Copilot

Pull request overview

Persists “initial sync cycle” lifecycle state across restarts by writing a “tip reached” marker to the DB and deriving initial-cycle status from progress deltas over a block-based TTL, rather than a process-local startup heuristic.

Changes:

Add Sync.InitialCycleBlockTTL / --sync.initial-cycle-block-ttl (default 1024) and snapshot it in config tests.
Persist and manage a “last tip reached block” marker in kv.DatabaseInfo, including reset-path cleanup and lifecycle helpers.
Propagate known-tip hints (including Caplin/checkpoint-sync derived hints) into staged sync loop and execmodule FCU handling to compute initial-cycle consistently.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| node/ethconfig/config.go | Adds `InitialCycleBlockTTL` to sync config defaults/struct. |
| node/cli/flags.go | Introduces CLI flag and wires it into config application. |
| node/cli/default_flags.go | Registers the new sync lifecycle flag in default flag set. |
| execution/stagedsync/stageloop/stageloop.go | Switches stageloop to DB-derived initial-cycle decision and updates tip marker during sync/prune. |
| execution/stagedsync/stage_snapshots.go | Adjusts snapshot prune worker behavior based on `CurrentSyncCycle.IsInitialCycle` (but includes a stray `println`). |
| execution/stagedsync/stage_execute.go | Uses `CurrentSyncCycle.IsInitialCycle` in pruning behavior (but includes a stray `println` + stack). |
| execution/stagedsync/rawdbreset/reset_stages.go | Clears the persisted tip-reached marker on reset paths. |
| execution/stagedsync/initial_cycle_test.go | Adds tests for initial-cycle derivation and tip marker CRUD behavior. |
| execution/stagedsync/initial_cycle.go | Adds lifecycle helpers for computing initial-cycle and updating the tip marker. |
| execution/stagedsync/headerdownload/header_algos.go | Adds `SetInitialCycle(bool)` and routes `AfterInitialCycle()` through it. |
| execution/execmodule/forkchoice.go | Makes FCU pipeline initial-cycle incorporate lifecycle state + limited-big-jump and updates tip marker post-run. |
| execution/execmodule/executor.go | Adds known-tip hint plumbing and per-iteration initial-cycle decision + post-prune hook to update marker. |
| execution/execmodule/exec_module.go | Exposes `SetKnownTipHint` on ExecModule for upstream hint propagation. |
| execution/execmodule/chainreader/chain_reader.go | Forwards known-tip hints to the execution module when supported. |
| db/rawdb/accessors_chain.go | Adds rawdb accessors for reading/writing/deleting persisted tip-reached marker. |
| db/kv/kv_interface.go | Extracts `Deleter` interface and embeds it into `Putter`. |
| cmd/erigon/node/config_snapshot_test.go | Extends config snapshot to include initial-cycle TTL and adds flag/default tests. |
| cmd/caplin/caplin1/run.go | Propagates Caplin known-tip hints (local + checkpoint-sync head hint) into execution engine at startup. |
| cl/phase1/execution_client/execution_client_direct.go | Implements `SetKnownTipHint` by forwarding to chainRW. |
| cl/phase1/core/checkpoint_sync/util.go | Adds helper to fetch best-effort head execution block number from checkpoint endpoints. |
| cl/phase1/core/checkpoint_sync/remote_checkpoint_sync.go | Implements endpoint translation + JSON parsing to fetch head execution block number. |
| cl/phase1/core/checkpoint_sync/checkpoint_sync_test.go | Adds tests for head execution block number fetch and URI translation. |
| .gitignore | Ignores local Go cache/artifact directories. |



yperbasis

Correctness

fetchHeadExecutionBlockNumber: deferred Close error is silently dropped (cl/phase1/core/checkpoint_sync/remote_checkpoint_sync.go:111)

resp, err := http.DefaultClient.Do(req)
if err != nil {
	return 0, err
}
defer func() {
	err = resp.Body.Close()
}()

err here is the local variable from :=, not a named return. The deferred assignment runs after return 0, … has captured the return value, so the close error is unobservable. Either
ignore it explicitly (_ = resp.Body.Close()) or use named returns and merge:

func (r *RemoteCheckpointSync) fetchHeadExecutionBlockNumber(...) (n uint64, err error) {
	...
	defer func() {
		if cerr := resp.Body.Close(); err == nil {
			err = cerr
		}
	}()
	...
}

(Note: GetLatestBeaconState above has the same pattern. The PR is copying a pre-existing bug — but it's worth fixing here since it's a brand-new function, and ideally fixing both.)
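The difference between the two patterns can be reproduced in isolation. The sketch below is self-contained and uses a hypothetical `failingCloser` in place of `resp.Body`; the function names are illustrative and do not come from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

var errClose = errors.New("close failed")

// failingCloser stands in for resp.Body; its Close always fails.
type failingCloser struct{}

func (failingCloser) Close() error { return errClose }

// dropsCloseError mirrors the flagged pattern: results are unnamed, so the
// deferred assignment to the local err runs after the result was captured.
func dropsCloseError() (uint64, error) {
	body := failingCloser{}
	var err error
	defer func() {
		err = body.Close() // assigned, but never seen by the caller
	}()
	return 42, err
}

// keepsCloseError uses named results, so the deferred function can still
// change what the caller receives.
func keepsCloseError() (n uint64, err error) {
	body := failingCloser{}
	defer func() {
		if cerr := body.Close(); err == nil {
			err = cerr
		}
	}()
	return 42, nil
}

func main() {
	fmt.Println(dropsCloseError()) // 42 <nil>
	fmt.Println(keepsCloseError()) // 42 close failed
}
```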

PipelineExecutor.SetKnownTipHint is not actually atomic-monotonic (execution/execmodule/executor.go)

if cur := pe.knownTipHint.Load(); blockNum > cur {
	pe.knownTipHint.Store(blockNum)
}

This is a classic check-then-store TOCTOU. The comment claims "The value is monotonic (max wins)" but with concurrent writers a higher value can be overwritten by a lower one. Today
the only caller is Caplin startup (single-shot), but the contract advertised in the doc-comment isn't met. Fix with a CAS loop:

for {
	cur := pe.knownTipHint.Load()
	if blockNum <= cur {
		return
	}
	if pe.knownTipHint.CompareAndSwap(cur, blockNum) {
		return
	}
}
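A self-contained version of the CAS loop, with a small concurrent exercise, might look like this. `hintHolder` and `maxAfterConcurrentWrites` are hypothetical names used only for illustration; the only standard-library API relied on is `atomic.Uint64` (Go 1.19+).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// hintHolder is a hypothetical stand-in for the struct owning knownTipHint.
type hintHolder struct {
	knownTipHint atomic.Uint64
}

// SetKnownTipHint stores blockNum only if it is larger than the current
// value, retrying when another writer got in between Load and the swap.
func (h *hintHolder) SetKnownTipHint(blockNum uint64) {
	for {
		cur := h.knownTipHint.Load()
		if blockNum <= cur {
			return
		}
		if h.knownTipHint.CompareAndSwap(cur, blockNum) {
			return
		}
	}
}

// maxAfterConcurrentWrites writes 1..n from n goroutines and returns the
// final value, which must be n if the setter is truly "max wins".
func maxAfterConcurrentWrites(n uint64) uint64 {
	var h hintHolder
	var wg sync.WaitGroup
	for i := uint64(1); i <= n; i++ {
		wg.Add(1)
		go func(v uint64) {
			defer wg.Done()
			h.SetKnownTipHint(v)
		}(i)
	}
	wg.Wait()
	return h.knownTipHint.Load()
}

func main() {
	fmt.Println(maxAfterConcurrentWrites(1000)) // 1000
}
```

The same exercise against the check-then-store version can occasionally end below the maximum, which is the lost update described above.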

Startup delay: per-URI timeout × N URIs

ReadRemoteHeadExecutionBlockNumber is called synchronously from RunCaplinService (cmd/caplin/caplin1/run.go:218–232). GetHeadExecutionBlockNumber iterates all checkpoint URIs and
applies r.timeout (500ms) to each. Mainnet has 5 default URIs → up to 2.5s of added startup latency if every endpoint is slow/dead. Two suggestions:

Cap with an overall context deadline (context.WithTimeout(ctx, HeadExecutionBlockHintTimeout) once before the loop), or
Race the URIs with errgroup and return on first success.

This is a "best-effort hint" — failing fast matters more than being thorough.
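Both suggestions can be combined: one overall deadline, with the URIs raced against each other. The sketch below uses only the standard library (plain goroutines and a buffered channel instead of errgroup). `fetchFunc`, `firstHeadHint`, `demoFetch`, and `demo` are hypothetical names, and the 50ms budget is an arbitrary illustration value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchFunc is a hypothetical per-endpoint fetcher; the real one is not shown.
type fetchFunc func(ctx context.Context, uri string) (uint64, error)

// firstHeadHint queries all URIs concurrently under one overall deadline and
// returns the first successful answer.
func firstHeadHint(ctx context.Context, uris []string, overall time.Duration, fetch fetchFunc) (uint64, error) {
	if len(uris) == 0 {
		return 0, errors.New("no checkpoint URIs configured")
	}
	ctx, cancel := context.WithTimeout(ctx, overall)
	defer cancel() // also stops the losers once a winner is found

	type result struct {
		n   uint64
		err error
	}
	results := make(chan result, len(uris)) // buffered: late senders never block
	for _, uri := range uris {
		go func(uri string) {
			n, err := fetch(ctx, uri)
			results <- result{n, err}
		}(uri)
	}

	var errs []error
	for range uris {
		select {
		case r := <-results:
			if r.err == nil {
				return r.n, nil
			}
			errs = append(errs, r.err)
		case <-ctx.Done():
			return 0, errors.Join(append(errs, ctx.Err())...)
		}
	}
	return 0, errors.Join(errs...)
}

// demoFetch simulates one fast endpoint and otherwise slow ones.
func demoFetch(ctx context.Context, uri string) (uint64, error) {
	if uri == "fast" {
		return 123, nil
	}
	select {
	case <-time.After(time.Second):
		return 0, errors.New("slow endpoint gave up")
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// demo runs the race with a 50ms overall budget.
func demo(uris ...string) (uint64, error) {
	return firstHeadHint(context.Background(), uris, 50*time.Millisecond, demoFetch)
}

func main() {
	fmt.Println(demo("slow", "fast")) // 123 <nil>
	_, err := demo("slow", "slow")
	fmt.Println(err != nil) // true
}
```

With this shape the worst-case startup delay is the single overall budget, independent of how many URIs are configured.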

executeBlock's isInitialCycle parameter becomes dead (execution/stagedsync/exec3_serial.go:324)

After replacing the only use site at line 421 with se.accumulator != nil, the isInitialCycle parameter is no longer read. unparam will likely flag this. Either remove the parameter
(and the corresponding arg at line 174) or document why it's retained.

Semantic change: !IsInitialCycle → accumulator != nil for RecentReceipts

Both exec3_serial.go:421 and exec3_parallel.go:295 switch the gate on RecentReceipts.Add from "not in initial cycle" to "accumulator is set". This is justifiable (no subscribers ⇒ no
point populating the cache), but it's a behavior change, not a refactor:

During initial cycle, the parallel/serial executors traditionally skipped RecentReceipts work entirely. Now they do it whenever an accumulator is present.
Conversely, if there's ever a path where accumulator == nil while IsInitialCycle == false, RecentReceipts will no longer be populated where it used to be.

Worth verifying both cases empirically, and ideally the PR description should call this out — right now it's hidden inside a "lifecycle" PR.
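The two cases can be made explicit by enumerating the gate before and after the change. This sketch only models the two boolean conditions named above; `oldGate`, `newGate`, and `gatesDiffer` are illustrative names, not code from the PR.

```go
package main

import "fmt"

// oldGate models the previous condition for RecentReceipts.Add:
// populate only when not in the initial cycle.
func oldGate(isInitialCycle, hasAccumulator bool) bool {
	return !isInitialCycle
}

// newGate models the condition after the change:
// populate only when an accumulator is set.
func newGate(isInitialCycle, hasAccumulator bool) bool {
	return hasAccumulator
}

// gatesDiffer reports whether the change alters behavior for one combination.
func gatesDiffer(isInitialCycle, hasAccumulator bool) bool {
	return oldGate(isInitialCycle, hasAccumulator) != newGate(isInitialCycle, hasAccumulator)
}

func main() {
	for _, initial := range []bool{true, false} {
		for _, acc := range []bool{true, false} {
			fmt.Printf("initial=%-5v accumulator=%-5v old=%-5v new=%-5v changed=%v\n",
				initial, acc, oldGate(initial, acc), newGate(initial, acc), gatesDiffer(initial, acc))
		}
	}
}
```

Behavior changes exactly in the two combinations described above: initial cycle with an accumulator present, and non-initial cycle without one.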

Stale tip-reached marker on generic Reset

DeleteTipReached is wired into ResetState, ResetBlocks, and ResetExec, but not into Reset(ctx, db, stagesList...) (reset_stages.go:193). If someone calls Reset(ctx, db,
stages.Finish), Finish progress goes to 0 (which makes IsInitialCycle correctly return true regardless), but the stored marker is left at its old high-water value.
UpdateTipReachedFromProgress then refuses to update until finish climbs above the old marker, so the marker stays stale through the entire re-sync. Cosmetic — not load-bearing for
correctness — but worth either deleting in Reset too or commenting on why we don't bother.
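The staleness can be illustrated with a toy model. The update rule below (the marker only ever moves forward) is an assumption inferred from the description above, not the actual body of UpdateTipReachedFromProgress; `updateTipReached` and `resyncFrom` are hypothetical names.

```go
package main

import "fmt"

// updateTipReached models an assumed forward-only rule: the marker is
// replaced only when Finish progress exceeds it.
func updateTipReached(marker, finish uint64) uint64 {
	if finish > marker {
		return finish
	}
	return marker
}

// resyncFrom replays Finish progress 1..upTo after a reset that left the
// old marker in place, and returns the marker at the end.
func resyncFrom(staleMarker, upTo uint64) uint64 {
	marker := staleMarker
	for finish := uint64(1); finish <= upTo; finish++ {
		marker = updateTipReached(marker, finish)
	}
	return marker
}

func main() {
	fmt.Println(resyncFrom(1000, 999))  // 1000: marker stays stale during re-sync
	fmt.Println(resyncFrom(1000, 1001)) // 1001: marker moves once Finish passes it
}
```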

Style / Smaller issues

.gitignore (/.gitignore)

Adds gocache, .gomodcache, _artifacts — these look like personal dev/CI artifacts unrelated to the lifecycle change. Either move to a separate cleanup PR or justify why they belong
here.
Last line lacks a trailing newline (\ No newline at end of file in the diff). Add one.

stage_snapshots.go

Spurious blank line added at line 432; no other change in this file. Drop it.

Tests in checkpoint_sync_test.go

TestRemoteCheckpointSyncHeadExecutionBlockNumber mutates package global clparams.ConfigurableCheckpointsURLs without restoring it (no t.Cleanup). This will cross-contaminate any
test that runs after it in the same package. Save and restore.
Same test uses positional struct initialization: &RemoteCheckpointSync{&clparams.MainnetBeaconConfig, chainspec.MainnetChainID, time.Second}. The factory function
NewRemoteCheckpointSync exists and ReadRemoteHeadExecutionBlockNumber uses named init — match that style for consistency and so a field reorder doesn't silently break the test.
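One common way to save and restore a package global is a helper that swaps the value and returns a restore function, which a test can hand to t.Cleanup. In this sketch `checkpointURLs` is a hypothetical stand-in for clparams.ConfigurableCheckpointsURLs, whose real type is not shown here.

```go
package main

import "fmt"

// checkpointURLs is a hypothetical stand-in for the package global the test
// mutates; the real variable lives in clparams.
var checkpointURLs = []string{"https://default.example/checkpoint"}

// swapCheckpointURLs installs urls and returns a function that puts the
// previous value back. In a test: t.Cleanup(swapCheckpointURLs(urls)).
func swapCheckpointURLs(urls []string) (restore func()) {
	saved := checkpointURLs
	checkpointURLs = urls
	return func() { checkpointURLs = saved }
}

func main() {
	restore := swapCheckpointURLs([]string{"http://127.0.0.1:0/test"})
	fmt.Println(checkpointURLs) // [http://127.0.0.1:0/test]
	restore()
	fmt.Println(checkpointURLs) // [https://default.example/checkpoint]
}
```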

Two KnownTipHintReceiver interfaces with the same name

Defined in both cl/phase1/execution_client/interface.go and execution/execmodule/interface.go. Each is appropriate at its own seam (Caplin → ChainReader → ExecModule), but the
identical name across packages will confuse gopls "go to definition" navigation. Consider renaming one to disambiguate (e.g. ExecKnownTipHintReceiver on the inner one).

headBlockURI URL manipulation

String-finding /eth/v2/ in u.Path is fragile (custom proxy paths can vary). It's a best-effort hint endpoint with graceful degradation, so this is acceptable, but a brief comment
explaining the assumption ("checkpoint URLs are expected to follow …/eth/v2/debug/beacon/states/finalized; we strip everything after /eth/v2/ and substitute the head-block path")
would help future readers.
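For readers unfamiliar with the transformation, a minimal sketch follows. It assumes the head-block path is the standard Beacon API endpoint /eth/v2/beacon/blocks/head; the PR's actual target path and its headBlockURI signature are not shown in this thread, so both are assumptions here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

const (
	apiMarker     = "/eth/v2/"
	headBlockPath = "beacon/blocks/head" // assumed target, see lead-in
)

// headBlockURI keeps everything up to and including /eth/v2/ (so proxy
// prefixes survive) and replaces the rest of the path with the head-block
// path. It fails if the marker is absent.
func headBlockURI(checkpointURI string) (string, error) {
	u, err := url.Parse(checkpointURI)
	if err != nil {
		return "", err
	}
	i := strings.Index(u.Path, apiMarker)
	if i < 0 {
		return "", fmt.Errorf("no %q segment in path %q", apiMarker, u.Path)
	}
	u.Path = u.Path[:i+len(apiMarker)] + headBlockPath
	return u.String(), nil
}

func main() {
	fmt.Println(headBlockURI("https://example.org/proxy/eth/v2/debug/beacon/states/finalized"))
	fmt.Println(headBlockURI("https://example.org/unrelated"))
}
```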

Default InitialCycleBlockTTL = 1024

At ~12s/slot that's roughly 3.4 hours on mainnet (1024 × 12s = 12,288s). A node that's been off overnight will re-enter initial cycle. Probably the intended trade-off (better safe than sorry on a long
downtime), but a one-line Why: 1024 blocks ≈ 3.4 hours, balancing "don't reuse stale fast-pipeline mode" vs. "don't punish brief restarts" in the flag's Usage string would make the choice
legible to operators.
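The wall-clock figure is easy to check. The helper below is illustrative only; `ttlWallClock` is a hypothetical name, and 12 seconds per block is the mainnet slot time assumed above.

```go
package main

import (
	"fmt"
	"time"
)

// ttlWallClock converts a block TTL into wall-clock time for a given
// block interval in seconds.
func ttlWallClock(blocks, secondsPerBlock uint64) time.Duration {
	return time.Duration(blocks*secondsPerBlock) * time.Second
}

func main() {
	d := ttlWallClock(1024, 12)
	fmt.Printf("%v (%.2f hours)\n", d, d.Hours()) // 3h24m48s (3.41 hours)
}
```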

Test coverage

Good for the pure helpers (IsInitialCycleFromProgress, UpdateTipReachedFromProgress, headBlockURI, the new HTTP path). Gaps:

No integration test exercising the full restart-on-synced-node story end-to-end (write marker → restart → confirm prune uses non-initial-cycle flag). Hard to write but high-value
given how subtle the semantics are.
No test for the Reset* paths confirming the marker is cleared.
No concurrent test for SetKnownTipHint. Note that -race alone would not flag the issue above, because atomic Load and Store are not a data race; the test has to assert that the final value equals the maximum written.

Summary

The persistence approach is the right fix for the underlying bug, and the call-site comments do an honest job of explaining the dual InitialCycle semantics. Before merge I'd want at
minimum:

Fix the defer ... = resp.Body.Close() no-op.
Either fix SetKnownTipHint to be CAS-based or weaken its doc-comment.
Bound the checkpoint-hint fetch with a single overall timeout.
Drop the dead isInitialCycle parameter (or document why it stays).
Restore globals in tests; revert the unrelated .gitignore / whitespace churn or split it out.

The RecentReceipts gate change deserves a separate sentence in the PR description so reviewers don't miss it.

Copilot

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 4 comments.


 	if execctx.IsDomainAheadOfBlocks(ctx, tx, pe.logger) {
+		if err := stagedsync.UpdateTipReached(tx, pe.blockReader.FrozenBlocks()); err != nil {
+			return err
+		}


+		if headExecutionBlockNumber, ok, err := checkpoint_sync.ReadRemoteHeadExecutionBlockNumber(ctx, beaconConfig, config); err != nil {
+			log.Debug("[Checkpoint Sync] Could not fetch head execution block hint", "err", err)
+		} else if ok {
+			knownTipHint = max(knownTipHint, headExecutionBlockNumber)
+		}


+		pruneInitialCycle, err := stagedsync.IsInitialCycle(tx, e.syncCfg, e.blockReader.FrozenBlocks(), fcuBlockNum)
+		if err != nil {
+			return err
+		}
+		if err := e.pipelineExecutor.RunPrune(e.bacgroundCtx, tx, pruneInitialCycle, pruneTimeout); err != nil {



+	SyncInitialCycleBlockTTLFlag = cli.Uint64Flag{
+		Name:  "sync.initial-cycle-block-ttl",
+		Usage: "Sets the block window for treating a recently synced node as past the initial sync cycle",


JkLondon added 15 commits April 28, 2026 16:57

commit

f0fce35

commit

77d304e

commit

62aeaa4

commit

813c16b

commit

960d51d

commit

3d662c0

commit

d7b7fb4

commit

8debbe9

commit

dc51d1e

commit

5f8cb69

commit

c702fa2

commit

667c4f0

commit

294dd94

commit

6bdfb0a

commit

97e4e2d

JkLondon self-assigned this Apr 28, 2026

JkLondon requested review from AskAlexSharov, Giulio2002, domiwei, mh0lt, sudeepdino008 and yperbasis as code owners April 28, 2026 23:03

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

283f24e

…es/otterprunefixes

AskAlexSharov requested a review from Copilot April 29, 2026 02:31

Copilot started reviewing on behalf of AskAlexSharov April 29, 2026 02:31 View session

Copilot AI reviewed Apr 29, 2026


Comment thread execution/stagedsync/stage_snapshots.go Outdated

Comment thread execution/stagedsync/stage_execute.go Outdated

info@weblogix.biz and others added 4 commits April 29, 2026 06:06

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

3c35b92

…es/otterprunefixes

commit

a4b7ee3

commit

dae8cbe

commit

5a08d85

info@weblogix.biz and others added 15 commits April 29, 2026 14:06

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

eb25a35

…es/otterprunefixes

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

cda548b

…es/otterprunefixes

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

0432c3d

…es/otterprunefixes

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

11427d4

…es/otterprunefixes

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

7828682

…es/otterprunefixes

commit

bb09ee8

commit

ff1ac9d

commit

1e01fa3

commit

912f18e

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

e0042a9

…es/otterprunefixes

Merge remote-tracking branch 'origin/main' into jklondon/prune-featur…

f6b410f

…es/otterprunefixes

commit

ce0fe82

commit

0f0f086

commit

d15892b

commit

9e6975c

yperbasis requested a review from Copilot May 5, 2026 08:05

Copilot started reviewing on behalf of yperbasis May 5, 2026 08:05 View session

yperbasis requested changes May 5, 2026


Copilot AI reviewed May 5, 2026



JkLondon wants to merge 35 commits into main from jklondon/prune-features/otterprunefixes
