fix: skip validator duties while syncing #418

Open
dicethedev wants to merge 6 commits into lambdaclass:main from dicethedev:fix/skip-validator-duties-while-syncing
dicethedev (Contributor) commented Jun 6, 2026

🗒️ Description / Motivation

What Changed

  • Gate scheduled block proposals using the node’s sync status.
  • Report has_proposal = false to the store when a proposal is skipped.
  • Gate attestation production at interval 1.
  • Log skipped proposal and attestation duties.
  • Keep block processing, fork choice, aggregation, metrics, and key advancement active while syncing.
  • Add tests covering gating, recovery, and network-wide stalls.

Correctness / Behavior Guarantees

  • Proposals and attestations are disabled only while the local node is considered syncing.
  • Duties resume once the node catches up past the hysteresis recovery boundary.
  • Duties remain enabled during a network-wide stall so validators can help the chain recover.
  • Incoming network data and synchronization processing remain unaffected.
  • A skipped proposal is not reported to fork choice as locally available.

Tests Added / Run

  • Added tests verifying:
    • Proposals and attestations are gated while syncing.
    • Duties resume after catching up.
    • Duties remain enabled during network-wide stalls.
  • Ran cargo test -p ethlambda-blockchain --lib --offline.
    • 31 passed; 0 failed.
  • Ran cargo clippy -p ethlambda-blockchain --lib --offline -- -D warnings.
  • Ran cargo fmt --all -- --check.
  • Ran git diff --check.

Related Issues / PRs

✅ Verification Checklist

  • Ran make fmt — clean
  • Ran make lint (clippy with -D warnings) — clean
  • Ran cargo test --workspace --release — all passing

dicethedev and others added 5 commits June 6, 2026 05:47
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
greptile-apps (Bot, Contributor) commented Jun 6, 2026

Greptile Summary

This PR gates block proposals and attestations behind a new SyncStatusTracker that uses a lag threshold plus hysteresis, replacing a coarse binary head-slot check. The sync status is now refreshed on every tick rather than only on block receipt, ensuring duties are correctly suppressed before each potential signing action.

  • SyncStatusTracker::update classifies the node as syncing when head_lag > SYNC_LAG_THRESHOLD (4 slots), uses a SYNC_HYSTERESIS_BAND (2 slots) to prevent rapid flapping near the boundary, and overrides to "synced" when network_lag > NETWORK_STALL_THRESHOLD (8 slots) so validators can help a stalled network recover.
  • gate_proposer / duties_allowed are threaded into the interval-0 proposer path and the interval-1 attestation path; a gated proposal correctly passes has_proposal = false to store::on_tick.
  • Ten unit tests cover threshold boundaries, hysteresis, network-stall override, and the gate helpers.
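As a minimal sketch of the classification described in this summary, the tracker might look like the code below. The constant names and values come from the summary; the struct layout, the `bool` return value, and the exact meaning of `head_lag` and `network_lag` are assumptions, so this should not be read as the PR's implementation.

```rust
// Values as stated in the review summary.
pub const SYNC_LAG_THRESHOLD: u64 = 4;
pub const SYNC_HYSTERESIS_BAND: u64 = 2;
pub const NETWORK_STALL_THRESHOLD: u64 = 8;

/// Hypothetical layout: a single flag remembered between ticks.
#[derive(Debug, Default)]
pub struct SyncStatusTracker {
    syncing: bool,
}

impl SyncStatusTracker {
    /// Assumed inputs: `head_lag` is how many slots the local head is behind
    /// the current slot; `network_lag` is how many slots the freshest block
    /// known from the network is behind the current slot.
    /// Returns true while the node is considered syncing.
    pub fn update(&mut self, head_lag: u64, network_lag: u64) -> bool {
        self.syncing = if network_lag > NETWORK_STALL_THRESHOLD {
            // Network-wide stall: treat the node as synced so it can help recover.
            false
        } else if self.syncing {
            // Already syncing: recover only once the lag drops to the lower boundary.
            head_lag > SYNC_LAG_THRESHOLD.saturating_sub(SYNC_HYSTERESIS_BAND)
        } else {
            // Currently synced: enter syncing only past the full threshold.
            head_lag > SYNC_LAG_THRESHOLD
        };
        self.syncing
    }
}
```

With these values, a node enters syncing at a lag of 5 slots, stays syncing at a lag of 3, and recovers once the lag is 2 or less. The gap between the entry and exit boundaries is what prevents flapping.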

Confidence Score: 4/5

Safe to merge; the gating logic is straightforward and the happy/recovery/stall paths are all tested. A minor arithmetic expression needs hardening before constants evolve.

The sync-gating logic is correct end-to-end: proposals and attestations are blocked at the right intervals, has_proposal is correctly reported to the store when a proposal is skipped, and the hysteresis prevents flapping. The one concern is the bare SYNC_LAG_THRESHOLD - SYNC_HYSTERESIS_BAND subtraction on u64 constants. It is safe today, but if the band constant were ever set above the threshold, the result would wrap to a huge value in release builds. The recovery check head_lag > boundary would then be false for any realistic lag, so the node would leave the syncing state on the next tick and the hysteresis would silently stop working.

crates/blockchain/src/lib.rs — the recovery expression and the threshold boundary test
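To make the arithmetic concern concrete, the sketch below compares the two ways of computing the recovery boundary. Both function names are hypothetical. `wrapping_sub` is used to reproduce what a plain `-` yields in a release build without overflow checks, and the arguments are function parameters so the demonstration does not depend on the PR's constants.

```rust
/// Hypothetical helper: mimics a plain `threshold - band` in a build without
/// overflow checks, where an underflow wraps around modulo 2^64.
pub fn recovery_boundary_wrapping(threshold: u64, band: u64) -> u64 {
    threshold.wrapping_sub(band)
}

/// Hypothetical helper: the hardened form, which clamps at zero instead.
pub fn recovery_boundary_saturating(threshold: u64, band: u64) -> u64 {
    threshold.saturating_sub(band)
}
```

With the current values (threshold 4, band 2) both forms return 2. If the band were raised to 6, the wrapping form would return `u64::MAX - 1` while the saturating form would return 0. With a boundary of 0, the recovery check `head_lag > 0` keeps a syncing node gated until its lag reaches zero, which is stricter than intended but still well-defined.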

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/blockchain/src/lib.rs | Introduces SyncStatusTracker with threshold + hysteresis logic, gates proposals and attestations via gate_proposer/duties_allowed, moves sync-status refresh to every tick, and adds 10 unit tests. One minor runtime-underflow risk in the recovery expression; test boundary coverage for the threshold transition is incomplete. |
| crates/blockchain/src/metrics.rs | Adds Debug, Clone, Copy, PartialEq, Eq derives to SyncStatus, required by the new unit tests that assert equality on the enum. Purely additive and correct. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[on_tick: interval fires] --> B[update_sync_status]
    B --> C{network_lag >\nNETWORK_STALL_THRESHOLD?}
    C -- yes --> D[syncing = false\nnetwork stall, help recover]
    C -- no --> E{currently syncing?}
    E -- yes --> F{head_lag >\nTHRESHOLD - BAND?}
    F -- yes --> G[syncing = true\nstay syncing]
    F -- no --> H[syncing = false\nrecovered]
    E -- no --> I{head_lag >\nSYNC_LAG_THRESHOLD?}
    I -- yes --> J[syncing = true\nenter syncing]
    I -- no --> K[syncing = false\nremains synced]
    D & G & H & J & K --> L[metrics::set_node_sync_status]
    L --> M{interval == 0 and slot > 0?}
    M -- yes --> N[get_our_proposer]
    N --> O[gate_proposer: syncing?]
    O -- syncing --> P[proposer_validator_id = None\nlog skip\nhas_proposal=false to store]
    O -- synced --> Q[proposer_validator_id = Some\npropose_block]
    M -- no --> R{interval == 1?}
    R -- yes --> S{duties_allowed?}
    S -- yes --> T[produce_attestations]
    S -- no --> U[log skip attestations]
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/blockchain/src/lib.rs:89
The recovery-threshold expression `SYNC_LAG_THRESHOLD - SYNC_HYSTERESIS_BAND` is `u64` arithmetic evaluated at runtime. If a future change makes `SYNC_HYSTERESIS_BAND > SYNC_LAG_THRESHOLD`, this underflows: it panics in debug builds and wraps to a huge number in release builds. The wrapped boundary makes `head_lag > boundary` false for any realistic lag, so the node would drop out of the syncing state on the very next tick and the hysteresis would stop working. Using `saturating_sub` keeps the expression well-defined against such accidental changes.

```suggestion
            self.syncing = head_lag > SYNC_LAG_THRESHOLD.saturating_sub(SYNC_HYSTERESIS_BAND);
```

### Issue 2 of 2
crates/blockchain/src/lib.rs:898-916
**Test doesn't verify the syncing boundary at `SYNC_LAG_THRESHOLD + 1`**

`sync_status_allows_lag_through_threshold` loops `0..=SYNC_LAG_THRESHOLD` and asserts `Synced`, but never checks that `lag == SYNC_LAG_THRESHOLD + 1` produces `Syncing`. The loop already catches a change from `>` to `>=`, because it would fail at `lag == SYNC_LAG_THRESHOLD`. Without the extra assertion, however, the test would still pass if the comparison were accidentally loosened (for example to `head_lag > SYNC_LAG_THRESHOLD + 1`), which would shift the boundary by one slot. `sync_status_detects_local_lag_when_fresh_blocks_are_known` covers the "is syncing" case but does so with a separate tracker; adding one extra call inside this test would make the off-by-one boundary self-contained.
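The value of checking both sides of the boundary can be illustrated with a small sketch. Everything here is hypothetical apart from the constant's name and value: `enters_syncing` stands in for the entry comparison, `enters_syncing_too_lax` is a deliberately wrong variant, and the two `*_holds` helpers model the existing loop and the proposed extra assertion.

```rust
pub const SYNC_LAG_THRESHOLD: u64 = 4;

/// Stand-in for the intended entry check: strictly above the threshold.
pub fn enters_syncing(head_lag: u64) -> bool {
    head_lag > SYNC_LAG_THRESHOLD
}

/// Deliberately wrong variant: the boundary has drifted one slot too high.
pub fn enters_syncing_too_lax(head_lag: u64) -> bool {
    head_lag > SYNC_LAG_THRESHOLD + 1
}

/// Models the existing loop: every lag in `0..=SYNC_LAG_THRESHOLD` is synced.
pub fn lower_side_holds(check: fn(u64) -> bool) -> bool {
    (0..=SYNC_LAG_THRESHOLD).all(|lag| !check(lag))
}

/// Models the proposed extra assertion: one slot past the threshold is syncing.
pub fn upper_side_holds(check: fn(u64) -> bool) -> bool {
    check(SYNC_LAG_THRESHOLD + 1)
}
```

The too-lax variant passes `lower_side_holds` but fails `upper_side_holds`, so only a test that asserts both sides pins the boundary to exactly `SYNC_LAG_THRESHOLD`.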

Reviews (1): Last reviewed commit: "Merge branch 'fix/sync-status-heuristic'..."

Development

Successfully merging this pull request may close these issues.

Don't propose/attest blocks during sync
