[PR #32] Exponential backoff has no jitter — thundering herd on BSC WebSocket provider #92

@obchain

Description

PR: #32 feat/07-block-listener
File: crates/charon-scanner/src/listener.rs — BlockListener::run loop

Problem

Reconnect loop doubles backoff on every failure with no random jitter:

backoff = (backoff * 2).min(Duration::from_secs(30));

When the BSC WS provider restarts or rate-limits (common on public endpoints), every listener task that disconnected at the same instant retries at identical intervals: 2s, 4s, 8s, … 30s. Requests arrive in lockstep, overwhelm the provider, and trigger another wave of rejections: a thundering-herd loop.
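
To make the lockstep concrete, here is a minimal sketch of the deterministic schedule. `backoff_schedule` is a hypothetical helper, not code from `listener.rs`, and the 2s starting delay is assumed from the intervals listed above.

```rust
use std::time::Duration;

/// Delays produced by doubling `initial` after each failure, capped at `cap`.
/// Hypothetical helper for illustration only. There is no randomness here,
/// so every task that starts at the same instant computes the same list.
fn backoff_schedule(initial: Duration, cap: Duration, attempts: usize) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(attempts);
    let mut backoff = initial;
    for _ in 0..attempts {
        delays.push(backoff);
        // Same update rule as the reconnect loop quoted in this issue.
        backoff = (backoff * 2).min(cap);
    }
    delays
}
```

Two listeners calling this with the same arguments get identical delays, so their reconnect attempts hit the provider at the same moments on every round.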

Impact

All chain listeners share the same WS endpoint config. A provider blip that disconnects all listeners simultaneously causes correlated retry storms that keep the bot offline far longer than the underlying outage.

Fix

Add 0-25% random jitter before sleeping:

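use rand::Rng;
// Up to 25% of the current backoff, in milliseconds (convert to u64 before sampling).
let max_jitter_ms = (backoff.as_millis() / 4) as u64;
let jitter_ms = rand::thread_rng().gen_range(0..=max_jitter_ms);
tokio::time::sleep(backoff + Duration::from_millis(jitter_ms)).await;
// Double for the next attempt, capped at 30s.
backoff = (backoff * 2).min(Duration::from_secs(30));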

Add rand to workspace deps. 30s cap is correct; only jitter missing.
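
One possible way to keep the jitter math unit-testable is to separate it from the random source. This is a sketch under assumptions: `with_jitter`, `next_backoff`, and `MAX_BACKOFF` are hypothetical names, not existing items in `charon-scanner`, and the caller is assumed to supply a uniform sample in [0.0, 1.0).

```rust
use std::time::Duration;

/// Cap taken from the issue text; the constant name is hypothetical.
const MAX_BACKOFF: Duration = Duration::from_secs(30);

/// Returns `backoff` plus 0-25% extra delay.
/// `sample` is expected to be a uniform random value in [0.0, 1.0);
/// out-of-range values are clamped and non-finite values add no jitter.
fn with_jitter(backoff: Duration, sample: f64) -> Duration {
    let sample = if sample.is_finite() { sample.clamp(0.0, 1.0) } else { 0.0 };
    backoff + backoff.mul_f64(0.25 * sample)
}

/// Doubles the backoff for the next attempt, capped at `MAX_BACKOFF`.
fn next_backoff(backoff: Duration) -> Duration {
    (backoff * 2).min(MAX_BACKOFF)
}
```

In the reconnect loop the sample could come from `rand::thread_rng().gen::<f64>()` (rand 0.8 API), followed by `tokio::time::sleep(with_jitter(backoff, sample)).await` and `backoff = next_backoff(backoff)`. Passing the sample in as a parameter lets tests pin it to fixed values instead of depending on the RNG.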

Metadata

    Labels

    layer:rust (Rust crates: core / scanner / protocols / executor / cli)
    pr-review (Findings from PR review process)
    priority:p1-core (Core MVP scope)
    type:feature (New capability or deliverable)