
RGS sync 404s after the first sync: persisted latest_rgs_snapshot_timestamp isn't aligned to the RGS server's snapshot cadence #201

@vincenzopalazzo


Summary

After a successful RGS sync, update_rgs_snapshot() (in ldk-node src/gossip.rs) persists latest_rgs_snapshot_timestamp from the value returned by update_network_graph() — which is the snapshot's internal latest_seen_timestamp (e.g. mid-day UTC). The reference RGS server (https://rapidsync.lightningdevkit.org) only serves snapshots at 24h-aligned timestamps (00:00 UTC). On the next periodic sync, LDK requests /snapshot/v2/<mid-day-ts> and gets HTTP 404, so every subsequent sync fails until the persisted value is wiped.
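Quick sanity check on the alignment claim (standalone Rust, numbers taken from the repro below):

fn main() {
    let persisted: u64 = 1_778_068_800; // value ldk-node persisted after the first sync
    let day = 86_400; // reference server only serves 24h-aligned (00:00 UTC) timestamps
    println!("{}", persisted % day); // 43200 -> 12:00 UTC offset, so /snapshot/v2/1778068800 404s
    println!("{}", persisted - persisted % day); // 1778025600 -> 2026-05-06 00:00 UTC, which is served
}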

Effect: a fresh node syncs once (works), then "Background sync of RGS gossip data failed: Failed to update gossip data" repeats every interval and across all restarts.

Filing here in ldk-server because that's the layer I'm running, but the fix is upstream in ldk-node.

Reproduction

  1. Run ldk-server with rgs_server_url = "https://rapidsync.lightningdevkit.org/snapshot/v2/" (the contrib default).
  2. First sync succeeds.
  3. get_node_info().latest_rgs_snapshot_timestamp is now e.g. 1778068800 = 2026-05-06 12:00:00 UTC (a mid-day value, not a 24h boundary).
  4. Every subsequent sync (and every sync after a restart) requests /snapshot/v2/1778068800, gets HTTP 404, and the error above repeats.

Server behavior, confirmed manually:

for ts in 1778025600 1777939200 1777852800 1778068800; do
  code=$(curl -sS -o /dev/null -w "%{http_code}" "https://rapidsync.lightningdevkit.org/snapshot/v2/$ts")
  echo "ts=$ts ($(date -u -d @$ts '+%F %T')): $code"
done
ts=1778025600 (2026-05-06 00:00:00): 200
ts=1777939200 (2026-05-05 00:00:00): 200
ts=1777852800 (2026-05-04 00:00:00): 200
ts=1778068800 (2026-05-06 12:00:00): 404

Root cause

In ldk-node src/gossip.rs update_rgs_snapshot():

200 => {
    let new_latest_sync_timestamp =
        gossip_sync.update_network_graph(response.as_bytes()).map_err(...)?;
    latest_sync_timestamp.store(new_latest_sync_timestamp, Ordering::Release);
    Ok(new_latest_sync_timestamp)
}

update_network_graph() reads latest_seen_timestamp from the snapshot binary header (rust-lightning lightning-rapid-gossip-sync/src/processing.rs:90) and returns it — that's the most recent gossip-message timestamp inside the snapshot, not the snapshot's URL/cadence timestamp.

Workaround

Delete the node_metrics row from ldk_node_data.sqlite before each daemon start; this forces a fresh full sync from /snapshot/v2/0. Functional but ugly:

DELETE FROM ldk_node_data WHERE primary_namespace='' AND secondary_namespace='' AND key='node_metrics';
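For example, as a one-liner before starting the daemon (the path is a placeholder; assumes the default SqliteStore table layout shown above):

sqlite3 /path/to/ldk_node_data.sqlite \
  "DELETE FROM ldk_node_data WHERE primary_namespace='' AND secondary_namespace='' AND key='node_metrics';"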

Suggested fix

Round new_latest_sync_timestamp down to the nearest 24h boundary before persisting; that matches the reference RGS server's cadence and keeps deltas small (sketch below). The interval could be a const or made configurable.
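A minimal sketch against the excerpt above (RGS_SNAPSHOT_INTERVAL_SECS is a name I made up, and the .map_err(...) elision mirrors the excerpt):

const RGS_SNAPSHOT_INTERVAL_SECS: u32 = 24 * 60 * 60; // assumption: reference server's 00:00 UTC cadence

200 => {
    let new_latest_sync_timestamp =
        gossip_sync.update_network_graph(response.as_bytes()).map_err(...)?;
    // Round down to the server's snapshot cadence so the next request
    // hits a timestamp the server actually serves.
    let aligned_timestamp =
        new_latest_sync_timestamp - (new_latest_sync_timestamp % RGS_SNAPSHOT_INTERVAL_SECS);
    latest_sync_timestamp.store(aligned_timestamp, Ordering::Release);
    Ok(aligned_timestamp)
}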

Alternatively, persist the request URL's query_timestamp instead of the snapshot's internal one — but then deltas always start from the previously-requested boundary, which means full re-fetch when starting from 0.
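A sketch of that variant, assuming the requested timestamp is available in scope (query_timestamp is my placeholder, not an actual ldk-node variable):

200 => {
    gossip_sync.update_network_graph(response.as_bytes()).map_err(...)?;
    // Persist the boundary we requested rather than the snapshot's internal
    // latest_seen_timestamp, so the next URL is one the server serves.
    latest_sync_timestamp.store(query_timestamp, Ordering::Release);
    Ok(query_timestamp)
}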

If the canonical contract is that any client-supplied timestamp must be honored, this is a server bug instead — but updating deployed clients is the more tractable fix.

Environment

  • ldk-server @ 50fe752
  • ldk-node 0.8.0+git @ 21eea8c
  • lightning-rapid-gossip-sync 0.3.0+git @ 38a62c3
  • Network: bitcoin (mainnet)
  • Bitcoin backend: [esplora] server_url = "https://mempool.space/api"
  • Linux x86_64

Closest existing ldk-node issue: #615 (different — shutdown race). This one is steady-state.
