Trigger gossip sync immediately on peer connection #1271
Conversation
Send TickNetworkMaintenance to self when a new peer connects, eliminating the initial 0-60s wait before gossip data sync begins. This is critical for WASM nodes that need peer address data from gossip shortly after connecting to a bootnode. Fixes: #1269
Pull request overview
This PR reduces gossip bootstrap latency by triggering TickNetworkMaintenance immediately after a peer connects, rather than waiting for the next periodic maintenance interval (default 60s). This helps nodes (notably WASM) start gossip syncing soon after connecting to a bootnode and obtain peer address data earlier.
Changes:
- On `PeerConnected`, enqueue `GossipActorMessage::TickNetworkMaintenance` immediately to kick off syncing with the new peer.
- Add in-code rationale explaining the motivation (avoid initial 0–60s delay) and impact (WASM bootstrapping).
```rust
// Immediately trigger network maintenance to start syncing with
// the new peer, rather than waiting for the next periodic
// TickNetworkMaintenance (up to 60s in production).
// This eliminates the initial gossip sync delay after connecting
// to a bootnode, which is critical for WASM nodes that need
// peer addresses from gossip data quickly.
myself.send_message(GossipActorMessage::TickNetworkMaintenance)?;
```
Triggering TickNetworkMaintenance immediately on PeerConnected introduces a race where the gossip actor may start an active syncer before ReceivedControl has been processed. The active syncer sends SendGossipMessage immediately in its pre_start, and send_message_to_peer ultimately calls state.get_control().expect(...), which will panic if state.control is still None. Consider gating the immediate tick on state.control.is_some(), or deferring the tick until after ReceivedControl is set (e.g., send the tick from the ReceivedControl handler if there are connected peers).
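The race can be sketched with a minimal stand-in for the actor state (hypothetical, simplified types; the real gossip actor state is more involved): an unguarded tick can reach `get_control().expect(...)` while `control` is still `None`, whereas gating on `control.is_some()` makes the early tick a no-op.

```rust
// Simplified stand-in for the gossip actor state described in the review.
#[derive(Default)]
struct GossipState {
    // Stands in for the service control handle set by ReceivedControl.
    control: Option<String>,
}

impl GossipState {
    // Mirrors the panicking accessor: fine once ReceivedControl has run,
    // fatal if the on-connect tick races ahead of it.
    fn get_control(&self) -> &String {
        self.control.as_ref().expect("control should be set")
    }

    // Guarded variant suggested above: only start maintenance
    // when the control handle is actually present.
    fn try_tick(&self) -> bool {
        self.control.is_some()
    }
}

fn main() {
    let mut state = GossipState::default();
    // PeerConnected arrives before ReceivedControl: skip the tick instead
    // of panicking inside the active syncer.
    assert!(!state.try_tick());
    state.control = Some("control".to_string());
    // After ReceivedControl the tick is safe.
    assert!(state.try_tick());
    println!("guarded tick ok: {}", state.get_control());
}
```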
@copilot apply changes based on this feedback
Fixed in 48c2886. The TickNetworkMaintenance in PeerConnected is now gated on state.control.is_some(). If a peer connects before control arrives, the tick is sent from the ReceivedControl handler once state.peer_states is non-empty — covering both orderings without risk of a panic.
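The two-sided trigger can be sketched as follows (hypothetical, minimal types; the real handlers live in the gossip actor): whichever of `ReceivedControl` / `PeerConnected` arrives second fires the first maintenance tick, so neither ordering stalls or panics.

```rust
// Minimal model of the two handlers and the condition under which each
// should send TickNetworkMaintenance.
#[derive(Default)]
struct State {
    control: Option<()>, // set by ReceivedControl
    peers: usize,        // connected peer count
}

enum Event {
    ReceivedControl,
    PeerConnected,
}

// Returns true when the handler should send an immediate tick.
fn handle(state: &mut State, event: Event) -> bool {
    match event {
        Event::ReceivedControl => {
            state.control = Some(());
            // Peers connected before control arrived: tick now.
            state.peers > 0
        }
        Event::PeerConnected => {
            state.peers += 1;
            // Gate the immediate tick on control being present.
            state.control.is_some()
        }
    }
}

fn main() {
    // Ordering A: control first, then peer -> tick fires on PeerConnected.
    let mut a = State::default();
    assert!(!handle(&mut a, Event::ReceivedControl));
    assert!(handle(&mut a, Event::PeerConnected));

    // Ordering B: peer first, then control -> tick fires on ReceivedControl.
    let mut b = State::default();
    assert!(!handle(&mut b, Event::PeerConnected));
    assert!(handle(&mut b, Event::ReceivedControl));

    println!("both orderings produce exactly one initial tick");
}
```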
```rust
// to a bootnode, which is critical for WASM nodes that need
// peer addresses from gossip data quickly.
myself.send_message(GossipActorMessage::TickNetworkMaintenance)?;
```
This behavior change is user-visible (removes the initial wait before gossip sync) and seems important for bootnode discovery; it would be good to add a regression test that fails without this change (e.g., set the configured gossip maintenance interval to a very large value and assert that after a peer connects, gossip-derived data/requests happen within a short deadline). This will help prevent future changes from reintroducing the startup sync delay.
@copilot apply changes based on this feedback
Added test_gossip_sync_starts_immediately_on_peer_connect in commit 91ab173. The test sets a 1-hour gossip maintenance interval, injects a node announcement into node A, then connects node B and verifies node B syncs the announcement within 2 seconds — far sooner than the interval would allow. Without the PeerConnected immediate tick, this test would fail.
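The shape of that regression test can be sketched like this (a hypothetical, self-contained model; the real test spins up two fiber nodes and checks announcement propagation): with the periodic interval set to one hour, the first sync must still be driven by the on-connect tick, well inside the 2-second deadline.

```rust
use std::time::{Duration, Instant};

// Hypothetical node model: only the fields relevant to the test idea.
struct Node {
    maintenance_interval: Duration, // periodic TickNetworkMaintenance cadence
    last_tick: Option<Instant>,     // when the first tick actually fired
}

impl Node {
    // The behavior under test: an immediate tick on peer connection.
    fn on_peer_connected(&mut self) {
        self.last_tick = Some(Instant::now());
    }
}

fn main() {
    let start = Instant::now();
    let mut node = Node {
        maintenance_interval: Duration::from_secs(3600), // 1-hour interval
        last_tick: None,
    };

    node.on_peer_connected();

    let elapsed = node.last_tick.expect("tick must fire on connect") - start;
    // Sync kicked off within the short deadline, not the huge interval;
    // without the immediate tick, last_tick would still be None here.
    assert!(elapsed < Duration::from_secs(2));
    assert!(elapsed < node.maintenance_interval);
    println!("first tick fired after {:?}", elapsed);
}
```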
* Fix gossip tick race: gate on control.is_some(), defer from ReceivedControl
  Co-authored-by: quake <8990+quake@users.noreply.github.com>
* Add regression test for immediate gossip sync on peer connect
  Co-authored-by: quake <8990+quake@users.noreply.github.com>
This PR improves gossip sync speed, which is a nice improvement. But I don't think it fully fixes #1269. In the original issue, `connectPeer({"pubkey": "0291a6..."})` was called after sleeping 1 min and *didn't return an error*, meaning the gossip data was already synced. The actual problem is that the synced `NodeAnnouncement` for that node only contains a TCP address (`/ip4/.../tcp/8119/...`), not a wss address. Since WASM nodes can only connect via wss, the connection silently fails. This might need a separate fix: either WASM nodes should filter for wss-compatible addresses when resolving via pubkey, or the target node needs to announce its wss address in `NodeAnnouncement`.

It's resolved in another PR: #1270
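The wss-filtering idea from the comment above can be sketched as below (a hypothetical helper over multiaddr strings; the actual fix in #1270 uses tentacle's `TransportType`): given a peer's announced multiaddrs, keep only the ones a WASM node can dial before selecting one.

```rust
// Hypothetical transport filter: a WASM node can only dial wss endpoints,
// so drop everything else before the random address selection.
fn is_wss(addr: &str) -> bool {
    addr.split('/').any(|part| part == "wss")
}

fn wss_candidates<'a>(addrs: &'a [&'a str]) -> Vec<&'a str> {
    addrs.iter().copied().filter(|a| is_wss(a)).collect()
}

fn main() {
    // Illustrative announced addresses (placeholder peer ids).
    let announced = [
        "/ip4/203.0.113.1/tcp/8119/p2p/QmPeer",          // TCP only: not dialable from WASM
        "/dns4/node.example.com/tcp/443/wss/p2p/QmPeer", // wss: dialable from a browser
    ];
    let candidates = wss_candidates(&announced);
    assert_eq!(candidates.len(), 1);
    assert!(candidates[0].contains("/wss/"));
    println!("wss candidates: {:?}", candidates);
}
```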
There is another question: does the node broadcast the wss address? When I queried `graph_nodes`, I did not see the wss address.

Resolved in cca995d.
* fix: push limit to DB layer in list_payments to avoid unbounded memory usage (#1261)
* feat: update macOS build configuration for portability (#1237)
* feat: add official Docker image support (#1244)
* Bump fiber-rpc-gen to 0.1.22 (#1264)
* network: onion & socks5 support for fiber (#1228)
* Fix fnn-migrate error messages and update README (#1249)
* chore(deps): bump tokio from 1.50.0 to 1.51.1
* docs: refresh public node and network node documentation (#1266)
* fix: abort funding on insufficient UDT cells (#1195) (#1253)
* feat: add gossip metrics benchmarks and CI integration (#1177)
* chore(deps): bump rand from 0.8.5 to 0.9.3 in /tests/deploy/udt-init
* Trigger gossip sync immediately on peer connection (#1271)
* feat: add addr_type parameter to connect_peer RPC for transport type filtering (#1270)
* feat(cch): default final TLC expiry deltas to 60 hours (#1258)
* Local RPC method not found should not return unauthorized (#1235)
* chore: bump version to v0.8.1 (#1274)