Trigger gossip sync immediately on peer connection#1271

Merged
quake merged 3 commits into develop from agent/quake-opencode-wsl/8ee5820a
Apr 15, 2026
Conversation

@quake
Member

@quake quake commented Apr 14, 2026

Send TickNetworkMaintenance to self when a new peer connects, eliminating the initial 0-60s wait before gossip data sync begins. This is critical for WASM nodes that need peer address data from gossip shortly after connecting to a bootnode.

Fixes: #1269

Contributor

Copilot AI left a comment


Pull request overview

This PR reduces gossip bootstrap latency by triggering TickNetworkMaintenance immediately after a peer connects, rather than waiting for the next periodic maintenance interval (default 60s). This helps nodes (notably WASM) start gossip syncing soon after connecting to a bootnode and obtain peer address data earlier.

Changes:

  • On PeerConnected, enqueue GossipActorMessage::TickNetworkMaintenance immediately to kick off syncing with the new peer.
  • Add in-code rationale explaining the motivation (avoid initial 0–60s delay) and impact (WASM bootstrapping).

Comment thread crates/fiber-lib/src/fiber/gossip.rs Outdated
Comment on lines +2685 to +2691
// Immediately trigger network maintenance to start syncing with
// the new peer, rather than waiting for the next periodic
// TickNetworkMaintenance (up to 60s in production).
// This eliminates the initial gossip sync delay after connecting
// to a bootnode, which is critical for WASM nodes that need
// peer addresses from gossip data quickly.
myself.send_message(GossipActorMessage::TickNetworkMaintenance)?;

Copilot AI Apr 14, 2026


Triggering TickNetworkMaintenance immediately on PeerConnected introduces a race where the gossip actor may start an active syncer before ReceivedControl has been processed. The active syncer sends SendGossipMessage immediately in its pre_start, and send_message_to_peer ultimately calls state.get_control().expect(...), which will panic if state.control is still None. Consider gating the immediate tick on state.control.is_some(), or deferring the tick until after ReceivedControl is set (e.g., send the tick from the ReceivedControl handler if there are connected peers).
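The two orderings can be modeled with a small self-contained sketch. `GossipState` and its handlers here are hypothetical stand-ins for the gossip actor's state and message handlers, not the fiber crate's actual types:

```rust
// Minimal model of the ordering problem and its fix; all names are
// illustrative stand-ins, not the fiber crate's actual types.
#[derive(Default)]
struct GossipState {
    control: Option<()>,            // stands in for the service control handle
    peer_states: Vec<&'static str>, // connected peers
    ticks: u32,                     // maintenance ticks actually fired
}

impl GossipState {
    // PeerConnected: tick immediately only if control has already arrived;
    // otherwise an active syncer could dereference a missing control handle.
    fn on_peer_connected(&mut self, peer: &'static str) {
        self.peer_states.push(peer);
        if self.control.is_some() {
            self.ticks += 1;
        }
    }

    // ReceivedControl: if peers connected first, fire the deferred tick now,
    // so neither ordering loses the immediate sync.
    fn on_received_control(&mut self) {
        self.control = Some(());
        if !self.peer_states.is_empty() {
            self.ticks += 1;
        }
    }
}

fn main() {
    // Ordering 1: control first, then peer connects -> immediate tick.
    let mut a = GossipState::default();
    a.on_received_control();
    a.on_peer_connected("peer1");
    assert_eq!(a.ticks, 1);

    // Ordering 2: peer first -> tick deferred until control arrives.
    let mut b = GossipState::default();
    b.on_peer_connected("peer1");
    assert_eq!(b.ticks, 0); // gated: no panic-prone early tick
    b.on_received_control();
    assert_eq!(b.ticks, 1);
}
```

Either ordering ends with exactly one immediate tick, and no tick ever fires while `control` is still `None`.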

Member Author


@copilot apply changes based on this feedback

Contributor


Fixed in 48c2886. The TickNetworkMaintenance in PeerConnected is now gated on state.control.is_some(). If a peer connects before control arrives, the tick is sent from the ReceivedControl handler once state.peer_states is non-empty — covering both orderings without risk of a panic.


Comment thread crates/fiber-lib/src/fiber/gossip.rs Outdated
Comment on lines +2689 to +2691
// to a bootnode, which is critical for WASM nodes that need
// peer addresses from gossip data quickly.
myself.send_message(GossipActorMessage::TickNetworkMaintenance)?;

Copilot AI Apr 14, 2026


This behavior change is user-visible (removes the initial wait before gossip sync) and seems important for bootnode discovery; it would be good to add a regression test that fails without this change (e.g., set the configured gossip maintenance interval to a very large value and assert that after a peer connects, gossip-derived data/requests happen within a short deadline). This will help prevent future changes from reintroducing the startup sync delay.

Member Author


@copilot apply changes based on this feedback

Contributor


Added test_gossip_sync_starts_immediately_on_peer_connect in commit 91ab173. The test sets a 1-hour gossip maintenance interval, injects a node announcement into node A, then connects node B and verifies node B syncs the announcement within 2 seconds — far sooner than the interval would allow. Without the PeerConnected immediate tick, this test would fail.

@gpBlockchain
Contributor

This PR improves gossip sync speed, which is a nice improvement. But I don't think it fully fixes #1269.

In the original issue, connectPeer({"pubkey": "0291a6..."}) was called after sleeping 1 min and didn't return an error — meaning the gossip data was already synced. The actual problem is that the synced NodeAnnouncement for that node only contains a TCP address (/ip4/.../tcp/8119/...), not a wss address. Since WASM nodes can only connect via wss, the connection silently fails.

This might need a separate fix — either WASM nodes should filter for wss-compatible addresses when resolving via pubkey, or the target node needs to announce its wss address in NodeAnnouncement.

@quake
Member Author

quake commented Apr 14, 2026 via email

It's resolved in another PR: #1270

@gpBlockchain
Contributor

There is another question: does the node broadcast the wss address? When I queried graph_nodes, I did not see the wss address.

@quake
Member Author

quake commented Apr 14, 2026

> There is another question: does the node broadcast the wss address? When I queried graph_nodes, I did not see the wss address.

resolved in cca995d

@quake quake merged commit 0d2b574 into develop Apr 15, 2026
102 of 104 checks passed
quake added a commit that referenced this pull request Apr 15, 2026
* fix: push limit to DB layer in list_payments to avoid unbounded memory usage (#1261)

* fix: push limit to DB layer in list_payments to avoid unbounded memory usage

get_payment_sessions_with_limit previously called collect_by_prefix
without a limit, loading ALL payment session KV pairs into memory
before applying .filter_map().take(limit) as iterator adapters.

Add PrefixIterator::new_from() for cursor-based lazy iteration, and
prefix_iter/prefix_iter_from helpers to FiberStore trait. Rewrite
get_payment_sessions_with_limit as a simple iterator chain using the
lazy batched PrefixIterator, which fetches only 100 entries at a time
and stops as soon as enough results are collected.
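The cursor-based batching can be sketched with a `BTreeMap` standing in for the DB. This is an illustration of the idea only; `prefix_scan` and its signature are hypothetical, not the actual `PrefixIterator` API:

```rust
use std::collections::BTreeMap;

// Cursor-based, batched prefix scan: each loop iteration is one "DB round
// trip" that reads at most BATCH entries, so memory stays bounded no matter
// how many entries share the prefix.
fn prefix_scan(db: &BTreeMap<Vec<u8>, u64>, prefix: &[u8], limit: usize) -> Vec<u64> {
    const BATCH: usize = 100; // fetch at most 100 entries per round trip
    let mut out = Vec::new();
    let mut cursor = prefix.to_vec();
    loop {
        let batch: Vec<(Vec<u8>, u64)> = db
            .range(cursor.clone()..)
            .take(BATCH)
            .map(|(k, v)| (k.clone(), *v))
            .collect();
        if batch.is_empty() {
            return out; // store exhausted
        }
        for (k, v) in batch {
            if !k.starts_with(prefix) {
                return out; // left the prefix range
            }
            out.push(v);
            if out.len() == limit {
                return out; // stop as soon as enough results are collected
            }
            cursor = k; // resume strictly after this key next round trip
            cursor.push(0);
        }
    }
}

fn main() {
    let mut db = BTreeMap::new();
    for i in 0u64..250 {
        db.insert(format!("payment/{:04}", i).into_bytes(), i);
    }
    db.insert(b"zzz".to_vec(), 999); // outside the prefix, never visited
    // Only one 100-entry batch is touched, instead of loading all entries.
    assert_eq!(prefix_scan(&db, b"payment/", 5), vec![0, 1, 2, 3, 4]);
}
```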

* Update crates/fiber-store/src/iterator.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* feat: update macOS build configuration for portability (#1237)

* feat: update macOS build configuration for portability

* refactor: simplify macOS build steps by removing Homebrew setup

* rename for portability

* feat: add official Docker image support (#1244)

* feat: add official Docker image support

* ci: publish docker images to ghcr and docker hub

* fix docker image readme

* Bump fiber-rpc-gen to 0.1.22 (#1264)

* network: onion & socks5 support for fiber (#1228)

* onion & socks5 support for fiber

* make CI happy

* Add default configuration

* fix tor service

* use nested structure for onion and proxy config

* send MaintainConnections message to NetworkActor when tor is reconnected

* make fmt happy

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix some issues pointed out by copilot

* fix openrpc-json-generator

* update ckb-testtool

* update cargo.lock

* merge newest changes

* fix a race condition in onion service start

* make fmt happy

* Update crates/fiber-lib/src/fiber/onion_service.rs

Co-authored-by: Eval Exec <execvy@gmail.com>

* Change default onion external port

* update

* update .gitignore

* isolate wasm configuration related to proxy and tor

* add timeout check for start_onion_service & retry in `OnionService::start`

* Added 3-second delay before sending MaintainConnections

* move `proxy` and `onion` related configurations to their individual modules

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Eval Exec <execvy@gmail.com>

* Fix fnn-migrate error messages and update README (#1249)

* Initial plan

* fix: update fnn-migrate flag from -p to -d in error message and README

Agent-Logs-Url: https://github.com/gpBlockchain/fiber/sessions/a6b4f2a5-59de-4e8b-a07f-a3ffb49b7a48

Co-authored-by: gpBlockchain <32102187+gpBlockchain@users.noreply.github.com>

* fix: show fiber data directory (without /store) in fnn-migrate error message

Agent-Logs-Url: https://github.com/gpBlockchain/fiber/sessions/0de95c55-8af8-4a5c-b2f7-98b4c62c35b2

Co-authored-by: gpBlockchain <32102187+gpBlockchain@users.noreply.github.com>

* Apply suggestion from @gpBlockchain

* Apply suggestion from @gpBlockchain

* Apply suggestion from @gpBlockchain

* Update crates/fiber-store/src/db_migrate.rs

Co-authored-by: gpBlockchain <32102187+gpBlockchain@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Quake Wang <quake.wang@gmail.com>

* chore(deps): bump tokio from 1.50.0 to 1.51.1

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.50.0 to 1.51.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.50.0...tokio-1.51.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.51.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* docs: refresh public node and network node documentation (#1266)

* rename testnet-nodes.md and add network-nodes.md

* docs: update README network node links

* docs: refine pubkey-based node docs

* docs: clarify node2 rpc discovery in public nodes guide

* fix: abort funding on insufficient UDT cells (#1195) (#1253)

When UDT cells are not yet indexed, the funding transaction builder
would fail with a generic TxBuilderError that was silently swallowed
(empty tx logged and ignored). This left UDT channels permanently
stuck in the negotiating state.

- Add FundingError::InsufficientCells variant that is non-temporary,
  so schedule_funding_retry aborts the channel instead of retrying
- Reclassify AbsentTx as temporary so empty funding results get
  retried via the existing backoff mechanism
- Extract map_tx_builder_error to convert the sentinel UDT message
  into InsufficientCells before it reaches the retry logic
- Move FundingError tests to dedicated tests/error.rs module and add
  coverage for the new error mapping and classification

Co-authored-by: ian <ian@cryptape.com>
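The temporary/permanent split described above can be sketched as a minimal model. The variant names follow the commit message; `is_temporary` is an illustrative stand-in for whatever predicate `schedule_funding_retry` actually consults:

```rust
// Model of the error classification driving retry-vs-abort; illustrative only.
#[derive(Debug)]
enum FundingError {
    InsufficientCells, // non-temporary: abort the channel instead of retrying
    AbsentTx,          // temporary: retried via the existing backoff mechanism
}

impl FundingError {
    fn is_temporary(&self) -> bool {
        matches!(self, FundingError::AbsentTx)
    }
}

fn main() {
    // A non-temporary error aborts; a temporary one is rescheduled.
    assert!(!FundingError::InsufficientCells.is_temporary());
    assert!(FundingError::AbsentTx.is_temporary());
}
```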

* feat: add gossip metrics benchmarks and CI integration (#1177)

* feat: add gossip metrics benchmarks and CI integration

- add gossip protocol metrics counters/histograms and active-sync/query observability

- add criterion gossip benchmarks for multi-node propagation and sync-recovery

- add tests/perf gossip benchmark flows driven by metrics (steady/burst + baseline/compare)

- integrate benchmark CI with metrics-enabled startup and bootstrap gossip regression gating

* fix: stabilize gossip benchmark runs and move gossip perf to dedicated workflow

* chore: ignore generated gossip perf benchmark artifacts

* fix: tighten gossip duplicate/rejected metrics and perf timing

* ci: move perf benchmark artifacts under tests/perf/artifacts

* refactor: move gossip metrics helpers into dedicated modules

* chore(deps): bump rand from 0.8.5 to 0.9.3 in /tests/deploy/udt-init

Bumps [rand](https://github.com/rust-random/rand) from 0.8.5 to 0.9.3.
- [Release notes](https://github.com/rust-random/rand/releases)
- [Changelog](https://github.com/rust-random/rand/blob/0.9.3/CHANGELOG.md)
- [Commits](rust-random/rand@0.8.5...0.9.3)

---
updated-dependencies:
- dependency-name: rand
  dependency-version: 0.9.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Trigger gossip sync immediately on peer connection (#1271)

* Trigger gossip sync immediately on peer connection

Send TickNetworkMaintenance to self when a new peer connects, eliminating
the initial 0-60s wait before gossip data sync begins. This is critical
for WASM nodes that need peer address data from gossip shortly after
connecting to a bootnode.

Fixes: #1269

* Fix gossip tick race: gate on control.is_some(), defer from ReceivedControl

Agent-Logs-Url: https://github.com/nervosnetwork/fiber/sessions/5c445ccc-f1ef-4a6d-96aa-4254471fffb9

Co-authored-by: quake <8990+quake@users.noreply.github.com>

* Add regression test for immediate gossip sync on peer connect

Agent-Logs-Url: https://github.com/nervosnetwork/fiber/sessions/42d6ab86-ae76-47da-8ccd-b535b89f7be1

Co-authored-by: quake <8990+quake@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: quake <8990+quake@users.noreply.github.com>

* feat: add addr_type parameter to connect_peer RPC for transport type filtering (#1270)

* feat: add addr_type parameter to connect_peer RPC for transport type filtering

When connect_peer is called with only a pubkey, the node randomly selects
an address from the peer's known addresses. In WASM environments, this may
select a non-WSS address that is unsupported. Add an optional addr_type
parameter (tcp/ws/wss) to allow callers to filter addresses by transport
type before random selection.
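The filtering step can be sketched as follows. `TransportType` mirrors the tentacle enum in spirit, but `find_type` here is a simplified hypothetical classifier over textual multiaddrs (the real one inspects parsed multiaddr protocols, and real addresses carry a trailing peer-id component):

```rust
// Sketch of addr_type filtering before random address selection.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TransportType { Tcp, Ws, Wss }

// Hypothetical classifier: look at the last protocol of a textual multiaddr.
fn find_type(addr: &str) -> TransportType {
    if addr.ends_with("/wss") {
        TransportType::Wss
    } else if addr.ends_with("/ws") {
        TransportType::Ws
    } else {
        TransportType::Tcp
    }
}

fn filter_addrs<'a>(addrs: &[&'a str], want: Option<TransportType>) -> Vec<&'a str> {
    addrs
        .iter()
        .copied()
        .filter(|a| want.map_or(true, |t| find_type(a) == t))
        .collect()
}

fn main() {
    let known = [
        "/ip4/1.2.3.4/tcp/8119",        // a WASM node cannot dial plain TCP
        "/dns4/example.com/tcp/443/wss",
    ];
    // With addr_type = wss, random selection can no longer pick the TCP addr.
    let wss_only = filter_addrs(&known, Some(TransportType::Wss));
    assert_eq!(wss_only, vec!["/dns4/example.com/tcp/443/wss"]);
    // Without a filter, all known addresses remain candidates.
    assert_eq!(filter_addrs(&known, None).len(), 2);
}
```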

* refactor: decouple actor protocol from JSON types and improve error messages

- Introduce MultiAddrTransport enum in fiber-types as the internal
  transport filter type, keeping fiber-json-types::MultiAddrType as the
  RPC-facing type only
- Add From<MultiAddrType> for MultiAddrTransport conversion in
  fiber-json-types convert.rs (behind conversion feature)
- Update NetworkActorCommand::ConnectPeerWithPubkey to use
  MultiAddrTransport instead of fiber_json_types::MultiAddrType
- Convert at the RPC boundary in rpc/peer.rs via .map(Into::into)
- Add NoMatchingAddress(Pubkey, MultiAddrTransport) error variant to
  distinguish 'peer not found' from 'peer has no addresses matching
  the requested transport type'
- Rename matches_addr_type -> matches_addr_transport for clarity

* refactor: use tentacle::utils::TransportType and rename MultiAddrType

Per review: reuse the existing tentacle::utils::TransportType instead
of introducing a custom MultiAddrTransport in fiber-types.

- Remove MultiAddrTransport enum and Display impl from fiber-types
- Remove From<MultiAddrType> conversion from fiber-json-types/convert.rs
- Rename MultiAddrType -> TransportType in fiber-json-types/peer.rs
- Update NetworkActorCommand::ConnectPeerWithPubkey to use
  tentacle::utils::TransportType directly
- Remove cfg(not(wasm32)) gate from TransportType import and find_type()
  (both are available on all targets in tentacle)
- Replace matches_addr_transport() with find_type() == transport
- Add to_transport_type() conversion in rpc/peer.rs at the RPC boundary
- Update NoMatchingAddress error to use tentacle::utils::TransportType
- Update TypeScript types and regenerate RPC docs

* fix: allow DNS-based WSS addresses to pass the private address filter

The private address filter used multiaddr_to_socketaddr() which only
handles Ip4/Ip6 protocols, silently dropping DNS-based addresses like
/dns4/example.com/tcp/443/wss. This prevented WSS addresses from being
broadcast in node announcements.

Add is_addr_reachable() helper that treats Dns4/Dns6 addresses as
always reachable (since DNS implies a public endpoint), while preserving
the existing IP-based reachability check. Applied at all three filter
locations: announcement creation, graph ingestion, and gossip processing.
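A sketch of the `is_addr_reachable()` idea: DNS-based multiaddrs pass unconditionally, while IP-based ones must carry a public address. The string-based parsing here is a hypothetical simplification; the real helper works on parsed multiaddr protocols:

```rust
use std::net::Ipv4Addr;

// Illustrative reachability check: DNS implies a public endpoint, IPs are
// checked against private/loopback/link-local ranges.
fn is_addr_reachable(addr: &str) -> bool {
    if addr.starts_with("/dns4/") || addr.starts_with("/dns6/") {
        return true; // a DNS name implies a publicly resolvable endpoint
    }
    if let Some(rest) = addr.strip_prefix("/ip4/") {
        if let Some(ip) = rest.split('/').next() {
            if let Ok(v4) = ip.parse::<Ipv4Addr>() {
                return !(v4.is_private() || v4.is_loopback() || v4.is_link_local());
            }
        }
    }
    false // other protocol families treated as unreachable in this sketch
}

fn main() {
    // The old multiaddr_to_socketaddr()-based filter silently dropped this:
    assert!(is_addr_reachable("/dns4/example.com/tcp/443/wss"));
    assert!(is_addr_reachable("/ip4/8.8.8.8/tcp/8119"));
    assert!(!is_addr_reachable("/ip4/192.168.1.5/tcp/8119"));
}
```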

* feat(cch): default final TLC expiry deltas to 60 hours (#1258)

* feat(cch): default final TLC expiry deltas to 60 hours

Raise DEFAULT_BTC_FINAL_TLC_EXPIRY_DELTA_BLOCKS to 360 (~10 min/block)
and DEFAULT_CKB_FINAL_TLC_EXPIRY_DELTA_SECONDS to 216,000. Update CCH
actor tests that assumed the previous 30h defaults.

Made-with: Cursor
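A quick arithmetic check of the stated defaults (constant names mirror the commit message; values come from it):

```rust
// Both sides of the cross-chain hub come out to the same 60-hour window.
fn main() {
    const BTC_BLOCK_TIME_SECS: u64 = 600; // ~10 min/block
    const BTC_DELTA_BLOCKS: u64 = 360;
    const CKB_DELTA_SECONDS: u64 = 216_000;
    assert_eq!(BTC_DELTA_BLOCKS * BTC_BLOCK_TIME_SECS / 3600, 60); // hours
    assert_eq!(CKB_DELTA_SECONDS / 3600, 60); // hours
}
```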

* test(cch): replace expiry magic numbers with named constants

Use BTC_BLOCK_TIME_SECS, DEFAULT_BTC_FINAL_TLC_EXPIRY_DELTA_BLOCKS,
DEFAULT_CKB_FINAL_TLC_EXPIRY_DELTA_SECONDS, and per-test scenario consts
in CCH actor tests.

Made-with: Cursor

---------

Co-authored-by: ian <ian@cryptape.com>

* Local RPC method not found should not return unauthorized (#1235)

* chore: bump version to v0.8.1 (#1274)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yukang <moorekang@gmail.com>
Co-authored-by: Officeyutong <yt.xyxx@gmail.com>
Co-authored-by: Eval Exec <execvy@gmail.com>
Co-authored-by: gpBlockchain <32102187+gpBlockchain@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sunchengzhu <36075573+sunchengzhu@users.noreply.github.com>
Co-authored-by: ian <me@iany.me>
Co-authored-by: ian <ian@cryptape.com>
Co-authored-by: swananan <jt26wzz@gmail.com>
Co-authored-by: quake <8990+quake@users.noreply.github.com>
Co-authored-by: jjy <jjyruby@gmail.com>
@quake quake deleted the agent/quake-opencode-wsl/8ee5820a branch April 15, 2026 23:42


Development

Successfully merging this pull request may close these issues.

Failed to connect to wss node using pubkey

5 participants