
[v0.20.x-branch] Backport #10540: discovery: fix gossiper shutdown deadlock #10554

Merged
yyforyongyu merged 2 commits into v0.20.x-branch from backport-10540-to-v0.20.x-branch
Feb 6, 2026



@github-actions github-actions bot commented Feb 5, 2026

Backport of #10540


When processing a remote network announcement, it is possible for two error messages to be sent back on the errChan. Since Brontide doesn't actually read from errChan, and since errChan only buffered one error message, the sending goroutine would block forever on the second send. This only became apparent when the gossiper attempted to shut down and hung.

For now, we can fix this simply by buffering up to two error messages on errChan. There is an existing TODO to restructure this logic entirely to use the actor model, and we can do a more thorough fix as part of that work.
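A minimal Go sketch of the buffering fix, again under a simplified model that is an assumption rather than the actual diff: errChan is sized to the worst-case number of errors a handler can send, so every send completes even though nothing reads the channel, and a WaitGroup-based shutdown returns. The constant `maxErrsPerAnnouncement` and the functions `handleAnnouncement` and `runWithBuffer` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxErrsPerAnnouncement is a hypothetical name for the upper bound on how
// many errors a single announcement handler can send on errChan.
const maxErrsPerAnnouncement = 2

// handleAnnouncement stands in (hypothetically) for the processing
// goroutine: in the worst case it reports two errors on errChan.
func handleAnnouncement(errChan chan<- error, wg *sync.WaitGroup) {
	defer wg.Done()
	errChan <- errors.New("first error")
	errChan <- errors.New("second error")
}

// runWithBuffer sizes errChan to maxErrsPerAnnouncement, runs the handler
// with nobody reading errChan, and waits for it to exit as a shutdown
// routine would. It returns how many errors ended up buffered.
func runWithBuffer() int {
	errChan := make(chan error, maxErrsPerAnnouncement)

	var wg sync.WaitGroup
	wg.Add(1)
	go handleAnnouncement(errChan, &wg)

	// Returns promptly: both sends fit in the buffer, so the handler
	// never blocks and wg.Done always runs.
	wg.Wait()
	return len(errChan)
}

func main() {
	fmt.Println("buffered errors:", runWithBuffer())
}
```

This only works while the buffer size matches the true upper bound on sends; if a future code path could emit a third error, the hang would return. That fragility is one reason a structural change, such as the actor-model rework mentioned in the TODO, is the more thorough fix.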

This bug was discovered during full-node fuzz testing and was triggered by sending a specific channel_announcement message and then shutting down LND.

(cherry picked from commit 21588ac)
@github-actions github-actions bot added this to the v0.21.0 milestone Feb 5, 2026
github-actions bot (Author) commented Feb 5, 2026

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-10540-to-v0.20.x-branch
git worktree add --checkout .worktree/backport-10540-to-v0.20.x-branch backport-10540-to-v0.20.x-branch
cd .worktree/backport-10540-to-v0.20.x-branch
git reset --hard HEAD^
git cherry-pick -x 5fa025e9e7a180de0a81750757fcc20eb37b28a9
git push --force-with-lease

@Roasbeef Roasbeef force-pushed the backport-10540-to-v0.20.x-branch branch from 0d4fc6e to f427fee February 5, 2026 01:17
@lightninglabs-deploy (Collaborator)

🟠 PR Severity: HIGH

Backport bugfix | 1 functional file | 9 lines changed

🟠 High (1 file)
  • discovery/gossiper.go - Gossip protocol shutdown deadlock fix
🟡 Medium (0 files)

None

🟢 Low (1 file)
  • docs/release-notes/release-notes-0.20.1.md - Release notes update

Analysis

This PR backports #10540 which fixes a shutdown deadlock in the gossiper. The primary change is in discovery/gossiper.go, which is part of the gossip protocol subsystem classified as HIGH severity.

Key observations:

  • Focused bugfix: Only 9 lines changed in the core functional file
  • Shutdown safety: Addresses a deadlock condition during node shutdown
  • Test coverage: Includes comprehensive test case to verify the fix
  • Backport: This is a backport to the v0.20.x branch, indicating the fix has already been reviewed in master

The HIGH severity is appropriate because the gossiper is responsible for network topology propagation. Although this bug only surfaces at shutdown rather than during normal operation, changes to concurrency control in the discovery package warrant careful review by an engineer familiar with the gossip protocol's threading model.


To override, add a severity-override-{critical,high,medium,low} label.

@Roasbeef Roasbeef marked this pull request as ready for review February 5, 2026 01:22
Roasbeef (Member) commented Feb 5, 2026

Fixed the conflict manually.

@saubyk saubyk modified the milestones: v0.21.0, v0.20.1 Feb 5, 2026
@gijswijs gijswijs self-requested a review February 5, 2026 21:04
gijswijs (Collaborator) left a comment

Confirmed that the backport is the same fix that's already in master, so:
LGTM! 🎉

@yyforyongyu yyforyongyu merged commit a4f375f into v0.20.x-branch Feb 6, 2026
40 checks passed
@yyforyongyu yyforyongyu deleted the backport-10540-to-v0.20.x-branch branch February 6, 2026 02:15

6 participants