
Use a total lockorder for NetworkGraph's PartialEq impl #2284

Merged

Conversation

dunxen
Contributor

@dunxen dunxen commented May 9, 2023

NetworkGraph's PartialEq impl before this commit was deadlock-prone. As with ChannelMonitor's PartialEq impl, we use position in memory to establish a total lockorder. This relies on the assumption that the objects cannot move within memory while the inner locks are held.

Fixes #2190
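The deadlock the description refers to is the classic AB/BA pattern: if thread 1 evaluates `a == b` while thread 2 evaluates `b == a`, and each side locks its own fields first, the two threads can each hold one lock and wait forever on the other. Ordering the lock acquisitions by memory address gives every thread the same global order. A minimal standalone sketch of the technique (not LDK's actual code; the `Graph` type and its field are invented for illustration):

```rust
use std::sync::Mutex;

struct Graph {
    channels: Mutex<Vec<u64>>,
}

impl PartialEq for Graph {
    fn eq(&self, other: &Self) -> bool {
        // Establish a total order by memory address so two threads comparing
        // the same pair in opposite argument orders still lock in the same
        // global order, avoiding the AB/BA deadlock. This assumes neither
        // object can move in memory while the inner locks are held.
        let ord = (self as *const Self as usize) < (other as *const Self as usize);
        let (first, second) = if ord { (self, other) } else { (other, self) };
        let a = first.channels.lock().unwrap();
        let b = second.channels.lock().unwrap();
        *a == *b
    }
}
```

Because equality is symmetric, it does not matter which operand ends up being locked first; only the consistency of the order across threads matters.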

@dunxen dunxen marked this pull request as draft May 9, 2023 18:08
@TheBlueMatt
Collaborator

Grr, the lockorder stuff is confused by the fact that the two networkgraph locks are both constructed on different lines (network graph read vs network graph new). Can you add a stupid hacky wrapper util function that, given the networkgraph fields, actually constructs the mutexes and NetworkGraph object?

@wpaulino
Contributor

wpaulino commented May 9, 2023

Grr, the lockorder stuff is confused by the fact that the two networkgraph locks are both constructed on different lines (network graph read vs network graph new).

Haven't looked at it closely, but wouldn't 2166c8a have prevented this?

@TheBlueMatt
Collaborator

Ah, right, it's currently barfing because it thinks they're different lock-take lines just because there are 4 lines where locks are taken based on the pointer comparison. We could fix it more simply by, instead of taking the locks in the ifs, returning a reference to the mutex itself in the ifs and then taking the locks on the same line.
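The suggestion above is about keeping LDK's debug lockorder checker happy: it keys lock acquisitions by call site, so the branches should only select *references* to the mutexes, and the actual `lock()` calls should all happen on shared lines. A standalone sketch of that shape (hypothetical `Graph` type and `graphs_eq` helper, invented for illustration; plain `Mutex` stands in for LDK's debug-checked wrappers):

```rust
use std::sync::Mutex;

struct Graph {
    channels: Mutex<Vec<u64>>,
    nodes: Mutex<Vec<u64>>,
}

fn graphs_eq(a: &Graph, b: &Graph) -> bool {
    // Total order by address, as in the PR.
    let ord = (a as *const Graph as usize) < (b as *const Graph as usize);
    // The branches only pick which object is locked first; no locks are
    // taken inside the ifs.
    let (first, second) = if ord { (a, b) } else { (b, a) };
    // Each pair of locks is taken on a single shared line, so a
    // call-site-based lockorder checker sees one acquire site per lock.
    let (ca, cb) = (first.channels.lock().unwrap(), second.channels.lock().unwrap());
    let (na, nb) = (first.nodes.lock().unwrap(), second.nodes.lock().unwrap());
    // Symmetric comparison, so the possible operand swap is harmless.
    *ca == *cb && *na == *nb
}
```

This is essentially the structure the review diff below converges on.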

@dunxen dunxen force-pushed the 2023-05-netgraphpartialeqtotallock branch from 3572b6d to 4f86dcf Compare May 23, 2023 14:14
@codecov-commenter

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.13% 🎉

Comparison is base (6775b95) 90.81% compared to head (3572b6d) 90.94%.

❗ Current head 3572b6d differs from pull request most recent head 4f86dcf. Consider uploading reports for the commit 4f86dcf to get more accurate results


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2284      +/-   ##
==========================================
+ Coverage   90.81%   90.94%   +0.13%     
==========================================
  Files         104      104              
  Lines       53009    52752     -257     
==========================================
- Hits        48139    47975     -164     
+ Misses       4870     4777      -93     
Impacted Files Coverage Δ
lightning/src/routing/gossip.rs 89.81% <100.00%> (+0.03%) ⬆️

... and 31 files with indirect coverage changes


@dunxen dunxen marked this pull request as ready for review May 23, 2023 14:40
Contributor

@wpaulino wpaulino left a comment


I think we can save some horizontal space with the following diff and still achieve the same end result:

diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index a7d378ae..f134bd05 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -1330,11 +1330,11 @@ impl<L: Deref> PartialEq for NetworkGraph<L> where L::Target: Logger {
 		// For a total lockorder, sort by position in memory and take the inner locks in that order.
 		// (Assumes that we can't move within memory while a lock is held).
 		let ord = ((self as *const _) as usize) < ((other as *const _) as usize);
-		let (genesis_a, channels_a, nodes_a) = if ord { (self.genesis_hash, &self.channels, &self.nodes) } else { (other.genesis_hash, &other.channels, &other.nodes) };
-		let (genesis_b, channels_b, nodes_b) = if ord { (other.genesis_hash, &other.channels, &other.nodes) } else { (self.genesis_hash, &self.channels, &self.nodes) };
-
-		let (channels_a, channels_b, nodes_a, nodes_b) = (channels_a.unsafe_well_ordered_double_lock_self(), channels_b.unsafe_well_ordered_double_lock_self(), nodes_a.unsafe_well_ordered_double_lock_self(), nodes_b.unsafe_well_ordered_double_lock_self());
-		genesis_a.eq(&genesis_b) && channels_a.eq(&channels_b) && nodes_a.eq(&nodes_b)
+		let a = if ord { (&self.channels, &self.nodes) } else { (&other.channels, &other.nodes) };
+		let b = if ord { (&other.channels, &other.nodes) } else { (&self.channels, &self.nodes) };
+		let (channels_a, channels_b) = (a.0.unsafe_well_ordered_double_lock_self(), b.0.unsafe_well_ordered_double_lock_self());
+		let (nodes_a, nodes_b) = (a.1.unsafe_well_ordered_double_lock_self(), b.1.unsafe_well_ordered_double_lock_self());
+		self.genesis_hash.eq(&other.genesis_hash) && channels_a.eq(&channels_b) && nodes_a.eq(&nodes_b)
 	}
 }
 

`NetworkGraph`'s `PartialEq` impl before this commit was deadlock-prone.
As with `ChannelMonitor`'s `PartialEq` impl, we use position in memory
to establish a total lockorder. This relies on the assumption that the
objects cannot move within memory while the inner locks are held.
@dunxen dunxen force-pushed the 2023-05-netgraphpartialeqtotallock branch from 4f86dcf to 6418a86 Compare May 23, 2023 20:51
@dunxen
Contributor Author

dunxen commented May 23, 2023

I think we can save some horizontal space with the following diff and still achieve the same end result:

Thanks, much better! The debug_sync stuff is still a black box to me and didn't want to upset it lol.

@wpaulino
Contributor

CI failed on routing::gossip::tests::handling_channel_announcements, which is a known flake.

@TheBlueMatt TheBlueMatt merged commit 7f3701a into lightningdevkit:main May 24, 2023
14 checks passed
Development

Successfully merging this pull request may close these issues.

NetworkGraph PartialEq impl is deadlock-prone
4 participants