Persist NetworkGraph on removal of stale channels #1376

jurvis · 2022-03-22T03:24:33Z

This PR addresses issue #1191. I mostly borrowed the approach used for persisting ChannelManager, except that on error I call log_error! rather than returning the error, to mirror the non-terminal behavior in ldk-sample.

I'm also not sure if we want to be calling persist_network_graph at the same time as persist_manager, but I couldn't figure out how else to test it, since the prune block only gets called after 60 seconds. Let me know if there are any ideas here.

codecov-commenter · 2022-03-22T03:51:40Z

Codecov Report

Merging #1376 (df2e60d) into main (c244c78) will increase coverage by 0.06%.
The diff coverage is 96.82%.

@@            Coverage Diff             @@
##             main    #1376      +/-   ##
==========================================
+ Coverage   90.73%   90.80%   +0.06%     
==========================================
  Files          73       73              
  Lines       40808    41241     +433     
  Branches    40808    41241     +433     
==========================================
+ Hits        37027    37447     +420     
- Misses       3781     3794      +13

Impacted Files	Coverage Δ
lightning-background-processor/src/lib.rs	`95.20% <96.42%> (+2.09%)`	⬆️
lightning-persister/src/lib.rs	`93.93% <100.00%> (+0.33%)`	⬆️
lightning/src/util/config.rs	`45.83% <0.00%> (-0.98%)`	⬇️
lightning/src/ln/channel.rs	`88.29% <0.00%> (-0.92%)`	⬇️
lightning-net-tokio/src/lib.rs	`75.88% <0.00%> (-0.81%)`	⬇️
lightning/src/debug_sync.rs	`94.79% <0.00%> (-0.27%)`	⬇️
lightning/src/routing/scoring.rs	`94.04% <0.00%> (-0.26%)`	⬇️
lightning-invoice/src/de.rs	`81.06% <0.00%> (-0.21%)`	⬇️
lightning/src/ln/channelmanager.rs	`84.74% <0.00%> (-0.04%)`	⬇️
lightning/src/ln/functional_tests.rs	`97.06% <0.00%> (-0.03%)`	⬇️
... and 16 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

TheBlueMatt

Thanks! Looks pretty good at first glance.

TheBlueMatt · 2022-03-22T04:14:23Z

lightning-background-processor/src/lib.rs

@@ -277,6 +303,9 @@ impl BackgroundProcessor {
 					if let Some(ref handler) = net_graph_msg_handler {
 						log_trace!(logger, "Pruning network graph of stale entries");
 						handler.network_graph().remove_stale_channels();
+						if network_graph_persister.persist_graph(handler.network_graph()).is_err() {
+							log_warn!(logger, "Warning: Failed to persist network graph, check your disk and permissions");


Probably log_error, I think, instead.

TheBlueMatt · 2022-03-22T04:15:09Z

lightning-background-processor/src/lib.rs

-					persister.persist_manager(&*channel_manager)?;
+					channel_manager_persister.persist_manager(&*channel_manager)?;
+					if let Some(ref handler) = net_graph_msg_handler {
+						if network_graph_persister.persist_graph(handler.network_graph()).is_err() {


We definitely don't need to persist the graph here, otherwise we'll be doing it every time we have any HTLC/commitment updates.

that's actually related to the question I had in the PR description... I added it here in order to test if NetworkGraph persists without waiting for 60 seconds for the prune block to run.

I was wondering if you had any ideas on that? 🤔

You're welcome to const-ify that 60 and then use #[cfg(test)] to have a different value in tests, e.g. the way PING_TIMER and FRESHNESS_TIMER are done.

@TheBlueMatt ah, got it! didn't see that. Thanks! 😄
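A minimal sketch of that suggestion, assuming a hypothetical constant name `NETWORK_PRUNE_TIMER` and a hypothetical helper `should_prune` (neither is taken from the PR): the normal build keeps the 60-second interval, while test builds compile in a much shorter one, so a test can reach the prune-and-persist path without waiting a full minute.

```rust
use std::time::{Duration, Instant};

// Hypothetical constant name; non-test builds keep the 60-second interval.
#[cfg(not(test))]
const NETWORK_PRUNE_TIMER: u64 = 60;
// Test builds get a short interval so the prune block is reached quickly.
#[cfg(test)]
const NETWORK_PRUNE_TIMER: u64 = 1;

/// Returns true once at least `NETWORK_PRUNE_TIMER` seconds separate `last_prune_call` and `now`.
fn should_prune(last_prune_call: Instant, now: Instant) -> bool {
	now.duration_since(last_prune_call) >= Duration::from_secs(NETWORK_PRUNE_TIMER)
}
```

With this shape the graph is persisted only when the prune timer fires, not on every channel manager update.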

TheBlueMatt

A handful of rather trivial comments, but largely looks great, thanks!

TheBlueMatt · 2022-03-23T18:18:11Z

lightning-background-processor/src/lib.rs

+	/// Unlike, [`persist_manager`], this will not cause [`BackgroundProcessor`] to exit.
+	/// 
+	/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph
+	/// [`BackgroundProcessor`]: lightning-background-processor::BackgroundProcessor


You should be able to drop this, no? By default the docs stuff will resolve any symbols which are resolvable locally.

got it -- this is my first time working with rustdoc, so you're probably right 😄

TheBlueMatt · 2022-03-23T18:18:17Z

lightning-background-processor/src/lib.rs

+	/// 
+	/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph
+	/// [`BackgroundProcessor`]: lightning-background-processor::BackgroundProcessor
+	/// [`persist_manager`]: lightning-background-processor::Persister::persist_manager


note you shouldn't need the full crate reference here.
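To illustrate the two rustdoc comments above, here is a small sketch using hypothetical `Registry` and `insert` names that are unrelated to the LDK types. Intra-doc links resolve any item that is in scope, so no explicit link definition is needed. If a path is written out anyway, it would need the underscore form of the crate name (e.g. `lightning_background_processor`), since hyphens are not valid in Rust paths.

```rust
use std::collections::HashMap;

/// A name-to-count table backed by a [`HashMap`].
///
/// Both links in this comment resolve without explicit link definitions:
/// [`HashMap`] is imported above and [`Registry::insert`] is defined in this module.
pub struct Registry {
	entries: HashMap<String, u32>,
}

impl Registry {
	/// Creates an empty [`Registry`].
	pub fn new() -> Self {
		Registry { entries: HashMap::new() }
	}

	/// Inserts `count` under `name`, returning the previous value if there was one.
	pub fn insert(&mut self, name: &str, count: u32) -> Option<u32> {
		self.entries.insert(name.to_string(), count)
	}
}
```

Running `cargo doc` on a crate containing this should produce working links for each bracketed name.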

TheBlueMatt · 2022-03-23T18:18:42Z

lightning-background-processor/src/lib.rs

 ///
 /// [`ChannelManager`]: lightning::ln::channelmanager::ChannelManager
-pub trait ChannelManagerPersister<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref>
+/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph 


here and in a number of places you have EOL whitespace. A local git show should highlight them based on your terminal settings.

TheBlueMatt · 2022-03-23T18:19:15Z

lightning-background-processor/src/lib.rs

@@ -150,6 +150,11 @@ impl BackgroundProcessor {
 	/// uploading to one or more backup services. See [`ChannelManager::write`] for writing out a
 	/// [`ChannelManager`]. See [`FilesystemPersister::persist_manager`] for Rust-Lightning's
 	/// provided implementation.
+	/// 
+	/// `persist_graph` is responsible for writing out the [`NetworkGraph`] to disk, and/or
+	/// uploading to one or more backup services. See [`ChannelManager::write`] for writing out a


Don't think we want to suggest people back up the network graph, it's all public data.

TheBlueMatt · 2022-03-23T18:20:02Z

lightning-background-processor/src/lib.rs

+			L::Target: 'static + Logger,
+		{
+			fn persist_manager(&self, _channel_manager: &ChannelManager<Signer, M, T, K, F, L>) -> Result<(), std::io::Error> {
+				Err(std::io::Error::new(std::io::ErrorKind::Other, "test"))


In tests, instead of erroring, we should panic to ensure the test fails.

hm, this test is actually for testing if channel_manager persistence fails. I didn't change the original implementation of persist_manager in tests.

Right now it checks and panics if there isn't an error.

actually that makes me wonder, and I apologize if this is a dumb question: since we only do log_error! instead of throwing it when persisting a graph fails, how do we test that behavior? do I use the testing_logger crate?

Oops lol sorry.

how do we test that behavior?

Our TestLogger struct has a few assert_log* functions that allow you to assert a regex or specific string was logged. We don't generally worry too much about ensuring particular log entries are printed, though.

@TheBlueMatt gotcha -- in that case, I created a test that just does the same error checks as what we do for persisting channel_manager 😄
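For readers unfamiliar with the idea of asserting on log output, the following is a rough sketch only. `RecordingLogger` and its methods are hypothetical stand-ins written for this example; they are not the `TestLogger` mentioned above and do not reproduce its signatures.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a test logger; it only records the lines it is given.
struct RecordingLogger {
	lines: Mutex<Vec<String>>,
}

impl RecordingLogger {
	fn new() -> Self {
		RecordingLogger { lines: Mutex::new(Vec::new()) }
	}

	/// Records one log line.
	fn log(&self, line: &str) {
		self.lines.lock().unwrap().push(line.to_string());
	}

	/// Panics unless exactly `count` recorded lines contain `needle`.
	fn assert_log_contains(&self, needle: &str, count: usize) {
		let lines = self.lines.lock().unwrap();
		let found = lines.iter().filter(|l| l.contains(needle)).count();
		assert_eq!(found, count, "expected {} lines containing {:?}", count, needle);
	}
}
```

A test would hand such a logger to the code under test, trigger the failing persist, and then call `assert_log_contains` with part of the expected message.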

TheBlueMatt

Oh, it'd be great to persist the graph on exit like we do the channelmanager as well.

TheBlueMatt · 2022-03-25T02:17:15Z

lightning-background-processor/src/lib.rs

-	fn persist_manager(&self, channel_manager: &ChannelManager<Signer, M, T, K, F, L>) -> Result<(), std::io::Error> {
-		self(channel_manager)
-	}
+	/// Persist the given [`NetworkGraph`] to disk, logging an error if persistence failed.


The method is expected to return an error, not log it. The caller logs the error. I think in general method docs should describe the method's behavior, not the caller's.
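A simplified sketch of that split of responsibilities, under stated assumptions: `GraphPersister`, `FailingPersister` and `prune_and_persist` are hypothetical names, the graph is reduced to a byte slice, and a `Vec<String>` stands in for the logger. The method's doc comment describes only what the method does (return an error), while the caller decides to log and continue.

```rust
use std::io;

/// Hypothetical, simplified trait; the trait in the PR is generic over the channel manager types.
trait GraphPersister {
	/// Persist the given graph bytes, returning an error if persistence failed.
	fn persist_graph(&self, graph_bytes: &[u8]) -> Result<(), io::Error>;
}

/// A persister that always fails, standing in for a full disk or bad permissions.
struct FailingPersister;

impl GraphPersister for FailingPersister {
	fn persist_graph(&self, _graph_bytes: &[u8]) -> Result<(), io::Error> {
		Err(io::Error::new(io::ErrorKind::Other, "disk full"))
	}
}

/// The caller decides what a failure means: here it records a log line and carries on.
fn prune_and_persist<P: GraphPersister>(persister: &P, graph_bytes: &[u8], log: &mut Vec<String>) {
	if let Err(e) = persister.persist_graph(graph_bytes) {
		log.push(format!("Error: Failed to persist network graph: {}", e));
	}
}
```

Matching on `Err(e)` rather than calling `.is_err()` keeps the underlying error available for the log message.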

TheBlueMatt · 2022-03-25T02:18:32Z

lightning-background-processor/src/lib.rs

 						last_prune_call = Instant::now();
 						have_pruned = true;
 					}
 				}
 			}
+
+			// Persist NetworkGraph on exit
+			if let Some(ref handler) = net_graph_msg_handler {


Let's do this after the channel manager - the manager is much more important, and if the network graph fails to be persisted because the user kills the process during shutdown, it's not a big deal.
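The ordering being asked for can be sketched as follows. This is an illustration under assumptions, not the PR's code: `persist_on_exit` is a hypothetical helper, the two closures stand in for the real persist calls, and a `Vec<String>` stands in for the logger.

```rust
use std::io;

/// Hypothetical shutdown helper. The manager is persisted first and its error is
/// propagated; a graph failure is only reported.
fn persist_on_exit<M, G>(persist_manager: M, persist_graph: G, log: &mut Vec<String>) -> Result<(), io::Error>
where
	M: FnOnce() -> Result<(), io::Error>,
	G: FnOnce() -> Result<(), io::Error>,
{
	// Most important state first: if the process is killed mid-shutdown,
	// the manager write has already been attempted.
	persist_manager()?;
	// The graph is public data, so a failure here is logged rather than returned.
	if let Err(e) = persist_graph() {
		log.push(format!("Error: Failed to persist network graph: {}", e));
	}
	Ok(())
}
```

If the manager write fails, the `?` returns early and the graph write is never attempted.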

TheBlueMatt

LGTM. Can you squash the commits down into one or two commits that stand alone without later fixups?

jurvis · 2022-03-27T23:12:57Z

squashed

jkczyz

Looks pretty good. Some of my comments predate your change but would be a good opportunity to fix now.

jkczyz · 2022-03-28T20:46:44Z

lightning-background-processor/src/lib.rs

+#[cfg(test)]
+const FIRST_NETWORK_PRUNE_TIMER: u64 = 1;
+
+/// Trait which handles persisting a [`ChannelManager`] and [`NetworkGraph`] to disk.


nit: s/which/that

jkczyz · 2022-03-28T20:51:30Z

lightning-background-processor/src/lib.rs

 /// [`ChannelManager`]: lightning::ln::channelmanager::ChannelManager
-pub trait ChannelManagerPersister<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref>
+/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph


These mappings can be removed (here and below) since both structs are imported. Can verify by removing and running cargo doc -p lightning-background-processor

jkczyz · 2022-03-28T20:52:50Z

lightning-background-processor/src/lib.rs

@@ -87,24 +93,15 @@ where
 	L::Target: 'static + Logger,
 {
 	/// Persist the given [`ChannelManager`] to disk, returning an error if persistence failed
-	/// (which will cause the [`BackgroundProcessor`] which called this method to exit.
+	/// (which will cause the [`BackgroundProcessor`] which called this method to exit.)


nit: period after parenthesis.

jkczyz · 2022-03-28T20:58:10Z

lightning-background-processor/src/lib.rs

 					if let Some(ref handler) = net_graph_msg_handler {
 						log_trace!(logger, "Pruning network graph of stale entries");
 						handler.network_graph().remove_stale_channels();
+						if persister.persist_graph(handler.network_graph()).is_err() {
+							log_error!(logger, "Warning: Failed to persist network graph, check your disk and permissions");


Should we say "Error:" given the logging level used?

jkczyz · 2022-03-28T21:00:21Z

lightning-background-processor/src/lib.rs

 					if let Some(ref handler) = net_graph_msg_handler {
 						log_trace!(logger, "Pruning network graph of stale entries");
 						handler.network_graph().remove_stale_channels();
+						if persister.persist_graph(handler.network_graph()).is_err() {
+							log_error!(logger, "Warning: Failed to persist network graph, check your disk and permissions");


Could we include the error in the log?

jkczyz · 2022-03-28T21:20:55Z

lightning-background-processor/src/lib.rs

@@ -570,6 +609,14 @@ mod tests {
 			if !nodes[0].node.get_persistence_condvar_value() { break }
 		}

+		// Check network graph is persisted
+		let filepath = get_full_filepath("test_background_processor_persister_0".to_string(), "network_graph".to_string());
+		let mut expected_bytes = Vec::new();


Having expected_bytes here is a bit confusing. Could you move it directly into check_persisted_data!?

jkczyz · 2022-03-28T21:29:13Z

lightning-background-processor/src/lib.rs

+	fn test_network_graph_persist_error() {
+		// Test that if we encounter an error during network graph persistence, an error gets returned.
+		let nodes = create_nodes(2, "test_persist_network_graph_error".to_string());
+		open_channel!(nodes[0], nodes[1], 100000);


Looks like this line is not necessary.

jkczyz · 2022-03-28T22:48:23Z

lightning-background-processor/src/lib.rs

@@ -402,6 +419,27 @@ mod tests {
 		}
 	}

+	#[derive(Clone)]


Instead of deriving Clone, create a new Persister where needed and clone the data dir to pass to it.

jkczyz · 2022-03-28T22:59:22Z

lightning-background-processor/src/lib.rs

 		// Test that if we encounter an error during manager persistence, the thread panics.
 		let nodes = create_nodes(2, "test_persist_error".to_string());
 		open_channel!(nodes[0], nodes[1], 100000);

-		let persister = |_: &_| Err(std::io::Error::new(std::io::ErrorKind::Other, "test"));
+		struct ChannelManagerErrorPersister {


To avoid the duplication and boilerplate needed for the specialized persisters, you can modify the standard one you provided earlier as follows:

struct Persister {
	data_dir: String,
	graph_error: Option<(std::io::ErrorKind, &'static str)>,
}

impl Persister {
	fn new(data_dir: String) -> Self {
		Self { data_dir, graph_error: None }
	}

	fn with_graph_error(self, error: std::io::ErrorKind, message: &'static str) -> Self {
		Self { graph_error: Some((error, message)), ..self }
	}
}

// ...

fn persist_graph(&self, network_graph: &NetworkGraph) -> Result<(), std::io::Error> {
	match self.graph_error {
		None => FilesystemPersister::persist_network_graph(self.data_dir.clone(), network_graph),
		Some((error, message)) => Err(std::io::Error::new(error, message)),
	}
}

And then create one as:

let persister = Persister::new(data_dir).with_graph_error(std::io::ErrorKind::Other, "test");

This will make the tests easier to read as they'll be more concise. You can do something similar for ChannelManager persistence errors.

this looks really slick, will do this. thanks!
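One way the "something similar for ChannelManager persistence errors" part might look is sketched here. It is a self-contained illustration, not the code that landed: `manager_error` and `with_manager_error` are assumed names, and the real filesystem writes are replaced by `Ok(())` so the example compiles without the LDK crates.

```rust
use std::io;

/// Hypothetical test persister; real persistence is replaced by `Ok(())` to keep the sketch self-contained.
struct Persister {
	data_dir: String,
	graph_error: Option<(io::ErrorKind, &'static str)>,
	manager_error: Option<(io::ErrorKind, &'static str)>,
}

impl Persister {
	fn new(data_dir: String) -> Self {
		Self { data_dir, graph_error: None, manager_error: None }
	}

	/// Makes every later `persist_graph` call fail with the given error.
	fn with_graph_error(self, error: io::ErrorKind, message: &'static str) -> Self {
		Self { graph_error: Some((error, message)), ..self }
	}

	/// Makes every later `persist_manager` call fail with the given error.
	fn with_manager_error(self, error: io::ErrorKind, message: &'static str) -> Self {
		Self { manager_error: Some((error, message)), ..self }
	}

	fn persist_graph(&self) -> Result<(), io::Error> {
		match self.graph_error {
			None => Ok(()), // a real test would write under `self.data_dir` here
			Some((error, message)) => Err(io::Error::new(error, message)),
		}
	}

	fn persist_manager(&self) -> Result<(), io::Error> {
		match self.manager_error {
			None => Ok(()), // a real test would write under `self.data_dir` here
			Some((error, message)) => Err(io::Error::new(error, message)),
		}
	}
}
```

Each test then states its failure mode in one line, for example `Persister::new(data_dir).with_manager_error(std::io::ErrorKind::Other, "test")`, and a fresh `Persister` can be built per node from a cloned data directory string instead of deriving `Clone`.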

jkczyz · 2022-03-28T23:11:40Z

lightning-background-processor/src/lib.rs

@@ -151,7 +148,12 @@ impl BackgroundProcessor {
 	/// [`ChannelManager`]. See [`FilesystemPersister::persist_manager`] for Rust-Lightning's
 	/// provided implementation.
 	///
-	/// Typically, users should either implement [`ChannelManagerPersister`] to never return an
+	/// `persist_graph` is responsible for writing out the [`NetworkGraph`] to disk. See


In the paragraphs above, I think the reference to persist_manager was at one point referring to a parameter to start but has been renamed a few times since. We should update the docs accordingly and use similar wording in this paragraph for persist_graph now that these are methods on persister.

jurvis · 2022-03-29T03:02:48Z

thanks for the review @jkczyz! I made the changes and broke them up by commit. I'm a little unsure what else needs to be done to the docs beyond explicitly saying that persist_manager is now called via an implementation of the Persister trait. Let me know if there is any specific language you would prefer to add 😄

jkczyz

All looks great! Can squash again once @TheBlueMatt is good.

TheBlueMatt · 2022-03-29T17:19:05Z

LGTM! Please squash the commits down into logically consistent commits without fixups in later commits and this should be good to go.

jurvis · 2022-03-30T02:39:27Z

@TheBlueMatt @jkczyz squashed. hope the way I organized the commits makes sense.

jurvis force-pushed the jurvis/persist-networkgraph branch from b7f1396 to cc88533 Compare March 22, 2022 03:27

TheBlueMatt reviewed Mar 22, 2022

jurvis marked this pull request as ready for review March 23, 2022 17:46

TheBlueMatt reviewed Mar 23, 2022

TheBlueMatt added the Seeking Code Review label Mar 23, 2022

TheBlueMatt reviewed Mar 23, 2022

jurvis force-pushed the jurvis/persist-networkgraph branch 3 times, most recently from c0f6a43 to 3685a6c Compare March 24, 2022 21:37

TheBlueMatt reviewed Mar 25, 2022

TheBlueMatt reviewed Mar 27, 2022

jurvis force-pushed the jurvis/persist-networkgraph branch from 7621e4b to 1670e90 Compare March 27, 2022 23:12

jkczyz reviewed Mar 28, 2022

jkczyz previously approved these changes Mar 29, 2022

TheBlueMatt removed the Seeking Code Review label Mar 29, 2022

jurvis dismissed jkczyz’s stale review via de4cb40 March 30, 2022 02:36

jurvis force-pushed the jurvis/persist-networkgraph branch from 06a2b81 to de4cb40 Compare March 30, 2022 02:36

jurvis added 3 commits March 29, 2022 19:38

Add NetworkGraph persistence

afb7aa8

Instead of creating a separate trait for persisting NetworkGraph, use and rename the existing ChannelManagerPersister to handle them both. persist_graph is then called on removal of stale channels and on exit.

Move expected_bytes to check_persisted_data! macro

6ebc739

Use common Persister for persistence tests

df2e60d

jurvis force-pushed the jurvis/persist-networkgraph branch from de4cb40 to df2e60d Compare March 30, 2022 02:38

TheBlueMatt approved these changes Mar 30, 2022

TheBlueMatt assigned jkczyz Mar 30, 2022

jkczyz approved these changes Mar 30, 2022

jkczyz merged commit aeeafed into lightningdevkit:main Mar 30, 2022

jkczyz added a commit to jkczyz/rust-lightning that referenced this pull request Apr 1, 2022

f - Address feedback and add lightningdevkit#1376

7c9eb67

jurvis mentioned this pull request Apr 12, 2022

Add utils to persist scorer in BackgroundProcessor #1416

Merged

tnull mentioned this pull request Jun 28, 2022

Provide Utilities to Persist NetworkGraph and Scorer #1191

Closed
