Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist NetworkGraph on removal of stale channels #1376

Merged
merged 3 commits into from
Mar 30, 2022

Conversation

jurvis
Copy link
Contributor

@jurvis jurvis commented Mar 22, 2022

This PR addresses issue #1191. I borrowed mostly the same ideas from persisting ChannelManager, except instead of throwing an exception on error, I do a log_error! instead to mirror the non-terminal behavior in ldk-sample.

I'm also not sure if we want to be calling persist_network_graph at the same time as persist_manager, but I couldn't figure out how else to test it, since the prune block only gets called after 60 seconds. Let me know if there are any ideas here.

@jurvis jurvis force-pushed the jurvis/persist-networkgraph branch from b7f1396 to cc88533 Compare March 22, 2022 03:27
@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2022

Codecov Report

Merging #1376 (df2e60d) into main (c244c78) will increase coverage by 0.06%.
The diff coverage is 96.82%.

@@            Coverage Diff             @@
##             main    #1376      +/-   ##
==========================================
+ Coverage   90.73%   90.80%   +0.06%     
==========================================
  Files          73       73              
  Lines       40808    41241     +433     
  Branches    40808    41241     +433     
==========================================
+ Hits        37027    37447     +420     
- Misses       3781     3794      +13     
Impacted Files Coverage Δ
lightning-background-processor/src/lib.rs 95.20% <96.42%> (+2.09%) ⬆️
lightning-persister/src/lib.rs 93.93% <100.00%> (+0.33%) ⬆️
lightning/src/util/config.rs 45.83% <0.00%> (-0.98%) ⬇️
lightning/src/ln/channel.rs 88.29% <0.00%> (-0.92%) ⬇️
lightning-net-tokio/src/lib.rs 75.88% <0.00%> (-0.81%) ⬇️
lightning/src/debug_sync.rs 94.79% <0.00%> (-0.27%) ⬇️
lightning/src/routing/scoring.rs 94.04% <0.00%> (-0.26%) ⬇️
lightning-invoice/src/de.rs 81.06% <0.00%> (-0.21%) ⬇️
lightning/src/ln/channelmanager.rs 84.74% <0.00%> (-0.04%) ⬇️
lightning/src/ln/functional_tests.rs 97.06% <0.00%> (-0.03%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c244c78...df2e60d. Read the comment docs.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks pretty good at first glance.

@@ -277,6 +303,9 @@ impl BackgroundProcessor {
if let Some(ref handler) = net_graph_msg_handler {
log_trace!(logger, "Pruning network graph of stale entries");
handler.network_graph().remove_stale_channels();
if network_graph_persister.persist_graph(handler.network_graph()).is_err() {
log_warn!(logger, "Warning: Failed to persist network graph, check your disk and permissions");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably log_error, I think, instead.

persister.persist_manager(&*channel_manager)?;
channel_manager_persister.persist_manager(&*channel_manager)?;
if let Some(ref handler) = net_graph_msg_handler {
if network_graph_persister.persist_graph(handler.network_graph()).is_err() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely don't need to persist the graph here, otherwise we'll be doing it every time we have any HTLC/commitment updates.

Copy link
Contributor Author

@jurvis jurvis Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's actually related to the question I had in the PR description... I added it here in order to test if NetworkGraph persists without waiting for 60 seconds for the prune block to run.

I was wondering if you had any ideas on that? 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome to const-ify that 60 and then use #[cfg(test)] to have a different value in test, eg the way PING_TIMER and FRESHNESS_TIMER are done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheBlueMatt ah, got it! didn't see that. Thanks! 😄

lightning-background-processor/src/lib.rs Show resolved Hide resolved
@jurvis jurvis marked this pull request as ready for review March 23, 2022 17:46
Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A handful of rather trivial comments, but largely looks great, thanks!

lightning-background-processor/src/lib.rs Outdated Show resolved Hide resolved
/// Unlike, [`persist_manager`], this will not cause [`BackgroundProcessor`] to exit.
///
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph
/// [`BackgroundProcessor`]: lightning-background-processor::BackgroundProcessor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to drop this, no? By default the docs stuff will resolve any symbols which are resolvable locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it -- this is my first time working with rustdoc works so you're probably right 😄

///
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph
/// [`BackgroundProcessor`]: lightning-background-processor::BackgroundProcessor
/// [`persist_manager`]: lightning-background-processor::Persister::persist_manager
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note you shouldn't need the full crate reference here.

///
/// [`ChannelManager`]: lightning::ln::channelmanager::ChannelManager
pub trait ChannelManagerPersister<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref>
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and in a number of places you have EOL whitespace. A local git show should highlight them based on your terminal settings.

@@ -150,6 +150,11 @@ impl BackgroundProcessor {
/// uploading to one or more backup services. See [`ChannelManager::write`] for writing out a
/// [`ChannelManager`]. See [`FilesystemPersister::persist_manager`] for Rust-Lightning's
/// provided implementation.
///
/// `persist_graph` is responsible for writing out the [`NetworkGraph`] to disk, and/or
/// uploading to one or more backup services. See [`ChannelManager::write`] for writing out a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we want to suggest people back up the network graph, its all public data.

L::Target: 'static + Logger,
{
fn persist_manager(&self, _channel_manager: &ChannelManager<Signer, M, T, K, F, L>) -> Result<(), std::io::Error> {
Err(std::io::Error::new(std::io::ErrorKind::Other, "test"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tests, instead of erroring, we should panic to ensure the test fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, this test is actually for testing if channel_manager persistence fails. I didn't change the original implementation of persist_manager in tests.

Right now it checks and panics if there isn't an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually that makes me wonder, and I apologize if this is a dumb question: since we only do log_error! instead of throwing it when persisting a graph fails, how do we test that behavior? do I use the testing_logger crate?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops lol sorry.

how do we test that behavior?

Our TestLogger struct has a few assert_log* functions that allow you to assert a regex or specific string was logged. We don't generally worry too much about ensuring particular log entries are printed, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheBlueMatt gotcha -- in that case, I created a test that just does the same error checks as what we do for persisting channel_manager 😄

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it'd be great to persist the graph on exit like we do the channelmanager as well.

@jurvis jurvis force-pushed the jurvis/persist-networkgraph branch 3 times, most recently from c0f6a43 to 3685a6c Compare March 24, 2022 21:37
fn persist_manager(&self, channel_manager: &ChannelManager<Signer, M, T, K, F, L>) -> Result<(), std::io::Error> {
self(channel_manager)
}
/// Persist the given [`NetworkGraph`] to disk, logging an error if persistence failed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is expected to return an error, not log it. The caller logs the error. I think in general method docs should describe the method's behavior, not the caller's.

last_prune_call = Instant::now();
have_pruned = true;
}
}
}

// Persist NetworkGraph on exit
if let Some(ref handler) = net_graph_msg_handler {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets do this after the channel manager - the manager is much more important, and if the network graph fails to be persisted cause the user kills the process during shutdown its not a big deal.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you squash the commits down into one or two commits that stand alone without later fixups?

@jurvis jurvis force-pushed the jurvis/persist-networkgraph branch from 7621e4b to 1670e90 Compare March 27, 2022 23:12
@jurvis
Copy link
Contributor Author

jurvis commented Mar 27, 2022

squashed

Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Some of my comments predate your change but would be a good opportunity to fix now.

#[cfg(test)]
const FIRST_NETWORK_PRUNE_TIMER: u64 = 1;

/// Trait which handles persisting a [`ChannelManager`] and [`NetworkGraph`] to disk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/which/that

Comment on lines 85 to 86
/// [`ChannelManager`]: lightning::ln::channelmanager::ChannelManager
pub trait ChannelManagerPersister<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref>
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These mappings can be removed (here and below) since both structs are imported. Can verify by removing and running cargo doc -p lightning-background-processor

@@ -87,24 +93,15 @@ where
L::Target: 'static + Logger,
{
/// Persist the given [`ChannelManager`] to disk, returning an error if persistence failed
/// (which will cause the [`BackgroundProcessor`] which called this method to exit.
/// (which will cause the [`BackgroundProcessor`] which called this method to exit.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: period after parenthesis.

if let Some(ref handler) = net_graph_msg_handler {
log_trace!(logger, "Pruning network graph of stale entries");
handler.network_graph().remove_stale_channels();
if persister.persist_graph(handler.network_graph()).is_err() {
log_error!(logger, "Warning: Failed to persist network graph, check your disk and permissions");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say "Error:" given the logging level used?

if let Some(ref handler) = net_graph_msg_handler {
log_trace!(logger, "Pruning network graph of stale entries");
handler.network_graph().remove_stale_channels();
if persister.persist_graph(handler.network_graph()).is_err() {
log_error!(logger, "Warning: Failed to persist network graph, check your disk and permissions");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include the error in the log?

@@ -570,6 +609,14 @@ mod tests {
if !nodes[0].node.get_persistence_condvar_value() { break }
}

// Check network graph is persisted
let filepath = get_full_filepath("test_background_processor_persister_0".to_string(), "network_graph".to_string());
let mut expected_bytes = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having expected_bytes here is a bit confusing. Could you move it directly into check_persisted_data!?

fn test_network_graph_persist_error() {
// Test that if we encounter an error during network graph persistence, an error gets returned.
let nodes = create_nodes(2, "test_persist_network_graph_error".to_string());
open_channel!(nodes[0], nodes[1], 100000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this line is not necessary.

@@ -402,6 +419,27 @@ mod tests {
}
}

#[derive(Clone)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of deriving Clone, create a new Persister where needed and clone the data dir to pass to it.

// Test that if we encounter an error during manager persistence, the thread panics.
let nodes = create_nodes(2, "test_persist_error".to_string());
open_channel!(nodes[0], nodes[1], 100000);

let persister = |_: &_| Err(std::io::Error::new(std::io::ErrorKind::Other, "test"));
struct ChannelManagerErrorPersister {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the duplication and boilerplate needed for the specialized persisters, you can modify the standard one you provided earlier as follows:

struct Persister {
	data_dir: String,
	graph_error: Option<(std::io::ErrorKind, &'static str)>,
}

impl Persister {
	fn new(data_dir: String) -> Self {
		Self { data_dir, graph_error: None }
	}

	fn with_graph_error(self, error: std::io::ErrorKind, message: &'static str) -> Self {
 	 	 Self { graph_error: Some((error, message)), ..self }
	}
}

	// ...
	fn persist_graph(&self, network_graph: &NetworkGraph) -> Result<(), std::io::Error> {
		match self.graph_error {
			None => FilesystemPersister::persist_network_graph(self.data_dir.clone(), network_graph),
			Some((error, message)) => Err(std::io::Error::new(error, message)),
		}
	}

And then create one as:

let persister = Persister::new(data_dir).with_graph_error(std::io::ErrorKind::Other, "test");

This will make the tests easier to read as they'll be more concise. You can do something similar for ChannelManager persistence errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really slick, will do this. thanks!

@@ -151,7 +148,12 @@ impl BackgroundProcessor {
/// [`ChannelManager`]. See [`FilesystemPersister::persist_manager`] for Rust-Lightning's
/// provided implementation.
///
/// Typically, users should either implement [`ChannelManagerPersister`] to never return an
/// `persist_graph` is responsible for writing out the [`NetworkGraph`] to disk. See
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the paragraphs above, I think the reference to persist_manager was at one point referring to a parameter to start but has been renamed a few times since. We should update the docs accordingly and use similar wording in this paragraph for persist_graph now that these are methods on persister.

@jurvis
Copy link
Contributor Author

jurvis commented Mar 29, 2022

thanks for the review @jkczyz! I made the changes and broke them up by commit. I'm a little unsure on what else needs to be done to the docs beyond explicitly saying that persist_manager is now called via an implementation of the Persister trait. Let me know if there is any specific language you will prefer to add 😄

jkczyz
jkczyz previously approved these changes Mar 29, 2022
Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks great! Can squash again once @TheBlueMatt is good.

@TheBlueMatt
Copy link
Collaborator

LGTM! Please squash the commits down into logically consistent commits without fixups in later commits and this should be good to go.

Instead of creating a separate trait for persisting NetworkGraph, use and rename the existing ChannelManagerPersister to handle them both. persist_graph is then called on removal of stale channels and on exit.
@jurvis jurvis force-pushed the jurvis/persist-networkgraph branch from de4cb40 to df2e60d Compare March 30, 2022 02:38
@jurvis
Copy link
Contributor Author

jurvis commented Mar 30, 2022

@TheBlueMatt @jkczyz squashed. hope the way I organized the commits makes sense.

@jkczyz jkczyz merged commit aeeafed into lightningdevkit:main Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants