-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persist NetworkGraph on removal of stale channels #1376
Persist NetworkGraph on removal of stale channels #1376
Conversation
b7f1396
to
cc88533
Compare
Codecov Report
@@ Coverage Diff @@
## main #1376 +/- ##
==========================================
+ Coverage 90.73% 90.80% +0.06%
==========================================
Files 73 73
Lines 40808 41241 +433
Branches 40808 41241 +433
==========================================
+ Hits 37027 37447 +420
- Misses 3781 3794 +13
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks pretty good at first glance.
@@ -277,6 +303,9 @@ impl BackgroundProcessor { | |||
if let Some(ref handler) = net_graph_msg_handler { | |||
log_trace!(logger, "Pruning network graph of stale entries"); | |||
handler.network_graph().remove_stale_channels(); | |||
if network_graph_persister.persist_graph(handler.network_graph()).is_err() { | |||
log_warn!(logger, "Warning: Failed to persist network graph, check your disk and permissions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably log_error, I think, instead.
persister.persist_manager(&*channel_manager)?; | ||
channel_manager_persister.persist_manager(&*channel_manager)?; | ||
if let Some(ref handler) = net_graph_msg_handler { | ||
if network_graph_persister.persist_graph(handler.network_graph()).is_err() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely don't need to persist the graph here, otherwise we'll be doing it every time we have any HTLC/commitment updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's actually related to the question I had in the PR description... I added it here in order to test if NetworkGraph persists without waiting for 60 seconds for the prune block to run.
I was wondering if you had any ideas on that? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're welcome to const-ify that 60 and then use #[cfg(test)]
to have a different value in test, eg the way PING_TIMER
and FRESHNESS_TIMER
are done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheBlueMatt ah, got it! didn't see that. Thanks! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A handful of rather trivial comments, but largely looks great, thanks!
/// Unlike, [`persist_manager`], this will not cause [`BackgroundProcessor`] to exit. | ||
/// | ||
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph | ||
/// [`BackgroundProcessor`]: lightning-background-processor::BackgroundProcessor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to drop this, no? By default the docs stuff will resolve any symbols which are resolvable locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it -- this is my first time working with rustdoc
works so you're probably right 😄
/// | ||
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph | ||
/// [`BackgroundProcessor`]: lightning-background-processor::BackgroundProcessor | ||
/// [`persist_manager`]: lightning-background-processor::Persister::persist_manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note you shouldn't need the full crate reference here.
/// | ||
/// [`ChannelManager`]: lightning::ln::channelmanager::ChannelManager | ||
pub trait ChannelManagerPersister<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> | ||
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here and in a number of places you have EOL whitespace. A local git show
should highlight them based on your terminal settings.
@@ -150,6 +150,11 @@ impl BackgroundProcessor { | |||
/// uploading to one or more backup services. See [`ChannelManager::write`] for writing out a | |||
/// [`ChannelManager`]. See [`FilesystemPersister::persist_manager`] for Rust-Lightning's | |||
/// provided implementation. | |||
/// | |||
/// `persist_graph` is responsible for writing out the [`NetworkGraph`] to disk, and/or | |||
/// uploading to one or more backup services. See [`ChannelManager::write`] for writing out a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we want to suggest people back up the network graph, its all public data.
L::Target: 'static + Logger, | ||
{ | ||
fn persist_manager(&self, _channel_manager: &ChannelManager<Signer, M, T, K, F, L>) -> Result<(), std::io::Error> { | ||
Err(std::io::Error::new(std::io::ErrorKind::Other, "test")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In tests, instead of erroring, we should panic to ensure the test fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, this test is actually for testing if channel_manager
persistence fails. I didn't change the original implementation of persist_manager
in tests.
Right now it checks and panics if there isn't an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually that makes me wonder, and I apologize if this is a dumb question: since we only do log_error!
instead of throwing it when persisting a graph fails, how do we test that behavior? do I use the testing_logger
crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops lol sorry.
how do we test that behavior?
Our TestLogger
struct has a few assert_log*
functions that allow you to assert a regex or specific string was logged. We don't generally worry too much about ensuring particular log entries are printed, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheBlueMatt gotcha -- in that case, I created a test that just does the same error checks as what we do for persisting channel_manager
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it'd be great to persist the graph on exit like we do the channelmanager as well.
c0f6a43
to
3685a6c
Compare
fn persist_manager(&self, channel_manager: &ChannelManager<Signer, M, T, K, F, L>) -> Result<(), std::io::Error> { | ||
self(channel_manager) | ||
} | ||
/// Persist the given [`NetworkGraph`] to disk, logging an error if persistence failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method is expected to return an error, not log it. The caller logs the error. I think in general method docs should describe the method's behavior, not the caller's.
last_prune_call = Instant::now(); | ||
have_pruned = true; | ||
} | ||
} | ||
} | ||
|
||
// Persist NetworkGraph on exit | ||
if let Some(ref handler) = net_graph_msg_handler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets do this after the channel manager - the manager is much more important, and if the network graph fails to be persisted cause the user kills the process during shutdown its not a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can you squash the commits down into one or two commits that stand alone without later fixups?
7621e4b
to
1670e90
Compare
squashed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Some of my comments predate your change but would be a good opportunity to fix now.
#[cfg(test)] | ||
const FIRST_NETWORK_PRUNE_TIMER: u64 = 1; | ||
|
||
/// Trait which handles persisting a [`ChannelManager`] and [`NetworkGraph`] to disk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/which/that
/// [`ChannelManager`]: lightning::ln::channelmanager::ChannelManager | ||
pub trait ChannelManagerPersister<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> | ||
/// [`NetworkGraph`]: lightning::routing::network_graph::NetworkGraph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These mappings can be removed (here and below) since both structs are imported. Can verify by removing and running cargo doc -p lightning-background-processor
@@ -87,24 +93,15 @@ where | |||
L::Target: 'static + Logger, | |||
{ | |||
/// Persist the given [`ChannelManager`] to disk, returning an error if persistence failed | |||
/// (which will cause the [`BackgroundProcessor`] which called this method to exit. | |||
/// (which will cause the [`BackgroundProcessor`] which called this method to exit.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: period after parenthesis.
if let Some(ref handler) = net_graph_msg_handler { | ||
log_trace!(logger, "Pruning network graph of stale entries"); | ||
handler.network_graph().remove_stale_channels(); | ||
if persister.persist_graph(handler.network_graph()).is_err() { | ||
log_error!(logger, "Warning: Failed to persist network graph, check your disk and permissions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we say "Error:" given the logging level used?
if let Some(ref handler) = net_graph_msg_handler { | ||
log_trace!(logger, "Pruning network graph of stale entries"); | ||
handler.network_graph().remove_stale_channels(); | ||
if persister.persist_graph(handler.network_graph()).is_err() { | ||
log_error!(logger, "Warning: Failed to persist network graph, check your disk and permissions"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we include the error in the log?
@@ -570,6 +609,14 @@ mod tests { | |||
if !nodes[0].node.get_persistence_condvar_value() { break } | |||
} | |||
|
|||
// Check network graph is persisted | |||
let filepath = get_full_filepath("test_background_processor_persister_0".to_string(), "network_graph".to_string()); | |||
let mut expected_bytes = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having expected_bytes
here is a bit confusing. Could you move it directly into check_persisted_data!
?
fn test_network_graph_persist_error() { | ||
// Test that if we encounter an error during network graph persistence, an error gets returned. | ||
let nodes = create_nodes(2, "test_persist_network_graph_error".to_string()); | ||
open_channel!(nodes[0], nodes[1], 100000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this line is not necessary.
@@ -402,6 +419,27 @@ mod tests { | |||
} | |||
} | |||
|
|||
#[derive(Clone)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of deriving Clone
, create a new Persister
where needed and clone the data dir to pass to it.
// Test that if we encounter an error during manager persistence, the thread panics. | ||
let nodes = create_nodes(2, "test_persist_error".to_string()); | ||
open_channel!(nodes[0], nodes[1], 100000); | ||
|
||
let persister = |_: &_| Err(std::io::Error::new(std::io::ErrorKind::Other, "test")); | ||
struct ChannelManagerErrorPersister { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the duplication and boilerplate needed for the specialized persisters, you can modify the standard one you provided earlier as follows:
struct Persister {
data_dir: String,
graph_error: Option<(std::io::ErrorKind, &'static str)>,
}
impl Persister {
fn new(data_dir: String) -> Self {
Self { data_dir, graph_error: None }
}
fn with_graph_error(self, error: std::io::ErrorKind, message: &'static str) -> Self {
Self { graph_error: Some((error, message)), ..self }
}
}
// ...
fn persist_graph(&self, network_graph: &NetworkGraph) -> Result<(), std::io::Error> {
match self.graph_error {
None => FilesystemPersister::persist_network_graph(self.data_dir.clone(), network_graph),
Some((error, message)) => Err(std::io::Error::new(error, message)),
}
}
And then create one as:
let persister = Persister::new(data_dir).with_graph_error(std::io::ErrorKind::Other, "test");
This will make the tests easier to read as they'll be more concise. You can do something similar for ChannelManager
persistence errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks really slick, will do this. thanks!
@@ -151,7 +148,12 @@ impl BackgroundProcessor { | |||
/// [`ChannelManager`]. See [`FilesystemPersister::persist_manager`] for Rust-Lightning's | |||
/// provided implementation. | |||
/// | |||
/// Typically, users should either implement [`ChannelManagerPersister`] to never return an | |||
/// `persist_graph` is responsible for writing out the [`NetworkGraph`] to disk. See |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the paragraphs above, I think the reference to persist_manager
was at one point referring to a parameter to start
but has been renamed a few times since. We should update the docs accordingly and use similar wording in this paragraph for persist_graph
now that these are methods on persister
.
thanks for the review @jkczyz! I made the changes and broke them up by commit. I'm a little unsure on what else needs to be done to the docs beyond explicitly saying that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks great! Can squash again once @TheBlueMatt is good.
LGTM! Please squash the commits down into logically consistent commits without fixups in later commits and this should be good to go. |
06a2b81
to
de4cb40
Compare
Instead of creating a separate trait for persisting NetworkGraph, use and rename the existing ChannelManagerPersister to handle them both. persist_graph is then called on removal of stale channels and on exit.
de4cb40
to
df2e60d
Compare
@TheBlueMatt @jkczyz squashed. hope the way I organized the commits makes sense. |
This PR addresses issue #1191. I borrowed mostly the same ideas from persisting
ChannelManager
, except instead of throwing an exception on error, I do alog_error!
instead to mirror the non-terminal behavior inldk-sample
.I'm also not sure if we want to be callingpersist_network_graph
at the same time aspersist_manager
, but I couldn't figure out how else to test it, since the prune block only gets called after 60 seconds. Let me know if there are any ideas here.