Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: more robust lightning receive #3929

Merged
merged 1 commit into from
Dec 13, 2023

Conversation

douglaz
Copy link
Contributor

@douglaz douglaz commented Dec 12, 2023

Closes #3881

Timeout now will happen at the right time: when we're sure the preimage decryption is pending and the expiry time relative to the invoice timestamp has passed.

There are no risks anymore of the timeout not happening because the state machines have been interrupted or it incorrectly happening while the client is dealing with network issues.

Copy link

codecov bot commented Dec 12, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (577ed73) 57.06% compared to head (8e03a82) 57.10%.
Report is 2 commits behind head on master.

Files Patch % Lines
modules/fedimint-ln-client/src/receive.rs 90.47% 2 Missing ⚠️
fedimint-load-test-tool/src/main.rs 0.00% 1 Missing ⚠️
modules/fedimint-ln-common/src/contracts/mod.rs 50.00% 1 Missing ⚠️
modules/fedimint-ln-server/src/lib.rs 95.45% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3929      +/-   ##
==========================================
+ Coverage   57.06%   57.10%   +0.04%     
==========================================
  Files         193      193              
  Lines       42916    42949      +33     
==========================================
+ Hits        24489    24528      +39     
+ Misses      18427    18421       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@douglaz douglaz force-pushed the better_ln_receive branch 2 times, most recently from a1e92b4 to 2c66734 Compare December 12, 2023 22:10
};

self.expiry_timestamp() < now
self.expiry_timestamp() < duration_since_epoch().as_secs()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Not a problem of this PR, but would rename expiry_timestamp to expiry_timestamp_secs or something.

// only when we are sure that the invoice is still pending that we can check
// for a timeout
const BUFFER: f64 = 1.10; // 10% buffer to account for clock skew
let invoice_since_epoch_secs =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue:

> date +%s
1702448004

Isn't 10% of 1702448004s ... 5 years? Do we really need such a buffer? Seems like 60s should be enough for everyone?

BTW. "buffer" seems like a wrong word. Would just inline it to not have to come up with a better name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation applied it on top of the relative timeout. So 6min per hour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@douglaz the factor should be applied on the relative expiry time, not the absolute one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, my commit didn't include the fix. Look now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like dpc's idea that this should be just an offset. You don't need it to be relative.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fixed offset between computers is much more likely than clocks going 10% slower/faster (which would right away caused fixed offset anyway). Unless I'm missing something.

BTW. Makes me wonder if federation should have a consensus on time as well. :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also good to always allow a fixed grace period as long as there is one.

pub fn duration_since_epoch() -> std::time::Duration {
now()
.duration_since(SystemTime::UNIX_EPOCH)
.expect("time to work")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I avoided such expects more recently since a horribly misconfigured system might have a time from the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your system is so misconfigured you should expect things to blow up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - for this to panic, the now() would have to return a date before unix epoch... ? Which is probably impossible at least on Linux, because the representation of time start with unix epoch?

// only when we are sure that the invoice is still pending that we can check
// for a timeout
const BUFFER: f64 = 1.10; // 10% buffer to account for clock skew
let invoice_since_epoch_secs =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@douglaz the factor should be applied on the relative expiry time, not the absolute one.

@elsirion elsirion dismissed their stale review December 13, 2023 20:45

Too tired for another review today, don't want to block unnecessarily.

DecryptedPreimageStatus::Pending => {
// only when we are sure that the invoice is still pending that we can check
// for a timeout
const TOLERANCE: Duration = Duration::from_secs(60); // tolerate some clock skew
Copy link
Contributor

@dpc dpc Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: I like TOLERANCE much more. Sounds very inclusive, while BUFFER sounded needlessly exclusionary, so 👍 😄 👍

@douglaz douglaz added this pull request to the merge queue Dec 13, 2023
Merged via the queue into fedimint:master with commit ae44ae3 Dec 13, 2023
20 checks passed
@douglaz douglaz deleted the better_ln_receive branch December 13, 2023 22:31
@fedimint-backports
Copy link

Successfully created backport PR for releases/v0.2:

@elsirion
Copy link
Contributor

On closer inspection this looks more like a quick fix. Since the underlying timeout issue was hard to debug we re-introduce polling again for the one API call that can take longer than some built-in timeout. It's the pragmatic thing to do, but we should track fixing this issue properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lightning client payment receive state machines have issues
6 participants