fix: more robust lightning receive #3929
Conversation
Force-pushed from 76d9bb9 to 76550bb (compare)
Codecov Report

@@ Coverage Diff @@
##           master    #3929      +/-   ##
==========================================
+ Coverage   57.06%   57.10%   +0.04%
==========================================
  Files         193      193
  Lines       42916    42949      +33
==========================================
+ Hits        24489    24528      +39
+ Misses      18427    18421       -6

View full report in Codecov by Sentry.
Force-pushed from a1e92b4 to 2c66734 (compare)
- self.expiry_timestamp() < now
+ self.expiry_timestamp() < duration_since_epoch().as_secs()
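The change above compares the invoice's absolute expiry timestamp against the current wall-clock time in seconds since the Unix epoch. A minimal sketch of that check (the helper and `is_expired` names here are illustrative, not the PR's exact code):

```rust
use std::time::{Duration, SystemTime};

// Hypothetical helper mirroring the PR's `duration_since_epoch()`.
fn duration_since_epoch() -> Duration {
    SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .expect("system clock is set after the Unix epoch")
}

// The invoice is expired once its absolute expiry timestamp
// (seconds since the Unix epoch) lies in the past.
fn is_expired(expiry_timestamp_secs: u64) -> bool {
    expiry_timestamp_secs < duration_since_epoch().as_secs()
}
```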
thought: Not a problem of this PR, but I would rename `expiry_timestamp` to `expiry_timestamp_secs` or something.
// only when we are sure that the invoice is still pending that we can check
// for a timeout
const BUFFER: f64 = 1.10; // 10% buffer to account for clock skew
let invoice_since_epoch_secs =
issue:
> date +%s
1702448004
Isn't 10% of 1702448004 s around 5 years? Do we really need such a buffer? Seems like 60s should be enough for everyone?
BTW. "buffer" seems like the wrong word. I would just inline it to avoid having to come up with a better name.
The previous implementation applied it on top of the relative timeout. So 6min per hour.
@douglaz the factor should be applied on the relative expiry time, not the absolute one.
I'm sorry, my commit didn't include the fix. Look now.
I like dpc's idea that this should be just an offset. You don't need it to be relative.
I think a fixed offset between computers is much more likely than clocks running 10% slower/faster (which would immediately cause a fixed offset anyway). Unless I'm missing something.
BTW. Makes me wonder if federation should have a consensus on time as well. :D
Also good to always allow a fixed grace period as long as there is one.
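The fixed-offset approach the reviewers converge on can be sketched as follows (the `is_timed_out` name and the 60-second value are illustrative assumptions, not the PR's exact code):

```rust
use std::time::{Duration, SystemTime};

// Tolerate a fixed amount of clock skew instead of scaling a
// percentage of an (absolute) timestamp.
const TOLERANCE: Duration = Duration::from_secs(60);

fn duration_since_epoch() -> Duration {
    SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .expect("system clock is set after the Unix epoch")
}

// The invoice only counts as timed out once its expiry plus the fixed
// grace period has passed, so a slightly fast local clock does not
// time invoices out early.
fn is_timed_out(invoice_expiry_secs: u64) -> bool {
    duration_since_epoch().as_secs()
        > invoice_expiry_secs.saturating_add(TOLERANCE.as_secs())
}
```

A fixed grace period also avoids the absurdity noted above, where 10% of an absolute epoch timestamp is on the order of years.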
Force-pushed from 2c66734 to 619e576 (compare)
pub fn duration_since_epoch() -> std::time::Duration {
    now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .expect("time to work")
nit: I avoided such expects more recently since a horribly misconfigured system might have a time from the past.
If your system is so misconfigured you should expect things to blow up
Actually, for this to panic, `now()` would have to return a date before the Unix epoch...? Which is probably impossible, at least on Linux, because the representation of time starts at the Unix epoch.
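For completeness, if one did want to avoid the `expect` for a horribly misconfigured (pre-epoch) clock, a defensive variant could clamp to zero. This is a sketch of that alternative, not what the PR does; the `_saturating` name is made up here:

```rust
use std::time::{Duration, SystemTime};

// Defensive variant: a pre-epoch system clock yields Duration::ZERO
// instead of panicking.
fn duration_since_epoch_saturating() -> Duration {
    SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap_or(Duration::ZERO)
}
```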
Force-pushed from 619e576 to 8e03a82 (compare)
Too tired for another review today, don't want to block unnecessarily.
DecryptedPreimageStatus::Pending => {
    // only when we are sure that the invoice is still pending that we can check
    // for a timeout
    const TOLERANCE: Duration = Duration::from_secs(60); // tolerate some clock skew
praise: I like TOLERANCE much more. Sounds very inclusive, while BUFFER sounded needlessly exclusionary, so 👍 😄 👍
Successfully created backport PR.
On closer inspection this looks more like a quick fix. Since the underlying timeout issue was hard to debug, we re-introduce polling for the one API call that can take longer than some built-in timeout. It's the pragmatic thing to do, but we should track fixing this issue properly.
Closes #3881
The timeout now happens at the right time: when we're sure the preimage decryption is pending and the expiry time relative to the invoice timestamp has passed.
There is no longer a risk of the timeout not firing because the state machines were interrupted, or of it firing incorrectly while the client is dealing with network issues.