htlcswitch+lnwallet: log signer-side commit tx on invalid commit sig#10740
htlcswitch+lnwallet: log signer-side commit tx on invalid commit sig#10740ziggie1984 wants to merge 1 commit intolightningnetwork:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces improved diagnostic capabilities for identifying state divergence between peers in the Lightning Network. By enabling the logging of the locally signed commitment transaction during an error event, developers can now compare it against the peer's reported transaction to pinpoint whether a failure stems from a state mismatch or a signing error. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
🔴 PR Severity: CRITICAL
🔴 Critical (2 files)
AnalysisBoth changed files fall squarely in CRITICAL packages:
Additionally, this PR touches two distinct critical packages (htlcswitch and lnwallet), which further reinforces the CRITICAL classification. Expert review is required before merging. To override, add a |
c044523 to
058ae08
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a diagnostic method, BuildLastSignedRemoteCommitTx, to re-derive and log the last signed remote commitment transaction when a peer error occurs. This is intended to help identify state divergence. However, the current implementation re-derives the transaction from the update logs, which is redundant and susceptible to race conditions if the log index advances before the method is called. It is recommended to instead retrieve the transaction directly from the tail of the remote commitment chain, which already stores the exact transaction that was signed.
When a peer rejects our CommitSig with an Error message, the InvalidCommitSigError it sends back contains the commitment transaction the peer built on their side. We never logged the commitment transaction we signed, making it impossible to determine whether the failure was caused by a state divergence (the two peers derived different transactions) or a genuine signing bug.
058ae08 to
6103db9
Compare
🟢 PR Severity: LOW
AnalysisA The override is respected as-is — no file-level classification is performed when an override label exists. To change the override, replace the |
Problem
When a peer rejects our
CommitSigwith anErrormessage, theInvalidCommitSigErrorit sends back contains the commitmenttransaction the peer built on their side (logged via the existing error
handling). We never logged the commitment transaction we signed,
making it impossible to determine whether the failure was caused by a
state divergence (the two peers derived different transactions) or a
genuine signing bug.
This gap was observed in a CI failure on
#10628
(run:
the
zero_conf-channel_policy_update_public_zero_confitest got stuckwith a payment
IN_FLIGHTdue to anInvalidCommitSigErroratcommit_height=5on Carol's side. Carol'scommit_txwas visible inthe error log, but Dave's was not, leaving the root cause unresolvable.
Solution
Add
BuildLastSignedRemoteCommitTxtoLightningChannel. The methodre-derives the commitment transaction most recently signed for the
remote party by mirroring the exact index snapshot used in
SignNextCommitment. It must be called before a subsequentRevokeAndAckfrom the remote rotatesRemoteNextRevocationto a newcommitment point (which is guaranteed on the error path since the peer
sends
Errorinstead ofRevokeAndAck).processRemoteErrorin the htlcswitch now calls this method and logsthe result alongside the peer's error. Comparing the two
commit_txfields immediately reveals any state divergence without requiring a
live debugger or a second reproduction of the failure.
Identifying divergence from logs
With this change, a future failure would log:
If
<carol_tx> != <dave_tx>the two peers derived differenttransactions from the same state — that is the root cause. If they
match, the signature itself is broken and the sighash / signing path
needs investigation.