New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: add support for a `forgetchannel` RPC only enabled w/ the debug flag #1295

Merged
merged 3 commits into from Sep 22, 2018

Conversation

@joostjager
Collaborator

joostjager commented May 29, 2018

Fixes #1225

Capacity: dbChan.Capacity,
// Appropriate type for forgotten channels?
CloseType: channeldb.LocalForceClose,

This comment has been minimized.

@joostjager

joostjager May 29, 2018

Collaborator

Not sure what to do here. Introduce new close type?

This comment has been minimized.

@Roasbeef

Roasbeef Jun 2, 2018

Member

Yeah a new close type would make sense here IMO.

@Roasbeef

Thanks for the PR! Haven't yet tested this yet, but my only major review comment is to properly utilize build flags rather than use a new command line debug flag. This ensures that users don't accidentally use this feature without first doing a manual build themselves.

config.go Outdated
@@ -175,6 +175,7 @@ type config struct {
Profile string `long:"profile" description:"Enable HTTP profiling on given port -- NOTE port must be between 1024 and 65535"`
DebugHTLC bool `long:"debughtlc" description:"Activate the debug htlc mode. With the debug HTLC mode, all payments sent use a pre-determined R-Hash. Additionally, all HTLCs sent to a node with the debug HTLC R-Hash are immediately settled in the next available state transition."`
Debug bool `long:"debug" description:"Run lnd in debug mode."`

This comment has been minimized.

@Roasbeef

Roasbeef Jun 2, 2018

Member

Rather than use a config parameter for this, build flags in Go should be utilized instead. This path would involve defining a method in the rpc server area, then having two versions: one that actually deletes the channel, and one that's a no-op but gives an error.

This comment has been minimized.

@joostjager

joostjager Jun 2, 2018

Collaborator

I understand that go does not have c style preprocessor directives and only allows conditional compilation on the file level. I see two options:

  1. Create a rpcserver_debug.go and a rpcserver_production.go. These files contain the two versions of the method
  2. Create a config_debug.go and a config_production.go. In these files, a constant DebugBuild is set. This constant is checked run time in the rpc server method implementation. (similar to htlcswitch/hodl - it can probably be removed from there in that case)

I'd prefer option 2 and not spread out the rpcserver method implementation over three files.

What is your opinion? Or suggestion for something else?

This comment has been minimized.

@Roasbeef

Roasbeef Jun 14, 2018

Member

Personally, I favor option 1 as it doesn't require us copying over the entire config flag. In this case, the ForgetChannel RPC method would be the only one copied over in the short term. So the ForgetChannel RPC in rpcserver.go, would call out to a canned forgetChannel method declared at the top of the file. The two versions would then be properly gated on the build flag.

This comment has been minimized.

@joostjager

joostjager Jun 14, 2018

Collaborator

Change made. But what I don't know is how to put that canned forgetChannel func at the top op rpcserver.go. I think I am forced to put it into a separate rpcserver_production.go file. Or how to do it otherwise?

Capacity: dbChan.Capacity,
// Appropriate type for forgotten channels?
CloseType: channeldb.LocalForceClose,

This comment has been minimized.

@Roasbeef

Roasbeef Jun 2, 2018

Member

Yeah a new close type would make sense here IMO.

@joostjager joostjager force-pushed the joostjager:forget branch from aba5777 to 7acc2ea Jun 2, 2018

ChainHash: dbChan.ChainHash,
RemotePub: dbChan.IdentityPub,
Capacity: dbChan.Capacity,
CloseType: channeldb.Forgotten,

This comment has been minimized.

@joostjager

joostjager Jun 2, 2018

Collaborator

Looking at it, isn't AbandonChannel / ClosureType.Abandoned a clearer naming for this operation and state?

This comment has been minimized.

@Roasbeef

Roasbeef Jun 14, 2018

Member

Agreed, I prefer AbandonedChannel.

This comment has been minimized.

@joostjager

joostjager Jun 14, 2018

Collaborator

Renamed

@joostjager joostjager force-pushed the joostjager:forget branch from 7acc2ea to 557abc6 Jun 3, 2018

return nil, err
}
err = r.server.chanDB.MarkChanFullyClosed(chanPoint)

This comment has been minimized.

@halseth

halseth Jun 4, 2018

Collaborator

this call is not necessary if you set ChannelCloseSummary{ IsPending: false }above.

This comment has been minimized.

@joostjager

joostjager Jun 4, 2018

Collaborator

Good point, MarkChanFullyClosed does nothing else than that. Default boolean value is false, but indeed added explicitly for clarity.

@joostjager joostjager force-pushed the joostjager:forget branch from 557abc6 to cc473ae Jun 4, 2018

@joostjager

This comment has been minimized.

Collaborator

joostjager commented Jun 13, 2018

@Roasbeef Can I get your opinion on the two open questions above?

@joostjager joostjager force-pushed the joostjager:forget branch 5 times, most recently from 72f44ba to 6d37868 Jun 14, 2018

@joostjager

This comment has been minimized.

Collaborator

joostjager commented Jun 14, 2018

Rebased onto master to pull in ClosedChannels rpc that needs to be extended in this PR with the abandoned close type.

Ready for review again.

@joostjager joostjager force-pushed the joostjager:forget branch from 6d37868 to f016506 Jun 14, 2018

@joostjager joostjager force-pushed the joostjager:forget branch 3 times, most recently from b00ecfb to 2b71374 Jun 16, 2018

@wpaulino

Tested and works as advertised 👍

*/
rpc AbandonChannel (AbandonChannelRequest) returns (AbandonChannelResponse) {
option (google.api.http) = {
post: "/v1/channels/{channel_point.funding_txid_str}/{channel_point.output_index}/forget"

This comment has been minimized.

@wpaulino

wpaulino Jun 29, 2018

Collaborator

Perhaps the endpoint should be modified to abandon as well?

return nil
}
func parseChannelPoint(ctx *cli.Context) (*lnrpc.ChannelPoint, error) {

This comment has been minimized.

@wpaulino

wpaulino Jun 29, 2018

Collaborator

Nice refactor 👍

Could you add a godoc?

/** lncli: `abandonchannel`
AbandonChannel removes all channel state from the database except for a
close summary. This method can be used to get rid of permanently unusable
channels dued to bugs fixed in newer versions of lnd. Only available

This comment has been minimized.

@wpaulino

wpaulino Jun 29, 2018

Collaborator

nit: s/dued/due/

}
chanPoint := wire.NewOutPoint(txid, index)
dbChan, err := r.fetchOpenDbChannel(*chanPoint)

This comment has been minimized.

@wpaulino

wpaulino Jun 29, 2018

Collaborator

I believe most cases where users have channels stuck are found within the set of pending closed channels instead. If we opt to allow abandoning open channels as well, we should probably ensure that they're inactive first.

This comment has been minimized.

@joostjager

joostjager Jul 2, 2018

Collaborator

With ensuring that they're inactive, do you mean pending close? I am not sure if that is what you want. My impression is that abandon should remove the channel without trying to do anything else that might fail.

This comment has been minimized.

@Roasbeef

Roasbeef Jul 11, 2018

Member

In many cases, the users are also unable to actually fully read the borked channel from disk fully. As a result, we should check if we're unable to read the channel state from disk, and fall back to a more "aggressive" purge if needed. By more aggressive, I mean just deleting the bucket hierarchy that the channel resides in.

This comment has been minimized.

@joostjager

joostjager Jul 28, 2018

Collaborator

But in that case, does lnd start up at all? I see that fetchNodeChannels is called on startup and deserializes the open channels. If it doesn't start, there is no opportunity to make the AbandonChannel call.

Isn't a database integrity check/correct on startup a better solution for this issue?

This comment has been minimized.

@Roasbeef

Roasbeef Jul 30, 2018

Member

Yes it's the case that lnd shouldn't actually get into this state, but it's a bit of a work-around to allow users that have channels that are already borked (so can't properly deserialize for w/e reason) to cleanly remove the channel from the database.

This comment has been minimized.

@joostjager

joostjager Jul 30, 2018

Collaborator

I see that those channels cannot be deserialized and that it shouldn't happen, but what is the best way to solve the problem? Extending the logic of AbandonChannel is not going to work because lnd isn't even starting and cannot accept the RPC call.

This comment has been minimized.

@vegardengen

vegardengen Jul 30, 2018

Contributor

@joostjager Some of us have very old workarounds in our code - an extra continue or a return, making us ignore the error for now.

This PR will make it possible for us to delete the data that makes these workaround necessary, and go back to a clean codebase.

In my opinion, we should try to never make LND exit at errors that affect for example only single channel, but add it to a "borkedchannels" pool, for possibly later fixup or deletion. I do, however, see that it's been useful at this stage for developers to be quickly notified about serious channel issues (whenever I can't start my node, I tend to scream and shout and create a github issue eventually), so it's a bit of a tradeoff here.

This comment has been minimized.

@joostjager

joostjager Aug 1, 2018

Collaborator

So how it would work if we follow the line of this PR, is that when LND crashes on startup:

  • Add continue/returns to make it start
  • Use AbandonChannel to remove the borked channels from the database. In this process hope that the continue/returns are not placed in such a way that AbandonChannels isn't working anymore
  • Revert code to original

This also means that quite some db code will be introduced to deal with this special situation.

I think I'd prefer a command line switch like --fixdb that executes a set of integrity checks and fixes problems. After making a backup of channel.db.

This comment has been minimized.

@halseth

halseth Sep 6, 2018

Collaborator

I think the strategy you outlined in the bullet points makes sense, as this should only affect a small set of users, and if they can create a debug build they can also add the necessary continues/returns.

This comment has been minimized.

@joostjager

joostjager Sep 6, 2018

Collaborator

Looking at the changes required for AbandonChannel to be able to remove a borked channel, we'll be adding quite some code to channeldb just to be able to do that.

func (r *rpcServer) AbandonChannel(ctx context.Context,
in *lnrpc.AbandonChannelRequest) (*lnrpc.AbandonChannelResponse, error) {
if !DebugBuild {

This comment has been minimized.

@wpaulino

wpaulino Jun 29, 2018

Collaborator

@cfromknecht's build package PR (#703) would come in handy here!

This comment has been minimized.

@joostjager

joostjager Jul 2, 2018

Collaborator

Interesting, hadn't noticed that one. Similar construction with setting a variable based on a build tag.

This comment has been minimized.

@halseth

halseth Sep 6, 2018

Collaborator

Another approach would be to add a new file rpcserver_debug.go where this method is defined, which only is built if the debug flag is set.

This comment has been minimized.

@joostjager

joostjager Sep 6, 2018

Collaborator

I tried this before, but after discussion with @Roasbeef we settled on the current mechanism.

@joostjager joostjager force-pushed the joostjager:forget branch from 87c17a7 to 82438b9 Jul 2, 2018

@Roasbeef Roasbeef added the P3 label Jul 10, 2018

@Roasbeef

This comment has been minimized.

Member

Roasbeef commented Jul 10, 2018

Rebase at your leisure, thanks!

@joostjager

This comment has been minimized.

Collaborator

joostjager commented Jul 11, 2018

Rebased

@Roasbeef Roasbeef removed the needs rebase label Jul 11, 2018

Show resolved Hide resolved cmd/lncli/commands.go Outdated
Show resolved Hide resolved rpcserver.go Outdated
func (r *rpcServer) AbandonChannel(ctx context.Context,
in *lnrpc.AbandonChannelRequest) (*lnrpc.AbandonChannelResponse, error) {
if !DebugBuild {

This comment has been minimized.

@halseth

halseth Sep 6, 2018

Collaborator

Another approach would be to add a new file rpcserver_debug.go where this method is defined, which only is built if the debug flag is set.

}
chanPoint := wire.NewOutPoint(txid, index)
dbChan, err := r.fetchOpenDbChannel(*chanPoint)

This comment has been minimized.

@halseth

halseth Sep 6, 2018

Collaborator

I think the strategy you outlined in the bullet points makes sense, as this should only affect a small set of users, and if they can create a debug build they can also add the necessary continues/returns.

@joostjager joostjager force-pushed the joostjager:forget branch from 4515c5c to aa76729 Sep 6, 2018

@joostjager

This comment has been minimized.

Collaborator

joostjager commented Sep 6, 2018

Migration idea turns out to require quite some calls to serialize/deserialize functions. This makes the migration brittle, because these functions may change in the future. Copy/paste the required code is not good for maintenance. Unit test coverage could help, but would still require copy/paste when changes are made in the future to these functions.

After discussion with @halseth, we decided to implement the cleanup of the database behind a command line flag that is only available in debug mode and that that development will be in a separate PR.

@joostjager joostjager force-pushed the joostjager:forget branch 2 times, most recently from bd9f634 to 597d7c5 Sep 7, 2018

@Roasbeef

LGTM 🤺

Needs a rebase as well to fix the conflict in the protos. The one lingering comment I have is to change the target HTTP path to be a bit more REST-y

*/
rpc AbandonChannel (AbandonChannelRequest) returns (AbandonChannelResponse) {
option (google.api.http) = {
post: "/v1/channels/{channel_point.funding_txid_str}/{channel_point.output_index}/abandon"

This comment has been minimized.

@Roasbeef

Roasbeef Sep 18, 2018

Member

Alternatively, this could instead target the DELTE verb, dropping the abandon path at the end.

@Roasbeef

This should also clean up the related state in the contractcourt, htlcswitch, and utxonursery. On restart this won't read the new data, but they may attempt to carry out their duties which involve reading/mutating the channel state.

@joostjager joostjager force-pushed the joostjager:forget branch from 597d7c5 to 907cfd2 Sep 18, 2018

rpcserver+lnrpc+lncli: add AbandonChannel rpc call
Using AbandonChannel, a channel can be abandoned. This means
removing all state without any on-chain or off-chain action.
A close summary is the only thing that is stored in the db after
abandoning.

A specific close type Abandoned is added. Abandoned channels
can be retrieved via the ClosedChannels RPC.

@joostjager joostjager force-pushed the joostjager:forget branch from 907cfd2 to 4f5f5a8 Sep 18, 2018

@joostjager

This comment has been minimized.

Collaborator

joostjager commented Sep 18, 2018

After discussion decided to move cleaning up cnct, htlcswitch and utxonursery to #1857. Possibly complexities will be created when trying to do it "hot".

@halseth halseth added this to the 0.5.1 milestone Sep 19, 2018

t.Fatalf("unable to abandon channel: %v", err)
}
// Assert that channel in no longer open.

This comment has been minimized.

@Roasbeef

Roasbeef Sep 20, 2018

Member

Seeing this flake a bit on Travis, as a result, would recommend that the checks below be wrapped in a WaitPredicate.

@joostjager joostjager force-pushed the joostjager:forget branch 2 times, most recently from 323b326 to 8f49e4d Sep 20, 2018

@halseth halseth modified the milestones: 0.5.1, 0.5.2 Sep 20, 2018

lnd_test.go Outdated
"instead she has %v",
len(aliceClosedList.Channels))
}
}

This comment has been minimized.

@Roasbeef

Roasbeef Sep 20, 2018

Member

The channel is never closed at the end.

This comment has been minimized.

@Roasbeef

Roasbeef Sep 20, 2018

Member

Bob should close the channel at the end of the test.

@joostjager joostjager force-pushed the joostjager:forget branch from 8f49e4d to a75b0c8 Sep 21, 2018

@Roasbeef

LGTM 🐸

@Roasbeef Roasbeef merged commit f430509 into lightningnetwork:master Sep 22, 2018

1 of 2 checks passed

coverage/coveralls Coverage decreased (-0.03%) to 53.279%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
return nil, err
}
summary := &channeldb.ChannelCloseSummary{

This comment has been minimized.

@cfromknecht

cfromknecht Sep 22, 2018

Collaborator

Isn't this close summary missing critical fields that could allow the channel to be rescued at a later point?

@Roasbeef

This comment has been minimized.

Member

Roasbeef commented Sep 22, 2018

@cfromknecht

This comment has been minimized.

Collaborator

cfromknecht commented Sep 24, 2018

Yeah my thinking was that they may choose to abandon it if the channel is causing issues for their node on startup or something, but then later it could be manually swept if we have the correct info

@vegardengen

This comment has been minimized.

Contributor

vegardengen commented Sep 29, 2018

Is anyone working on extending this to also being able to drop channels in pending force close?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment