Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

swap: introduce pending cheques, drop peers on io errors #1887

Merged
merged 19 commits into from
Nov 4, 2019

Conversation

ralph-pichler
Copy link
Member

@ralph-pichler ralph-pichler commented Oct 18, 2019

This PR implements the pending cheque mechanism as described here (#1601 (comment)).
There are two differences:

  • Once a cheque is confirmed the node will immediately recheck wether to send another one.
  • The balance on the sender side is adjusted before sending the cheque.

@ralph-pichler ralph-pichler added incentives builds on open PR Builds on a PR that is not yet merged. It is blocked until then and the diff won't make sense yet. labels Oct 18, 2019
@ralph-pichler ralph-pichler changed the base branch from swap-factory to master October 18, 2019 02:47
@ralph-pichler ralph-pichler added ready for review and removed builds on open PR Builds on a PR that is not yet merged. It is blocked until then and the diff won't make sense yet. labels Oct 18, 2019
@ralph-pichler ralph-pichler force-pushed the swap-pending-cheques branch 2 times, most recently from 1e4be68 to 75bf887 Compare October 18, 2019 03:01
swap/peer.go Outdated
@@ -40,6 +40,7 @@ type Peer struct {
contractAddress common.Address
lastReceivedCheque *Cheque
lastSentCheque *Cheque
pendingCheque *Cheque
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small comment to each field in the peer struct would be nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

swap/swap.go Outdated
}

if err := p.setLastSentCheque(cheque); err != nil {
p.Drop("persistence error")
Copy link
Contributor

@Eknir Eknir Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we drop the peer if our node cannot save the cheque? I would propose to shut down Swarm with this i/o error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not entirely clear to me how to shut down swarm and I didn't want to ignore the error entirely, so I used drop because that's what the other protocols do on io error as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting @janos to give his opinion here.
After an IO error, is it sufficient to just drop the peer (as done by @ralph-pichler here) or should Swarm be stopped entirely? If we need to stop Swarm, how to do this safely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't support a shutdown invoked by some high level components. you could panic and risk data corruption with the databases since a shutdown is not invoked when panicking. dropping the peer in this case is the safer option because that way, if you're getting local IO errors and you start dropping peers - you isolate the problem to your local node and technically you do not participate in the network anymore (ideally). I'd stick with dropping peers at the moment since node behavior with IO errors is something we have not tested yet and it is quite an involved task in its own

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very interesting question. I agree with @acud, as we do not have a way to shut down the node, except with panic. If this error happens, we will drop the peer, but it is very likely that the same I/O error will happen in all next calls, so we will drop all peers, practically removing our node from the network. And maybe recovering if the I/O problem gets resolved. This could happen if the disk is full, for example. Maybe we could drop all peers if this happens which would prevent other peers to step into the same problem. I am not sure how easy is to do so currently. If it requires non trivial changes, this drop is ok for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #1515

swap/swap.go Outdated Show resolved Hide resolved
swap/swap.go Outdated Show resolved Hide resolved
swap/peer.go Outdated
@@ -154,16 +169,18 @@ func (p *Peer) createCheque() (*Cheque, error) {
// To be called with mutex already held
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the comment, mentioning the added functionality?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

swap/swap.go Outdated
}

// since more swap traffic might have occurred since this cheque was already sent, we redo the payment threshold check
if err := s.checkPaymentThreshold(p); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the disconnect threshold is set at less than 200% of the payment threshold, this function will never result in sending another cheque.

Proposing to update the comment:
since more swap traffic might have occurred (when disconnect threshold >= 2 * paymentThreshold)...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that more traffic has occurred when this function is under lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not during handleConfirmChequeMsg. But traffic might have occurred since the call to checkPaymentThreshold in Add where this cheque was initially created.

swap/swap.go Outdated
}

// since more swap traffic might have occurred since this cheque was already sent, we redo the payment threshold check
if err := s.checkPaymentThreshold(p); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sendCheque returns an error here, it is handled differently than when sendCheque returns an error while being called from the Add function. I don't think this is desirable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I intentionally didn't change the error handling in Add in this PR to keep it simple. Add shouldn't return an error either if cheque sending fails but I would leave that (and the rest of the errors issues in Add) to a separate PR and only deal with pending cheques in this one.

@@ -43,3 +43,7 @@ type HandshakeMsg struct {
type EmitChequeMsg struct {
Cheque *Cheque
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -251,6 +251,10 @@ func TestTriggerPaymentThreshold(t *testing.T) {
t.Fatal(err)
}

debitorSwap.handleConfirmChequeMsg(ctx, creditor, &ConfirmChequeMsg{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would place these additional lines line just above the lines containing:

if creditor.getBalance() != 0 {
		t.Fatalf("Expected debitorSwap balance to be 0, but is %d", creditor.getBalance())
	}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean here. Move which lines where exactly?

@@ -198,6 +198,7 @@ const (
balancePrefix = "balance_"
sentChequePrefix = "sent_cheque_"
receivedChequePrefix = "received_cheque_"
pendingChequePrefix = "pending_cheque_"
Copy link
Contributor

@Eknir Eknir Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this new key to the test TestStoreKeys in swap_test.go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@Eknir Eknir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work. This makes SWAP much more robust.

I left a few comments.

Apart from those, I think it would be good to have test scenario's covering (make new tests or add to existing tests):
handleEmit:

  1. After receiving a new cheque, a confirm message is sent, containing the new cheque
  2. After receiving an already sent cheque, a confirm message is sent, containing the old cheque
    handleConfirm
  3. Balance is updated
    sentCheque
  4. balance is not updated before getting the confirm message

swap/peer.go Show resolved Hide resolved
swap/peer.go Show resolved Hide resolved
swap/peer.go Show resolved Hide resolved
swap/swap.go Outdated Show resolved Hide resolved
swap/swap.go Outdated
}

if err := p.setLastSentCheque(cheque); err != nil {
p.Drop("persistence error")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't support a shutdown invoked by some high level components. you could panic and risk data corruption with the databases since a shutdown is not invoked when panicking. dropping the peer in this case is the safer option because that way, if you're getting local IO errors and you start dropping peers - you isolate the problem to your local node and technically you do not participate in the network anymore (ideally). I'd stick with dropping peers at the moment since node behavior with IO errors is something we have not tested yet and it is quite an involved task in its own

swap/swap.go Outdated
@@ -318,6 +336,12 @@ func (s *Swap) handleEmitChequeMsg(ctx context.Context, p *Peer, msg *EmitCheque
return err
}

if err := p.Send(ctx, &ConfirmChequeMsg{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ralph-pichler the convention to use is:

err := p.Send(...)
if err != nil {
 return err
}

for all error checks in the same block, this way you prevent allocating a new variable for every function call that checks for an error. please fix it in the calls below too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it for handleConfirmChequeMsg and handleEmitChequeMsg. Hope that's ok now, I remember the opposite recommendation from other PRs.

swap/swap.go Outdated
}

// since more swap traffic might have occurred since this cheque was already sent, we redo the payment threshold check
if err := s.checkPaymentThreshold(p); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that more traffic has occurred when this function is under lock?

Copy link
Contributor

@mortelli mortelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no major gripes, looks good overall.

i'll take another look after having the discussions addressed.

i recommend rebasing/merging with master often to avoid falling behind too much and having to deal with complicated conflicts

swap/peer.go Show resolved Hide resolved
swap/peer.go Outdated Show resolved Hide resolved
swap/peer.go Outdated Show resolved Hide resolved
swap/peer.go Outdated Show resolved Hide resolved
swap/peer.go Outdated Show resolved Hide resolved
swap/swap.go Outdated Show resolved Hide resolved
swap/swap.go Outdated Show resolved Hide resolved
swap/swap.go Outdated Show resolved Hide resolved
swap/swap.go Show resolved Hide resolved
swap/swap.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mortelli mortelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though not all my comments have been addressed, i am going to go ahead and approve this PR, as i don't perceive any of them as blockers 👍

@mortelli mortelli assigned mortelli and ralph-pichler and unassigned mortelli Nov 4, 2019
Copy link
Contributor

@mortelli mortelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am OK with the change to the moment in which the paying node adjusts its balance. it doesn't directly solve the problem (i.e. prevent it completely) but it makes it harder to occur.

aside from that, please address the open threads from my previous reviews so they can be resolved

swap/swap.go Outdated Show resolved Hide resolved
@ralph-pichler ralph-pichler merged commit 59b2025 into master Nov 4, 2019
@mortelli mortelli deleted the swap-pending-cheques branch November 5, 2019 12:24
@acud acud added this to the 0.5.3 milestone Nov 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants