swap: fix various data races #2106

ralph-pichler · 2020-02-16T19:24:52Z

This PR fixes various data races in the swap tests. With these fixes go test -race -v ./swap/ should pass without errors.

This covers 4 different issues:

1. As it turns out, the abigen-generated functions modify the big.Int values passed to them as arguments (by calling .Mod on them internally). As a result, the big.Int underlying cheque.CumulativePayout was modified by the cashing, which happens in a separate goroutine, causing a race whenever the value was read elsewhere. As a simple fix, a copy is now passed to the generated CashChequeBeneficiary function.
2. In TestTriggerPaymentThreshold there is a loop that waits for lastSentCheque to become non-nil. The test now also acquires the peer lock when reading it.
3. In TestMultiChequeSimulation the wrong lock was used for accessing the TestService.peers map.
4. TestDebtCheques now waits for cheque cashing to complete. This ensures that the cashing goroutine has terminated before the next test starts; otherwise it can have unintended consequences (e.g. here it led to a data race on the global swapLog, which was reset by the next test).

Furthermore, it:

- adds proper locking of peers where it was missing
- initialises cashDone during backend creation
- fixes various spelling errors

acud · 2020-02-17T04:29:13Z

swap/protocol_test.go

@@ -432,7 +432,10 @@ loop:
 		case <-ctx.Done():
 			t.Fatal("Expected one cheque, but there is none")
 		default:
-			if creditor.getLastSentCheque() != nil {
+			creditor.lock.Lock()


Hm, this is test code, but the production version still does not acquire the swap peer mutex as the getter documentation describes:

// PeerCheques returns the last sent and received cheques for a given peer
func (s *Swap) PeerCheques(peer enode.ID) (PeerCheques, error) {
	var pendingCheque, sentCheque, receivedCheque *Cheque
	swapPeer := s.getPeer(peer)
	if swapPeer != nil {
		pendingCheque = swapPeer.getPendingCheque()
		sentCheque = swapPeer.getLastSentCheque()
		receivedCheque = swapPeer.getLastReceivedCheque()
	}

I suggest sweeping through all getter methods that expect a lock to be held, to find similar mistakes.
And once again, this is why I don't like having such getters in the codebase in the first place. I argued against this back when the initial SWAP PR was under review. We should eliminate the possibility of similar developer mistakes by keeping the design simple.

I added the missing locks; all peer access should now be under lock (except in NewPeer and AddPeer, where it is clear that no other goroutine could have a reference at this point).

We will most likely get rid of the getters and setters soon as they also don't play nice with the batch writes we want to use in some places. I will discuss this with the team in our next call.
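
For reference, one possible way to remove the "caller must hold the lock" contract is to have the accessor take the lock itself. This is only a sketch of that alternative, not what this PR implements; the peer and cheque types below are simplified, hypothetical stand-ins for the swap types.

```go
package main

import (
	"fmt"
	"sync"
)

// cheque and peer are simplified, hypothetical stand-ins for the swap types.
type cheque struct{ amount uint64 }

type peer struct {
	lock           sync.Mutex
	lastSentCheque *cheque // guarded by lock
}

// setLastSentCheque stores the cheque while holding the lock.
func (p *peer) setLastSentCheque(c *cheque) {
	p.lock.Lock()
	defer p.lock.Unlock()
	p.lastSentCheque = c
}

// LastSentCheque acquires the lock itself, so a caller cannot forget to.
func (p *peer) LastSentCheque() *cheque {
	p.lock.Lock()
	defer p.lock.Unlock()
	return p.lastSentCheque
}

func main() {
	p := &peer{}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		p.setLastSentCheque(&cheque{amount: 42})
	}()
	wg.Wait()

	fmt.Println(p.LastSentCheque().amount)
}
```

The trade-off is that such an accessor must never be called while the same lock is already held, since sync.Mutex is not reentrant. That limitation is one reason this approach fits poorly with multi-field updates such as batch writes.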

acud · 2020-02-17T04:39:56Z

swap/swap_test.go

@@ -736,6 +738,9 @@ func TestDebtCheques(t *testing.T) {
 		t.Fatal(err)
 	}

+	testBackend.cashDone = make(chan struct{})
+	defer close(testBackend.cashDone)


I find it kind of strange to put this here. I've noticed this is not new, but I don't see any reason why the test case should close this channel.
Also, another note on the usage of this channel: since it should only signal once, you can close it directly in the method that polls the cashing of the cheque. There's no need to put an empty struct in the channel.

Do you mean it's strange because it should be closed elsewhere or because it doesn't need closing at all? I'm not sure what the motivation behind the close was; as far as I understand Go, the GC would take care of this anyway.

There are tests (e.g. the simulation tests) where cashing happens multiple times, hence the use of the struct{}{} instead of close.

I got rid of the make in the tests and removed the close.

Do you mean it's strange because it should be closed elsewhere or because it doesn't need closing at all?

I mean it in the sense that a channel's reader should not close it. This is a well-known anti-pattern in Go.
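
The two signalling styles discussed in this thread can be sketched as follows. All names are hypothetical and unrelated to the swap test backend: the sender owns both channels, sends one struct{}{} per completed job (a repeatable signal) and closes a second channel exactly once; the receiver never closes anything.

```go
package main

import "fmt"

// worker is a hypothetical sender that owns both channels.
// It sends one value on progress per completed job (a repeatable signal)
// and closes finished exactly once, when no further signals will follow.
func worker(jobs int, progress chan<- struct{}, finished chan<- struct{}) {
	for i := 0; i < jobs; i++ {
		progress <- struct{}{}
	}
	close(finished)
}

// countSignals only receives; it never closes either channel.
func countSignals(jobs int) int {
	progress := make(chan struct{})
	finished := make(chan struct{})
	go worker(jobs, progress, finished)

	n := 0
	for {
		select {
		case <-progress:
			n++
		case <-finished:
			return n
		}
	}
}

func main() {
	fmt.Println(countSignals(3))
}
```

Closing is enough when the event happens once; sending values is the option when the same event can fire several times, as in the simulation tests mentioned above.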

swap/api.go

acud

one more nitpick but definitely not a blocker. LGTM

acud · 2020-02-18T08:26:40Z

Thanks @ralph-pichler for your patience and for addressing all comments :)

Eknir

Good work, thanks for fixing these!

- I could not find where in the abigen package we are modifying the passed-in big.Int.
- The PR description does not mention that you have fixed more lock issues with regard to the peer getters (based on the comment by @acud).
- In places, the code becomes hard to read due to the additional locks.

Eknir · 2020-02-18T09:49:55Z

contracts/swap/swap.go

@@ -133,7 +133,7 @@ func (s simpleContract) Deposit(auth *bind.TransactOpts, amount *big.Int) (*type
 // CashChequeBeneficiaryStart sends the transaction to cash a cheque as the beneficiary
 func (s simpleContract) CashChequeBeneficiaryStart(opts *bind.TransactOpts, beneficiary common.Address, cumulativePayout *uint256.Uint256, ownerSig []byte) (*types.Transaction, error) {
 	payout := cumulativePayout.Value()
-	tx, err := s.instance.CashChequeBeneficiary(opts, beneficiary, &payout, ownerSig)
+	tx, err := s.instance.CashChequeBeneficiary(opts, beneficiary, big.NewInt(0).Set(&payout), ownerSig)


Add a comment above:
// send a copy of cumulativePayout to the instance to prevent a race condition
(or similar)

Eknir · 2020-02-18T09:50:45Z

contracts/swap/swap.go

@@ -133,7 +133,7 @@ func (s simpleContract) Deposit(auth *bind.TransactOpts, amount *big.Int) (*type
 // CashChequeBeneficiaryStart sends the transaction to cash a cheque as the beneficiary
 func (s simpleContract) CashChequeBeneficiaryStart(opts *bind.TransactOpts, beneficiary common.Address, cumulativePayout *uint256.Uint256, ownerSig []byte) (*types.Transaction, error) {
 	payout := cumulativePayout.Value()


You can already make the copy here

Yes, but I can't get a pointer to cumulativePayout.Value() without assigning it to a variable first. So if I make the copy beforehand, all it does is introduce an extra line.

Eknir · 2020-02-18T10:09:54Z

swap/simulations_test.go

 		debitorSvc.swap.peersLock.Lock()
 		debSwapLen = len(debitorSvc.swap.peers)
-		debLen = len(debitorSvc.peers)
 		debitorSvc.swap.peersLock.Unlock()
+		debitorSvc.lock.Lock()
+		debLen = len(debitorSvc.peers)
+		debitorSvc.lock.Unlock()

 		creditorSvc.swap.peersLock.Lock()
 		credSwapLen = len(creditorSvc.swap.peers)
-		credLen = len(creditorSvc.peers)
 		creditorSvc.swap.peersLock.Unlock()
+		creditorSvc.lock.Lock()
+		credLen = len(creditorSvc.peers)
+		creditorSvc.lock.Unlock()


If it can't be done otherwise, it must be done like this, of course. But I must say, the code becomes very ugly here... And I can imagine that any dev who has to write something similar will make mistakes and/or will not immediately understand what is being done here.

I agree, but I don't think it can be avoided here, at least not without significant change to the test itself.
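
As a sketch of how the two-lock pattern could be kept readable, each map can get a small counting helper that takes its own lock. This is an assumption about a possible cleanup, not part of this PR; swapState, testService and the helper names are hypothetical, simplified stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// swapState and testService are simplified, hypothetical stand-ins.
type swapState struct {
	peersLock sync.Mutex
	peers     map[string]struct{} // guarded by peersLock
}

// peerCount reads the swap-level map under peersLock.
func (s *swapState) peerCount() int {
	s.peersLock.Lock()
	defer s.peersLock.Unlock()
	return len(s.peers)
}

type testService struct {
	lock  sync.Mutex
	peers map[string]struct{} // guarded by lock, NOT by swap.peersLock
	swap  *swapState
}

// peerCount reads the service-level map under the service's own lock.
func (ts *testService) peerCount() int {
	ts.lock.Lock()
	defer ts.lock.Unlock()
	return len(ts.peers)
}

func main() {
	svc := &testService{
		peers: map[string]struct{}{"a": {}},
		swap:  &swapState{peers: map[string]struct{}{"a": {}, "b": {}}},
	}
	// Each count is taken under the lock that guards its map.
	fmt.Println(svc.peerCount(), svc.swap.peerCount())
}
```

With helpers like these, the polling loop in the test would only contain the two calls, and the pairing of map and lock would live in one place.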

Eknir · 2020-02-18T10:22:46Z

swap/swap_test.go

+	// ...on which we wait until the cashCheque is actually terminated (ensures proper nonce count)
+	select {
+	case <-testBackend.cashDone:
+		log.Debug("cash transaction completed and committed")
+	case <-time.After(4 * time.Second):
+		t.Fatalf("Timeout waiting for cash transactions to complete")
+	}


Wouldn't it make sense to refactor this into a separate helper function? That way, it will be easier to synchronize these things in tests we write in the future, and you already avoid some code duplication now (lines 677-683 are the same).

this logic will change soon anyway, so I would leave it for now.

Eknir

LGTM!

swap: fix various data races

f3c4939

ralph-pichler added incentives ready for review labels Feb 16, 2020

acud suggested changes Feb 17, 2020

ralph-pichler added 4 commits February 17, 2020 12:01

swap: fix nonce spelling

4a0fe0f

swap: add missing locks in api functions

3d356ce

swap: clarify that lock is expected to be held in processAndVerifyCheque

353e304

swap: initalize cashDone channel during backend creation, remove close

3951984

acud reviewed Feb 18, 2020

acud approved these changes Feb 18, 2020

acud assigned ralph-pichler Feb 18, 2020

swap: dont defer unlock in if

f619aec

Eknir reviewed Feb 18, 2020

ralph-pichler added 2 commits February 18, 2020 13:08

swap: add a comment

0961fe5

Merge remote-tracking branch 'origin/master' into swap_data_races

20ea7cb

Eknir approved these changes Feb 19, 2020

acud merged commit c73509d into master Feb 19, 2020

acud deleted the swap_data_races branch February 19, 2020 11:11
