Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

swap: fix various data races #2106

Merged
merged 8 commits into from
Feb 19, 2020
Merged

swap: fix various data races #2106

merged 8 commits into from
Feb 19, 2020

Conversation

ralph-pichler
Copy link
Member

@ralph-pichler ralph-pichler commented Feb 16, 2020

This PR fixes various data races in the swap tests. With these fixes go test -race -v ./swap/ should pass without errors.

This covers 4 different issues:

  • As it turns out the abigen generated functions actually modify the big.Ints passed as an argument (by using the .Mod function on them internally). This meant that the big int underlying cheque.CumulativePayout actually was modified by the cashing which happens in a separate goroutine which caused a race if the value was read elsewhere. As a simple fix a copy is passed to the generated CashChequeBeneficiary function.
  • In TestTriggerPaymentThreshold there is a loop waiting for the lastSentCheque to become non-nil. The peer lock is now also acquired in the test.
  • In TestMultiChequeSimulation the wrong lock was used for accessing the TestService.peers map.
  • TestDebtCheque now waits for cheque cashing to be completed. This is necessary to ensure the cashing goroutine has terminated before the next test starts which could have unintended consequences (e.g. here it lead to a data race regarding the global swapLog which was reset by the next test).

Furthermore it

  • adds proper locking of peers where it was missing
  • initialises cashDone during backend creation
  • fixes various spelling errors

@@ -432,7 +432,10 @@ loop:
case <-ctx.Done():
t.Fatal("Expected one cheque, but there is none")
default:
if creditor.getLastSentCheque() != nil {
creditor.lock.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm. this is test code, but the production version is still not acquiring the swap peer mutex as the getter documentation describes:

// PeerCheques returns the last sent and received cheques for a given peer
func (s *Swap) PeerCheques(peer enode.ID) (PeerCheques, error) {
	var pendingCheque, sentCheque, receivedCheque *Cheque

	swapPeer := s.getPeer(peer)
	if swapPeer != nil {
		pendingCheque = swapPeer.getPendingCheque()
		sentCheque = swapPeer.getLastSentCheque()
		receivedCheque = swapPeer.getLastReceivedCheque()
	}

I suggest to sweep through all getter methods that expect a lock held to find similar mistakes.
And once again this is why I don't like having such getters in the codebase in the first place. I've argued against this already when the initial SWAP PR was being under review. We should eliminate the possibilities for similar developer mistakes by having a simple design.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the missing locks, all peer access should now be under lock (except in NewPeer and AddPeer where it is clear that no other go routine could have a reference at this point).

We will most likely get rid of the getters and setters soon as they also don't play nice with the batch writes we want to use in some places. I will discuss this with the team in our next call.

swap/swap_test.go Outdated Show resolved Hide resolved
swap/swap_test.go Outdated Show resolved Hide resolved
@@ -736,6 +738,9 @@ func TestDebtCheques(t *testing.T) {
t.Fatal(err)
}

testBackend.cashDone = make(chan struct{})
defer close(testBackend.cashDone)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find this kind of strange to put this here. I've noticed this is not a new thing but I don't see any reason why the test case should close this channel.
Also, another note on the usage of this channel - since it should only signal once, you can close it directly in the method that polls the cashing of the cheque. There's no need to put an empty struct in the channel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean it's strange because it should be closed elsewhere or because it doesn't need closing at all? I'm not sure what the motivation behind the close was, as far as I understand go, the gc would take care of this anyway.

There are tests (e.g. the simulation tests) where cashing happens multiple times, hence the use of the struct{}{} instead of close.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got rid of the make in the tests and removed the close.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean it's strange because it should be closed elsewhere or because it doesn't need closing at all?

I mean it in the sense that a channel's reader should not close it. This is a well defined anti-pattern in go

swap/api.go Outdated Show resolved Hide resolved
Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more nitpick but definitely not a blocker. LGTM

@acud
Copy link
Member

acud commented Feb 18, 2020

Thanks @ralph-pichler for your patience and for addressing all comments :)

Copy link
Contributor

@Eknir Eknir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, thanks for fixing these!

  • I could not find where in the abigen package, we are modifying the passed-in big.Int
  • The PR description does not describe that you have fixed more lock issues with regards to the peer getters (based on comment @acud )
  • At places, the code becomes hard to read due to additional locks.

@@ -133,7 +133,7 @@ func (s simpleContract) Deposit(auth *bind.TransactOpts, amount *big.Int) (*type
// CashChequeBeneficiaryStart sends the transaction to cash a cheque as the beneficiary
func (s simpleContract) CashChequeBeneficiaryStart(opts *bind.TransactOpts, beneficiary common.Address, cumulativePayout *uint256.Uint256, ownerSig []byte) (*types.Transaction, error) {
payout := cumulativePayout.Value()
tx, err := s.instance.CashChequeBeneficiary(opts, beneficiary, &payout, ownerSig)
tx, err := s.instance.CashChequeBeneficiary(opts, beneficiary, big.NewInt(0).Set(&payout), ownerSig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment above:
//send a copy of cumulativePayout to instance to prevent a racecondition
(or similar)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -133,7 +133,7 @@ func (s simpleContract) Deposit(auth *bind.TransactOpts, amount *big.Int) (*type
// CashChequeBeneficiaryStart sends the transaction to cash a cheque as the beneficiary
func (s simpleContract) CashChequeBeneficiaryStart(opts *bind.TransactOpts, beneficiary common.Address, cumulativePayout *uint256.Uint256, ownerSig []byte) (*types.Transaction, error) {
payout := cumulativePayout.Value()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can already make the copy here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but I can't get a pointer to cumulativePayout.Value() without setting it to variable first. So if I make the copy beforehand all it does is introduce an extra line.

Comment on lines 319 to +331
debitorSvc.swap.peersLock.Lock()
debSwapLen = len(debitorSvc.swap.peers)
debLen = len(debitorSvc.peers)
debitorSvc.swap.peersLock.Unlock()
debitorSvc.lock.Lock()
debLen = len(debitorSvc.peers)
debitorSvc.lock.Unlock()

creditorSvc.swap.peersLock.Lock()
credSwapLen = len(creditorSvc.swap.peers)
credLen = len(creditorSvc.peers)
creditorSvc.swap.peersLock.Unlock()
creditorSvc.lock.Lock()
credLen = len(creditorSvc.peers)
creditorSvc.lock.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it can't be done otherwise, it must be done like this, of course. But I must say, the code becomes very ugly here... And I can imagine that any dev who has to make something similar like this will make mistakes here and/or will not understand directly what is done here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I don't think it can be avoided here, at least not without significant change to the test itself.

Comment on lines +749 to +755
// ...on which we wait until the cashCheque is actually terminated (ensures proper nonce count)
select {
case <-testBackend.cashDone:
log.Debug("cash transaction completed and committed")
case <-time.After(4 * time.Second):
t.Fatalf("Timeout waiting for cash transactions to complete")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to refactor this in a separate helper function? That way, it will be easier to synchronize these things in tests we make in the future and iyou avoid some code-duplication already now (lines 677-683 are the same).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic will change soon anyway, so I would leave it for now.

Copy link
Contributor

@Eknir Eknir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@acud acud merged commit c73509d into master Feb 19, 2020
@acud acud deleted the swap_data_races branch February 19, 2020 11:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants