This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

swap: fix instabilities in simulation tests #1930

Merged: 2 commits into master from incentives-simtests-fix on Nov 7, 2019

Conversation

holisticode
Contributor

This PR fixes #1899.

It does this by introducing an additional check when waiting for the cheque to be processed. Waiting for the cheque alone is not enough (as documented by @ralph-pichler at #1899 (comment)), due to the high parallelization of the test.

We now also wait and check that the message has been received on the other end by checking the appropriate metrics counter.
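As a rough illustration of this kind of check (not the code in this PR), the sketch below polls a counter until it has advanced past a known baseline or a timeout expires. The name waitForCounter, the plain int64 counter, and the polling interval are assumptions made for the example; the actual test reads a metrics counter.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// errTimeout is returned when the counter does not advance in time.
var errTimeout = errors.New("timed out waiting for counter")

// waitForCounter (hypothetical helper) polls counter until its value exceeds
// last or the timeout expires. It returns the observed value so the caller
// can use it as the baseline for the next wait.
func waitForCounter(counter *int64, last int64, timeout time.Duration) (int64, error) {
	deadline := time.Now().Add(timeout)
	for {
		if v := atomic.LoadInt64(counter); v > last {
			return v, nil
		}
		if time.Now().After(deadline) {
			return last, errTimeout
		}
		time.Sleep(time.Millisecond)
	}
}

func main() {
	var received int64 // stands in for a "messages received" metric
	go func() {
		time.Sleep(10 * time.Millisecond)
		atomic.AddInt64(&received, 1) // simulate the peer handling the message
	}()
	v, err := waitForCounter(&received, 0, time.Second)
	fmt.Println(v, err)
}
```

Passing the last observed value as the baseline matters in a highly parallel test: checking only for a non-zero counter would succeed on a message from an earlier round.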

Note that this should (but may not) fix the test. It does not address an underlying process vulnerability exposed by #1899: if the message sequence occurs in an unexpected order, there may be bugs in the workflow.

Member

@ralph-pichler ralph-pichler left a comment


The fix seems to work. I let the sim tests run a few thousand times and the original failure did not occur anymore.

It still fails sometimes, but due to other issues (see #1899 (comment)). waitForChequeProcessed needs to be adapted to also wait for the cheque to be confirmed. Currently, the test can attempt to send a new cheque while the old one is not yet confirmed. In that case the pending cheque is resent instead, but waitForChequeProcessed is waiting for a new cheque, so the test hangs and times out. This can also be a separate PR if you only want to focus on the other issue here.
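A minimal sketch of the suggested adaptation, under stated assumptions: chequeTracker and waitForChequeSettled are hypothetical names, and the two fields only model the state the test would need to observe (cheques handled by the creditor, and whether the debitor's last cheque is still unconfirmed). It is not the swap package's API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// chequeTracker is a hypothetical stand-in for the state the test observes.
type chequeTracker struct {
	mu        sync.Mutex
	processed int  // cheques handled by the creditor
	pending   bool // true while the debitor's last cheque is unconfirmed
}

func (c *chequeTracker) send()    { c.mu.Lock(); c.pending = true; c.mu.Unlock() }
func (c *chequeTracker) receive() { c.mu.Lock(); c.processed++; c.mu.Unlock() }
func (c *chequeTracker) confirm() { c.mu.Lock(); c.pending = false; c.mu.Unlock() }

// settled reports whether a cheque beyond the baseline was processed
// and no cheque is left waiting for confirmation.
func (c *chequeTracker) settled(last int) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.processed > last && !c.pending
}

// waitForChequeSettled blocks until the cheque is both processed and
// confirmed, so the next round cannot start with a pending cheque.
func waitForChequeSettled(c *chequeTracker, last int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for !c.settled(last) {
		if time.Now().After(deadline) {
			return errors.New("cheque not processed and confirmed in time")
		}
		time.Sleep(time.Millisecond)
	}
	return nil
}

func main() {
	c := &chequeTracker{}
	c.send()
	go func() {
		c.receive() // creditor handles the cheque
		time.Sleep(5 * time.Millisecond)
		c.confirm() // confirmation arrives later
	}()
	fmt.Println(waitForChequeSettled(c, 0, time.Second))
}
```

Waiting on both conditions means the next cheque is only triggered once nothing is pending, which avoids the resend path described above.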

Regarding the underlying process vulnerability: the message sequence being wrong (e.g. the message arriving too late) is not really a problem. After all, the other node also sends an extra cheque in response to compensate. (From what I recall, if both nodes cashed the cheques at the end, the effective transferred value would still be 0.)

The bigger issue uncovered here is that we can get another node to send us a cheque simply by sending a cheque to it first (even if no traffic has occurred) and putting it into high enough debt. Not sure what the solution is here. Maybe we should just reject a cheque (as in, not confirm it) if it would put us too far into debt? I opened an issue for this here.
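One way the rejection idea could look, purely as a sketch: the function name acceptCheque, the maxDebt parameter, and the sign convention (a positive balance means the peer owes us, a negative balance means we owe the peer) are all assumptions for illustration and are not taken from the swap code.

```go
package main

import "fmt"

// acceptCheque (hypothetical) reports whether an incoming cheque worth amount
// should be confirmed. Assumed convention: balance > 0 means the peer owes us,
// balance < 0 means we owe the peer. Accepting the cheque lowers the balance
// by amount; the cheque is rejected if that would leave us owing more than
// maxDebt. Overflow handling is left out of this sketch.
func acceptCheque(balance, amount, maxDebt int64) bool {
	if amount <= 0 {
		return false // a cheque must carry a positive value
	}
	return balance-amount >= -maxDebt
}

func main() {
	// The peer owes us 100, so a cheque of 50 simply settles part of that.
	fmt.Println(acceptCheque(100, 50, 10))
	// No traffic occurred (balance 0); a cheque of 50 would put us 50 into debt.
	fmt.Println(acceptCheque(0, 50, 10))
}
```

With a check like this, an unsolicited cheque sent without prior traffic would not be confirmed, so it could not be used to push the receiving node over its own payment threshold.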

@janos
Member

janos commented Nov 7, 2019

I was testing this PR on Ubuntu in VirtualBox, where I could reproduce the problem most frequently. While the tests usually pass, sometimes TestMultiChequeSimulation blocks at swap/simulations_test.go:454 waitForChequeProcessed(t, creditorSvc, counter, lastCount).

I am running go test -v ./swap -run TestMultiChequeSimulation -count 100 -failfast and failures always have these two warnings:

WARN [11-07|11:14:18.894|github.com/ethersphere/swarm/swap/swap.go:353]              cheque sent by peer has already been received in the past swaplog=* base=d4c01c8b35ecb80f peer=23cb169a912104f3 cumulativePayout=39266562520948
WARN [11-07|11:14:18.894|github.com/ethersphere/swarm/swap/swap.go:415]              ignoring confirm msg, no pending cheque  swaplog=* base=d4c01c8b35ecb80f peer=03728ef208807843 confirm message cheque="Contract: c7cbe27035b74bf40be9c9953e2459f5f0be3f39 Beneficiary: 3c9a68b7574158f6ddafda0a6f5fd4a3c8b4e2f0 CumulativePayout: 39266562520948 Honey: 19633281260474"

@ralph-pichler
Member

@janos this issue is known and @holisticode wants to fix it in a separate PR. I created an issue for this here: #1933

Member

@janos janos left a comment


Thank you @holisticode @ralph-pichler LGTM.

@holisticode holisticode merged commit d830efd into master Nov 7, 2019
@mortelli mortelli deleted the incentives-simtests-fix branch November 14, 2019 18:49
@acud acud added this to the 0.5.3 milestone Nov 25, 2019
Development

Successfully merging this pull request may close these issues.

fix intermittent failures in incentives simulation tests
4 participants