This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

network/stream: fix flaky tests and first delivered batch of chunks for an unwanted stream #1952

Merged
merged 4 commits into from
Nov 15, 2019

Conversation

acud
Member

@acud acud commented Nov 14, 2019

This PR addresses three problems:

  1. TestStarNetworkSyncWithBogusNodes was flaking on Travis because the sync delay timer was too short. This, at least by my interpretation, means Swarm is getting slower over time; we should keep this in mind
  2. TestNodesCorrectBinsDynamic was flaking because syncing was hardcoded to Autostart every time. This resulted in incorrect bin indexes being exchanged, because some syncing had already occurred, introducing non-determinism in the state of the binIDs and therefore in the cursors of the nodes. This is solved by a configurable parameter on SyncSimServiceOptions that toggles whether streams should be autostarted
  3. A bug in the stream package caused a first batch of chunks to be delivered before an unwanted stream was actually dropped.
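The toggle in point 2 can be sketched as follows. This is a simplified, hypothetical model: the real SyncSimServiceOptions struct and constructor in this PR carry more fields, and the return values here stand in for actual service wiring.

```go
package main

import "fmt"

// Simplified model of the configurable autostart parameter: a nil options
// value keeps the old behaviour (streams autostarted), while tests that
// need deterministic binIDs and cursors pass explicit options with
// Autostart disabled.
type SyncSimServiceOptions struct {
	Autostart bool
}

func newSyncSimService(o *SyncSimServiceOptions) string {
	if o == nil {
		o = &SyncSimServiceOptions{Autostart: true} // default: streams start on their own
	}
	if o.Autostart {
		return "streams autostarted"
	}
	return "streams started only on demand"
}

func main() {
	fmt.Println(newSyncSimService(nil))
	fmt.Println(newSyncSimService(&SyncSimServiceOptions{Autostart: false}))
}
```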

The scenario is the following: a node initially establishes streams on certain bins that become obsolete once the kademlia depth changes (for example, depth is initially 0 and a stream on SYNC|0 is requested from a node with PO 2; subsequently, the kademlia depth changes to 3, and the subscription on SYNC|0 is no longer relevant). Before this fix, received ChunkDelivery messages would put the chunks into the localstore without checking that provider.WantStream() == true.

This resulted in the first requested range of a live stream always being persisted, regardless of whether that stream was still wanted at the time of reception. The data race happened when the depth changed between HandleOfferedHashes and the call to requestSubsequentRange at the end of the clientHandleOfferedHashes method.

This has now been amended with the appropriate checks in the clientHandleOfferedHashes and clientHandleChunkDelivery handlers.
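A toy model of the delivery-side guard, assuming the WantStream name from the PR; the depth/PO rule sketched here (peers within depth sync bins at or above depth, peers outside depth sync only their own proximity bin) is my own simplification, and the actual subscription logic in Swarm is more involved:

```go
package main

import "fmt"

// provider models the state the WantStream check depends on: the current
// kademlia depth and the proximity order (PO) of the peer.
type provider struct {
	depth  int
	peerPO int
}

// WantStream reports whether a SYNC|bin stream from this peer is still
// wanted (simplified rule, see the lead-in for the assumption made here).
func (p *provider) WantStream(bin int) bool {
	if p.peerPO >= p.depth {
		return bin >= p.depth // peer within depth: sync bins at or above depth
	}
	return bin == p.peerPO // peer outside depth: only its own proximity bin
}

// handleChunkDelivery checks WantStream before persisting, mirroring the
// guard added in clientHandleChunkDelivery: if the stream became unwanted
// (e.g. the depth changed mid-batch), the chunks are not stored.
func handleChunkDelivery(p *provider, bin int, chunks []string, store map[string]bool) error {
	if !p.WantStream(bin) {
		return fmt.Errorf("dropping %d chunks for unwanted stream SYNC|%d", len(chunks), bin)
	}
	for _, c := range chunks {
		store[c] = true
	}
	return nil
}

func main() {
	store := map[string]bool{}
	p := &provider{depth: 0, peerPO: 2}
	fmt.Println(handleChunkDelivery(p, 0, []string{"c1"}, store)) // nil: SYNC|0 wanted at depth 0

	p.depth = 3 // kademlia depth changes between deliveries
	fmt.Println(handleChunkDelivery(p, 0, []string{"c2"}, store)) // error: SYNC|0 now unwanted
	fmt.Println(len(store))                                       // only the first chunk persisted
}
```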

A change in kademlia depth is still possible between sending the WantedHashes message and receiving the first ChunkDelivery message. Another depth change can occur between ChunkDelivery messages when the batch is split across several messages. Both of these cases, however, are mitigated by the check added in the clientHandleChunkDelivery handler, which returns without processing the chunks, eventually causing the batch to time out within clientSealBatch and the subsequent range never to be requested.

@acud acud self-assigned this Nov 14, 2019
@acud acud added this to Backlog in Swarm Core - Sprint planning via automation Nov 14, 2019
@acud acud moved this from Backlog to In progress in Swarm Core - Sprint planning Nov 14, 2019
@acud acud force-pushed the stream-test-flake branch 2 times, most recently from 7e500e7 to 48d0ae3 Compare November 15, 2019 07:47
@acud acud requested review from janos, nolash and zelig November 15, 2019 07:47
@acud acud moved this from In progress to In review (includes Documentation) in Swarm Core - Sprint planning Nov 15, 2019
@acud acud changed the title network/stream: fix flaky test network/stream: fix flaky test and first delivered batch of chunks for an unwanted stream Nov 15, 2019
become unwanted due to depth change in kademlia. this resulted in a
batch of chunks being delivered on the now unwanted stream before
_not_ requesting the next interval (due to WantStream returning false)
@acud acud changed the title network/stream: fix flaky test and first delivered batch of chunks for an unwanted stream network/stream: fix flaky tests and first delivered batch of chunks for an unwanted stream Nov 15, 2019
	StreamConstructorFunc func(state.Store, []byte, ...StreamProvider) node.Service
}

func newSyncSimServiceFunc(o *SyncSimServiceOptions) func(ctx *adapters.ServiceContext, bucket *sync.Map) (s node.Service, cleanup func(), err error) {
	if o == nil {
		o = new(SyncSimServiceOptions)
		o.Autostart = true // start stream on by default
This may result in different behaviour for the Autostart default value. If o is nil, Autostart is true; but if o is not nil yet does not specify Autostart, as in &SyncSimServiceOptions{SyncOnlyWithinDepth: true}, Autostart will be false, which is inconsistent. I think the boolean should be named so that its default value is consistent in both cases, like NoAutosync.

This is even visible in this PR: options are constructed just to set Autostart to false explicitly, even though false is the zero value for the field.
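The reviewer's point can be illustrated with a minimal sketch, using the suggested NoAutosync field on a hypothetical, trimmed-down options struct: because the field is named after the non-default behaviour, the struct's zero value matches the nil-options default, and both construction paths agree.

```go
package main

import "fmt"

// Options names its boolean after the non-default behaviour, so the zero
// value (NoAutosync == false, i.e. autosync on) matches the nil default.
type Options struct {
	NoAutosync bool
}

func autosyncEnabled(o *Options) bool {
	if o == nil {
		o = new(Options) // zero value: NoAutosync == false
	}
	return !o.NoAutosync
}

func main() {
	fmt.Println(autosyncEnabled(nil))                       // true: nil default
	fmt.Println(autosyncEnabled(&Options{}))                // true: consistent with nil
	fmt.Println(autosyncEnabled(&Options{NoAutosync: true})) // false: explicitly disabled
}
```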

@acud acud merged commit 9b0c910 into master Nov 15, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done Nov 15, 2019
@acud acud deleted the stream-test-flake branch November 15, 2019 16:10
acud added a commit that referenced this pull request Nov 18, 2019
…chunks for an unwanted stream (#1952)"

This reverts commit 9b0c910.
@acud acud added this to the 0.5.3 milestone Nov 25, 2019