This repository has been archived by the owner on Aug 2, 2021. It is now read-only.
network/stream: fix flaky tests and first delivered batch of chunks for an unwanted stream #1952
Merged
Conversation
Force-pushed from 7e500e7 to 48d0ae3, then from 48d0ae3 to 2688b75.
Streams can become unwanted due to a depth change in kademlia. This resulted in a batch of chunks being delivered on the now-unwanted stream before the next interval was _not_ requested (due to `WantStream` returning false).
Force-pushed from 2688b75 to bc56fb7.
janos
approved these changes
Nov 15, 2019
network/stream/common_test.go
Outdated

```go
	StreamConstructorFunc func(state.Store, []byte, ...StreamProvider) node.Service
}

func newSyncSimServiceFunc(o *SyncSimServiceOptions) func(ctx *adapters.ServiceContext, bucket *sync.Map) (s node.Service, cleanup func(), err error) {
	if o == nil {
		o = new(SyncSimServiceOptions)
		o.Autostart = true // start stream on by default
```
This may result in different behaviour for the `Autostart` default value. If `o` is nil, `Autostart` is true, but if `o` is not nil yet does not specify `Autostart`, such as `&SyncSimServiceOptions{SyncOnlyWithinDepth: true}`, `Autostart` will be false, which is inconsistent. I think the boolean should be named so that its default (zero) value is consistent in both cases, like `NoAutosync`.

This is even visible in this PR: options are constructed just to set `Autostart` to false explicitly, even though that is the default value for the field.
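The reviewer's point is a general Go pitfall: a positive boolean field in an options struct cannot distinguish "unset" from "explicitly false", so a nil options pointer and an empty struct end up with opposite behaviour. A minimal sketch of the problem and of the inverted-flag fix (all type and function names here are illustrative assumptions, not the actual swarm code):

```go
package main

import "fmt"

// Hypothetical options structs illustrating the zero-value issue.
type OptionsAutostart struct {
	Autostart bool // positive flag: zero value disables the feature
}

type OptionsNoAutostart struct {
	NoAutostart bool // inverted flag: zero value keeps the default
}

// With a positive flag, a non-nil struct that omits the field silently
// disables autostart, diverging from the nil-pointer default.
func startEnabledPositive(o *OptionsAutostart) bool {
	if o == nil {
		return true // default: autostart on
	}
	return o.Autostart // zero value is false: autostart off
}

// Inverting the flag makes the zero value match the desired default,
// so nil and an empty struct behave identically.
func startEnabledInverted(o *OptionsNoAutostart) bool {
	if o == nil {
		return true
	}
	return !o.NoAutostart
}

func main() {
	fmt.Println(startEnabledPositive(nil))                   // true
	fmt.Println(startEnabledPositive(&OptionsAutostart{}))   // false: inconsistent
	fmt.Println(startEnabledInverted(nil))                    // true
	fmt.Println(startEnabledInverted(&OptionsNoAutostart{})) // true: consistent
}
```

Naming the field after the non-default state (`NoAutostart`) means the Go zero value always encodes the default, which is the convention the reviewer is suggesting.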
holisticode
approved these changes
Nov 15, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done on Nov 15, 2019.
This PR addresses three problems:

- `TestStarNetworkSyncWithBogusNodes` was flaking on Travis due to a too-short sync delay timer. This, at least to my interpretation, means Swarm is getting slower over time. We should keep this in mind.
- `TestNodesCorrectBinsDynamic` was flaking because syncing was hardcoded to `Autostart` all the time. This resulted in incorrect bin indexes being exchanged, because some syncing had already occurred and introduced non-determinism in the state of the binIDs and, consequently, in the cursors of the nodes. This is solved by a configurable parameter on `SyncSimServiceOptions` that toggles whether streams should be autostarted.
- A race in the `stream` package resulted in a first batch of chunks being delivered before an unwanted stream was actually dropped. The scenario is the following: a node initially establishes streams on certain bins that become obsolete once the kademlia depth changes (for example, depth is initially `0` and a stream on `SYNC|0` is requested from a node with PO `2`; subsequently, the kademlia depth changes to `3` and the subscription on `SYNC|0` is no longer relevant). Before this fix, the `ChunkDelivery` messages received would put the chunks into the localstore without checking that `provider.WantStream() == true`. This resulted in the first requested range of a live stream always being persisted, regardless of whether that stream was still wanted at the time of reception. The data race happened when the depth changed between `HandleOfferedHashes` and the call to `requestSubsequentRange` at the end of the `clientHandleOfferedHashes` method.

This has now been amended with the appropriate checks in the `clientHandleOfferedHashes` and `clientHandleChunkDelivery` handlers.

A change in kademlia depth is still possible between sending the `WantedHashes` message and receiving the first `ChunkDelivery` message. Another depth change can occur in between `ChunkDelivery` messages, in case the batch is split across several messages. These latter cases, however, are mitigated by the check added in the `clientHandleChunkDelivery` handler, which returns without processing the chunks, eventually causing the batch to time out within `clientSealBatch` and the subsequent range to never be requested.
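The guard described above can be sketched as follows. This is a minimal illustration of the technique (check `WantStream` before persisting a delivered batch), not the actual swarm code; all types and names here are simplified assumptions:

```go
package main

import (
	"errors"
	"fmt"
)

// Stream identifies a sync stream, e.g. "SYNC|0".
type Stream string

// Provider tracks which streams are still within kademlia depth.
type Provider struct {
	wanted map[Stream]bool
}

// WantStream reports whether the stream is still wanted.
func (p *Provider) WantStream(s Stream) bool { return p.wanted[s] }

// Store is a stand-in for the localstore.
type Store struct{ chunks int }

func (st *Store) Put(data []byte) { st.chunks++ }

var errUnwantedStream = errors.New("stream no longer wanted")

// clientHandleChunkDelivery persists a delivered batch only if the stream
// is still wanted; otherwise it returns without processing, letting the
// batch time out upstream and the next range go unrequested.
func clientHandleChunkDelivery(p *Provider, st *Store, s Stream, batch [][]byte) error {
	if !p.WantStream(s) {
		return errUnwantedStream
	}
	for _, c := range batch {
		st.Put(c)
	}
	return nil
}

func main() {
	p := &Provider{wanted: map[Stream]bool{"SYNC|0": true}}
	st := &Store{}
	batch := [][]byte{[]byte("a"), []byte("b")}

	// While the stream is wanted, chunks are stored.
	fmt.Println(clientHandleChunkDelivery(p, st, "SYNC|0", batch), st.chunks)

	// Depth changes: SYNC|0 falls out of the wanted set, so a late
	// ChunkDelivery is rejected and nothing new is persisted.
	p.wanted["SYNC|0"] = false
	fmt.Println(clientHandleChunkDelivery(p, st, "SYNC|0", batch), st.chunks)
}
```

The key property is that the check happens at delivery time, not only at subscription time, so a batch that was already in flight when the depth changed is dropped rather than persisted.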