This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

network: new stream! protocol and pull syncer implementation #1538

Merged
merged 181 commits into master from new-syncer-gs on Sep 11, 2019

Conversation

acud
Member

@acud acud commented Jul 5, 2019

This PR is the result of joint efforts in the core track to realise the exact, up-to-date requirements of the stream! protocol and implement them so that their functionality and behavior are easier to reason about.

The design borrows some code from the old stream package, namely the way we manage the bins we want to stream in the case of syncing, and the existing intervals store. However, everything else, including tests, has been built from the ground up.

The PR does not yet remove the existing stream code and does not move the intervals package, but rather builds on top of those. This is in order to reduce the strain of diffing those changes; they will be deferred to subsequent PRs after this one is merged.

The new stream implementation suggested in this PR brings stronger guarantees, better performance and more safety to how we handle streams, namely:

  • Spamming through streams is far less likely due to tight checking of wanted streams from a certain peer
  • Spamming of unsolicited chunks is completely forbidden - a peer cannot send us the same chunk twice, nor a chunk that we did not request (these are two different cases)
  • Multiple chunks can now be delivered on the same ChunkDelivery message
  • No duplicate chunks are sent between two peers

In terms of reduced complexity:

  • Far less state is maintained on each side of the connection: servers maintain offers, clients maintain wants (a rough sketch follows below)
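For orientation, here is a rough sketch of the per-peer state each role keeps. The want fields mirror the client-side struct quoted further down in this review; the offer shape, the map layout and the package/import details are assumptions, not the PR's actual types.

// Sketch only, not the PR's actual types. The want fields mirror the struct
// quoted later in this review; the offer shape and map layout are assumptions.
package stream

import "time"

type offer struct {
	ruid      uint      // request uid this offer answers
	hashes    []byte    // hashes offered for the requested range
	requested time.Time // when the offer was made
}

type want struct {
	done chan error
	ruid uint    // the request uid
	from uint64  // want from index
	to   *uint64 // want to index, nil signifies top of range not yet known
}

// Per-peer state: a server only tracks its open offers, a client only its
// open wants, both keyed by the request uid.
type peerState struct {
	openOffers map[uint]offer
	openWants  map[uint]*want
}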

PLEASE NOTE: tests constitute about half of this PR. Some of them overlap in functionality and will be removed, namely some of the tests in cursors_test.go.

Actionable items that still need to be addressed, but which I would personally like to defer to a later PR:

  • Move stream maintenance for the case of syncing out of the sync provider - the sync provider needs to only sink data to the stream protocol, not manage subscriptions. This, however, needs a bit more tweaking in terms of abstraction for stream, as it would require exposing a certain subset of functionality out of the stream package. In layman's terms - we would need to call certain functions on stream when a depth change happens, for example, but also have hooks on stream to know that a StreamInfoRes message has arrived and decide what to do accordingly....
  • How to guarantee that historical and live streams are kept alive - we should have metrics to see, at least on the smoke test environment, that these streams keep getting requested and served correctly
  • Verify that existing historical and live stream fetchers terminate when the depth changes and the node moves out of depth - this is not trivial to test, but the logic that does this is in the codebase
  • maxPeers - the old stream package had a notion of the maximum number of peers we'd like to stream to/with. We have tentatively concluded that kademlia should shield us from having to implement this at the moment, since the number of connections is/should be enforced by kademlia. So this needs more info before we jump the gun and implement such a requirement
  • Proper tracing with Jaeger and OpenTracing - need more info from @nonsense and @skylenet on how to do this properly
  • Update diagrams

HOW TO REVIEW:
Since there is a fair amount of business logic here, a brief overview of how the protocol works is needed in order to facilitate an effective review process.
The flow is as follows (a rough sketch of the message types appears after the list):

  • Stream protocol and Sync Provider are registered in swarm.go. Stream handles the protocol message exchange, syncProvider handles sinking the data and managing streams according to kademlia
  • Peers connect and establish protocol over devp2p. This uses the run function in the stream.go file
  • Peer initialises providers
  • Sync provider initialises streams according to kademlia depth (InitPeer function on sync_provider.go)
  • Message exchange starts from the previous step - the stream protocol requests info about wanted streams with StreamInfoReq. The server replies with info about the streams (boundedness, cursors (previously known as sessionIndex)) in a StreamInfoRes message
  • Client handles StreamInfoRes, and according to wantedness of streams, initiates GetRange queries to fetch the stream from the server.
  • Server replies with OfferedHashes message for the requested range
  • Client, as necessary, requests hashes from the set of OfferedHashes with a WantedHashes message
  • The server delivers, as necessary, the wanted chunks with a ChunkDelivery message. This message can contain multiple chunks as needed. Also, theoretically, multiple ChunkDelivery messages could be sent in reply to the same WantedHashes in order to optimise the number of messages sent, depending on db performance when sinking the data
  • Once all chunks for a requested range are delivered, the interval in the interval store is sealed, and the next range is requested

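To make the flow above easier to follow, here is a rough sketch of the messages involved. The GetRange fields are taken from the diff further down in this PR; all other field names and shapes are assumptions and will not match the actual wire types exactly.

// Sketch of the message exchange, for orientation only. GetRange mirrors the
// diff shown later in this PR; the other message shapes are assumptions.
package stream

type ID struct {
	Name string // provider name, e.g. SYNC
	Key  string // e.g. the bin, in the case of syncing
}

// Client -> Server: which streams do you serve, and where are your cursors?
type StreamInfoReq struct {
	Streams []ID
}

type StreamDescriptor struct {
	Stream  ID
	Cursor  uint64 // previously known as sessionIndex
	Bounded bool
}

// Server -> Client: cursors and boundedness for the requested streams.
type StreamInfoRes struct {
	Streams []StreamDescriptor
}

// Client -> Server: request a range of a stream.
type GetRange struct {
	Ruid      uint
	Stream    ID
	From      uint64
	To        *uint64 `rlp:"nil"` // nil: upper bound not yet known (live)
	BatchSize uint
	Roundtrip bool
}

// Server -> Client: hashes available in the requested range.
type OfferedHashes struct {
	Ruid      uint
	LastIndex uint64 // lets the client seal the interval's upper bound
	Hashes    []byte
}

// Client -> Server: bit vector selecting which offered hashes are wanted.
type WantedHashes struct {
	Ruid      uint
	BitVector []byte
}

// Server -> Client: the wanted chunks; may carry multiple chunks, and several
// ChunkDelivery messages may answer a single WantedHashes.
type DeliveredChunk struct {
	Addr []byte
	Data []byte
}

type ChunkDelivery struct {
	Ruid   uint
	Chunks []DeliveredChunk
}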
related to ethersphere/user-stories#2
fixes #1451
somewhat fixes #1393
closes #1457
closes #1247
closes #1246
closes #1234
closes #1204
closes #1203
closes #1105

@acud acud assigned janos and acud Jul 5, 2019
@acud acud force-pushed the new-syncer-gs branch 2 times, most recently from 63263ee to d6f6042 Compare July 9, 2019 16:39
@janos janos force-pushed the new-syncer-gs branch 2 times, most recently from 83fca80 to a0f0e35 Compare July 10, 2019 14:03
@acud acud force-pushed the new-syncer-gs branch 4 times, most recently from a9965ac to 23f841e Compare July 18, 2019 13:46
@acud acud added this to Backlog in Swarm Core - Sprint planning via automation Jul 29, 2019
@acud acud moved this from Backlog to In review (includes Documentation) in Swarm Core - Sprint planning Jul 29, 2019
@acud acud marked this pull request as ready for review July 29, 2019 15:51
@@ -111,7 +96,7 @@ type GetRange struct {
Ruid uint
Stream ID
From uint64
To uint64 `rlp:nil`
To *uint64 `rlp:"nil"`
BatchSize uint
Roundtrip bool
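A side note on the To *uint64 change, for reviewers less familiar with the rlp tags: in the go-ethereum rlp package the rlp:"nil" tag on a pointer field lets a nil value round-trip as an empty item, which is how an open-ended (live) range with no known upper bound can be expressed. A minimal, self-contained sketch of that behaviour (not code from this PR; the struct here is only illustrative):

package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/rlp"
)

// Illustration of the rlp:"nil" tag only; not code from this PR.
type getRange struct {
	From uint64
	To   *uint64 `rlp:"nil"` // nil encodes/decodes as an empty item
}

func main() {
	open := getRange{From: 42} // To left nil: upper bound not yet known
	enc, err := rlp.EncodeToBytes(open)
	if err != nil {
		panic(err)
	}

	var dec getRange
	if err := rlp.DecodeBytes(enc, &dec); err != nil {
		panic(err)
	}
	fmt.Println(dec.To == nil) // true: the open-ended range survives the round trip
}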
Contributor

Please remove Roundtrip, I really don't want to have to submit any more PRs like #1422 and #1339

I think developers on the team are smart enough to be able to add this field once they need it.

Contributor

In general, do not add unused code without a good reason. A good reason is if you have an immediate follow-up PR that will make use of it - the PR with bzz-retrieval, even though the code was not immediately used, is a good example.

// Function used only in tests to detect chunks that are synced
// multiple times within the same stream. This function pointer must be
// nil in production.
var putSeenTestHook func(addr chunk.Address, id enode.ID)
Contributor

I think this is a good use case for something we could test if we redirect logs with a simulation. Then we could check for the presence of the given log message containing addr and id. For now we don't have this functionality, so it's fine to leave this here, but ultimately we will be able to test this kind of implementation behaviour without custom hooks.

Just an idea.
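For illustration, a test might install the hook along these lines; the test name, package name, imports and the syncing scenario itself are assumptions - only the hook variable comes from the snippet above.

package newstream

import (
	"sync"
	"testing"

	"github.com/ethereum/go-ethereum/p2p/enode"
	"github.com/ethersphere/swarm/chunk"
)

// Sketch only: the hook variable comes from the snippet above; everything
// else (test name, scenario) is assumed.
func TestNoDuplicateChunkSync(t *testing.T) {
	var mu sync.Mutex
	seen := make(map[string]int)

	putSeenTestHook = func(addr chunk.Address, id enode.ID) {
		mu.Lock()
		defer mu.Unlock()
		seen[addr.String()]++
		if seen[addr.String()] > 1 {
			t.Errorf("chunk %s synced more than once from peer %s", addr, id)
		}
	}
	defer func() { putSeenTestHook = nil }() // the hook must stay nil in production

	// ... run the pull-syncing scenario that exercises the hook ...
}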


func (s *syncProvider) StreamName() string { return s.name }

func (s *syncProvider) Boundedness() bool { return false }
Contributor

I fail to see why we need Boundedness and Bound. Why do we add such indirection without using it? If this is so important, how come we don't have an implementation of it?

From the spec:
when a client requests a bounded range, server should respond to the client range requests with either offered hashes (if roundtrip is required) or chunks (if not) or an end-of-batch message if there are no more to offer. If none of these responses arrive within a timeout interval, client must drop the upstream peer.

We don't have an example of Boundedness or of Roundtrip, so I fail to see why this is useful, apart from introducing indirection into already non-trivial functionality. We've been working on the new pull-sync version for months, the previous version has been buggy in production for probably more than a year (and is still throwing subscription errors), but we still continue to add code prematurely so that we make our lives harder...

Contributor

@zelig @janos this comment is still applicable. @janos no action required from your side, but if you understand this, maybe you can explain it to me :)

swarm.go Outdated
if !config.SyncEnabled || config.LightNodeEnabled {
syncing = stream.SyncingDisabled
}
//if !config.SyncEnabled || config.LightNodeEnabled {
Contributor

You are already unhooking the old stream protocol, so please remove this code - not just comment it out - and also remove all the code in network/stream that is no longer used. There is no point in keeping it around, especially when it is not hooked up anymore here in swarm.go and we are switching directly to the new implementation.

return nil
}

func TestCorrectCursorsExchangeRace(t *testing.T) {
Contributor

The mental taxation in understanding what this is doing is a bit too high for my taste. I'm happy we have started to push away complexity from production Swarm code, but we should probably rethink the complexity in tests or at least be mindful about that.

It is not immediately obvious what this test is testing. Tests should generally be simple and I shouldn't have to read the code and think for 15min to figure that out. What is this testing?

  1. We are using a simulation framework.
  2. We are mocking a devp2p protocol and injecting some custom hook
  3. We are building peer fixtures using the p2p library.

We should probably be building more tools to write tests in a simpler manner.

No action required (or at least I don't know what I would expect), but this is definitely not the direction we should be going in general, IMO.

@nonsense
Contributor

@acud @janos overall great work. I don't see any blockers from my side. I've added a few comments, and talked to @acud off-band about some things that need to be fleshed out, but nothing major.

  1. I still see some missing Godoc and leftover commented-out code - it'd be nice to fix this prior to merging.
  2. I think we should remove asap the stream folder with the previous version of the stream protocol - it is no longer used or hooked up, so no need to have dead code in the codebase.
  3. I really think we should not add "code to be used in the future" such as Bounded streams and Roundtrip fields that serve no purpose. I've explained multiple times my opinion on that, I leave it up to you to decide.
  4. Grafana dashboards are not up-to-date with the new bzz-stream protocol - please fix them, namely the P2P dashboard. I also suggest adding some timers around pull-syncing batches, so that we have a baseline of how fast the pull-syncing is, and how much time it takes to request/deliver a batch (a sketch of such a timer follows this list). I'm sure you can figure out other interesting operational metrics that we should monitor, so that we have a baseline and improve on it. We spoke off-band about that with @acud
  5. Jaeger HTTP GET traces appear to be missing the po log lines for some reason. Not a big deal, but nice to have. I tried to fix this quickly, but failed; not sure what's wrong with the OpenTracing span.
  6. I am not very happy with our custom use of mocked protocols and service functions... this is one of the reasons we are building the new simulation framework that will always start a production Swarm node. I think we can write better tests in general, and by better I mean - easier to reason about, easier to understand, and easier to maintain. Right now the tests are definitely not easy to understand or read, and I don't think they are very maintainable. At the same time I understand that right now, today, we don't have these tools, so I can understand why you wrote those tests the way you did. For now at least improve on the Godoc on them to make things slightly easier for the reader.
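On point 4, a minimal sketch of the kind of batch timer that could be added, assuming the go-ethereum metrics package already used elsewhere in swarm; the metric name, the package name and the timeBatch helper are made up for illustration:

package newstream

import (
	"time"

	"github.com/ethereum/go-ethereum/metrics"
)

// timeBatch records how long one request/deliver batch round trip takes,
// giving a baseline for how fast pull-syncing is. The metric name and this
// helper are illustrative, not part of the PR.
func timeBatch(do func() error) error {
	defer metrics.GetOrRegisterResettingTimer("network.stream.pull_sync.batch.total-time", nil).UpdateSince(time.Now())
	return do()
}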

These are my general comments. The most important thing is that the code seems to be correct and integration tests seem to be passing, so I see this generally as an improvement to master, and think we can move forward with it.

@zelig
Member

zelig commented Aug 4, 2019

@acud @janos

We created a super spec for stream! and didn't implement it. Instead, we implement a temporary mix of sync and stream, which was identified as a problem with the current implementation.

This PR saw the tension between two different approaches: on the one hand, one that shuns any abstraction in the spirit of 'just write what the thing should do now, nothing more' simplicity; on the other, one that sees simplicity, approachability and testability in the correct separation of concerns, in the form of the right abstractions.

I kindly ask you to consider the following simple, sketchy architecture, and think how much it would simplify testing, help understanding, and make incremental PRs of this thing easier.

Thanks

package stream

// Range represents a closed interval
type Range struct {
	From, To uint64 // closed interval
}

// individual streams - no need for actual interface
type Stream interface {
	String() string
	Bound() bool
}

// Provider manages the source of a type of stream
// this is completely stateless
type Provider interface {
	Type() string                                                 // Name of provider, eg. SYNC
	CanHandle(Name) (lastIndex uint64, bound bool, err error) // provider can serve the stream given as argument, return cursor, boundedness and an  error if not provided
	OfferedHashes(Name, Range) []Address                          // returns the offered hashes for the given range
	GetDataByHash(Name, ...Address) []Chunk                       // retrieves the chunks based on Address
}

// Streamer implements Registry interface to be used in provider packages
//
type Registry interface {
	Register(Provider) error
}

// LocalStoreProvider wraps localstore and implements GetDataByHash call with localstore get
type LocalStoreProvider struct {
	localStore
}

func (l *LocalStoreProvider) GetDataByHash(ctx context.Context, hash []byte) (chunk.Chunk, error) {
	return l.localStore.Get(ctx, chunk.ModeGetSync, hash)
}

// Client Logic
// Request Object - no need for interface, just to show methods
type Request interface {
	Cancel()
	Done() (close <-chan struct{}) // like in context
	Err()                          // error to extract after Done() closed
}

// Peer - this is not an interface just demonstrating the relevant methods
type Peer interface {
	GetRange(ctx context.Context, n Name, r Range, need func([]byte) bool, deliver func(Chunk) error) (Request, error)
	Providers() []Info
}

// NetStoreClient is a stream client that
// - proxies need function as localstore Has
// - puts chunk in netstore
// These clients are stateless
type StorerClient struct {
	Streamer
	intervals IntervalStore
	store     NetStore
	need      func([]byte) bool // implements it with netStore.Has and opens a fetcher
	deliver   func(Chunk) error // implements it with netStore.Put chunk.ModePutSync
}

// Request will try to cover the indicated range using interval store
// and peer requests for each gap in the intervals
func (s *StorerClient) Request(p Peer, n Name, r Range) (Request, error) {
	// starts a go routine that keeps getting the intervals based on the interval store
	// continuously requesting subranges according to the interval store
	// and calls
	p.GetRange(ctx, n, r, s.need, s.deliver)
	//  as batches are sealed, intervals are updated
	// the request is alive until the whole range is covered
}

// and now the level doing full sync of a stream
type Mode = uint64

const (
	History Mode = iota
	Live
	Both
)

// StorerClient implements Syncer, here stream.Syncer means syncing a stream
// only interface  used by the pull syncer client
type Syncer interface {
	Request(Context, Peer, Name, Mode) (Request, error)
	// this falls back to until cursor (history) and from cursor (live) range requests from the StorerClient
	//
}

////////////////////////////
// package pullsync

// pullSync.Server implements stream.Provider
// uses localstore SubscribePull for the  bins
// server is node-wide
type Server struct {
	// ...
	*stream.LocalProvider
}

// the node-wide pullsync.Client
type Client struct {
	stream.Syncer // embed stream.Syncer
	// when pullsync
	// here  you simply put the update sync logic listening to kademlia depth changes
	// and call `Request`
	// remember the request; when no longer relevant just call request.Cancel()
}

Member

@zelig zelig left a comment

OK, for morale's sake let's just merge this PR after some critical changes. The rest should be addressed in subsequent PRs.

  • there are a lot of changes in packages totally unrelated to this PR (docker, pyramid chunker, etc.), quite a few of them unnecessary
  • filestore, hasherstore, localstore and intervals store changes are important and unexplained
  • the package should be in network/stream/v2 and newstream should be dropped
  • error handling and peer drops could be massively simplified
  • select on routine termination conditions are scattered around in many unnecessary places
  • the technical debt this PR introduces is way too high. Let's avoid this in the future

@@ -96,3 +98,15 @@ func (i *Inspector) Has(chunkAddresses []storage.Address) string {

return strings.Join(hostChunks, "")
}

func (i *Inspector) PeerStreams() (string, error) {
Member

comment exported functions

t.Fatal(err)
}

// if want := hex.EncodeToString(baseKey)[:16]; peerInfo.Base != want {
Member

remove commented code

@@ -233,6 +233,8 @@ func init() {
}

func main() {
runtime.SetMutexProfileFraction(1)
Member

surely not in production right?

done chan error
ruid uint // the request uid
from uint64 // want from index
to *uint64 //want to index, nil signifies top of range not yet known
Member

// space

func NewSlipStream(intervalsStore state.Store, kad *network.Kademlia, providers ...StreamProvider) *SlipStream {
slipStream := &SlipStream{
// New creates a new stream protocol handler
func New(intervalsStore state.Store, baseKey []byte, providers ...StreamProvider) *Registry {
Member

I would prefer an explicit register method, but can wait till next iteration

return
}

w.to = &msg.LastIndex // now that we know the range of the batch we can set the upped bound of the interval to the open want
Member

upped -> upper

and better say 'we can set the open want's upper bound to the index supplied in the msg'.

Maybe some checks would be needed about this number

// wait for all handlers to finish
done := make(chan struct{})
go func() {
r.handlersWg.Wait()
Member

why is this needed?

}
}
go func(chunks ...chunk.Chunk) {
s.cacheMtx.Lock()
Member

better have cache in a separate object

@@ -0,0 +1 @@
{"nodes":[{"node":{"info":{"id":"0b405ded87ce4e712aaebb3055e70716c971fe049a54961294c7a7a0471d12f6","name":"node_0b405ded87ce4e712aaebb3055e70716c971fe049a54961294c7a7a0471d12f6","enode":"enode://918f13e29fcea21b47e553c60ea0ae63f856db5745c6010e31b566f819195f9a5f250788862c43b230dd93fbc24cb6b3d33c86069cc21de3a3612b2f32eeccbe@127.0.0.1:0","enr":"0xf88fb840ad1a67c3b07299bb50337351be97595ef05e5ab7988ab7831e06a3003d1fc4ad74e925ec495ab312ccc889eb9c05f2e14f625e56f813a76dc53ddd94ea0955f90183636170cdc583627a7a08c6846869766508826964827634826970847f00000189736563703235366b31a102918f13e29fcea21b47e553c60ea0ae63f856db5745c6010e31b566f819195f9a","ip":"127.0.0.1","ports":{"discovery":0,"listener":0},"listenAddr":"","protocols":{"bzz":"C0Bd7YfOTnEqrrswVecHFslx/gSaVJYSlMenoEcdEvY=","hive":"\n=========================================================================\nThu Feb 28 17:59:23 UTC 2019 KΛÐΞMLIΛ hive: queen's address: 0b405d\npopulation: 3 (3), NeighbourhoodSize: 2, MinBinSize: 2, MaxBinSize: 4\n000 1 a430 | 1 a430 (0)\n============ DEPTH: 1 ==========================================\n001 2 77ba 675c | 2 77ba (0) 675c (0)\n002 0 | 0\n003 0 | 0\n004 0 | 0\n005 0 | 0\n006 0 | 0\n007 0 | 0\n008 0 | 0\n009 0 | 0\n010 0 | 0\n011 0 | 0\n012 0 | 0\n013 0 | 0\n014 0 | 0\n015 0 | 0\n========================================================================="}},"config":{"id":"0b405ded87ce4e712aaebb3055e70716c971fe049a54961294c7a7a0471d12f6","private_key":"173c23bc3aec26afe7d299eb9556e7c189851378fe56125d929966ac7207bcb6","name":"node_0b405ded87ce4e712aaebb3055e70716c971fe049a54961294c7a7a0471d12f6","services":["streamer"],"enable_msg_events":true,"port":38277},"up":true}},{"node":{"info":{"id":"a4301045c0cde5acb5406e1fba46ec868c056439ab43ef795830748fb6ba2e5a","name":"node_a4301045c0cde5acb5406e1fba46ec868c056439ab43ef795830748fb6ba2e5a","enode":"enode://89b66f574168570cc5d347051c0e2b05b33578d42142ef8cc930f1fbb384266f2827e4b087f5e2985a32d8f2b353fb7be416193632da5c692cace99d1b9e4274@127.0.0.1:0","enr":"0xf88fb8402e6392d3d5377977fc4b4d29f4dbf4ca96e963df056ead84ec2fcb7ee071996670a935d3b49250455e02c392baacb4c8499f815f74b09c405316a5d2fdee94e60183636170cdc583627a7a08c6846869766508826964827634826970847f00000189736563703235366b31a10289b66f574168570cc5d347051c0e2b05b33578d42142ef8cc930f1fbb384266f","ip":"127.0.0.1","ports":{"discovery":0,"listener":0},"listenAddr":"","protocols":{"bzz":"pDAQRcDN5ay1QG4fukbshowFZDmrQ+95WDB0j7a6Llo=","hive":"\n=========================================================================\nThu Feb 28 17:59:23 UTC 2019 KΛÐΞMLIΛ hive: queen's address: a43010\npopulation: 3 (3), NeighbourhoodSize: 2, MinBinSize: 2, MaxBinSize: 4\n============ DEPTH: 0 ==========================================\n000 3 675c 77ba 0b40 | 3 675c (0) 77ba (0) 0b40 (0)\n001 0 | 0\n002 0 | 0\n003 0 | 0\n004 0 | 0\n005 0 | 0\n006 0 | 0\n007 0 | 0\n008 0 | 0\n009 0 | 0\n010 0 | 0\n011 0 | 0\n012 0 | 0\n013 0 | 0\n014 0 | 0\n015 0 | 
0\n========================================================================="}},"config":{"id":"a4301045c0cde5acb5406e1fba46ec868c056439ab43ef795830748fb6ba2e5a","private_key":"5bcd3bbb886e5ad2317a6ac1de2b19ebe27db9337ba8e7daa76a1cdaa6a3d158","name":"node_a4301045c0cde5acb5406e1fba46ec868c056439ab43ef795830748fb6ba2e5a","services":["streamer"],"enable_msg_events":true,"port":41325},"up":true}},{"node":{"info":{"id":"77ba145a4623b9c83e0a14a375ba49c68719f2abe162ffca66de7ef2e31a75f8","name":"node_77ba145a4623b9c83e0a14a375ba49c68719f2abe162ffca66de7ef2e31a75f8","enode":"enode://ecb56004a128067628a3198244080ba4ac79860cb2cf74407d7571955a80fb9243ad5f8265063a5aa209ea95329b4b57d0f1b492a85835ab87fce79881a895a9@127.0.0.1:0","enr":"0xf88fb84035ed878b2a99ed1bbd55df1701cd82115aebc649b1e8e68302e7caf5480a61322c74189d1fe66f2d16e79174f0e3479213f0bf963a4518d583078e164bd94a220183636170cdc583627a7a08c6846869766508826964827634826970847f00000189736563703235366b31a103ecb56004a128067628a3198244080ba4ac79860cb2cf74407d7571955a80fb92","ip":"127.0.0.1","ports":{"discovery":0,"listener":0},"listenAddr":"","protocols":{"bzz":"d7oUWkYjucg+ChSjdbpJxocZ8qvhYv/KZt5+8uMadfg=","hive":"\n=========================================================================\nThu Feb 28 17:59:23 UTC 2019 KΛÐΞMLIΛ hive: queen's address: 77ba14\npopulation: 3 (3), NeighbourhoodSize: 2, MinBinSize: 2, MaxBinSize: 4\n000 1 a430 | 1 a430 (0)\n============ DEPTH: 1 ==========================================\n001 1 0b40 | 1 0b40 (0)\n002 0 | 0\n003 1 675c | 1 675c (0)\n004 0 | 0\n005 0 | 0\n006 0 | 0\n007 0 | 0\n008 0 | 0\n009 0 | 0\n010 0 | 0\n011 0 | 0\n012 0 | 0\n013 0 | 0\n014 0 | 0\n015 0 | 0\n========================================================================="}},"config":{"id":"77ba145a4623b9c83e0a14a375ba49c68719f2abe162ffca66de7ef2e31a75f8","private_key":"6f30c3d0135202cbf540669c8c76d9144bd53764aaf4ae18e8bee98ce05c6211","name":"node_77ba145a4623b9c83e0a14a375ba49c68719f2abe162ffca66de7ef2e31a75f8","services":["streamer"],"enable_msg_events":true,"port":40785},"up":true}},{"node":{"info":{"id":"675cfed0a7b461f43cfebd6a10ecc8868d35b245381d731abec4608b6dce250a","name":"node_675cfed0a7b461f43cfebd6a10ecc8868d35b245381d731abec4608b6dce250a","enode":"enode://64a6b33b028b818a1961272651d1d7a976329e541b6e3e983f518d89326d7427f4c9243f78bb03fd2e2dc059f610ee50d1e664e3107db01ac603b9c3b20ebb4f@127.0.0.1:0","enr":"0xf88fb8407dbbb99300e04c6917c27d61bf396481aae19287c2869e8dd74e11b0f2ebb74174376a6152c26af95173a4868a0349aaf3be5d063918590fd5962f4af174e0010183636170cdc583627a7a08c6846869766508826964827634826970847f00000189736563703235366b31a10364a6b33b028b818a1961272651d1d7a976329e541b6e3e983f518d89326d7427","ip":"127.0.0.1","ports":{"discovery":0,"listener":0},"listenAddr":"","protocols":{"bzz":"Z1z+0Ke0YfQ8/r1qEOzIho01skU4HXMavsRgi23OJQo=","hive":"\n=========================================================================\nThu Feb 28 17:59:23 UTC 2019 KΛÐΞMLIΛ hive: queen's address: 675cfe\npopulation: 3 (3), NeighbourhoodSize: 2, MinBinSize: 2, MaxBinSize: 4\n000 1 a430 | 1 a430 (0)\n============ DEPTH: 1 ==========================================\n001 1 0b40 | 1 0b40 (0)\n002 0 | 0\n003 1 77ba | 1 77ba (0)\n004 0 | 0\n005 0 | 0\n006 0 | 0\n007 0 | 0\n008 0 | 0\n009 0 | 0\n010 0 | 0\n011 0 | 0\n012 0 | 0\n013 0 | 0\n014 0 | 0\n015 0 | 
0\n========================================================================="}},"config":{"id":"675cfed0a7b461f43cfebd6a10ecc8868d35b245381d731abec4608b6dce250a","private_key":"d77006035afa5194053c0347021f7825a71f9643028c45b65756ef88ee53e6dd","name":"node_675cfed0a7b461f43cfebd6a10ecc8868d35b245381d731abec4608b6dce250a","services":["streamer"],"enable_msg_events":true,"port":37069},"up":true}}],"conns":[{"one":"0b405ded87ce4e712aaebb3055e70716c971fe049a54961294c7a7a0471d12f6","other":"a4301045c0cde5acb5406e1fba46ec868c056439ab43ef795830748fb6ba2e5a","up":true},{"one":"a4301045c0cde5acb5406e1fba46ec868c056439ab43ef795830748fb6ba2e5a","other":"77ba145a4623b9c83e0a14a375ba49c68719f2abe162ffca66de7ef2e31a75f8","up":true},{"one":"77ba145a4623b9c83e0a14a375ba49c68719f2abe162ffca66de7ef2e31a75f8","other":"675cfed0a7b461f43cfebd6a10ecc8868d35b245381d731abec4608b6dce250a","up":true},{"one":"675cfed0a7b461f43cfebd6a10ecc8868d35b245381d731abec4608b6dce250a","other":"0b405ded87ce4e712aaebb3055e70716c971fe049a54961294c7a7a0471d12f6","up":true},{"one":"77ba145a4623b9c83e0a14a375ba49c68719f2abe162ffca66de7ef2e31a75f8","other":"0b405ded87ce4e712aaebb3055e70716c971fe049a54961294c7a7a0471d12f6","up":true},{"one":"a4301045c0cde5acb5406e1fba46ec868c056439ab43ef795830748fb6ba2e5a","other":"675cfed0a7b461f43cfebd6a10ecc8868d35b245381d731abec4608b6dce250a","up":true}]}
Member

please just use the snapshots in network/stream/testing

Contributor

The idea is to remove the whole network/stream package. Why create references between the new and old packages, when we should have only one in the codebase?

Member

@nonsense, no the idea is to reimplement streamer

  • we want to use the same snapshots as before
  • we want the PR to be smaller
  • bitvector and intervalstore are also dependencies from old stream
  • the new streamer should be in stream/v2

Contributor

OK, I was unaware that we will be maintaining both streamers. cc @janos

Contributor

Actually @janos is already removing it at 120c034, which I thought was a good idea, but now I am a bit confused as to why we need to keep it around. @zelig why do you want to keep both v1 and v2? Is there another library that is using the code in v1?

}

log.Trace("netstore.chunk-not-in-localstore", "ref", ref.String())
//n.logger.Trace("netstore.chunk-not-in-localstore", "ref", ref.String())
Member

^

RUN echo "http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk --update add perf
RUN wget https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
RUN chmod +x /jq-linux64
Contributor

I am not sure why we need jq and websocat within the image. What is the use-case here? I use those tools from my local machine, why do we need them in the swarm binary?

@@ -47,7 +47,7 @@ func uploadAndSyncCmd(ctx *cli.Context) error {
seed = inputSeed
}

randomBytes := testutil.RandomBytes(seed, filesize*1000)
randomBytes := testutil.RandomBytes(seed, filesize)
Contributor

I think we should revert this. I didn't understand why we wanted to have it smaller in the first place to be honest.

if !ok {
p.logger.Error("ruid not found, dropping peer", "ruid", ruid)
p.Drop()
return o, true
Contributor

return nil, true - let's be explicit that o is not found.

func BenchmarkHistoricalStream_15000(b *testing.B) { benchmarkHistoricalStream(b, 15000) }
func BenchmarkHistoricalStream_20000(b *testing.B) { benchmarkHistoricalStream(b, 20000) }

func benchmarkHistoricalStream(b *testing.B, chunks uint64) {
Member

@zelig @nonsense Could you check this benchmark also? I think that the measuring method is not correct. The simulation is constructed outside of the benchmark loop, and in the benchmark loop chunks are uploaded to the same node of the same simulation, making the state different on every benchmark iteration. Every iteration measures different state, and with a larger number of chunks in the localstore it becomes slower. This makes the benchmark result dependent on the number of iterations performed, whereas only the precision of the result should be influenced by the number of iterations. I think that this benchmark should be rewritten so that every iteration measures time on state independent of the other iterations, or the benchmark should be removed, as that rewrite may make it time-consuming.
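For illustration, a structure where every iteration measures independent state could look like the sketch below; setupSimulationAndUpload and waitUntilSynced are hypothetical helpers standing in for the real setup and measurement code:

// Sketch only: each iteration builds, measures and tears down its own state,
// so the result no longer depends on how many iterations the framework runs.
// setupSimulationAndUpload and waitUntilSynced are hypothetical helpers.
func benchmarkHistoricalStream(b *testing.B, chunks uint64) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		sim, cleanup := setupSimulationAndUpload(b, chunks) // fresh simulation and chunks
		b.StartTimer()

		waitUntilSynced(b, sim) // only the syncing itself is timed

		b.StopTimer()
		cleanup()
		b.StartTimer()
	}
}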

Contributor

@janos it does look like the benchmark is indeed not correct for the reasons you mentioned.

Member

agree

Member

I have changed the benchmark to address this problem, and it is even usable.

Member

@zelig zelig left a comment

Thanks @janos for the quick cleanup and reorg. I don't like that you put the deletion of the old stream code in this PR.

I approve it, but let's make sure:

  • we heed the issues raised in later PRs
  • we understand the changes to localstore, intervalstore, simulation, etc packages

@janos
Member

janos commented Sep 11, 2019

@zelig Thank you for the review and approval.

I will create issues for the remaining review comments after this PR gets merged.

I do not mind putting back the old streamer code. Would you like to restore the files?

@@ -68,7 +68,7 @@ func TestSyncingViaGlobalSync(t *testing.T) {
log.Info(fmt.Sprintf("Running test with %d chunks and %d nodes...", *chunks, *nodes))
testSyncingViaGlobalSync(t, *chunks, *nodes)
} else {
chunkCounts := []int{4, 32}
Contributor

We can't sync 32 chunks? :)

Member

It times out on travis. :) And it actually does not even flake locally. The only thing that I managed to see is the goroutine leak in hive that is fixed. If it is flaky even with only 4 chunks on travis, we should disable this test and dedicate more time to find the root cause.

@janos janos merged commit 3fd9021 into master Sep 11, 2019
Swarm Core - Sprint planning automation moved this from In review (includes Documentation) to Done Sep 11, 2019
@janos janos deleted the new-syncer-gs branch September 11, 2019 14:57
@skylenet skylenet added this to the 0.5.0 milestone Sep 17, 2019
@acud acud restored the new-syncer-gs branch October 1, 2019 12:15
@acud acud deleted the new-syncer-gs branch October 1, 2019 12:37