add --disable-auto-connect flag #1576

nonsense · 2019-07-16T13:28:59Z

This PR adds a --disable-auto-connect flag, so that we can run Swarm without hive discovery and reproduce the same connections between deployments, without relying on the non-deterministic SuggestPeer functionality.

The next step would be to add tools to extract and apply connections from running deployments (a very early attempt is at #1183) and to generate snapshots from them, so that our tests become more deterministic.
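To make the "extract and apply connections" idea concrete, here is a rough sketch, not code from this PR or from #1183. The `Conn` type and the `canonical` and `apply` helpers are hypothetical names, and the `connect` callback stands in for whatever dial or RPC call a real tool would use. The point is only that a snapshot replayed in a canonical order yields the same topology on every deployment.

```go
package main

import (
	"fmt"
	"sort"
)

// Conn is a hypothetical snapshot entry: one connection between two node IDs.
type Conn struct{ From, To string }

// canonical returns the undirected, de-duplicated, sorted edge list, so that
// the order in which connections were extracted does not matter.
func canonical(conns []Conn) []string {
	seen := make(map[string]bool)
	var edges []string
	for _, c := range conns {
		a, b := c.From, c.To
		if a > b {
			a, b = b, a
		}
		key := a + "-" + b
		if !seen[key] {
			seen[key] = true
			edges = append(edges, key)
		}
	}
	sort.Strings(edges)
	return edges
}

// apply replays the snapshot in canonical order through the connect callback
// (an assumption: a real tool would dial the peers here).
func apply(conns []Conn, connect func(edge string) error) error {
	for _, e := range canonical(conns) {
		if err := connect(e); err != nil {
			return fmt.Errorf("connect %s: %w", e, err)
		}
	}
	return nil
}

func main() {
	// Connections as they might be extracted from a running deployment,
	// in arbitrary order and with a duplicate.
	extracted := []Conn{{"n2", "n1"}, {"n3", "n2"}, {"n1", "n2"}}
	var dialed []string
	err := apply(extracted, func(edge string) error {
		dialed = append(dialed, edge)
		return nil
	})
	fmt.Println(dialed, err)
}
```

This only works if nothing else creates connections behind the tool's back, which is why the flag in this PR is a prerequisite.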

network/hive.go

zelig

the flag should cause hive.Start not to start the connect loop
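A minimal sketch of that behaviour, assuming a toy `Hive` type rather than the real network/hive.go code. The `DisableAutoConnect` field, the `suggest` and `dial` callbacks, and the `countDials` helper are all illustrative names: `suggest` stands in for Kademlia's SuggestPeer and `dial` for the actual connection attempt.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// HiveParams is a hypothetical stand-in for the real hive configuration.
type HiveParams struct {
	KeepAliveInterval  time.Duration
	DisableAutoConnect bool // assumption: mirrors the --disable-auto-connect flag
}

// Hive is a toy service; suggest stands in for Kademlia's SuggestPeer.
type Hive struct {
	params  HiveParams
	suggest func() (addr string, ok bool)
	dial    func(addr string)
	quit    chan struct{}
	wg      sync.WaitGroup
}

func NewHive(p HiveParams, suggest func() (string, bool), dial func(string)) *Hive {
	return &Hive{params: p, suggest: suggest, dial: dial, quit: make(chan struct{})}
}

// Start launches the connect loop unless auto-connect is disabled.
func (h *Hive) Start() {
	if h.params.DisableAutoConnect {
		return
	}
	h.wg.Add(1)
	go h.connect()
}

// connect periodically asks for a suggested peer and dials it.
func (h *Hive) connect() {
	defer h.wg.Done()
	ticker := time.NewTicker(h.params.KeepAliveInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if addr, ok := h.suggest(); ok {
				h.dial(addr)
			}
		case <-h.quit:
			return
		}
	}
}

// Stop terminates the connect loop, if any, and waits for it to exit.
func (h *Hive) Stop() {
	close(h.quit)
	h.wg.Wait()
}

// countDials runs a hive briefly and reports how many dials it attempted.
func countDials(disable bool) int {
	dials := 0
	h := NewHive(
		HiveParams{KeepAliveInterval: time.Millisecond, DisableAutoConnect: disable},
		func() (string, bool) { return "peer-1", true },
		func(string) { dials++ },
	)
	h.Start()
	time.Sleep(50 * time.Millisecond)
	h.Stop() // waits for the loop, so reading dials afterwards is safe
	return dials
}

func main() {
	fmt.Println("auto-connect disabled, dials:", countDials(true))
	fmt.Println("auto-connect enabled, dialed at least once:", countDials(false) > 0)
}
```

With the loop never started, the only connections a node gets are the ones made explicitly, for example from a snapshot.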

network/hive.go

nonsense · 2019-07-17T15:59:57Z

network/simulations/discovery/discovery_test.go

@@ -147,7 +147,7 @@ func TestDiscoverySimulationSimAdapter(t *testing.T) {
 	testDiscoverySimulationSimAdapter(t, *nodeCount, *initCount)
 }

-func TestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {
+func XTestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {


@zelig could you maybe add a few comments on what this test is supposed to test?

I am also not sure how the vars persistenceEnabled and discoveryEnabled are supposed to work - what if tests are run concurrently?

It seems to me that this test starts a network, connects the nodes in a chain, and then waits for the Kademlia table to be set up by hive. At this point it stops the nodes, stops discovery and starts them back up, expecting the Kademlia table to be set up without discovery.

If this is correct, it seems like we are changing the meaning of the NoDiscovery flag - previously it was just preventing subPeerMsg message exchanges, and not preventing actual connections, whereas now if we set the NoDiscovery flag to true, we don't want hive to trigger any connections whatsoever.

Therefore if my assumption on this test is correct, it is expected for it to fail after the change.

What do you think?

Well, you seemed to want to redefine this flag so that it does not use the saved address book and only allows manual (or snapshot-driven) connections.

This test is supposed to test that peers persist across sessions, i.e., that the addressbook is saved and used when bootstrapping connectivity.

BTW the addressbook persistence has been broken for a long time - if there are peers that no longer exist, we never clear them out, and we keep on trying to connect to them indefinitely - something we tried fixing with @homotopycolimit at some point, but it didn't work nicely as a quick hack.

@nonsense this test was my first coding task on swarm so i take responsibility :D

Discovery should indeed not affect whether we connect to peers from our address book, so I don't see why this should change the way we connect to existing known peers, and I'm not sure that disabling the test would help at all.

In fact, it makes more sense to have --no-hive to mean no connect loop. So you could simply fix this test by wrapping hive service in a struct and redefine its Protocols function to return empty. That way you can get the hive service with connect loop but no protocol running, so you can safely test bootstrapping from addressbook.

I'd rather not redefine the Protocols function to test persistence of the kademlia address book, this seems like a very ugly hack.

With this PR --no-hive-discovery does not run the connect() loop.

Let's talk about the address book as part of the Kademlia epic; it is out of scope for this PR. Both Elad and I seem to think that, similar to how bootnodes are connected to outside of hive, it makes sense to do the same for the address book.

zelig

The address book persistence test should not be removed.
Maybe we need two flags: discoveryDisabled and autoconnectDisabled.
I am not sure about persistenceEnabled; I cannot see the use case for disabling it.

nonsense · 2019-07-18T10:30:12Z

Yes, the definition of discovery is changing, and we should decide what to do with it.

We obviously need a discovery flag, which pretty much disables both message exchange in hive and automatic connections, because we want to be able to deterministically set up networks.

I am not sure if we need an autoconnect flag - it seems to be used only to test persistence, and we can probably test persistence without this flag. Furthermore we want the Kademlia table to be mostly constructed while preferring peers we had connections to before, rather than just kicking in the Discovery mechanism and randomly getting new ones.

nonsense · 2019-07-18T14:44:47Z

Address book persistence, in my opinion, doesn't work well, so it should get an issue of its own and is probably out of scope for this PR. If it contains peers that no longer exist, a node will try to connect to them indefinitely. I think it is more important to be able to deterministically build networks right now, so that we can improve our overall testing - something that @acud is having pain with in pull sync, AFAICT from the new issues.

acud

@nonsense let's re-approach this.
do you want to block the possibility of getting notified about new peers from your peers or would you like to disable address book persistence?

acud · 2019-07-18T18:22:06Z

network/simulations/discovery/discovery_test.go

@@ -147,7 +147,7 @@ func TestDiscoverySimulationSimAdapter(t *testing.T) {
 	testDiscoverySimulationSimAdapter(t, *nodeCount, *initCount)
 }

-func TestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {
+func XTestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {


@nonsense this test was my first coding task on swarm so i take responsibility :D

Discovery should indeed not affect whether we connect to peers from our address book, so I don't see why this should change the way we connect to existing known peers, and I'm not sure that disabling the test would help at all.

nonsense · 2019-07-19T09:25:09Z

> @nonsense let's re-approach this.
> do you want to block the possibility of getting notified about new peers from your peers or would you like to disable address book persistence?

@acud this issue/PR is not about any of this at all. I don't see address book persistence having anything to do with hive being disabled, yet disabling hive seems to impact its tests. I actually think the test is wrong, because it expects that we will be connecting to peers when NoDiscovery is set to false, whereas the only thing we do with the address book is load it into Kademlia.

So the question is do we want to couple address book loading with NoDiscovery - I'd say no.

Ultimately I just want to be able to start Swarm without having hive create connections to peers, so that we can have deterministic networks.

nonsense · 2019-07-19T09:31:42Z

@zelig @acud we spoke about this in a meeting 2-3 weeks ago, and I claimed that there is no way to create a deterministic network with Kademlia - I think this PR mostly shows that, because there is no way to disable hive and to stop it from making connections.

So how do you want to complete this?

I think the test that checks for persistence is wrong, because it checks for connectivity and not for saving/loading of peers. In general I think persistence should be improved as a larger effort, since it doesn't clear up old peers.
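As an illustration of testing saving/loading rather than connectivity, here is a rough sketch under stated assumptions: `AddressBook`, `Save`, `Load` and `roundTrip` are made-up names, and JSON is used only to keep the example short. It is not how Kademlia actually persists its peers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"reflect"
)

// AddressBook is a hypothetical, simplified list of known peer addresses.
type AddressBook struct {
	Peers []string `json:"peers"`
}

// Save writes the address book to w (JSON is an assumption of this sketch).
func (b *AddressBook) Save(w io.Writer) error {
	return json.NewEncoder(w).Encode(b)
}

// Load reads an address book previously written by Save.
func Load(r io.Reader) (*AddressBook, error) {
	var b AddressBook
	if err := json.NewDecoder(r).Decode(&b); err != nil {
		return nil, err
	}
	return &b, nil
}

// roundTrip simulates a node restart: save, then load from the same bytes.
func roundTrip(b *AddressBook) (*AddressBook, error) {
	var buf bytes.Buffer
	if err := b.Save(&buf); err != nil {
		return nil, err
	}
	return Load(&buf)
}

func main() {
	want := &AddressBook{Peers: []string{"aa01", "bb02", "cc03"}}
	got, err := roundTrip(want)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println("peers survive a restart:", reflect.DeepEqual(want, got))
}
```

A check like this needs no running network and no hive, so it is unaffected by whether the connect loop is enabled.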

Bottom line - I don't think the Discovery flag (which cannot be set by a command line argument right now) is useful at all in its current state, therefore I am changing its behaviour. It seems to be used only for the persistence test.

Having the possibility to create deterministic networks is quite important for all our efforts right now, so I'd suggest we go forward with this, and not wait for the Kademlia refactor we have on the roadmap. I think it'd be useful to have reproducible tests sooner rather than later.

acud · 2019-07-19T11:04:36Z

I think the basic misunderstanding here is the discovery flag. This actually has nil to do with discovery. Discovery is the act of gossiping peers to your peers and this is what the flag should control.

The semantics the persistence test attaches to this flag are correct in this case, since the content of the persisted address book should be bootstrapped to the node (or at least intercepted with a hook and asserted correctly).

I agree that address book persistence should not be related to hive. If currently the address book content is loaded through hive then this is incorrect and should be done through the p2p server somehow.

zelig · 2019-07-21T18:01:48Z

1. I am in favour of getting rid of the current interpretation of hive discovery (whether to gossip peer information via the hive protocol), because I don't see a need for disabling the hive protocol apart from the persistence test.
2. The persistence test is supposed to test whether the hive correctly remembers and reuses the peers in the address book across sessions when you bootstrap connectivity for a node. If the hive protocol is gossiping peer info, this cannot be tested easily, so I recommend simply wrapping the hive service in a struct and redefining its Protocols function to return empty. That way you get the hive service with the connect loop but no protocol running, so you can safely test bootstrapping from the addressbook without allowing such a configuration in production code. This is not a lot of work.
3. In Kademlia Upgrade #1535 there is a new requirement with regard to bootstrapping: to give preference to peers we were last connected to (if they exist) over other peers (even if they are freshly being gossiped). While this needs to be further specified, it is likely to replace the persistence test described in 2., so ultimately we will not need to support running the connect loop with no hive protocol. But I really prefer fixing tests when you change something, rather than disabling or deleting them just because they will need to be reworked at some future time.
4. As for the flag, I recommend renaming it to -disable-auto-connect to avoid ambiguity, and simply having it not call hive.Start or not start the go connect() in the hive start.
5. As for bootstrapping from a snapshot and saving a snapshot, I guess there are now "create tool to generate addresses for manual snapshot creation" #1584 and "snapshots should allow stepping through topologies" #1585.
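The wrapping suggested in 2. could look roughly like the sketch below. `Protocol`, `Service` and `Hive` here are simplified stand-ins, not the real p2p.Protocol, node.Service or network.Hive types, and `noProtocolHive` is a hypothetical test-only wrapper. It relies on Go struct embedding: the wrapper's own Protocols method shadows the embedded one, while Start is still promoted from the embedded hive.

```go
package main

import "fmt"

// Protocol and Service are simplified stand-ins for p2p.Protocol and
// node.Service; the real interfaces have more methods.
type Protocol struct{ Name string }

type Service interface {
	Protocols() []Protocol
	Start() error
}

// Hive is a toy service that would normally run the hive protocol.
type Hive struct{ started bool }

func (h *Hive) Protocols() []Protocol { return []Protocol{{Name: "hive"}} }

func (h *Hive) Start() error {
	h.started = true // the connect loop would be launched here
	return nil
}

// noProtocolHive is a hypothetical test-only wrapper. Embedding *Hive keeps
// Start (and with it the connect loop), while the Protocols method defined
// below shadows the embedded one, so no peer gossip protocol is run.
type noProtocolHive struct{ *Hive }

func (noProtocolHive) Protocols() []Protocol { return nil }

func main() {
	h := &Hive{}
	var svc Service = noProtocolHive{h}
	err := svc.Start()
	fmt.Println("protocols:", len(svc.Protocols()), "started:", h.started, "err:", err)
}
```

Because the wrapper lives only in test code, production builds cannot end up in the "connect loop but no protocol" configuration.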

nonsense · 2019-07-22T12:19:40Z

@zelig about 2:

I'd rather not redefine the Protocols function to test persistence of the Kademlia address book, this seems like a very ugly hack. This feels like the completely wrong level to test something like address book persistence. It is not a matter of a lot of work, or not. Furthermore currently we seem to only Register the loaded peers, and not connect to them. This is not really reproducible, because the idea is that you connect to peers that you were previously connected to, whereas now Hive is free to override that loaded peers list and connect to other suggested peers.

zelig

Minor, plus I think the address book persistence simulation should be restored and adapted.

zelig · 2019-07-24T12:43:07Z

api/config.go

+	MaxStreamPeerServers   int
+	LightNodeEnabled       bool
+	BootnodeMode           bool
+	HiveDisableAutoConnect bool


no need for Hive prefix

zelig · 2019-07-24T12:44:35Z

network/simulation/kademlia_test.go

@@ -130,7 +130,7 @@ func createSimServiceMap(discovery bool) map[string]ServiceFunc {
 		"bzz": func(ctx *adapters.ServiceContext, b *sync.Map) (node.Service, func(), error) {
 			addr := network.NewAddr(ctx.Config.Node())
 			hp := network.NewHiveParams()
-			hp.Discovery = discovery
+			hp.AutoConnect = discovery


please change the variable name too

zelig · 2019-07-24T12:50:28Z

network/simulations/discovery/discovery_test.go

@@ -186,26 +178,6 @@ func testDiscoverySimulation(t *testing.T, nodes, conns int, adapter adapters.No
 	t.Logf("Setup: %s, shutdown: %s", result.StartedAt.Sub(startedAt), finishedAt.Sub(result.FinishedAt))
 }

-func testDiscoveryPersistenceSimulation(t *testing.T, nodes, conns int, adapter adapters.NodeAdapter) map[int][]byte {


I disagree with removing this test when you can easily fix it. Please

zelig · 2019-07-24T12:51:36Z

network/simulations/discovery/discovery_test.go

@@ -513,9 +328,9 @@ func newService(ctx *adapters.ServiceContext) (node.Service, error) {
 	kad := network.NewKademlia(addr.Over(), kp)
 	hp := network.NewHiveParams()
 	hp.KeepAliveInterval = time.Duration(200) * time.Millisecond
-	hp.Discovery = discoveryEnabled
+	hp.AutoConnect = discoveryEnabled


please change the variable name too. otherwise we accumulate technical debt

zelig · 2019-07-24T12:51:49Z

network/simulations/overlay.go

@@ -94,7 +94,7 @@ func (s *Simulation) NewService(ctx *adapters.ServiceContext) (node.Service, err
 	kp.RetryInterval = 1000000
 	kad := network.NewKademlia(addr.Over(), kp)
 	hp := network.NewHiveParams()
-	hp.Discovery = !*noDiscovery
+	hp.AutoConnect = !*noDiscovery


zelig

All right, so you are only adding a new flag. Fine.

nonsense added the ready for review label Jul 16, 2019

nonsense requested review from janos, skylenet and acud July 16, 2019 13:29

nonsense force-pushed the no-hive-discovery-flag branch 2 times, most recently from 3843663 to b54d873 Compare July 16, 2019 13:35

nonsense commented Jul 16, 2019

nonsense added this to Backlog in Swarm Core - Sprint planning via automation Jul 16, 2019

nonsense added in progress and removed ready for review labels Jul 16, 2019

janos previously approved these changes Jul 16, 2019

skylenet previously approved these changes Jul 16, 2019

zelig suggested changes Jul 17, 2019

nonsense dismissed stale reviews from skylenet and janos via 4d4ec2b July 17, 2019 08:39

nonsense commented Jul 17, 2019

zelig reviewed Jul 18, 2019

acud reviewed Jul 18, 2019

acud requested review from acud and removed request for acud July 19, 2019 11:04

acud previously approved these changes Jul 19, 2019

acud requested review from zelig and janos July 19, 2019 11:31

janos previously approved these changes Jul 19, 2019

nonsense added the ready for review label Jul 19, 2019

nonsense removed the in progress label Jul 19, 2019

nonsense moved this from Backlog to In progress in Swarm Core - Sprint planning Jul 22, 2019

nonsense dismissed stale reviews from janos and acud via 9cc92ba July 22, 2019 13:50

nonsense changed the title ~~add --no-hive-discovery flag~~ add --disable-auto-connect flag Jul 22, 2019

nonsense added ready for another review and removed ready for review labels Jul 22, 2019

zelig suggested changes Jul 24, 2019

nonsense force-pushed the no-hive-discovery-flag branch 2 times, most recently from b6fed64 to 1752f76 Compare July 25, 2019 09:21

api, cmd, network: add --disable-auto-connect flag

25b1804

nonsense force-pushed the no-hive-discovery-flag branch from 1752f76 to 25b1804 Compare July 25, 2019 09:25

nonsense added ready for review and removed ready for another review labels Jul 25, 2019

zelig approved these changes Jul 26, 2019

skylenet self-requested a review July 29, 2019 08:42

skylenet approved these changes Jul 29, 2019

nonsense merged commit 9f553ea into master Jul 29, 2019

Swarm Core - Sprint planning automation moved this from In progress to Done Jul 29, 2019

nonsense deleted the no-hive-discovery-flag branch July 29, 2019 08:43

skylenet added this to the 0.4.4 milestone Aug 9, 2019
