Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

add --disable-auto-connect flag #1576

Merged
merged 1 commit into from
Jul 29, 2019
Merged

add --disable-auto-connect flag #1576

merged 1 commit into from
Jul 29, 2019

Conversation

nonsense
Copy link
Contributor

@nonsense nonsense commented Jul 16, 2019

This PR is adding a --disable-auto-connect flag, so that we can run Swarm without hive discovery and reproduce the same connections between deployments, without relying on the non-deterministic SuggestPeer functionality.

Next step would be to add tools to extract and apply connections from running deployments (a very early attempt at #1183) and to generate snapshots out of them so that we can have more determinism in our tests.

@nonsense nonsense requested review from janos, skylenet and acud July 16, 2019 13:29
@nonsense nonsense force-pushed the no-hive-discovery-flag branch 2 times, most recently from 3843663 to b54d873 Compare July 16, 2019 13:35
network/hive.go Outdated Show resolved Hide resolved
@nonsense nonsense added this to Backlog in Swarm Core - Sprint planning via automation Jul 16, 2019
janos
janos previously approved these changes Jul 16, 2019
skylenet
skylenet previously approved these changes Jul 16, 2019
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the flag should cause hive.Start not to start the connect loop

network/hive.go Outdated Show resolved Hide resolved
network/hive.go Outdated Show resolved Hide resolved
network/hive.go Outdated Show resolved Hide resolved
@nonsense nonsense dismissed stale reviews from skylenet and janos via 4d4ec2b July 17, 2019 08:39
@@ -147,7 +147,7 @@ func TestDiscoverySimulationSimAdapter(t *testing.T) {
testDiscoverySimulationSimAdapter(t, *nodeCount, *initCount)
}

func TestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {
func XTestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {
Copy link
Contributor Author

@nonsense nonsense Jul 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zelig could you maybe add a few comments on what this test is supposed to test?

I am also not sure how the vars persistenceEnabled and discoveryEnabled are supposed to work - what if tests are run concurrently?

It seems to me that this test is starting a network and connecting nodes in a chain, and then waiting for the Kademlia table to be setup by hive. At this point it stops the nodes, stops discovery and starts them back up, expecting for the Kademlia table to be setup without discovery.

If this is correct, it seems like we are changing the meaning of the NoDiscovery flag - previously it was just preventing subPeerMsg message exchanges, and not preventing actual connections, whereas now if we set the NoDiscovery flag to true, we don't want hive to trigger any connections what-so-ever.

Therefore if my assumption on this test is correct, it is expected for it to fail after the change.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well you seemed to want to redefine this flag to not use the address book that is saved just allow manual (or snapshot driven) connection.

This test is supposed to test that peers persist across sessions, i.e., that the addressbook is saved and used when bootstrapping connectivity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW the addressbook persistence is broken for a long time - if there are peers that no longer exist, we never clear them out, and we keep on trying to connect to them indefinitely - something we tried fixing with @homotopycolimit at some point, but didn't work nicely as a quick hack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nonsense this test was my first coding task on swarm so i take responsibility :D

Discovery should indeed not impact on whether we are connecting to peer from our address book, so I don't see why this should impact the way we connect to existing known peers, and I'm not sure that disabling the test would help this fact at all

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, it makes more sense to have --no-hive to mean no connect loop. So you could simply fix this test by wrapping hive service in a struct and redefine its Protocols function to return empty. That way you can get the hive service with connect loop but no protocol running, so you can safely test bootstrapping from addressbook.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not redefine the Protocols function to test persistence of the kademlia address book, this seems like a very ugly hack.

With this PR --no-hive-discovery does not run the connect() loop.

Let's talk about the address book as part of the Kademlia epic, this is out of scope for this PR. Both me and Elad seem to think that similar to how bootnodes are connected to outside of hive, it makes sense to do something similar for the address book.

Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

address book persistence test should not be removed.
Maybe we need two flags: discoveryDisabled and autoconnectDisabled.
i am not sure of persistanceEnabled , cannot see the usecase of disabling it

@nonsense
Copy link
Contributor Author

Yes, the definition of discovery is changing, and we should decide what to do with it.

We obviously need a discovery flag, which pretty much disables both message exchange in hive as well as automatic connections, because we want to be able to deterministically setup networks.

I am not sure if we need an autoconnect flag - it seems to be used only to test persistence, and we can probably test persistence without this flag. Furthermore we want the Kademlia table to be mostly constructed while preferring peers we had connections to before, rather than just kicking in the Discovery mechanism and randomly getting new ones.

@nonsense
Copy link
Contributor Author

Address book persistence in my opinion doesn't work well, so should be getting an issue for itself, and is probably out of scope for this PR. If there are peers in it that are not existent anymore, a node will try to connect to them indefinitely. I think it is more important to be able to deterministically build networks right now, so that we can improve our overall testing - something that @acud is having pain with in pull sync AFAICT seeing the new issues.

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nonsense let's re-approach this.
do you want to block the possibility of getting notified about new peers from your peers or would you like to disable address book persistence?

@@ -147,7 +147,7 @@ func TestDiscoverySimulationSimAdapter(t *testing.T) {
testDiscoverySimulationSimAdapter(t, *nodeCount, *initCount)
}

func TestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {
func XTestDiscoveryPersistenceSimulationSimAdapter(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nonsense this test was my first coding task on swarm so i take responsibility :D

Discovery should indeed not impact on whether we are connecting to peer from our address book, so I don't see why this should impact the way we connect to existing known peers, and I'm not sure that disabling the test would help this fact at all

@nonsense
Copy link
Contributor Author

@nonsense let's re-approach this.
do you want to block the possibility of getting notified about new peers from your peers or would you like to disable address book persistence?

@acud this issue/PR is not at all about any of this. I don't see address book persistence having to do anything with hive being disabled and disabling hive seems to impact its tests - I actually think the test is wrong, because it expects that we will be connecting to peers when NoDiscovery is set to false, whereas the only thing we do with the address book is load it in Kademlia.

So the question is do we want to couple address book loading with NoDiscovery - I'd say no.

Ultimately I just want to be able to start Swarm without having hive create connections to peers, so that we can have deterministic networks.

@nonsense
Copy link
Contributor Author

@zelig @acud we spoke about this on a meeting 2-3 weeks ago, and I claimed that there is no way to create a deterministic network with Kademlia - I think this PR mostly shows that, because there is no way to disable hive and to stop it from making connections.

So how do you want to complete this?

I think the test that checks for persistence is wrong, because it checks for connectivity and not for saving/loading of peers. In general I think persistence should be improved as a larger effort, since it doesn't clear up old peers.

Bottom line - I don't think the Discovery flag (which cannot be set by a command line argument right now) is useful at all in its current state, therefore I am changing its behaviour. It seems to be used only for the persistence test.

Having the possibility to create deterministic networks is quite important for all our efforts right now, so I'd suggest we go forward with this, and not wait for the Kademlia refactor we have on the roadmap. I think it'd be useful to have reproducible tests sooner rather than later.

@acud
Copy link
Member

acud commented Jul 19, 2019

I think the basic misunderstanding here is the discovery flag. This actually has nil to do with discovery. Discovery is the act of gossiping peers to your peers and this is what the flag should control.

The persistence test semantics to this flag in this case is correct, since the content of the persisted address book should be bootstrapped to the node (or at least intercepted with a hook and asserted correctly).

I agree that address book persistence should not be related to hive. If currently the address book content is loaded through hive then this is incorrect and should be done through the p2p server somehow.

@acud acud requested review from acud and removed request for acud July 19, 2019 11:04
acud
acud previously approved these changes Jul 19, 2019
@acud acud requested review from zelig and janos July 19, 2019 11:31
janos
janos previously approved these changes Jul 19, 2019
@zelig
Copy link
Member

zelig commented Jul 21, 2019

  1. I am in favour of getting rid of the current interpretation of hive discovery (whether to gossip peer information via the hive protocol) because i dont see a need for disabling hive protocol apart from the persistance test.

  2. The persistance test is supposed to test if the hive correctly remembers and reuses the peers in the address book accross sessions when you bootstrap connectivity for a node. If hive protocol is gossiping peer info, this cannot be tested easily, so i recommend simply wrapping the hive service in a struct and redefine its Protocols function to return empty. That way you can get the hive service with connect loop but no protocol running, so you can safely test bootstrapping from addressbook without allowing such configuration in production code. This is not a lot of work.

  3. In Kademlia Upgrade #1535 There has been a new requirement with regards to bootstrapping: to give preference to peers we were last connected (if they exist) over other peers (even if they are freshly being gossipped). While this needs to be further specified, it is likely to replace the persistance test described in 2. so ultimately we will not need to support a case with running connect loop with no hive protocol. But I really prefer fixing tests when you change something, rather than disable or delete them just cos they will need to be reworked at some future time.

  4. As for the flag, I recommend renaming it to -disable-auto-connect to avoid ambiguity and simply have it not call hive.Start or not start the go connect() in the hive start.

  5. As for bootstrapping from snapshot and saving a snapshot i guess there are create tool to generate addresses for manual snapshot creation #1584 and snapshots should allow stepping through topologies #1585 now.

@nonsense
Copy link
Contributor Author

@zelig about 2:

I'd rather not redefine the Protocols function to test persistence of the Kademlia address book, this seems like a very ugly hack. This feels like the completely wrong level to test something like address book persistence. It is not a matter of a lot of work, or not. Furthermore currently we seem to only Register the loaded peers, and not connect to them. This is not really reproducible, because the idea is that you connect to peers that you were previously connected to, whereas now Hive is free to override that loaded peers list and connect to other suggested peers.

@nonsense nonsense moved this from Backlog to In progress in Swarm Core - Sprint planning Jul 22, 2019
@nonsense nonsense dismissed stale reviews from janos and acud via 9cc92ba July 22, 2019 13:50
@nonsense nonsense changed the title add --no-hive-discovery flag add --disable-auto-connect flag Jul 22, 2019
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor plus I think address book persistance simulation should be restored and adapted

api/config.go Outdated
MaxStreamPeerServers int
LightNodeEnabled bool
BootnodeMode bool
HiveDisableAutoConnect bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for Hive prefix

@@ -130,7 +130,7 @@ func createSimServiceMap(discovery bool) map[string]ServiceFunc {
"bzz": func(ctx *adapters.ServiceContext, b *sync.Map) (node.Service, func(), error) {
addr := network.NewAddr(ctx.Config.Node())
hp := network.NewHiveParams()
hp.Discovery = discovery
hp.AutoConnect = discovery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change the variable name too

@@ -186,26 +178,6 @@ func testDiscoverySimulation(t *testing.T, nodes, conns int, adapter adapters.No
t.Logf("Setup: %s, shutdown: %s", result.StartedAt.Sub(startedAt), finishedAt.Sub(result.FinishedAt))
}

func testDiscoveryPersistenceSimulation(t *testing.T, nodes, conns int, adapter adapters.NodeAdapter) map[int][]byte {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with removing this test when you can easily fix it. Please

@@ -513,9 +328,9 @@ func newService(ctx *adapters.ServiceContext) (node.Service, error) {
kad := network.NewKademlia(addr.Over(), kp)
hp := network.NewHiveParams()
hp.KeepAliveInterval = time.Duration(200) * time.Millisecond
hp.Discovery = discoveryEnabled
hp.AutoConnect = discoveryEnabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change the variable name too. otherwise we accumulate technical debt

@@ -94,7 +94,7 @@ func (s *Simulation) NewService(ctx *adapters.ServiceContext) (node.Service, err
kp.RetryInterval = 1000000
kad := network.NewKademlia(addr.Over(), kp)
hp := network.NewHiveParams()
hp.Discovery = !*noDiscovery
hp.AutoConnect = !*noDiscovery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@nonsense nonsense force-pushed the no-hive-discovery-flag branch 2 times, most recently from b6fed64 to 1752f76 Compare July 25, 2019 09:21
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allright so you are only adding a new flag. Fine.

@skylenet skylenet self-requested a review July 29, 2019 08:42
@nonsense nonsense merged commit 9f553ea into master Jul 29, 2019
Swarm Core - Sprint planning automation moved this from In progress to Done Jul 29, 2019
@nonsense nonsense deleted the no-hive-discovery-flag branch July 29, 2019 08:43
@skylenet skylenet added this to the 0.4.4 milestone Aug 9, 2019
chadsr added a commit to chadsr/swarm that referenced this pull request Sep 23, 2019
* 'master' of github.com:ethersphere/swarm: (54 commits)
  api, chunk, cmd, shed, storage: add support for pinning content (ethersphere#1509)
  docs/swarm-guide: cleanup (ethersphere#1620)
  travis: split jobs into different stages (ethersphere#1615)
  simulation: retry if we hit a collision on tcp/udp ports (ethersphere#1616)
  api, chunk: rename Tag.New to Tag.Create (ethersphere#1614)
  pss: instrumentation and refactor (ethersphere#1580)
  api, cmd, network: add --disable-auto-connect flag (ethersphere#1576)
  changelog: fix typo (ethersphere#1605)
  version: update to v0.4.4 unstable (ethersphere#1603)
  swarm: release v0.4.3 (ethersphere#1602)
  network/retrieve: add bzz-retrieve protocol (ethersphere#1589)
  PoC: Network simulation framework (ethersphere#1555)
  network: structured output for kademlia table (ethersphere#1586)
  client: add bzz client, update smoke tests (ethersphere#1582)
  swarm-smoke: fix check max prox hosts for pull/push sync modes (ethersphere#1578)
  cmd/swarm: allow using a network interface by name for nat purposes (ethersphere#1557)
  pss: disable TestForwardBasic (ethersphere#1544)
  api, network: count chunk deliveries per peer (ethersphere#1534)
  network/newstream: new stream! protocol base implementation (ethersphere#1500)
  swarm: fix bzz_info.port when using dynamic port allocation (ethersphere#1537)
  ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants