Add NetworkDB docs #2238

talex5 · 2018-07-20T10:32:35Z

This documentation addition is based on reading the code in the networkdb directory.

GordonTheTurtle · 2018-07-20T10:32:37Z

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "networkdb-docs" git@github.com:talex5/libnetwork.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

fcrisciani

Left some comments.
Definitely helpful for new people.

fcrisciani · 2018-07-20T17:19:14Z

docs/networkdb.md

+There are two databases used in libnetwork:
+
+- A persistent database that stores the network configuration requested by the user. This is typically the SwarmKit managers' raft store.
+- A non-persistent peer-to-peer gossip-based database that keeps track of the current runtime state. This is NetworkDB.


Maybe we can add that it is mainly used for transporting the data: no actual Get is done on the DB itself, and it is used mainly as a pub/sub mechanism.

talex5 · 2018-08-08T10:59:41Z

I've added a bit more about this below ("Nodes look up information using their local networkdb instance. Queries are not sent to remote nodes.").
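
To make the "local lookups plus pub/sub" point concrete, here is a minimal sketch of the idea. Everything in it (`localStore`, `event`, `watch`, `apply`, `lookup`, and the table name `endpoint_table`) is a hypothetical illustration, not NetworkDB's actual API, and it is not safe for concurrent use.

```go
package main

import "fmt"

// event is a hypothetical change notification (not NetworkDB's real type).
type event struct {
	Table, Key string
	Value      []byte
}

// localStore is an illustrative stand-in for one node's local view of the tables.
type localStore struct {
	entries  map[string][]byte
	watchers []chan event
}

func newLocalStore() *localStore {
	return &localStore{entries: make(map[string][]byte)}
}

// watch returns a buffered channel that receives every later update.
func (s *localStore) watch() <-chan event {
	ch := make(chan event, 16)
	s.watchers = append(s.watchers, ch)
	return ch
}

// apply records an update (for example, one received from a peer)
// and publishes it to all watchers.
func (s *localStore) apply(table, key string, value []byte) {
	s.entries[table+"/"+key] = value
	for _, ch := range s.watchers {
		ch <- event{Table: table, Key: key, Value: value}
	}
}

// lookup answers from local state only; no query is sent to another node.
func (s *localStore) lookup(table, key string) ([]byte, bool) {
	v, ok := s.entries[table+"/"+key]
	return v, ok
}

func main() {
	s := newLocalStore()
	events := s.watch()
	s.apply("endpoint_table", "ep1", []byte("10.0.0.2"))
	ev := <-events
	fmt.Printf("%s/%s = %s\n", ev.Table, ev.Key, ev.Value)
}
```

In this sketch the subscriber reacts to the event rather than polling the store, which is the usage pattern the review comment describes.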

fcrisciani · 2018-07-20T17:21:29Z

docs/networkdb.md

+- For each peer node, the set of networks to which that node is connected.
+- For each of the node's currently-in-use networks, a set of named tables of key/value pairs.
+
+Updates are spread throughout the cluster through the gossip protocols, and nodes may have inconsistent views at any given time.


Plus periodic TCP syncs.

talex5 · 2018-08-08T10:58:54Z

I've reworded this to avoid mentioning gossip here. We explain about gossip and full syncs later.

fcrisciani · 2018-07-20T17:23:52Z

docs/networkdb.md

+Note that nodes only keep track of tables for networks to which they belong.
+Updates to a network's tables are only shared between nodes that are on that network.
+
+NetworkDB does not impose any structure on the tables.


Maybe we can rephrase this to say that NetworkDB is a key/value store, and the only requirement is that the key is a string and the value is a `[]byte`.

fcrisciani · 2018-07-20T17:25:01Z

docs/networkdb.md

+For example, there are tables for service discovery and load balancing,
+and the [overlay](overlay.md) driver uses NetworkDB to store routing information.
+
+All nodes in a libnetwork cluster join the gossip network.


s/network/cluster
Mentioning "network" here can be misleading, because it can be confused with libnetwork's own concept of a network.

fcrisciani · 2018-07-20T17:26:55Z

docs/networkdb.md

+
+All nodes in a libnetwork cluster join the gossip network.
+To do this, they need the IP address and port of at least one other member of the cluster.
+In the case of a SwarmKit cluster, for example, each Docker engine will use the IP addresses of the swarm managers as the initial join addresses.


We should also mention today's limitation: there is no feedback loop on the manager list, so if the IP addresses of the 3 managers change, the list won't be updated.

talex5 · 2018-08-08T09:15:41Z

It appears that `nDB.Join(addrs)` can be called at any time to update the list, so maybe this problem is outside of networkDB?
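
If the caller does own this problem, one possible shape for the feedback loop is sketched below. It assumes only what is said above, namely that a `Join` taking a list of addresses can be called again later. The `joiner` interface, `rejoinIfChanged`, `fakeJoiner`, and the addresses are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// joiner is a hypothetical interface for anything with a Join(addrs) method.
type joiner interface {
	Join(members []string) error
}

// sameMembers reports whether a and b hold the same addresses, ignoring order.
func sameMembers(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	as := append([]string(nil), a...)
	bs := append([]string(nil), b...)
	sort.Strings(as)
	sort.Strings(bs)
	for i := range as {
		if as[i] != bs[i] {
			return false
		}
	}
	return true
}

// rejoinIfChanged calls Join with the current manager list only when it
// differs from the previously known list. It reports whether Join was called.
func rejoinIfChanged(j joiner, known, current []string) (bool, error) {
	if sameMembers(known, current) {
		return false, nil
	}
	if err := j.Join(current); err != nil {
		return false, err
	}
	return true, nil
}

// fakeJoiner records Join calls so the sketch can run without a cluster.
type fakeJoiner struct{ calls [][]string }

func (f *fakeJoiner) Join(members []string) error {
	f.calls = append(f.calls, members)
	return nil
}

func main() {
	f := &fakeJoiner{}
	known := []string{"192.0.2.1:7946", "192.0.2.2:7946"}
	current := []string{"192.0.2.2:7946", "192.0.2.9:7946"}
	changed, err := rejoinIfChanged(f, known, current)
	fmt.Println(changed, err, len(f.calls))
}
```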

fcrisciani · 2018-07-20T17:34:10Z

docs/networkdb.md

+It will also perform a bulk-sync of the network-specific state (the tables) with every other node on the network being joined.
+This will allow it to get all the network-specific information quickly.
+The tables will mostly be kept up-to-date by UDP gossip messages between the nodes on that network, but
+each node in the network will also do a full TCP sync of the tables with another random node on the same network from time to time.


Maybe redundant with line 33? One of them can go.

talex5 · 2018-08-08T10:50:25Z

I think it's important to show that there are two systems here, even though they are similar. I've added a clarification of this below.
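
The relationship between the two paths can be sketched as follows: per-entry gossip updates and an occasional full sync both feed the same "newer version wins" merge, and the full sync repairs whatever gossip missed. This is a simplified model under that assumption; `entry`, `syncTable`, `applyUpdate`, and `fullSync` are hypothetical names, and no real UDP or TCP traffic is involved.

```go
package main

import "fmt"

// entry is a hypothetical versioned table entry.
type entry struct {
	Version uint64
	Value   []byte
}

// syncTable maps keys to versioned entries.
type syncTable map[string]entry

// applyUpdate models the gossip path: merge a single entry and keep it
// only if it is newer than what is already stored.
func (t syncTable) applyUpdate(key string, e entry) bool {
	if cur, ok := t[key]; ok && cur.Version >= e.Version {
		return false
	}
	t[key] = e
	return true
}

// fullSync models the bulk path: merge every entry from a peer's table
// and return how many local entries changed.
func (t syncTable) fullSync(peer syncTable) int {
	changed := 0
	for k, e := range peer {
		if t.applyUpdate(k, e) {
			changed++
		}
	}
	return changed
}

func main() {
	a, b := syncTable{}, syncTable{}

	// Both nodes receive the first update via gossip.
	a.applyUpdate("k1", entry{Version: 1, Value: []byte("v1")})
	b.applyUpdate("k1", entry{Version: 1, Value: []byte("v1")})

	// Node b misses the next two gossip messages (for example, dropped packets).
	a.applyUpdate("k1", entry{Version: 2, Value: []byte("v2")})
	a.applyUpdate("k2", entry{Version: 1, Value: []byte("x")})

	// A later full sync with node a repairs node b's view.
	fmt.Println("entries repaired by full sync:", b.fullSync(a))
}
```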

fcrisciani · 2018-07-20T18:13:14Z

docs/networkdb.md

+
+When a node wishes to leave a network, it will send a `NetworkEventTypeLeave` via gossip. It will then delete the network's table data.
+When a node hears that another node is leaving a network, it deletes all table entries belonging to the leaving node.
+Deleting an entry in this case means marking it for deletion for a while (so that the deletion can propagate via gossip too).


And, more importantly, so that the node can protect itself from receiving an old CREATE with a lower version and accepting it.

fcrisciani · 2018-07-20T18:14:17Z

docs/networkdb.md

+
+When a node wishes to leave the cluster, it will send a `NodeEventTypeLeave` message via gossip.
+Nodes receiving this will mark the node as "left".
+The node will then send a memberlist leave message too.


Meaning it will forward it to others?

talex5 · 2018-08-08T11:15:11Z

Sorry, that was unclear. I meant that the original node will send a memberlist leave message.

fcrisciani · 2018-07-20T18:16:12Z

networkdb/delegate.go

@@ -402,6 +403,7 @@ func (d *delegate) NotifyMsg(buf []byte) {
 	d.nDB.handleMessage(buf, false)
 }

+// XXX: should this limit be shared?
 func (d *delegate) GetBroadcasts(overhead, limit int) [][]byte {


Not sure why PR #1446 changed it this way. To me, the original code with the simple return makes more sense...

fcrisciani · 2018-07-20T18:21:25Z

networkdb/cluster.go

@@ -492,7 +492,7 @@ func (nDB *NetworkDB) gossip() {
 			nDB.RUnlock()

 			if mnode == nil {
-				break
+				break // XXX: shouldn't this be "continue"?


Yep, a `continue` makes more sense.
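
For reference, the behavioural difference can be shown with a toy loop. This assumes the nil check sits directly inside a loop over the node's networks; the function names and the `peers` map are hypothetical stand-ins, not the code in `cluster.go`.

```go
package main

import "fmt"

// gossipWithBreak stops at the first network that has no peer to gossip to,
// so every later network is skipped as well.
func gossipWithBreak(networks []string, peers map[string]string) []string {
	var done []string
	for _, nid := range networks {
		if peers[nid] == "" {
			break
		}
		done = append(done, nid)
	}
	return done
}

// gossipWithContinue skips only the network that has no peer and carries on
// with the remaining networks.
func gossipWithContinue(networks []string, peers map[string]string) []string {
	var done []string
	for _, nid := range networks {
		if peers[nid] == "" {
			continue
		}
		done = append(done, nid)
	}
	return done
}

func main() {
	networks := []string{"net1", "net2", "net3"}
	peers := map[string]string{"net1": "nodeA", "net3": "nodeC"} // net2 has no peer

	fmt.Println("break:   ", gossipWithBreak(networks, peers))
	fmt.Println("continue:", gossipWithContinue(networks, peers))
}
```

With `break`, `net3` never gets gossiped in this round even though it has a peer available; with `continue`, only `net2` is skipped.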

talex5

I've updated the doc based on @fcrisciani's feedback. I'll move the other (non-documentation) parts to another issue.

thaJeztah · 2019-01-23T20:46:54Z

ping @fcrisciani PTAL

fcrisciani

LGTM

GordonTheTurtle added the dco/no label Jul 20, 2018

talex5 force-pushed the networkdb-docs branch from 60a4701 to c7867aa Compare July 20, 2018 10:32

GordonTheTurtle removed the dco/no label Jul 20, 2018

talex5 force-pushed the networkdb-docs branch from c7867aa to e7c0e39 Compare July 20, 2018 11:14

fcrisciani reviewed Jul 20, 2018

Add NetworkDB docs

f5b53fe

This is based on reading the code in the `networkdb` directory. Signed-off-by: Thomas Leonard <thomas.leonard@docker.com>

talex5 force-pushed the networkdb-docs branch from e7c0e39 to f5b53fe Compare August 8, 2018 12:35

talex5 commented Aug 8, 2018

fcrisciani approved these changes Mar 14, 2019

fcrisciani merged commit ebcade7 into moby:master Mar 14, 2019
