Issue with 22+ devices in a mesh #8

benhaley · 2016-09-14T00:14:45Z

Babeld has been working great for me with a small number of devices. When I add the 22nd device to a mesh network the amount of network chatter between machines went up tremendously; to the point that the management overhead prevents other data from getting transferred. It is reproducible in my environment.

I have tried reducing the hello interval and traced through the code to see what is happening. Have not spotted a hard limit on the number of devices or routes that can be supported. I am seeing evidence that a device gets dropped the set of neighbors and then new packets from that device trigger the new neighbor behavior.

Any suggestions on possible causes?

benhaley · 2016-09-14T18:40:29Z

The problem seems to come from the sequence number check logic. If the network has some packet loss or packets arrive out of order, the reach can get cleared. This causes babel to send hellos immediately; adding to the congestion. The result is that in a network that is suffering congestion problems, babel ramps up the volume tremendously. Removing the send_hello solved the problem for me, but that is probably not a general solution for mobile networks.

dtaht · 2017-02-20T18:41:57Z

I am curious where you made this change? I was looking over this section of code also, and seeing what you were seeing... https://github.com/dtaht/rabeld/blob/master/neighbour.c#L144

jech · 2017-03-09T00:50:31Z

Reopening, this looks interesting.

m13253 · 2018-03-24T19:45:24Z

Reproduced on a ZeroTier mesh network production environment with 9 nodes.

My network contains 8 nodes across the world with low packet loss, and one node with high packet loss to some (but not all) of the other nodes.
As soon as I connect the 9th node to babel, very high rate of packet loss and even intermittent interruptions that sometimes last for minutes are observed.
An interesting fact is that affected routes, according to mtr, does not go through the problematic 9th node.

My babeld configuration is:

interface zt0 type tunnel link-quality true rxcost 16 hello-interval 1 rtt-min 16 rtt-max 1024 max-rtt-penalty 1008

P.S. I cannot tell precisely if my problem is exactly the same with this issue, since some of my nodes are located in China, where a packet with certain combination of bytes can kill the entire TCP/UDP connection.

christf · 2018-08-12T16:51:24Z

@m13253 could you please have a look with your network sniffer of choice? It would be interesting to learn more about what is going on.

m13253 · 2018-08-13T08:43:08Z

@m13253 could you please have a look with your network sniffer of choice? It would be interesting to learn more about what is going on.

Nice idea.

But the original network is for productive use now that I would not want to add a node with 60% packet loss every day at 8pm.
I plan to build another test network to reproduce the problem.

ealione · 2018-09-07T11:41:07Z

I have the same issue with around 9 nodes, with link metrics growing to such levels that communication becomes almost impossible, could this have something to do with the wireless cards switching to a lower modulation due to the many interrupts from the nodes communicating with each other constantly?

jech · 2018-09-07T14:55:26Z

I have the same issue with around 9 nodes, with link metrics growing to such levels that communication becomes almost impossible, could this have something to do with the wireless cards switching to a lower modulation due to the many interrupts from the nodes communicating with each other constantly?

Probably not. Link quality sensing is done over multicast, which is unreliable in the presence of congestion. The network is probably dropping too many multicasts, and Babel is reacting to the resulting packet loss. We need to do more work on wireless link quality sensing. RFC 6126bis has provisions for link quality sensing over unicast, but we need to do more experimentation with that -- and other things are more important right now.

…

-- Juliusz

dtaht · 2018-09-08T15:10:18Z

One thing I've had to do is arbitrarily mark all babel packets (on my extensively fq_codel'd networks) as ecn-capable in net.c. Perhaps I've been treating a symptom.

jech · 2018-09-16T10:26:19Z

Anyone still inverstigating this? I'd really like to understand what's going on.

m13253 · 2018-09-16T11:45:59Z

Anyone still inverstigating this? I'd really like to understand what's going on.

I can't reproduce the problem with my current network condition any more.

And, perhaps the problem will be fixed when unicast is merged to the mainline.

tecoboot · 2018-12-08T19:52:27Z

I'm playing a bit with CORE emulator and babeld.
This problem seems easy to reproduce, I've a small script that generates whatever number of nodes, fully meshed, no loss.
With <20 nodes, no issues.
With ~ 35 nodes, I have huge load, almost all IHU. Rxcost is 0xffff or 0x060.
Sometimes a packet has multiple IHU for same neighbor.

Here a pcap, captured during slow start: each second an additional node is started.
babeld-35nodes-1s-slowstart-1000packets.pcapng.zip

dtaht · 2018-12-08T20:32:32Z

@tecoboot - nice to see you! Could you share your core emulator setup? I've been trying to get to a better emulation envionrment myself

I note there is a bug in mainline babeld at the moment... fixed in a patch I sent to the mailing list a few weeks back. Might be related to this....

tecoboot · 2018-12-09T08:12:34Z

Maybe I can help to get babeld in good shape. I heard about disappointing results where babel was compared to olsrv2. Too bad (to be true?).

I run core in Xubuntu 1804, in VMware Fusion 11, on new MacBook. Any Linux will do, but be careful running it on a native OS, as normal users can easily run privileged code.
Installation:
https://github.com/coreemu/core

The babeld config generator puts N routers in a square, all connected to a wireless lan.
I use sleep pauses to set up Wireshark on the wlan (is a bridge interface set up by core when config is started). And to start babeld one by one.

The results I shared was with current ubuntu provided babeld: 1.7.0.

When this issue is fixed, I want to dig into packet overhead (why no address compression for IHU?), dynamic timers (triple?) etc. I only will ask questions and share results, I don't code myself.
babeld-core.zip

How it looks like:

I note there is a bug in mainline babeld at the moment... fixed in a patch I sent to the mailing list a few weeks back. Might be related to this....

I'll check. I'll use master for follow-up tests.

tecoboot · 2018-12-09T08:49:35Z

OK, with babeld-1.8.4-42-g73d0b1a I have better results.
During startup, overhead is quite high. In my emulator, the topology converges and load reduces from 1Mb/s to 40kb/s. I'll test with emane (L1-simulator, works well with core).

BTW: i didnt test cleanup of my generator script. Or add rm config file or change >> to > in cat config line.

tecoboot · 2018-12-10T20:31:11Z

Hmm, retried on my older macbook (Core2 duo), bad results. Exactly the same script, but now a total network meltdown. The 35 babeld instances eaten all the CPU and it was hard to stop core or wireshark. Nice test tool for linux dispatcher and wireshark memory management :-).

The sad news is that babeld cannot handle a bottleneck. Don't know if it is caused by shortage of CPU or lost packets due to shortage of CPU.

I reran the test, now with longer delays: first pause of 100s before starting the babeld daemons. Then starting babeld one by one, 10s interval. This works well.

But in real networks, we don't want the meltdown trigger, do we?

I'll need some thoughts on how to locate the trigger.

tecoboot · 2018-12-10T20:47:22Z

Here the slowstart pcap file.

babeld-35nodes-slowstart-10s.zip

Still lots of requests and IHU.
Without average, lots of spikes.

tecoboot · 2018-12-11T07:49:52Z

I reran with 8 nodes. I turned off dynamic Rxcost, to eliminate IHU caused by slow metric algorithm and IHUs to exchange that. I expected just a few packets to insert the new node in the topology.

babeld-8nodes-slowstart-10s.zip

See for example addition of second node in packets 9 to 24. It takes:
Hello: 9
Update: 3
NH: 1
RouterID: 1
IHU: 10
Request: 2
(and a couple of ICMPv6)
My thoughts:

Far to many IHUs and Hellos. Seems like a bug.
Learning RouterID, ND etc is slow, newly joined nodes have to wait for update interval.
@Juliusz?

dtaht · 2018-12-11T17:49:19Z

Excellent work, thanks! Could you apply this patch?

[PATCH 1/2] Fix ifup bug in send_multicast

From: Dave Taht <dave@taht.net>

It was essentially a no-op with an inverted test.
---
 message.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/message.c b/message.c
index 043e4b6..930fc69 100644
--- a/message.c
+++ b/message.c
@@ -1746,7 +1746,7 @@ send_multicast_request(struct interface *ifp,
     if(ifp == NULL) {
         struct interface *ifp_auxn;
         FOR_ALL_INTERFACES(ifp_auxn) {
-            if(if_up(ifp_auxn))
+            if(!if_up(ifp_auxn))
                 continue;
             send_multicast_request(ifp_auxn, prefix, plen, src_prefix, src_plen);
         }
@@ -1765,10 +1765,10 @@ send_multicast_request(struct interface *ifp,
             if(neigh->ifp == ifp) {
                 send_request(&neigh->buf, prefix, plen,
                              src_prefix, src_plen);
-            } else {
-                send_request(&ifp->buf, prefix, plen, src_prefix, src_plen);
             }
         }
+    } else {
+        send_request(&ifp->buf, prefix, plen, src_prefix, src_plen);
     }
 }

dtaht · 2018-12-11T18:18:29Z

As for "not being able to handle a bottleneck" - babeld has two problems - one we've seen pathlogical cases like yours that exhaust local cpu, hellos become late, retransmits spike, and more cpu gets used - which I exposed via my "rtod" tool a few months back. One fallout of that was that babel now logs late hellos, and it would be interesting if you are seeing that in your tests. Finding and fixing those cases is definately high on my mind. The other, is that babel uses naive data structures which have a tendency not to scale well (while you are exhausting cpu). I've got some work in progress adding hashes to multiple formerly linked list lookups that appear quite promising, and @christf has been working on improving the scheduler.

tecoboot · 2018-12-11T19:40:57Z

Patched.
On slow-start with 8 nodes, I don't see a difference.
Here log (-d 1) from node 1.

babeld-8nodes-node1.zip

Do you want me to run the meltdown test?
I prefer handling the huge number of helloś and IHUs first.
For example frame 16, 3x IHU for same neighbor. Silly behavior.

babeld-8nodes-slowstart-10s-11dec2018-2026.zip

Frame 16: 122 bytes on wire (976 bits), 122 bytes captured (976 bits) on interface 0
Ethernet II, Src: 00:00:00_aa:00:00 (00:00:00:aa:00:00), Dst: IPv6mcast_01:00:06 (33:33:00:01:00:06)
Internet Protocol Version 6, Src: fe80::200:ff:feaa:0, Dst: ff02::1:6
User Datagram Protocol, Src Port: 6696, Dst Port: 6696
Babel Routing Protocol
Magic: 42
Version: 2
Body Length: 56
Message hello (4)
Message Type: hello (4)
Message Length: 6
Seqno: 0x8512
Interval: 400
Message ihu (5)
Message Type: ihu (5)
Message Length: 14
Rxcost: 0x0060
Interval: 1200
Address: fe80::200:ff:feaa:1
Address Encoding: Link-Local IPv6 (3)
Raw Prefix: 006004b0020000fffeaa0001
Message ihu (5)
Message Type: ihu (5)
Message Length: 14
Rxcost: 0x0060
Interval: 1200
Address: fe80::200:ff:feaa:1
Address Encoding: Link-Local IPv6 (3)
Raw Prefix: 006004b0020000fffeaa0001
Message ihu (5)
Message Type: ihu (5)
Message Length: 14
Rxcost: 0x0060
Interval: 1200
Address: fe80::200:ff:feaa:1
Address Encoding: Link-Local IPv6 (3)
Raw Prefix: 006004b0020000fffeaa0001

Maybe it has to do with updates of internal state, eg rechability.

dtaht · 2018-12-11T19:52:03Z

Teco Boot <notifications@github.com> writes:

Patched. On slow-start with 8 nodes, I don't see a difference. Here log (-d 1) from node 1. babeld-8nodes-node1.zip Do you want me to run the meltdown test?

Heh. I LOVE melting things down. :) But no, I agree with your below...

I prefer handling the huge number of helloś and IHUs first. For example frame 16, 3x IHU for same neighbor. Silly behavior.

You are right that there is a bug there! Regrettably I don't have time this week to jump on this, and juliusz is busy with finals and so on, so I mentioned it on the babel-users list. It sounds like we have at *least* two bugs here. :sigh: I'll have time next week.

…

babeld-8nodes-slowstart-10s-11dec2018-2026.zip Frame 16: 122 bytes on wire (976 bits), 122 bytes captured (976 bits) on interface 0 Ethernet II, Src: 00:00:00_aa:00:00 (00:00:00:aa:00:00), Dst: IPv6mcast_01:00:06 (33:33:00:01:00:06) Internet Protocol Version 6, Src: fe80::200:ff:feaa:0, Dst: ff02::1:6 User Datagram Protocol, Src Port: 6696, Dst Port: 6696 Babel Routing Protocol Magic: 42 Version: 2 Body Length: 56 Message hello (4) Message Type: hello (4) Message Length: 6 Seqno: 0x8512 Interval: 400 Message ihu (5) Message Type: ihu (5) Message Length: 14 Rxcost: 0x0060 Interval: 1200 Address: fe80::200:ff:feaa:1 Address Encoding: Link-Local IPv6 (3) Raw Prefix: 006004b0020000fffeaa0001 Message ihu (5) Message Type: ihu (5) Message Length: 14 Rxcost: 0x0060 Interval: 1200 Address: fe80::200:ff:feaa:1 Address Encoding: Link-Local IPv6 (3) Raw Prefix: 006004b0020000fffeaa0001 Message ihu (5) Message Type: ihu (5) Message Length: 14 Rxcost: 0x0060 Interval: 1200 Address: fe80::200:ff:feaa:1 Address Encoding: Link-Local IPv6 (3) Raw Prefix: 006004b0020000fffeaa0001 Maybe it has to do with updates of internal state, eg rechability. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

MisterDA · 2018-12-11T22:20:25Z

Is there a way to profile babeld while you run it in the emulator?

tecoboot · 2018-12-12T05:31:47Z

I guess so. Each router is a mini VM in a container. Double click opens a root shell. What is your interest?

…

Op 11 dec. 2018 om 23:20 heeft Antonin Décimo ***@***.***> het volgende geschreven: Is there a way to profile babeld while you run it in the emulator? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

MisterDA · 2018-12-12T17:26:23Z

They say one should profile before optimizing. I’d be interested to have real-world data on where babeld spends its time and look there.
The easiest is to compile with -pg option, then when babeld exits, it will write its call graph to a gmon.out file wich can be processed with the gprof tool.

dtaht · 2018-12-13T20:42:42Z

well, kill it when it's misbehaving, after compiling with -pg, and let us know what gprof reports? I have at least one other patch that might be relevant (it's in my uthash branch) - but I'm certain we have at least one real bug here.

christf · 2018-12-19T15:40:18Z

According to the code there is one IHU (that is including marginal IHU) per Hello.

Also, the hello-count of 9 seems high.

jech · 2018-12-22T17:15:34Z

Thanks, that's very helpful.

Could you please try again after applying the following patch?

diff --git a/neighbour.c b/neighbour.c
index 8bf53a2..7604b4c 100644
--- a/neighbour.c
+++ b/neighbour.c
@@ -201,12 +201,6 @@ update_neighbour(struct neighbour *neigh, struct hello_history *hist,
         }
     }
 
-    if((hist->reach & 0xFC00) == 0xC000) {
-        /* This is a newish neighbour, let's request a full route dump.
-           We ought to avoid this when the network is dense */
-        send_unicast_request(neigh, NULL, 0, NULL, 0);
-        send_ihu(neigh, NULL);
-    }
     return rc;
 }

jech · 2018-12-22T18:05:04Z

After testing myself, this patch appears to pretty much solve the problem. It will slow down insertion of a new node, but I think avoiding a network meltdown is worth it.

tecoboot · 2018-12-23T12:26:42Z

I'm not so sure this patch helps much. I still can see the silly IHU behavior. There are other packet storms.

See a couple of storms and the IHU peaks; 49 nodes, 10s interval between babeld start:

Normal node insertion (not one of the three peaks):

Router-ID storm during first peak:

babeld-49nodes-10s-delay.pcapng.zip

I think the triggered messages needs much more thoughts.

jech · 2018-12-23T16:07:59Z

I'm not so sure this patch helps much. I still can see the silly IHU behavior. There are other packet storms.

The issue here is that we're sending a Hello whenever we learn a new peer, and we always send IHUs together with the Hello. I think the fundamental sin is not rate-limiting the Hellos, the IHUs are just a consequence. Sending an unscheduled Hello to a new peer is a good idea, we just shouldn't be doing it when we quickly learn a sequence of new nodes in quick succession. Ideas?

Router-ID storm during first peak:

The router-ids are an epiphenomenon -- what we've got here is an Update storm. It's not too extreme, but yes, the triggered update logic might be too aggressive.

…

-- Juliusz

tecoboot · 2018-12-23T18:26:23Z

Sending an unscheduled Hello to a new peer is a good idea, we just shouldn't be doing it when we quickly learn a sequence of new nodes in quick succession. Ideas?

The interval between startup of nodes was 10s.I don't think this is "in quick succession".
My first testrun was with concurrent startup. This resulted in a meltdown.

Can the behavior be caused by triggered messages, without any checks if such messages can be delayed, combined and/or suppressed due to duplicates? I would say a message should not be send more than X times in Y centiseconds, e.g. 1 time each 50 centiseconds.

jech · 2018-12-23T22:31:42Z

What about this? (Not committing yet, I need to think over the consequences.)

diff --git a/neighbour.c b/neighbour.c
index 7604b4c..c478762 100644
--- a/neighbour.c
+++ b/neighbour.c
@@ -117,7 +117,6 @@ find_neighbour(const unsigned char *address, struct interface *ifp)
     neigh->next = neighs;
     neighs = neigh;
     local_notify_neighbour(neigh, LOCAL_ADD);
-    send_hello(ifp);
     return neigh;
 }
 
@@ -184,23 +183,6 @@ update_neighbour(struct neighbour *neigh, struct hello_history *hist,
             rc = 1;
     }
 
-    if(unicast)
-        return rc;
-
-    /* Make sure to give neighbours some feedback early after association */
-    if((hist->reach & 0xBF00) == 0x8000) {
-        /* A new neighbour */
-        send_hello(neigh->ifp);
-    } else {
-        /* Don't send hellos, in order to avoid a positive feedback loop. */
-        int a = (hist->reach & 0xC000);
-        int b = (hist->reach & 0x3000);
-        if((a == 0xC000 && b == 0) || (a == 0 && b == 0x3000)) {
-            /* Reachability is either 1100 or 0011 */
-            send_self_update(neigh->ifp);
-        }
-    }
-
     return rc;
 }

tecoboot · 2018-12-24T09:58:49Z

This helps a lot. Peaks up to 100kbit/sec, avg 40kbit/sec.

babel packets: 2989

msg. packets. #msgs
hello 2351 2351
ihu 2268 23192
router-id 1673 17034
nh 1673 1673
request 70 70
update 1708 17104

babeld-24dec2018.pcapng.gz

Question is: is there a convergence-time trade-off?

tecoboot · 2018-12-24T10:09:15Z

I tried the 49-node concurrent start also. Looks much better :-)
Keep in mind this is ethernet bridging, not EMANE or real wireless. I'll work with EMANE as next step.

babeld-24dec2018-concurrent-start.pcapng.gz

jech · 2018-12-24T11:18:52Z

Question is: is there a convergence-time trade-off?

Yes. The code in question sends an update whenever it sees a new node. This is bad in the very dense network that you're simulating -- a node discovers its neighbours one by one, and sends an update for each. The tradeoff, of course, is that now a node doesn't receive an immediate update when it moves from one neighborhood of the network to another, it must wait for the periodic update. A ------------ B A ------------ B | ----> | C C (If you could simulate the scenario above, that would be helpful.) Of course, I'm entirely in favour of avoiding a meltdown, even in a rather unrealistic situation (Babel is not designed for very dense networks), so I will remove the faulty code. I just want to think it over some more.

…

-- Juliusz

dtaht · 2018-12-24T14:23:51Z

One of my overall proposals is that the code start taking advantage of having different length intervals available in the protocol. In this case, as the observed density (in IHUs and/or hellos, or route announcements with a unique routerid) goes up, the hello interval, increases. You have 4 speakers, use 4sec, perhaps w/16, 32sec. (pick a scaling factor, any one...) a hello interval of merely " observed speakers"... essentially preserves babel's existing properties for less dense meshes - a lonely hello would be 1sec (or min 2) which might improve matters some on sparse meshes...

Same concept applies to route announcements also, although I'd announce default routes more frequently than obscure routes, with a stated longer interval.

A new node will typically learn "enough" of the topology within a few route transfers, also. Trying to pick a "good" node(s) to request a route dump from could possibly be smarter. Picking the most disjoint set of hellos/ihus? Ask for a route dump from the best + worst + random, initially? Do you even need to have a hello/ihu exchange to start picking up routes or to calculate number of speakers?

If reachability is on the decline, rather than increase, additively decrease the announced hello or route announcement interval. (this is a very tcp-ish idea)

@tecoboot I'd rather like to witness what happens on this test when each node has 1k (or more!) distinct routes. From within a bunker at a safe distance. With popcorn. (I do really want to set this emulation environment up, but am working on revising my rtod tool instead). I'm loving these graphs, overall.

dtaht · 2018-12-24T16:13:30Z

@tecoboot Re: "For example frame 16, 3x IHU for same neighbor. Silly behavior."... later... "I still see the silly ihu behavior"...

By this "silly behavior" you mean the more than one IHU per neighbor is still there? I see in frame 36 of babeld-24dec2018-concurrent-start.pcapng.gz that it is, indeed, repeating fe80::200:ff:feaa:2 & :1 twice, me, skipping randomly forward by eyeball, it's doing the same on frame 85. So there is a bug here, also, in how ihus are merged.

dtaht · 2018-12-24T17:14:59Z

@m13253 - I was looking over the zerotier thing, which looks pretty neat though I don't grok how it works. Along the way I noticed the https://github.com/nanomsg/nng library which might be a substrate worth building other things on.

tecoboot · 2018-12-24T18:49:03Z

Question is: is there a convergence-time trade-off?
Yes. The code in question sends an update whenever it sees a new node.

I guessed so.

This is bad in the very dense network that you're simulating

This issue was reported with 22 and 9 nodes.
My current test uses 35 nodes (not 49 yet). I'll add more to see how far I can push babeld.

Of course, I'm entirely in favour of avoiding a meltdown, even in a rather unrealistic situation (Babel is not designed for very dense networks), so I will remove the faulty code. I just want to think it over some more.

Maybe keep the trigged mode with non-default config option. With UseAtYourOwnRisk in man page.

My thoughts are in line with what Dave suggest: smarter triggered updates with variable timers. A smart message scheduler could incorporate line utilization (like Cisco EIGRP, don't blow up the medium), link quality (higher repetition rate on lossy links), neighbor state (don't send unused messages at high frequency). The goal would be: important messages go first.

tecoboot · 2018-12-24T18:59:47Z

@tecoboot Re: "For example frame 16, 3x IHU for same neighbor. Silly behavior."... later... "I still see the silly ihu behavior"...

By this "silly behavior" you mean the more than one IHU per neighbor is still there? I see in frame 36 of babeld-24dec2018-concurrent-start.pcapng.gz that it is, indeed, repeating fe80::200:ff:feaa:2 & :1 twice, me, skipping randomly forward by eyeball, it's doing the same on frame 85. So there is a bug here, also, in how ihus are merged.

Yes, still unneeded redundancy :-)

It was frame 38 (not 36), IHU neighbor addresses in same packet, sorted :
Address: fe80::200:ff:feaa:0
Address: fe80::200:ff:feaa:1
Address: fe80::200:ff:feaa:1
Address: fe80::200:ff:feaa:2
Address: fe80::200:ff:feaa:2
Address: fe80::200:ff:feaa:3
Address: fe80::200:ff:feaa:3
Address: fe80::200:ff:feaa:5
Address: fe80::200:ff:feaa:5
Address: fe80::200:ff:feaa:6
Address: fe80::200:ff:feaa:6
Address: fe80::200:ff:feaa:7
Address: fe80::200:ff:feaa:7

This could be caused by message queuing, but lack of removal from queue when message is updated.
Addresses and Rxcost:
Rxcost: 0x03ff
Address: fe80::200:ff:feaa:2
Rxcost: 0x01ff
Address: fe80::200:ff:feaa:0
Rxcost: 0x03ff
Address: fe80::200:ff:feaa:1
Rxcost: 0x0155 (was 0x03ff)
Address: fe80::200:ff:feaa:2
Rxcost: 0x0155 (was 0x03ff)
Address: fe80::200:ff:feaa:1
Rxcost: 0x03ff
Address: fe80::200:ff:feaa:6
Rxcost: 0x03ff
Address: fe80::200:ff:feaa:3
Rxcost: 0x03ff
Address: fe80::200:ff:feaa:5
Rxcost: 0x03ff
Address: fe80::200:ff:feaa:7
Rxcost: 0x0155 (was 0x03ff)
Address: fe80::200:ff:feaa:5
Rxcost: 0x0155 (was 0x03ff)
Address: fe80::200:ff:feaa:7
Rxcost: 0x0155 (was 0x03ff)
Address: fe80::200:ff:feaa:6
Rxcost: 0x0155 (was 0x03ff)
Address: fe80::200:ff:feaa:3

Could the duplicates be removed?

tecoboot · 2018-12-24T19:48:26Z

Wit concurrent startup with 48, 63 and 99 nodes, there was no meltdown :-)

48 nodes, 70kbit/sec:

63 nodes, 120kbit/sec:

99 nodes, 280kbit/sec:

Core with 99 nodes:

I'm quite sure that this density with real radio's will end up in lousy performance. But good to know the protocol and implementation can handle this.

jech · 2018-12-26T23:03:55Z

I've committed just part of this patch, the bit about sending Hellos proactively. The other part I don't feel comfortable with removing, although I don't like the current code. Teco, thanks for showing me this, that's a lot of help.

spiccinini · 2019-05-29T20:04:10Z

Hi @jech thanks for working on this!

I want to let you know that we are currently using / testing 39c5f0a at some small (<20 nodes) community networks in Argentina and Brazil with good results for now. For the following weeks we will be performing a test in a ~60 nodes network. We will update with results.

dtaht · 2019-05-29T20:09:38Z

With that many nodes, also giving the new unicast support a go might be interesting.

jech · 2019-06-28T15:16:01Z

I've committed the rest of the patch in b66c173, so this issue is believed to be fixed.

Thanks to all of you, please create a new issue if there's anything more to do.

benhaley closed this as completed Sep 14, 2016

jech reopened this Mar 9, 2017

jech closed this as completed Dec 22, 2018

jech reopened this Dec 23, 2018

jech closed this as completed Jun 28, 2019

Issue with 22+ devices in a mesh #8

Issue with 22+ devices in a mesh #8

Comments

benhaley commented Sep 14, 2016

benhaley commented Sep 14, 2016

dtaht commented Feb 20, 2017

jech commented Mar 9, 2017

m13253 commented Mar 24, 2018 • edited Loading

christf commented Aug 12, 2018

m13253 commented Aug 13, 2018 • edited Loading

ealione commented Sep 7, 2018

jech commented Sep 7, 2018 via email

dtaht commented Sep 8, 2018

jech commented Sep 16, 2018

m13253 commented Sep 16, 2018

tecoboot commented Dec 8, 2018

dtaht commented Dec 8, 2018

tecoboot commented Dec 9, 2018

tecoboot commented Dec 9, 2018

tecoboot commented Dec 10, 2018

tecoboot commented Dec 10, 2018

tecoboot commented Dec 11, 2018

dtaht commented Dec 11, 2018 • edited Loading

dtaht commented Dec 11, 2018

tecoboot commented Dec 11, 2018

dtaht commented Dec 11, 2018 via email

MisterDA commented Dec 11, 2018

tecoboot commented Dec 12, 2018 via email

MisterDA commented Dec 12, 2018

dtaht commented Dec 13, 2018

christf commented Dec 19, 2018

jech commented Dec 22, 2018

jech commented Dec 22, 2018

tecoboot commented Dec 23, 2018

jech commented Dec 23, 2018 via email

tecoboot commented Dec 23, 2018

jech commented Dec 23, 2018

tecoboot commented Dec 24, 2018

tecoboot commented Dec 24, 2018

jech commented Dec 24, 2018 via email

dtaht commented Dec 24, 2018 • edited Loading

dtaht commented Dec 24, 2018 • edited Loading

dtaht commented Dec 24, 2018

tecoboot commented Dec 24, 2018

tecoboot commented Dec 24, 2018

tecoboot commented Dec 24, 2018

jech commented Dec 26, 2018

spiccinini commented May 29, 2019

dtaht commented May 29, 2019 via email

jech commented Jun 28, 2019

m13253 commented Mar 24, 2018 •

edited

Loading

m13253 commented Aug 13, 2018 •

edited

Loading

dtaht commented Dec 11, 2018 •

edited

Loading

dtaht commented Dec 24, 2018 •

edited

Loading

dtaht commented Dec 24, 2018 •

edited

Loading