Multiply OTBR do not allow scaling of devices number limit #12746
-
We are trying to build an OpenThread mesh network using multiple OTBRs. With a single OTBR, we can connect around 25 devices, and they are able to reach external resources (outside the mesh), such as IoT Hub.

After adding a second OTBR and another 25 devices (all devices are in one large room and can see each other), the new OTBR becomes a secondary router. The new devices do not connect to IoT Hub. In fact, some of the first 25 devices also lose connectivity to IoT Hub. It looks like there is no proper scaling mechanism and the mesh becomes overloaded. All traffic leaving the mesh goes through a single border router, creating a bottleneck. We are using NAT64.

The secondary OTBR sees the primary one over both radio and Ethernet. We tried modifying the radio selection algorithm to always prefer TREL. As a result, the two OTBRs started communicating via TREL instead of radio. However, all devices still use the primary OTBR to send their data outside the mesh, so the bottleneck remains.

Is there any way to force some devices to route their traffic through the second OTBR? Should both OTBRs use different prefixes? Is there a way to create sub-networks or partitions within a single network? Or maybe NAT64 is the problem because two OTBRs cannot have it enabled at the same time. Or maybe we should use separate partitions and both OTBRs should provide a synchronization mechanism for communication between both partitions?
Replies: 5 comments 3 replies
-
If you are using NAT64 on the BR itself, then you would be limited by the fact that only one of the BRs would act as the NAT64 translator. The fact that all traffic will go through that BR is expected and normal behavior in such a case.
-
NAT64 is stateful. As a result, all traffic for a given NAT64 prefix must flow through a single NAT64 translator. This is not specific to Thread. Can you describe what is causing the bottleneck performance issue? You mentioned enabling TREL, so the 802.15.4 channel utilization should not be the bottleneck.
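To illustrate why state ties a NAT64 prefix to one translator, here is a minimal Python sketch. It is a simplified model, not OpenThread's implementation: the `Nat64Translator` class, its port allocation scheme, and the example addresses are all hypothetical, and the well-known prefix `64:ff9b::/96` is used only for illustration (a border router may advertise a different prefix).

```python
import ipaddress

# Illustrative prefix only; a real deployment may use a locally generated one.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_nat64_address(prefix, ipv4):
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(int(prefix.network_address) | v4)


class Nat64Translator:
    """Hypothetical stateful translator: one binding table per instance."""

    def __init__(self, public_ipv4, first_port=40000):
        self.public_ipv4 = public_ipv4
        self._next_port = first_port
        self._outbound = {}  # (ipv6 source, source port) -> public port
        self._inbound = {}   # public port -> (ipv6 source, source port)

    def translate_outbound(self, src6, src_port):
        """Create (or reuse) a binding for a flow leaving the mesh."""
        key = (src6, src_port)
        if key not in self._outbound:
            port = self._next_port
            self._next_port += 1
            self._outbound[key] = port
            self._inbound[port] = key
        return (self.public_ipv4, self._outbound[key])

    def translate_inbound(self, public_port):
        """Map a reply back to the mesh device, or None if no binding exists."""
        return self._inbound.get(public_port)


if __name__ == "__main__":
    print(synthesize_nat64_address(NAT64_PREFIX, "192.0.2.33"))
    br1 = Nat64Translator("203.0.113.1")
    br2 = Nat64Translator("203.0.113.2")
    _, port = br1.translate_outbound("fd00::1", 5683)
    print(br1.translate_inbound(port))  # binding exists on br1
    print(br2.translate_inbound(port))  # br2 never saw the outbound packet
```

In this model, a reply that reaches the second translator cannot be mapped back to the device, because the binding lives only on the translator that handled the outbound packet.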
-
In Case 1, NAT64 is not the bottleneck. A larger number of OTBRs might eventually cause CPU or RAM pressure on the primary border router; for example, five OTBRs serving more than 100 devices could become problematic. But with only two OTBRs, everything works fine. Regarding “multiple Thread networks”:



Here are some insights that might help clarify your observations:

a) Distinguishing BR Bottleneck vs. RF Congestion

Yes, the MAC counters you mentioned are reliable indicators of RF channel conditions:
- TxErrCca: Increments when the device fails to transmit because the Clear Channel Assessment (CCA) detected that the channel was busy.
- TxErrBusyChannel: Increments when a frame is dropped due to repeated channel access failures.

Patterns indicating RF Congestion:
- TxErrCca and TxErrBusyChannel relative to successful transmissions (TxSuccess).

Patterns indicating BR…
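
The RF congestion check described above, comparing channel access failures against successful transmissions, could be sketched roughly as follows. This is only an illustration: the counter names are taken from this reply, the input dictionary is assumed to have been collected from the device by some other means, and the 10% threshold is an arbitrary assumption rather than a recommended value.

```python
def channel_failure_ratio(counters):
    """Return (TxErrCca + TxErrBusyChannel) / TxSuccess.

    Returns None when there are no successful transmissions to compare against.
    """
    failures = counters.get("TxErrCca", 0) + counters.get("TxErrBusyChannel", 0)
    successes = counters.get("TxSuccess", 0)
    if successes == 0:
        return None
    return failures / successes


def looks_rf_congested(counters, threshold=0.10):
    """Flag a device whose failure ratio exceeds an assumed threshold."""
    ratio = channel_failure_ratio(counters)
    return ratio is not None and ratio > threshold


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {"TxSuccess": 1000, "TxErrCca": 180, "TxErrBusyChannel": 20}
    print(channel_failure_ratio(sample), looks_rf_congested(sample))
```

Comparing this ratio across devices, and before and after adding the second OTBR, may help separate channel contention from a border router bottleneck.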