olsrd fails to get routes when multiple mesh networks are present #14
Comments
When we were testing this a couple of weeks ago, we found that the issue wasn't with olsrd establishing routes, but rather with forming ad-hoc links with neighbors. Although the client's network-manager indicates a successful connection to the ad-hoc network, pinging or otherwise communicating with some or all of the ad-hoc peers fails (which then leads to olsrd not getting any packets from which to establish routes).
Hrm... that doesn't surprise me at all, and is wholly consistent with
I'm really certain this has nothing to do with interfering ad-hoc networks. We were able to reproduce the problem both when there were other ad-hoc networks present and when there weren't any others. I did extensive testing to show that the olsrd route problem occurred if and only if the client had bad links with all its ad-hoc neighbors. As soon as one good ad-hoc link was established, olsrd started receiving olsrd packets and building a routing table. This indicates that the problem was at the link layer, not the application layer. My working hypothesis is that since the problem occurs inconsistently while other factors are held constant, it is indicative of a race condition. I believe the network-manager scripts are running asynchronously, and that this is responsible for screwing up the network stack during the connection process, possibly related to wpa_supplicant. To test this hypothesis, we'd need to try joining the same ad-hoc network both through network-manager and without network-manager. If the latter doesn't show the same symptoms, then we'll know the source of the problem.
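To make the "one good link" observation easy to check during a test run, here is a minimal Python sketch that probes each ad-hoc peer and reports whether any link is usable. It is only an illustration, not part of commotion-linux: the function names (`ping_once`, `classify_links`, `has_usable_link`) are hypothetical, and it assumes a Linux `ping` that accepts `-c` and `-W`.

```python
import subprocess
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo to `host`; True if a reply arrived.

    Assumes the Linux iputils `ping` flags -c (count) and -W (timeout, seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify_links(
    peers: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
) -> Dict[str, bool]:
    """Map each peer address to True (good link) or False (bad link)."""
    return {peer: probe(peer) for peer in peers}


def has_usable_link(links: Dict[str, bool]) -> bool:
    """True if at least one peer answered.

    Per the observation above, olsrd only started building routes once
    at least one ad-hoc link was good.
    """
    return any(links.values())
```

Running `classify_links` once after joining through network-manager and once after joining without it, against the same list of peer addresses, would give a direct comparison of the two connection paths. The `probe` argument exists so the logic can be exercised without sending real packets.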
Okay, that last test is easy to do: the fallback version of commotion-linux circumvents network-manager entirely, and even includes a patch that (theoretically) disables channel hopping. I think I've seen this fallback method fail in precisely the same way that the normal method does - but I didn't control for exact order of network connection operations. So let's give this a try, perhaps tomorrow. Thanks for all the good debugging info!
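For anyone who wants to run the "without network-manager" half of the comparison by hand, the sketch below builds the `ip` and `iw` commands for a manual ad-hoc (IBSS) join in a fixed order. This is an assumption-laden illustration, not the commotion-linux fallback script: the helper name `ibss_join_commands` and the interface, SSID, frequency, and BSSID values are hypothetical, and it assumes network-manager has already been stopped or told not to manage the interface. The `fixed-freq` and fixed-BSSID arguments are standard `iw ibss join` options; whether they match what the channel-hopping patch does is not something this thread confirms.

```python
from typing import List, Optional


def ibss_join_commands(
    iface: str,
    ssid: str,
    freq_mhz: int,
    fixed_freq: bool = True,
    bssid: Optional[str] = None,
) -> List[List[str]]:
    """Return the commands for a manual IBSS join, in execution order.

    The commands are only built here, not run, so the exact sequence
    can be logged and repeated identically across test runs.
    """
    join = ["iw", "dev", iface, "ibss", "join", ssid, str(freq_mhz)]
    if fixed_freq:
        join.append("fixed-freq")  # ask the driver to stay on this frequency
    if bssid is not None:
        join.append(bssid)  # pin the cell ID instead of letting it be chosen
    return [
        ["ip", "link", "set", iface, "down"],
        ["iw", "dev", iface, "set", "type", "ibss"],
        ["ip", "link", "set", iface, "up"],
        join,
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for cmd in ibss_join_commands("wlan0", "example-mesh", 2462,
                                  bssid="02:ca:ff:ee:ba:be"):
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the order of operations explicit, which is the variable that wasn't controlled for in the earlier fallback tests.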
Suspected to be related to low-level channel hopping/merging tendencies in wpa_supplicant, various wireless driver stacks, or both, this is a long-outstanding problem that appears to have no easy solution. For now, it is recommended that the Linux client be used only in environments in which only one Commotion mesh network is active.