
No client network with "mesh on lan/wan" but no cable plugged in. #905

Closed
jannic opened this issue Oct 16, 2016 · 11 comments
Labels: 0. type: bug (This is a bug)

Comments

@jannic
Contributor

jannic commented Oct 16, 2016

This may be the same as, or related to, #635; at least the descriptions are similar. As #635 is closed and I'm not sure it's the same issue, I created a new one.

In short, I also observe that with Mesh on LAN and Mesh on WAN both enabled, but no cable plugged into any LAN port, client networking is completely broken. This time it happens on Gluon 2016.2 (to be exact, experimental build 2016.2-1~exp20161004 of Freifunk Aachen, which is, to my knowledge, plain 2016.2 with the Aachen site.conf). The device is a TL-WR841v9 with a serial console attached to observe what's happening.

To be sure it's not some unrelated configuration issue, I started by resetting the whole config with 'firstboot'. Then, I only configured the following fields in config mode:

  • Enable meshing on the WAN interface
  • Enable meshing on the LAN interface
  • Set node name
  • Set contact info

After the subsequent reboot, everything was fine (LAN cable still connected).

Then I removed the LAN cable and power-cycled the router. Now the client network (interface br-client) didn't come up, so there was no network connectivity, even though the batman-adv mesh seemed to be working perfectly.

One obvious difference is in the output of 'ip link list'. (See attached files.):

  • 'NO-CARRIER' and missing 'LOWER_UP' on eth0 (expected!)
  • 'NO-CARRIER' and missing 'LOWER_UP' on br-mesh_lan (kind of expected)
  • Unusual interface indices in 'ip link list': 9-12 are unused, and the entries are 15: br-client, 16: primary0, 17: bat0 (instead of 9, 11, 12).
  • local-node@br-client is missing.
  • 'UP' and 'LOWER_UP' are missing on br-client. This is probably the cause of the missing connectivity.
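The symptoms listed above can be checked mechanically when comparing the working and broken states. Below is a minimal Python sketch, assuming output in the usual 'ip link list' layout ("N: name: <FLAGS> ..."). The function names parse_ip_link and check_client_bridge are hypothetical and not part of Gluon; the checks only encode the symptoms described in this report.

```python
import re
from typing import Dict, List, Set

# First line of each interface entry, e.g.
#   "15: br-client: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN"
# Indented continuation lines ("    link/ether ...") do not match and are skipped.
_IFACE_RE = re.compile(r"^(\d+):\s+([^:\s]+):\s+<([^>]*)>")


def parse_ip_link(output: str) -> Dict[str, Set[str]]:
    """Map interface name (with any '@lower' suffix dropped) to its flag set."""
    flags: Dict[str, Set[str]] = {}
    for line in output.splitlines():
        m = _IFACE_RE.match(line)
        if m:
            # "local-node@br-client" -> "local-node"
            name = m.group(2).split("@", 1)[0]
            flags[name] = {f for f in m.group(3).split(",") if f}
    return flags


def check_client_bridge(output: str) -> List[str]:
    """Hypothetical helper: report the symptoms described in this issue, if present."""
    flags = parse_ip_link(output)
    problems: List[str] = []
    if "br-client" not in flags:
        problems.append("br-client missing")
    else:
        for flag in ("UP", "LOWER_UP"):
            if flag not in flags["br-client"]:
                problems.append(f"br-client lacks {flag}")
    if "local-node" not in flags:
        problems.append("local-node missing")
    return problems
```

Run against the output of 'ip link list' captured in the broken state, an empty list would mean none of these particular symptoms is present.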

During the whole process, no cable was plugged into the WAN port, and Wi-Fi mesh was available. Meshing still uses IBSS, as 802.11s is not yet activated in the Aachen firmware.

with_lan.txt
no_lan.txt

@neocturne
Member

Please provide the output of logread and brctl show in the broken state.

@neocturne neocturne added the 0. type: bug This is a bug label Oct 16, 2016
@neocturne neocturne added this to the 2016.3 milestone Oct 16, 2016
@jannic
Contributor Author

jannic commented Oct 16, 2016

Output of brctl show:

bridge name     bridge id           STP enabled   interfaces
br-mesh_lan     7fff.12ef6055700c   no            eth0
br-wan          7fff.12ef60557008   no            eth1
br-client       8000.5a78d9786cd9   no            client0

Output of logread is attached.
logread.txt
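Bridge membership can be compared the same way. The following is a minimal Python sketch that parses 'brctl show' output into a bridge-to-ports mapping and reports expected ports that are not attached. The function names are hypothetical, and the expected membership used in the note below (bat0 and client0 in br-client) is an assumption about Gluon's usual client bridge layout, not something stated in this thread.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def parse_brctl_show(output: str) -> Dict[str, List[str]]:
    """Parse 'brctl show' output into {bridge name: [port, ...]}.

    The first line is the column header. A bridge's first port shares its
    line; any further ports appear on indented continuation lines.
    """
    bridges: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # Continuation line: one more port of the previous bridge.
            if current is not None:
                bridges[current].append(fields[0])
        else:
            current = fields[0]
            # Columns: name, id, STP enabled, first port (may be absent).
            bridges[current] = fields[3:4]
    return bridges


def missing_ports(bridges: Mapping[str, List[str]],
                  expected: Mapping[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return, per bridge, the expected ports that are not attached."""
    result: Dict[str, List[str]] = {}
    for bridge, ports in expected.items():
        absent = sorted(set(ports) - set(bridges.get(bridge, [])))
        if absent:
            result[bridge] = absent
    return result
```

Fed the table posted above together with the assumed expectation {"br-client": {"bat0", "client0"}}, missing_ports would report bat0 as absent from br-client. This is only an observation about the posted output under that assumption, not a diagnosis confirmed in this thread.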

@neocturne
Member

Hmm, probably another netifd race condition... I'll see if I can reproduce the issue and test the latest netifd patches.

@rotanid
Member

rotanid commented Nov 30, 2016

@neoraider did a netifd backport. @jannic, could you test whether your issue persists when building Gluon from the update-netifd branch? https://github.com/freifunk-gluon/gluon/tree/update-netifd

@jannic
Contributor Author

jannic commented Jan 18, 2017

Sorry that it took so long to test the updated netifd.

As the mentioned branch doesn't exist anymore, I tried to find the commit manually. The one I found was fb2e14d, "netifd: update to LEDE 9a5801e7f6e8bc6641ca320e4497d298080f1b24".
I cherry-picked that commit onto v2016.2.2 and built a firmware for tp-wr841v9.

Unfortunately, I observed exactly the same behavior as before: without a cable plugged in, br-client is down, which breaks networking completely.

@neocturne
Member

Hmm. Could you test the current master, now that we've moved to LEDE?

@jannic
Contributor Author

jannic commented Jan 18, 2017

Yes, I already did. At first I thought the issue was solved, but on a second try I observed the very same behavior as before, so it seems the issue persists.
(I didn't do any detailed analysis. I just compiled 78b2775 using make -j18 GLUON_TARGET=ar71xx-tiny DEVICES='tp-link-tl-wr841n-nd-v9' GLUON_RELEASE=myversion, flashed the result onto the same router as before (configured as described above) and rebooted. First try: successful; second try: no connectivity.)

@neocturne neocturne added this to the 2017.1 milestone Feb 25, 2017
@neocturne neocturne self-assigned this Feb 27, 2017
@A-Kasper
Contributor

A-Kasper commented Mar 8, 2017

Probably related: my Bullet hangs after a few days. In chat, neoraider told me that this may be a race condition in netifd.

Here is some information collected from two different nodes:

http://pastebin.com/rdSegU5E
http://pastebin.com/iYmRK29h

@neocturne
Member

While I was not able to reproduce the exact issues described here or in #1079, I did get my node into a somewhat similar broken state (in which bat0 would suddenly have br-mesh_lan as its primary interface instead of primary0), caused by setup and teardown scripts running concurrently.

I'll try to find a way to prevent such race conditions by adding some locking in the appropriate places.

Another issue is that something (probably the internal switch) is reset late in boot, after the network is already partially set up, causing some interfaces to be torn down again after being set up. I think these two issues could cause the behaviour described here when the timing is very unfortunate.
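The kind of locking mentioned above can be illustrated with an exclusive file lock held for the whole of each setup or teardown step, so two handlers can never interleave. This is a minimal Python sketch using fcntl.flock (Unix only). It is not Gluon's implementation, and the lock path and the names iface_lock and run_step are hypothetical.

```python
import fcntl
import os
import tempfile
import threading
import time
from contextlib import contextmanager

# Hypothetical lock file; a temp dir keeps the sketch self-contained.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "batman-iface-sketch.lock")


@contextmanager
def iface_lock(path=LOCK_PATH):
    """Hold an exclusive flock() on `path` for the duration of the block.

    Each call opens the file anew, so each holder has its own open file
    description, and flock() blocks other holders, whether they are other
    processes or other threads of this process.
    """
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


events = []


def run_step(name, delay=0.01):
    # Stand-in for a setup or teardown handler: its start/end pair must
    # never be interleaved with another handler's pair.
    with iface_lock():
        events.append(("start", name))
        time.sleep(delay)  # widen the window in which a race could occur
        events.append(("end", name))


if __name__ == "__main__":
    threads = [threading.Thread(target=run_step, args=(n,))
               for n in ["setup", "teardown"] * 3]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(events)  # every "start" is immediately followed by its own "end"
```

A shared lock file rather than an in-process mutex is the point of this design: scripts started by netifd or hotplug typically run as separate processes, and flock() serializes them regardless.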

@neocturne
Member

Please test with the current master (e45c303 or later); I've refactored the batman-adv interface management to make it much more robust.

@neocturne
Member

As #1079 is fixed, and I assume that this is basically the same issue, I'll close this as well.

4 participants