802.11s mesh down for several hours #1148

Closed
bobcanthelpyou opened this issue on Jun 22, 2017 · 13 comments · 7 participants

bobcanthelpyou (Contributor) commented Jun 22, 2017

Freifunk Ilmenau (802.11s, site-ffil) is a small network of around 20 nodes. We are a subsite of Freifunk Erfurt (IBSS, site-ffef), which is based on batman-adv v14 and has around 200 nodes.

Most of our nodes are mesh-only, without a fastd uplink. Nodes in our ffil subsite currently run Gluon v2016.2, v2016.2.4, and v2017.1.

From time to time we see the following issue on various TP-Link models in our network:

  • the client Wi-Fi interface of a node is up, but all mesh connections are down for several hours
  • on our map, nodes with a fastd uplink remain online, while mesh-only nodes are shown as offline
  • the status page of the affected node shows no neighbouring nodes on mesh0

@NeoRaider suggested running "iw dev mesh0 station dump" on an affected node. Is there any other information that could help shed light on this issue?
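To make the output of "iw dev mesh0 station dump" easier to compare between a healthy and an affected node, here is a minimal sketch that extracts the peer-link state for each station. It assumes that iw prints one "Station <mac> (on mesh0)" header per peer followed by a "mesh plink:" line (ESTAB meaning an established 802.11s peer link), and the function names are hypothetical. Gluon images do not ship Python, so the dump would be copied off the node and parsed on a workstation.

```python
import re

# Header line iw prints per peer, e.g. "Station 02:11:22:33:44:55 (on mesh0)"
STATION_RE = re.compile(r"^Station ([0-9a-f:]{17}) \(on (\S+)\)", re.IGNORECASE)
# Peer-link state line (assumed format), e.g. "\tmesh plink:\tESTAB"
PLINK_RE = re.compile(r"^\s*mesh plink:\s*(\S+)")


def peer_link_states(dump: str) -> dict:
    """Map each peer MAC address in a station dump to its mesh peer-link state."""
    states = {}
    current = None
    for line in dump.splitlines():
        m = STATION_RE.match(line)
        if m:
            current = m.group(1).lower()
            states[current] = "UNKNOWN"  # no "mesh plink:" line seen yet for this peer
            continue
        m = PLINK_RE.match(line)
        if m and current is not None:
            states[current] = m.group(1)
    return states


def established_peers(dump: str) -> list:
    """Return only the peers whose peer link is in the ESTAB state."""
    return [mac for mac, state in peer_link_states(dump).items() if state == "ESTAB"]
```

Stations that show up in the dump but never reach ESTAB would point at a peering problem rather than a node that has disappeared from the air entirely.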

rotanid (Member) commented Jun 22, 2017

Are you sure the described issue affects all three of your firmware versions?
I wouldn't be surprised if v2016.2 is affected; that may very well be.

rotanid added the bug label Jun 22, 2017

bobcanthelpyou (Contributor, Author) commented Jun 22, 2017

I am not completely sure about v2016.2 (only one such node is online), but v2016.2.4 is definitely affected. We have been seeing this issue for some time now, and I hoped v2017.1 would fix it, but the issue still exists.

Sunz3r (Contributor) commented Jun 22, 2017

I can confirm this with v2016.2.5 on several WR841s. It happens often under high traffic or high airtime utilization. Nothing shows up in dmesg or logread when the problem occurs.
After running "iw dev client0 scan", all Wi-Fi mesh links come back.
https://pastebin.com/uW0NC7J8

On another node I found this in dmesg. I don't know whether it is related:
https://pastebin.com/n8zgqB7E

NeoRaider (Member) commented Jun 22, 2017

Gluon v2016.2.x and earlier are known for poor ath9k stability. It would be interesting to know whether you also see the issue between nodes running v2017.1.

freifunk-gluon deleted a comment from rubo77 Jun 25, 2017

rotanid (Member) commented Jun 25, 2017

@rubo77 (though others do the same): please don't keep propagating crude workarounds. The bugs will only become more serious if we just apply workarounds instead of digging into them.
Let's see what they have to say about v2017.1 nodes meshing with each other.

bobcanthelpyou (Contributor, Author) commented Jun 27, 2017

It happened again on a TP-Link TL-WDR4300 v1 running v2017.1. The neighbour nodes run Gluon v2017.1 and v2016.2.4.

https://ilmenau.freifunk.net/meshviewer/#!v:m;n:30b5c23839e3

The MAC address (but not the node name) of the affected node is visible on the status pages (mesh0) of the neighbour nodes. Unfortunately, I don't have physical access to the node at the moment and can't gather any additional information.

Could information from the neighbour nodes help here, too?

Edit: The MAC address of the affected node was also visible to the neighbour nodes when running a scan or station dump on their mesh0 interface.

The node was out of the mesh for roughly 9 hours.

valcryst commented Jul 29, 2017

This is also happening on all nodes we recently upgraded from 2015 to 2016/2017.
It's a total mess: we have already lost around 400 routers that don't come back without a power cycle.
If we had known this before, we would still be on 2015.1.2.

So why was there no warning about critical instability?
I would call "poor ath9k stability" "unusable ath9k stability".

What can we do now to improve stability? Flash the next beta release, where no one knows what will happen?
Use the quickfix from Fichtenfunk and this script from FFFM?
Sorry for being so rude, but I would NEVER call this a stable release.

A-Kasper (Contributor) commented Jul 29, 2017

rotanid (Member) commented Jul 29, 2017

Please stay on topic! Such discussions belong on the mailing list.
The problem @valcryst describes probably has nothing to do with 802.11s and meshing, but may be #1160; a fix already exists for master and v2017.1.x.

bobcanthelpyou (Contributor, Author) commented Aug 5, 2017

We still hit this meshing problem (mesh0 of the affected node is up, the node is seen by its neighbours and listed on their status pages, but there is no mesh connection), usually lasting one to several hours. Affected nodes run v2017.1.1 on, e.g., WR841N v10/v11.

@NeoRaider mentioned an upstream issue in the LEDE bug tracker on IRC; unfortunately, I can't find it anymore.

FreifunkUFO commented Oct 16, 2017

see https://bugs.lede-project.org/index.php?do=details&task_id=863

At the moment it's a well-known problem ("Wi-Fi doesn't work until a Wi-Fi scan is done"), and many communities have many different bugfixes in the wild.

rotanid (Member) commented Jul 12, 2018

Please check whether this bug still occurs with Gluon v2017.1.8 or Gluon v2018.1. If it does, try the Gluon master branch (based on OpenWrt 18.06).

bobcanthelpyou (Contributor, Author) commented Jul 14, 2018

This behavior hasn't affected our routers lately (running the latest v2017.1.x).

I could not reproduce it. Either it was fixed by the latest updates, or our setup has changed in a way that prevents it from being reproduced.

I am closing this issue. Anyone with a similar issue on a recent Gluon version should open a new issue.
