IBSS mesh and client mode doesn't work in parallel with ath10k-ct #1584

Open
oliver opened this Issue Nov 25, 2018 · 36 comments

oliver commented Nov 25, 2018

After the general 5GHz wifi problems with AC7 and 2018.1.1+bremen1 firmware were solved (in #1561), we found out that meshing on 5GHz still does not work.

This affects all FFHB 2018.1 firmwares so far, and has (only) been tested on Archer C7 v2 so far.

Symptoms are that a 5GHz-only device (eg. CPE510) does not create a working wifi mesh connection to the AC7.

Example device: http://og-ac7-testing-2.nodes.ffhb.de (https://map.ffhb.de/#!/en/map/f4f26d70b277).
The AC7 is in the same room as a CPE510 (https://map.ffhb.de/#!/en/map/c4e984b0a84a) and a WDR3600 (https://map.ffhb.de/#!/en/map/c4e984d5138e), both with 2017.1.8+bremen1 (ie. stable) firmware.
The Stable-FW devices are meshing nicely. The AC7 is meshing only via 2.4GHz. As a result, the WDR3600 meshes with both other devices while the CPE510 meshes only with the WDR3600.

Interestingly the devices appear to see each other via IBSS, but still don't get a mesh connection working:

root@cpe510-og-1:~# iw dev ibss0 station dump
Station aa:85:d3:37:f0:5e (on ibss0)
	inactive time:	0 ms
	rx bytes:	167231521
	rx packets:	1267310
	tx bytes:	15690420
	tx packets:	65650
	tx retries:	2531
	tx failed:	0
	rx drop misc:	0
	signal:  	-58 [-60, -62] dBm
	signal avg:	-59 [-61, -63] dBm
	tx bitrate:	144.4 MBit/s MCS 15 short GI
	rx bitrate:	144.4 MBit/s MCS 15 short GI
	expected throughput:	46.875Mbps
	authorized:	yes
	authenticated:	yes
	associated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		no
	TDLS peer:	no
	DTIM period:	0
	beacon interval:100
	short slot time:yes
	connected time:	5629 seconds
Station 0e:9e:42:44:91:aa (on ibss0)
	inactive time:	30 ms
	rx bytes:	73632280
	rx packets:	567327
	tx bytes:	0
	tx packets:	0
	tx retries:	0
	tx failed:	0
	rx drop misc:	0
	signal:  	-61 [-63, -65] dBm
	signal avg:	-60 [-63, -63] dBm
	tx bitrate:	6.0 MBit/s
	authorized:	yes
	authenticated:	yes
	associated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		no
	TDLS peer:	no
	DTIM period:	0
	beacon interval:100
	short slot time:yes
	connected time:	2544 seconds

Nothing strange in batctl if:

root@og-ac7-testing-2:~# batctl if
mesh-vpn: active
ibss1: active
ibss0: active
primary0: active

I'm AFK now for a while; will post dmesg etc. later, but there's not much to see there. What info would be useful to debug this?

Contributor

mweinelt commented Nov 25, 2018

Please post the output of batctl n

@rotanid rotanid added the bug label Nov 25, 2018

Author

oliver commented Nov 25, 2018

root@og-ac7-testing-2:~# batctl n
Error - no valid command or debug table specified: n
[...]
root@og-ac7-testing-2:~# batctl -v
batctl 2013.4.0 [batman-adv: 2013.4.0]

Author

oliver commented Nov 25, 2018

Various command results of an AC7 v2 with 2018.1.1+bremen2:

Command results of another AC7 v2 with (edit) 2017.1.8+bremen1:

Author

oliver commented Nov 25, 2018

root@og-ac7-testing-2:~# modinfo batman-adv
module:		/lib/modules/4.4.153/batman-adv.ko
version:	2013.4.0
description:	B.A.T.M.A.N. advanced
author:		Marek Lindner <lindner_marek@yahoo.de>, Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
license:	GPL
depends:	
root@og-ac7-testing-2:~# lsmod | grep batman
batman_adv            106595  0 
root@og-ac7-testing-2:~# lsmod | wc -l
18

On 2018.1.1+bremen2 there are 18 modules loaded; on 2017.1.8+bremen1 there are 117 modules loaded! This is probably related to #1580, but I don't know whether it actually causes these problems.

Anyway, here's lsmod output for old and new firmware:

Contributor

mweinelt commented Nov 25, 2018

> On 2018.1.1+bremen2 there are 18 modules loaded; on 2017.1.8+bremen1 there are 117 modules loaded! This is probably related to #1580, but I don't know whether it actually causes these problems.

Sounds like it. Please retry with the latest v2018.1.x commit and report back if that fixes the issue.

Member

NeoRaider commented Nov 25, 2018

The low number of loaded modules is expected; we started building as much as possible into the kernel.

Member

NeoRaider commented Nov 30, 2018

To check whether the issue is in ath10k or in batadv, test whether a simple ping (over IPv6 link-local) on ibss0 works.

Unfortunately, the logs don't show anything unusual. Does the issue only occur between ath9k and ath10k devices, or is it reproducible with two ath10k devices as well?

Member

NeoRaider commented Nov 30, 2018

One more thing to test: since v2018.1, we set the htmode of 11ac devices to 'VHT20' in /etc/config/wireless; in older versions, it was always set to 'HT20'.
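
For such a test, the htmode could be switched back with UCI. The following is only a sketch: the helper name `set_htmode` is hypothetical, and the section name `radio0` for the 5 GHz radio is an assumption that should be checked with `uci show wireless` on the device.

```shell
# Hypothetical helper: set the htmode of one wifi-device section and reload wifi.
# The default section name "radio0" is an assumption; verify it with
# `uci show wireless` before running this on a node.
set_htmode() {
    radio="${1:-radio0}"
    mode="${2:-HT20}"
    uci set "wireless.${radio}.htmode=${mode}" &&
        uci commit wireless &&
        wifi
}
```

Calling `set_htmode radio0 HT20` would restore the pre-2018.1 value, and `set_htmode radio0 VHT20` would switch back.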

Contributor

blocktrron commented Nov 30, 2018

The rx bitrate is missing from the output of iw dev ibss0 station dump for one of the stations; this may be a hint.
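
A quick way to spot affected peers might be to filter the station dump for stations without an "rx bitrate:" line. This is a minimal sketch; the function name `stations_without_rx_bitrate` is made up for illustration, and it assumes the output layout shown in the first post ("Station <mac> (on <if>)" followed by indented fields).

```shell
# Read `iw dev <if> station dump` output on stdin and print the MAC address
# of every station block that contains no "rx bitrate:" line.
stations_without_rx_bitrate() {
    awk '
        /^Station / {
            # A new station block starts: report the previous one if needed.
            if (mac != "" && !has_rx) print mac
            mac = $2
            has_rx = 0
        }
        /rx bitrate:/ { has_rx = 1 }
        END { if (mac != "" && !has_rx) print mac }
    '
}
```

Usage on a node would be `iw dev ibss0 station dump | stations_without_rx_bitrate`.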

Author

oliver commented Dec 7, 2018

(sorry for the silence on my side, I'm not online a lot at the moment and will probably get to this issue in 2019. Thanks for the comments, I will try these hints)

Member

rotanid commented Dec 7, 2018

@oliver we will close this ticket now. feel free to reopen as soon as you can provide the additional information requested

@rotanid rotanid closed this Dec 7, 2018

Contributor

oszilloskop commented Dec 9, 2018

Same behavior here with Gluon 2018.1.3 and IBSS, between a Ubiquiti Loco M5 XW and a Ubiquiti UniFi-AC-MESH.

Hint:
The rx bitrate is missing on both devices in the output of iw dev ibss0 station dump.
The tx bitrate is just 6.0 MBit/s on both (same as oliver's).

Our sites are here: https://github.com/freifunk-ffm/site-ffffm/tree/test

@mweinelt mweinelt reopened this Dec 9, 2018

Contributor

blocktrron commented Dec 9, 2018

@oszilloskop Can you link your binary firmware files?

427c837
Might this patch be the reason? (NeoRaider already pointed this patch out.) Currently, issues have only been reported for 802.11n <--> 802.11ac mesh links.

I can't reproduce this here with 11s and non-ct firmware (802.11n 20MHz <--> 802.11ac 80MHz works w/o a single problem).

Contributor

oszilloskop commented Dec 9, 2018

@blocktrron
You will find the binary files here: https://dl.ffm.freifunk.net/firmware/test/

EDIT:
Our Firmware v2.4.10-test-1127 is Gluon v2018.1.3
Our Firmware v2.4.4-test-0430 is Gluon v2017.1.7

Member

rotanid commented Dec 10, 2018

@blocktrron he pointed out on IRC that it also doesn't work with v2017.1.7, which does not contain the mentioned patch.
interestingly though, v2017.1.8 does work for Freifunk Bremen if i understood correctly what @oliver wrote.
the low-hanging fruit, err, conclusion: one of the two tests went wrong or it's a different issue.
i think(!) v2017.1.x does not have a general issue with 5 GHz IBSS mesh, as this would hopefully have been reported last year, and it works for FF Bremen as far as i understand.

Author

oliver commented Dec 19, 2018

Short update: ping via ibss0 (5 GHz IBSS) shows some weirdness on 2018.1.3: ping6 ff02::1%ibss0 returns only the local IPv6 address but no replies from other nodes. But doing the same via ibss1 shows replies from the other node that is connected via 2.4 GHz.

But if I directly ping the link-local IPv6 of another node via ibss0 (ie. not via broadcast address), I do get replies. And afterwards I will also get a reply from that node with ping ff02::1%ibss0; and iw dev ibss0 station dump now actually shows an "rx bitrate:" for the connection to this node. And on the Gluon status page the name of the remote node will now also appear (before, there was only the MAC shown).
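
The sequence described above could be scripted roughly as follows. This is a sketch, not something that was run in this thread: the function names are hypothetical, and the parser assumes reply lines of the form "64 bytes from <addr>: ..." as printed by common ping6 implementations.

```shell
# Read ping/ping6 output on stdin and print each responder address once.
# Assumes reply lines look like "64 bytes from <addr>: seq=0 ttl=64 ...".
ping_responders() {
    awk '/bytes from/ { addr = $4; sub(/:$/, "", addr); print addr }' | sort -u
}

# Hypothetical wrapper: all-nodes ping, then a direct ping to one neighbour,
# then the all-nodes ping again, listing who answered each time.
check_ibss_ping() {
    ifname="$1"      # e.g. ibss0
    neighbour="$2"   # link-local IPv6 address of the other node
    echo "all-nodes ping before:"
    ping6 -c 3 "ff02::1%${ifname}" | ping_responders
    echo "direct ping:"
    ping6 -c 3 "${neighbour}%${ifname}" | ping_responders
    echo "all-nodes ping after:"
    ping6 -c 3 "ff02::1%${ifname}" | ping_responders
}
```

If the behavior described above is present, the neighbour should be absent from the first list and show up in the third.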

I don't know enough about wifi IPv6 broadcasting to understand what's the cause or effect here. So maybe the broadcast ping is indeed broken and this causes the Batman problem; or maybe Batman (or something else) is broken, and that also causes the IBSS connection to stay in some "idle" state where broadcast pings don't work.

But at least this shows that in general some data can be transferred over 5 GHz IBSS with 2018.1.

(Edit: this comment is mainly so that I remember what I've tried already :-) . I will do some more analysis in the future).

Member

NeoRaider commented Dec 20, 2018

Might be a power save issue. You can try disabling it using iw dev <dev> set power_save off.

Member

NeoRaider commented Dec 20, 2018

Please also try the Gluon master, which is based on OpenWrt 18.06.

Author

oliver commented Dec 20, 2018

Powersave appears to be off already (iw dev ibss0 get power_save prints Power save: off).
Running iw dev ibss0 set power_save off doesn't appear to make any difference regarding broadcast ping.
Running iw dev ibss0 set power_save on prints command failed: Not supported (-122).

Author

oliver commented Dec 20, 2018

I've just installed 2018.1.3+bremen1 on a CPE510 v1 which is a 5GHz-only device, and mesh works fine (see https://map.ffhb.de/#!/en/map/c4e984b0a84a and http://cpe510-og-1.nodes.ffhb.de/). The device successfully meshes with another node. The CPE510 uses the ath9k driver.

So this problem doesn't affect all devices. So far the ath10k driver with -ct firmware shows the problem, while the ath9k driver works.

Author

oliver commented Dec 20, 2018

@oszilloskop can you check which driver is used on the devices which don't work for you? What does lsmod | grep ath show?

Contributor

oszilloskop commented Dec 20, 2018

I don't have a master firmware.

Ubiquiti Loco M5 XW (5GHz-only)
===============================
Gluon 2018.1.3
ibss
htmode 'HT20'

5GHz Mesh does not work

~# iw dev ibss0 station dump 
-> tx bitrate: 6.0 MBit/s
-> shows no rx bitrate

~# iw dev ibss0 get power_save
Power save: off

~# lsmod | grep ath
ath                    18387  3 ath9k,ath9k_common,ath9k_hw
ath9k                 109160  0 
ath9k_common           22062  1 ath9k
ath9k_hw              359564  2 ath9k,ath9k_common
cfg80211              234680  4 ath9k,ath9k_common,ath,mac80211
compat                 11245  4 ath9k,ath9k_common,mac80211,cfg80211
mac80211              416898  1 ath9k

The statuspage shows a "Nachbarknoten ibss0" graph of all 5GHz neighbor mesh nodes.

---

Ubiquiti UniFi-AC-MESH (dual 2.4/5GHz)
======================================
Gluon 2017.1.7
ibss
htmode 'HT20'

5GHz Mesh does not work
2.4GHz Mesh works fine

~# iw dev ibss0 station dump
-> tx bitrate: 6.0 MBit/s
-> shows no rx bitrate

~# iw dev ibss0 get power_save
Power save: off

~# lsmod | grep ath
ath                    18387  4 ath9k,ath9k_common,ath9k_hw,ath10k_core
ath10k_core           310523  1 ath10k_pci
ath10k_pci             34719  0 
ath9k                 109160  0 
ath9k_common           22062  1 ath9k
ath9k_hw              359564  2 ath9k,ath9k_common
cfg80211              234552  5 ath9k,ath9k_common,ath10k_core,ath,mac80211
compat                 11245  4 ath9k,ath9k_common,mac80211,cfg80211
mac80211              416898  2 ath9k,ath10k_core

The statuspage shows a "Nachbarknoten ibss0" graph of all 5GHz neighbor mesh nodes.

---

Ubiquiti UniFi-AC-MESH (dual 2.4/5GHz)
======================================
Gluon 2018.1.3
ibss
htmode 'HT20'

5GHz Mesh does not work
2.4GHz Mesh works fine

~# iw dev ibss0 station dump 
-> tx bitrate: 6.0 MBit/s
-> shows no rx bitrate

~# iw dev ibss0 get power_save
Power save: off

~# lsmod | grep ath
ath                    18387  4 ath9k,ath9k_common,ath9k_hw,ath10k_core
ath10k_core           310299  1 ath10k_pci
ath10k_pci             34687  0 
ath9k                 109160  0 
ath9k_common           22062  1 ath9k
ath9k_hw              359564  2 ath9k,ath9k_common
cfg80211              234680  5 ath9k,ath9k_common,ath10k_core,ath,mac80211
compat                 11245  4 ath9k,ath9k_common,mac80211,cfg80211
mac80211              416898  2 ath9k,ath10k_core

The statuspage shows a "Nachbarknoten ibss0" graph of all 5GHz neighbor mesh nodes.

Author

oliver commented Dec 22, 2018

> @blocktrron he pointed out on IRC, that it also doesn't work with v2017.1.7 - this does not contain the mentioned patch.
> interestingly though, v2017.1.8 does work for Freifunk Bremen if i understood correctly what @oliver wrote.
> the low-hanging fruit, err, conclusion: one of both tests went wrong or it's a different issue.
> i think(!) v2017.1.x does not have a general issue with 5 GHZ ibss mesh as this would have hopefully been reported last year - and it works for FF Bremen as far as i understand.

@rotanid: good thing you mentioned this! I just did a test of this specific functionality, and it looks like 5 GHz meshing on AC7 was already broken with 2017.1.8+bremen1 :-( So no, v2017.1.8 does not really work for FFHB, and v2017.1.x does have a general issue with 5 GHz IBSS mesh.

Or maybe my test is wrong. I have set up three devices with 2017.1.8+bremen1, all located in the same room:

Result: the WDR3600 meshes with both devices. The AC7 and the CPE510 only mesh with the WDR3600. To me this indicates that the 5GHz mesh on AC7 doesn't work.

I guess we never systematically tested this, and the problem was never really noticed probably because the devices are still meshing via 2.4 GHz.

Anyway, I'll report back when I've found out more.

Member

rotanid commented Dec 28, 2018

as discussed in person, please try v2016.2.x based firmware (with a device like Archer C7 v2) and also try current master (or, the same, v2018.2.x as soon as it is released)

Contributor

oszilloskop commented Jan 2, 2019

I tested it on two Ubiquiti UniFi-AC-MESH with Gluon 2018.2 today.
Unfortunately the result is the same: the 5GHz IBSS mesh behaves just as it did with the previous Gluon versions.

Member

rotanid commented Jan 2, 2019

thanks. now the last missing bit of information is whether an ath10k device works with v2016.2.x in IBSS 5 GHz mesh

Author

oliver commented Jan 2, 2019

Just installed FFHB firmware 2016.2.7+bremen1 on an AC7v2 (https://map.ffhb.de/#!/en/map/f4f26d70b277), and IBSS with 5 GHz is still broken. Same symptoms: no mesh connection to a CPE510, the status page only shows MAC address rather than name of the CPE510, and iw dev ibss0 station dump shows 6.0 MBit/s as tx/rx bitrate for all stations.

So it looks like this is not a regression, but rather it was broken since the beginning?

@rotanid rotanid removed the awaiting answer label Jan 2, 2019

Member

rotanid commented Jan 2, 2019

i can't add anything to your conclusion ... and we likely can't fix anything if this never worked before.
let's wait for @NeoRaider (who said to me at 35c3 that it should have worked before) - but maybe this issue will go into future release notes as "can't fix" ...

mortzu commented Jan 7, 2019

I have the same issue on mANTBox 15s (ath10k). The mesh starts working the moment I disable the client wireless device.
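
For anyone who wants to reproduce this, disabling the client network on one radio might look like the sketch below. The exact commands used are not given in this thread; the helper name `disable_client_ap` is hypothetical, and the section name `client_radio0` is an assumption about how the firmware names its client interface (check `uci show wireless`).

```shell
# Hypothetical helper: disable the client AP interface on one radio and
# reload wifi. The section name "client_<radio>" is an assumption; verify
# it with `uci show wireless` before running this on a node.
disable_client_ap() {
    radio="${1:-radio0}"
    uci set "wireless.client_${radio}.disabled=1" &&
        uci commit wireless &&
        wifi
}
```

After `disable_client_ap radio0`, `iw dev ibss0 station dump` should show whether the IBSS peers now report an rx bitrate. Setting `disabled=0` and reloading in the same way would undo the change.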

Contributor

oszilloskop commented Jan 7, 2019

@mortzu
Nice find.
I can confirm that this workaround works fine with my two UniFi-AC-MESH (dual 2.4/5GHz, ath9k/ath10k, IBSS).

> So this problem doesn't affect all devices. So far the ath10k driver with -ct firmware shows the problem, while the ath9k driver works.

Furthermore, I can confirm that my ath9k-only device (Ubiquiti Loco M5 XW) does not have a 5GHz IBSS mesh problem. Now that its mesh partners are working correctly, it meshes very well without that workaround.

EDIT:
My tests were done with Gluon v2018.2.

Contributor

mweinelt commented Jan 7, 2019

This essentially means that ath10k with the candelatech driver/firmware has become unusable for Gluon, as it does not support ap+ibss at the same time.

Member

rotanid commented Jan 7, 2019

@mweinelt "become"? according to the tests done by @oliver the situation didn't change in any way. maybe that's nitpicking, but i think that's a difference.

Contributor

mweinelt commented Jan 7, 2019

Yeah, nitpicking, because it renders the same result.

Contributor

blocktrron commented Jan 7, 2019

@oszilloskop
This is expected behavior. ath9k is far more open than ath10k. If the ath10k firmware does not support the service-set combination we need, we are, simply put, out of luck.

Maybe you can try an mt76-based device. Apparently this driver sports IBSS support for 11ac (as it's "comparably libre" to ath9k), but it is flagged as broken for IBSS because it is untested.

Contributor

oszilloskop commented Jan 8, 2019

I am only confirming the test results which were already reported by @oliver and @mortzu.
In my case, the result "not at the same time" is very important. ath10k can do 5GHz ibss mesh, but not simultaneously with client network. This is different for me than "it does not work".

@rotanid rotanid changed the title from "5GHz Mesh doesn't work with 2018.1" to "5 GHz IBSS mesh and client mode doesn't work" Jan 8, 2019

@rotanid rotanid changed the title from "5 GHz IBSS mesh and client mode doesn't work" to "5 GHz IBSS mesh and client mode doesn't work in parallel with ath10k" Jan 8, 2019

Member

rotanid commented Jan 8, 2019

i adjusted the title of this issue accordingly.

@mweinelt mweinelt changed the title from "5 GHz IBSS mesh and client mode doesn't work in parallel with ath10k" to "5 GHz IBSS mesh and client mode doesn't work in parallel with ath10k-ct" Jan 8, 2019

@mweinelt mweinelt changed the title from "5 GHz IBSS mesh and client mode doesn't work in parallel with ath10k-ct" to "IBSS mesh and client mode doesn't work in parallel with ath10k-ct" Jan 8, 2019
