Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Babeld without VLAN on ethernet interfaces inside br-lan, replaces #600 #631

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ilario
Copy link
Member

@ilario ilario commented Oct 20, 2019

This is #600 with very few modifications, like the removal of a conditional dependency.
Will fix #596 and is a step towards #581.
I tested on various routers (YouHua WR1200JS, TP-Link WDR3600, Ubiquiti NanoStation LoCo M2, Comtrend AR-5387un, Comtrend AR-5315u) with or without VLAN.
This PR allows to use YouHua WR1200JS also with ethernet connections (there's a drivers bug to be confirmed which maybe affects all the Mediatek routers) when setting VLAN 0 (no VLAN).

local uci = config.get_uci_cursor()

if(vlanId ~= 0 and ifname:match("^eth")) then
if tonumber(vlanId) ~= 0 and ifname:match("^eth") then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm.... can vlanId be a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a very bad idea to set it as a string, but if an user sets it as a string, network.lua at this line

xpcall(function() proto.configure(args) ; proto.setup_interface(device, args) end,

passes the string to the proto, and no protocol force it to be a number. e.g. babeld just takes it

lime-config returns no error if a string is used and results in a nil value here

and interfaces with appended a "_nil" string here

linux802adIfName = linux802adIfName:gsub("[^%w-]", "-")..network.protoVlanSeparator..vlanId

I'm going to prepare a PR about this :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only this, it is always a string now as how it is parsed from the config. So yes, the conversion to numeric is always needed... (the problem is that we dont have a single test for this for example with vlanid=0)

Copy link
Contributor

@spiccinini spiccinini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests (I can help!)

@G10h4ck
Copy link
Member

G10h4ck commented Oct 27, 2019

babeld_over_batman option should not be removed, is what permit us to eventually setup and test babel only on frontier nodes, as per the original libremesh scalability proposal

Copy link
Member

@G10h4ck G10h4ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment in discussion

Copy link
Member Author

@ilario ilario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

babeld_over_batman option should not be removed, is what permit us to eventually setup and test babel only on frontier nodes, as per the original libremesh scalability proposal

@G10h4ck ah... Rly? This scenario is the number 1 feature in LibreMesh and I did not understand how it should work until now. More documentation is needed (this is not explained neither in the How it works). I will add that option back as soon as possible.

Comment on lines +171 to +175
if ifname:match("^wlan") and tonumber(vlanId) == 0 then
uci:set("network", owrtInterfaceName, "ifname", "@"..owrtDeviceName)
else
uci:set("network", owrtInterfaceName, "ifname", linuxVlanIfName)
end
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spiccinini @G10h4ck @nicopace @gmarcos87
Please review this, I think is the most likely-to-be-suboptimal part of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a new issue, we have workarounds for similar thing in other places of the code, it seems that when one uses linuxVlanIfName this trigger a race condition inside netifd due to some unsolved upstream bug, and that causes unreliable behaviour, if with this workaround it works well in all cases (I assume you tested it a lot) I am ok to merge it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did test it on 5 different routers (listed in the first comment in the PR), but more testing would also be nice to have :)
Can you point me to the other places in the code where such a workaround is used (so that I can compare solutions)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

babeld_over_batman option should not be removed, is what permit us to eventually setup and test babel only on frontier nodes, as per the original libremesh scalability proposal

@G10h4ck ah... Rly? This scenario is the number 1 feature in LibreMesh and I did not understand how it should work until now. More documentation is needed (this is not explained neither in the How it works). I will add that option back as soon as possible.

To implement libremesh full proposal is not a simple task, so we started from the easier part, which is solvable more or less just by generating a static configuration when lime-config is run. The other parts of the proposal (for example automatically turn on L3 routing protocol only on frontier nodes) needs more thins going on at run-time, and we are implementing those things bit by bit (shared-state is one of those things). In the meanwhile even if we don't have all those pieces in place is important to keep the pieces we already have compatible with that scenario, so when we get there it is not too overwhelming to make it work.

I just added back the babeld_over_batman option, I just took the code for it from #600 so it should work (not tested now, but I tested some times ago in the previous PR).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spiccinini could you test this on LibreRouter? Two scenarios should be tested: with VLAN and without VLAN.
To verify that it works: issue echo dump | nc ::1 30003 and check if all the included interfaces (apart from the WAN one) have an IPv4 in both VLAN and noVLAN configurations. In the noVLAN configuration the br-lan interface has to be present in the list of the Babeld interfaces.

Copy link
Contributor

@spiccinini spiccinini Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I used the code from this PR (rebased to master). Maybe I did a mistake. I deleted both /etc/config wireless and network, regenerated them with generate_config, then lime-config and lime-apply,

Copy link
Member Author

@ilario ilario Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with config_generate and I never trusted lime-apply...
Maybe #75 is related?
@spiccinini could you try without deleting the config files, and just running lime-config and rebooting? Just for being sure...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't delete the config the vlan 17 still exists after lime-config

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow... Even after the reboot? So something is failing baaaad.
Can you post the full /etc/config/network with in the two cases of VLAN set in /etc/config/lime
and VLAN set to zero in /etc/config/lime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did more testing and for me it works. Can you confirm that it fails with LibreRouter or with your compilation procedure?
Regarding the tests, by myself I am not able to write them, can you do it or guide me?

@G10h4ck
Copy link
Member

G10h4ck commented Oct 31, 2019

babeld_over_batman option should not be removed, is what permit us to eventually setup and test babel only on frontier nodes, as per the original libremesh scalability proposal

@G10h4ck ah... Rly? This scenario is the number 1 feature in LibreMesh and I did not understand how it should work until now. More documentation is needed (this is not explained neither in the How it works). I will add that option back as soon as possible.

To implement libremesh full proposal is not a simple task, so we started from the easier part, which is solvable more or less just by generating a static configuration when lime-config is run. The other parts of the proposal (for example automatically turn on L3 routing protocol only on frontier nodes) needs more thins going on at run-time, and we are implementing those things bit by bit (shared-state is one of those things). In the meanwhile even if we don't have all those pieces in place is important to keep the pieces we already have compatible with that scenario, so when we get there it is not too overwhelming to make it work.

if ifname:match("^wlan") then
--! currently (2019-10-12) mode-ap and mode-apname have an hardcoded
--! "option network lan" so they are always in the br-lan bridge
if ifname:match("^wlan.*ap$") or ifname:match("^wlan.*apname$") then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the first part of this function there is the following code:

	if not args["specific"] and ifname:match("^wlan%d+.ap") then
		utils.log("lime.proto.babeld.setup_interface(...)", ifname, "ignored")
		return
	end

As far as I understand for ap and apname it won't reach this part of the code. Is it ok?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, by default the AP and APname interfaces are not used for the backbone and the routing.
For using an AP interface for routing with Babeld, APbb should be used, as implemented in #554 and documented in lime-example here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok maybe I am not understanding. In which conditions the program will get into the if of line 137?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean this if, right?

if ifname:match("^wlan.*ap$") or ifname:match("^wlan.*apname$") then

It gets hit when an AP or an APname is configured using interface-specific configuration (some documentation on the website here) and babeld is also selected in the interface-specific configuration.

An example of a case where configuring an AP with interface-specific configuration is here in #262 (5 GHz radio used just for mesh and 2.4 GHz radio used for AP+APname+mesh, which cannot currently be done with general configuration).

Something like: (the non-specified options in the wifi sections are taken from the generic wifi configuration, see db1c350. I don't know if more options should be specified for the net sections?)

config wifi radio0 # assuming that radio0 is 2.4 GHz
	list modes 'ap'
	list modes 'apname'
	list modes 'ieee80211s'

config net wireless2ap
	option linux_name 'wlan0-ap'
	list protocols lan # this is hardcoded anyway in ap proto, adding also here just for clarity
	list protocols babeld:17

config net wireless2apname
	option linux_name 'wlan0-apname'
	list protocols lan # this is hardcoded anyway in apname proto, adding also here just for clarity
	list protocols babeld:17

config net wireless2mesh
	option linux_name 'wlan0-mesh'
	list protocols babeld:17

config wifi radio1 # assuming that radio1 is 5 GHz
	list modes 'ieee80211s'

config net wireless5mesh
	option linux_name 'wlan1-mesh'
	list protocols babeld:17

(beware, I did not test this example)

@ilario ilario added this to Needs triage in LibreMesh 20.xx release Jan 9, 2020
@ilario ilario moved this from Needs triage to In Progress in LibreMesh 20.xx release Jan 9, 2020
@ilario ilario added this to Needs triage in LibreMesh 2020.2 release Jan 10, 2020
@ilario
Copy link
Member Author

ilario commented Jan 10, 2020

Reminder for me: I have to check if the absurd workaround for the "known bug" (see #631 (comment) ) is still needed also on OpenWrt 19.07.

@spiccinini did you do more testing on LibreRouter? I cannot understand how it can fail so terribly.
Regarding the tests, I have little idea on how to write them :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
LibreMesh 2020.2 release
  
Mid priority - Worth to fix
LibreMesh 20.xx release
  
In Progress
Development

Successfully merging this pull request may close these issues.

Babeld without VLAN does not run on LAN ethernet ports
3 participants