RFC add babeld-auto-gw-mode #844

Merged
merged 5 commits into from Feb 25, 2021

spiccinini
Contributor

@spiccinini spiccinini commented Feb 7, 2021

By default, babeld redistributes all installed routes, even if they "don't work". For example, when the Internet provider uses DHCP and its service is down while the interface is still up, the route is installed but does not work, and babeld will announce the non-working route to the network.

This package provides a solution using watchping hooks: it adds a route with a special protocol number (7) when the WAN port has working Internet access, and removes that route when watchping detects that the Internet connection is not working.
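
A minimal sketch of the mechanism described above, not the package's actual hook scripts. It assumes that watchping runs a hook on every WAN state change and that the full iproute2 `ip` tool is available. The function name `gw_route_cmd`, the gateway address and the device name are hypothetical. The function only prints the command, so it can be inspected without touching the routing table.

```shell
#!/bin/sh
# Hypothetical dry-run sketch: prints the ip(8) command a watchping hook
# could run. Protocol number 7 is the value proposed in this PR.
GW_PROTO=7

gw_route_cmd() {
    # $1: "ok" or "fail" (WAN state reported by watchping)
    # $2: gateway address, $3: WAN device (both illustrative)
    case "$1" in
        ok)   echo "ip route replace default via $2 dev $3 proto $GW_PROTO" ;;
        fail) echo "ip route del default via $2 dev $3 proto $GW_PROTO" ;;
        *)    echo "usage: gw_route_cmd ok|fail GATEWAY DEVICE" >&2; return 1 ;;
    esac
}

gw_route_cmd ok 192.0.2.1 eth0.2
gw_route_cmd fail 192.0.2.1 eth0.2
```

babeld's redistribute filters have a `proto` selector, so a configuration could presumably match only routes tagged this way; how the package wires that up is not shown in this thread.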

Fixes #93 #822

Open questions:

  • Is it ok to use protocol 7?
  • Should the route be installed with its src attribute? No
  • Should we remove the default gw route when the Internet connection is not working and replace it with a route to only the src address? This would fix #822 (Internet failover for every router). Any other ideas? DONE

TODO: add IPv6 support. Not for this PR.

@ilario
Member

ilario commented Feb 8, 2021

Amazing SAn, thanks!!!!
I cannot test right now, maybe @amuuza?

@amuuza

amuuza commented Feb 8, 2021

Thank you @ilario for letting me know about this, I did not notice.
Oh, but I am reading it does not fix #822... I'm afraid I do not understand what this fix is about!

In my previous tests, when a router's Internet gateway failed, all the other routers stopped using it and automatically (after several minutes) switched to a working Internet gateway. All routers except the failing one itself: the router with the failing Internet gateway did not automatically switch to a working one, so none of its clients could reach the Internet until the router was manually rebooted.
Has any part of this situation been fixed?
I would be very happy to do a test if it fixed #822

@spiccinini
Contributor Author

Hi! To fix #822, what do you think about increasing the metric of the default gw by a very large amount when it is not working, so that this route is not used if another default route is distributed by another node?
Do you think this is a viable solution?

@spiccinini
Contributor Author

> Hi! To fix #822, what do you think about increasing the metric of the default gw by a very large amount when it is not working, so that this route is not used if another default route is distributed by another node?
> Do you think this is a viable solution?

@G10h4ck what do you think? I will try this idea and report back but I would love a confirmation that it is sound.

@spiccinini spiccinini force-pushed the babeld-auto-gw-mode branch 3 times, most recently from 0be853f to 8f372ae on February 12, 2021 17:16
@spiccinini
Contributor Author

> Hi! To fix #822, what do you think about increasing the metric of the default gw by a very large amount when it is not working, so that this route is not used if another default route is distributed by another node?
> Do you think this is a viable solution?

I have implemented and tested this using the new qemu cloud work from guido (#813). It is working as expected :)
Note that the default watchping configuration takes up to 120 seconds to refresh the interfaces, so it is not instantaneous.
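
The metric idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `HIGH_METRIC` is an arbitrary value, the function name `demote_default_route_cmds` is hypothetical, and the gateway and device are placeholders. The function prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical dry-run sketch of "raise the metric when the WAN is down".
# HIGH_METRIC is an arbitrary illustrative value, not the one used by the PR.
HIGH_METRIC=1000000

demote_default_route_cmds() {
    # $1: gateway address, $2: WAN device, $3: current metric of the route
    echo "ip route del default via $1 dev $2 metric $3"
    echo "ip route add default via $1 dev $2 metric $HIGH_METRIC"
}

demote_default_route_cmds 192.0.2.1 eth0.2 0
```

Because routes with a lower metric are preferred, a default route learned from another node would win while the demoted one stays in place as a last resort.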

@amuuza can you test it?!

@spiccinini
Contributor Author

Note that ipv6 support is still not done.

@amuuza

amuuza commented Feb 12, 2021

Hi @spiccinini
Yes, I can, but how?
I guess I should follow the compiling instructions at https://libremesh.org/development.html but use
src-git libremesh https://github.com/spiccinini/lime-packages.git;master instead of
src-git libremesh https://github.com/libremesh/lime-packages.git;master
Is that correct? Any other change?

@ilario
Member

ilario commented Feb 13, 2021

> I guess I should follow the compiling instructions at https://libremesh.org/development.html but use
> src-git libremesh https://github.com/spiccinini/lime-packages.git;master

Almost there! In that same line you also have to replace master with babeld-auto-gw-mode, which is the name of the branch mentioned at the top of this page.
Following the guide on the development page you should also refresh the feeds.
Finally, in Menuconfig, you'll have to select the new package Utilities/babeld-auto-gw-mode and compile.
Let us know pls!
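
As a sketch of the feed change described above, the libremesh line of feeds.conf can be rewritten with sed. The helper name `use_fork_branch` is hypothetical; it reads the feeds.conf content on stdin and writes the result to stdout, so the real file is left untouched.

```shell
#!/bin/sh
# Sketch: point the libremesh feed at a fork and branch.
# Reads feeds.conf content on stdin and writes the result to stdout.
use_fork_branch() {
    # $1: feed URL of the fork, $2: branch name
    sed "s|^src-git libremesh .*|src-git libremesh $1;$2|"
}

printf '%s\n' 'src-git libremesh https://github.com/libremesh/lime-packages.git;master' |
    use_fork_branch https://github.com/spiccinini/lime-packages.git babeld-auto-gw-mode
```

After changing the feed, `scripts/feeds update -a` and `scripts/feeds install -a` need to be run again before opening menuconfig.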

@amuuza

amuuza commented Feb 13, 2021

Thank you @ilario. I'm getting some problems.

I did
scripts/feeds update -a
It seemed to work OK, but then I was asked for a GitHub username and password; I don't know why.

Then I did
scripts/feeds install -a
and got many warnings. See here.

And menuconfig misses many libremesh items.

@ilario
Member

ilario commented Feb 14, 2021

Weird. Try deleting the feeds/ directory and run again the update.
Also, paste here the content of your feeds.conf

@amuuza

amuuza commented Feb 14, 2021

Thanks. Everything seems ok now, I don't know why. It is now compiling.
The only difference I have found is that this time I removed the line break before
src-git libremesh https://github.com/spiccinini/lime-packages.git;babeld-auto-gw-mode
I don't know if that could be the reason; otherwise I made some mistake I cannot identify.

I'll let you know when the tests are done.

@amuuza

amuuza commented Feb 14, 2021

I did the test:

 ISP-A                     ISP-C
   |                         |
RouterA <--> RouterB <--> RouterC

RouterA, RouterB and RouterC are wirelessly meshed.
RouterA WAN port is connected to ISP-A to reach the Internet.
RouterC WAN port is connected to ISP-C to reach the Internet.

If RouterB is reaching the Internet through RouterA, and ISP-A loses its Internet connectivity, then both RouterA and RouterB obviously lose their Internet connectivity. After approximately 8 minutes, RouterB automatically switches its Internet gateway to ISP-C.

However, RouterA does not switch over automatically.
I waited for more than 10 minutes.
Then I rebooted it manually and it did not switch over either.
It only switched after I both unplugged its WAN cable and rebooted it manually.

@spiccinini
Contributor Author

@amuuza thanks for testing!! So it is not working as expected for you. Can you check that the new package babeld-auto-gw-mode is correctly installed on RouterA (with opkg list-installed | grep babel)?
Also, please check whether watchping is running with logread | grep watchping; you should see some messages.
If you can join the chatroom at https://www.libremesh.org/communication.html we can debug this faster :)
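
The package check suggested above can be made exact, so that `babeld` alone does not count as a match for `babeld-auto-gw-mode`. A small sketch, where the helper name `has_package` is hypothetical and the sample input is taken from the listing format "name - version":

```shell
#!/bin/sh
# Sketch: check an opkg-style listing (read from stdin) for a package.
# Each input line is expected to look like "name - version".
has_package() {
    # $1: exact package name
    awk -v pkg="$1" '$1 == pkg { found = 1 } END { exit !found }'
}

# Sample input in the same format; on a router this would be:
#   opkg list-installed | has_package babeld-auto-gw-mode && echo "installed"
#   logread | grep watchping
printf '%s\n' 'babeld - 1.9.2-1' 'babeld-auto-gw-mode - git-21.043.62159-8f372ae-1' |
    has_package babeld-auto-gw-mode && echo "installed"
```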

@amuuza

amuuza commented Feb 14, 2021

Ooooh, I forgot to include it with Menuconfig!! Ilario told me so, but I forgot!!

I'll do everything again and let you know.

By the way, isn't there any way to do the tests through software? Maybe GNS3?

@amuuza

amuuza commented Feb 14, 2021

I reinstalled RouterA, now with the right image (I hope so).

I waited for more than 10 minutes and there was no automatic switchover. It only worked when I manually rebooted the router.

There is just a little improvement: this time it only required a manual reboot to work; there was no need to take out the active (but without Internet) WAN cable to make it work.

While waiting for the switchover I entered your suggestions:


root@MV-aa8ebe:~# opkg list | grep babel
babeld - 1.9.2-1
babeld-auto-gw-mode - git-21.043.62159-8f372ae-1
lime-proto-babeld - git-21.043.62159-8f372ae
shared-state-babeld_hosts - 2021-02-07-1612741008
root@MV-aa8ebe:~# 
root@MV-aa8ebe:~# logread | grep babel
Sun Dec  6 07:31:33 2020 daemon.notice netifd: Interface 'lm_net_eth0_1_babeld_if' is enabled
Sun Dec  6 07:31:33 2020 daemon.notice netifd: Interface 'lm_net_eth0_1_babeld_if' is setting up now
Sun Dec  6 07:31:33 2020 daemon.notice netifd: Interface 'lm_net_eth0_1_babeld_if' is now up
Sun Dec  6 07:31:33 2020 daemon.notice netifd: Interface 'lm_net_eth0_2_babeld_if' is enabled
Sun Dec  6 07:31:33 2020 daemon.notice netifd: Interface 'lm_net_eth0_2_babeld_if' is setting up now
Sun Dec  6 07:31:33 2020 daemon.notice netifd: Interface 'lm_net_eth0_2_babeld_if' is now up
Sun Dec  6 07:31:34 2020 daemon.notice netifd: Interface 'lm_net_eth0_babeld_if' is enabled
Sun Dec  6 07:31:34 2020 daemon.notice netifd: Interface 'lm_net_eth0_babeld_if' is setting up now
Sun Dec  6 07:31:34 2020 daemon.notice netifd: Interface 'lm_net_eth0_babeld_if' is now up
Sun Dec  6 07:31:34 2020 daemon.notice netifd: Interface 'lm_net_eth0_1_babeld_if' has link connectivity
Sun Dec  6 07:31:34 2020 daemon.notice netifd: Interface 'lm_net_eth0_2_babeld_if' has link connectivity
Sun Dec  6 07:31:34 2020 daemon.notice netifd: Interface 'lm_net_eth0_babeld_if' has link connectivity
Sun Dec  6 07:31:38 2020 daemon.err babeld[1921]: Warning: couldn't determine channel of interface eth0-1_17.
Sun Dec  6 07:31:39 2020 daemon.err babeld[1921]: Warning: couldn't determine channel of interface eth0_17.
Sun Dec  6 07:31:39 2020 daemon.err babeld[1921]: Warning: couldn't determine channel of interface eth0-2_17.
Sun Dec  6 07:31:48 2020 daemon.notice netifd: Interface 'lm_net_wlan1_mesh_babeld_if' is enabled
Sun Dec  6 07:31:48 2020 daemon.notice netifd: Interface 'lm_net_wlan1_mesh_babeld_if' is setting up now
Sun Dec  6 07:31:48 2020 daemon.notice netifd: Interface 'lm_net_wlan1_mesh_babeld_if' is now up
Sun Dec  6 07:31:49 2020 daemon.notice netifd: Interface 'lm_net_wlan0_mesh_babeld_if' is enabled
Sun Dec  6 07:31:49 2020 daemon.notice netifd: Interface 'lm_net_wlan0_mesh_babeld_if' is setting up now
Sun Dec  6 07:31:49 2020 daemon.notice netifd: Interface 'lm_net_wlan0_mesh_babeld_if' is now up
Sun Dec  6 07:31:49 2020 daemon.err babeld[1921]: Warning: couldn't determine channel of interface wlan1-mesh_17.
Sun Dec  6 07:31:49 2020 daemon.err babeld[1921]: send: Address not available
Sun Dec  6 07:31:49 2020 daemon.err babeld[1921]: Warning: couldn't determine channel of interface wlan0-mesh_17.
Sun Dec  6 07:31:49 2020 daemon.notice netifd: Interface 'lm_net_wlan1_mesh_babeld_if' has link connectivity
Sun Dec  6 07:31:49 2020 daemon.err babeld[1921]: send: Address not available
Sun Dec  6 07:31:49 2020 daemon.notice netifd: Interface 'lm_net_wlan0_mesh_babeld_if' has link connectivity
Sun Dec  6 07:31:58 2020 daemon.notice netifd: Interface 'lm_net_wlan1_mesh_babeld_if' is now down
Sun Dec  6 07:31:58 2020 daemon.notice netifd: Interface 'lm_net_wlan1_mesh_babeld_if' is disabled
Sun Dec  6 07:31:58 2020 daemon.notice netifd: Interface 'lm_net_wlan1_mesh_babeld_if' has link connectivity loss
Sun Feb 14 21:50:00 2021 cron.info crond[1461]: USER root pid 3941 cmd ((sleep $(($RANDOM % 120)); shared-state sync babeld-hosts &> /dev/null)&)
Sun Feb 14 21:55:00 2021 cron.info crond[1461]: USER root pid 5002 cmd ((sleep $(($RANDOM % 120)); shared-state sync babeld-hosts &> /dev/null)&)
Sun Feb 14 22:00:00 2021 cron.info crond[1461]: USER root pid 5817 cmd ((sleep $(($RANDOM % 120)); shared-state sync babeld-hosts &> /dev/null)&)
root@MV-aa8ebe:~#

@spiccinini, if you want me to test something while being on IRC, let me know which UTC times are ok with you.

@spiccinini
Contributor Author

> Ooooh, I forgot to include it with Menuconfig!! Ilario told me so, but I forgot!!
>
> I'll do everything again and let you know.
>
> By the way, isn't there any way to do the tests through software? Maybe GNS3?

For testing with software I am using full virtualization with qemu as explained in https://github.com/libremesh/lime-packages/blob/master/TESTING.md#development-with-qemu-virtual-machine

In particular, to test this I used PR #813, which allows creating multiple interconnected mesh clouds. But I had to do some manual steps to set up the WAN/Internet access.

@spiccinini
Contributor Author

> I reinstalled RouterA, now with the right image (I hope so).
>
> I waited for more than 10 minutes and there was no automatic switchover. It only worked when I manually rebooted the router.
>
> There is just a little improvement: this time it only required a manual reboot to work; there was no need to take out the active (but without Internet) WAN cable to make it work.

OK so it is not working as expected yet.

> While waiting for the switchover I entered your suggestions:
>
> root@MV-aa8ebe:~# opkg list | grep babel
> babeld - 1.9.2-1
> babeld-auto-gw-mode - git-21.043.62159-8f372ae-1
> lime-proto-babeld - git-21.043.62159-8f372ae
> shared-state-babeld_hosts - 2021-02-07-1612741008

Ok this looks good :)

> root@MV-aa8ebe:~#
> root@MV-aa8ebe:~# logread | grep babel
> Sun Dec  6 07:31:33 2020 daemon.notice netifd: Interface 'lm_net_eth0_1_babeld_if' is enabled
> root@MV-aa8ebe:~#

Can you grep using watchping instead of babel?

> @spiccinini, if you want me to test something while being on IRC, let me know which UTC times are ok with you.

I am usually online from 10 to 20 UTC-3.

Thanks for all the work!

@amuuza

amuuza commented Feb 15, 2021

Oops, sorry, I did grep babel instead of grep watchping.

> I am usually online from 10 to 20 UTC-3.

Cool. Maybe later today or tomorrow, I'll let you know.

@spiccinini
Contributor Author

I've found a bug! I have already fixed it.

@amuuza You will have to run the feeds update again. If there is an error, delete the feeds/libremesh folder and re-run the feeds update.

@amuuza

amuuza commented Feb 16, 2021

Yes, now it does work!! Thank you!

In one or two minutes RouterA was back online, despite having its no-internet-WAN connected.

@amuuza

amuuza commented Feb 16, 2021

I think this is an important step for Internet sharing.

Now I have some questions about metrics and monitoring... but I'll ask them in the list.

@spiccinini
Contributor Author

Thanks to @amuuza and @ilario I think this package is ready to be merged now

@spiccinini
Contributor Author

Sorry, ipv6 support is not done yet, I will work on that now.

@spiccinini
Contributor Author

Travis build is failing with

toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

I don't know if there is something we can do other than changing the docker registry.

@nicopace
Member

> Travis build is failing with
>
> toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
>
> I don't know if there is something we can do other than changing the docker registry.

seems there are some options here: https://docs.travis-ci.com/user/docker/
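
One option, since the error message itself suggests authenticating, is to log in to the registry before pulling. A hedged sketch: `DOCKER_USERNAME` and `DOCKER_PASSWORD` are assumed names for secrets defined in the CI settings, and `DOCKER_BIN` is a hypothetical override that only exists to dry-run the function.

```shell
#!/bin/sh
# Sketch: log in to Docker Hub only when credentials are provided.
# DOCKER_USERNAME / DOCKER_PASSWORD are assumed names for CI secrets;
# DOCKER_BIN is a hypothetical override used to dry-run the function.
docker_login_if_configured() {
    if [ -n "${DOCKER_USERNAME:-}" ] && [ -n "${DOCKER_PASSWORD:-}" ]; then
        # --password-stdin keeps the password out of the process list
        printf '%s\n' "$DOCKER_PASSWORD" |
            "${DOCKER_BIN:-docker}" login -u "$DOCKER_USERNAME" --password-stdin
    else
        echo "no Docker credentials set, pulling anonymously" >&2
    fi
}
```

The function would be called once in the CI job before the first image pull; without credentials it falls back to anonymous pulls and the rate limit still applies.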

@rallep71

Hello,
Question: this is an important extension of LibreMesh; when will it move forward?
Thanks already to all who made it possible.

@spiccinini
Contributor Author

I have reverted the IPv6 support, as it may be problematic in some scenarios. I propose that we merge this with IPv4 only and then, if IPv6 support is needed, we can implement it (I reverted the IPv6 patch so we can have it at hand in the git history).

@G10h4ck
Member

G10h4ck commented Feb 23, 2021

> Hi! To fix #822, what do you think about increasing the metric of the default gw by a very large amount when it is not working, so that this route is not used if another default route is distributed by another node?
> Do you think this is a viable solution?
>
> @G10h4ck what do you think? I will try this idea and report back but I would love a confirmation that it is sound.

Seems a good idea to me. At worst, changing the route metric may confuse the DHCP client so that it forgets to remove the route when the lease is lost; but since we increased the metric so much and don't share that route with other nodes, even if the route remains after the lease is lost it should not create any problem.
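
To illustrate why a leftover high-metric route should be harmless, here is a small sketch that mimics the preference for the lowest metric among default routes. It is a simplification for illustration only: the helper name `best_default_route` is hypothetical, and the input lines (addresses, `proto babel`) are made-up examples in the style of `ip route show default` output.

```shell
#!/bin/sh
# Sketch: given `ip route show default`-style lines on stdin, print the
# one with the lowest metric (a missing metric counts as 0).
best_default_route() {
    awk '
        {
            metric = 0
            for (i = 1; i < NF; i++)
                if ($i == "metric") metric = $(i + 1) + 0
            if (best == "" || metric < best_metric) {
                best = $0
                best_metric = metric
            }
        }
        END { if (best != "") print best }
    '
}

# Illustrative input: a demoted local WAN route and a route learned via the mesh.
printf '%s\n' \
    'default via 192.0.2.1 dev eth0.2 metric 1000000' \
    'default via 10.13.0.2 dev wlan0-mesh_17 proto babel' |
    best_default_route
```

With this input the mesh route is selected; the demoted WAN route would only be used if no other default route existed.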

@amuuza

amuuza commented Feb 23, 2021

I see I've been asked for a review, but I can hardly read any code. What I have done is test it; if that's enough to give an approving review, let me know and I'll do it.

@spiccinini
Contributor Author

> I see I've been asked for a review, but I can hardly read any code. What I have done is test it; if that's enough to give an approving review, let me know and I'll do it.

Given that we need 2 reviews to merge, in my opinion, testing is a valid review.

@@ -0,0 +1,11 @@
#!/usr/bin/lua
Member

Is there any concrete use of the lime-config hotplug.d subdirectory?
It looks like a grouping that is not documented, and it is not clear to me whether it should be there or not.

Contributor Author

Indeed, the lime-config hotplug.d documentation is missing; it was introduced in early 2019.
hotplug.d for lime-config is similar to having a generic run_asset configured with ATCONFIG, but it is enabled just by putting the script in the directory.
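
For readers unfamiliar with the drop-in pattern, a generic sketch follows. It is not lime-config's actual implementation: the function name `run_hooks` is hypothetical, and the directory is passed as an argument rather than assuming any real path.

```shell
#!/bin/sh
# Generic sketch of a "drop-in directory" runner; it is NOT lime-config's
# actual implementation. Every executable regular file in the directory
# is run in lexical order; other files are skipped.
run_hooks() {
    # $1: hook directory
    for hook in "$1"/*; do
        if [ -f "$hook" ] && [ -x "$hook" ]; then
            "$hook" || echo "hook failed: $hook" >&2
        fi
    done
}

# Usage (directory name is a placeholder):
#   run_hooks /path/to/hooks.d
```

The appeal of this design is that a package enables its hook simply by installing an executable file, with no extra configuration entry.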

Member

@nicopace nicopace left a comment

Made some comments... looks pretty straightforward. I haven't tested it myself, but I have read each line.
I am a little worried how the ipv6 conversation is harder... @altergui maybe you want to contribute to it (as I remember, you are invested in IPv6 support).
Thanks @spiccinini for working on this!

@amuuza amuuza left a comment

I tested it. It works.

@AndyMcSchopf

Wow, just saw this on the mailing list. Great to see progress and a fix for this feature! I will also test it in the next few days. Thanks for the work!!!

@spiccinini spiccinini merged commit f3aa201 into libremesh:master Feb 25, 2021
7 participants