gluon-scheduled-domain-switch: add package #1555
Fixed.
Addressed all issues pointed out by @mweinelt.
Maybe a |
@genofire This is indirectly achieved by the ability to put the settings into the respective domain files. This way, it is also possible to e.g. configure |
Force-pushed from 4a7fe3d to 82df55e.
If I remember correctly, @neoraider was thinking about some rollback mechanism, too: if the gateways are unreachable after switching to the new domain settings, revert to the previous domain. This could probably be implemented and added later, but @blocktrron, have you given something like this any thought? Would it mean switching back and forth between the old and new domain settings after some timeout? (Could that run into undesired side effects?)
A little bit of background here: this is pretty much a stripped-down, tidied-up version of the package we used to migrate the network of Freifunk Darmstadt. We thought about this as well, but dropped the idea because we might run into issues in larger meshes with nodes switching back and forth between domains. We never wanted nodes to go back. If a node runs into the fallback switch, you will see it again at the point when all other nodes have switched, even if this means it has no mesh connection for ~5 days (IMHO, a reasonable timespan between rolling out the scheduled domain switch and the actual switch date is one week or maybe 10 days). So back-and-forth switching is not necessary; all nodes will be back online after a week. As far as undesired side effects go: I've never thought about potential problems, except for the time it takes to stabilize a larger mesh with different fallback intervals and domain states. This will sort itself out over time, but I think the one-time switch is superior.
Unless you've made some mistake in the new domain settings, or there are other bugs that only show up with the new domain settings. While domain switching might seem safer than the autoupdater because it does not need to flash the whole system, the lack of a revert option might make it less safe: with the autoupdater you can validate your images/changes via beta branches and an update timespan. With the domain switch it's an "all-in / all-jump-now" approach. Presuming that humans will make mistakes and will create broken domain settings (or will run into new protocol or driver bugs that only surface in the combination of the new domain settings and this particular network), do we have enough safety measures in the scheduled-domain-switch approach to account for that?
Certainly, but I'd argue that, as usual, any firmware doing such a major migration should be properly tested ahead of time. Additionally, signing a firmware manifest should imply that the firmware has received testing, even more so with a domain migration; anything else feels unreasonable. Apart from misconfiguration, I don't believe the general concept of this PR is particularly error-prone. It makes sure that nodes definitely migrate at some point; its reliability basically comes down to how reliably you can roll out new firmware before the switch time is up. I don't believe wiggling back and forth between old and new configuration is strictly necessary, although it would be a nice addition. The main drawback for me is that you have to configure the switch time at build time and cannot further control and monitor the switch at run time, as you could in a remote-controlled approach.
Fortunately, with a non-parallel approach we don't run into load issues along the way. I'd say most communities can reliably roll out a firmware to most devices in about a week, with most nodes receiving the update very quickly, and only a few select routers lagging behind, because most famously
So I'd recommend at least 10-14 days between the firmware rollout and the actual switch. Obviously, more days are needed if you have configured a slower rollout.
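To make the 10-14 day recommendation concrete, here is a minimal sketch of deriving a switch time from the start of a firmware rollout plus a buffer. The helper name `switch_timestamp`, the example rollout date, and the choice to express the result as a Unix timestamp are assumptions for illustration only; they are not taken from the package.

```python
from datetime import datetime, timedelta, timezone


def switch_timestamp(rollout_start: datetime, buffer_days: int = 14) -> int:
    """Return the Unix timestamp that lies buffer_days after rollout_start.

    Hypothetical helper: the name and the default of 14 days are
    illustrative, following the recommendation in the discussion.
    """
    if rollout_start.tzinfo is None:
        # Naive datetimes would be interpreted in the local timezone,
        # which makes the result depend on the build machine.
        raise ValueError("rollout_start must be timezone-aware")
    return int((rollout_start + timedelta(days=buffer_days)).timestamp())


# Example rollout date (made up for this sketch).
rollout = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)
print(switch_timestamp(rollout, buffer_days=14))
```

With a slower rollout, `buffer_days` would simply be increased accordingly.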
This is not a complete review, just a few comments.
PR updated; the changes still need testing on a real device, though.
Force-pushed from 23e2666 to edd5889.
Just tested my changes and they behave as expected. Also switched to using the system uptime instead of date and time to determine the offline duration of the node.
This uses the uptime instead of the date to determine whether or not to switch because the node has been offline for a period of time. This way, we mitigate race conditions when a node is powered on and sets its clock after X minutes via NTP. Thanks to Linus Lüssing for the suggestion: freifunk-gluon/gluon#1555 (comment)
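The following sketch illustrates why uptime is the more robust basis for measuring an offline duration. It is not the package's code (the package itself is written in Lua); the function names are hypothetical, and it assumes the time of the last successful connectivity check is recorded as an uptime value in volatile storage, so the record is lost on reboot.

```python
def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Seconds since boot: the first field of /proc/uptime on Linux."""
    with open(path) as f:
        return float(f.read().split()[0])


def offline_minutes(uptime_now: float, uptime_last_online: float) -> float:
    """Minutes since the last successful connectivity check.

    Both arguments are uptime values in seconds, so a clock step
    caused by NTP cannot distort the difference.
    """
    return max(0.0, uptime_now - uptime_last_online) / 60.0


# A node boots with an unset clock (assumed to start at epoch 0),
# is last seen online 60 s after boot, and gets its clock set via
# NTP shortly before a check that runs 360 s after boot.
wall_last_online = 60.0            # wall clock before NTP
wall_now = 1546344000.0            # wall clock after NTP (example value)
naive_minutes = (wall_now - wall_last_online) / 60.0

robust_minutes = offline_minutes(uptime_now=360.0, uptime_last_online=60.0)

print(naive_minutes)   # decades of apparent downtime
print(robust_minutes)  # the real five minutes
```

Comparing wall-clock timestamps across the NTP step would report decades of downtime and could trigger the offline switch immediately, while the uptime difference reflects the actual five minutes.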
Force-pushed from c739e48 to a81d55b.
Updated this PR. Tested on one node without issues. I hope I caught everything. 😄
Force-pushed from a81d55b to 928b64d.
Fixed two minor inconsistencies (see fixup commit).
Only minor issues left; I think we can get this merged after this round 👍
Force-pushed from 928b64d to 9fb828c.
Updated PR.
This package allows a node to switch automatically to another domain, either at a given point in time or after the node has been offline long enough.
Force-pushed from 85e02c3 to d9b2a32.
This package allows a node to switch automatically to another domain, either at a given point in time or after the node has been offline long enough.
Its primary goal is to allow communities that still use IBSS to migrate to 802.11s without having to run both protocols at the same time, which might lead to overloaded routers.
Depending on how #1377 is scoped, this might close it.
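The two triggers named in the description can be summarized as a single decision rule. This is a minimal sketch under stated assumptions, not the package's code: the function name, parameter names, and example values are hypothetical.

```python
from typing import Optional


def switch_reason(now_epoch: int,
                  switch_time_epoch: int,
                  offline_mins: float,
                  max_offline_mins: float) -> Optional[str]:
    """Return why a node should switch domains, or None if it should not.

    Hypothetical rule: switch once the scheduled time has passed, or
    once the node has been offline for at least max_offline_mins.
    """
    if now_epoch >= switch_time_epoch:
        return "scheduled time reached"
    if offline_mins >= max_offline_mins:
        return "offline too long"
    return None


# Example values (made up): a node that is still before the switch
# time but has lost its connection for three hours, with a two-hour limit.
print(switch_reason(now_epoch=1546000000,
                    switch_time_epoch=1546344000,
                    offline_mins=180,
                    max_offline_mins=120))
```

The offline trigger acts as the fallback for nodes that lose contact with the mesh because their neighbours have already switched.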