Feature Request -- Safe Mode (Auto Rollback Changes if connection lost or not re-established within X time) #2976
Comments
Thank you for creating an issue. For more information about the policies for this repository, … The easiest option to gain traction is to close this ticket and open a new one using one of our templates.
I would love this in OPNsense! I loved using …
Same. I use …
That's why auto sync is disabled when using an HA setup ;)
@mimugmail, I can see that; however, forgive me, are you saying that HA is the only option where this would be considered? Apologies, sometimes I'm a bit dense. :-)
Honestly, I have no idea how to implement such a huge change in an easy way; the HA setup is already here, stable, and easy to understand :)
Forgive me if I come off wrong; that is not the intent. I'm not entirely sure how to answer this comment correctly. While I agree that HA does have its place, I feel that, one, it is very unnecessary overhead, and two, it isn't exactly solving the same problem. HA, to my understanding, requires at least duplicate hardware to achieve sync (I also thought it requires multiple public IPs, but I'm less sure about that). Apologies, but I'm not going to spend another $600+ on identical hardware and additional power consumption just to guard against a misconfiguration. I would be out of pocket for my time, but I could justify a couple of "mistake trips", and that would still be cheaper than duplicate hardware. HA solves for availability and redundancy, which, granted, it covers. However, I want a solution for recoverability only. If the hardware dies, fine... in an HA setup I'd still have to drive out and replace the bad hardware; I would just still be running in the meantime. Also, I would like to understand the complexity of the request. I don't quite see how this is a huge change when the underpinnings are already there. We can already back up and restore configs; I am simply suggesting a mechanism that does it automatically. It could be expanded into much more, but at its heart, I think that's what most everyone would agree they want: a save point and an automatic restore if there is an issue. I do realize there is probably more to consider, so I would welcome someone helping me understand more. :-) I hope that came off in the open spirit of debate rather than rude.
There's just no generic concept of "configure and commit all pending changes" in a reliable way, which will always make such a feature incomplete and disappointing when it's really needed. In theory you can offer functionality like this on a per-component level, which is also what we did for the firewall API (https://docs.opnsense.org/development/api/plugins/firewall.html#concept). It's rather simple: if one could determine the conditions reliably (which I don't think one can, knowing quite a few different support scenarios), one could also build a plugin for it and open a PR for discussion. A reliable failsafe covering more scenarios would probably be to offer snapshots with ZFS and go back in time when the user asks for it during boot. This would also cover kernel/driver issues or software changes people forgot to act upon. Maybe that's something to look into for a future business release, you never know.
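The per-component approach mentioned above is the firewall API's savepoint mechanism. Below is a minimal Python sketch of a client driving it. It assumes the endpoint names `savepoint`, `apply/{revision}` and `cancelRollback/{revision}` under `/api/firewall/filter`, and a `revision` field in the savepoint response, as the linked docs describe them at the time of writing; check the docs before relying on this. The `post` transport and the `still_reachable` check are hypothetical stand-ins for an authenticated API call and whatever connectivity test you trust.

```python
from typing import Callable, Dict

# Hypothetical transport: post(path) -> parsed JSON response. In practice this
# would be an authenticated HTTPS POST to the OPNsense API (key/secret).
Post = Callable[[str], Dict]

# Endpoint prefix as described in the firewall API docs (verify before use).
BASE = "/api/firewall/filter"


def apply_with_rollback(post: Post, still_reachable: Callable[[], bool]) -> str:
    """Apply pending filter changes behind a savepoint.

    Returns "confirmed" if the change was kept, or "left_to_revert" if we
    could not confirm, in which case the firewall reverts on its own timer.
    """
    revision = post(f"{BASE}/savepoint")["revision"]  # 1. remember current rules
    post(f"{BASE}/apply/{revision}")                   # 2. apply; rollback timer starts
    if still_reachable():                              # 3. client-side connectivity check
        post(f"{BASE}/cancelRollback/{revision}")      # 4. keep the new rules
        return "confirmed"
    # Deliberately do nothing: the pending rollback restores the savepoint.
    return "left_to_revert"
```

If the confirmation never arrives (for example because the new rules cut off the client), the firewall itself reverts to the savepoint after its timeout, which is what makes this pattern safe against lockouts.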
I understand your explanation of "configure and commit all pending changes". I also believe you are right that ZFS snapshots would be a more robust solution. However, what if a simpler approach were considered? As I said before, we can already back up "config.xml" and restore it again. I believe we even have "opnsense-importer" to restore the config file on boot. My understanding so far is this: if router 1 dies, I can replace it with identical router 2, restore the config.xml from router 1 and, again assuming identical hardware, be right back up and running. If this is correct, I believe we can reasonably assume the hardware isn't changing, since the option would be for a configuration error rather than a hardware change (which I believe even a ZFS snapshot would have issues with?). Working from that, let us consider Juniper's "commit confirmed X" command, per their site.
This could look like this in OPNsense.
Using this approach, we are not storing changes for a bulk commit or really changing the commit process at all; we are simply saying "I'm about to do something I may not come back from, so let me make a backup and set a timer to restore it if I don't finish in time." Again, I agree this could be built out further with ZFS in the future. I will be the first to say I'm not familiar with the deep inner workings of the OS. However, I believe most of the pieces needed to make the above process work are already in place. As always, I welcome feedback and appreciate the consideration. I lack developer skills, but I am passionate about this project and feel this feature would be most welcome, based on the feedback I've received when I have posed the idea.
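The backup-plus-timer idea above can be sketched in a few lines of Python. This is a hypothetical illustration, not an existing OPNsense feature: the class name, the `.safemode` backup file and the `reload` hook are assumptions; only `/conf/config.xml` as the live config location comes from OPNsense itself. The clock is injectable so the timer logic can be tested without waiting.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Optional


class ConfirmedCommit:
    """Sketch of a Junos-style "commit confirmed" guard around config.xml.

    Assumptions: the live config lives at `config_path` (on OPNsense that is
    /conf/config.xml), and `reload` is a hypothetical hook that makes the
    restored file take effect (e.g. reloading services or rebooting).
    """

    def __init__(self, config_path: Path, reload: Callable[[], None],
                 timeout_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self.config_path = Path(config_path)
        self.backup_path = self.config_path.with_name(self.config_path.name + ".safemode")
        self.reload = reload
        self.timeout_s = timeout_s
        self.clock = clock
        self.deadline: Optional[float] = None

    def arm(self) -> None:
        """Save the current config and start the rollback timer."""
        shutil.copy2(self.config_path, self.backup_path)
        self.deadline = self.clock() + self.timeout_s

    def confirm(self) -> None:
        """The admin is still connected: keep the new config, drop the backup."""
        self.deadline = None
        self.backup_path.unlink(missing_ok=True)

    def check(self) -> bool:
        """Call periodically; restore the backup if the timer has expired.

        Returns True if a rollback happened.
        """
        if self.deadline is None or self.clock() < self.deadline:
            return False
        shutil.copy2(self.backup_path, self.config_path)
        self.deadline = None
        self.backup_path.unlink()
        self.reload()
        return True
```

Calling `check()` from a periodic job rather than a sleeping thread means the rollback still fires if the web session that armed it dies, which is exactly the case this feature is for. A real implementation would also have to persist the deadline to disk, since this sketch keeps it in memory and a reboot would silently disarm it.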
I don't expect it will work, but as mentioned, if it did, the plugin framework offers everything you need, so there's no need to keep this in core. You (or anyone else) could start working on such a plugin and open a PR there.
My apologies for the incorrect placement. I should have realized this could be a plugin. I appreciate the discussion very much.
Chiming in to say that I would appreciate this feature too. It’d be much nicer to have a simple “wait 20 secs after the lights stop blinking” than (in my case) connecting to the same LAN as the Proxmox server virtualizing OPNsense (rather than my daily VLAN), reverting a snapshot or typing commands on the command line to undo the changes, and possibly rebooting if needed. Is there some recognized way for other users to express interest without butting in on implementation discussions?
No problem at all, it's easy to overlook. I don't mind keeping the ticket open for now; I'm just mentioning it's (currently) not a core priority and someone will have to do some work at some point in order to mature the ideas. If we keep it simple, a plugin would probably be the better place anyway.
@JJGadgets Thank you for the interest! I don't believe anyone would view it as butting in. @AdSchellevis Thank you for leaving the request open. I understand there are bigger things than my request. 🙂 I will create a request in plugins and reference it here.
@tcsi-github, let me move this then.
Thank you, sir!
I may look into implementing this for OPNsense nodes on ZFS filesystems, but no guarantees. I will update if I decide to take this on.
I've been using ZFS with snapshots in a virtualized environment for quite some time to roll back if an update failed. I would love to see that in the OPNsense GUI as well, either with some kind of timer (roll back in 5 minutes if not stopped in time) or completely manually (e.g. to boot and check out different versions). In the shell it works. I installed OPNsense 22.1.2 on a Sophos appliance on top of ZFS. Prior to an update I ran a … ZFS clones have some disadvantages if one wants to keep the snapshots/clones for a longer time, so zfs send/receive might be a better approach. Integration into the GUI and the boot loader would be nice, e.g. choosing the ZFS dataset to be mounted as root on the next boot. From quick testing, the only interesting ZFS dataset to be snapshotted is …
but something like …
It's much easier then to snapshot all relevant datasets by e.g. … If something breaks, it's also possible to boot from an OPNsense USB installer thumb drive, issue a … I think ZFS snapshots might be a way to roll back from a failed update or a misconfiguration. Happy to discuss this further.
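A plugin would likely shell out to the `zfs` CLI for the snapshot and rollback steps described above. The helpers below are a sketch that only builds the argument lists, so they can be reviewed and tested without a pool; the helper names are hypothetical, and the dataset `zroot/ROOT/default` used when trying them out is an assumption (the usual FreeBSD root dataset), since the dataset named in the comment above is cut off. Rolling back the dataset the system is currently running from is risky, which is why the comment suggests doing it after booting from the installer.

```python
from datetime import datetime, timezone
from typing import List


def snapshot_name(prefix: str, when: datetime) -> str:
    """Build a sortable snapshot name such as 'safemode-20220301T120000Z'."""
    return f"{prefix}-{when.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}"


def snapshot_cmd(dataset: str, snap: str, recursive: bool = True) -> List[str]:
    # `zfs snapshot -r` snapshots the dataset and all its descendants atomically.
    flags = ["-r"] if recursive else []
    return ["zfs", "snapshot", *flags, f"{dataset}@{snap}"]


def rollback_cmd(dataset: str, snap: str) -> List[str]:
    # `zfs rollback -r` also destroys snapshots newer than `snap`, which would
    # otherwise block the rollback. It does NOT descend into child datasets,
    # so each child dataset needs its own rollback command.
    return ["zfs", "rollback", "-r", f"{dataset}@{snap}"]
```

Returning argument lists instead of shell strings avoids quoting problems when the lists are later passed to `subprocess.run`.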
Yeah, I figured ZFS snapshots would be the way to go. I've heard that UFS supports snapshots as well, but I don't know much about that.
Anyone experienced in writing plugins for OPNsense? We could try to create something together for ZFS snapshots and rollbacks.
I think the first step would be a plugin that you can interact with in the UI to manually create and restore snapshots. Once we've got that, hooking it into system events should be a little less... painful? I have no experience with OPNsense plugins, but I will look around and see if I can learn enough to start building a base for that.
See https://github.com/opnsense/plugins/tree/master/devel/grid_example and https://github.com/opnsense/plugins/tree/master/devel/helloworld. I'm poking around those right now.
Yep, that sounds like a good way to start.
Looks like we can write the 'backend' part of this in Python.
And here are some more docs: https://docs.opnsense.org/development/examples/helloworld.html. I'll see if I can create a bit of a template interface for this; give me a bit.
If the backend is in Python, does it have an API for ZFS, or do we need to call zfs shell commands?
I suspect we'll have to use zfs shell commands, even if there are Python libraries for interacting with ZFS. See here for configd, which we would probably use to do this: https://docs.opnsense.org/development/backend/configd.html. It would be nice to use a pip package, but I'm unsure what we are allowed to do with that.
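Calling the zfs shell commands from Python is straightforward with `subprocess` from the standard library. Below is a minimal sketch of a script a configd action could run, assuming the action simply returns the script's stdout (how configd's `script_output` actions behave, as far as I understand the configd docs linked above); `list_snapshots` and the returned dictionary shape are hypothetical. The runner is injectable so the parsing can be tested without ZFS.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_zfs(argv: Sequence[str]) -> str:
    """Run a command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(argv), check=True, capture_output=True, text=True).stdout


def list_snapshots(dataset: Optional[str] = None, run: Runner = run_zfs) -> List[Dict]:
    # -H: no header, tab-separated columns; -p: creation time as epoch seconds
    argv = ["zfs", "list", "-H", "-p", "-t", "snapshot", "-o", "name,creation"]
    if dataset:
        argv += ["-r", dataset]  # limit to this dataset and its descendants
    snaps = []
    for line in run(argv).splitlines():
        if not line.strip():
            continue
        name, creation = line.split("\t")
        fs, _, snap = name.partition("@")
        snaps.append({"dataset": fs, "snapshot": snap, "created": int(creation)})
    return snaps
```

A configd script could then print `json.dumps(list_snapshots())` to hand the UI a JSON list; this needs no pip package, which sidesteps the dependency question.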
I quickly checked some update packages from https://pkg.opnsense.org/FreeBSD:13:amd64/22.1/sets/. From what I can see, the update changes files in / (obviously) and some other folders that are part of the … Would it be a lot of trouble to change the ZFS hierarchy from …
@CorvetteCole …
This issue has been automatically timed-out (after 180 days of inactivity). For more information about the policies for this repository, … If someone wants to step up and work on this issue, …
unstale!
Anyone else interested in this topic? I can help with ZFS but would need somebody to work on the plugin part.
I can't work on this anymore; unfortunately, I do not have the time. I am very sorry!
What about having a minimal OPNsense in a separate partition?
That's what we can get with a ZFS snapshot to boot from.
@OPNsense-bot reopen
+1
I'm interested in working on a plugin. However, I'll probably go for an extremely simplified approach in the initial version. Something like the …
I am not familiar with plugin development for OPNsense, but I can try to help with the ZFS part, although I decided to keep my own fully virtualized setup rather than switching to separate hardware with a ZFS filesystem for OPNsense.
My proposal for a new plugin was rejected. Unfortunately, I cannot put more time into this, so I'll unassign myself from this issue. If someone wants to take ownership, please let me know. Otherwise the issue may be automatically closed due to timeout.
This issue has been automatically timed-out (after 180 days of inactivity). For more information about the policies for this repository, … If someone wants to step up and work on this issue, …
I realize I am duplicating a post; however, I assumed that replying would reopen the closed issue. That appears not to be the case, and I simply hoped that opening a new issue would generate more feedback than a closed one. Should the original issue be reopened, I am happy to close this one.
Is your feature request related to a problem? Please describe.
I have multiple sites where I have to make several changes that could break access (network address changes, firewall changes, routing, etc.)... I'm scared. HAHA!
The remote sites have 0.75% technical experience. Yes, I might be able to find someone to power cycle a router. However, I've also worked at an MSP and had the resident "IT person" unplug the network cable to "reboot", which they thought worked because "the lights went off and came back on" (link lights). #facepalm
Needless to say, it took me a while to figure out that's what they were doing.
Describe alternatives you considered
To my knowledge, OPNsense doesn't have any reasonable alternative for this particular problem. If I am wrong, my apologies for wasting your time, and please point me in the right direction.
Additional context
Originally posted by @tcsi-github in opnsense/core#3042 (comment)
I also created a forum post regarding this but no reply as of yet.
https://forum.opnsense.org/index.php?topic=28238.0
@banym had a good template as well, and I include his here for comparison. Personally, I feel the simpler thing would be to take a snapshot as soon as the breaking-change option is enabled. I can see the use in offering a timer option; however, I would think a "middle-road" 120 seconds would make a sensible default.
Describe the solution you'd like
It would be nice to lock the firewall in a "major change" mode where only one session is able to make changes until the major change mode is exited. This mode should be able to define a known-working configuration, either from the backup config history or the current configuration at the moment the mode is activated. It should be possible to set a timer for change commitment. The administrator can then, for example, make significant changes to routing or rules that could possibly lock him out of the firewall. If he does not confirm that his change was successful and works as intended, the firewall rolls back to the defined configuration. This way the administrator can log in and "try" again or rethink his change.
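@banym's "major change" mode combines a single-session lock, a chosen baseline and a commit timer. Below is a hypothetical Python sketch of that state machine; all names are illustrative, the baseline is just an opaque string (for example the name of a config-history entry), and actually restoring it is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class MajorChangeBusy(Exception):
    """Raised when another session already holds major change mode."""


@dataclass
class MajorChangeMode:
    """Hypothetical sketch: one session at a time may hold the mode.

    Entering records a baseline and a deadline; if the holder does not
    commit in time, expire() hands back the baseline to restore.
    """
    clock: Callable[[], float] = time.monotonic
    owner: Optional[str] = None
    baseline: Optional[str] = None
    deadline: Optional[float] = None

    def enter(self, session_id: str, baseline: str, timeout_s: float = 120.0) -> None:
        if self.owner is not None and self.owner != session_id:
            raise MajorChangeBusy(f"major change mode held by {self.owner}")
        self.owner, self.baseline = session_id, baseline
        self.deadline = self.clock() + timeout_s

    def may_edit(self, session_id: str) -> bool:
        # While the mode is active, only the owning session may change config.
        return self.owner is None or self.owner == session_id

    def commit(self, session_id: str) -> None:
        if self.owner != session_id:
            raise MajorChangeBusy("only the owning session can commit")
        self.owner = self.baseline = self.deadline = None

    def expire(self) -> Optional[str]:
        """Return the baseline to roll back to if the timer ran out, else None."""
        if self.deadline is None or self.clock() < self.deadline:
            return None
        baseline = self.baseline
        self.owner = self.baseline = self.deadline = None  # release the lock too
        return baseline
```

Releasing the lock in the same step that fires the rollback means the locked-out administrator can log back in and try again, as the proposal describes, instead of also being locked out of the mode itself.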
Describe the solution you'd like
I was thinking about my MikroTik days and remember that if I happened to make an incorrect change, those changes would be reverted if I failed to connect back or didn't apply them within X time. Sometimes it was a pain, but it saved my butt so, so many times. It kept me from having to make calls to talk someone through a reboot and just gave me a bit of room to breathe.
I would submit something like the below as a rough draft for an option.
The feature could stay disabled by default and only be enabled prior to the change. This would keep the option out of the way and used only when explicitly needed.
@fichtner, I believe this to be a worthwhile addition to the OS and a step toward more enterprise use cases. My development skills are limited; however, I would commit time to testing and anything within my abilities to help this become an actual feature.
I would welcome any thoughts or feedback on the suggestion.
Example Change ---> Admin has to change a WAN IP
Now let's try a screw-up....