-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Captive Portal : Fix the bug affecting users getting disconnected when reconfiguring a zone #4031
Captive Portal : Fix the bug affecting users getting disconnected when reconfiguring a zone #4031
Conversation
@Augustin-FL I'm not sure about preserving users after a reboot. It's a significant change that will alter significantly how captive portal works, and there are a lot of corner cases to consider. For example, if RADIUS is in use the reboot procedure disconnect all users (a captiveportal_radius_stop_all(7) call in system.inc) so after the boot there will be spurious RADIUS accounting packets for sessions that don't exist anymore that could be a problem for many RADIUS servers. Also, even assuming you don't explicitly disconnect the users, there's no guarantee that pfsense will be back online in a short time and that time will be considered used time, which could be a problem if users have a limited usage time (vaucher, etc.). There will also be problems with traffic accounting because the new tables you're creating lose all the previous traffic information and this will be a problem when you reinit the rules, not only after a reboot. |
Yep, I'm seeing the same thing : https://forum.netgate.com/topic/130420/any-change-and-save-update-captive-portal-bug/41 (but : there will be no traffic when the tables are flushed ...)) I'm testing this path since a day or two in a "live" environment - all good ! edit : Rebooting will disconnect users ... (bottom to top)
|
Thanks for your comment @plumbeo !
That is completly true, thanks for pointing me out. Just to make it clear, When rebooting, a
Well, i don't think that's an issue at all (do you? please gainsay me if you think so). If you are worried about RADIUS accounting updates, the next accounting update should simply override the value before reconfigure/reboot. Anyway, i will include soon a commit to this PR (for removing
I agree, but i think it's better to implement this feature then fix potential issues, rather than block ourselves from ever including it. |
That’s the problem, the next accounting packets will override the correct values with the new values, that is the traffic generated from the moment the rules were reinitialised, not from the start of the session. I haven’t studied the code so I can’t suggest an alternative right now, but I think that those tables shouldn’t be touched. EDIT: when did this bug appear? I’m pretty sure I didn’t notice it about a year ago. Maybe it can be fixed in a different way.
But why should you include it? Shutting down or rebooting a firewall has always reset all the state (firewall states, VPNs, NPT states, etc) and that’s what anybody can reasonably expect, I don’t see why the list of captive portal users logged in should be an exception, and I don’t think you should change something like this without even offering an option to preserve the current behaviour. What actual problem does this solve? The users will not be able to access the network while the firewall is rebooting anyway and all their connections will be cut and will have to be recreated, so they may as well login again as soon as the firewall is up. |
@plumbeo : the original bug : affecting users getting disconnected when reconfiguring a (captive portal) zone - as the subject stated. The quick solution was / is right now : flush all users (when re configuring the zone) A next-best solution could be : flushing the database that belongs to the zone . @Augustin-FL chooses IMHO a better method : before flushing the ipfw tables (these : tables myzone_auth_up and myzone_auth_down), update the up and down byte counter into the database, which contains now everything to re build them afterwards. @Augustin-FL went one step further : logged in users will stay logged in after a reboot. @Augustin-FL : please correct my explanation is ok for you. |
I'd prefer not to preserve the database across a shutdown/reboot by default. It may be an option, but it must default to disabled. To do otherwise is a drastic and unexpected change. |
@Gertjanpfsense I see. I'm not sure when I'll have time to work on this, but from a rapid look at the relevant code the original bug is caused by captiveportal_init_rules(), which always calls captiveportal_free_dnrules() that resets the list of pipes, and captiveportal_delete_rules(), that destroys the firewall tables. To fix it you'd just need a check in those functions to skip removing pipes and tables when the captive portal is already enabled (for example when the user database file exists). |
I didn't specifically intend to implement that feature. This pull request is focused on fixing captive portal re-configure problem. Because a reboot is essentially a captive portal reconfigure, this pull request allow users to be saved across reboot. This was not purposely intended, it's more a side effect of fixing the current reconfigure problem. I can implement some additional code to disallow login saving across reboot (eg, drop database during shutdown), but in this case feature 5644 should maybe be closed ("rejected - won't implement") ? In mid/long term, i also plan to implement true High Avaliability for captive portal (Redmine 97) and i am thinking about the behavior that pfSense should have when "one node goes back online". I'd like to implement saving users across reboot in order to prepare for High Avaliability sync. @jim-p could you confirm that I need to implement a database drop when pfSense turns off, with these new elements?
This bug appeared in pfSense 2.4 (2.3.5 is fine, 2.4.0 is not). At that time, Netgate chosed to deprecate pfsense-tools and to rely on stock FreeBSD kernel/ports instead. Multiple "custom" ipfw features developed specifically for pfSense have been removed at that time, such as ipfw context, meaning that captive portal has been essentially rewritten at that time (see 517b893 for details). |
I didn't say it couldn't be done, I just said it had to be optional and default to disabled. There are people who would benefit from the feature, it's just too much of a change to implement without a way to retain the current behavior. Make it a checkbox in the GUI they have to opt into (e.g. Retain Captive Portal login database across reboots), and if it is needed for HA, then document that as a required option when the time comes. |
Implement redmine #5644
Hi, I think this PR is doing a lot more than fixing the original bug, and that the way it fixes it has some unfortunate side effects, so IMHO it shouldn't be merged as it is. I have a different fix here (plumbeo@c43be93) that preserves traffic accounting and applies to 2.4.4_p1 too. I did some light testing and as a proof-of-concept it seemed ok, but there are a couple of things that I'd like more time to think about and I haven't tried the various MAC authentication options yet. @Gertjanpfsense and @Augustin-FL, could you see if it works for you too? The other changes in this PR could then be discussed and eventually applied separately. |
@jim-p I added an option for retaining captive portal logins across reboot, defaulted to false. @plumbeo Nice catch: I didn't think about changing I noted a small issue however: "Connected users" are not completely reset when rebooting (the database still shows connected users, while ipfw rules are flushed during reboot, resulting in a "you are connected" error). This can be fixed easily (a simple unlink_if_exist() should do the trick) Maybe you could improve your work and submit it as PR, so that Netgate can decide what's the best way to fix this issue?
It is true that I included some slight refactoring in this PR (such as removing old |
I suggest adding an option to the GUI setting (as implement by @Augustin-FL after @jim-p's remark) : remove all users when the portal settings are saved.
The first or second option should be the default. If have not real reason to ask for the extension of this GUI option, but 'something' says to me it would be nice to have all the options covered. Especially, the first option is easy to code ^^ I'm using the latest patches proposed by @Augustin-FL in a live environment. |
@jim-p could I ask again to put this on hold for a while? as I said I have a branch (here: https://github.com/plumbeo/pfsense/commits/fix-reconfig) that needs a bit of testing but fixes this bug without interfering with traffic accounting. I think that if someone cares about traffic statistics, resetting them without warning during normal operations will not be what those users want or expect. |
I submitted PR #4042. The other changes included here could be applied over it, that is, for example, rebuilding the firewall tables from scratch only after a reboot instead of every time settings are changed. |
@rbgarga could you please review pr #4042 before merging this? I still believe that this pr has significant side effects that shouldn't just be handwaved away. We will be forced to revert it internally for the foreseeable future because we need precise traffic accounting and this pr makes it unreliable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we'll take #4042 over this one for now, and then the good parts from this can be adapted as optional behavior on top of that.
Ok, fine ! Closing this for now |
Note : please do not backport this PR on 2.4.4 branch, as it include a config version bump.
Forum thread : https://forum.netgate.com/topic/137824/pfsense-no-internet-when-it-is-said-you-are-connected/13