
Captive Portal : Fix the bug affecting users getting disconnected when reconfiguring a zone #4031

Closed

5 participants
@Augustin-FL (Contributor) commented Jan 6, 2019

Note: please do not backport this PR to the 2.4.4 branch, as it includes a config version bump.

Forum thread : https://forum.netgate.com/topic/137824/pfsense-no-internet-when-it-is-said-you-are-connected/13

@plumbeo (Contributor) commented Jan 8, 2019

@Augustin-FL I'm not sure about preserving users after a reboot. It's a significant change to how captive portal works, and there are a lot of corner cases to consider.

For example, if RADIUS is in use, the reboot procedure disconnects all users (a captiveportal_radius_stop_all(7) call in system.inc), so after the boot there will be spurious RADIUS accounting packets for sessions that no longer exist, which could be a problem for many RADIUS servers.

Also, even assuming you don't explicitly disconnect the users, there's no guarantee that pfSense will be back online quickly, and that downtime will be counted as used time, which could be a problem for users with limited usage time (vouchers, etc.).

There will also be problems with traffic accounting: the new tables you're creating lose all the previous traffic information, and this will be a problem whenever you reinit the rules, not only after a reboot.

@Gertjanpfsense (Contributor) commented Jan 8, 2019

There will also be problems with traffic accounting: the new tables you're creating lose all the previous traffic information, and this will be a problem whenever you reinit the rules, not only after a reboot.

Yep, I'm seeing the same thing: https://forum.netgate.com/topic/130420/any-change-and-save-update-captive-portal-bug/41 (but there will be no traffic while the tables are flushed...)

I've been testing this patch for a day or two in a "live" environment - all good!

edit: Rebooting will disconnect users... (read bottom to top)

Jan  8 16:40:04 | php-fpm | 67158 | /diag_reboot.php: Stopping all packages.
Jan  8 16:40:04 | logportalauth | 67158 | Zone: cpzone1 - DISCONNECT: x, 90:b9:31:77:5e:26, 192.168.2.9
Jan  8 16:38:22 | logportalauth | 338 | Zone: cpzone1 - ACCEPT: x, 90:b9:31:77:5e:26, 192.168.2.9
@Augustin-FL (Contributor, Author) commented Jan 8, 2019

Thanks for your comment @plumbeo !

if RADIUS is in use, the reboot procedure disconnects all users (a captiveportal_radius_stop_all(7) call in system.inc), so after the boot there will be spurious RADIUS accounting packets for sessions that no longer exist, which could be a problem for many RADIUS servers.

That is completely true, thanks for pointing that out.

Just to make it clear: captiveportal_radius_stop_all() is responsible for sending Accounting-Stop messages (if RADIUS accounting is enabled) and for writing "disconnect" log messages for each user, but this function does not actually disconnect users (that explains the logs from @Gertjanpfsense). Its call inside system.inc causes the RADIUS server to consider all users disconnected when rebooting.

When rebooting, a captiveportal_send_server_accounting('off') is performed during shutdown, and a captiveportal_send_server_accounting('on') is performed during boot. So I think removing this captiveportal_radius_stop_all(7) call during shutdown should be enough to mitigate the problem.
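The shutdown/boot ordering described above can be condensed into a toy model (a Python sketch for illustration only; the real code is PHP in system.inc, and every function name below is a simplified stand-in, not the actual pfSense API):

```python
# Toy model of the RADIUS accounting calls around a reboot, as described:
# Accounting-Off at shutdown, Accounting-On at boot. The per-user
# stop_all() call is the one proposed for removal from the shutdown path.

events = []

def send_server_accounting(state):
    # stands in for captiveportal_send_server_accounting('on'/'off')
    events.append(f"Accounting-{state}")

def radius_stop_all():
    # stands in for captiveportal_radius_stop_all(): marks every
    # session as closed on the RADIUS server
    events.append("Accounting-Stop-per-user")

def shutdown(keep_sessions=True):
    if not keep_sessions:
        radius_stop_all()  # removing this keeps sessions open server-side
    send_server_accounting("off")

def boot():
    send_server_accounting("on")

shutdown(keep_sessions=True)
boot()
print(events)  # ['Accounting-off', 'Accounting-on']
```

With keep_sessions=True the server only sees the global off/on pair, so sessions survive the reboot from the RADIUS server's point of view; that is the mitigation proposed here.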

There will also be problems with traffic accounting because the new tables you're creating lose all the previous traffic information and this will be a problem [...]

Well, I don't think that's an issue at all (do you? please correct me if you think otherwise). If you are worried about RADIUS accounting updates, the next accounting update should simply override the value from before the reconfigure/reboot.

Anyway, I will soon add a commit to this PR (removing captiveportal_radius_stop_all(7)). Thanks again for the notice!

I'm not sure about preserving users after a reboot. It's a significant change to how captive portal works, and there are a lot of corner cases to consider.

I agree, but I think it's better to implement this feature and then fix potential issues, rather than block ourselves from ever including it.

@plumbeo (Contributor) commented Jan 8, 2019

@Augustin-FL

There will also be problems with traffic accounting because the new tables you're creating lose all the previous traffic information and this will be a problem [...]

Well, I don't think that's an issue at all (do you? please correct me if you think otherwise). If you are worried about RADIUS accounting updates, the next accounting update should simply override the value from before the reconfigure/reboot.

That's the problem: the next accounting packets will override the correct values with the new values, that is, the traffic generated from the moment the rules were reinitialised, not from the start of the session. I haven't studied the code, so I can't suggest an alternative right now, but I think those tables shouldn't be touched.

EDIT: when did this bug appear? I’m pretty sure I didn’t notice it about a year ago. Maybe it can be fixed in a different way.

Anyway, I will soon add a commit to this PR (removing captiveportal_radius_stop_all(7)). Thanks again for the notice!

I'm not sure about preserving users after a reboot. It's a significant change to how captive portal works, and there are a lot of corner cases to consider.

I agree, but I think it's better to implement this feature and then fix potential issues, rather than block ourselves from ever including it.

But why should you include it? Shutting down or rebooting a firewall has always reset all the state (firewall states, VPNs, NPT states, etc.), and that's what anybody can reasonably expect. I don't see why the list of logged-in captive portal users should be an exception, and I don't think you should change something like this without even offering an option to preserve the current behaviour.

What actual problem does this solve? The users will not be able to access the network while the firewall is rebooting anyway, and all their connections will be cut and will have to be recreated, so they may as well log in again as soon as the firewall is up.

@Gertjanpfsense (Contributor) commented Jan 9, 2019

@plumbeo: the original bug: users getting disconnected when reconfiguring a (captive portal) zone, as the subject states.
When a portal zone is reconfigured, all ipfw rules and tables are flushed and then recreated.
The tables myzone_auth_up and myzone_auth_down will be empty.
But: the database belonging to this captive portal zone, with its logged-in clients, is not flushed.
This is where things go wrong.
What happens is: portal users are redirected again to the captive portal "login" web page. index.php will discover that the user is already logged in (that is, they figure in the database), and the login procedure stops there. There is no way the user can log in again.

The quick solution was / is right now: flush all users (when reconfiguring the zone).

A next-best solution could be: flushing the database that belongs to the zone.
(Not a good thing for captive portals with many users and admins who like to reconfigure their portal many times a day ;) )

@Augustin-FL chose, IMHO, a better method: before flushing the ipfw tables (myzone_auth_up and myzone_auth_down), update the up and down byte counters in the database, which then contains everything needed to rebuild them afterwards.
When rules and tables are recreated, insert the users back from the database (which is still intact).
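The failure mode and the fix described above can be sketched as a toy model (a Python sketch for illustration only; the real implementation is PHP plus ipfw tables, and every name below is invented):

```python
# Toy model of the reconfigure bug: ipfw tables are flushed, but the
# per-zone login database is not, so index.php refuses to log the user
# in again while no firewall rule passes their traffic.

class Zone:
    def __init__(self):
        self.auth_table = set()  # stands in for the myzone_auth_up/down tables
        self.database = set()    # per-zone database of logged-in users

    def login(self, user):
        if user in self.database:
            return "already logged in"  # login procedure stops here
        self.database.add(user)
        self.auth_table.add(user)
        return "ok"

    def reconfigure_buggy(self):
        self.auth_table.clear()  # rules/tables flushed, recreated empty
        # bug: self.database is NOT flushed

    def reconfigure_fixed(self):
        # the approach in this PR, roughly: flush the tables, then
        # re-insert users from the still-intact database
        self.auth_table.clear()
        self.auth_table |= self.database

z = Zone()
z.login("alice")
z.reconfigure_buggy()
print("alice" in z.auth_table)  # False: no rule passes her traffic
print(z.login("alice"))         # "already logged in": she is stuck
z.reconfigure_fixed()
print("alice" in z.auth_table)  # True: access rebuilt from the database
```

The "stuck" state between the two prints is exactly the user-visible bug: the portal insists the user is connected while the firewall drops everything.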

@Augustin-FL went one step further: logged-in users will stay logged in after a reboot.
IMHO this goes very far... what about update / upgrade issues?
A reconfiguration: ok for me if connected users stay connected.
A reboot: for me, better make it a clean start.

@Augustin-FL: please correct my explanation if it isn't ok for you.

@jim-p (Contributor) commented Jan 10, 2019

I'd prefer not to preserve the database across a shutdown/reboot by default. It may be an option, but it must default to disabled. To do otherwise is a drastic and unexpected change.

@plumbeo (Contributor) commented Jan 10, 2019

@Gertjanpfsense I see. I'm not sure when I'll have time to work on this, but from a quick look at the relevant code, the original bug is caused by captiveportal_init_rules(), which always calls captiveportal_free_dnrules(), which resets the list of pipes, and captiveportal_delete_rules(), which destroys the firewall tables. To fix it you'd just need a check in those functions to skip removing pipes and tables when the captive portal is already enabled (for example, when the user database file exists).
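A minimal sketch of the guard described above, assuming the "already enabled" state is inferred from the user database file existing (Python for illustration; the function and callback names are hypothetical, not the actual pfSense API):

```python
import os

def init_rules(db_path, free_dnrules, delete_rules):
    # Skip resetting pipes and destroying firewall tables when the zone
    # is already enabled, inferred here from its user database file.
    already_enabled = os.path.exists(db_path)
    if not already_enabled:
        free_dnrules()   # resets the list of pipes
        delete_rules()   # destroys the firewall tables
    # ... (re)create rules either way ...
    return already_enabled
```

The teardown callbacks run only on a cold start; on a live reconfigure the existing pipes and tables (and hence the logged-in users' access) are left alone.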

@Augustin-FL (Contributor, Author) commented Jan 11, 2019

I didn't specifically intend to implement that feature. This pull request is focused on fixing the captive portal reconfigure problem.

Because a reboot is essentially a captive portal reconfigure, this pull request allows users to be saved across reboots. This was not intentional; it's more a side effect of fixing the current reconfigure problem.

I can implement some additional code to disallow login saving across reboots (e.g., drop the database during shutdown), but in that case should feature 5644 maybe be closed ("rejected - won't implement")?

In the mid/long term, I also plan to implement true High Availability for captive portal (Redmine 97), and I am thinking about the behavior pfSense should have when "one node goes back online". I'd like to implement saving users across reboots in order to prepare for High Availability sync.

@jim-p, given these new elements, could you confirm that I need to implement a database drop when pfSense shuts down?

When did this bug appear? I’m pretty sure I didn’t notice it about a year ago. Maybe it can be fixed in a different way.

This bug appeared in pfSense 2.4 (2.3.5 is fine, 2.4.0 is not). At that time, Netgate chose to deprecate pfsense-tools and rely on the stock FreeBSD kernel/ports instead. Multiple "custom" ipfw features developed specifically for pfSense were removed, such as ipfw context, meaning that captive portal was essentially rewritten (see 517b893 for details).

@jim-p (Contributor) commented Jan 11, 2019

I didn't say it couldn't be done, I just said it had to be optional and default to disabled. There are people who would benefit from the feature, it's just too much of a change to implement without a way to retain the current behavior. Make it a checkbox in the GUI they have to opt into (e.g. Retain Captive Portal login database across reboots), and if it is needed for HA, then document that as a required option when the time comes.

@plumbeo (Contributor) commented Jan 12, 2019

Hi, I think this PR is doing a lot more than fixing the original bug, and that the way it fixes it has some unfortunate side effects, so IMHO it shouldn't be merged as it is.

I have a different fix here (plumbeo@c43be93) that preserves traffic accounting and applies to 2.4.4_p1 too. I did some light testing and as a proof-of-concept it seemed ok, but there are a couple of things that I'd like more time to think about and I haven't tried the various MAC authentication options yet. @Gertjanpfsense and @Augustin-FL, could you see if it works for you too?

The other changes in this PR could then be discussed and eventually applied separately.

@Augustin-FL (Contributor, Author) commented Jan 13, 2019

@jim-p I added an option for retaining captive portal logins across reboots, defaulting to false.

@plumbeo Nice catch: I didn't think about changing the captiveportaldn.rules format. Your patch is also a lot lighter than mine.

I noted a small issue however: "connected users" are not completely reset when rebooting (the database still shows connected users while the ipfw rules are flushed during reboot, resulting in a "you are connected" error). This can be fixed easily (a simple unlink_if_exist() should do the trick).
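The fix sketched above, dropping the per-zone login database at shutdown so a rebooted portal starts clean, could look roughly like this (Python for illustration only; the real helper is the PHP unlink_if_exist() mentioned above, and the path would be zone-specific):

```python
import os

def drop_portal_db_on_shutdown(db_path):
    # Remove the per-zone login database so the "connected users" view
    # is reset after a reboot; mirrors a PHP unlink_if_exist() call.
    try:
        os.unlink(db_path)
    except FileNotFoundError:
        pass  # already gone: nothing to do
```

Because the database is what index.php consults, removing it at shutdown keeps the database and the (flushed) ipfw rules consistent after boot.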

Maybe you could improve your work and submit it as a PR, so that Netgate can decide what's the best way to fix this issue?

I think this PR is doing a lot more than fixing the original bug, and that the way it fixes it has some unfortunate side effects, so IMHO it shouldn't be merged as it is.

It is true that I included some slight refactoring in this PR (such as removing old unlink_if_exists() calls performed on files that don't exist anymore, or changing the location of some lock() calls), but this PR is not doing "a lot more" than fixing the original issue.
The side effect you are talking about is deliberate: in my opinion, accounting data should be reset when reconfiguring a zone. I'm not sure that's the right thing to do, however, so I'll let Netgate judge what's best.

@Gertjanpfsense (Contributor) commented Jan 21, 2019

I suggest extending the GUI setting (as implemented by @Augustin-FL after @jim-p's remark) with an option to remove all users when the portal settings are saved.
This could be a third choice, making it a three-state option:

  • Remove connected users when the config changes
  • Keep connected users when the config changes
  • Keep connected users when pfSense reboots

The first or second option should be the default.

I have no real reason to ask for this extension of the GUI option, but 'something' tells me it would be nice to have all the options covered. Especially since the first option is easy to code ^^

I'm using the latest patches proposed by @Augustin-FL in a live environment.
I'm editing portal settings rather often these days on a live portal setup, to check for any unexpected behavior. I'm using FreeRADIUS with a SQL server on the LAN. Some users have a daily quota restriction.
I haven't found any issues yet.

jim-p requested review from jim-p and rbgarga and removed the request for jim-p, Jan 21, 2019

jim-p approved these changes Jan 21, 2019

@plumbeo (Contributor) commented Jan 21, 2019

@jim-p could I ask again to put this on hold for a while? As I said, I have a branch (here: https://github.com/plumbeo/pfsense/commits/fix-reconfig) that needs a bit of testing but fixes this bug without interfering with traffic accounting.

I think that if someone cares about traffic statistics, resetting them without warning during normal operation is not what those users want or expect.

@plumbeo (Contributor) commented Feb 3, 2019

I submitted PR #4042. The other changes included here could be applied on top of it, for example, rebuilding the firewall tables from scratch only after a reboot instead of every time settings are changed.

@plumbeo (Contributor) commented Feb 18, 2019

@rbgarga could you please review PR #4042 before merging this? I still believe that this PR has significant side effects that shouldn't just be handwaved away. We will be forced to revert it internally for the foreseeable future, because we need precise traffic accounting and this PR makes it unreliable.

@jim-p (Contributor) left a comment

Looks like we'll take #4042 over this one for now, and then the good parts from this can be adapted as optional behavior on top of that.

@Augustin-FL (Contributor, Author) commented Feb 18, 2019

Ok, fine! Closing this for now.

Augustin-FL deleted the Augustin-FL:patch-captivportal-reconfigure branch, Mar 16, 2019
