Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear stale SIP UDP states when WAN IP changes. #4726

Closed
wants to merge 3 commits into from

Conversation

wkochFPV
Copy link

This is a suggestion to fix VOIP/SIP connection problems after WAN IP change.
See issue #4652
It should be an alternative to #4594 and also to #2414 (both similar or even identical problems).

The purpose is to kill all remaining states referring to the previous IP (typically UDP MULTIPLE:MULTIPLE connections for voice-over-ip-communication on port 5060). No cheap hardware router does have these issues. From my point of view, this is a critical issue for any business use of OPNsense, particularly if many phones, PBXs, fax servers are connected to the router.

Unfortunately I do not have a build environment set up and the code provided IS NOT TESTED. This is supposed to be a replacement for awful workarounds with external scripts to monitor IP change and kill remaining states.

This is my first pull request ever, so please forgive me, if there is anything wrong.

Best regards,
Walter

@AdSchellevis
Copy link
Member

Hi Walter,

This won't work, filter list states returns all states (not filtered by an ip address). If I'm not mistaken we're not able to filter the correct states based on the interface ip since /sbin/pfctl -s state doesn't contain all required details.

I'm open for alternatives to replace the current option, which isn't pretty, but I do want to ask to test code before sending in a PR. Opening a new issue to discuss a concept is also an option (for example with some pfctl commands which explain how it should work).

Best regards,

Ad

Clear states.
@wkochFPV
Copy link
Author

Dear Ad,

if I am not totally wrong, we don't need to know, which states are associated with which interface. If the ip address of an interface changes, all states containing the previous ip address of that interface are useless and not needed any more.

I have set up a development environment with 2 VMs (one running OPNsense, the other a LAN client PC). On the client I installed a SIP client and registered it to a SIP server.

Changing WAN IP leads to remaining states with the previous WAN ip. These can be easily cleared with "pfctl -k [oldIP]".

The previouly suggested code was far too complex. My second commit essentially only runs this single command.

I've tested this with success multiple times with changing WAN IP addresses. The SIP client is able to reconnect at any time.

Best regards,
Walter

@wkochFPV
Copy link
Author

There appears to be a difference between
pfctl -k oldip -k 0.0.0.0/0
pfctl -k 0.0.0.0/0 -k oldip
which seems to leave some states
and
pfctl -k oldip
which seems to kill the remaining states.

This can also be tested using the "kill" button in the Dump States GUI (entering the old WAN ip to clear the residual states), which also fails to clear these remaining states (by applying the -k -k approach).

@AdSchellevis
Copy link
Member

Hi Walter,

I'm not 100% sure we should pin this on the inverse of the ip_change_kill_states, but this might improve the current situation. When the ip address doesn't change ($cacheip == $ip) I don't think we should reset states, this would prevent unwanted resets, this would also prevent resets when multiple virtual addresses are used, since these are static anyway and would never apply for one.

Best regards,

Ad

@wkochFPV
Copy link
Author

You are right, maybe rc.newwanip this is not the right place to reset the states. The error should be corrected where it happens in first place.

Here you mentioned, that pfctl -i interface -Fs would not work, because states are not bound to the interface (default setting)
#4594 (comment)

This is how the "dhclient-script" tries to clear the states:

                        $LOGGER "Removing states from old IP '${OLD_IP}' (new IP '${new_ip_address}')"
			pfctl -i $interface -Fs
			pfctl -K ${OLD_IP}/32
			_FLUSHED=1

The "pfctl -i $interface -Fs" fails to clear the states, because these are not bound to the interface (as you mentioned). The second command uses "-K" which (according to the manual) clears source tracking tables, but not the states.

This is how ppp-linkup.sh tries to clear the states:

                echo "Removing states to old router ${OLD_ROUTER}" | logger -t ppp-linkup
		pfctl -i ${1} -k 0.0.0.0/0 -k ${OLD_ROUTER}/32
		pfctl -i ${1} -k ${OLD_ROUTER}/32 -k 0.0.0.0/0

This partially works, but would leave some states as described earlier.

Fixing these scripts seems to be the key. I added a "pfctl -k" command to both scripts with initial successful results. I am going to test this further the next days.

Best regards, Walter

@AdSchellevis
Copy link
Member

Hi Walter,

Sounds like a plan, just let me know your findings and we'll discuss next steps. Fixing the two shell scripts sounds reasonable indeed.

Best regards,

Ad

@wkochFPV
Copy link
Author

Now I have investigated that a little further...

Currently (apart from the option ip_change_kill_states) there is no working state clearing code for DHCP, PPPOE and also not for manually changing a static WAN ip. The dhclient-script code is outdated, the ppp-linkup script has no code at all to clear states in case of ip change and is not aware of the previous ip address.

I looked at pfsense to find out, how they managed to fix this. They disabled the "dhclient-script" function to clear states entirely and moved that functionality to rc.newwanip. State clearing is only performed upon ip change (as you suggested earlier):

Here is a part of their code in rc.newwanip:

/* If the IP address changed, kill old states after rules and routing have been updated */
	if ($curwanip != $oldip) {
		if (isset($config['system']['ip_change_kill_states'])) {
			log_error("IP Address has changed, killing all states (ip_change_kill_states is set).");
			pfSense_kill_states(utf8_encode($oldip));
			filter_flush_state_table();
		} else {
			log_error("IP Address has changed, killing states on former IP Address $oldip.");
			pfSense_kill_states(utf8_encode($oldip));
		}
	}

And here is how they commented the old function in their dhclient script:

# NOTE: use of the below has been disabled because rc.newwanip handles this correctly and this
# unnecessarily killed states in multiple circumstances. Leaving here for now, should be safe
# to remove later.  -cmb 20141105
delete_old_states() {
	$LOGGER "Starting delete_old_states()"
	_FLUSHED=0
(...)

I find this a resonable solution that avoids duplicate code, adapted this to OPNsense (quite similar approach as chosen previously) and tested this with DHCP IP changes. I have no way to test this with PPPOE except our production system, that I am afraid to modify. It should work, however, with PPPOE WAN IP change also, as rc.newwanip is also called in these cases.

This code still does not handle manual IP changes of static WAN IPs, where the same problem arises. Maybe it would make sense to call rc.newwanip in these cases, also. The interface reconfiguration is too complex for me to understand all aspects and predict potential conflicts of calling rc.newwanip in these circumstances.

What do you think about that?

@AdSchellevis
Copy link
Member

Hi Walter,

It looks reasonable to me to remove the functionality from dhclient-script and make sure to drop states when the address changes, can you try 57f0575 on your end?

The changes to diag_dump_states.php I'm not sure about, wouldn't a single -k host do the same? it looks like an overkill (and is different than what we're trying todo with the newwanip).

Best regards,

Ad

@wkochFPV
Copy link
Author

wkochFPV commented Mar 7, 2021

Works like a charm, thank you! You are right, the -k host in diag_dump_states.php is enough.

@AdSchellevis
Copy link
Member

@wkochFPV this will likely take some time to land in a production version, I expect we will move some bits and pieces somewhere along the way (as mentioned in the commit comments). Thanks for confirming this fixes the issue on your end!

@fichtner
Copy link
Member

fichtner commented Mar 7, 2021

Close this then? :)

@AdSchellevis
Copy link
Member

yep :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants