Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPsec config reload uninstalls manual SPD entries #6950

Closed
2 tasks done
Daggolin opened this issue Oct 20, 2023 · 10 comments
Closed
2 tasks done

IPsec config reload uninstalls manual SPD entries #6950

Daggolin opened this issue Oct 20, 2023 · 10 comments
Assignees
Labels
support Community support
Milestone

Comments

@Daggolin
Copy link
Contributor

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

When triggering an action on the web interface that leads to a new swanctl.conf file being written and the config being reloaded any IPsec tunnel relying on manual SPD entries (VPN > IPsec > Security Policy Database) stops working.

Additional context

I am not sure where the core issue originates from, but I have noticed the following:

  • when the IPsec tunnel is running correctly the manual SPD entries in the list of installed entries all show the Reqid 1, which is not the ID of the tunnel they belong to
  • when the new swanctl.conf file is written (for instance by clicking the apply button on VPN > IPsec > Connections) the function ipsec_configure_spd generates the spddelete instructions for the manual entries and stores them for regid 1, thus triggering their deletion when the commands are written to file and applied using /sbin/setkey -f; this seems to be what leads to the loss of connection
  • the manual SPD entries are stored in /usr/local/etc/swanctl/reqid_events.conf and get applied by the ipsec/updown_event.py script on such an event, however the config reload doesn't trigger such an event so whenever ipsec_configure_spd is called the manual entries get removed and not reapplied until the tunnel is reconnected (triggering a new event)

The manual SPD entries used in our setup don't have an explicit Reqid set, they are associated to the phase 2 using the Connection child dropdown menu.

Expected behavior

Either ipsec_configure_spd should skip the removal of manual SPD entries for active phase 2 connections or it should invoke the same logic as ipsec/updown_event.py to restore them after their deletion.

The active IPsec tunnels that rely on manual SPD entries should keep functioning when the IPsec config gets updated. Adding a new PSK (VPN > IPsec > Pre-Shared Keys) or enabling/disabling an IPsec tunnel (VPN > IPsec > Connections; VPN > IPsec > Tunnel Settings [legacy]) should not remove the installed manual SPD entries and break tunnels.

Environment

Versions:

  • OPNsense 23.7.6-amd64
  • FreeBSD 13.2-RELEASE-p3
  • OpenSSL 1.1.1w 11 Sep 2023

CPU type:

  • Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz (4 cores, 8 threads)
@AdSchellevis
Copy link
Member

Which option was used to build the tunnels? The legacy tunnel one or the new connections?

@Daggolin
Copy link
Contributor Author

The new connections. Currently in the process of migrating from legacy to the new connections. The legacy tunnels had a field to specify the SPD entries and those still work fine. The OPNsense instance still has several legacy tunnels running, but only the one running as new connection is using the VPN > IPsec > Security Policy Database manual entries (cause the new connections don't have an easy to use field like the legacy connections do). The issue only affects the new connections, because the legacy instances still use the field.

@AdSchellevis
Copy link
Member

The new one is more flexible as it can also hook multiple clients, but this issue can likely be triggered when using both in combination. If you remove the ipsec_configure_spd() in the following line, does it work correctly then?

ipsec_configure_spd();

I'm not sure I can offer a 100% solid fix, but preventing this from happening when manual spd's aren't used in legacy shouldn't be an issue.

@Daggolin
Copy link
Contributor Author

Yes, I had tested this in the meantime to confirm the observations and conclusions I initially posted.

@AdSchellevis AdSchellevis self-assigned this Oct 20, 2023
@AdSchellevis AdSchellevis added the cleanup Low impact changes label Oct 20, 2023
@AdSchellevis
Copy link
Member

I'm thinking maybe we can just remove the items attached to any of the legacy tunnels, when a tunnel is dropped and the reqid isn't used, the garbage left shouldn't hurt anyone.

@fichtner fichtner added this to the 24.1 milestone Oct 20, 2023
@AdSchellevis
Copy link
Member

@Daggolin looking at the ticket and the code again, I think you're reusing the same reqid as being used in the legacy tunnel settings. According to the following code section it should only drop entries which belong to the legacy tunnel already:

$reqid_mapping = ipsec_parse_phase2($ph1ent['ikeid'])['uniqid_reqid'];
foreach (array_unique(array_values($reqid_mapping)) as $reqid) {
if (!empty($previous_spd_entries[$reqid])) {
foreach ($previous_spd_entries[$reqid] as $spd_entry) {
$spd_entries[] = $spd_entry;
}
unset($previous_spd_entries[$reqid]);
}
}

If you use a higher reqid (or link the entry to the correct child) it should work as expected as far as I can see.

@AdSchellevis AdSchellevis added support Community support and removed cleanup Low impact changes labels Oct 21, 2023
@AdSchellevis AdSchellevis removed their assignment Oct 21, 2023
@Daggolin
Copy link
Contributor Author

@Daggolin looking at the ticket and the code again, I think you're reusing the same reqid as being used in the legacy tunnel settings.

As I said in the inital description of the issue, the shown reqid for the installed manual SPD entries is 1, which is the ID of another tunnel. So yes, a reuse is very likely happening under the hood, but it's not explicitly set by me to collide. I may have worded that unclear in my initial description of the issue. I have now manually checked swanctl.conf and swanctl -l to confirm this.

The new connection does not have an explicit reqid set for its child on the WebUI, because the help text suggests that OPNsense or swanctl is going to automatically pick incremental ones (This might be helpful in some scenarios, like route based tunnels (VTI), but works only if each CHILD_SA configuration is instantiated not more than once. The default uses dynamic reqids, allocated incrementally). Looking at the swanctl.conf file generated by OPNsense it seems that the new connection doesn't get an explicit reqid assigned by OPNsense, but it's rather left unset which implicates the default-value of 0 for reqid, causing swantctl to allocate a reqid incrementally on start. In our case - according to swanctl -l - this means that the new connection's child got the reqid 1 at swanctl restart/reconnect, colliding with the legacy tunnel that OPNsense explicity assigned the same id to. It seems the existence of legacy tunnels is enough to make the automatic value unreliable.

Manually assigning a high reqid to the child of the new tunnel solves the issue of SPD entries getting deleted. However, this still means that the use of automatically incrementally allocated reqid in new connections does not work reliable if you have even a single legacy connection configured (even if it’s disabled the WebUI might draw the wrong conclusions, haven’t tested that). Maybe OPNsense could handle those incrementally allocated reqid internally, taking legacy tunnels into consideration and explicity writing a reqid to the swantctl.conf file instead of leaving it up to swanctl by not setting it. I honestly think that mixed setups with legacy connections and new connections existing side-by-side are going to be quite common for a while when users migrate their existing legacy connections to new connections one by one.

@AdSchellevis
Copy link
Member

I'm afraid we can't fix this, legacy pins the reqids in which case you will have to pin them in the connections as well. We can make a note somewhere in the documentation that there are constraints when using both options, but there's not much else we can do.

The reason for not pinning reqid's in connections in quite simple, when multiple clients connect, they all will use a unique reqid at strongswan's end unless you enforce anything. One other option you could try is to replace the lower numbers in the tunnels manually (they're stored in the config.xml) with higher ones so their less likely to collide.

@Daggolin
Copy link
Contributor Author

The reason for not pinning reqid's in connections in quite simple, when multiple clients connect, they all will use a unique reqid at strongswan's end unless you enforce anything.

I was afraid something like that was the reason why you decided to pass no value (0) into the swanctl config in the first place.

One other option you could try is to replace the lower numbers in the tunnels manually (they're stored in the config.xml) with higher ones so their less likely to collide.

I had already considered this option, but I don’t think it offers any advantages over assigning high reqid to the new connections within the setup I am working with, but thanks for the suggestion.

We can make a note somewhere in the documentation that there are constraints when using both options, but there's not much else we can do.

It would be very helpful if the help text of the reqid field of the new connections would explicitly mention that using both, legacy and new connections, can cause issues and users should always assign a unique value themselves if they still have legacy tunnels set up.

Alternatively, as OPNsense explicitly assigns the values for legacy tunnels, OPNsense could automatically assign high numbers to them, but that would just shift the issue elsewhere and if you would automatically reassign existing config values during an update it could mess with users who already set manual values, potentially breaking their setup. So that‘s not a good option really.

As you said this is not something you can fix currently and as possible workarounds exist, feel free to close this issue if you don‘t need any infos or input from me unless you want to keep it around until documentation and/or possible button adjustments are done.

@AdSchellevis
Copy link
Member

Let me think a bit about wording and close the issue later. I do agree explaining the constraints would be helpful, although our documentation is likely a better place to do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
support Community support
Development

No branches or pull requests

3 participants