
IPSec faulty routing with multiple phase 2 tunnels and NAT #2173

Closed
joakim666 opened this issue Feb 6, 2018 · 16 comments
Labels
support Community support


@joakim666

When tunneling multiple subnets through an IPsec tunnel, all traffic gets routed through only one of the phase 2 tunnels.

I can see the bytes-out counter increasing on only one of the phase 2 tunnels, and by running tcpdump on the WAN interface I can see that all ESP packets go out with the same SPI, regardless of the destination subnet.

After lots of troubleshooting, I tracked it down to how the manual SPD entries configured in the GUI are added to the kernel. Since I'm using NAT to put all outbound traffic behind one IP regardless of subnet, I enter the same LAN subnet as 'Manual SPD entries' for all phase 2 tunnels.

In ipsec.inc these manually added SPD entries are added to the kernel like this:

"spdadd -%s %s %s any -P out ipsec %s/tunnel/%s-%s/require;",

But in my case %s/tunnel/%s-%s/require isn't specific enough: it apparently matches all (or rather, in practice, just the last) of the outbound SPD entries added by strongSwan.

I find the setkey man page a bit unclear:

             A value of require means that an SA is required
             whenever the kernel sends a packet matched that matches the
             policy. The unique level is the same as require but, in
             addition, it allows the policy to bind with the unique
             out-bound SA. For example, if you specify the policy level
             unique, racoon(8) will configure the SA for the policy. If you
             configure the SA by manual keying for that policy, you can put
             the decimal number as the policy identifier after unique
             separated by colon `:' as in the following example:
             unique:number. In order to bind this policy to the SA, number
             must be between 1 and 32767, which corresponds to extensions
             -u of manual SA configuration.

What I do know is that if I remove the SPD entries added by OPNsense and add new ones using the unique level, like this:

spdadd -%s %s %s any -P out ipsec %s/tunnel/%s-%s/unique:<id of matching outbound Strongswan added SPD for given destination subnet>;

it works, and the traffic to the different destination subnets gets routed through the correct phase 2 tunnel.
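To make the difference between the two policy levels concrete, here is a minimal Python sketch that renders both forms of the command. It assumes the first `%s` in the ipsec.inc format string is the address family (`4` or `6`) and that the protocol is `esp`; `render_spdadd` is a hypothetical helper for illustration, not OPNsense code.

```python
def render_spdadd(family, local_net, remote_net, local_peer, remote_peer,
                  unique_id=None):
    """Render a setkey spdadd line for an outbound tunnel policy.

    Hypothetical helper mirroring the ipsec.inc format string
    "spdadd -%s %s %s any -P out ipsec %s/tunnel/%s-%s/require;".
    With unique_id set, the level becomes unique:<id>, which binds the
    policy to the SA with that policy identifier.
    """
    if unique_id is None:
        level = "require"
    else:
        # setkey(8): the identifier must be between 1 and 32767
        if not 1 <= unique_id <= 32767:
            raise ValueError("unique id must be in 1..32767")
        level = f"unique:{unique_id}"
    return (f"spdadd -{family} {local_net} {remote_net} any "
            f"-P out ipsec esp/tunnel/{local_peer}-{remote_peer}/{level};")


# Same LAN, one of the two remote subnets from this thread
print(render_spdadd("4", "192.168.122.0/24", "10.20.0.0/16",
                    "x.y.z.9", "a.b.c.d"))
print(render_spdadd("4", "192.168.122.0/24", "10.20.0.0/16",
                    "x.y.z.9", "a.b.c.d", unique_id=1))
```

With `require`, every rendered policy names only the tunnel endpoints, so nothing ties it to a particular phase 2 SA; `unique:<id>` adds that binding.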

However, I'm not sure how to fix this correctly in OPNsense.

  • Should the handling of the 'Manual SPD entries' be changed to always use unique instead of require? Or would that break something in some other case?
  • And how do we get the so-called 'policy identifier' (i.e. the number) from the SPD entries added by strongSwan? Can we really trust that strongSwan has always added its SPD entries by the time ipsec_configure_spd() is called? If so, it's easy to just parse them out.
  • There is also an odd assumption when removing previously added SPD entries: that all entries added by OPNsense are of the require type ("we'll assume that the require items in the spd are our manual items, so they will be removed first"). This won't hold anymore, so we need some other way to keep track of which SPD entries OPNsense added, so they can be removed. I'm not really sure of a good way to do that either.
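As the second point suggests, the identifiers can be parsed out of `setkey -DP` once strongSwan has installed its policies. A minimal Python sketch, assuming the dump format posted further down in this thread (one unindented header line per policy, followed by tab-indented attribute lines); `parse_spd` and `unique_ids_by_destination` are hypothetical names, not existing OPNsense functions.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Policy:
    src: str
    dst: str
    direction: str = ""                             # "in", "out" or "fwd"
    rules: List[str] = field(default_factory=list)  # e.g. "esp/tunnel/A-B/unique:1"
    spid: Optional[int] = None


PORT_RE = re.compile(r"\[[^\]]*\]$")       # trailing "[any]" / "[500]"
UNIQUE_RE = re.compile(r"/unique:(\d+)$")
SPID_RE = re.compile(r"\bspid=(\d+)")


def parse_spd(text: str) -> List[Policy]:
    """Parse `setkey -DP` output into Policy records."""
    policies: List[Policy] = []
    cur: Optional[Policy] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Header line: "src[port] dst[port] upperspec"
            src, dst = line.split()[:2]
            cur = Policy(PORT_RE.sub("", src), PORT_RE.sub("", dst))
            policies.append(cur)
            continue
        if cur is None:
            continue
        body = line.strip()
        first = body.split()[0]
        if first in ("in", "out", "fwd"):
            cur.direction = first
        elif "/tunnel/" in body or "/transport/" in body:
            cur.rules.append(body)
        else:
            m = SPID_RE.search(body)
            if m:
                cur.spid = int(m.group(1))
    return policies


def unique_ids_by_destination(policies: List[Policy]) -> Dict[str, int]:
    """Map destination -> policy identifier of outbound 'unique' policies."""
    ids: Dict[str, int] = {}
    for p in policies:
        if p.direction != "out":
            continue
        for rule in p.rules:
            m = UNIQUE_RE.search(rule)
            if m:
                ids[p.dst] = int(m.group(1))
    return ids


# Trimmed excerpt of the dump posted in this thread
SAMPLE = (
    "192.168.122.0/24[any] 10.20.0.0/16[any] any\n"
    "\tout ipsec\n"
    "\tesp/tunnel/x.y.z.9-a.b.c.d/require\n"
    "\tspid=25 seq=5 pid=99980\n"
    "x.y.z.10[any] 10.20.0.0/16[any] any\n"
    "\tout ipsec\n"
    "\tesp/tunnel/x.y.z.9-a.b.c.d/unique:1\n"
    "\tspid=56 seq=1 pid=99980\n"
)
print(unique_ids_by_destination(parse_spd(SAMPLE)))  # {'10.20.0.0/16': 1}
```

This does not address the race raised in the second point: if strongSwan has not yet installed a policy for a destination, that destination is simply missing from the result.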

I can help out with a patch for this problem, but since I don't know the thinking behind the current implementation, I feel I need some guidance!

@joakim666
Author

joakim666 commented Feb 6, 2018

Support for manually defined SPD entries was added by @AdSchellevis in 814d18a for #440.

@joakim666
Author

@AdSchellevis, as you discussed in this comment in issue #440, you apparently considered doing it with unique:<id> but were convinced it wasn't necessary and didn't do it.

Something in my setup apparently breaks the assumption that using require is enough :(

@AdSchellevis
Member

@joakim666 yes, that was indeed the case. As far as I know you can't control the internal SPDs, which makes it quite hard to set up a policy that is guaranteed to work (it's sensitive to race conditions). In the end, during our testing it didn't seem necessary to bind to specific tunnels.

What does your setup look like? Which networks are you trying to route?

@joakim666
Author

Basically it's a local LAN NATed behind a single IP, talking to two remote subnets. So nothing much out of the ordinary.

192.168.122.0/24 -> NAT(x.y.z.10/32) -> 10.20.0.0/16, 172.16.0.0/16

Using IKEv2, one Phase 1 entry with multiple Phase 2 entries.

I replaced the public IPs below. I hope it's still understandable:

a.b.c.d = remote peer public IP
x.y.z.9 = my (local) peer public IP
x.y.z.10 = my public IP that all traffic is NATed behind

192.168.122.0/24 = local LAN, added under Manual SPD entries in the GUI

Output of setkey -DP:

10.20.0.0/16[any] x.y.z.10[any] any
	in ipsec
	esp/tunnel/a.b.c.d-x.y.z.9/unique:1
	created: Feb  6 14:24:43 2018  lastused: Feb  6 14:24:43 2018
	lifetime: 9223372036854775807(s) validtime: 0(s)
	spid=55 seq=7 pid=99980
	refcnt=1
172.16.0.0/16[any] x.y.z.10[any] any
	in ipsec
	esp/tunnel/a.b.c.d-x.y.z.9/unique:4
	created: Feb  3 21:01:49 2018  lastused: Feb  3 21:01:49 2018
	lifetime: 9223372036854775807(s) validtime: 0(s)
	spid=23 seq=6 pid=99980
	refcnt=1
192.168.122.0/24[any] 10.20.0.0/16[any] any
	out ipsec
	esp/tunnel/x.y.z.9-a.b.c.d/require
	spid=25 seq=5 pid=99980
	refcnt=1
192.168.122.0/24[any] 172.16.0.0/16[any] any
	out ipsec
	esp/tunnel/x.y.z.9-a.b.c.d/require
	spid=27 seq=3 pid=99980
	refcnt=1
x.y.z.10[any] 10.20.0.0/16[any] any
	out ipsec
	esp/tunnel/x.y.z.9-a.b.c.d/unique:1
	created: Feb  6 14:24:43 2018  lastused: Feb  6 14:24:43 2018
	lifetime: 9223372036854775807(s) validtime: 0(s)
	spid=56 seq=1 pid=99980
	refcnt=1
x.y.z.10[any] 172.16.0.0/16[any] any
	out ipsec
	esp/tunnel/x.y.z.9-a.b.c.d/unique:4
	created: Feb  3 21:01:49 2018  lastused: Feb  3 21:01:49 2018
	lifetime: 9223372036854775807(s) validtime: 0(s)
	spid=24 seq=0 pid=99980
	refcnt=1

Given the above, all traffic will go through the 'x.y.z.10[any] 172.16.0.0/16[any] any' tunnel no matter the destination.

If I remove the SPD entries OPNsense added (the ones using require) and add new ones using either 'esp/tunnel/x.y.z.9-a.b.c.d/unique:1' or 'esp/tunnel/x.y.z.9-a.b.c.d/unique:4', depending on the destination, the traffic goes out the right way.
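The manual workaround above can be expressed as a batch of setkey commands. A sketch for this particular setup, taking the policy identifiers from the dump (unique:1 for 10.20.0.0/16, unique:4 for 172.16.0.0/16). It assumes IPv4 and FreeBSD setkey's `spddelete` syntax; `workaround_commands` is a hypothetical helper that only builds the text and does not touch the kernel.

```python
LOCAL_NET = "192.168.122.0/24"
TUNNEL = "esp/tunnel/x.y.z.9-a.b.c.d"   # local peer - remote peer, from the dump

# Policy identifiers read off the strongSwan-installed outbound entries
UNIQUE_IDS = {"10.20.0.0/16": 1, "172.16.0.0/16": 4}


def workaround_commands(local_net, tunnel, unique_ids):
    """Build setkey commands replacing each 'require' policy with 'unique:<id>'."""
    lines = []
    for remote_net, uid in sorted(unique_ids.items()):
        # Remove the policy OPNsense installed with level 'require' ...
        lines.append(f"spddelete -4 {local_net} {remote_net} any -P out;")
        # ... and re-add it bound to the SA of the matching phase 2 entry.
        lines.append(f"spdadd -4 {local_net} {remote_net} any "
                     f"-P out ipsec {tunnel}/unique:{uid};")
    return "\n".join(lines) + "\n"


# The output could then be fed to the kernel with something like: setkey -c < cmds
print(workaround_commands(LOCAL_NET, TUNNEL, UNIQUE_IDS), end="")
```

The identifiers are hard-coded here; they change when strongSwan reinstalls its policies, which is why a one-off replacement like this does not stay correct on its own.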

@AdSchellevis
Member

I don't have time to build a test setup at the moment, and I'm not sure the current construction can work for this. If traffic from 192.168.122.0/24 -> 172.16.0.0/16 doesn't use the second tunnel [4], we might need to bind it explicitly, although that will require more complex logic to keep it in sync.

@joakim666
Author

For now I've removed the 'Manual SPD entries' from the GUI and add the entries myself (using unique to point to the correct phase 2 tunnel) with a script run over SSH. But that's not an ideal solution.

@fraenki
Member

fraenki commented Feb 7, 2018

@joakim666 Excuse me if I'm wrong, but doesn't this sound similar to #1773?

@joakim666
Author

@fraenki sorry for the late reply. It does indeed sound similar, but in the case you describe it's the IP address that is wrong, while for me it's the require vs. unique:x part. But maybe the core issue is the same?

@mimugmail
Member

@joakim666 can you send me the script you use over SSH? I'd like to work on it to get it into core, or first into a plugin.

@joakim666
Author

@mimugmail I'm afraid the script got lost when we switched to terminating the IPsec tunnels in strongSwan directly and stopped using OPNsense for this. But if I remember correctly, I ran the script from cron every few minutes. It basically checked which IPsec tunnels were up and, for each tunnel that had SPD entries using require, removed those and added new SPD entries using unique instead.
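The core decision step of such a cron job can be sketched as follows. This is not the lost script; it is a minimal Python reconstruction under stated assumptions: the SPD entries have already been parsed (for example from `setkey -DP`) into the hypothetical `Entry` tuple, and a tunnel counts as "up" when strongSwan has installed an outbound `unique` policy to the same destination over the same tunnel endpoints. Matching is by destination rather than source because, with NAT, strongSwan's source (x.y.z.10) differs from the manual entry's source (192.168.122.0/24).

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    direction: str   # "in" / "out"
    src: str
    dst: str
    rule: str        # e.g. "esp/tunnel/x.y.z.9-a.b.c.d/require"


def split_rule(rule: str):
    """'esp/tunnel/A-B/unique:4' -> ('esp/tunnel/A-B', 'unique:4')"""
    base, _, level = rule.rpartition("/")
    return base, level


def reconcile(entries: List[Entry]) -> List[str]:
    """Return setkey commands that rebind outbound 'require' entries to the
    unique SA of a matching tunnel. Returns [] when nothing needs fixing,
    so running it repeatedly from cron is harmless."""
    # (tunnel endpoints, destination) -> "unique:<id>" for tunnels that are up
    bound = {}
    for e in entries:
        base, level = split_rule(e.rule)
        if e.direction == "out" and level.startswith("unique:"):
            bound[(base, e.dst)] = level

    cmds = []
    for e in entries:
        base, level = split_rule(e.rule)
        if e.direction != "out" or level != "require":
            continue
        target = bound.get((base, e.dst))
        if target is None:
            continue  # assumed: tunnel not up (yet), leave the entry alone
        cmds.append(f"spddelete {e.src} {e.dst} any -P out;")
        cmds.append(f"spdadd {e.src} {e.dst} any -P out ipsec {base}/{target};")
    return cmds
```

Because an entry that was already rebound no longer uses `require`, a second run produces no commands for it.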

@mikipn

mikipn commented Aug 28, 2020

Hi @mimugmail, I have just run into the same problem and wrote a script to help me.
I can attach it here, though I'm not sure how much it will help you.

fixipsec.txt

This solves my problem, but it is not a generic solution.

@mimugmail
Member

Wow... it seems this one fixes it. I got confirmation from an affected customer that adding the NAT IP to WAN as an alias works (but only for outbound NAT, not BINAT).
https://forum.opnsense.org/index.php?topic=19733.msg91414#msg91414 (sadly in German)

@mimugmail
Member

@joakim666 @mikipn this now works out of the box starting with 21.1 RC1. Maybe you have time to test it :)

@cluck

cluck commented May 3, 2021

Please update the documentation at https://docs.opnsense.org/manual/how-tos/ipsec-s2s-binat.html: the note at the bottom still mentions a limitation.

@AdSchellevis
Member

@cluck you know contributing is also an option? (https://github.com/opnsense/docs)

@cluck

cluck commented May 3, 2021

I have already contributed what I can here. I didn't contribute the fix, nor have I studied the original use case, so I don't feel competent to revise the documentation. But I do feel competent enough to notice the discrepancy.

Kind instructions on various options for contributing to the quality of the documentation would be best placed in the documentation footer itself.
