NAT reflection is broken #6650

Closed
2 tasks done
danwilliams opened this issue Jul 4, 2023 · 74 comments
Labels
help wanted Contributor missing / timeout support Community support

Comments

@danwilliams

Note: This issue is perhaps not "new", but there are no open tickets relating to it. Some very similar tickets, potentially describing the same problem, have been closed for unclear reasons. This ticket may provide additional material that will help achieve a resolution.

Describe the bug

Using a clean, brand-new installation of the latest OPNsense, NAT reflection does not work.

Background

It has been a few years since I last set up pfSense, and in the intervening time it appears OPNsense has grown in popularity (I had not previously heard of it). The debacle with the pfSense codebase (dodgy WireGuard patches from Netgate to the BSD kernel, that kinda thing) made interesting reading, and I therefore deleted my freshly-downloaded copy of pfSense and instead turned to OPNsense. Setup was painless, and within a short space of time I was up and running, and had replaced my Ubiquiti EdgeRouter with equivalent settings for Internet connectivity and DHCP leases. Just some simple port forwarding to set up to have a complete replacement, and...

The problem

Having configured a pretty vanilla setup, with a PPPoE WAN and a single-subnet LAN, I was surprised when the port forwards I set up did not work. Reading a little around the subject I realised that they were actually working, and external access was fine, but I had to enable NAT reflection in the main/global settings in order to get internal resolution. I dutifully did this, and descended into a rabbit-hole for several hours, trying lots of things with no success. Along the way I found several reports by other people who have apparently encountered the same, or a very similar, issue.

In a nutshell (TLDR)

If you set up NAT port forwarding, even if you have NAT reflection enabled in the main settings and on the forwarding rule, there is no internal resolution of traffic that is directed towards the WAN interface.

To reproduce

Bear with me, as this is a detailed account of how to start from scratch and verify the issue, along with notes and data.

Initial setup

Define VM

Create VM configuration

This setup is running on a VM using KVM on Linux, specifically, Ubuntu Server 23.04.

Create a new virtual machine using virt-manager. Mine is:

  • 4 CPUs
  • 4GB RAM
  • Assign one virtual NIC against br0 (this is a host bridge that is present on the LAN)
  • Assign a physical PCI card for the enp60s0 Intel NIC (this is the interface for the WAN)
  • Assign disk device to the ZFS zvol (the host runs ZFS, and OPNsense is installed into a zvol)

Ensure there are no references to enp60s0 in the host's netplan config (as this card is a passthrough device to be owned by the OPNsense VM).

Autostart

This is so it will always come to life with the host machine.

virsh autostart OPNsense

Install OPNsense

Boot VM

Install by following the steps laid out, but notably:

  • Use UFS (as this system is backed by ZFS, so no point having ZFS on ZFS)
  • Interfaces should get assigned automatically, but best to specify manually:
    • igc0 to WAN (this is the passthrough physical NIC)
    • vtnet0 to LAN (this is the br0 bridge)

Configure via CLI

Using the CLI, log in and, using the menu:

  • Assign an IP address of 10.0.0.1 to the LAN interface
  • Enable DHCP when prompted
  • Disable HTTPS when prompted

Configure via web GUI

Log into the web UI and run through the setup wizard.

  • Hostname: router
  • Domain: lan
  • Do not allow ISP to override DNS (i.e. disable this)
  • Set DNS server to 10.0.0.2 (the network currently runs a separate dnsmasq server that will later be moved into OPNsense)
  • PPPoE
    • Username: xxxxx
    • Password: yyyyy

It should connect and assign the correct public IP address from the ISP, which will be referred to from now on as 1.2.3.4.

Configure OPNsense

Static IPs for DHCP clients

  • Services → DHCPv4 → [LAN]
    • Set range to 10.0.0.150-239
    • Add all static mappings:
      • MAC address
      • IP address
      • Hostname
      • ARP table static entry: enabled

Port forwarding

  • Firewall → Settings → Advanced
    • Reflection for port forwards: Enabled
    • Reflection for 1:1: Enabled (I am not sure this one should be strictly necessary, but I tried with and without)
    • Automatic outbound NAT for Reflection: Enabled
  • Firewall → NAT → Port Forward
    • Interface: WAN, LAN
    • TCP/IP version: IPv4
    • Protocol: TCP/UDP
    • Destination: WAN address
    • Destination port range: (as required - here I have 80, 443, and 22, so three rules in total)
    • Redirect target IP: Single host or Network
      • 10.0.0.5
    • Redirect target port: (as required - here I have 80, 443, and 2222)
    • Description: (as required)
    • NAT reflection: Enable
    • Filter rule association: Add associated filter rule

I added three rules, for:

  • 80 → 80 (HTTP -> HTTP)
  • 443 → 443 (HTTPS -> HTTPS)
  • 22 → 2222 (SSH -> Other: 2222)
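
To check each of these forwards from a given client, a small TCP probe along the following lines could be used. This is only a sketch: `probe` and `FORWARDS` are names invented for illustration, and a successful connect only shows that something answered on that port, not which host answered.

```python
import socket

# External port -> internal port on 10.0.0.5, as listed in the three rules above.
FORWARDS = {80: 80, 443: 443, 22: 2222}


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False


def report(host: str) -> list[str]:
    """Probe every forwarded external port on host and describe the result."""
    lines = []
    for external, internal in FORWARDS.items():
        state = "open" if probe(host, external) else "unreachable"
        lines.append(f"{host}:{external} (-> 10.0.0.5:{internal}): {state}")
    return lines
```

Calling `report("1.2.3.4")` (with the real public address substituted for the placeholder) from a LAN client and again from an external host gives a quick side-by-side view of which forwards answer from where.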

Examination of configuration

The steps above represent my latest, most complete attempt, which I believe is the most accurate and therefore the best test case for reproduction.

It is worth noting that, along the way, I tried multiple variations:

  1. My first attempt added the port forwarding rules with no NAT reflection. I then enabled this globally (no effect), and then enabled it in each rule as well (no effect). I then deleted all the rules and recreated them (no effect). This attempt notably only got applied to the WAN interface, as indicated by most documentation.
  2. A later attempt (after reverting the config, and indeed the disk snapshot, in the meantime) set up the global NAT reflection config from the start, in case the order mattered and had caused the problem. This made no difference.
  3. I then played with other options, eventually arriving at a position where the LAN interface was specified along with the WAN interface. Paranoid that my fiddling had caused unseen issues, I rolled back the ZFS disk snapshot several times, and made a number of careful comparisons of the config changes.

Note: Although it seems reasonable that the LAN interface might need specifying as well as the WAN interface, I only stumbled upon this by accident, in some comments I found while reading around the subject. It does make sense and, as the target is the WAN address, it should not conflict with services on the LAN address. Unfortunately this still didn't work for me, but if it is indeed a requirement then maybe it should be documented more clearly - for instance in the UI, which itself suggests the WAN interface is what is usually needed.

First config changes

These are the changes resulting from my first attempt, i.e.:

  • Adding the rules to WAN, without reflection
  • Adding the global reflection config
  • Enabling reflection in the rules

I am including these lines for completeness, because I believe that path should work - if the rules are not set up properly the first time, correcting them by editing should surely be sufficient.

Configuration diff from 7/4/23 08:08:45 to 7/4/23 10:25:06
--- /conf/backup/config-1688458125.1867.xml	2023-07-04 08:08:45.187386000 +0000
+++ /conf/backup/config-1688466306.9975.xml	2023-07-04 10:25:06.999178000 +0000
@@ -228,14 +228,13 @@
       <protocol>http</protocol>
       <ssl-certref>64a3c924280cb</ssl-certref>
     </webgui>
-    <disablenatreflection>yes</disablenatreflection>
     <usevirtualterminal>1</usevirtualterminal>
     <disableconsolemenu/>
     <disablevlanhwfilter>1</disablevlanhwfilter>
     <disablechecksumoffloading>1</disablechecksumoffloading>
     <disablesegmentationoffloading>1</disablesegmentationoffloading>
     <disablelargereceiveoffloading>1</disablelargereceiveoffloading>
-    <ipv6allow/>
+    <ipv6allow>1</ipv6allow>
     <powerd_ac_mode>hadp</powerd_ac_mode>
     <powerd_battery_mode>hadp</powerd_battery_mode>
     <powerd_normal_mode>hadp</powerd_normal_mode>
@@ -255,6 +254,11 @@
     </firmware>
     <language>en_US</language>
     <dnsserver>10.0.0.2</dnsserver>
+    <enablenatreflectionhelper>yes</enablenatreflectionhelper>
+    <maximumstates/>
+    <maximumfrags/>
+    <aliasesresolveinterval/>
+    <maximumtableentries/>
   </system>
   <interfaces>
     <wan>
@@ -619,6 +623,96 @@
     <outbound>
       <mode>automatic</mode>
     </outbound>
+    <rule>
+      <protocol>tcp/udp</protocol>
+      <interface>wan</interface>
+      <category/>
+      <ipprotocol>inet</ipprotocol>
+      <descr>Gitea HTTP</descr>
+      <tag/>
+      <tagged/>
+      <poolopts/>
+      <associated-rule-id>nat_64a3d5cec4db08.05189375</associated-rule-id>
+      <target>10.0.0.5</target>
+      <local-port>80</local-port>
+      <source>
+        <any>1</any>
+      </source>
+      <destination>
+        <network>wanip</network>
+        <port>80</port>
+      </destination>
+      <updated>
+        <username>root@10.0.0.40</username>
+        <time>1688466293.9502</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </updated>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688458702.8064</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <protocol>tcp/udp</protocol>
+      <interface>wan</interface>
+      <category/>
+      <ipprotocol>inet</ipprotocol>
+      <descr>Gitea HTTPS</descr>
+      <tag/>
+      <tagged/>
+      <poolopts/>
+      <associated-rule-id>nat_64a3d5f4d81474.92278692</associated-rule-id>
+      <target>10.0.0.5</target>
+      <local-port>443</local-port>
+      <source>
+        <any>1</any>
+      </source>
+      <destination>
+        <network>wanip</network>
+        <port>443</port>
+      </destination>
+      <updated>
+        <username>root@10.0.0.40</username>
+        <time>1688466302.5817</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </updated>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688458740.8851</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <protocol>tcp/udp</protocol>
+      <interface>wan</interface>
+      <category/>
+      <ipprotocol>inet</ipprotocol>
+      <descr>Gitea SSH</descr>
+      <tag/>
+      <tagged/>
+      <poolopts/>
+      <associated-rule-id>nat_64a3d6261c0f80.88097215</associated-rule-id>
+      <target>10.0.0.5</target>
+      <local-port>2222</local-port>
+      <source>
+        <any>1</any>
+      </source>
+      <destination>
+        <network>wanip</network>
+        <port>22</port>
+      </destination>
+      <updated>
+        <username>root@10.0.0.40</username>
+        <time>1688466306.9404</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </updated>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688458790.115</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
   </nat>
   <filter>
     <rule>
@@ -645,6 +739,69 @@
         <any/>
       </destination>
     </rule>
+    <rule>
+      <source>
+        <any>1</any>
+      </source>
+      <interface>wan</interface>
+      <statetype>keep state</statetype>
+      <protocol>tcp/udp</protocol>
+      <ipprotocol>inet</ipprotocol>
+      <destination>
+        <address>10.0.0.5</address>
+        <port>80</port>
+      </destination>
+      <descr>Gitea HTTP</descr>
+      <category/>
+      <associated-rule-id>nat_64a3d5cec4db08.05189375</associated-rule-id>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688458702.8063</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <source>
+        <any>1</any>
+      </source>
+      <interface>wan</interface>
+      <statetype>keep state</statetype>
+      <protocol>tcp/udp</protocol>
+      <ipprotocol>inet</ipprotocol>
+      <destination>
+        <address>10.0.0.5</address>
+        <port>443</port>
+      </destination>
+      <descr>Gitea HTTPS</descr>
+      <category/>
+      <associated-rule-id>nat_64a3d5f4d81474.92278692</associated-rule-id>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688458740.8851</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <source>
+        <any>1</any>
+      </source>
+      <interface>wan</interface>
+      <statetype>keep state</statetype>
+      <protocol>tcp/udp</protocol>
+      <ipprotocol>inet</ipprotocol>
+      <destination>
+        <address>10.0.0.5</address>
+        <port>2222</port>
+      </destination>
+      <descr>Gitea SSH</descr>
+      <category/>
+      <associated-rule-id>nat_64a3d6261c0f80.88097215</associated-rule-id>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688458790.1149</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
   </filter>
   <rrd>
     <enable/>
@@ -700,9 +857,9 @@
     <column_count>2</column_count>
   </widgets>
   <revision>
-    <username>root@10.0.0.161</username>
-    <time>1688458125.1867</time>
-    <description>/services_dhcp_edit.php made changes</description>
+    <username>root@10.0.0.40</username>
+    <time>1688466306.9975</time>
+    <description>/firewall_nat_edit.php made changes</description>
   </revision>
   <OPNsense>
     <IPsec version="1.0.1">

Latest config changes

These are the changes resulting from my latest attempt, which is the most complete and correct, i.e. as described in my steps to reproduce:

  • Adding the global reflection config first, including all three options being enabled
  • Adding the rules to WAN plus LAN, making sure reflection is enabled

I believe that these steps should have resulted in a correct outcome.

Configuration diff from 7/4/23 08:08:45 to 7/4/23 13:25:07
--- /conf/backup/config-1688458125.1867.xml	2023-07-04 08:08:45.187386000 +0000
+++ /conf/config.xml	2023-07-04 13:25:07.469589000 +0000
@@ -228,14 +228,13 @@
       <protocol>http</protocol>
       <ssl-certref>64a3c924280cb</ssl-certref>
     </webgui>
-    <disablenatreflection>yes</disablenatreflection>
     <usevirtualterminal>1</usevirtualterminal>
     <disableconsolemenu/>
     <disablevlanhwfilter>1</disablevlanhwfilter>
     <disablechecksumoffloading>1</disablechecksumoffloading>
     <disablesegmentationoffloading>1</disablesegmentationoffloading>
     <disablelargereceiveoffloading>1</disablelargereceiveoffloading>
-    <ipv6allow/>
+    <ipv6allow>1</ipv6allow>
     <powerd_ac_mode>hadp</powerd_ac_mode>
     <powerd_battery_mode>hadp</powerd_battery_mode>
     <powerd_normal_mode>hadp</powerd_normal_mode>
@@ -255,6 +254,12 @@
     </firmware>
     <language>en_US</language>
     <dnsserver>10.0.0.2</dnsserver>
+    <enablebinatreflection>yes</enablebinatreflection>
+    <enablenatreflectionhelper>yes</enablenatreflectionhelper>
+    <maximumstates/>
+    <maximumfrags/>
+    <aliasesresolveinterval/>
+    <maximumtableentries/>
   </system>
   <interfaces>
     <wan>
@@ -619,6 +624,99 @@
     <outbound>
       <mode>automatic</mode>
     </outbound>
+    <rule>
+      <protocol>tcp/udp</protocol>
+      <interface>lan,wan</interface>
+      <category/>
+      <ipprotocol>inet</ipprotocol>
+      <descr>Gitea HTTP</descr>
+      <tag/>
+      <tagged/>
+      <poolopts/>
+      <associated-rule-id>nat_64a41d269dddd8.11732562</associated-rule-id>
+      <target>10.0.0.5</target>
+      <local-port>80</local-port>
+      <source>
+        <any>1</any>
+      </source>
+      <destination>
+        <network>wanip</network>
+        <port>80</port>
+      </destination>
+      <natreflection>purenat</natreflection>
+      <updated>
+        <username>root@10.0.0.40</username>
+        <time>1688476966.6467</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </updated>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688476966.6467</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <protocol>tcp/udp</protocol>
+      <interface>lan,wan</interface>
+      <category/>
+      <ipprotocol>inet</ipprotocol>
+      <descr>Gitea HTTPS</descr>
+      <tag/>
+      <tagged/>
+      <poolopts/>
+      <associated-rule-id>nat_64a41d7112c513.18400326</associated-rule-id>
+      <target>10.0.0.5</target>
+      <local-port>443</local-port>
+      <source>
+        <any>1</any>
+      </source>
+      <destination>
+        <network>wanip</network>
+        <port>443</port>
+      </destination>
+      <natreflection>purenat</natreflection>
+      <updated>
+        <username>root@10.0.0.40</username>
+        <time>1688477041.0769</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </updated>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688477041.0769</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <protocol>tcp/udp</protocol>
+      <interface>lan,wan</interface>
+      <category/>
+      <ipprotocol>inet</ipprotocol>
+      <descr>Gitea SSH</descr>
+      <tag/>
+      <tagged/>
+      <poolopts/>
+      <associated-rule-id>nat_64a41dafcab5c5.02730287</associated-rule-id>
+      <target>10.0.0.5</target>
+      <local-port>2222</local-port>
+      <source>
+        <any>1</any>
+      </source>
+      <destination>
+        <network>wanip</network>
+        <port>22</port>
+      </destination>
+      <natreflection>purenat</natreflection>
+      <updated>
+        <username>root@10.0.0.40</username>
+        <time>1688477103.8303</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </updated>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688477103.8303</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
   </nat>
   <filter>
     <rule>
@@ -645,6 +743,75 @@
         <any/>
       </destination>
     </rule>
+    <rule>
+      <source>
+        <any>1</any>
+      </source>
+      <interface>lan,wan</interface>
+      <statetype>keep state</statetype>
+      <protocol>tcp/udp</protocol>
+      <ipprotocol>inet</ipprotocol>
+      <destination>
+        <address>10.0.0.5</address>
+        <port>80</port>
+      </destination>
+      <floating>1</floating>
+      <quick>yes</quick>
+      <descr>Gitea HTTP</descr>
+      <category/>
+      <associated-rule-id>nat_64a41d269dddd8.11732562</associated-rule-id>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688476966.6466</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <source>
+        <any>1</any>
+      </source>
+      <interface>lan,wan</interface>
+      <statetype>keep state</statetype>
+      <protocol>tcp/udp</protocol>
+      <ipprotocol>inet</ipprotocol>
+      <destination>
+        <address>10.0.0.5</address>
+        <port>443</port>
+      </destination>
+      <floating>1</floating>
+      <quick>yes</quick>
+      <descr>Gitea HTTPS</descr>
+      <category/>
+      <associated-rule-id>nat_64a41d7112c513.18400326</associated-rule-id>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688477041.0769</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
+    <rule>
+      <source>
+        <any>1</any>
+      </source>
+      <interface>lan,wan</interface>
+      <statetype>keep state</statetype>
+      <protocol>tcp/udp</protocol>
+      <ipprotocol>inet</ipprotocol>
+      <destination>
+        <address>10.0.0.5</address>
+        <port>2222</port>
+      </destination>
+      <floating>1</floating>
+      <quick>yes</quick>
+      <descr>Gitea SSH</descr>
+      <category/>
+      <associated-rule-id>nat_64a41dafcab5c5.02730287</associated-rule-id>
+      <created>
+        <username>root@10.0.0.40</username>
+        <time>1688477103.8303</time>
+        <description>/firewall_nat_edit.php made changes</description>
+      </created>
+    </rule>
   </filter>
   <rrd>
     <enable/>
@@ -700,9 +867,9 @@
     <column_count>2</column_count>
   </widgets>
   <revision>
-    <username>root@10.0.0.161</username>
-    <time>1688458125.1867</time>
-    <description>/services_dhcp_edit.php made changes</description>
+    <username>root@10.0.0.40</username>
+    <time>1688477107.4679</time>
+    <description>/firewall_nat.php made changes</description>
   </revision>
   <OPNsense>
     <IPsec version="1.0.1">

Difference between config changes

What's strange is that there's very little changed between the two.

When specifying WAN alone, the rules end up in Firewall -> Rules -> WAN. When specifying WAN plus LAN, the same rules get shown in Firewall -> Rules -> Floating. I don't know if that's expected, or what "floating" means exactly - I was perhaps expecting to see some rules under both WAN and LAN, but other than appearing in a slightly different place, the rules seem the same.

The later config has <natreflection>purenat</natreflection> added to each of the rules, and the rules also say they are "floating" and "quick". They also obviously specify lan,wan and not just wan. But these differences do not appear to have actually changed anything, and I am not sure if they are needed. What is clear is that either a) the rules in place are not working correctly, or b) there are some missing.
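
For comparing the port-forward rules across two saved configs without reading the full diff, something like the following could pull out just the fields discussed here. It is a sketch under assumptions: `summarize_nat_rules` is a made-up helper name, and it assumes `<nat>` is a direct child of the config's root element, as the indentation in the diffs above suggests.

```python
import xml.etree.ElementTree as ET


def summarize_nat_rules(config_xml: str) -> list[dict]:
    """Collect, per port-forward rule, the fields that differ between attempts."""
    root = ET.fromstring(config_xml)
    nat = root.find("nat")  # assumption: <nat> sits directly under the root
    if nat is None:
        return []
    rules = []
    for rule in nat.findall("rule"):
        rules.append({
            "descr": rule.findtext("descr", default=""),
            "interface": rule.findtext("interface", default=""),
            # Stays None when the element is absent, as in the first diff.
            "natreflection": rule.findtext("natreflection"),
            "destination_port": rule.findtext("destination/port", default=""),
            "target": rule.findtext("target", default=""),
            "local_port": rule.findtext("local-port", default=""),
        })
    return rules
```

Running it over the text of each backup file (for example the two `/conf/backup/config-*.xml` files named in the diff headers) and comparing the resulting lists shows the `interface` and `natreflection` differences at a glance.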

Expected behavior

My expectation was as follows:

  • Given that I have a working public IP of 1.2.3.4, and a DNS entry defined against a domain, pointing at it, external traffic should see it and route to it. This does indeed happen, correctly, and external clients can use the Gitea system without issue.
  • Given that I have set up NAT reflection, internal traffic directed toward the WAN address of 1.2.3.4 should be translated and redirected to 10.0.0.5. This is not happening.
  • Given that I have moreover specified the rules to apply to the LAN interface (but not address) as well as the WAN interface, this should definitely be enough to route/redirect all internal traffic sent to the WAN address. This is not happening.

Additionally, we could say that I expect more rules - but I am not 100% sure of that, as it may be that the ones in place are not working correctly.
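
The expectation above can be written down as a small model of the translation. This is an illustration of the expected behaviour only, not of how OPNsense or pf implements it. The names `Packet`, `reflect`, and `PORT_FORWARDS` are invented for the sketch, the /24 prefix length is an assumption (the report only gives 10.0.0.x addresses), and the source rewrite reflects the usual reasoning for hairpin NAT: without it, 10.0.0.5 would reply directly to the LAN client, which is waiting for a reply from 1.2.3.4.

```python
from dataclasses import dataclass, replace
import ipaddress

WAN_IP = "1.2.3.4"  # placeholder public address used in this report
LAN_NET = ipaddress.ip_network("10.0.0.0/24")  # assumed prefix length
FIREWALL_LAN_IP = "10.0.0.1"
# External port -> (internal host, internal port)
PORT_FORWARDS = {80: ("10.0.0.5", 80), 443: ("10.0.0.5", 443), 22: ("10.0.0.5", 2222)}


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int


def reflect(pkt: Packet, source_nat: bool = True) -> Packet:
    """Apply the expected redirect, plus optional source NAT for LAN clients."""
    if pkt.dst != WAN_IP or pkt.dport not in PORT_FORWARDS:
        return pkt  # not a forwarded destination; leave untouched
    host, port = PORT_FORWARDS[pkt.dport]
    out = replace(pkt, dst=host, dport=port)
    if source_nat and ipaddress.ip_address(pkt.src) in LAN_NET:
        # Rewrite the source so that replies return via the firewall.
        out = replace(out, src=FIREWALL_LAN_IP)
    return out
```

Under this model, `reflect(Packet("10.0.0.40", "1.2.3.4", 443))` yields a packet from 10.0.0.1 to 10.0.0.5:443, while an external source is redirected with its address unchanged.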

Describe alternatives you considered

I have considered using Unbound DNS in order to get ahead of the requests and make the clients send those requests to the internal 10.0.0.5 IP instead of the external IP. However, this is not particularly practicable, as a) there are various TLS certificates and similar in place that complicate that approach, and b) there are some setups (moving away from the simple Gitea setup here) that need to correctly use proper DNS as part of their automated testing. So this is not really a solution for me.

Another alternative might be to go back to my EdgeRouter, which worked fine... or, maybe, swap over to pfSense, which apparently does not have this specific issue. But I am reluctant to do either of those things.

Screenshots

There are some screenshots that might be useful; not sure how much they add to the steps and config above, but here goes:

List of rules: Firewall -> NAT -> Port Forward

image

Rule configuration: Firewall -> NAT -> Port Forward -> (Rule)

image

List of rules: Firewall -> Rules -> Floating

image

Main/global settings: Firewall -> Settings -> Advanced

image

Relevant log files

I'm not sure what log files would be of use here - I have trawled through what is available but not seen anything notable to submit. Please let me know if you would like something specific.

Additional context

It's worth noting that this is a brand-new setup on a brand-new system. It's not an upgrade, it's not a port or import of settings from elsewhere, and it doesn't have anything unusual going on - i.e. it should not be an edge case. It's just bog-standard port forwarding. Therefore it should hopefully be easier to validate than situations where there are additional factors taking effect.

Environment

Host system

  • Ubuntu Server 23.04
  • AMD Ryzen 9 5950X
  • 128GB RAM
  • NVMe drives using ZFS
  • KVM hypervisor

VM configured for OPNsense

  • OPNsense 23.1 OpenSSL amd64 (using the ISO)
  • 4 CPU cores
  • 4GB RAM
  • UFS chosen for storage
  • LAN: bridge interface present on physical network (10GbE NIC)
  • WAN: PCI card passthrough (Intel 2.5GbE NIC)
  • Internet: PPPoE, gigabit FTTP

@AdSchellevis
Member

What happens if you change WAN address to the actual address configured on the interface in Firewall → NAT → Port Forward, does that change anything?

@danwilliams
Author

@AdSchellevis oh, good question - I don't know! The rule has a selectbox containing WAN address and as that's what the documentation said to use, that's what I've used. Looking at what's available, I think you must mean changing it to Single host or Network and then specifying the public IP, i.e. 1.2.3.4 - correct?

(short commercial break while I try that...)

That made no difference. Just attaching a couple of screenshots to ensure that I tried what you had in mind...?

image

image

...obviously I tried with my real public IP and not actually 1.2.3.4! But is that what you meant...?

@AdSchellevis
Member

...obviously I tried with my real public IP and not actually 1.2.3.4! But is that what you meant...?

It was just a hunch, but if the result is the same, I would use the packet capture to test where traffic is going. Rules with "reply-to" and "route-to" (policy-based routing) might interfere with the traffic. To debug the generated ruleset, you can inspect the contents of /tmp/rules.debug.
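
When going through a ruleset dump such as /tmp/rules.debug, a short filter can narrow it down to the translation rules and to any rules carrying the policy-routing keywords mentioned above. This is a rough sketch: `scan_ruleset` is a made-up helper, and the matching is plain text matching on keywords rather than a real pf parser.

```python
import re

# Policy-routing keywords named above, and the pf translation rule types.
POLICY_ROUTING = re.compile(r"\b(reply-to|route-to)\b")
TRANSLATION = re.compile(r"^(no\s+)?(rdr|nat|binat)\b")


def scan_ruleset(text: str) -> dict[str, list[str]]:
    """Split a pf ruleset dump into translation and policy-routing rules."""
    found = {"translation": [], "policy_routing": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment-only lines
        # Match against the rule itself, ignoring any trailing "# description".
        rule = line.split("#", 1)[0].rstrip()
        if TRANSLATION.match(rule):
            found["translation"].append(line)
        if POLICY_ROUTING.search(rule):
            found["policy_routing"].append(line)
    return found
```

For example, `scan_ruleset(open("/tmp/rules.debug").read())` run on the firewall would return both lists; an empty `policy_routing` list would suggest that no reply-to or route-to rules are present in the generated ruleset.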

@danwilliams
Author

I will take a look at some packet capturing, and the rules debug file, but there are no policies in place. My entire setup is exactly as described above, so there is no other configuration to interfere with the routing here.

@mimugmail
Member

The most interesting question is: when you run a packet capture on your .5 host for port 443 and connect from an internal client via your WAN address, which source address do you see?

@danwilliams
Author

Okay, so I've been looking through all logs and whatnot to try to get more info on this.

@AdSchellevis see rules.debug below
@mimugmail see packet captures below

Firewall logging

Firstly, I enabled logging for the HTTPS rule, and then I checked Firewall -> Log Files -> Live View, and applied filters for dst=10.0.0.5 and dstport=443. I then tried hitting the URL from my browser.

Interestingly, I saw entries in the log for my computer's IP, 10.0.0.40, and the gateway, 10.0.0.1.

Screenshot of live firewall logs

image

Screenshot of firewall log details for 10.0.0.40 -> 10.0.0.5

image

Screenshot of firewall log details for 10.0.0.1 -> 10.0.0.5

image

I don't know if that looks correct or not.

Rules debug list

Next, looking at /tmp/rules.debug:

Contents of `/tmp/rules.debug`
set limit table-entries 1000000
set optimization normal
set timeout { adaptive.start 0, adaptive.end 0 }
set limit states 405000
set limit src-nodes 405000
set hostid 0x62630d0d


# Lockout tables
table <sshlockout> persist

# Other tables
table <virusprot>

# User Aliases
table <bogons>  persist
bogons = "<bogons>"
table <bogonsv6>  persist
bogonsv6 = "<bogonsv6>"
table <virusprot>  persist
virusprot = "<virusprot>"
table <sshlockout>  persist
sshlockout = "<sshlockout>"
table <__wan_network>  persist
__wan_network = "<__wan_network>"
table <__lan_network>  persist
__lan_network = "<__lan_network>"
table <__lo0_network>  persist
__lo0_network = "<__lo0_network>"
table <bogons> persist file "/usr/local/etc/bogons"
table <bogonsv6> persist file "/usr/local/etc/bogonsv6"

# Plugins tables

set loginterface vtnet0

set skip on pfsync0

scrub on vtnet0 all
scrub on pppoe0 all


# NAT Redirects
no nat proto carp all
no rdr proto carp all
# [prio: 200]
nat on pppoe0 inet from (vtnet0:network) to any port 500 -> (pppoe0:0) static-port # Automatic outbound rule
nat on pppoe0 inet from (lo0:network) to any port 500 -> (pppoe0:0) static-port # Automatic outbound rule
nat on pppoe0 inet from 127.0.0.0/8 to any port 500 -> (pppoe0:0) static-port # Automatic outbound rule
nat on pppoe0 inet from (vtnet0:network) to any -> (pppoe0:0) port 1024:65535 # Automatic outbound rule
nat on pppoe0 inet from (lo0:network) to any -> (pppoe0:0) port 1024:65535 # Automatic outbound rule
nat on pppoe0 inet from 127.0.0.0/8 to any -> (pppoe0:0) port 1024:65535 # Automatic outbound rule
# [prio: 300]
no rdr on vtnet0 proto tcp to {(vtnet0)} port {22} # Anti lockout, prevent redirects for protected ports to this interface ip
no rdr on vtnet0 proto tcp to {(vtnet0)} port {80} # Anti lockout, prevent redirects for protected ports to this interface ip
# [prio: 600]
rdr on vtnet0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
nat on vtnet0 inet proto {tcp udp} from (vtnet0:network) to {10.0.0.5} port {80} -> (vtnet0) port 1024:65535 # Gitea HTTP
rdr on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
nat on lo0 inet proto {tcp udp} from (lo0:network) to {10.0.0.5} port {80} -> (lo0) port 1024:65535 # Gitea HTTP
rdr on pppoe0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
nat on pppoe0 inet proto {tcp udp} from (pppoe0:network) to {10.0.0.5} port {80} -> (pppoe0) port 1024:65535 # Gitea HTTP
rdr on vtnet0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
nat on vtnet0 inet proto {tcp udp} from (vtnet0:network) to {10.0.0.5} port {80} -> (vtnet0) port 1024:65535 # Gitea HTTP
rdr on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
nat on lo0 inet proto {tcp udp} from (lo0:network) to {10.0.0.5} port {80} -> (lo0) port 1024:65535 # Gitea HTTP
rdr log on vtnet0 inet proto {tcp udp} from {any} to {(pppoe0)} port {443} -> 10.0.0.5 port 443 # Gitea HTTPS
nat on vtnet0 inet proto {tcp udp} from (vtnet0:network) to {10.0.0.5} port {443} -> (vtnet0) port 1024:65535 # Gitea HTTPS
rdr log on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {443} -> 10.0.0.5 port 443 # Gitea HTTPS
nat on lo0 inet proto {tcp udp} from (lo0:network) to {10.0.0.5} port {443} -> (lo0) port 1024:65535 # Gitea HTTPS
rdr log on pppoe0 inet proto {tcp udp} from {any} to {(pppoe0)} port {443} -> 10.0.0.5 port 443 # Gitea HTTPS
nat on pppoe0 inet proto {tcp udp} from (pppoe0:network) to {10.0.0.5} port {443} -> (pppoe0) port 1024:65535 # Gitea HTTPS
rdr log on vtnet0 inet proto {tcp udp} from {any} to {(pppoe0)} port {443} -> 10.0.0.5 port 443 # Gitea HTTPS
nat on vtnet0 inet proto {tcp udp} from (vtnet0:network) to {10.0.0.5} port {443} -> (vtnet0) port 1024:65535 # Gitea HTTPS
rdr log on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {443} -> 10.0.0.5 port 443 # Gitea HTTPS
nat on lo0 inet proto {tcp udp} from (lo0:network) to {10.0.0.5} port {443} -> (lo0) port 1024:65535 # Gitea HTTPS
rdr on vtnet0 inet proto {tcp udp} from {any} to {(pppoe0)} port {22} -> 10.0.0.5 port 2222 # Gitea SSH
nat on vtnet0 inet proto {tcp udp} from (vtnet0:network) to {10.0.0.5} port {2222} -> (vtnet0) port 1024:65535 # Gitea SSH
rdr on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {22} -> 10.0.0.5 port 2222 # Gitea SSH
nat on lo0 inet proto {tcp udp} from (lo0:network) to {10.0.0.5} port {2222} -> (lo0) port 1024:65535 # Gitea SSH
rdr on pppoe0 inet proto {tcp udp} from {any} to {(pppoe0)} port {22} -> 10.0.0.5 port 2222 # Gitea SSH
nat on pppoe0 inet proto {tcp udp} from (pppoe0:network) to {10.0.0.5} port {2222} -> (pppoe0) port 1024:65535 # Gitea SSH
rdr on vtnet0 inet proto {tcp udp} from {any} to {(pppoe0)} port {22} -> 10.0.0.5 port 2222 # Gitea SSH
nat on vtnet0 inet proto {tcp udp} from (vtnet0:network) to {10.0.0.5} port {2222} -> (vtnet0) port 1024:65535 # Gitea SSH
rdr on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {22} -> 10.0.0.5 port 2222 # Gitea SSH
nat on lo0 inet proto {tcp udp} from (lo0:network) to {10.0.0.5} port {2222} -> (lo0) port 1024:65535 # Gitea SSH

antispoof log for vtnet0
antispoof log for pppoe0
# [prio: 1]
# pass in log quick on lo0 inet6 from {any} to {any} label "62bc9bf7ea7b56454e39925bfa2d5741" # Pass all loopback IPv6
# block in log quick inet6 from {any} to {any} label "0ec8294e29827da393c3bfad611eecbb" # Block all IPv6
block in log inet from {any} to {any} label "02f4bab031b57d1e30553ce08e0ec131" # Default deny / state violation rule
block in log inet6 from {any} to {any} label "02f4bab031b57d1e30553ce08e0ec131" # Default deny / state violation rule
pass in log quick inet6 proto ipv6-icmp from {any} to {any} icmp6-type {1,2,135,136} keep state label "1d245529367b2e34eeaff16086aeafe9" # IPv6 RFC4890 requirements (ICMP)
pass out log quick inet6 proto ipv6-icmp from {(self)} to {fe80::/10,ff02::/16} icmp6-type {128,129,133,134,135,136} keep state label "acdbb900b50d8fb4ae21ddfdc609ecf8" # IPv6 RFC4890 requirements (ICMP)
pass in log quick inet6 proto ipv6-icmp from {fe80::/10} to {fe80::/10,ff02::/16} icmp6-type {128,133,134,135,136} keep state label "42e9d787749713a849d8e92432efdfaa" # IPv6 RFC4890 requirements (ICMP)
pass in log quick inet6 proto ipv6-icmp from {ff02::/16} to {fe80::/10} icmp6-type {128,133,134,135,136} keep state label "8752fca75c6be992847ea984161bd3f1" # IPv6 RFC4890 requirements (ICMP)
pass in log quick inet6 proto ipv6-icmp from {::} to {ff02::/16} icmp6-type {128,133,134,135,136} keep state label "71dd196398b3f1da265dbd9dcad00e70" # IPv6 RFC4890 requirements (ICMP)
block in log quick inet proto {tcp udp} from {any} port {0} to {any} label "7b5bdc64d7ae74be1932f6764a591da5" # block all targeting port 0
block in log quick inet6 proto {tcp udp} from {any} port {0} to {any} label "7b5bdc64d7ae74be1932f6764a591da5" # block all targeting port 0
block in log quick inet proto {tcp udp} from {any} to {any} port {0} label "ae69f581dc429e3484a65f8ecd63baa5" # block all targeting port 0
block in log quick inet6 proto {tcp udp} from {any} to {any} port {0} label "ae69f581dc429e3484a65f8ecd63baa5" # block all targeting port 0
pass log quick proto carp from {any} to {ff02::12} label "3b14fa6f8072123bf7a59d2fd29cbec3" # CARP defaults
pass log quick proto carp from {any} to {224.0.0.18} label "8203357325e6f08a501a6dec36b19112" # CARP defaults
block in log quick proto tcp from {<sshlockout>} to {(self)} port {22} label "669143f420c3ab4118bcb0bf4b5fd823" # sshlockout
block in log quick proto tcp from {<sshlockout>} to {(self)} port {80} label "b523e02acc7c2758dc28e60501bc95c2" # sshlockout
block in log quick from {<virusprot>} to {any} label "8e367e2f9944d93137ae56d788c5d5e1" # virusprot overload table
pass in log quick on pppoe0 proto udp from {fe80::/10} port {546} to {fe80::/10} port {546} label "a6cd2cce1bc1d912f6258ef1f3fb07e1" # allow dhcpv6 client in WAN
pass in log quick on pppoe0 proto udp from {any} port {547} to {any} port {546} label "f7e4334c3e7dc4ba900c5780b828d4a3" # allow dhcpv6 client in WAN
pass out log quick on pppoe0 proto udp from {any} port {546} to {any} port {547} label "5ba1258fcaf073eff4060b40ff63044d" # allow dhcpv6 client in WAN
# [prio: 5]
# block in log quick on vtnet0 inet from {<bogons>} to {any} label "bf8a7b329d048c5183805d4f016fede9" # Block bogon IPv4 networks from LAN
# block in log quick on vtnet0 inet6 from {<bogonsv6>} to {any} label "14dde492ca55ec468310c537f693dc8f" # Block bogon IPv6 networks from LAN
# block in log quick on vtnet0 inet from {10.0.0.0/8,127.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16} to {any} label "59eaa3b97b11c51ddfce6afe4f71eeb8" # Block private networks from LAN
# block in log quick on vtnet0 inet6 from {fc00::/7} to {any} label "b41015c9cba1b7ab9fa566f6ee78f58c" # Block private networks from LAN
# block in log quick on lo0 inet from {<bogons>} to {any} label "ea4c1d75c7d0d4ee589a59cc88870f11" # Block bogon IPv4 networks from Loopback
# block in log quick on lo0 inet6 from {<bogonsv6>} to {any} label "509540f44cde74df1d28e2bc76b0a691" # Block bogon IPv6 networks from Loopback
# block in log quick on lo0 inet from {10.0.0.0/8,127.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16} to {any} label "9d59048c2ca76128e62ef15066bef954" # Block private networks from Loopback
# block in log quick on lo0 inet6 from {fc00::/7} to {any} label "e0abd0daa005c9bd545c57004e7c1603" # Block private networks from Loopback
block in log quick on pppoe0 inet from {<bogons>} to {any} label "b7cd97a164650b538506fb551a0369e7" # Block bogon IPv4 networks from WAN
block in log quick on pppoe0 inet6 from {<bogonsv6>} to {any} label "f140a48ddade668b9d6f5259669a1d5c" # Block bogon IPv6 networks from WAN
block in log quick on pppoe0 inet from {10.0.0.0/8,127.0.0.0/8,100.64.0.0/10,172.16.0.0/12,192.168.0.0/16} to {any} label "1eb94a38e58994641aff378c21d5984f" # Block private networks from WAN
block in log quick on pppoe0 inet6 from {fc00::/7} to {any} label "45afd72424c84d011c07957569151480" # Block private networks from WAN
pass in log quick on vtnet0 proto udp from {any} port {68} to {255.255.255.255} port {67} label "5168be2cca1e130b1ef2ac18161356a8" # allow access to DHCP server
pass in log quick on vtnet0 proto udp from {any} port {68} to {(self)} port {67} label "0b032d1bab91fc97e4a7faf03a7f17c3" # allow access to DHCP server
pass out log quick on vtnet0 proto udp from {(self)} port {67} to {any} port {68} label "5039e43005a9aa50eb032af274cc9aad" # allow access to DHCP server
pass in quick on lo0 from {any} to {any} no state label "7535c94082e72e2207679aadb26afd92" # pass loopback
pass out log from {any} to {any} keep state allow-opts label "fae559338f65e11c53669fc3642c93c2" # let out anything from firewall host itself
pass in log quick on vtnet0 proto tcp from {any} to {(self)} port {22 80} keep state label "535fb49265487de284cbc79f8048a934" # anti-lockout rule
# [prio: 100000]
pass out log route-to ( pppoe0 37.48.225.106 ) from {(pppoe0)} to {!(pppoe0:network)} keep state allow-opts label "ad8aa2ca822be4bf97efdf4f9d29e4ef" # let out anything from firewall host itself (force gw)
# [prio: 200000]
pass in quick on vtnet0 inet proto {tcp udp} from {any} to {10.0.0.5} port {80} keep state label "69751f0b7ea43b37113249b002f06557" # : Gitea HTTP
pass in quick on pppoe0 reply-to ( pppoe0 37.48.225.106 ) inet proto {tcp udp} from {any} to {10.0.0.5} port {80} keep state label "69751f0b7ea43b37113249b002f06557" # : Gitea HTTP
pass in log quick on vtnet0 inet proto {tcp udp} from {any} to {10.0.0.5} port {443} keep state label "f7db2aee39de7b375553e8a063ee393e" # : Gitea HTTPS
pass in log quick on pppoe0 reply-to ( pppoe0 37.48.225.106 ) inet proto {tcp udp} from {any} to {10.0.0.5} port {443} keep state label "f7db2aee39de7b375553e8a063ee393e" # : Gitea HTTPS
pass in quick on vtnet0 inet proto {tcp udp} from {any} to {10.0.0.5} port {2222} keep state label "554f6ac599b0d36ebc062960ff146ae6" # : Gitea SSH
pass in quick on pppoe0 reply-to ( pppoe0 37.48.225.106 ) inet proto {tcp udp} from {any} to {10.0.0.5} port {2222} keep state label "554f6ac599b0d36ebc062960ff146ae6" # : Gitea SSH
# [prio: 400000]
pass in quick on vtnet0 inet from {(vtnet0:network)} to {any} label "c9fb95f8275a8a73ce4a68190ed7bc51" # : Default allow LAN to any rule
pass in quick on vtnet0 inet6 from {(vtnet0:network),fe80::/10} to {any} label "e0d7d87c02c29ff98108738507811fec" # : Default allow LAN IPv6 to any rule

There's nothing sensitive in there, so that's a complete and unmodified dump.
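As an aside, several of the rdr/nat lines in the dump are repeated verbatim (for instance the lo0 pair for Gitea HTTP). A minimal sketch for listing such repeats, assuming the dump has been pasted into a string; `repeated_rules` and `sample` are illustrative names, not part of OPNsense or pf:

```python
from collections import Counter


def repeated_rules(dump: str) -> dict[str, int]:
    """Return each non-comment rule line that occurs more than once, with its count."""
    lines = [ln.strip() for ln in dump.splitlines()]
    # Skip blank lines and lines that are entirely commented out.
    rules = [ln for ln in lines if ln and not ln.startswith("#")]
    counts = Counter(rules)
    return {rule: n for rule, n in counts.items() if n > 1}


# Three lines copied from the dump above; the lo0 rule appears twice.
sample = """\
rdr on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
rdr on pppoe0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
rdr on lo0 inet proto {tcp udp} from {any} to {(pppoe0)} port {80} -> 10.0.0.5 port 80 # Gitea HTTP
"""

for rule, n in repeated_rules(sample).items():
    print(f"{n}x {rule}")
```

Whether the repeats matter to the reflection problem is not established here; the sketch only makes them easy to spot.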

Packet capturing

Moving on, I ran the following command on the host machine being forwarded to:

sudo tcpdump --interface br0 host 10.0.0.5 and tcp port 443

Note, my actual command had some greps in there too, to filter out external traffic (as the system is in active use). Once I had eliminated those packets, this is what I saw:

Packets appearing 20 seconds after the browser request was made
09:47:08.536944 IP proxy.lan.https > router.lan.47454: Flags [S.], seq 535307004, ack 3368579723, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
09:47:08.538434 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, length 0
09:47:08.539802 IP proxy.lan.https > router.lan.47454: Flags [P.], seq 1:4277, ack 518, win 501, length 4276
09:47:08.558316 IP proxy.lan.https > router.lan.47454: Flags [P.], seq 2921:4277, ack 518, win 501, length 1356
09:47:08.758501 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:08.765854 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:09.068122 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:09.179834 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:09.670436 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:10.011672 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:10.871285 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:11.675968 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:13.277001 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:15.131833 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:18.081920 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
Packets appearing 26 seconds after the browser request was made
09:47:18.546144 IP proxy.lan.https > router.lan.47454: Flags [F.], seq 4277, ack 518, win 501, length 0
09:47:21.788105 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:27.685338 IP proxy.lan.https > router.lan.60843: Flags [S.], seq 2400529997, ack 2834759019, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
09:47:27.686830 IP proxy.lan.https > router.lan.60843: Flags [.], ack 518, win 501, length 0
09:47:27.688031 IP proxy.lan.https > router.lan.60843: Flags [P.], seq 1:4278, ack 518, win 501, length 4277
09:47:27.705814 IP proxy.lan.https > router.lan.60843: Flags [P.], seq 2921:4278, ack 518, win 501, length 1357
09:47:27.900193 IP proxy.lan.https > router.lan.60843: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:27.914337 IP proxy.lan.https > router.lan.60843: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:28.207856 IP proxy.lan.https > router.lan.60843: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:28.348124 IP proxy.lan.https > router.lan.60843: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:28.810009 IP proxy.lan.https > router.lan.60843: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:29.179928 IP proxy.lan.https > router.lan.60843: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:30.011401 IP proxy.lan.https > router.lan.60843: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:30.844163 IP proxy.lan.https > router.lan.60843: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:32.415738 IP proxy.lan.https > router.lan.60843: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
Packets appearing a little while after
09:47:34.331691 IP proxy.lan.https > router.lan.60843: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:37.228509 IP proxy.lan.https > router.lan.60843: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:37.695840 IP proxy.lan.https > router.lan.60843: Flags [F.], seq 4278, ack 518, win 501, length 0
09:47:40.987916 IP proxy.lan.https > router.lan.60843: Flags [.], seq 1:1461, ack 518, win 501, length 1460

Those last few lines appeared while I was writing this up, so I ran it again:

Packets appearing 8 seconds after the browser request was made
09:52:24.161612 IP proxy.lan.https > router.lan.12369: Flags [S.], seq 2387431960, ack 3664031273, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
09:52:24.162716 IP proxy.lan.https > router.lan.12369: Flags [.], ack 518, win 501, length 0
09:52:24.164085 IP proxy.lan.https > router.lan.12369: Flags [P.], seq 1:4278, ack 518, win 501, length 4277
09:52:24.181872 IP proxy.lan.https > router.lan.12369: Flags [P.], seq 2921:4278, ack 518, win 501, length 1357
09:52:24.378410 IP proxy.lan.https > router.lan.12369: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:24.390240 IP proxy.lan.https > router.lan.12369: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:24.687185 IP proxy.lan.https > router.lan.12369: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:24.827533 IP proxy.lan.https > router.lan.12369: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:25.300890 IP proxy.lan.https > router.lan.12369: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:25.659922 IP proxy.lan.https > router.lan.12369: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:26.503470 IP proxy.lan.https > router.lan.12369: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:27.324108 IP proxy.lan.https > router.lan.12369: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:28.907809 IP proxy.lan.https > router.lan.12369: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
Packets appearing 28 seconds after the browser request was made
09:52:30.780134 IP proxy.lan.https > router.lan.12369: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:33.716320 IP proxy.lan.https > router.lan.12369: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:34.167349 IP proxy.lan.https > router.lan.12369: Flags [F.], seq 4278, ack 518, win 501, length 0
09:52:37.436187 IP proxy.lan.https > router.lan.12369: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:43.318333 IP proxy.lan.https > router.lan.30761: Flags [S.], seq 139268357, ack 1391950047, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
09:52:43.320258 IP proxy.lan.https > router.lan.30761: Flags [.], ack 518, win 501, length 0
09:52:43.321432 IP proxy.lan.https > router.lan.30761: Flags [P.], seq 1:4278, ack 518, win 501, length 4277
09:52:43.337967 IP proxy.lan.https > router.lan.30761: Flags [P.], seq 2921:4278, ack 518, win 501, length 1357
Packets appearing 47 seconds after the browser request was made
09:52:43.533784 IP proxy.lan.https > router.lan.30761: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:43.546252 IP proxy.lan.https > router.lan.30761: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:43.842000 IP proxy.lan.https > router.lan.30761: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:43.963530 IP proxy.lan.https > router.lan.30761: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:44.444922 IP proxy.lan.https > router.lan.30761: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:44.795630 IP proxy.lan.https > router.lan.30761: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:45.647497 IP proxy.lan.https > router.lan.30761: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:46.459907 IP proxy.lan.https > router.lan.30761: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:48.051752 IP proxy.lan.https > router.lan.30761: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:49.980120 IP proxy.lan.https > router.lan.30761: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:52:52.858188 IP proxy.lan.https > router.lan.30761: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:52:53.323900 IP proxy.lan.https > router.lan.30761: Flags [F.], seq 4278, ack 518, win 501, length 0
09:52:56.636167 IP proxy.lan.https > router.lan.30761: Flags [.], seq 1:1461, ack 518, win 501, length 1460

Running it again several times, it seems that the most common pattern is for packets to appear at around 20, 28, and 48 seconds, but it does vary.

So to answer the specific question, it seems that the OPNsense system is the one seen by 10.0.0.5, as that's router.lan. I don't see anything from 10.0.0.40 at all. I'm not sure what the packets above are for, but they do seem related as none appear unless I've made a request.
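
For what it's worth, the repeated `seq 1:1461` lines are the same TCP segment being sent again by proxy.lan, which is what TCP does when it has not received an acknowledgement for that data. A minimal sketch for picking such repeats out of a capture, assuming the default one-line tcpdump output shown above; `retransmitted_segments` and `SEGMENT` are illustrative names:

```python
import re
from collections import Counter

# Matches data segments such as "IP a > b: Flags [.], seq 1:1461, ...".
# Lines that carry a single sequence number (e.g. the SYN-ACK) or no seq field are ignored.
SEGMENT = re.compile(r"IP (\S+) > (\S+): Flags \[[^\]]*\], seq (\d+):(\d+)")


def retransmitted_segments(capture: str) -> dict[tuple[str, str, int, int], int]:
    """Count identical (src, dst, seq_start, seq_end) segments seen more than once."""
    seen: Counter = Counter()
    for line in capture.splitlines():
        m = SEGMENT.search(line)
        if m:
            key = (m.group(1), m.group(2), int(m.group(3)), int(m.group(4)))
            seen[key] += 1
    return {key: n for key, n in seen.items() if n > 1}


# Lines copied from the first capture above.
sample = """\
09:47:08.539802 IP proxy.lan.https > router.lan.47454: Flags [P.], seq 1:4277, ack 518, win 501, length 4276
09:47:08.758501 IP proxy.lan.https > router.lan.47454: Flags [.], ack 518, win 501, options [nop,nop,sack 1 {1:518}], length 0
09:47:08.765854 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:09.179834 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
09:47:10.011672 IP proxy.lan.https > router.lan.47454: Flags [.], seq 1:1461, ack 518, win 501, length 1460
"""

for (src, dst, start, end), n in retransmitted_segments(sample).items():
    print(f"{src} -> {dst} seq {start}:{end} sent {n} times")
```

The sketch only matches exact repeats of a sequence range, so overlapping ranges such as `1:4277` and `2921:4277` are not counted as retransmissions of each other.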

Going back to the firewall live logs, it seems that both the OPNsense system (router.lan on 10.0.0.1) and my PC (on 10.0.0.40) are attempting to reach 10.0.0.5, but only the OPNsense system gets through. That makes zero sense, as I can access 10.0.0.5 directly just fine, for instance by putting the IP in my hosts file.

Does any of that help? Is there anything else I can provide?

@danwilliams
Author

Just a quick addendum - after I realised the image available for download is not the latest version, I ran an update and repeated my setup and checks (i.e. I rolled back my ZFS image to before the port forwarding setup, updated the system, and then added the rules afresh).

The behaviour did not change - i.e. the problem is still extant.

Current version

image

@kub3let

kub3let commented Jul 5, 2023

I'm having the same or a similar issue; I installed a fresh OPNsense from the latest ISO yesterday.

Setting up port forwarding with an associated rule has no effect.

The rule exists on the rules tab but it has no effect?!

Setting "Filter rule association" to "Pass" instead of "Add associated filter rule" makes it work.

Replicating the rule that OPNsense automatically creates as a floating rule also works.

I never had to do this before and on my main OPNsense it's still working with the automatically created rules.

@danwilliams
Author

danwilliams commented Jul 5, 2023

@kub3let that's very interesting. I edited my rules and changed the "Filter rule association" to "Pass" as you suggest, but although that removed the floating rules, it did not result in any change of behaviour for me. Is there anything else that is different about your setup?

You mention that replicating the automatic OPNsense-generated floating rule also works, so I tried adding one. Does this look right?

Screenshot of manually-generated floating rule

image

You can see one of them (HTTPS) in the list below, with the automatically-generated rules:

Screenshot of floating rules list with two automatic and one manual rule

image

Unfortunately, this still didn't make any difference.

I tried fiddling around a bit, but with no joy. I'm not sure whether the floating rules are at fault, or the NAT ones.

@mimugmail
Member

Without a packet capture it's just a guessing game :(

@danwilliams
Author

@mimugmail I gave you two separate sets of packet captures. Did I not provide enough information? Please let me know how to get what you need, if so.

@kub3let

kub3let commented Jul 5, 2023

@danwilliams I'm not using NAT reflection, but I think the issue could still be the same. I prefer split DNS over reflection.

So all I did was a generic OPNsense install, configure DHCP etc., then set up port forwarding, e.g.:
Firewall -> NAT -> Port Forwarding -> Add
Interface: WAN
Protocol: TCP
Destination: WAN Address
Destination Port: 2222
Redirect target IP: testvm (alias of 10.123.0.100)
Redirect target port: 22

This results in the following which previously worked:
image
image

But the Rule has no effect...

If I create a floating rule like this it starts working:
image

--> I cannot select any interface on the floating rule or it stops working; even selecting all available interfaces does not work! <-- Really weird bug, it seems like the interface selection is broken!

@danwilliams please remove the interface selection on your floating rule. But I think using pass on the port forwarding should still have bypassed it for you.

@AdSchellevis
Member

@danwilliams

@mimugmail I gave you two separate sets of packet captures. Did I not provide enough information? Please let me know how to get what you need, if so.

In order to provide feedback more easily, I would like to suggest a couple of things, also to limit the noise in the thread:

  1. replace the "WAN IP" with the actual IP,
  2. use a single telnet to do the "port knock" so you have a limited set of packets which relate to the flow
  3. tcpdump without hostnames for easier validation (-n) from the target you are trying to reach (e.g. if 10.0.0.40 tries to access 8.8.8.8 which reflects to 10.0.0.5, look at 10.0.0.5 first)

The questions you are trying to answer are basically:

  1. is the request arriving at the target?
  2. is it using the correct source address (the firewall's)?
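
Both questions can be checked mechanically against `tcpdump -n` output taken on the target. A minimal sketch, assuming IPv4 and the default one-line tcpdump format; `peers_of` is an illustrative helper and the sample lines are made up for the example, not taken from a real capture:

```python
import re

# With -n, tcpdump prints endpoints as "a.b.c.d.port".
PACKET = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):")


def peers_of(target: str, capture: str) -> set[str]:
    """Return the IP addresses that exchange packets with `target` in the capture."""
    peers = set()
    for line in capture.splitlines():
        m = PACKET.search(line)
        if not m:
            continue
        src, dst = m.group(1), m.group(3)
        if src == target:
            peers.add(dst)
        elif dst == target:
            peers.add(src)
    return peers


# Illustrative lines only: the target 10.0.0.5 talking to the firewall at 10.0.0.1.
sample = """\
12:00:00.000001 IP 10.0.0.1.40000 > 10.0.0.5.80: Flags [S], seq 100, win 65535, length 0
12:00:00.000002 IP 10.0.0.5.80 > 10.0.0.1.40000: Flags [S.], seq 200, ack 101, win 65160, length 0
"""

print(peers_of("10.0.0.5", sample))
```

An empty result would mean the request never reached the target. If the reflected traffic is source-NATed as intended, the set should contain the firewall's address rather than the client's (10.0.0.40 in the example given earlier).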

Oh, and by the way, to use reflection, don't select the LAN interface if you're waiting for an automatic (reflection) rule there; I think I saw LAN+WAN in one of your screenshots.

(you can replace the actual IP with anything else in what you post, as long as it's consistent)

@AdSchellevis AdSchellevis added the support Community support label Jul 5, 2023
@danwilliams
Author

@kub3let I'm not sure you're talking about the same thing as me, after all?

I'm not using NAT reflection, but I think the issue could still be the same. I prefer split DNS over reflection.

As noted in my original bug report, port forwarding is working fine, including with split DNS. I.e. external clients can connect without issue. My problem is that internal NAT reflection is not working.

Therefore I'm afraid your problem must be different to mine.

@danwilliams
Author

@AdSchellevis Thanks for your reply, I've given that a go...

  1. replace the "WAN IP" with the actual IP,
  2. use a single telnet to do the "port knock" so you have a limited set of packets which relate to the flow
  3. tcpdump without hostnames for easier validation (-n) from the target you are trying to reach (e.g. if 10.0.0.40 tries to access 8.8.8.8 which reflects to 10.0.0.5, look at 10.0.0.5 first)

Happy to do that. Note, I had already tried the actual WAN IP in place of "WAN address", which made no difference, but I understand if that makes testing easier for you. So I have changed it again, in the same way as I tried on Tuesday. I've actually tried a few things...

Note: I tried all these tests purely with HTTP over port 80, to not obscure anything with encryption.

1. Interface: WAN+LAN, Destination: WAN address

  • This is how things are set up in the latest configuration detailed in my original bug report
  • The interface of WAN+LAN is due to several comments suggesting reflection would not work otherwise
First test: telnet knock
telnet 1.2.3.4 80
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:50:24.832948 IP 10.0.0.5.80 > 10.0.0.1.25765: Flags [S.], seq 2756379437, ack 3133918894, win 65160, options [mss 1460,sackOK,TS val 462937176 ecr 1299734193,nop,wscale 7], length 0
1 packet captured
1 packet received by filter
0 packets dropped by kernel
Second test: telnet message
telnet 1.2.3.4 80
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
hello
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:54:25.937784 IP 10.0.0.5.80 > 10.0.0.1.61549: Flags [S.], seq 2707536489, ack 53393574, win 65160, options [mss 1460,sackOK,TS val 463178281 ecr 1299975301,nop,wscale 7], length 0
11:54:27.784961 IP 10.0.0.5.80 > 10.0.0.1.61549: Flags [F.], seq 208, ack 8, win 510, options [nop,nop,TS val 463180128 ecr 1299977148], length 0
11:54:27.784961 IP 10.0.0.5.80 > 10.0.0.1.61549: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463180128 ecr 1299977148], length 207: HTTP: HTTP/1.1 400 Bad request
11:54:27.802034 IP 10.0.0.5.80 > 10.0.0.1.61549: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463180147 ecr 1299977151], length 207: HTTP: HTTP/1.1 400 Bad request
11:54:28.011586 IP 10.0.0.5.80 > 10.0.0.1.61549: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463180355 ecr 1299977151], length 207: HTTP: HTTP/1.1 400 Bad request
11:54:28.443708 IP 10.0.0.5.80 > 10.0.0.1.61549: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463180787 ecr 1299977151], length 207: HTTP: HTTP/1.1 400 Bad request
11:54:29.275828 IP 10.0.0.5.80 > 10.0.0.1.61549: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463181619 ecr 1299977151], length 207: HTTP: HTTP/1.1 400 Bad request
7 packets captured
7 packets received by filter
0 packets dropped by kernel
Third test: wget
wget http://1.2.3.4
--2023-07-06 12:55:52--  http://1.2.3.4/
Connecting to 1.2.3.4:80... connected.
HTTP request sent, awaiting response...
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:55:51.857660 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [S.], seq 4245683333, ack 3502492172, win 65160, options [mss 1460,sackOK,TS val 463264201 ecr 1300061221,nop,wscale 7], length 0
11:55:51.859854 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463264205 ecr 1300061225], length 234: HTTP: HTTP/1.1 200 OK
11:55:52.068110 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463264411 ecr 1300061225], length 234: HTTP: HTTP/1.1 200 OK
11:55:52.079357 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [.], ack 130, win 509, options [nop,nop,TS val 463264424 ecr 1300061445,nop,nop,sack 1 {1:130}], length 0
11:55:52.275977 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463264619 ecr 1300061445], length 234: HTTP: HTTP/1.1 200 OK
11:55:52.299380 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [.], ack 130, win 509, options [nop,nop,TS val 463264644 ecr 1300061665,nop,nop,sack 1 {1:130}], length 0
11:55:52.700095 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463265043 ecr 1300061665], length 234: HTTP: HTTP/1.1 200 OK
11:55:52.729379 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [.], ack 130, win 509, options [nop,nop,TS val 463265074 ecr 1300062095,nop,nop,sack 1 {1:130}], length 0
11:55:53.532238 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463265875 ecr 1300062095], length 234: HTTP: HTTP/1.1 200 OK
11:55:53.579190 IP 10.0.0.5.80 > 10.0.0.1.46722: Flags [.], ack 130, win 509, options [nop,nop,TS val 463265924 ecr 1300062945,nop,nop,sack 1 {1:130}], length 0
10 packets captured
10 packets received by filter
0 packets dropped by kernel

2. Interface: WAN only, Destination: WAN address

  • This is how I had thought things should be set up by reading documentation
First test: telnet knock
telnet 1.2.3.4 80
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:01:27.319029 IP 10.0.0.5.80 > 10.0.0.1.8138: Flags [S.], seq 1915928963, ack 2325731977, win 65160, options [mss 1460,sackOK,TS val 463599662 ecr 1300396687,nop,wscale 7], length 0
1 packet captured
1 packet received by filter
0 packets dropped by kernel
Second test: telnet message
telnet 1.2.3.4 80
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
hello
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:02:23.803817 IP 10.0.0.5.80 > 10.0.0.1.42086: Flags [S.], seq 2635180141, ack 1556976516, win 65160, options [mss 1460,sackOK,TS val 463656147 ecr 1300453172,nop,wscale 7], length 0
12:02:25.284888 IP 10.0.0.5.80 > 10.0.0.1.42086: Flags [F.], seq 208, ack 8, win 510, options [nop,nop,TS val 463657628 ecr 1300454653], length 0
12:02:25.284889 IP 10.0.0.5.80 > 10.0.0.1.42086: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463657628 ecr 1300454653], length 207: HTTP: HTTP/1.1 400 Bad request
12:02:25.301920 IP 10.0.0.5.80 > 10.0.0.1.42086: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463657647 ecr 1300454657], length 207: HTTP: HTTP/1.1 400 Bad request
12:02:25.515574 IP 10.0.0.5.80 > 10.0.0.1.42086: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463657859 ecr 1300454657], length 207: HTTP: HTTP/1.1 400 Bad request
12:02:25.951483 IP 10.0.0.5.80 > 10.0.0.1.42086: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463658295 ecr 1300454657], length 207: HTTP: HTTP/1.1 400 Bad request
12:02:26.780328 IP 10.0.0.5.80 > 10.0.0.1.42086: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463659123 ecr 1300454657], length 207: HTTP: HTTP/1.1 400 Bad request
7 packets captured
7 packets received by filter
0 packets dropped by kernel
Third test: wget
wget http://1.2.3.4
--2023-07-06 13:03:07--  http://1.2.3.4/
Connecting to 1.2.3.4... connected.
HTTP request sent, awaiting response...
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:03:07.256634 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [S.], seq 2326959776, ack 3724833973, win 65160, options [mss 1460,sackOK,TS val 463699600 ecr 1300496626,nop,wscale 7], length 0
12:03:07.258590 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463699604 ecr 1300496629], length 234: HTTP: HTTP/1.1 200 OK
12:03:07.467915 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463699811 ecr 1300496629], length 234: HTTP: HTTP/1.1 200 OK
12:03:07.474221 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [.], ack 130, win 509, options [nop,nop,TS val 463699819 ecr 1300496845,nop,nop,sack 1 {1:130}], length 0
12:03:07.675976 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463700019 ecr 1300496845], length 234: HTTP: HTTP/1.1 200 OK
12:03:07.694254 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [.], ack 130, win 509, options [nop,nop,TS val 463700039 ecr 1300497065,nop,nop,sack 1 {1:130}], length 0
12:03:08.092024 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463700435 ecr 1300497065], length 234: HTTP: HTTP/1.1 200 OK
12:03:08.124178 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [.], ack 130, win 509, options [nop,nop,TS val 463700469 ecr 1300497495,nop,nop,sack 1 {1:130}], length 0
12:03:08.923655 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463701267 ecr 1300497495], length 234: HTTP: HTTP/1.1 200 OK
12:03:09.014235 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [.], ack 130, win 509, options [nop,nop,TS val 463701359 ecr 1300498385,nop,nop,sack 1 {1:130}], length 0
12:03:10.587844 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 463702931 ecr 1300498385], length 234: HTTP: HTTP/1.1 200 OK
12:03:10.775502 IP 10.0.0.5.80 > 10.0.0.1.38149: Flags [.], ack 130, win 509, options [nop,nop,TS val 463703119 ecr 1300500145,nop,nop,sack 1 {1:130}], length 0
^C
12 packets captured
12 packets received by filter
0 packets dropped by kernel

3. Interface: WAN only, Destination: Single host or network 1.2.3.4/32

  • This is what I tried based on your suggestion
First test: telnet knock
telnet 1.2.3.4 80
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:06:00.801909 IP 10.0.0.5.80 > 10.0.0.1.18975: Flags [S.], seq 3241085094, ack 178002942, win 65160, options [mss 1460,sackOK,TS val 463873145 ecr 1300670173,nop,wscale 7], length 0
1 packet captured
1 packet received by filter
0 packets dropped by kernel
Second test: telnet message
telnet 1.2.3.4 80
Trying 1.2.3.4...
Connected to 1.2.3.4.
Escape character is '^]'.
hello
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:06:41.694656 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [S.], seq 3619393819, ack 1454353810, win 65160, options [mss 1460,sackOK,TS val 463914038 ecr 1300711066,nop,wscale 7], length 0
12:06:43.284804 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [P.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463915628 ecr 1300712656], length 207: HTTP: HTTP/1.1 400 Bad request
12:06:43.284805 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [F.], seq 208, ack 8, win 510, options [nop,nop,TS val 463915628 ecr 1300712656], length 0
12:06:43.302251 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [F.], seq 208, ack 8, win 510, options [nop,nop,TS val 463915647 ecr 1300712656], length 0
12:06:43.502766 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [.], ack 8, win 510, options [nop,nop,TS val 463915846 ecr 1300712875,nop,nop,sack 1 {1:8}], length 0
12:06:43.509918 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [FP.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463915855 ecr 1300712875], length 207: HTTP: HTTP/1.1 400 Bad request
12:06:43.723261 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [.], ack 8, win 510, options [nop,nop,TS val 463916066 ecr 1300713095,nop,nop,sack 1 {1:8}], length 0
12:06:43.931805 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [FP.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463916275 ecr 1300713095], length 207: HTTP: HTTP/1.1 400 Bad request
12:06:44.153096 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [.], ack 8, win 510, options [nop,nop,TS val 463916496 ecr 1300713525,nop,nop,sack 1 {1:8}], length 0
12:06:44.763598 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [FP.], seq 1:208, ack 8, win 510, options [nop,nop,TS val 463917107 ecr 1300713525], length 207: HTTP: HTTP/1.1 400 Bad request
12:06:45.013168 IP 10.0.0.5.80 > 10.0.0.1.50053: Flags [.], ack 8, win 510, options [nop,nop,TS val 463917357 ecr 1300714385,nop,nop,sack 1 {1:8}], length 0
11 packets captured
11 packets received by filter
0 packets dropped by kernel
Third test: wget
wget http://1.2.3.4
--2023-07-06 13:08:30--  http://1.2.3.4/
Connecting to 1.2.3.4:80... connected.
HTTP request sent, awaiting response...
sudo tcpdump --interface br0 -n host 10.0.0.5 and tcp port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:08:30.498369 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [S.], seq 485313575, ack 1774340012, win 65160, options [mss 1460,sackOK,TS val 464022842 ecr 1300819872,nop,wscale 7], length 0
12:08:30.500765 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 464022846 ecr 1300819875], length 234: HTTP: HTTP/1.1 200 OK
12:08:30.707813 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 464023051 ecr 1300819875], length 234: HTTP: HTTP/1.1 200 OK
12:08:30.719873 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [.], ack 130, win 509, options [nop,nop,TS val 464023065 ecr 1300820095,nop,nop,sack 1 {1:130}], length 0
12:08:30.915617 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 464023259 ecr 1300820095], length 234: HTTP: HTTP/1.1 200 OK
12:08:30.940214 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [.], ack 130, win 509, options [nop,nop,TS val 464023285 ecr 1300820315,nop,nop,sack 1 {1:130}], length 0
12:08:31.355895 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 464023699 ecr 1300820315], length 234: HTTP: HTTP/1.1 200 OK
12:08:31.370023 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [.], ack 130, win 509, options [nop,nop,TS val 464023715 ecr 1300820745,nop,nop,sack 1 {1:130}], length 0
12:08:32.187784 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [P.], seq 1:235, ack 130, win 509, options [nop,nop,TS val 464024531 ecr 1300820745], length 234: HTTP: HTTP/1.1 200 OK
12:08:32.290020 IP 10.0.0.5.80 > 10.0.0.1.5193: Flags [.], ack 130, win 509, options [nop,nop,TS val 464024635 ecr 1300821665,nop,nop,sack 1 {1:130}], length 0
12:08:32.313791 IP 10.0.0.5.80 > 10.0.0.1.58970: Flags [FP.], seq 3064338454:3064338688, ack 4207137339, win 509, options [nop,nop,TS val 464024659 ecr 1300815883], length 234: HTTP: HTTP/1.1 200 OK
^C
11 packets captured
11 packets received by filter

Explanation of tests

  • The first test in each configuration is a simple port knock using telnet, as you suggested.
  • The second test is typing hello into telnet, to see what this does.
  • The third test uses wget to send a proper HTTP request.

Thoughts on results

  • The tests are clearly establishing a connection through to 10.0.0.5, and indeed the webserver is registering the requests and answering them. This is as expected, as we have already seen that the incoming routing rules appear correct. After all, external clients are just fine. So the port knock confirms this without additional noise.
  • The second test generates an HTTP error as expected, but nothing ever comes back through to telnet. This confirms that there is no outbound traffic permitted, which matches previous observations and indeed the original problem description.
  • The third test is for the sake of completeness, showing that a genuine HTTP request is responded to in the manner expected, but that no reply ever surfaces.

So to consider the questions you raised:

  1. is the request arriving at the target

Yes - as shown previously, packets are making their way from the requesting device to the responding device.

  2. is it using the correct source address (the firewall's)

I don't really know how to answer this - can you tell from what I have provided? It appears that the source is the firewall's, but I don't know if that's what we are expecting here. If it is, then yes, that is happening.

The problem therefore remains that the packets are not returning back out, which is in keeping with the observation regarding the rules created (i.e. that the rules created are no different when specifying reflection than they are when not enabling it - which always struck me as incorrect, but I could be wrong there).

Oh, and by the way, to use reflection, don't select the LAN interface if you're waiting for an automatic (reflection) rule there; I think I saw LAN+WAN in one of your screenshots.

As explained in my original bug report, I originally tried just WAN as per documentation and guides, and when this didn't work I added LAN (so LAN+WAN) based on comments that suggested it would not work otherwise. Both setups and related attempts are described and corresponding details supplied, including configuration. However, using WAN only has not changed the behaviour, and neither has specifying the precise WAN IP instead of just "WAN address".

I remain of the opinion that there are some rules that are needed for reflection, which are somehow not getting created... what do you expect to be put in place for this? I.e. can I check something specific that should be happening for the responses to get delivered back to my internal device?

@AdSchellevis
Member

My community support time is limited (I don't have time to read everything), but if 10.0.0.5 is the target address and 10.0.0.1 is the firewall's, where is the request? I would try to capture the traffic on the target (10.0.0.5) to figure out where the request is coming from, as the telnet sample seems to show an answer to a request we don't see.

@danwilliams
Author

@AdSchellevis the tcpdump command was run on the 10.0.0.5 system, i.e. the target, so what I have provided is the packet captures for the target. Telnet is not answering anything - it is the instigator of the request. The command I have run is set to filter by host, so should capture traffic going to and from that IP. If I change it to filter by dst then I see nothing.

As the interface is a bridge, there is a possibility this is interfering with the packet capture, so I will set up a port forward to a machine that is using a straightforward non-bridge interface and repeat the packet capture.

@AdSchellevis
Member

If you can prevent other machines from sending traffic to port 80 on 10.0.0.5, it might help to capture everything on port 80. 10.0.0.5.80 looks like the answer to a request we're not seeing (src port 80).
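
A minimal sketch (not from the thread) of how one might tally packet directions in a saved `tcpdump -n` text capture, to confirm whether only replies from the server port are visible. The function name `count_directions` and the regular expression are illustrative assumptions; the sample line is taken from the capture above with its options field omitted.

```python
import re

# Matches the "IP src.port > dst.port:" part of a tcpdump -n text line (IPv4 only).
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def count_directions(lines, server_ip, server_port):
    """Count packets going to and coming from server_ip:server_port."""
    counts = {"to_server": 0, "from_server": 0}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip banner lines, "^C", packet counters, etc.
        src_ip, src_port, dst_ip, dst_port = m.groups()
        if (dst_ip, int(dst_port)) == (server_ip, server_port):
            counts["to_server"] += 1
        elif (src_ip, int(src_port)) == (server_ip, server_port):
            counts["from_server"] += 1
    return counts


# Sample line from the capture above (options field omitted for brevity).
sample = [
    "12:06:00.801909 IP 10.0.0.5.80 > 10.0.0.1.18975: Flags [S.], "
    "seq 3241085094, ack 178002942, win 65160, length 0",
]
print(count_directions(sample, "10.0.0.5", 80))
```

A count of zero under `to_server` would match the observation that the capture shows answers without the corresponding requests.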

@danwilliams
Author

There are no other machines sending traffic to port 80 on 10.0.0.5, as HTTPS is the commonly-used protocol and HTTP simply issues a redirect. Hence the test is the only thing using it. However, I cannot open up the filter more on that machine, as there are many IP addresses. The command should already capture bi-directional traffic, so I'm setting up a test on a different machine for comparison.

@danwilliams
Author

@AdSchellevis Okay, so this is what I have done:

  1. I have configured SSH to listen to port 9999 as well as 22 on a different machine.
  2. I have set up a new port forward, on port 9999, passing in from WAN to the internal 10.0.0.20 address of that machine.
  3. I have tested this from a remote server and verified that it works and I can indeed SSH in using port 9999.
  4. I then tested it from a local machine, running the same command, going to the WAN IP. The expectation was that this would fail. But it did not! It worked!

That was rather unexpected.

  1. I tried it again using the hostname, which resolved to the external IP. This also worked.
  2. Next, I tried amending the NAT forwarding rule to use "WAN address" instead of the specific 1.2.3.4 address. This still worked.

So, at this point I am thinking, what is the difference?

There are two differences that I can see:

  1. My new test is on a non-bridging interface. I don't think that specifically matters. It was mainly just to get more traffic resolution.
  2. The original test was on the same physical machine as the firewall. I did not think that would cause any problem - but it seems maybe it does?

Some more information:

The physical server that OPNsense is on runs both KVM VMs and also Docker containers. It has several network interfaces. OPNsense runs in a KVM VM, and is set up to "own" one passed-through PCI NIC, for the WAN, and sit on the bridge for the internal network traffic. (As already described, but good to recap.)

One of the Docker containers runs HAProxy. This presents an open port on the same bridged interface as OPNsense has access to. HAProxy then channels everything out to the wider network - including a KVM VM sat on a host-only bridge (virbr0). (I have been considering trying to move this setup to OPNsense's HAProxy plugin once everything else is working.)

So the full traffic route is: Client -> WAN IP 1.2.3.4 -> PPPoE -> OPNsense gateway IP 10.0.0.1 on br0 -> HAProxy 10.0.0.5 on br0 -> Gitea VM IP 192.168.100.120 on virbr0

Is it possible that this is the cause of the problem? If so, do you have any idea why? I don't immediately see a reason why traffic would not be able to easily flow between the two. There are no other similar issues - everything else has been talking to each other as expected. Perhaps there's something about the OPNsense rules that would cause it? Or is it more likely to be a network matter outside of OPNsense?

To clarify the devices:

PHYS NIC 1: bridge br0:

  • 10.0.0.1 - OPNsense LAN address
  • 10.0.0.5 - HAProxy
  • (there are other IPs assigned here too)

PHYS NIC 2: PCI pass-through

  • WAN address - OPNsense

My original thinking was that sharing the bridge was a good thing, as virtio speeds can be taken advantage of. But if it causes a conflict, I can change that.

Are there any known constraints that would prevent OPNsense from sharing a bridge with other services on the host, on different IPs?

Based on this, the next thing I will try will be to change over to a different physical NIC - I might assign a third NIC as passthrough for sole use by OPNsense, for the internal LAN network. Perhaps it's better for OPNsense to totally own its cards. (Is this a known issue or recommendation?)

Thanks for the help on this!

@levelad

levelad commented Jul 6, 2023

I had the same problem with NAT reflection, but I gave up and used split DNS instead.

Could a PPPoE WAN connection be the common denominator to this problem? I also couldn't get it working with a PPPoE connection.

@danwilliams
Author

@levelad what's interesting is that if I forward to a port on another machine on the LAN, it works. But if I try to forward to a port on the same host machine, it doesn't. This is a recent discovery in trying to diagnose this.

Is/was your setup also on the same host machine?

@levelad

levelad commented Jul 6, 2023

@danwilliams I didn't test another machine or another port. But the same setup was working before with an old firewall (Sophos UTM aka Astaro Security Gateway).

@AdSchellevis
Member

@danwilliams to be honest, I don't know what your issue is; I just don't think it is directly related to the firewall. My recent local test using hardware worked without issues as well.

@Monviech
Member

Monviech commented Jul 7, 2023

Maybe this helps. I do all hairpin NAT configurations manually on OPNsense (manual outbound NAT rule generation activated). There are two common scenarios:

  1. Client and Server are in different broadcast domains, and the Opnsense routes between them.
  • For Hairpin NAT to work you need a DNAT rule, and a Firewall Rule which allows that traffic. The Firewall rule should be auto generated in Firewall - Rules - Floating.

  • In this example, all packets into interface lagg0_vlan44 from the source ip 172.16.100.4 with the destination 1.2.3.4 get translated to destination ip 10.0.0.4 (vlan45).

image

  2. Client and Server are in the same broadcast domain, or even the same device.
  • For Hairpin NAT to work you need a DNAT rule AND a SNAT rule. The SNAT rule makes the firewall answer with its own IP address instead, so asymmetric traffic is avoided.

  • In this example, the host 10.0.0.4 in vlan45 tries to reach its OWN public IP address 1.2.3.4. The DNAT rule rewrites the destination from 1.2.3.4 to 10.0.0.4. But for this to work you need to send the reply from the firewall IP back to 10.0.0.4. That's why there's an SNAT rule that catches packets in vlan45 with the source IP 10.0.0.4 and destination IP 10.0.0.4, and changes the source to the vlan45 address (for example 10.0.0.254).

image
image
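
The second scenario can be illustrated with a toy packet model. This is only a sketch under stated assumptions, not OPNsense's actual rule processing: the `Packet` class, the `hairpin` function and the client address 10.0.0.20 are hypothetical, while 1.2.3.4, 10.0.0.1 and 10.0.0.5 are the addresses used earlier in this thread. It shows why a same-subnet client rejects the reply when only DNAT is applied, and accepts it when SNAT is added.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


WAN_IP = "1.2.3.4"         # public address used in the thread
FIREWALL_LAN = "10.0.0.1"  # firewall's LAN address
SERVER = "10.0.0.5"        # port-forward target
CLIENT = "10.0.0.20"       # hypothetical client on the same LAN


def hairpin(request: Packet, snat: bool) -> bool:
    """Return True if the client would accept the server's reply."""
    # DNAT on the firewall: public destination -> internal server.
    fwd = replace(request, dst=SERVER)
    if snat:
        # SNAT: the server now sees the firewall as the source.
        fwd = replace(fwd, src=FIREWALL_LAN)
    # The server answers whoever appears as the source.
    reply = Packet(src=SERVER, dst=fwd.src)
    if reply.dst == FIREWALL_LAN:
        # The reply passes back through the firewall, which undoes
        # both translations using its state table.
        reply = Packet(src=WAN_IP, dst=request.src)
    # The client only accepts a reply from the address it contacted.
    return reply.src == request.dst and reply.dst == request.src


request = Packet(src=CLIENT, dst=WAN_IP)
print("DNAT only:  ", hairpin(request, snat=False))
print("DNAT + SNAT:", hairpin(request, snat=True))
```

With DNAT only, the server replies directly to the client from 10.0.0.5, an address the client never contacted, so the connection cannot complete. With SNAT added, the reply goes back via the firewall and arrives from 1.2.3.4 as expected.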

@levelad

levelad commented Jul 7, 2023

@AdSchellevis
@Monviech
Just out of curiosity is your WAN configured as PPPoE?

@danwilliams
Author

@AdSchellevis okay, so this is quite interesting.

I spent some time over the weekend trying various different configurations. Long story short:

  • If OPNsense is set up to share a bridged interface with KVM, then NAT reflection doesn't work
  • If it is assigned its own physical NIC, NAT reflection works just fine

Important things to consider in context:

  • All other functionality appears to work fine when sharing a bridged interface - i.e. normal routing, DNS, all services, and indeed port forwarding from external to internal. It's just the NAT reflection that doesn't work, so internal to internal does not reply to the port forward.
  • I have not tested providing an interface alone through KVM using virtio, although I can take some time and do that if it would be useful. I just went straight from shared interface to passthrough PCI device. Let me know if you'd like me to test a dedicated virtio NIC.
  • I have not tried a bare-metal install as there would be no point, as there would be nothing to share an interface with.

So in summary, it seems that there is some issue with NAT reflection under the specific circumstances listed, which perhaps should lead to an advisory in the documentation if it's by design, otherwise perhaps it helps narrow the possibilities to look at if attempting a fix. I don't know exactly which bit is the problem - shared interface, the interface being a bridge, or the use of KVM (although I doubt it's KVM-related).

@Monviech I read your message with great interest, as it certainly sounds like it could be relevant. Unfortunately, although I spent some time trying what you detailed, I was not able to get anywhere. I think I just don't know enough about NAT/DNAT/SNAT, and therefore was likely doing the wrong thing. As assigning a dedicated passthrough interface worked, and solved my immediate problem, I had limited time for further testing (as I've already spent days on this!). So you could very well be correct, but maybe it's one that @AdSchellevis and co. can pick up and understand and put into OPNsense as a fix (e.g. if additional rules need to be set up automatically) or as a documentation update (say if those rules are only needed under certain circumstances and should not be automatic). I might have another try at some point if I get time.

@levelad I don't think the issue is specifically related to PPPoE, as (from what I have observed) this issue occurs on the LAN interface and does not appear to be affected by the WAN interface. The port forwarding works from external to internal and back, and appears to work for internal to internal in WAN interface context, but then the packets don't manage to come back the other way. So internal requests work, but not responses. I.e. WAN -> LAN works, WAN <- LAN works; LAN -> (WAN) LAN works, but LAN <- (WAN) LAN doesn't work. All this is only a problem when the LAN interface is shared with KVM, i.e. a bridge on the host.

TL;DR

NAT reflection does not work out of the box when sharing a network interface that is a bridge on the host, when using KVM virtio. It is unclear whether the issue is the sharing of the interface, the fact that it is a bridge, or the use of KVM. Advice is therefore to assign a dedicated passthrough PCI device to avoid this issue entirely.

@danwilliams
Author

...interestingly, this other ticket might be describing the same problem, as it sounds very similar and mentions SNAT and DNAT. I had found and read it before posting my original bug report (and referred to it discreetly) but with @Monviech's additional info I am now wondering if it is actually more closely related than I had thought. But I don't understand the outcome or conclusion of that ticket, or why exactly it was closed (which appears to be contentious).

#5941

@Monviech
Member

Monviech commented Jul 11, 2023

What I don't understand is why an interface type would change how NAT works. NAT - SNAT (Source Network Address Translation), DNAT (Destination Network Address Translation), PAT (Port Address Translation) - rewrites source and destination IP addresses at OSI Layer 3 (and/or ports at Layer 4) based on NAT rules. You can change the packets however you want with these rules.

A virtual interface connected to a virtual hypervisor switch/bridge operates at OSI Layers 1 and 2; it shouldn't change anything about how IP works at OSI Layer 3.

There certainly could be problems with the automatic rule generation in OPNsense not being able to fit all traffic cases as shown above. I know that these manual rules work though: I've tested them on DEC hardware, on VMware with PCIe passthrough, VMware with vSwitch, and Hyper-V with vSwitch. I can't test it on KVM though.

@danwilliams
Author

@fichtner Agreed, I think OPNsense can do better. If it's a situation which is outside of the remit of OPNsense to resolve for some reason (e.g. as speculated in one of my previous messages, if it's a situation that can be identified but for which you do not want to add rules) then there should at least be some identification made, and notes added to the documentation, so that people know what to do in such circumstances. This would fall under my option (b) (which covers your proposed option (c) as well).

I don't think it's a case of commercial vs community support - or at least, it shouldn't be. If the focus of the community edition is providing the best tool possible, then clearly there is a chance here to improve the docs at the very least. Commercial support is not generally about situations like this, which could (and it seems do) occur for a number of people under common circumstances - in my experience it tends to be more about very specific enterprise situations or where help is needed for something that community members would do themselves. KVM, bridged interfaces, and for that matter PPPoE are all very common and likely to be used by community members, hence I think this is a community-context problem to address. Just my opinion, of course.

In summary, you could spend essentially zero time on this by adding a warning to the docs that NAT reflection may not work if sharing a bridged interface under KVM, and to use a dedicated passthrough card instead. I would be happy with that, and would equally be happy to spend some time helping to narrow the focus if desired.

@AdSchellevis
Member

... In summary, you could spend essentially zero time on this by adding a warning to the docs that NAT reflection may not work if sharing a bridged interface under KVM, and to use a dedicated passthrough card instead....

To be honest, I don't think that would help people very much as it's highly likely there are also people using KVM that do not have this issue. We can try to describe all possible surroundings, but without being precise, it often doesn't help much.
Without understanding the issue at hand, it will be nearly impossible to point people into a direction in my experience.

I don't think it's a case of commercial vs community support -....

From a product perspective it's not a community vs commercial question (both of our products are equal in that regard), but from a support perspective, in my humble opinion, it is. We just cannot spend an endless amount of time on issues that highly likely lie outside of the scope of OPNsense. Realistically we have already spent way more time on this ticket than is reasonable, but given the time you put into it, it felt good to try to help you analyze your issue to see if we could isolate something that either warranted a fix in the code or documentation.

When we do run into issues like these while doing commercial support, we often try to update the documentation or record it in a ticket if it helps others, unfortunately for you we haven't seen this one.

I remember an issue (with kvm) a long time ago where it discarded traffic for some reason (opnsense/src#85), sometimes these things tend to be version specific as well in our experience (either on the hypervisor or the drivers they need in the guest).

@Monviech
Member

@danwilliams

I've checked KVM libvirt. It has 3 different operating modes. Please tell me which mode your libvirt interface is configured as. I want to try and analyze the issue in my free time, to see if it is indeed a KVM bridge problem or something else.

<forward mode='bridge'>
<forward mode='nat'>
<forward mode='route'>

@levelad

levelad commented Jul 13, 2023

I'm not a network engineer but isn't a virtual machine always behind a virtual bridge as shown in this figure:

https://www.researchgate.net/figure/The-architecture-of-KVM-full-virtualized-network-I-O_fig1_269326980

In my case it was a virtual Windows 10 on a Windows Server 2022 Standard host machine.

Hyper-V also has 3 Virtual Switch types:

  • External
  • Internal
  • Private

Probably similar to the KVM ones.

Edit: Only with the External type in Hyper-V the switch is bound to the NIC. That's also the type in use.

@Monviech
Member

Monviech commented Jul 13, 2023

I'm not a network engineer but isn't a virtual machine always behind a virtual bridge as shown in this figure:

https://www.researchgate.net/figure/The-architecture-of-KVM-full-virtualized-network-I-O_fig1_269326980

In my case it was a virtual Windows 10 on a Windows Server 2022 Standard host machine.

Hyper-V also has 3 Virtual Switch types:

* External

* Internal

* Private

Probably similar to the KVM ones.

Yes, you are right, but QEMU seems to have more options. On Ubuntu 22.04, where I have just installed KVM, I have 3 different options to define a network in /etc/libvirt/qemu/networks/. They are:

EDIT:

  1. NAT
  2. route
  3. Open vSwitch

I think I'll try the Open vSwitch for the Opnsense to test it.

@Monviech
Member

Monviech commented Jul 13, 2023

I made a test kvm hypervisor with Open vSwitch

Network Configuration:

:/etc/libvirt/qemu/networks$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy
:/etc/libvirt/qemu/networks$ sudo kvm --version
QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.11)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
:/etc/libvirt/qemu/networks$ sudo ovs-vsctl show
42054917-45dd-4ed4-aa93-44b3c94bbb78
    ovs_version: "2.17.7"
:/etc/libvirt/qemu/networks$ sudo virsh --version
8.0.0
:/etc/libvirt/qemu/networks$ sudo nano br0.xml 
<network>
  <name>br0</name>
  <uuid>1106d833-19c8-4eb8-bb43-c241838eca22</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
  <virtualport type='openvswitch'/>
</network>
:/etc/libvirt/qemu/networks$ sudo nano br1.xml 
<network>
  <name>br1</name>
  <uuid>a6eafd21-cd84-4849-bfb0-f0d20288bf03</uuid>
  <forward mode='bridge'/>
  <bridge name='br1'/>
  <virtualport type='openvswitch'/>
</network>
:/etc/libvirt/qemu/networks$ sudo ovs-vsctl show
42054917-45dd-4ed4-aa93-44b3c94bbb78
    Bridge br1
        Port vnet3
            Interface vnet3
        Port br1
            Interface br1
                type: internal
    Bridge br0
        Port br0
            Interface br0
                type: internal
        Port vnet2
            Interface vnet2
    ovs_version: "2.17.7"
13: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether e2:c7:4d:f4:75:47 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::e0c7:4dff:fef4:7547/64 scope link 
       valid_lft forever preferred_lft forever
14: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 7a:9b:20:52:ef:4d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::789b:20ff:fe52:ef4d/64 scope link 
       valid_lft forever preferred_lft forever
17: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
    link/ether fe:54:00:5f:d5:aa brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe5f:d5aa/64 scope link 
       valid_lft forever preferred_lft forever
18: vnet3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UNKNOWN group default qlen 1000
    link/ether fe:54:00:0b:83:2d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe0b:832d/64 scope link 
       valid_lft forever preferred_lft forever

Opnsense Configuration

image
image
image
image
image

EDITED:
image
EDIT END

As the test client I use the hypervisor, with the IP address 192.168.1.2 on br0.

The test is pinging 1.2.3.4 and checking with tcpdump if the rules are matched accordingly for NAT Reflection.

host ping

:/etc/libvirt/qemu/networks$ ping 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.
64 bytes from 1.2.3.4: icmp_seq=1 ttl=63 time=0.602 ms
^C
--- 1.2.3.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.602/0.602/0.602/0.000 ms

opnsense tcpdump
image

host tcpdump

:~$ sudo tcpdump -i br0 -n
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:54:22.936178 IP 192.168.1.2 > 1.2.3.4: ICMP echo request, id 11, seq 1, length 64
20:54:22.936613 IP 1.2.3.4 > 192.168.1.2: ICMP echo request, id 11, seq 1, length 64
20:54:22.936659 IP 192.168.1.2 > 1.2.3.4: ICMP echo reply, id 11, seq 1, length 64
20:54:22.936749 IP 1.2.3.4 > 192.168.1.2: ICMP echo reply, id 11, seq 1, length 64

ANALYSIS:

NAT reflection works; I can't find the bug.

@levelad

levelad commented Jul 13, 2023

@Monviech thanks for the effort, but that is again a static IPv4 WAN. What @danwilliams (?) and I are proposing is that it doesn't work when the WAN is PPPoE and the target is a virtual machine (virtual bridge/virtual switch).

@Monviech
Member

Monviech commented Jul 14, 2023

Alright, see this as an addition to the test above:

sudo apt-get install pppoe pppoeconf

# /etc/ppp/pppoe-server-options

debug
name "MeinISP"
lcp-echo-interval 10
lcp-echo-failure 2
mru 1492
netmask 255.255.255.0
defaultroute
ms-dns 8.8.8.8
ms-dns 8.8.4.4
ipcp-accept-local
ipcp-accept-remote
noipdefault

#/etc/ppp/chap-secrets
# Secrets for authentication using CHAP
# client        server  secret                  IP addresses
"user1"         "*"     "password1"             "*"

sudo pppoe-server -I br1 -L 1.2.3.1 -R 1.2.3.4 -F /etc/ppp/pppoe-server-options

image
image

12:05:18.719462 PPPoE PADI [Host-Uniq 0x807511A400F8FFFF] [Service-Name]
12:05:18.719512 PPPoE PADO [AC-Name "pc01"] [Service-Name] [AC-Cookie 0xF03CC67783BE603B1DE1915EBB638C2AAE6E0000] [Host-Uniq 0x807511A400F8FFFF]
12:05:18.719590 PPPoE PADR [Host-Uniq 0x807511A400F8FFFF] [AC-Cookie 0xF03CC67783BE603B1DE1915EBB638C2AAE6E0000] [AC-Name "pc01-akiru"] [Service-Name]
12:05:18.719843 PPPoE PADS [ses 0x2] [Service-Name] [Host-Uniq 0x807511A400F8FFFF]
12:05:18.720059 PPPoE  [ses 0x2] LCP, Conf-Request (0x01), id 1, length 18
12:05:19.725805 PPPoE  [ses 0x2] LCP, Conf-Request (0x01), id 1, length 20
12:05:19.726213 PPPoE  [ses 0x2] LCP, Conf-Ack (0x02), id 1, length 20
12:05:20.792190 PPPoE  [ses 0x2] LCP, Conf-Request (0x01), id 2, length 18
12:05:20.792316 PPPoE  [ses 0x2] LCP, Conf-Reject (0x04), id 2, length 8
12:05:20.792479 PPPoE  [ses 0x2] LCP, Conf-Request (0x01), id 3, length 16
12:05:20.792537 PPPoE  [ses 0x2] LCP, Conf-Ack (0x02), id 3, length 16
12:05:20.792721 PPPoE  [ses 0x2] LCP, Echo-Request (0x09), id 0, length 10
12:05:20.792737 PPPoE  [ses 0x2] EAP 
        0x0000:  0198 0009 014e 616d 65
12:05:20.792871 PPPoE  [ses 0x2] LCP, Echo-Reply (0x0a), id 0, length 10
12:05:20.792956 PPPoE  [ses 0x2] EAP 
        0x0000:  0298 000a 0175 7365 7231
12:05:20.793040 PPPoE  [ses 0x2] EAP 
        0x0000:  0199 0020 0413 b135 d448 7d1c 4142 4773
        0x0010:  c26d 0eba 932e 2ac0 5d4d 6569 6e49 5350
12:05:20.793236 PPPoE  [ses 0x2] EAP 
        0x0000:  0299 001b 0410 51da ce37 a6a7 ea44 504c
        0x0010:  b8fe 66b1 05bb 7573 6572 31
12:05:20.793306 PPPoE  [ses 0x2] EAP 
        0x0000:  039a 0004
12:05:20.793539 PPPoE  [ses 0x2] CCP, Conf-Request (0x01), id 1, length 17
12:05:20.793568 PPPoE  [ses 0x2] IPCP, Conf-Request (0x01), id 1, length 18
12:05:20.793574 PPPoE  [ses 0x2] IP6CP, Conf-Request (0x01), id 1, length 16
12:05:20.793650 PPPoE  [ses 0x2] IPCP, Conf-Request (0x01), id 1, length 30
12:05:20.793721 PPPoE  [ses 0x2] IPCP, Conf-Nack (0x03), id 1, length 24
12:05:20.794014 PPPoE  [ses 0x2] LCP, Prot-Reject (0x08), id 1, length 23
12:05:20.794135 PPPoE  [ses 0x2] IPCP, Conf-Ack (0x02), id 1, length 18
12:05:20.794191 PPPoE  [ses 0x2] LCP, Prot-Reject (0x08), id 2, length 22
12:05:20.794352 PPPoE  [ses 0x2] IPCP, Conf-Request (0x01), id 2, length 30
12:05:20.794421 PPPoE  [ses 0x2] IPCP, Conf-Ack (0x02), id 2, length 30
12:05:21.058185 PPPoE  [ses 0x2] IP 1.2.3.5.30011 > a.root-servers.net.domain: 1115% [1au] NS? . (28)
12:05:26.118350 PPPoE  [ses 0x2] IP 1.2.3.5.38595 > dns.google.domain: 7521+ A? 0.opnsense.pool.ntp.org. (41)
12:05:26.456439 PPPoE  [ses 0x2] IP 1.2.3.5.34060 > a.root-servers.net.domain: 47482% [1au] NS? . (28)
12:05:26.456739 PPPoE  [ses 0x2] IP 1.2.3.5.35735 > d.root-servers.net.domain: 1362% [1au] NS? . (28)
12:05:26.865287 PPPoE  [ses 0x2] IP 1.2.3.5.4712 > d.root-servers.net.domain: 26393% [1au] NS? . (28)

The PPPoE session was established between the hypervisor on br1 and OPNsense, which received a dynamic IPv4 address.

VM Ubuntu 22.04 LTS started in KVM, network interface attached to the "shared" Bridge br0. It received the IP address 192.168.1.10 from the opnsense DHCP Server on br0.

Configuration Opnsense

The dynamic PPPoE WAN address 1.2.3.5 is DNATed to 192.168.1.10, which is the Ubuntu VM.

image

SNAT rule: traffic from source 192.168.1.10 to destination 192.168.1.10 has its source rewritten to the OPNsense IP 192.168.1.1 on br0, so that replies go back through OPNsense.

image

EDITED:
This floating firewall policy allows traffic from LAN and WAN to the destination 192.168.1.10. NAT rules match before firewall rules.

image
EDIT END

Ping test from the Ubuntu VM 192.168.1.10 to the IP 1.2.3.5. The request gets redirected back to the Ubuntu VM and it gets answered.

(Top KVM is the ubuntu vm 192.168.1.10, bottom is the opnsense vm)
image

RESULT:

NAT Reflection still works, even if you use PPPoE and have the Opnsense and VM in KVM connected to the same bridge.

@vpx23

vpx23 commented Jul 14, 2023

@Monviech I'm confused why your floating firewall rule has the "WAN address" as target and not the redirect target IP (192.168.1.10) from the port forward.

@Monviech
Member

Monviech commented Jul 14, 2023

@vpx23 EDIT:
You are right. NAT rules match before Firewall rules. I should check this again. I think I made a mistake. https://docs.opnsense.org/manual/firewall.html

EDIT EDIT:
I've tested what kind of rule the opnsense auto generates if you create a NAT rule with 2 interfaces. The linked floating rule looks like this:

image

image

Thanks for pointing out the mistake. I will improve the pppoe post.
It was working because the "Default allow LAN to any" rule matched.

EDIT EDIT EDIT:
I've updated both posts explaining the KVM setup to show the floating rule going to the internal IP, not the external IP.

@levelad

levelad commented Jul 14, 2023

@Monviech could you please change the source of the outbound NAT to "LAN net" and ping the public IP from a 3rd device, e.g. a client in the same subnet?

Then we'd have a more real world 3 party setup instead of only 2 parties, thanks.

@Monviech
Member

I have done enough. Anybody can replicate the setup I did above and mess around with the rules themselves. From all the different setups I tested, I'm pretty sure there is no bug. If I continue now, there will always be a "but what about this scenario" and things will never end.

@danwilliams
Author

@AdSchellevis

To be honest, I don't think that would help people very much as it's highly likely there are also people using KVM that do not have this issue. We can try to describe all possible surroundings, but without being precise, it often doesn't help much.
Without understanding the issue at hand, it will be nearly impossible to point people into a direction in my experience.

The purpose of what I am currently trying to do is to resolve the cause precisely, but I cannot do that alone.

We just can not spend an endless amount of time on issues that highly likely lie outside of the scope of OPNsense.

I appreciate that - and your time on this.

I remember an issue (with kvm) a long time ago where it discarded traffic for some reason (opnsense/src#85), sometimes these things tend to be version specific as well in our experience (either on the hypervisor or the drivers they need in the guest).

That's quite interesting - although in this case there are no special drivers in the guest, so it's whatever is in the kernel. But yes, the scope can be tricky.

@Monviech

I've checked KVM libvirt. It has 3 different operating modes.

Indeed it does. I apologise if my original report was not clear in this regard - the bridge created on the host (br0) is one bridge, but then I had also specified bridge mode for the virtual NIC in KVM. I did not choose NAT there for that type, as I needed it to appear on the network and have an IP address assigned.

@levelad

...isn't a virtual machine always behind a virtual bridge...

No - there are always a few options, naming varies slightly between hypervisors but all present the core three of bridge, NAT, and host-only, plus sometimes additional variations. In this situation I had specified bridge so that the virtual NIC could reach the Internet (bridge and NAT both do this, but not host-only) but also accept incoming traffic (bridge and host-only both do this, but not NAT, and host-only accepts it through an internal subnet IP whereas bridge gets one assigned on the LAN subnet). I am not 100% sure what "route" means in KVM land, as I've never used nor had cause to use it, so my "host-only" reference is a VMware term, but they are all comparable. Your Hyper-V options correspond to the VMware terminology, I believe (with "External" being comparable to "bridge" in KVM and VMware).

@Monviech

...qemu seems to have more options.

Not really - QEMU has the same basic options as all the others (my knowledge here spans KVM, VMware, VirtualBox, and XEN) but the naming can differ slightly.

I made a test kvm hypervisor with Open vSwitch

That's very interesting - I have looked over quickly, and the settings appear to align with what I had, but I will look in more detail when I get opportunity. My question here is, coming out of that test, what else would be useful for me to try on this end?

One difference I can see, though... It is interesting that you specified virtualport type='openvswitch' as I've not done that myself. It is not necessary in order to be configured as a bridge for the VM.

@levelad

...proposing is that it doesn't work on PPPoE WAN...

I don't necessarily disagree with you, but my suspicion is that it's the internal side of the routing - but I can't rule out the PPPoE being the cause, as I don't know how the reflection is handling that. So I have no idea, but I'm guessing it's not PPPoE.

@Monviech

see this as addition to the test above:

That's cool that you've set up a PPPoE test case. One question about it - I see you have added a SNAT rule, which appears under "Outbound". I am lost at that point - I did not have that, either manually or automatically. Are you saying this is a necessary step? Could that be the root of the problem?

(I've ignored the edit conversation as that happened before I read the updates, so I'm looking at the latest version here.)

@levelad

could you please change the source of the outbound NAT to "LAN net" and ping the public IP from a 3rd device, e.g. a client in the same subnet?

This is not a bad idea, but unnecessary for my particular case, as I was able to observe it with the parties @Monviech put in place.

@Monviech

I have done enough

Thank you - I appreciate your time on this. Even though this is no longer a problem for me, as I've solved it by moving to a dedicated passthrough PCI card, I believe we both share the goal of improving the situation for the community - plus I hate to have an unresolved issue!

I think you may have identified the cause, with your addition of a SNAT rule - as mentioned above. Could you confirm if this could in fact be the culprit?

@levelad

levelad commented Jul 15, 2023

I think you may have identified the cause, with your addition of a SNAT rule - as mentioned above. Could you confirm if this could in fact be the culprit?

NAT Reflection/NAT Loopback/Hairpin NAT is basically SNAT.

Here is a very good description: https://help.mikrotik.com/docs/display/ROS/NAT#NAT-HairpinNAT

@danwilliams
Author

@levelad how do you interpret the situation of those rules being added manually - do you also think that could be the cause? I.e. should OPNsense be adding them? I can't understand why @Monviech added them - if they are necessary, either OPNsense should be adding them, or there should be docs to tell us to? What do you think?

@levelad

levelad commented Jul 16, 2023

@danwilliams I think there is a bug in the automatic creation of the outbound SNAT rules for NAT reflection. I tried everything (global and local setting) but they were never created.

@Monviech
Member

Monviech commented Jul 17, 2023

I've checked the option:

Firewall: Settings: Advanced
Automatic outbound NAT for Reflection

And I was sure to have this enabled:

Hybrid outbound NAT rule generation
(automatically generated rules are applied after manual rules)

I created following DNAT rule:
image

I checked the resulting NAT and RDR rules with pfctl -s nat (I removed the automatic isakmp rules for better readability)

root@opn01:~ # pfctl -s nat
nat on pppoe1 inet from (hn4:network) to any -> (pppoe1:0) port 1024:65535
rdr on pppoe1 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on lo0 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on hn4 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on hn5 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on hn6 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on hn7 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on hn9 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333

Analysis:

There aren't any SNAT rules created by "Automatic outbound NAT for Reflection".

I explained in a prior post (#6650 (comment)) that there are two scenarios: NAT reflection with hosts in the same broadcast domain, and NAT reflection with hosts in different broadcast domains. In the same broadcast domain, the hosts can resolve each other via ARP and communicate directly, so reflected traffic is asymmetric: the request passes through the firewall, but the reply goes straight from the server to the client. That prevents protocols like HTTPS or SSH from working, because the client receives a TCP reply from an address it never contacted. The firewall therefore has to rewrite the source of reflected requests from the same broadcast domain to its own interface IP address.

If this isn't the way it's intended to work (like with policies not getting generated automatically), then it is a bug.

Result:

OPNsense doesn't automatically generate the SNAT rules needed for NAT reflection in the same broadcast domain. Protocols that need a symmetric traffic path (like TCP) won't work properly.

Workaround:

The SNAT rules have to be created manually.
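
The same-broadcast-domain problem described in the analysis above can be illustrated with a toy model. This is only a sketch, not OPNsense or pf code: the client address, the firewall LAN address, and the helper names (`reflect`, `server_reply`, `client_accepts`, `round_trip`) are assumptions made up for illustration. It models a client that only accepts a reply coming from the address it originally contacted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


# Illustrative addresses; everything except SERVER_IP is an assumption.
PUBLIC_IP = "1.2.3.5"              # WAN address the client connects to
SERVER_IP = "10.110.44.252"        # port-forward target
FIREWALL_LAN_IP = "10.110.44.254"  # firewall address on the LAN
CLIENT_IP = "10.110.44.10"         # hypothetical client on the same LAN


def reflect(request: Packet, snat: bool) -> Packet:
    """Apply the rdr (DNAT) rewrite and, optionally, the reflection SNAT rewrite."""
    forwarded = replace(request, dst=SERVER_IP)
    if snat:
        forwarded = replace(forwarded, src=FIREWALL_LAN_IP)
    return forwarded


def server_reply(forwarded: Packet) -> Packet:
    """The server answers whatever source address it saw."""
    return Packet(src=forwarded.dst, dst=forwarded.src)


def client_accepts(request: Packet, reply: Packet) -> bool:
    """A reply only matches the connection if it comes from the contacted address."""
    return reply.src == request.dst and reply.dst == request.src


def round_trip(snat: bool) -> bool:
    request = Packet(src=CLIENT_IP, dst=PUBLIC_IP)
    reply = server_reply(reflect(request, snat))
    if reply.dst == FIREWALL_LAN_IP:
        # The reply returns through the firewall, which reverses both translations.
        reply = Packet(src=PUBLIC_IP, dst=CLIENT_IP)
    return client_accepts(request, reply)


if __name__ == "__main__":
    print("without SNAT:", round_trip(snat=False))
    print("with SNAT:   ", round_trip(snat=True))
```

Without the SNAT rewrite the server replies directly to the client from its internal address, so the reply does not match the connection the client opened to the public address. With the rewrite, the reply goes back to the firewall first.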

@AdSchellevis
Member

@Monviech I would expect NAT rules as well if both reflection options are enabled; the NAT rules are almost the same in terms of logic:

// yield reflection nat rules when enabled, but only for interfaces with networks configured
if (empty($rule['disabled']) && !empty($rule['enablenatreflectionhelper'])) {
    $reflinterf[] = $interface;
    foreach ($reflinterf as $interf) {
        if (!empty($this->interfaceMapping[$interf])) {
            $is_ipv4 = $this->isIpV4($rule);
            if (
                ($is_ipv4 && !empty($this->interfaceMapping[$interf]['ifconfig']['ipv4'])) ||
                (!$is_ipv4 && !empty($this->interfaceMapping[$interf]['ifconfig']['ipv6']))
            ) {
                // we don't seem to know the ip protocol here, make sure our ruleset contains one
                $rule['ipprotocol'] = $is_ipv4 ? "inet" : "inet6";
                $rule['rule_type'] = "nat_refl";
                $rule['interface'] = $interf;
                $rule['staticnatport'] = !empty($rule['staticnatport']);
                yield $rule;
            }
        }
    }
}

You could try to grep the description ("Nat Refl..") from /tmp/rules.debug to see all related rules.
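
As a rough illustration of that filtering step, here is a minimal Python sketch that does the same job as a case-insensitive grep on the trailing description comment. It assumes each generated rule in /tmp/rules.debug carries its description after a `#`; `rules_with_description` is a hypothetical helper name, not part of OPNsense.

```python
from pathlib import Path


def rules_with_description(ruleset_text: str, description: str) -> list[str]:
    """Return rule lines whose trailing '# ...' comment contains the description."""
    matches = []
    for line in ruleset_text.splitlines():
        rule, sep, comment = line.partition("#")
        # Skip lines without a comment and pure comment lines (no rule text).
        if sep and rule.strip() and description.lower() in comment.lower():
            matches.append(line.strip())
    return matches


if __name__ == "__main__":
    path = Path("/tmp/rules.debug")
    if path.exists():
        for rule in rules_with_description(path.read_text(), "Nat Refl"):
            print(rule)
```

The search string should be whatever description was entered on the port forward rule; "Nat Refl" is only the example used here.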

@Monviech
Member

Monviech commented Jul 17, 2023

@AdSchellevis
I have checked the rules.debug, and I saw that the nat rules were there, but not in pfctl. So I just hit apply again and then they appeared. It was probably a user error on my side.

The rules look fine. I annotated the SNAT rule, and it's exactly what's needed.

root@opn01:/tmp # cat rules.debug | grep -i nat

# NAT Redirects
nat on pppoe1 inet from (hn4:network) to any -> (pppoe1:0) port 1024:65535 # Automatic outbound rule
rdr on pppoe1 inet proto tcp from {any} to {(pppoe1)} port {33333} -> 10.110.44.252 port 33333 # Nat Reflection Test PPPoE
nat on pppoe1 inet proto tcp from (pppoe1:network) to {10.110.44.252} port {33333} -> (pppoe1) port 1024:65535 # Nat Reflection Test PPPoE
rdr on hn4 inet proto tcp from {any} to {(pppoe1)} port {33333} -> 10.110.44.252 port 33333 # Nat Reflection Test PPPoE
nat on hn4 inet proto tcp from (hn4:network) to {10.110.44.252} port {33333} -> (hn4) port 1024:65535 # Nat Reflection Test

root@opn01:/tmp # pfctl -s nat

# Automatic Outbound NAT
nat on pppoe1 inet from (hn4:network) to any -> (pppoe1:0) port 1024:65535

# SNAT
# If a packet is received by interface hn4 with protocol TCP from the source ip hn4:network (10.110.44.0/24) to destination
# ip 10.110.44.252/32 and destination port 33333 -> rewrite the source ip to the interface ip hn4 (10.110.44.254/32) and the
# source port to the upper port range 1024:65535, with no static port and round robin.
nat on hn4 inet proto tcp from (hn4:network) to 10.110.44.252 port = 33333 -> (hn4) port 1024:65535 round-robin

# DNAT
rdr on pppoe1 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on hn4 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
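
The comparison made by eye above (does every reflected rdr rule on an internal interface have a matching nat rule loaded?) could be sketched in Python as follows. This is a hedged illustration only: the regular expressions are written against the `pfctl -s nat` output lines shown in this thread and may not cover other rule shapes, `missing_reflection_nat` is a hypothetical helper, and which interfaces count as internal (here `hn4`) must be supplied by the caller.

```python
import re

# Patterns modelled on the `pfctl -s nat` lines quoted in this thread.
RDR_RE = re.compile(r"^rdr(?: log)? on (\S+) .* -> (\S+) port (\d+)")
NAT_RE = re.compile(r"^nat on (\S+) .* to (\S+) port = (\d+) -> ")


def missing_reflection_nat(pfctl_output: str, internal_ifaces: set[str]) -> list[tuple[str, str, int]]:
    """List (interface, target, port) redirects on internal interfaces with no matching nat rule."""
    redirects = set()
    translations = set()
    for raw in pfctl_output.splitlines():
        line = raw.strip()
        if (m := RDR_RE.match(line)):
            redirects.add((m.group(1), m.group(2), int(m.group(3))))
        elif (m := NAT_RE.match(line)):
            translations.add((m.group(1), m.group(2), int(m.group(3))))
    return sorted(
        r for r in redirects
        if r[0] in internal_ifaces and r not in translations
    )


if __name__ == "__main__":
    loaded = """\
nat on pppoe1 inet from (hn4:network) to any -> (pppoe1:0) port 1024:65535
nat on hn4 inet proto tcp from (hn4:network) to 10.110.44.252 port = 33333 -> (hn4) port 1024:65535 round-robin
rdr on pppoe1 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
rdr on hn4 inet proto tcp from any to (pppoe1) port = 33333 -> 10.110.44.252 port 33333
"""
    print(missing_reflection_nat(loaded, {"hn4"}))
```

An empty result means every redirect on the listed interfaces has a corresponding source translation loaded. A non-empty result is the situation seen before the rules were re-applied.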

After all the tests I did, even testing special use cases explained above, there still doesn't seem to be a bug.
Maybe I could contribute sometime to the documentation of the NAT Reflection feature, to explain things like troubleshooting, policy creation, and testing of traffic scenarios.

But for me, I'm kinda done, can't continue cause there's nothing left to prove. I think it's proven that NAT Reflection works, except in some very specific weird edge cases that are out of scope for OPNsense.

@NunoHiggs

NunoHiggs commented Jul 27, 2023

I have been having the same issue that @AdSchellevis is reporting since upgrading to this version of OPNsense, and I am unable to get it to work with the workaround described; I am probably doing something wrong.

Could @Monviech provide some documentation or, even better, a practical example with some pictures, so I can understand what is happening and how to fix it?

I have been tracking a packet: it enters my firewall, gets forwarded to the VM, the VM generates a response and returns it, and the packet enters the firewall via the interface facing that VM, but it never reaches the outbound-facing interface (where it first entered the firewall).
I do have outbound NAT on that interface (the front-facing one).

Thanks so much.

@Monviech
Member

@NunoHiggs

I have written a Tutorial in the Opnsense Forum as result of this github thread:
https://forum.opnsense.org/index.php?topic=34925.0

@NunoHiggs

@NunoHiggs

I have written a Tutorial in the Opnsense Forum as result of this github thread: https://forum.opnsense.org/index.php?topic=34925.0

Hi @Monviech.

Thanks for the reply. I did exactly as stated on both my "prod" firewall and one I built just for testing this, and the issue is still occurring.

I can see the packets entering via the front interface with the port forward, arriving on the VM, returning to OPNsense, and then getting discarded instead of leaving via the interface they came from (ovpnc10).

I have both DNAT and SNAT set on the front interface depending on the flow, but it appears that OPNsense is dropping the TCP traffic.

These are my rules for this particular configuration:

nat on ovpnc10 inet from <DMZ_LAN> to any -> (ovpnc10:0) port 1024:65535
nat on ovpnc10 inet all -> (ovpnc10:0) port 1024:65535
nat on vtnet2 inet from (ovpnc10) to any -> (vtnet2:0) port 1024:65535
nat on ovpnc10 inet all -> (ovpnc10:0) port 1024:65535
rdr log on ovpnc10 inet proto tcp from any to (ovpnc10) port = 23761 -> <VM1> port 10002 round-robin
rdr log on ovpnc10 inet proto udp from any to (ovpnc10) port = 23761 -> <VM1> port 10002 round-robin
rdr log on ovpnc10 inet proto tcp from any to (ovpnc10) port = 50304 -> <VM1> port 10003 round-robin
rdr log on ovpnc10 inet proto udp from any to (ovpnc10) port = 50304 -> <VM1> port 10003 round-robin

The VM that needs to communicate has its default gateway replaced in the rules with the gateway from ovpnc10, so all of its traffic is forced to leave via that interface.

Also, I have Reflection for port forwards enabled.

Do you think this warrants a new bug report? This started happening when I went from 23.1.8 to 21.1.11.

@Monviech
Member

Does ovpnc interface mean it's OpenVPN? Maybe it's related to this issue?
#6662

@NunoHiggs

NunoHiggs commented Jul 27, 2023

Hi @Monviech
Yes, it's OpenVPN, though not exactly the same configuration as his, so it might be the same issue.
When this was spotted, I noticed that I have a similar issue with another connection: WireGuard (so no OpenVPN here).

HTTP request NATed IP --> internal IP (sinkhole) --> port forward to transparent Squid --> WireGuard --> destination.

I am getting many RSTs on the port forward part, and the request never reaches the entrance of the WireGuard tunnel. The corruption appears to be happening in the NAT part of this configuration.

@NunoHiggs

NunoHiggs commented Jul 28, 2023

@Monviech just solved my issue. Setting the OpenVPN configuration like this was enough to get reflection working again.

image

@doktornotor
Contributor

You might want to review this: #7022 (comment)

@OPNsense-bot

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository,
please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue,
just let us know, so we can reopen the issue and assign an owner to it.

@OPNsense-bot closed this as not planned on Dec 31, 2023
@OPNsense-bot added the "help wanted" (Contributor missing / timeout) label on Dec 31, 2023