Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interfaces: dhclient-script regression in 19.1 #3197

Closed
tbandixen opened this issue Feb 6, 2019 · 92 comments
Closed

interfaces: dhclient-script regression in 19.1 #3197

tbandixen opened this issue Feb 6, 2019 · 92 comments
Assignees
Labels
bug Production bug
Milestone

Comments

@tbandixen
Copy link
Contributor

tbandixen commented Feb 6, 2019

Just want to be sure the dev are aware of "our" issue.

After 10-20 minutes of uptime all incoming connections are beeing dropped!
So, OpenVPN tunnels are dropped too. It was fine on 18.7.10.

In the General-system log I can see this every 20 minutes:

Feb 5 13:00:55	opnsense: /usr/local/etc/rc.newwanip: Dynamic DNS: (Success) No change in IP address
Feb 5 13:00:55	opnsense: /usr/local/etc/rc.newwanip: Dynamic DNS: updating cache file /var/cache/dyndns.org_0.cache: ***
Feb 5 13:00:50	opnsense: /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.
Feb 5 13:00:50	opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns2'

Feb 5 13:00:49	kernel: ovpns2: link state changed to UP
Feb 5 13:00:44	kernel: ovpns2: link state changed to DOWN
Feb 5 13:00:44	opnsense: /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN.

Feb 5 13:00:40	opnsense: /usr/local/etc/rc.newwanip: ROUTING: skipping IPv6 default route
Feb 5 13:00:40	opnsense: /usr/local/etc/rc.newwanip: ROUTING: keeping current default gateway '***'
Feb 5 13:00:40	opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to ***
Feb 5 13:00:40	opnsense: /usr/local/etc/rc.newwanip: ROUTING: IPv4 default gateway set to wan
Feb 5 13:00:40	opnsense: /usr/local/etc/rc.newwanip: ROUTING: IPv6 default gateway set to wan
Feb 5 13:00:40	opnsense: /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'wan'
Feb 5 13:00:39	opnsense: /usr/local/etc/rc.newwanip: On (IP address: ***) (interface: WAN[wan]) (real interface: re1_vlan10).
Feb 5 13:00:39	opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 're1_vlan10'

after every execution of rc.newwanip (even manually) all connection are dropped.

I watch TV over the internet and every 20 minutes the stream hangs, so I have to rewind to build up the stream again.

Here is the forum post:
https://forum.opnsense.org/index.php?topic=11456.0

Is there any solution? What could it be?
According to github, the last changes to rc.newwanip where 5 month ago (https://github.com/opnsense/core/commits/master/src/etc/rc.newwanip).
I think it has something to do with the switch to HardenedBSD, but I am absolutly not a unix guy...

@tbandixen
Copy link
Contributor Author

OPNsense 19.1.1-amd64
FreeBSD 11.2-RELEASE-p8-HBSD
OpenSSL 1.0.2q 20 Nov 2018

My OPNsense is running on a apu1d4 from PC Engines for some year without any issue.

@AdSchellevis
Copy link
Member

could be driver (realtek), other connectivity issues. or when ips is enabled related to https://github.com/opnsense/core/issues/3175

@AdSchellevis AdSchellevis added the support Community support label Feb 6, 2019
@tbandixen
Copy link
Contributor Author

IPS is not enabled, ipv6 is also disabled.
Did the realtek drivers change in 19.1.1?

The newwanip script is called every 20 minutes. after every run my connections are being droped.
Is something new in the dhcp? I'm just guessing

@AdSchellevis
Copy link
Member

newwanip is likely triggered due to another event, maybe dmesg has more info (or there's data before IP renewal is starting on 're1_vlan10') assuming that 19.1 has the same issue (different kernel than 18.7.x series)

@tbandixen
Copy link
Contributor Author

tbandixen commented Feb 6, 2019

The data before your mentioned line is the end of my mentioned line 😄


Feb 6 16:50:19 | opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 're1_vlan10'
-- | --
Feb 6 16:30:42 | opnsense: /usr/local/etc/rc.newwanip: Dynamic DNS: (Success) No change in IP address


how can i output dmesg? I have ssh access.

@AdSchellevis
Copy link
Member

just dmesg in a console

@tbandixen
Copy link
Contributor Author

I see a lot more of:

ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP

you need the whole output?

@AdSchellevis
Copy link
Member

cause and effect... ovpns2 is likely going down, because the interface it's bound to is gone...

so, yes, the logging provided so far doesn't tell a lot about your issue.

@tbandixen
Copy link
Contributor Author

tbandixen commented Feb 6, 2019

WAN connection is gone too, I cannot find anything in dmesg output.

Copyright (c) 2013-2018 The HardenedBSD Project.
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.2-RELEASE-p8-HBSD  31af16db12b(stable/19.1) amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
HardenedBSD: initialize and check features (__HardenedBSD_version 1100056 __FreeBSD_version 1102000).
CPU: AMD G-T40E Processor (1000.02-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x500f20  Family=0x14  Model=0x2  Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802209<SSE3,MON,SSSE3,CX16,POPCNT>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x35ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,IBS,SKINIT,WDT>
  SVM: NP,NRIP,NAsids=8
  TSC: P-state invariant, performance statistics
real memory  = 4815060992 (4592 MB)
avail memory = 4067450880 (3879 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <CORE   COREBOOT>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
ioapic0 <Version 2.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
Timecounter "TSC" frequency 1000019124 Hz quality 800
random: entropy device external interface
wlan: mac acl policy registered
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff8113fe30, 0) error 19
kbd0 at kbdmux0
nexus0
cryptosoft0: <software crypto> on motherboard
acpi0: <CORE COREBOOT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 450
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 4.0 on pci0
pci1: <ACPI PCI bus> on pcib1
re0: <Realtek PCIe GBE Family Controller> port 0x1000-0x10ff mem 0xf7a00000-0xf7a00fff,0xf7900000-0xf7903fff irq 16 at device 0.0 on pci1
re0: Using Memory Mapping!
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: version:1.95.00
re0: Ethernet address: 00:0d:b9:3f:92:14

This product is covered by one or more of the following patents:
US6,570,884, US6,115,776, and US6,327,625.
re0: Ethernet address: 00:0d:b9:3f:92:14
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 5.0 on pci0
pci2: <ACPI PCI bus> on pcib2
re1: <Realtek PCIe GBE Family Controller> port 0x2000-0x20ff mem 0xf7c00000-0xf7c00fff,0xf7b00000-0xf7b03fff irq 17 at device 0.0 on pci2
re1: Using Memory Mapping!
re1: Using 1 MSI-X message
re1: ASPM disabled
re1: version:1.95.00
re1: Ethernet address: 00:0d:b9:3f:92:15

This product is covered by one or more of the following patents:
US6,570,884, US6,115,776, and US6,327,625.
re1: Ethernet address: 00:0d:b9:3f:92:15
pcib3: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0
pci3: <ACPI PCI bus> on pcib3
re2: <Realtek PCIe GBE Family Controller> port 0x3000-0x30ff mem 0xf7e00000-0xf7e00fff,0xf7d00000-0xf7d03fff irq 18 at device 0.0 on pci3
re2: Using Memory Mapping!
re2: Using 1 MSI-X message
re2: ASPM disabled
re2: version:1.95.00
re2: Ethernet address: 00:0d:b9:3f:92:16

This product is covered by one or more of the following patents:
US6,570,884, US6,115,776, and US6,327,625.
re2: Ethernet address: 00:0d:b9:3f:92:16
ahci0: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0x4010-0x4017,0x4020-0x4023,0x4018-0x401f,0x4024-0x4027,0x4000-0x400f mem 0xf7f04000-0xf7f043ff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
ahci0: quirks=0x22000<ATI_PMP_BUG,1MSI>
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ohci0: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xf7f00000-0xf7f00fff irq 18 at device 18.0 on pci0
usbus0 on ohci0
usbus0: 12Mbps Full Speed USB v1.0
ehci0: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xf7f04400-0xf7f044ff irq 17 at device 18.2 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
ohci1: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xf7f01000-0xf7f01fff irq 18 at device 19.0 on pci0
usbus2 on ohci1
usbus2: 12Mbps Full Speed USB v1.0
ehci1: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xf7f04500-0xf7f045ff irq 17 at device 19.2 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci1
usbus3: 480Mbps High Speed USB v2.0
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib4: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci4: <ACPI PCI bus> on pcib4
ohci2: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xf7f02000-0xf7f02fff irq 18 at device 20.5 on pci0
usbus4 on ohci2
usbus4: 12Mbps Full Speed USB v1.0
pcib5: <ACPI PCI-PCI bridge> at device 21.0 on pci0
pci5: <ACPI PCI bus> on pcib5
ohci3: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xf7f03000-0xf7f03fff at device 22.0 on pci0
usbus5 on ohci3
usbus5: 12Mbps Full Speed USB v1.0
ehci2: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xf7f04600-0xf7f046ff at device 22.2 on pci0
usbus6: EHCI version 1.0
usbus6 on ehci2
usbus6: 480Mbps High Speed USB v2.0
acpi_button0: <Power Button> on acpi0
orm0: <ISA Option ROM> at iomem 0xee800-0xeffff on isa0
ppc0: cannot reserve I/O port range
uart0: <16550 or compatible> at port 0x3f8 irq 4 flags 0x10 on isa0
uart0: console (115200,n,8,1)
uart1: <16550 or compatible> at port 0x2f8 irq 3 on isa0
Timecounters tick every 1.000 msec
ugen6.1: <ATI EHCI root HUB> at usbus6
ugen5.1: <ATI OHCI root HUB> at usbus5
uhub0: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus6
uhub1: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus5
ugen2.1: <ATI OHCI root HUB> at usbus2
ugen4.1: <ATI OHCI root HUB> at usbus4
uhub2: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <ATI EHCI root HUB> at usbus3
uhub3: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
uhub4: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen0.1: <ATI OHCI root HUB> at usbus0
uhub5: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <ATI EHCI root HUB> at usbus1
uhub6: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <SATA SSD S9FM02.2> ACS-3 ATA SATA 3.x device
ada0: Serial Number 8EA807550B3200389438
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 15272MB (31277232 512 byte sectors)
Trying to mount root from ufs:/dev/ada0s1a [rw,noatime]...
uhub3: 2 ports with 2 removable, self powered
uhub1: 4 ports with 4 removable, self powered
uhub2: 5 ports with 5 removable, self powered
uhub5: 5 ports with 5 removable, self powered
random: unblocking device.
uhub0: 4 ports with 4 removable, self powered
uhub4: 5 ports with 5 removable, self powered
uhub6: 5 ports with 5 removable, self powered
ugen6.2: <Generic Flash Card ReaderWriter> at usbus6
umass0 on uhub0
umass0: <Generic Flash Card ReaderWriter, class 0/0, rev 2.01/1.00, addr 2> on usbus6
umass0:  SCSI over Bulk-Only; quirks = 0x4001
umass0:6:0: Attached to scbus6
da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
da0: <Multiple Card  Reader 1.00> Removable Direct Access SPC-2 SCSI device
da0: Serial Number 058F63666485
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2<NO_6_BYTE>
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb4
tun2: changing name to 'ovpns2'
re1: link state changed to UP
vlan0: changing name to 're1_vlan10'
re2: link state changed to UP
vlan1: changing name to 're2_vlan2'
re0: link state changed to UP
pflog0: promiscuous mode enabled
ovpns2: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to accept, logging disabled
DUMMYNET 0 with IPv6 initialized (100409)
load_dn_sched dn_sched FIFO loaded
load_dn_sched dn_sched QFQ loaded
load_dn_sched dn_sched RR loaded
load_dn_sched dn_sched WF2Q+ loaded
load_dn_sched dn_sched PRIO loaded
load_dn_sched dn_sched FQ_CODEL loaded
load_dn_sched dn_sched FQ_PIE loaded
load_dn_aqm dn_aqm CODEL loaded
load_dn_aqm dn_aqm PIE loaded
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP

@fichtner
Copy link
Member

fichtner commented Feb 6, 2019

Do you know where ovpns2 is listening on? WAN perhaps?

@tbandixen
Copy link
Contributor Author

tbandixen commented Feb 6, 2019

Yes WAN

@fichtner
Copy link
Member

fichtner commented Feb 6, 2019

Do you get a new IPv4 on WAN every time this happens?

@tbandixen
Copy link
Contributor Author

No, the IP is always the same, although I dont have a fixed ip.

@fichtner
Copy link
Member

fichtner commented Feb 6, 2019

Please check /var/db/wan_cacheip and /var/db/wan_ip -- they shouldn't match, correct?

@tbandixen
Copy link
Contributor Author

But they match O_o

@drakosha
Copy link

drakosha commented Feb 6, 2019

It's not the driver issue, i have the same on different hardware. And i have static IP on WAN, but issued by DHCP.

@fichtner
Copy link
Member

fichtner commented Feb 6, 2019

Let's try to find it... this one looks like a good candidate... c83bb8d:

# opnsense-patch c83bb8d

@subivoodoo
Copy link

Same issue here, other hardware (Intel NIC's). My ISP has had a DHCP renew time of 30 seconds and after upgrade to 19.1.1 I get every 30 seconds a drop of my OpenVPN interface.

In the meantime my ISP has changed the DHCP renew time to 12 hours => no issues so far.

Additionally the NTPD service get's also and exit signal during the rc.newwanip script:

ntpd[56403]: ntpd exiting on signal 15 (Terminated)

I have figured out how to easy manually reproduce the issue (VPN interface down/up + the NTPD exit signal in my case):

In the Web Interface go to "Interfaces > Overview" and just klick "Reload" on the WAN interface

@subivoodoo
Copy link

Applied the patch c83bb8d and klicked "Reload" still causes my ovpns1 link DOWN/UP:

Feb 6 19:24:57 	opnsense: /usr/local/etc/rc.dyndns: Dynamic DNS (SOMEWHAT.dyndns.org): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Feb 6 19:24:56 	opnsense: /usr/local/etc/rc.configure_interface: ROUTING: skipping IPv6 default route
Feb 6 19:24:56 	opnsense: /usr/local/etc/rc.configure_interface: ROUTING: keeping current default gateway 'ISPGWIP'
Feb 6 19:24:56 	opnsense: /usr/local/etc/rc.configure_interface: ROUTING: setting IPv4 default route to ISPGWIP
Feb 6 19:24:56 	opnsense: /usr/local/etc/rc.configure_interface: ROUTING: no IPv6 default gateway set, assuming wan
Feb 6 19:24:56 	opnsense: /usr/local/etc/rc.configure_interface: ROUTING: IPv4 default gateway set to wan
Feb 6 19:24:56 	opnsense: /usr/local/etc/rc.configure_interface: ROUTING: entering configure using 'wan'
Feb 6 19:24:55 	opnsense: /usr/local/etc/rc.newwanip: Dynamic DNS (SOMEWHAT.dyndns.org): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Feb 6 19:24:54 	opnsense: /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.
Feb 6 19:24:54 	opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'ovpns1'
Feb 6 19:24:54 	kernel: ovpns1: link state changed to UP
Feb 6 19:24:54 	kernel: ovpns1: link state changed to DOWN
Feb 6 19:24:54 	opnsense: /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: ROUTING: skipping IPv6 default route
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: ROUTING: keeping current default gateway 'ISPGWIP'
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to ISPGWIP
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: ROUTING: no IPv6 default gateway set, assuming wan
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: ROUTING: IPv4 default gateway set to wan
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'wan'
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: On (IP address: MYIPADDRESS) (interface: WAN[wan]) (real interface: em0).
Feb 6 19:24:53 	sshlockout[11128]: sshlockout/webConfigurator v3.0 starting up
Feb 6 19:24:53 	opnsense: /usr/local/etc/rc.newwanip: IP renewal is starting on 'em0'

and NTPD exit signal:

Feb 6 19:24:56 | ntpd[94413]: ntpd exiting on signal 15 (Terminated)

@RehaagJ
Copy link

RehaagJ commented Feb 6, 2019

I had the same (or at least a similar) issue (IP renewal starting on WAN every 30 minutes), then applied c83bb8d about 75 minutes ago, and rebooted. Since then, no more IP renewal.

@subivoodoo
Copy link

Patched again + reboot... still VPN link down/up during IP renewal.

@tbandixen
Copy link
Contributor Author

After patching the tv stream didn't drop for over 1 hour. Tomorrow I can test if the vpn connection is stable again.

Thank you very much for your quick interaction.

@tbandixen
Copy link
Contributor Author

tbandixen commented Feb 7, 2019

I can't see any newwanip logs over the night. So it ran stable I think.

I know now why the connection dropped everytime... I had "Firewall: Settings: Advanced: Dynamic state reset" active (Reset all states when a dynamic IP address changes.This option flushes the entire state table on IPv4 address changes in dynamic setups to e.g. allow VoIP servers to re-register.)
But the IP didn't change, so this trigger shouldn't have been triggered.

But thank you for the patch!

@tbandixen
Copy link
Contributor Author

What happens to the patch if I install some updates through the web gui? Does it persist?

@subivoodoo
Copy link

Got it, the "Reload" button forces the rc.newwanip script but the patch doesn't call it from dhclient if it's not a real IP change, right?

So the patch works also for me!

There was a DHCP triggered renewal try on my WAN without real IP change a few minutes ago (log entry in the dhcp log). But nothing to see about the rc.newwanip in general system log and also no exit signal in NTPD log.

So now my ISP can go back to stupid 30 second IP renewal time ;-)

Thanks a lot!!!

Same question as tbandixen: Will this patch persist or will it find the way into updates?

@fichtner
Copy link
Member

fichtner commented Feb 7, 2019

The patch will not persist into 19.1.2 and needs to be reapplied worst case. opnsense-patch was designed as a tool to test changes between versions and to troubleshoot.

As the change for has been proposed in issue #2542 for a long time for a good reason we now need to find the problem contained within the patch itself as a way to go forward with a better path than just to go back to what we had before.

@fichtner fichtner self-assigned this Feb 7, 2019
@fichtner fichtner added the bug Production bug label Feb 7, 2019
@mtn88
Copy link

mtn88 commented Feb 11, 2019

I have the same problem with 19.1.1 on two APU2C4 systems. Another system running 19.1.1 on APU1D4 doesn't show this problem.

Is there anything I can help to diagnoses the issue?

fichtner added a commit that referenced this issue Feb 11, 2019
The FreeBSD one does, but we have different code that is "safe"
for multiple DHCP connections.

(cherry picked from commit b20f71b)
@esoleyman
Copy link

I have the same issue and started looking around in the log files. I believe the answer lies in System: Gateways: Log File and possibly with dpinger.

I see this in the log file repeating every 15 minutes:

dpinger: send_interval 1000ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr xxx.xxx.xxx.xxx bind_addr xxx.xxx.xxx.xxx identifier "WAN_DHCP

I noticed that by going to Firewall: Settings: Advanced and checking the "Disable State Killing on Gateway Failure" option I was able to stop the routine dropping of connections.

I then noticed the following:

Feb 11 16:26:10 dpinger: GATEWAY ALARM: WAN_DHCP (Addr: xxx.xxx.xxx.xxx Alarm: 0 RTT: 66816ms RTTd: 213265ms Loss: 0%)
Feb 11 16:26:10 dpinger: WAN_DHCP xxx.xxx.xxx.xxx: Clear latency 66816us stddev 213265us loss 0%
Feb 11 16:25:58 dpinger: GATEWAY ALARM: WAN_DHCP (Addr: xxx.xxx.xxx.xxx Alarm: 1 RTT: 832201ms RTTd: 0ms Loss: 0%)
Feb 11 16:25:58 dpinger: WAN_DHCP xxx.xxx.xxx.xxx: Alarm latency 832201us stddev 0us loss 0%

Could this be the culprit?

@marjohn56
Copy link
Member

what you see is dpinger reporting the very fact that there's a problem

@tbandixen
Copy link
Contributor Author

tbandixen commented Feb 13, 2019

I have the same problem with 19.1.1 on two APU2C4 systems. Another system running 19.1.1 on APU1D4 doesn't show this problem.

Are there any configuration difference between your systems on the APU2C4 and APU1D4 ?

@tbandixen
Copy link
Contributor Author

I'm out of the loop on this one and have just picked it up to read through. A LONG time ago, well it seems a long time ago now we had a similar issue across the road with IPv6 dhcp6 running newwanipv6 on every renew of the client. This was the reason for adding the RENEW and REQEUST reasons to the output of dhcp6c, if the response from dhcp6c was RENEW then we stopped it from calling newwanipv6 in the script as nothing would have changed, I have not looked so I may be talking out of my derriere but is something similar not possible with dhclient? It would solve an awful lot of problems in one go.

Could this be the solution? Is someone willing to provide a patch that I can test?

@fichtner
Copy link
Member

No, we need to explain what happens first. Something changed, but the changes in the dhclient-script are too superficial to cause this so I'm afraid of later breakage if we don't pin this down.

@tbandixen
Copy link
Contributor Author

I totally understand that. How can I provide further help?

@marjohn56
Copy link
Member

Might be an idea to add some extra debug output to the dhclient-script and get @tbandixen to do some tests and give feedback. Unfortunately I'll not be able to add the extra bits myself until sometime next week!

@marjohn56
Copy link
Member

@tbandixen - Little test for you. Copy the attached script to /usr/local/opnsense/scripts/interfaces . Backup the existing script first. See if it makes any difference at all. Pointers are needed to try and identify the issues so all this script does is filter out the RENEW response and bypass any actions.
dhclient-script.zip

@tbandixen
Copy link
Contributor Author

Ok, I ignore the renew now (as provided in the script) lets see if the vpn is stable now. will report asap

@tbandixen
Copy link
Contributor Author

Well, as expected, the connection remains stable. I will include the ignored lines step by step to identify the causing line.

@tbandixen
Copy link
Contributor Author

How can I manually trigger this script so I dont have to wait 20minutes every time?

@marjohn56
Copy link
Member

It's more a question of if the 'RENEW' Bypass helps, what has changed elsewhere that's causing the issue. Has something changed in dhclient itself perhaps? @fichtner can answer that one.

@tbandixen
Copy link
Contributor Author

I think just bypassing the RENEW isnt that good, its there for a reason. But I dont know if behind the scenes something changed (BSD things maybe, I dont know the *nix subsystem at all). I will try to find the line that causes the drop and digg a bit further.

@marjohn56
Copy link
Member

Actually bypassing RENEW is good, it means nothing has changed so do nothing.

@marjohn56
Copy link
Member

The key is that the var 'changes' is changed to 'yes'. What I would do next is add a logger entry to each and every if... fi statement block to identify the culprit.

Something like this:-

	if [ "$old_ip_address" != "$new_ip_address" ]; then
        $LOGGER "old ip address != new ip address at line 348"
		delete_old_states
	fi

@tbandixen
Copy link
Contributor Author

I think we should check if $alias_ip_address is set, otherwise if [ "$old_ip_address" != "$alias_ip_address" ]; then and if [ "$new_ip_address" != "$alias_ip_address" ]; then will be true always. My $alias_ip_address is empty... Maybe that changed?

@tbandixen
Copy link
Contributor Author

tbandixen commented Feb 15, 2019

I think setting changes="yes" because of [ "$old_ip_address" != "$alias_ip_address" ] is wrong. Because in delete_old_alias the alias will be removed IF $alias_ip_address its length is not zero, so maybe there is no change at all if the alias is empty, right? Same applies to [ "$new_ip_address" != "$alias_ip_address" ]

@tbandixen
Copy link
Contributor Author

My VPN connection remains stable if I add [ -n "$alias_ip_address" ] && to the lines 351 and 369.
So far so good, but did that change in 19.1? I didn't change any config targeting ISP dhcp client settings.

fichtner added a commit that referenced this issue Feb 15, 2019
fichtner added a commit that referenced this issue Feb 15, 2019
@fichtner
Copy link
Member

@tbandixen thanks for the hint, the check changed as it tried to align with FreeBSD. It seems the issue is present there, but it is not critical because it doesn't have a use case maybe shrugs

# opnsense-revert opnsense
# opnsense-patch 90c0c395

That should be it then....

@fichtner
Copy link
Member

fichtner commented Feb 15, 2019

historic reference: pfsense/pfsense@d0d7f09ab3853b

@tbandixen
Copy link
Contributor Author

Its all stable now 😃

@tbandixen
Copy link
Contributor Author

Thank you very much!

fichtner added a commit that referenced this issue Feb 15, 2019
Suggested by: @tbandixen

(cherry picked from commit 90c0c39)
@fichtner
Copy link
Member

Ok, we'll wrap this up for 19.1.2. Thank you for the help. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Production bug
Development

No branches or pull requests