Internet down after 1-2 minutes, system routing #6338

Closed
2 tasks done
ghost opened this issue Feb 18, 2023 · 27 comments

@ghost

ghost commented Feb 18, 2023

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

22.7.11 was the last version where I didn't have any problems.
Every morning when I turn on the firewall, everything works for 1-2 minutes, then internet access is lost on every device. I then have to open the web GUI and restart the "System routing" service.

Tip: to validate your setup was working with the previous version, use opnsense-revert (https://docs.opnsense.org/manual/opnsense_tools.html#opnsense-revert)

It didn't solve the problem.
https://prnt.sc/R2NlF-xdJYRG

To Reproduce

Steps to reproduce the behavior:

  1. Turn on the firewall
  2. Wait 1-2 minutes
  3. Internet goes down

Expected behavior

Everything keeps working, as in the previous version.

Describe alternatives you considered

I reinstalled the system without restoring the backup file and set up the whole system by hand.
I removed dnscrypt, unbound and the gateway pinger.
I installed the Realtek driver. Nothing has changed.

Screenshots

https://prnt.sc/Ke1asVE5I8pP Everything seems to be fine, but the browsers give me DNS_PROBE_ERR and the network icon at the bottom right of the screen is replaced by the one indicating no internet, until I restart "system routing".

Relevant log files

If applicable, information from log files supporting your claim.

Nothing appears in the logs (audit, backend, general, boot, web GUI).

Additional context

Add any other context about the problem here.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 23.1.1-amd64
Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (2 cores, 4 threads)
I don't have any information about the network card, but it worked until the previous version.

@ghost
Author

ghost commented Feb 18, 2023

I upgraded with pkg update -f, pkg upgrade -d and pkg upgrade -f.
The version string changed from OPNsense 23.1.1-amd64 to OPNsense 23.1.1_2-amd64.

However, the problem has not been solved.

@AdSchellevis
Member

relevant information would be:

  • what type of (wan) connectivity is used, does the wan interface have an address at all (Interfaces: Overview)
  • any events in the system log (System: Log Files: General) at the moment the connection drops.
  • Is there a default route set (System: Routes: Status) after the failure and is it pointing to the correct next hop?
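
The third check can also be scripted. A minimal sketch, assuming the output of `netstat -rn -f inet` on the firewall has been captured as text; the function name and the sample routing tables (documentation addresses, not data from this issue) are illustrative only:

```python
def default_route(netstat_output: str):
    """Return the gateway of the IPv4 default route, or None if there is none."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # A default route line starts with the literal destination "default".
        if len(fields) >= 2 and fields[0] == "default":
            return fields[1]
    return None


# Illustrative samples only (hypothetical addresses and interface name).
HEALTHY = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS      vtnet2
192.0.2.0/24       link#3             U        vtnet2
"""

BROKEN = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
192.0.2.0/24       link#3             U        vtnet2
"""

print("healthy:", default_route(HEALTHY))
print("broken: ", default_route(BROKEN))
```

If the function returns None while the WAN interface still holds its address, the symptom matches the one discussed in this thread.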

@AdSchellevis AdSchellevis added the support Community support label Feb 18, 2023
@ghost
Author

ghost commented Feb 18, 2023

  • what type of (wan) connectivity is used, does the wan interface have an address at all (Interfaces: Overview)

I use DHCP. I have a modem that works in bridge mode because I have a VDSL2 line at home.
The information from the overview: https://prnt.sc/1o7WdqjLQfHn

  • any events in the system log (System: Log Files: General) at the moment the connection drops.

https://prnt.sc/LvZVOu7fhfwS This is the general log. I powered off at 13:58 and powered on at 14:00; the problem appeared at 14:07.
I then restarted system routing so I could upload the screenshots, and the log showed this: https://prnt.sc/Jwbd19EFIb2Q.

  • Is there a default route set (System: Routes: Status) after the failure and is it pointing to the correct next hop?

https://prnt.sc/87NRzL07CF4y during the failure
https://prnt.sc/qtw_D8BDFQBF after restarting the routing service
The only entry that changes, which I guess is the heart of the problem, is the following, and I don't have the faintest idea why.
https://prnt.sc/wDC8C4xumGc1

@AdSchellevis
Member

sounds similar to https://forum.opnsense.org/index.php?topic=32347.msg157402#msg157402

when the gateway is dropped, can you check if /usr/local/etc/rc.routing_configure restores normal operation?

@ghost
Author

ghost commented Feb 18, 2023

sounds similar to https://forum.opnsense.org/index.php?topic=32347.msg157402#msg157402

when the gateway is dropped, can you check if /usr/local/etc/rc.routing_configure restores normal operation?

https://prnt.sc/B1pEjTlpqz4T

Same situation. Where you see "route not found", that is during the outage.

@AdSchellevis
Member

but does executing /usr/local/etc/rc.routing_configure restore the default route in that case?

@ghost
Author

ghost commented Feb 19, 2023

but does executing /usr/local/etc/rc.routing_configure restore the default route in that case?

Sorry, I didn't understand at first. Yes, it restores it: https://prnt.sc/MZQWn9MOrCcL

@AdSchellevis
Member

AdSchellevis commented Feb 19, 2023

@Threefish4096 as a workaround, can you try to install 2be7d9b using the command below?

opnsense-patch 2be7d9b

This is merely a workaround. We still need to figure out why the default route is dropped, as at this point it should still be there after receiving the same address from the server. Assigning myself and @fichtner to the ticket.

EDIT changed commit

@AdSchellevis AdSchellevis added bug Production bug and removed support Community support labels Feb 19, 2023
AdSchellevis added a commit that referenced this issue Feb 19, 2023
@ghost
Author

ghost commented Feb 19, 2023

I tried, but the problem reoccurs

https://prnt.sc/z2ewfxGuWU4Q

LOG: https://prnt.sc/_f7Pvp9hMrqm

@fichtner
Member

Sorry, the log makes no sense to me. It goes into the error condition, doesn't recover, goes into correct reconfiguration but also doesn't do anything? It's a bit hard to get a structure here..

@fichtner fichtner added this to the 23.7 milestone Feb 20, 2023
@ghost
Author

ghost commented Feb 20, 2023

https://prnt.sc/foAuj_5eosrT

This morning I got up and turned everything on. While I was preparing to leave the house, I noticed that the internet was not working again.
So I restarted it, and the entry should be around 08:18.
When I came back at 12:15, Windows 11 was showing the globe (no internet) icon and not loading any pages. (image)
I restarted system routing and it worked again.
Since then I've been uploading things and haven't had any more problems.

Just to recount the events: I applied the patch yesterday. I don't know if it will do any good.

Thank you very much for the attention!

@fichtner
Member

I think your lease times on the WAN side are pretty low so that it constantly "breaks". For the time being we have enough information to try and reproduce. As far as the patch goes let's not try to confirm if it is working or not as it's not the exact solution anyway.
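
To illustrate why a short lease would make the failure recur quickly: unless the server supplies other values, a DHCP client attempts to renew at half the lease time (T1) and to rebind at 87.5% of it (T2), the defaults from RFC 2131. A minimal sketch; the lease values are hypothetical and not taken from this issue:

```python
def dhcp_timers(lease_seconds: int) -> tuple[float, float]:
    """Return (T1 renew, T2 rebind) in seconds using the RFC 2131 defaults."""
    return 0.5 * lease_seconds, 0.875 * lease_seconds


# Hypothetical lease times, chosen only to show the scale of the effect.
for lease in (300, 3600, 86400):
    t1, t2 = dhcp_timers(lease)
    print(f"lease {lease:>6}s -> renew after {t1:.0f}s, rebind after {t2:.0f}s")
```

If every renewal can drop the default route, a lease of a few minutes would break connectivity within minutes of boot, while a one-day lease would only show the problem about twice a day.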

@ghost
Author

ghost commented Feb 20, 2023

I think your lease times on the WAN side are pretty low so that it constantly "breaks". For the time being we have enough information to try and reproduce. As far as the patch goes let's not try to confirm if it is working or not as it's not the exact solution anyway.

Thanks so much again, I look forward to a patch.
Tag me when I have to try a new "opnsense-patch a1b2c3".

fichtner pushed a commit that referenced this issue Feb 21, 2023
… "something else" dropped it

Also add debugging information in dhclient-script.  For some reason I'm suspecting the kernel
dropping the route after modifying address information...

PR: #6338
PR: https://forum.opnsense.org/index.php?topic=32347.0
@fichtner
Member

Linking forum post for reference: https://forum.opnsense.org/index.php?topic=32347.0

fichtner added a commit that referenced this issue Feb 21, 2023
In case addresses are removed and reapplied the routes are gone
and other related interface configuration is missing.  In these
cases do a full recycle even though the address did not change
visibly (which is good that we can detect it).

Also address the "miss" of the cached address clean now that we
know DHCP should not force-update us into a missing address
scenario during a renew.

PR: #6338
fichtner added a commit that referenced this issue Feb 22, 2023
PR: #6338

(cherry picked from 4950460)
(cherry picked from bf97cdf)
@rudiservo

rudiservo commented Feb 22, 2023

Dumb question: do you have the WAN gateway marked as upstream?
I was losing the connection every time the ISP renewed the DHCP lease. Even when the IP did not change (my case), I would lose access to the internet (maybe the default gateway) and only regain it after restarting the routing service. IPv6 connectivity was always working.
When I checked the gateway, upstream was not set, so I marked it as upstream. So far so good, 24 hours later.

Hopefully this does not have anything to do with Suricata.

@AdSchellevis
Member

@rudiservo best check the fix proposed by Franco in the forum https://forum.opnsense.org/index.php?topic=32347.msg157675#msg157675 , this is highly likely the cause of the issue.

@ghost
Author

ghost commented Feb 22, 2023

Dumb question, do you have the WAN Gateway checked as upstream?

Now that I have reinstalled, no, and it works...

even if the IP does not change (it's my case)

Me too, I have a static IP.

only regain when I restart the routing service

Me too.

IPV6 connectivity was always working

My ISP doesn't use it.

Hopefully this does not have anything to do with Suricata

With Suricata I have other problems: it works on WAN, but on LAN there are drops.

Anyway, after searching the internet I tried installing IPFire yesterday. It does the job, but I had some configuration problems, so today I reinstalled OPNsense and everything is fine... I tried to recreate the problem, but I can't. Does this make sense?

@ghost
Author

ghost commented Feb 22, 2023

Screenshot_1

Now I recognize it!
After I enabled Suricata and downloaded all the rules, I tried to reproduce the problem and now it happens again.

Without Suricata on LAN, everything works.

@ghost
Author

ghost commented Feb 22, 2023

The only differences from previous installation attempts are as follows:
photo_2023-02-22_19-50-17

1G of swap; before I had 0G.
Mirror swap: yes; before, no.
Encrypt swap: yes; before, no.

photo_2023-02-22_19-50-21

For the install I used "other modes" and..
photo_2023-02-22_19-50-25

I hope they can be of some use.
Good evening!

@rudiservo

OK, try two things: mark the WAN default gateway as upstream gateway, and check whether Suricata is in promiscuous mode.

@rudiservo

@AdSchellevis you're right, the workaround seems to be the fix for now. Is it going to be in a patch this week?

@fichtner
Member

fichtner commented Feb 23, 2023

Debug output from forum:

2023-02-23T06:24:26	Notice	opnsense	/usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to 81.xxx.xx.1	
2023-02-23T06:24:26	Notice	opnsense	/usr/local/etc/rc.newwanip: ROUTING: IPv4 default gateway set to wan	
2023-02-23T06:24:26	Notice	opnsense	/usr/local/etc/rc.newwanip: ROUTING: entering configure using 'wan'	
2023-02-23T06:24:26	Notice	opnsense	/usr/local/etc/rc.newwanip: No IP change detected for WAN[wan]	
2023-02-23T06:24:26	Notice	dhclient	Creating resolv.conf	
2023-02-23T06:24:26	Notice	dhclient	New Routers (vtnet2): 81.xxx.xx.1	
2023-02-23T06:24:26	Notice	dhclient	New Broadcast Address (vtnet2): 81.xxx.xx.255	
2023-02-23T06:24:26	Notice	dhclient	New Subnet Mask (vtnet2): 255.255.255.0	
2023-02-23T06:24:26	Notice	dhclient	New IP Address (vtnet2): 81.xxx.xx.x29	
2023-02-23T06:24:26	Notice	dhclient	DEBUG calling add_new_address/add_new_routes	
2023-02-23T06:24:26	Notice	dhclient	DEBUG alias_ip_address:	
2023-02-23T06:24:26	Notice	dhclient	DEBUG new_ip_address: 81.xxx.xx.x29	
2023-02-23T06:24:26	Notice	dhclient	DEBUG old_ip_address: 81.xxx.xx.x29	
2023-02-23T06:24:26	Notice	dhclient	DEBUG entering with BOUND	
2023-02-23T05:24:07	Error	dhclient	send_packet: No route to host

It's a bit strange: we enter BOUND with identical old and new addresses and don't flush the old one, which would mean that adding an IP address that is already there scrubs the route? Need to verify.

@fichtner
Member

yes, the default route disappears when you add the existing address via ifconfig again and it won't even complain about it :/
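
The condition seen in the debug output above (a BOUND event where the old and new addresses are identical) can be spotted mechanically. A minimal sketch, assuming the DEBUG lines added to dhclient-script keep the wording shown in that log and that the excerpt covers a single DHCP event; the function name is made up for illustration:

```python
import re


def same_address_bound(log_text: str) -> bool:
    """True if the excerpt shows a BOUND event whose old and new IP are equal."""
    reason = old = new = None
    for line in log_text.splitlines():
        m = re.search(r"DEBUG entering with (\S+)", line)
        if m:
            reason = m.group(1)
        # Only old_/new_ip_address are matched; alias_ip_address is ignored.
        m = re.search(r"DEBUG (old|new)_ip_address:\s*(\S+)", line)
        if m:
            if m.group(1) == "old":
                old = m.group(2)
            else:
                new = m.group(2)
    return reason == "BOUND" and old is not None and old == new


# Excerpt of the redacted debug output posted in this thread.
SAMPLE = """\
2023-02-23T06:24:26 Notice dhclient DEBUG calling add_new_address/add_new_routes
2023-02-23T06:24:26 Notice dhclient DEBUG alias_ip_address:
2023-02-23T06:24:26 Notice dhclient DEBUG new_ip_address: 81.xxx.xx.x29
2023-02-23T06:24:26 Notice dhclient DEBUG old_ip_address: 81.xxx.xx.x29
2023-02-23T06:24:26 Notice dhclient DEBUG entering with BOUND
"""

print(same_address_bound(SAMPLE))
```

A match only indicates that the address was re-added unchanged; whether the default route survived still has to be checked in the routing table.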

@ghost
Author

ghost commented Feb 23, 2023

OK, try two things: mark the WAN default gateway as upstream gateway, and check whether Suricata is in promiscuous mode.

My current gateway: https://prnt.sc/LXcWPiI4gt1- In the past I had the far and upstream options active. The settings in the screenshot are the defaults after yesterday's reinstall, except that I removed "Disable Gateway Monitoring".

This morning I turned on the firewall again and the problem happened again… #anger

@ornative

After upgrading today to 23.1.1_2-amd64, from machines on one of my VLANS, I can get DNS resolution but cannot connect to sites. Windows 11 tells me there is no internet connection, but I can ping the Comcast gateway and get DNS.

Wanted to add this, will probably have to reinstall an older release at this point as I can't be down for more than an hour before I start having automation issues. If there is a patch that would be helpful as at this point I am writing this connected to a hotspot through AT&T.

fichtner added a commit that referenced this issue Feb 27, 2023
PR: #6338

(cherry picked from 4950460)
(cherry picked from bf97cdf)
(cherry picked from 56fcd68)
@rudiservo

@ornative for a workaround check #6338 (comment) response,
just edit /usr/local/etc/rc.newwanip until there is a fix, or just wait for a patch.

fichtner added a commit that referenced this issue Mar 1, 2023
PR: #6338

(cherry picked from commit 4950460)
(cherry picked from commit bf97cdf)
(cherry picked from commit 56fcd68)
(cherry picked from commit bd635e0)
fichtner added a commit that referenced this issue Mar 3, 2023
PR: #6338

(cherry picked from commit 4950460)
(cherry picked from commit bf97cdf)
(cherry picked from commit 56fcd68)
(cherry picked from commit bd635e0)
(cherry picked from commit 412c0c7)
@fichtner
Member

fichtner commented Mar 3, 2023

Commits have been added for 23.1.2 and confirmed in the forum.

@fichtner fichtner closed this as completed Mar 3, 2023