[bug] loopback interface services unavailable #3287

Closed
thutex opened this issue Mar 5, 2019 · 21 comments
Labels
support Community support

Comments

@thutex

thutex commented Mar 5, 2019

@fichtner @mimugmail
referencing forum topic https://forum.opnsense.org/index.php?topic=10841.0
and issue opnsense/plugins#1104 with commit 8ca8def

When I updated, I lost functionality on the loopback interface, which I did not understand at first (the first thing I noticed was only DNS).

But everything using 127.0.0.1 was unavailable: DNS, ntopng (which could not connect to Redis), NUT (which could not connect either), and even a plain ping to 127.0.0.1 from the firewall's own terminal.
lo0 still showed the IP as assigned.

Functionality returned after adding a virtual IP of 127.0.0.1 to the loopback interface under Firewall: Virtual IPs: Settings.

Maybe something was overlooked when the naming was changed?

@AdSchellevis AdSchellevis transferred this issue from opnsense/plugins Mar 5, 2019
@AdSchellevis
Member

@thutex If you remove the VIP, does connectivity break instantly? If it does, can you check the generated firewall config? (Diff both versions of /tmp/rules.debug.)

@fichtner
Member

fichtner commented Mar 5, 2019

# ifconfig lo0

The output of that ifconfig command in the broken state would be helpful too. I don't see anything wrong with the rename; it only affects GUI labels.

@fichtner fichtner added the support Community support label Mar 5, 2019
@thutex
Author

thutex commented Mar 5, 2019

@AdSchellevis It does indeed break instantly.

root@firewall:~ # ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.114 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.163 ms
------removed ip alias here-----------
ping: sendto: Can't assign requested address
ping: sendto: Can't assign requested address
ping: sendto: Can't assign requested address
------added ip alias here---------
64 bytes from 127.0.0.1: icmp_seq=21 ttl=64 time=0.120 ms

There seems to be no difference when diffing rules.debug between the broken and the working state.
ifconfig looks like this:

root@firewall:~ # ifconfig lo0
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
	inet 127.0.0.1 netmask 0xffffffff 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 

------removed ip alias here-----------
                                                                               
Broadcast Message from root@firewall.home.lan                                  
        (no tty) at 20:09 CET...                                               
                                                                               
Communications with UPS gembird lost                                           
                                                                               

root@firewall:~ # ifconfig lo0
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 

------added ip alias here---------

Broadcast Message from root@firewall.home.lan                                  
        (no tty) at 20:12 CET...                                               
                                                                               
Communications with UPS gembird established

Funny thing: in the ORIGINAL ifconfig, BEFORE I added the 127.0.0.1 address to the virtual IPs, it WAS assigned to lo0 (but not working!).
Now, when I remove the alias, the address also gets removed from the interface, leaving it without an IPv4 address (which, I suppose, is the wanted/correct behaviour?).
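
A small Python helper can make that check less error-prone: it pulls the IPv4 `inet` lines out of `ifconfig lo0` output and decodes the hex netmask FreeBSD prints, so 0xffffffff reads as /32 and 0xff000000 (the value seen after the reboot below) as /8. This is a hypothetical sketch for illustration, not part of OPNsense; `lo0_ipv4` and `netmask_to_prefix` are made-up names.

```python
import re

# Matches FreeBSD ifconfig IPv4 lines such as "inet 127.0.0.1 netmask 0xffffffff".
# The space after "inet" keeps "inet6" lines from matching.
INET_RE = re.compile(r"^\s*inet (\d+\.\d+\.\d+\.\d+) netmask (0x[0-9a-fA-F]{8})")


def netmask_to_prefix(hexmask: str) -> int:
    """Convert a contiguous hex netmask (e.g. '0xff000000') to a prefix length."""
    return bin(int(hexmask, 16)).count("1")


def lo0_ipv4(ifconfig_output: str) -> list[tuple[str, int]]:
    """Return (address, prefix_length) pairs for every IPv4 'inet' line."""
    result = []
    for line in ifconfig_output.splitlines():
        match = INET_RE.match(line)
        if match:
            address, hexmask = match.groups()
            result.append((address, netmask_to_prefix(hexmask)))
    return result


broken = """lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
\tinet6 ::1 prefixlen 128
\tinet 127.0.0.1 netmask 0xffffffff
\tgroups: lo"""

print(lo0_ipv4(broken))  # [('127.0.0.1', 32)]
```

As the report above notes, an address on lo0 is not sufficient by itself: 127.0.0.1 can be present while traffic to it still fails.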

@thutex
Author

thutex commented Mar 5, 2019

I removed the IP alias and rebooted the firewall to see what would happen. Here is the output:

root@firewall:~ # ifconfig lo0
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
                                                                               
Broadcast Message from root@firewall.home.lan                                  
        (no tty) at 20:41 CET...                                               
                                                                               
Communications with UPS gembird lost                                           
                                                                              
                                                                               
Broadcast Message from root@firewall.home.lan                                  
        (no tty) at 20:42 CET...                                               
                                                                               
UPS gembird is unavailable                                                     
                                                                               
PING 127.0.0.1 (127.0.0.1): 56 data bytes
ping: sendto: Can't assign requested address
ping: sendto: Can't assign requested address
^C
--- 127.0.0.1 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss

root@firewall:~ # diff /tmp/rules.debug /home/bjorn/workingdebug
120c120
< block in quick proto carp from {(self)} to {any}
---
> # block in quick proto carp from {(self)} to {any}
                                                                               
Broadcast Message from root@firewall.home.lan                                  
        (no tty) at 20:43 CET...                                               
                                                                               
Communications with UPS gembird established           

The "workingdebug" file is a copy of rules.debug taken before the reboot, with the IP alias active.
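
The same comparison can be scripted with Python's standard `difflib`, which is handy for spotting a rule that is active in one file and commented out in the other. A minimal sketch; the two strings are hypothetical stand-ins for /tmp/rules.debug and the saved copy, reduced to the one rule from the diff above plus a placeholder comment line.

```python
import difflib


def rule_changes(before: str, after: str) -> list[str]:
    """Return the changed lines (prefixed '-' or '+') between two pf rule sets."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="rules.debug", tofile="workingdebug", lineterm="", n=0,
    )
    # Drop the '---' / '+++' file headers and the '@@' hunk markers
    return [line for line in diff if not line.startswith(("---", "+++", "@@"))]


# Hypothetical, heavily trimmed rule files
broken = "# (other rules omitted)\nblock in quick proto carp from {(self)} to {any}\n"
working = "# (other rules omitted)\n# block in quick proto carp from {(self)} to {any}\n"

for line in rule_changes(broken, working):
    print(line)
```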

@AdSchellevis
Member

OK, that looks rather odd; we'll have to investigate further. Is anything else running on this firewall?

I can follow why the address disappears after removing the VIP (we likely don't check for that), but after the reboot I would expect it to just work.

The question is how a machine gets into this state; a clean install seems to function normally.

@thutex
Copy link
Author

thutex commented Mar 5, 2019

I reloaded the settings (to change the theme :) ) and again lost connectivity on localhost.
Once more, the loopback IP was on the interface but connections did not work.

root@firewall:/home/bjorn # ifconfig lo0
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
	inet 127.0.0.1 netmask 0xffffffff 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo 
root@firewall:/home/bjorn # ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
ping: sendto: Can't assign requested address
^C
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
root@firewall:/home/bjorn # 


dmesg
------
lo0: promiscuous mode enabled
025.568071 [ 254] generic_find_num_desc     called, in tx 1024 rx 1024
025.574773 [ 262] generic_find_num_queues   called, in txq 0 rxq 0
025.581154 [ 760] generic_netmap_dtor       Restored native NA 0
025.592964 [ 254] generic_find_num_desc     called, in tx 1024 rx 1024
025.599660 [ 262] generic_find_num_queues   called, in txq 0 rxq 0
025.606049 [ 760] generic_netmap_dtor       Restored native NA 0
025.612263 [ 254] generic_find_num_desc     called, in tx 1024 rx 1024
025.618999 [ 262] generic_find_num_queues   called, in txq 0 rxq 0
pid 89000 (ntopng), uid 288: exited on signal 11
re0: promiscuous mode disabled
pppoe0: promiscuous mode disabled
ovpnc1: promiscuous mode disabled
ovpns2: promiscuous mode disabled
re1_vlan10: promiscuous mode disabled
re1_vlan193: promiscuous mode disabled
re1_vlan1001: promiscuous mode disabled
re1_vlan1002: promiscuous mode disabled
re1_vlan1003: promiscuous mode disabled
re1_vlan1004: promiscuous mode disabled
re1: promiscuous mode disabled
re1_vlan1005: promiscuous mode disabled
lo0: promiscuous mode disabled
in_scrubprefix: err=65, prefix delete failed

The firewall runs:

clamav
dhcpd
dnscrypt-proxy
dnsmasq
monit
ntopng
nut(_daemon,_upsmon) / redis
openvpn
suricata
ddns

Installed plugins:

os-arp-scan
os-clamav
os-dnscrypt-proxy
os-dyndns
os-intrusion-detection-content-pt-open
os-net-snmp
os-ntopng
os-nut
os-redis
os-smart

Maybe it can be reproduced by installing an 18.7.x version and then upgrading?
That would mimic what I have done: my original install on the current hardware was done around June last year and has been upgraded ever since.

@AdSchellevis
Member

Which services are running on localhost? I get the impression that one of the installed services is playing tricks here; the upgrade from 18.7 is unlikely to be the issue (otherwise our firewall would probably suffer from the same).

The easiest test is to disable the services binding to lo0 and re-enable them one by one, with reboots in between. I would probably start with ntopng, since my guess is that it's the service putting the device into promiscuous mode ("lo0: promiscuous mode enabled").

@thutex
Author

thutex commented Mar 6, 2019

I doubt it is ntopng, since it is one of the services that cannot start.
It also worked before the update, and not only the loopback but all interfaces are in promiscuous mode
(which works as expected, including on the loopback interface).

Listening on localhost (working state):

root@firewall:~ # sockstat | grep 127.0.0.1
uucp upsd 78168 4 tcp4 127.0.0.1:3493 *:*
uucp upsd 78168 6 tcp4 127.0.0.1:3493 127.0.0.1:27878
ntopng ntopng 2545 8 tcp4 127.0.0.1:59417 127.0.0.1:6379
ntopng ntopng 2545 9 tcp4 127.0.0.1:16396 127.0.0.1:6379
nobody dnsmasq 2511 16 udp4 127.0.0.1:53 *:*
nobody dnsmasq 2511 17 tcp4 127.0.0.1:53 *:*
clamav clamd 76788 4 tcp4 127.0.0.1:3310 *:*
_flowd flowd 76481 3 udp4 127.0.0.1:2056 *:*
nobody samplicate 19987 3 udp4 127.0.0.1:2055 *:*
uucp upsmon 80255 4 tcp4 127.0.0.1:27878 127.0.0.1:3493
redis redis-serv 2938 7 tcp4 127.0.0.1:6379 *:*
redis redis-serv 2938 12 tcp4 127.0.0.1:6379 127.0.0.1:59417
redis redis-serv 2938 13 tcp4 127.0.0.1:6379 127.0.0.1:16396
_dnscrypt-proxy dnscrypt-p70683 3 udp4 127.0.0.1:5353 *:*
_dnscrypt-proxy dnscrypt-p70683 5 tcp4 127.0.0.1:5353 *:*
root ntpd 38862 27 udp4 127.0.0.1:123 *:*
root lighttpd 32678 6 tcp4 127.0.0.1:443 *:*
root lighttpd 32678 9 tcp4 127.0.0.1:80 *:*
root sshd 90926 4 tcp4 127.0.0.1:22 *:*
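
To turn a listing like this into the list of services to disable one by one, a hypothetical helper can pick out sockets bound to 127.0.0.1 whose foreign address is `*:*` (sockstat's notation for a listening TCP or unconnected UDP socket). It reads the columns from the right because sockstat can run the COMMAND and PID columns together, as in the `dnscrypt-p70683` lines above. `loopback_listeners` is an illustrative name, not an existing tool.

```python
def loopback_listeners(sockstat_output: str) -> list[tuple[str, str, int]]:
    """Return (command, proto, port) for sockets on 127.0.0.1 with no peer (*:*).

    Columns are read from the right (PROTO, LOCAL, FOREIGN) because sockstat
    can run COMMAND and PID together, e.g. "dnscrypt-p70683".
    """
    listeners = []
    for line in sockstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        command = fields[1]
        proto, local, foreign = fields[-3], fields[-2], fields[-1]
        host, _, port = local.rpartition(":")
        if host == "127.0.0.1" and foreign == "*:*":
            listeners.append((command, proto, int(port)))
    return listeners


sample = """uucp upsd 78168 4 tcp4 127.0.0.1:3493 *:*
uucp upsd 78168 6 tcp4 127.0.0.1:3493 127.0.0.1:27878
redis redis-serv 2938 7 tcp4 127.0.0.1:6379 *:*
_dnscrypt-proxy dnscrypt-p70683 3 udp4 127.0.0.1:5353 *:*"""

for command, proto, port in loopback_listeners(sample):
    print(f"{command:16} {proto} {port}")
```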

@AdSchellevis
Member

I guess there's only one way to tell. If I could replicate the behaviour here with a core-only install, I would gladly do so, but I have not seen it on any of our machines.

@AdSchellevis
Member

Are you trying to run IPS on localhost? If so, netmap is not intended to run on localhost (it is not selectable by default either).

@thutex
Author

thutex commented Mar 6, 2019

No, IPS is running only on the WAN interface.
[image: ips]

@thutex
Author

thutex commented Mar 6, 2019

@AdSchellevis I just recreated the problem in a VirtualBox machine.

Install 18.7 to VirtualBox from the DVD ISO.
Load the 18.7 configuration from the actual firewall.

ping 1.1.1.1 ok
drill google.com @127.0.0.1 ok

(Minor) update from the console and reboot:
ping 1.1.1.1 ok
drill google.com @127.0.0.1 ok

Update to 19.1 from the console and reboot:
ping 1.1.1.1 ok
drill google.com @127.0.0.1 NOK: error sending query: error creating socket
(i.e. the issue I am having)
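
Both ping and drill fail at the point of sending, before any DNS server is involved. A minimal sketch that isolates that step: send one UDP datagram to 127.0.0.1 and report the OS error, if any. Because UDP is connectionless, a successful send only shows that the kernel accepted and routed the packet; it does not prove that a service is listening. On the broken firewall it would likely fail with the same "Can't assign requested address" error as ping, though that is an assumption. `probe_loopback_udp` is a hypothetical name.

```python
import socket
from typing import Optional


def probe_loopback_udp(port: int, host: str = "127.0.0.1") -> Optional[str]:
    """Send one UDP datagram to host:port.

    Returns None if the kernel accepted the datagram, otherwise a short
    description of the OSError raised by sendto().
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(b"probe", (host, port))
        except OSError as exc:
            return f"{type(exc).__name__}: {exc.strerror or exc}"
    return None


# 53 is where dnsmasq listens on this firewall; any port works for the probe,
# since an unconnected UDP send does not wait for a reply.
print(probe_loopback_udp(53))
```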

Is there a safe way to send you my config (i.e. an easy way to remove or modify sensitive things like passwords)?
Maybe that can help?

@AdSchellevis
Member

My community support time is limited, but you can send it to my email address "ad at project domain".
Can you first try to import your configuration into a fresh 19.1 installation (without the plugins installed)? I really expect this has something to do with one of the plugins you're using, so if a fresh 19.1 with your config fails, then I have something we could investigate.

@thutex
Author

thutex commented Mar 6, 2019

That is what I did above: I installed 18.7, imported my config and updated it without installing any extra packages, and the issue appeared again.
Reverting to a blank default config does return it to a working state, so it must be something in the configuration that worked in 18.7 but not in 19.1.

@AdSchellevis
Member

So, hence my question: does it happen on a clean 19.1 install? If it does, without packages installed, it should be reproducible at my end.

@thutex
Author

thutex commented Mar 6, 2019

It does indeed.
You should have my config in your mailbox.

@AdSchellevis
Member

OK, received.

@AdSchellevis
Member

I can't reproduce it, this is what I did:

  1. clean install 19.1
  2. [config.xml] changed the root password to something else
  3. [config.xml] renamed the interface to something my VM supports (not Realtek)
  4. copied the config to my clean install
  5. booted and pinged 127.0.0.1: works like a charm.
  6. set the address and gateway from a console (ifconfig+route) to be able to update
  7. pressed 12 for an update
  8. reboot
  9. ping 127.0.0.1, still works like a charm.

I can't spend more time on this, if anyone else wants to give it a shot, feel free to do so.

@thutex
Author

thutex commented Mar 6, 2019

Somehow, something is setting a route for 127.0.0.1 that should not be there:
127.0.0.1 UGHS pppoe0
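
The flags in that line account for the symptom. In FreeBSD's `netstat -rn` output, U means the route is usable, G that it goes through a gateway, H that it is a host route and S that it was added statically, so packets for 127.0.0.1 were handed to pppoe0 instead of lo0. A sketch that flags such entries in `netstat -rn -f inet` output; it locates the flags column as the first all-uppercase alphabetic field, so it also accepts the abbreviated line above, which has no gateway column. The function names are hypothetical.

```python
# Meanings of the FreeBSD route flags seen in this thread (see netstat(1))
FLAG_NAMES = {
    "U": "up",
    "G": "via gateway",
    "H": "host route",
    "S": "static",
}


def suspicious_loopback_routes(netstat_output: str) -> list[tuple[str, str, str]]:
    """Return (destination, flags, netif) for 127.x routes that do not use lo0."""
    findings = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("127."):
            continue
        # The flags column is the first all-uppercase alphabetic field after the
        # destination; the interface name follows it. This skips an optional
        # gateway column such as "link#4".
        for i in range(1, len(fields) - 1):
            if fields[i].isalpha() and fields[i].isupper():
                flags, netif = fields[i], fields[i + 1]
                if netif != "lo0":
                    findings.append((fields[0], flags, netif))
                break
    return findings


def describe_flags(flags: str) -> list[str]:
    """Spell out each flag letter; letters not in FLAG_NAMES are marked as other."""
    return [FLAG_NAMES.get(letter, f"other ({letter})") for letter in flags]


for dest, flags, netif in suspicious_loopback_routes("127.0.0.1 UGHS pppoe0"):
    print(dest, netif, describe_flags(flags))
# prints: 127.0.0.1 pppoe0 ['up', 'via gateway', 'host route', 'static']
```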

@thutex
Author

thutex commented Mar 6, 2019

@AdSchellevis: found the problem.
Because I use dnscrypt-proxy, I have my default DNS server (System: Settings: General) set to 127.0.0.1.
Somehow, after the update, that DNS server got my WAN interface assigned as its gateway, and this in turn caused OPNsense to add a route for 127.0.0.1 through the WAN interface.

I removed the gateway from that setting and everything works as expected again.
Strange that you couldn't reproduce the issue with the same config, though.

Anyway, since this is resolved and no action is needed from the team, I will close this issue.
Thank you for thinking along; I know your time is precious!
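
For anyone hitting the same symptom, the setting described above can be checked in an exported configuration. A hedged sketch: it ASSUMES that config.xml stores DNS servers as `<system><dnsserver>` elements in order and the gateway of the N-th server in `<system><dnsNgw>`, with "none" meaning no gateway. That layout is an assumption to verify against your own file, and `WAN_PPPOE` is a made-up gateway name.

```python
import xml.etree.ElementTree as ET


def loopback_dns_with_gateway(config_xml: str) -> list[tuple[str, str]]:
    """Return (dns_server, gateway) pairs where a loopback DNS server has a gateway.

    ASSUMPTION: servers are <system><dnsserver> elements in order, and the
    gateway of the N-th server is <system><dnsNgw> ("none" or empty = no gateway).
    """
    system = ET.fromstring(config_xml).find("system")
    if system is None:
        return []
    findings = []
    for index, server in enumerate(system.findall("dnsserver"), start=1):
        address = (server.text or "").strip()
        gateway = (system.findtext(f"dns{index}gw") or "").strip()
        if address.startswith("127.") and gateway not in ("", "none"):
            findings.append((address, gateway))
    return findings


# Hypothetical, trimmed config with a made-up gateway name
sample = ("<opnsense><system><dnsserver>127.0.0.1</dnsserver>"
          "<dns1gw>WAN_PPPOE</dns1gw></system></opnsense>")
print(loopback_dns_with_gateway(sample))
```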

@thutex thutex closed this as completed Mar 6, 2019
@AdSchellevis
Member

@thutex thanks for letting us know!

3 participants