python.d ERROR: unbound[local] : [Errno 57] Socket is not connected FreeBSD #6434

Closed
driesmp opened this issue Jul 11, 2019 · 16 comments · Fixed by #6561
Labels
area/collectors, bug, collectors/python.d, needs triage, os/bsd

Comments

Contributor

driesmp commented Jul 11, 2019

Hi,

I have set up netdata (v1.15.0) to monitor my local unbound server on FreeBSD 12 through its control socket.

My error log is flooded every second (the poll rate) with the following:
2019-07-11 19:42:51: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected
2019-07-11 19:42:52: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected
2019-07-11 19:42:53: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected
2019-07-11 19:42:54: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected
2019-07-11 19:42:55: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected
2019-07-11 19:42:56: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected
2019-07-11 19:42:57: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected
2019-07-11 19:42:58: python.d ERROR: unbound[local] : [Errno 57] Socket is not connected

I have in unbound.conf (python.d plugin):

local:
 extended: true
 socket: /var/run/unbound/unbound-control.sock

I have in unbound.conf:

remote-control:
        control-enable: yes
        control-interface: /var/run/unbound/unbound-control.sock

The unbound charts are working fine, although logs are obviously flooding.

driesmp added the bug and needs triage labels Jul 11, 2019
Member

ilyam8 commented Jul 11, 2019

Hi @Duffyx

> I have set-up netdata (v1.15.0)

We released 1.16.0 recently.

> The unbound charts are working fine, although logs are obviously flooding.

@Ferroin any ideas?

Member

Ferroin commented Jul 11, 2019

Well, the error message isn't being issued by the module itself, so it's not something we're catching. My best guess is that something is causing it to disconnect regularly while it's collecting data and it's not reconnecting properly.

Unfortunately, I have very little experience with BSD in general compared to Linux, so I'm not really sure what might be causing this.

Member

ilyam8 commented Jul 11, 2019

Yeah, the error message comes from the SocketService, and the only place where we log it without additional text is:

def _disconnect(self):
    """
    Close socket connection
    :return:
    """
    if self._sock is not None:
        try:
            self.debug('closing socket')
            self._sock.shutdown(2)  # 0 - read, 1 - write, 2 - all
            self._sock.close()
        except Exception as error:
            self.error(error)
        self._sock = None

Member

Ferroin commented Jul 11, 2019

Hmm...

Does FreeBSD have some way for UDS listeners to explicitly disconnect clients? AFAIK, such behavior is not in POSIX (but it's not non-compliant either), and at least Linux does not allow for this, but if FreeBSD does then Unbound is probably taking advantage of it (I suspect unintentionally, they do this for framing of replies on regular TCP connections).
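
One way to probe this question is a small standalone script that closes one end of a Unix domain socket pair and then calls shutdown() on the other end. This is only a sketch for experimentation: the function name `shutdown_after_peer_close` is made up for illustration, it assumes Python 3 on a Unix-like system, and it does not involve Unbound or Netdata at all. The outcome depends on the platform, so no expected output is given.

```python
import errno
import socket


def shutdown_after_peer_close():
    """Return None if shutdown() succeeds, otherwise the errno it raised."""
    # A connected pair of Unix domain stream sockets, no filesystem path needed.
    ours, peer = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    peer.close()  # simulate the server side hanging up first
    try:
        ours.shutdown(socket.SHUT_RDWR)  # same as shutdown(2) in the collector
        return None
    except OSError as error:
        return error.errno
    finally:
        ours.close()


if __name__ == "__main__":
    result = shutdown_after_peer_close()
    if result is None:
        print("shutdown() succeeded after the peer closed")
    else:
        # errno.errorcode maps the numeric value to its symbolic name.
        print("shutdown() failed:", errno.errorcode.get(result, result))
```

Running the same script on Linux and on FreeBSD and comparing the results would show whether the two systems treat a peer-closed socket differently at shutdown time.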

Contributor Author

driesmp commented Jul 18, 2019

Hi, I have no real knowledge about this. But let's assume FreeBSD does what you describe: could you provide a patch for that which I could test? :)

Member

ilyam8 commented Jul 18, 2019

Well, the workaround is to rewrite _disconnect in the unbound module and check the OS before logging the error. What do you think, @Ferroin?

Member

Ferroin commented Jul 18, 2019

Having looked a bit deeper now, I'm not 100% certain it's something that's truly specific to FreeBSD.

@Duffyx Quick question: Are you running either Netdata or Unbound (or possibly both) under Linux emulation, or are they running natively on FreeBSD?

Contributor Author

driesmp commented Jul 18, 2019

@Ferroin They are both running natively on FreeBSD.

Member

Ferroin commented Jul 18, 2019

OK, figured that was probably the case but wanted to check. I'm going to set up a FreeBSD VM to try and reproduce this locally so I can poke around at the network stack and hopefully figure out what exactly is going on. Short term, though, @ilyam8's suggestion should work as a stopgap to at least stop this from flooding your log with these seemingly pointless error messages.

Member

ilyam8 commented Jul 18, 2019

I understand that it is very annoying to have error.log full of "Socket is not connected" messages. I think we can do a quick workaround and remove it when we have a better solution.

Contributor Author

driesmp commented Jul 18, 2019

@Ferroin Thanks for doing this! Make sure to set up netdata to monitor unbound through a file system socket, as described in my sample config files.

Member

Ferroin commented Jul 18, 2019

@Duffyx One last question that I forgot earlier, what version of Unbound are you using and how did you install it (ports tree, manual local build, pkg command, etc)? I don't think the exact version of Unbound is likely to matter, but the more closely I can replicate your setup, the more likely I'll be able to reproduce the issue.

Contributor Author

driesmp commented Jul 18, 2019

Through pkg, unbound version 1.9.2 and netdata version 1.15.0.
FreeBSD 12 STABLE (12.0 RELEASE most closely resembles my version).

Member

Ferroin commented Jul 18, 2019

@Duffyx Thanks. I probably won't be able to dig too deep into this until this weekend, so you may not hear back from me until Monday about it, but if all goes well I may actually have a fix ready by then.

Contributor Author

driesmp commented Jul 27, 2019

Hi @Ferroin, have you had time to take a look at this? :-)

Member

Ferroin commented Jul 29, 2019

@Duffyx Yes, but I completely forgot to post back here about it, sorry about that.

I've not been able to reproduce the error locally in a VM, so it looks like the workaround proposed by @ilyam8 is unfortunately going to be the only option here. I'll look at getting a PR put together for that today.

Ferroin added a commit to Ferroin/netdata that referenced this issue Jul 29, 2019
This adds an explicit check for the case of a socket that's already
disconnected and skips logging an error message.  The condition
technically is an error, but it's one that we can recover from trivially
by just doing nothing in this case (we were trying to disconnect the
socket anyway, so if it's already disconnected, we don't need to change
anything).

This uses Python's `errno` module so that we can detect this situation
in a system-agnostic manner.

Fixes netdata#6434
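
The approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code merged in the linked PR: `SocketHolder` and its `errors` list are hypothetical stand-ins for the collector's service class and its logger, and Python 3 is assumed (where socket errors are `OSError`). The relevant detail is that `ENOTCONN` is 57 on FreeBSD but 107 on Linux, so comparing against `errno.ENOTCONN` avoids both a hard-coded number and an explicit OS check.

```python
import errno
import socket


class SocketHolder:
    """Hypothetical stand-in for the collector's socket service class."""

    def __init__(self, sock):
        self._sock = sock
        self.errors = []  # collected here instead of written to error.log

    def error(self, msg):
        self.errors.append(str(msg))

    def _disconnect(self):
        """Close the socket, ignoring the 'already disconnected' case."""
        if self._sock is None:
            return
        try:
            self._sock.shutdown(socket.SHUT_RDWR)
        except OSError as error:
            # An already disconnected socket needs no further action,
            # so only other failures are reported.
            if error.errno != errno.ENOTCONN:
                self.error(error)
        try:
            self._sock.close()
        except OSError as error:
            self.error(error)
        self._sock = None


if __name__ == "__main__":
    # A TCP socket that was never connected exercises the ENOTCONN path.
    holder = SocketHolder(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    holder._disconnect()
    print("logged errors:", holder.errors)
```

Unlike the original `_disconnect`, this sketch still calls `close()` when `shutdown()` fails, which is a design choice of the example rather than something stated in the thread.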
ilyam8 pushed a commit that referenced this issue Jul 30, 2019
* Handle disconnected sockets in unbound collector.

jacekkolasa pushed a commit to jacekkolasa/netdata that referenced this issue Aug 6, 2019
jackyhuang85 pushed a commit to jackyhuang85/netdata that referenced this issue Jan 1, 2020
ilyam8 added the os/bsd label May 30, 2020
ilyam8 added the collectors/python.d and area/collectors labels and removed the area/external/python label Apr 14, 2022