syslog fills with "Error 1 sending the modular data", gmond keeps using socket after EINVAL #65
Comments
I'm seeing similar errors, but it mostly seems to be due to connection refused:
FD 7 is a UDP channel to our central ganglia server
man 2 connect says:
Late reply, but may help some poor soul ... Get the PID of gmond: "lsof -p 20606" revealed: Notice type IPv6. Disabled IPv6 like this: add the following: restart network: restart gmond:
Suggested action/solution: if write returns EINVAL, gmond should try to recreate or re-bind the sending socket, rather than continuing to send on a bad socket (and filling logs with errors)
Google reveals this has been discussed several times in the past, and
none of the discussions ended with a solution, so I'm presenting some
analysis below.
Here is what I did and what I found:
I discovered my gmond PID = 21015 and I checked it with strace:
strace -p 21015 -o /tmp/gmond.errs -v
After about a minute, I had a look inside /tmp/gmond.errs, lots of this:
write(7, "\0\0\0\205\0\0\0\4srv1\0\0\0\fmachine_type\0\0\0\0"..., 52) = 52
write(8, "\0\0\0\205\0\0\0\4srv1\0\0\0\fmachine_type\0\0\0\0"..., 52) = -1 EINVAL (Invalid argument)
write(7, "\0\0\0\200\0\0\0\4srv1\0\0\0\7os_name\0\0\0\0\0\0\0\0\6"..., 164) = 164
write(8, "\0\0\0\200\0\0\0\4srv1\0\0\0\7os_name\0\0\0\0\0\0\0\0\6"..., 164) = -1 EINVAL (Invalid argument)
time([1351418592]) = 1351418592
sendto(9, "<30>Oct 28 11:03:12 /usr/sbin/gm"..., 90, MSG_NOSIGNAL, NULL, 0) = 90
Notice that the `sendto' is actually sending the error to syslog, not sending
a metric packet.
OK, the `write' calls show me two file descriptors, 7 and 8. Writes to
FD 8 are failing with EINVAL:
write(8, .... ) = -1 EINVAL (Invalid argument)
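To see at a glance which descriptors are failing, the strace log can be tallied per FD. The following is only a minimal Python sketch and not part of gmond: the name `count_failed_writes` is made up for illustration, and it assumes each syscall sits on a single line, as in the file written by `strace -o`.

```python
import re
from collections import Counter

# Matches a failed write as strace prints it, for example:
#   write(8, "..."..., 52) = -1 EINVAL (Invalid argument)
FAILED_WRITE = re.compile(r'^write\((\d+),.*\)\s*=\s*-1\s+(E[A-Z]+)')


def count_failed_writes(lines):
    """Return a Counter keyed by (fd, errno name) for failed write() calls."""
    failures = Counter()
    for line in lines:
        match = FAILED_WRITE.match(line.strip())
        if match:
            # group(1) is the file descriptor, group(2) the errno name
            failures[(int(match.group(1)), match.group(2))] += 1
    return failures
```

Calling `count_failed_writes(open("/tmp/gmond.errs"))` on a log like the one above should report every failure under FD 8 with EINVAL and none under FD 7.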
The file descriptors correspond to two different udp_send_channels in
gmond.conf - but which is which? Fortunately, lsof tells me:
lsof -p 21015 -n
gmond 21015 ganglia 7u IPv4 2747622 0t0 UDP 192.168.1.2:44778->239.2.11.71:8649
gmond 21015 ganglia 8u IPv4 2747628 0t0 UDP (VPN address):53976->(remote server address):8649
Notice that FD 7 corresponds to a very standard multicast channel, while
FD 8 corresponds to a UDP unicast channel. I have redacted the FD 8
addresses, but the lsof output immediately revealed the problem (in my case
anyway): the local address (the VPN address) existed when gmond started, but
no longer exists on this machine (because the VPN is not always up).
I can imagine similar problems would occur for hosts that get an IP by
means of DHCP, or hosts that have an IPsec tunnel, PPP or some other
transient interface.
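One way to confirm that a local address shown by lsof has gone away is to try binding a throwaway UDP socket to it. This is a rough Python sketch under my own assumptions, not something gmond does: the helper name `local_address_available` is hypothetical, it handles IPv4 only, and on Linux hosts with the `net.ipv4.ip_nonlocal_bind` sysctl enabled the bind would succeed even for a missing address.

```python
import errno
import socket


def local_address_available(ip):
    """Return True if a UDP socket can currently bind to `ip` on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0: let the kernel pick any free port
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # address is not assigned to any local interface
            raise  # anything else is unexpected, so surface it
    return True
```

Passing the local address from the lsof line for the failing FD would then show whether that address is still configured on the machine.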
If anyone else sees the problem, I would be interested to see your
strace and lsof output. I believe gmond could be tweaked, for example,
to recreate (or re-bind) the socket with FD 8 after such an EINVAL error.
Doing so might log a more specific error or might successfully bind on a
new local IP.
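The recreate-on-EINVAL idea can be sketched as follows. This is an illustration in Python, not gmond's actual code (gmond is written in C), and the class `ReconnectingUdpSender` and its `socket_factory` parameter are invented here. It assumes a connected UDP socket, which is what the plain `write' calls in the strace output suggest.

```python
import errno
import socket


class ReconnectingUdpSender:
    """Hypothetical wrapper that recreates its UDP socket after EINVAL."""

    def __init__(self, host, port, socket_factory=None):
        self.addr = (host, port)
        # socket_factory is an illustrative hook so the socket can be faked
        self._factory = socket_factory or self._default_factory
        self.sock = self._factory(self.addr)
        self.recreated = 0

    @staticmethod
    def _default_factory(addr):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # connect() on UDP fixes the peer and makes the kernel choose a
        # local address that is valid right now
        sock.connect(addr)
        return sock

    def send(self, payload):
        try:
            return self.sock.send(payload)
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            # The old socket is unusable: drop it, build a new one, and
            # retry once. A failure here propagates with its own errno.
            self.sock.close()
            self.sock = self._factory(self.addr)
            self.recreated += 1
            return self.sock.send(payload)
```

If the recreated socket cannot be connected or the retry fails, the caller sees that specific error once instead of a stream of identical EINVAL messages, which is the behaviour proposed above.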