Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rsyslog doesn't collect log messages from a UDP socket that is bound to a nonlocal address #2040

Closed
amg1127 opened this issue Nov 14, 2017 · 9 comments · Fixed by #2080
Closed
Labels
Milestone

Comments

@amg1127
Copy link

amg1127 commented Nov 14, 2017

I am trying to setup an UDP log input whose socket is bound to the nonlocal address 192.168.99.99:

    # ip -4 addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        inet 10.4.28.91/26 brd 10.4.28.127 scope global eth0
        valid_lft forever preferred_lft forever

In order to achieve this result, I created the file "/etc/rsyslog.d/udp-nonlocaladdr-bug.conf" with the following content:

    # cat /etc/rsyslog.d/udp-nonlocaladdr-bug.conf
    module(load="imudp")

    input(type="imudp" address="127.0.0.1" port="514" RateLimit.Interval="0" ruleset="logtest")

    input(type="imudp" address="192.168.99.99" port="514" RateLimit.Interval="0" ruleset="logtest")

    ruleset(name="logtest") {
        action(type="omfile" file="/tmp/logtest")
    }

Then, after restarting "rsyslog" service, the intended socket is shown by "netstat" output:

    # systemctl restart rsyslog

    # netstat -pano | grep :514
    udp        0      0 192.168.99.99:514       0.0.0.0:*                           30134/rsyslogd       off (0.00/0/0)
    udp        0      0 127.0.0.1:514           0.0.0.0:*                           30134/rsyslogd       off (0.00/0/0)

However, "rsyslog" does not accept datagrams from that socket even though I bind the address to the "eth0" interface while the program is running:

    # ip -4 addr add dev eth0 192.168.99.99/24

    # logger --udp --server 192.168.99.99 'Nonlocal hello'

    # logger --udp --server 127.0.0.1 'Local hello'

    # cat /tmp/logtest
    Nov 14 18:23:17 localhost root: Local hello

The configuration works as intended only after the service is restarted at this point:

    # systemctl restart rsyslog

    # ip -4 addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
        inet 127.0.0.1/8 scope host lo
        valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        inet 10.4.28.91/26 brd 10.4.28.127 scope global eth0
        valid_lft forever preferred_lft forever
        inet 192.168.99.99/24 scope global eth0
        valid_lft forever preferred_lft forever

    # logger --udp --server 192.168.99.99 'Nonlocal hello'

    # logger --udp --server 127.0.0.1 'Local hello'

    # cat /tmp/logtest
    Nov 14 18:23:17 localhost root: Local hello
    Nov 14 18:24:04 cobaia-sst-centos7 root: Nonlocal hello
    Nov 14 18:24:09 localhost root: Local hello
@amg1127
Copy link
Author

amg1127 commented Nov 14, 2017

While trying to debug this issue, I figured that the root cause of the problem may reside in a programming error during UDP sockets initialization.

In the file "runtime/net.c", starting at line 1451, the following code exists:

    if(bIsServer) {

            /* rgerhards, 2007-06-22: if we run on a kernel that does not support
                * the IPV6_V6ONLY socket option, we need to use a work-around. On such
                * systems the IPv6 socket does also accept IPv4 sockets. So an IPv4
                * socket can not listen on the same port as an IPv6 socket. The only
                * workaround is to ignore the "socket in use" error. This is what we
                * do if we have to.
                */
            if(     (bind(*s, r->ai_addr, r->ai_addrlen) < 0)
#		ifndef IPV6_V6ONLY
                    && (errno != EADDRINUSE)
#		endif
                ) {
                    if (errno == EADDRNOTAVAIL && ipfreebind != IPFREEBIND_DISABLED) {
                            if (setsockopt(*s, IPPROTO_IP, IP_FREEBIND, &on, sizeof(on)) < 0) {
                                    errmsg.LogError(errno, NO_ERRCODE, "setsockopt(IP_FREEBIND)");
                            }
                            else if (bind(*s, r->ai_addr, r->ai_addrlen) < 0) {
                                    errmsg.LogError(errno, NO_ERRCODE, "bind with IP_FREEBIND");
                            } else {
                                    if (ipfreebind >= IPFREEBIND_ENABLED_WITH_LOG)
                                            errmsg.LogMsg(0, RS_RET_OK_WARN, LOG_WARNING, "bound address %s IP free", hostname);
                                    continue;
                            }
                    }
                    close(*s);
                    *s = -1;
                    continue;
            }
    }

    (*socks)++;
    s++;

According to my analysis:

  • "socks[0]" indicates how many socket descriptors the program will listen

  • "&socks[1]" is the starting point of an array that stores socket descriptors created by the program

  • "s" is a pointer that addresses members from the "socks" array

  • If the "bind()" call at L1460 succeeds, the code increments both the socket counter (L1483) and "s" (L1484).

  • If an administrator chooses to specify a nonlocal address in configuration instead, that "bind()" call will fail. After that, the second "bind()" call at L1469 will succeed and the code will reach the "continue" statement at the line 1474.

  • In such scenario, the program leaves a listening socket opened and unattended, because the "s" pointer would not be incremented and the "socket()" call at either L1309 or L1311 would override the file descriptor with a new socket.

@davidelang
Copy link
Contributor

davidelang commented Nov 15, 2017 via email

@amg1127
Copy link
Author

amg1127 commented Nov 15, 2017

@davidelang

I am afraid I need to deploy a "nontraditional solution", that is expected to be supported by "rsyslog".

I intend to deploy a couple of logservers on my network and configure an active & standby architecture. On both servers, I plan to bind UDP inputs to an unique IP address which is failed over by a keepalived daemon. As the IP_FREEBIND socket option have been used by "imudp" module since rsyslog 8.18, I am willing to use it and avoid configuring a global nonlocal address bind permission.

  • enhancement: Allow rsyslog to bind UDP ports even w/out specific
    interface being up at the moment.
    Alternatively, rsyslog could be ordered after networking, however,
    that might have some negative side effects. Also IP_FREEBIND is
    recommended by systemd documentation.
    Thanks to Nirmoy Das and Marius Tomaschewski for the patch.

In addition, let us suppose a fellow administrator intends to deploy a centralized log server and, for some reason, decides to bind an UDP input socket to a specific network interface. If:

  • The distribution is systemd-based, and
  • "systemd" starts "rsyslog" daemon before the network stack initializes, and
  • Applications are not allowed to bind sockets to nonlocal addresses (which is the default kernel setting), and
  • An interface address is specified in "rsyslog" config file

Then the administrator will experience the same problem: "rsyslog" will not collect log messages from the listening UDP socket.

@deoren
Copy link
Contributor

deoren commented Nov 15, 2017

@amg1127: "systemd" starts "rsyslog" daemon before the network stack initializes

Loosely related: #1694, #1656

@davidelang
Copy link
Contributor

davidelang commented Nov 15, 2017 via email

@amg1127
Copy link
Author

amg1127 commented Nov 17, 2017

According to the code, someone else has already tried to "change things in rsyslog" in order to deal with non-local binds. I just intend to rely on that try.

    if (errno == EADDRNOTAVAIL && ipfreebind != IPFREEBIND_DISABLED) {
            if (setsockopt(*s, IPPROTO_IP, IP_FREEBIND, &on, sizeof(on)) < 0) {
                    errmsg.LogError(errno, NO_ERRCODE, "setsockopt(IP_FREEBIND)");
            }
            else if (bind(*s, r->ai_addr, r->ai_addrlen) < 0) {

There is an unexpected behavior from that approach, which I ask developers to manage. The solution could be either adjust handling of EADDRNOTAVAIL in the code or delegate the OS error to the system administrator (thus forcing enabling the "ip_nonlocal_bind" kernel option).


Side note: "imudp" module does recognize the "ipfreebind" parameter, but it is currently undocumented in the manual. That is another issue.

@rgerhards
Copy link
Member

Side note: "imudp" module does recognize the "ipfreebind" parameter, but it is currently undocumented in the manual. That is another issue.

Would you mind updating the doc?

amg1127 added a commit to amg1127/rsyslog that referenced this issue Nov 18, 2017
amg1127 added a commit to amg1127/rsyslog that referenced this issue Nov 18, 2017
@amg1127
Copy link
Author

amg1127 commented Nov 19, 2017

Would you mind updating the doc?

No problem. I actually hadn't figured that I could do it until you asked me to.

Here is the documentation pull request: rsyslog/rsyslog-doc#392

rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 21, 2017
Function used very old style and was hard to read. Now also supports
strucutred handling of exceptions.

see also rsyslog#2049
see also rsyslog#2040
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 21, 2017
Function used very old style and was hard to read. Now also supports
strucutred handling of exceptions.

see also rsyslog#2049
see also rsyslog#2040
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 21, 2017
Function used very old style and was hard to read. Now also supports
strucutred handling of exceptions.

see also rsyslog#2049
see also rsyslog#2040
@rgerhards rgerhards added this to the v8.31 milestone Nov 21, 2017
@rgerhards rgerhards added the bug label Nov 21, 2017
rgerhards added a commit to rgerhards/rsyslog that referenced this issue Nov 22, 2017
... and also solve a socket leak that pre-8.31 occurred when the option
was enabled.

Special thanks to github user amg1127 who analysed the situation and
actually provided the original patch for the issue. Unfortunately, the
existing code base was really bad, and so I needed to do some refactoring
first. My patch here is modelled exactly after amg1127's patch, but
bases on the refactoring.

Full credits for the patch belong to amg1127.

see also rsyslog#2049
closes rsyslog#2040
@lock
Copy link

lock bot commented Dec 26, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants