imrelp FD leak #4005
Comments
We have this problem too. We used to connect to the port and close the connection for rudimentary monitoring, and the descriptors grow indefinitely until rsyslog stops working.
Since this has been postponed for a few months now and I know of others who faced this issue, here is the workaround currently working for us (in a slightly different way and running as a cronjob multiple times a day). Note: lsof could show those TCP handles in CLOSE_WAIT state with -iTCP -s "TCP:^LISTEN" -s "TCP:^ESTABLISHED", but then these leaked handles would not be counted:

rsyslogd 24831 root 1978u sock 0,7 0t0 279860656 protocol: TCP

With two load balancers checking the port and a 32k open-files limit, this check restarts rsyslog twice a day.
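A minimal sketch of such a cron-driven FD watchdog, assuming a PID to check and a threshold (both the threshold value and the restart command are assumptions, not taken from the comment above):

```shell
#!/bin/sh
# Hypothetical watchdog sketch: count the open file descriptors of a
# process and restart rsyslog when the count exceeds a threshold.
# THRESHOLD and the restart command are assumptions.
THRESHOLD=16000

fd_count() {
    # number of entries in /proc/<pid>/fd == open descriptors of that PID
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

needs_restart() {
    [ "$(fd_count "$1")" -gt "$THRESHOLD" ]
}

# Example run against our own shell process; a real cronjob would use
# the rsyslogd PID, e.g. from /var/run/rsyslogd.pid.
if needs_restart $$; then
    echo "FD limit exceeded, restart rsyslog here (e.g. systemctl restart rsyslog)"
else
    echo "fd count below threshold"
fi
```

Run from cron every few hours, this keeps the leak from hitting the open-files limit without fixing the underlying bug.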
We met this problem as well and can reproduce it reliably with the newest rsyslog (8.2002.0 at the time of writing):

Expected behavior
FDs get cleared if relp connections are force-terminated.

Actual behavior
FDs are left open:

# ls -l /proc/11/fd|head -10
total 0
lr-x------. 1 root root 64 Mar 30 16:40 0 -> /dev/urandom
lrwx------. 1 root root 64 Mar 30 16:40 1 -> socket:[166569908]
lrwx------. 1 root root 64 Mar 30 16:40 10 -> socket:[166569914]
lrwx------. 1 root root 64 Mar 30 16:40 100 -> socket:[166573526]
lrwx------. 1 root root 64 Mar 30 16:40 101 -> socket:[166573527]
lrwx------. 1 root root 64 Mar 30 16:40 102 -> socket:[166573528]
lrwx------. 1 root root 64 Mar 30 16:40 103 -> socket:[166573529]
lrwx------. 1 root root 64 Mar 30 16:40 104 -> socket:[166573530]
lrwx------. 1 root root 64 Mar 30 16:40 105 -> socket:[166573531]
# ls -l /proc/11/fd|wc -l
996

Output from reproduce.sh:

rsyslogd: version 8.2002.0, config validation run (level 9), master config /spool.conf
rsyslogd: End of config validation run. Bye.
Waiting for threads to finish
Threads finished, sleeping 10s before counting open filehandles..
Open files at the end: 989

Steps to reproduce the behavior

spool.conf:

$LocalHostName spool
$AbortOnUncleanConfig on
$PreserveFQDN on
global(
workDirectory="/workdir"
maxMessageSize="256k"
)
main_queue(queue.type="Direct")
module(load="imrelp")
input(
type="imrelp"
name="imrelp"
port="601"
ruleset="spool"
MaxDataSize="256k"
)
ruleset(name="spool" queue.type="direct") {
}
# Just so rsyslog doesn't whine that we do not have outputs
ruleset(name="noop" queue.type="direct") {
action(
type="omfile"
name="omfile"
file="/spool/spool.log"
)
}

reproduce.sh:

#!/bin/bash
$RSYSLOGD -f /spool.conf -N9
$RSYSLOGD -f /spool.conf -i /workdir/rsyslogd.pid
sleep 2;
# How many tcpflood instances we run at the same time
for ((i=1;i<=10;i++)); do
(
# How many times tcpflood runs in each thread
for ((y=1;y<=100;y++)); do
(
$TCPFLOOD -T relp-plain -t 127.0.0.1 -p 601 -s -m 1000000
) &
# Give it time to actually connect
sleep 2;
kill -9 $!;
done;
) >/dev/null 2>&1 &
done;
echo "Waiting for threads to end";
wait;
echo "Open files at the end: $(ls -l "/proc/$(cat /workdir/rsyslogd.pid)/fd" | wc -l)";

Environment:

# cat /etc/centos-release
CentOS Linux release 7.7.1908 (Core)
# rpm -qa "rsyslog*"
rsyslog-relp-8.2002.0-1.el7.x86_64
rsyslog-8.2002.0-1.el7.x86_64
@StrongestNumber9 I will try to create a test based on your configuration.
@StrongestNumber9: I have successfully built a test based on this config and I could reproduce an unexpected problem, which I will investigate next week.
In the iRet handler of relpSessSendResponse, the sendbuf was freed if iRet returned a failure. However, if the error RELP_RET_IO_ERR happened in relpSendqAddBuf, sendbuf had already been assigned to a relpSendqe_t. As a result, sendbuf was double-freed in relpSendqDestruct. See also rsyslog/rsyslog#4184, rsyslog/rsyslog#4005.
For a successful test, the PR rsyslog/librelp#184 for issue rsyslog/librelp#183 is needed. Otherwise the test will trigger a double free inside librelp, which results in a segfault in some cases. closes: rsyslog#4184 closes: rsyslog#4005
We have this problem too. The leak happens both on log producers using omrelp and on log receivers using imrelp. Within a day after a restart, the imrelp process exhausts all available FDs and, being unable to open new connections, stops receiving messages. It appeared after updating from 8.39 to 8.2002.
@alorbach: I think this is fixed with the new librelp versions, right?
The leak itself is not fixed yet, as it is difficult to fix. The crash that could happen with broken sessions was fixed. I am working on the testbench plumbing which is needed to create a session-break test. Then we can run the test directly in the librelp testbench. After that I want to address the leak.
Expected behavior
open file descriptors are recycled when no longer used, and the overall number of FDs roughly matches the number of currently open file handles plus (network) sockets
Actual behavior
Open descriptors of the rsyslogd process rise by ~1440/h (with two balancers monitoring open ports every 5 seconds), while the actual number of connections (418) and open files (89) stays roughly the same. Most lsof output lines show something like this:
rsyslogd 15288 root 7529u sock 0,7 0t0 139507485 protocol: TCP
Steps to reproduce the behavior
Environment
simplified config:
ruleset( name="output" ) {
action(
name="a_outlog"
type="omfile"
File="/tmp/out.log"
)
stop
}
module(load="imrelp")
input(type="imrelp" port="2510" ruleset="output")
Then connecting with netcat 100 times will permanently leak 100 descriptors:
for i in {1..100} ; do nc -z 127.0.0.1 2510 ; done
Doing the same with imtcp module does NOT change the number of used descriptors (only once after starting, as expected, since unused descriptors are recycled by imtcp).
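The leak from the loop above can be quantified by comparing the process's FD count before and after the connections; a sketch, assuming rsyslogd can be found via pidof and listens on port 2510 as in the config above:

```shell
#!/bin/sh
# Sketch: measure the FD leak by counting /proc/<pid>/fd entries
# before and after opening N short-lived connections.
# The pidof lookup and port 2510 are assumptions matching the
# simplified config in this report.

count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

measure_leak() {
    pid=$(pidof rsyslogd) || return 1
    before=$(count_fds "$pid")
    for i in $(seq 1 100); do
        nc -z 127.0.0.1 2510
    done
    after=$(count_fds "$pid")
    echo "leaked: $((after - before)) descriptors"
}

measure_leak || echo "rsyslogd not running"
```

With imrelp this should report roughly 100 leaked descriptors per run; with imtcp the delta stays at zero after the first run.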
Since it seems irrelevant for the FD leak whether a RELP protocol handshake actually happened, I filed this issue under the rsyslog project rather than librelp.