Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(imfile) File truncation detect by comparing current offset and file size on disk #514

Closed
wants to merge 3 commits into from

Conversation

ymattw
Copy link
Contributor

@ymattw ymattw commented Sep 7, 2015

My attempt to fix #511. Suggest to review commits one by one.

Tested manually and also with following command sequence. cp will truncate the target file as open("/tmp/input.log", O_WRONLY|O_TRUNC) = 4.

for i in {1..9999}; do echo "message $i" >> /tmp/input.log; done && cp /etc/services /tmp/input.log

@ymattw
Copy link
Contributor Author

ymattw commented Sep 7, 2015

Looks like the change introduced a regression in tests. I will try to get a Ubuntu environment and verify locally. If anyone knows what failed exactly please let me know, thanks.

@ymattw
Copy link
Contributor Author

ymattw commented Sep 7, 2015

According to the travis build log, failed test was imfile-endregex.sh. However I could not reproduce this in a clean Ubuntu 12.04 docker container.

What I did was starting a clean 12.04 container, install the dependencies (+ libgnutls-dev) and executed build steps as defined in .travis.yml. All tests passed.

Then I tried to execute the imfile-endregex.sh separately by overwriting the TESTS macro in tests/Makefile, it also passed.

So this could be a intermittent problem in the test code, @rgerhards is it possible to re-trigger the travis CI build for this PR?

root@travis:/r/home/ymattw/ws/rsyslog# git branch
  master
* truncated-file

root@travis:/r/home/ymattw/ws/rsyslog# grep -w ^TESTS tests/Makefile
TESTS = stop-localvar.sh stop-msgvar.sh \
TESTS = imfile-endregex.sh

root@travis:/r/home/ymattw/ws/rsyslog# git diff
diff --git a/tests/imfile-endregex.sh b/tests/imfile-endregex.sh
index 8985a3e..3436a3c 100755
--- a/tests/imfile-endregex.sh
+++ b/tests/imfile-endregex.sh
@@ -2,6 +2,7 @@
 # This is part of the rsyslog testbench, licensed under ASL 2.0
 # This test mimics the test imfile-readmode2.sh, but works via
 # endmsg.regex. It's kind of a base test for the regex functionality.
+set -x
 echo ======================================================================
 echo [imfile-endregex.sh]
 . $srcdir/diag.sh init

root@travis:/r/home/ymattw/ws/rsyslog# make -C tests check-TESTS
make: Entering directory `/r/home/ymattw/ws/rsyslog/tests'
+ echo ======================================================================
======================================================================
+ echo '[imfile-endregex.sh]'
[imfile-endregex.sh]
+ . ./diag.sh init
++ TB_TIMEOUT_STARTSTOP=3000
++ case $1 in
++ ./killrsyslog.sh
++ '[' -z ']'
++ RS_SORTCMD=sort
++ '[' x '!=' x ']'
++ cp ./testsuites/diag-common.conf diag-common.conf
++ cp ./testsuites/diag-common2.conf diag-common2.conf
++ rm -f rsyslogd.started 'work-*.conf' rsyslog.random.data
++ rm -f rsyslogd2.started 'work-*.conf'
++ rm -f work rsyslog.out.log rsyslog2.out.log rsyslog.out.log.save
++ rm -rf test-spool test-logdir stat-file1
++ rm -f 'rsyslog.out.*.log' work-presort rsyslog.pipe
++ rm -f rsyslog.input rsyslog.empty
++ rm -f rsyslog.errorfile
++ rm -f 'core.*' 'vgcore.*'
++ mkdir test-spool
++ ulimit -c 4000000000
++ '[' '' '!=' '' ']'
++ '[' '' '!=' on ']'
++ rm -f IN_AUTO_DEBUG
++ '[' -e IN_AUTO_DEBUG ']'
+ . ./diag.sh startup imfile-endregex.conf
++ TB_TIMEOUT_STARTSTOP=3000
++ case $1 in
++ . ./diag.sh wait-startup
++ ../tools/rsyslogd -C -n -irsyslog.pid -M../runtime/.libs:../.libs -f./testsuites/imfile-endregex.conf
+++ TB_TIMEOUT_STARTSTOP=3000
+++ case $1 in
+++ i=0
+++ test '!' -f rsyslog.pid
+++ ./msleep 100
+++ let i++
+++ test 1 -gt 3000
+++ test '!' -f rsyslog.pid
+++ i=0
+++ test '!' -f rsyslogd.started
++++ cat rsyslog.pid
+++ echo 'rsyslogd started with pid ' 1902
rsyslogd started with pid  1902
+ echo 'msgnum:0
 msgnum:1'
+ echo msgnum:2
+ sleep 1
+ echo 'msgnum:3
 msgnum:4'
+ echo msgnum:5
+ sleep 1
+ . ./diag.sh shutdown-when-empty
++ TB_TIMEOUT_STARTSTOP=3000
++ case $1 in
++ '[' '' == 2 ']'
++ . ./diag.sh wait-queueempty
+++ TB_TIMEOUT_STARTSTOP=3000
+++ case $1 in
+++ '[' '' == 2 ']'
+++ echo WaitMainQueueEmpty
+++ ./diagtalker
imdiag[13500]: mainqueue empty
++ ./msleep 1000
+++ cat rsyslog.pid
++ kill 1902
+ . ./diag.sh wait-shutdown
++ TB_TIMEOUT_STARTSTOP=3000
++ case $1 in
++ i=0
++ test -f rsyslog.pid
++ ./msleep 100
++ let i++
++ test 1 -gt 3000
++ test -f rsyslog.pid
++ '[' -e 'core.*' ']'
+ sleep 1
++ grep -c HEADER rsyslog.out.log
+ NUMLINES=3
+ '[' -z 3 ']'
+ '[' '!' 3 -eq 3 ']'
+ for i in '{1..4}'
+ grep msgnum:1 rsyslog.out.log
+ '[' '!' 0 -eq 0 ']'
+ for i in '{1..4}'
+ grep msgnum:2 rsyslog.out.log
+ '[' '!' 0 -eq 0 ']'
+ for i in '{1..4}'
+ grep msgnum:3 rsyslog.out.log
+ '[' '!' 0 -eq 0 ']'
+ for i in '{1..4}'
+ grep msgnum:4 rsyslog.out.log
+ '[' '!' 0 -eq 0 ']'
+ . ./diag.sh exit
++ TB_TIMEOUT_STARTSTOP=3000
++ case $1 in
++ rm -f rsyslogd.started 'work-*.conf' diag-common.conf
++ rm -f rsyslogd2.started diag-common2.conf 'rsyslog.action.*.include'
++ rm -f work rsyslog.out.log rsyslog2.out.log rsyslog.out.log.save
++ rm -rf test-spool test-logdir stat-file1
++ rm -f 'rsyslog.out.*.log' rsyslog.random.data work-presort rsyslog.pipe
++ rm -f rsyslog.input rsyslog.conf.tlscert stat-file1 rsyslog.empty
++ rm -f rsyslog.errorfile
++ rm -f HOSTNAME imfile-state:.-rsyslog.input
++ unset TCPFLOOD_EXTRA_OPTS
++ echo -------------------------------------------------------------------------------
-------------------------------------------------------------------------------
PASS: imfile-endregex.sh
=============
1 test passed
=============
make: Leaving directory `/r/home/ymattw/ws/rsyslog/tests'
root@travis:/r/home/ymattw/ws/rsyslog# echo $?
0

@ymattw
Copy link
Contributor Author

ymattw commented Sep 7, 2015

Another proof: I saw the build https://travis-ci.org/rsyslog/rsyslog/builds/79089258 for PR #513 was successful, only difference was just one comment change inside runtime/stream.c.

@rgerhards
Copy link
Member

I verified, you were indeed hit by a glitch in the testbench. But may I kindly request to make this feature optional and turned off by default. We had such a bad time with this logic that I want to make sure it must be explicitely be opted in. Thanks!

@ymattw
Copy link
Contributor Author

ymattw commented Sep 14, 2015

@rgerhards That's totally acceptable! Do you want me to update this with a C macro or you will do it yourself?

I myself will do long duration test soon and will let you know if I run into trouble. Thanks!

@rgerhards
Copy link
Member

It would be great if you could amend your commit. I can do it, but I am pretty busy, so I could probably work on it around december this year. If you look at it, be aware that there are two types of config statements: legacy ones (the one starting with a dollar sign) and the current system. We do not add any legacy ones anymore (legacy is not good for complex cases). To add statements, search for "pblk" and you'll see how to others are handled. Please also don't forget to update the doc at https://github.com/rsyslog/rsyslog-doc

@ymattw
Copy link
Contributor Author

ymattw commented Sep 18, 2015

It will be a challenge for me to make it right, let me verify my patch in real prod environment for a couple of days first. I will try to get back on this soon. Thanks again!

@ymattw
Copy link
Contributor Author

ymattw commented Oct 11, 2015

Hi @rgerhards, I took some time on the related code today, here's what I am thinking:

  • Add a new option reopenOnTruncate="on"|"off" to the imfile module, default to "off"
  • Pass down the parameter to strmReadLine() and strmReadMultiLine() which are implemented in runtime/stream.c
  • Enable the reopen on truncate logic if the option is on

My understanding is that this requires to upgrade strmCURR_IF_VERSION. But I do not yet have knowledge how it is used. Please take a look and let me know whether if this is the right way to go, thanks!

@ymattw
Copy link
Contributor Author

ymattw commented Oct 22, 2015

Hi @rgerhards, I tested the current patch in real environment and found it indeed introduces new problem: when input load is heavy it sometimes send many duplicate message to output, even when there's no truncation happened.

I am closing this pull request, but I still think issue #511 is a big deal and should be fixed.

FYI my test was to use imfile and also output to a file, input > 1024 records per second, each record is 512 bytes long.

@ymattw ymattw closed this Oct 22, 2015
ymattw added a commit to ymattw/rsyslog that referenced this pull request Dec 3, 2015
Attempt to fix rsyslog#511.  This is an enhanced version based on
rsyslog#514, works fine for us so far.
As discussed before, I added an opt-in flag 'reopenOnTruncate' and
default is "off".
ymattw added a commit to ymattw/rsyslog that referenced this pull request Dec 3, 2015
Attempt to fix rsyslog#511.  This is an enhanced version based on
rsyslog#514, works fine for us so far.
As discussed before, I added an opt-in flag 'reopenOnTruncate' and
default is "off".
@lock
Copy link

lock bot commented Dec 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

imfile: truncated file won't be read until size growed to the previous length
2 participants