Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

imfile-state files are not removed #4186

Open
dwerder opened this issue Feb 20, 2020 · 21 comments
Open

imfile-state files are not removed #4186

dwerder opened this issue Feb 20, 2020 · 21 comments

Comments

@dwerder
Copy link

dwerder commented Feb 20, 2020

Expected behavior

imfile-state:<inode>:<hash> files should be deleted if logfile is rotated/changed

Actual behavior

imfile-state files are not deleted.

ls -l /var/lib/rsyslog

total 36288
--
-rw------- 1 root root   617 Feb  6 20:37 imfile-state:1791:a336f4732a67cb64
-rw------- 1 root root   617 Feb  6 20:40 imfile-state:49969:4647b446cb72a568
-rw------- 1 root root   616 Feb  6 20:45 imfile-state:49972:345a0f4fd2e76dac
-rw------- 1 root root   616 Feb  6 20:47 imfile-state:1791:705587a1f660ab3c
-rw------- 1 root root   617 Feb  6 20:56 imfile-state:49972:6620e54f59e7553d
-rw------- 1 root root   617 Feb  6 20:59 imfile-state:1791:dea1d9e5dbef52f2
-rw------- 1 root root   615 Feb  7 08:21 imfile-state:49972:d7e9cd027b1eb482
-rw------- 1 root root   616 Feb 11 16:17 imfile-state:49250:6ba76d5bd33a6140
-rw------- 1 root root   617 Feb 12 19:13 imfile-state:49281:816b61b854e44629
-rw------- 1 root root   616 Feb 12 23:57 imfile-state:1766:9f08112aadd18e9d
-rw------- 1 root root   616 Feb 13 00:06 imfile-state:690:52ae7a8a5808663c
-rw------- 1 root root   616 Feb 13 02:58 imfile-state:1766:90340e7a717f25f2
-rw------- 1 root root   617 Feb 13 02:59 imfile-state:681:bf2c868420379126
-rw------- 1 root root   617 Feb 13 03:00 imfile-state:1766:b6e6d96e484ac5da
-rw------- 1 root root   616 Feb 13 03:05 imfile-state:48844:d4bcd966f43b5fd8
-rw------- 1 root root   617 Feb 13 03:06 imfile-state:1766:fa2c216b57474009
-rw------- 1 root root   570 Feb 13 23:25 imfile-state:49266:782b7753d25796b5`

Steps to reproduce the behavior

Install rsyslog version 8.2001.0-3 and have fast rotated log files. Also we see that our logs always switch just a few inode (don't know if that info helps)

Since merge of #3990 we do not have the problems that log entries are resend, but we have servers with thousends and thousends of that imfile-state files.

Environment

  • rsyslog version: 8.2001.0 (aka 2020.01)
  • platform: CentOS Linux release 7.7.1908 (Core)
  • for configuration questions/issues, include rsyslog.conf and included config files
@rgerhards
Copy link
Member

rgerhards commented Feb 20, 2020

At the bottom, this is a feature. I am not sure if we added a config option to turn it off.

Here it is why we do not delete state files: we have many users who move log files around, even between monitored locations. They do not want rsyslog to re-sent data already sent from a different location. Thus we need to keep the state file as we never know when and where the original log file is moved in again.

That said, I think I remember a request to make this configurable. @jvymazal do you know it our of your head?

@dwerder
Copy link
Author

dwerder commented Feb 20, 2020

So what should we do with that files? When can they securly be deleted? We have servers who generes 20k of this files each day.

@dwerder
Copy link
Author

dwerder commented Feb 20, 2020

We can implement a workaround like a cron, which deletes the files every 5 min, when they are older then a few minutes. But how do we know if that would not break stuff for rsyslog?

@jvymazal
Copy link
Contributor

@dwerder can you confirm that these stae-files are new (created by rsyslog after #3990)? I was under impression that by that we covered all possible scenarios and deletes should now happen for these files...

@dwerder
Copy link
Author

dwerder commented Feb 20, 2020

They were created with the 8.2001.0 version. We did it on fresh installed mashines

@jvymazal
Copy link
Contributor

also @rgerhards it is indeed configurable, see https://www.rsyslog.com/doc/v8-stable/configuration/modules/imfile.html#deletestateonfiledelete but by default it is on meaning that rsyslog should actually delete the files.

@dwerder
Copy link
Author

dwerder commented Feb 20, 2020

We will try to explicite enable it. And then all the files should be deleted or do we need already existing files.

@jvymazal
Copy link
Contributor

Ok, now that I am thinking about it more, @dwerder are you seding HUP to rsyslog upon file rotations? If not, and you did not configure forced PersistStateInterval to happen at least once in between file rotations than deleting the state-files will not work.

@jvymazal
Copy link
Contributor

Also, as for the old files, if rsyslog is restarted and did not delete them than it will never delete them so you have to do it manually (the older hashes are not persisted anywhere)

@dwerder
Copy link
Author

dwerder commented Feb 26, 2020

Hello @jvymazal . We tried it with the kill Hup systemctl kill -s HUP rsyslog.service. The /var/lib/rsyslog/ dir was cleand and empty.
We still see that files are created and not removed. New files are created after each HUP.

First we used inotify and saw that the files are still created and we see that warning on each HUP.

module(
  load="imfile"
  mode="inotify"
)
Feb 26 13:41:44 host rsyslogd[24298]: imfile: internal error? inotify provided watch descriptor 5 which we could not find in our tables - ignored [v8.2001.0 try https://www.rsyslog.com/e/2175 ]
Feb 26 13:41:44 host rsyslogd[24298]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="24298" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:41:58 host rsyslogd[24298]: imfile: internal error? inotify provided watch descriptor 6 which we could not find in our tables - ignored [v8.2001.0 try https://www.rsyslog.com/e/2175 ]
Feb 26 13:41:58 host rsyslogd[24298]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="24298" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:42:11 host rsyslogd[24298]: imfile: internal error? inotify provided watch descriptor 7 which we could not find in our tables - ignored [v8.2001.0 try https://www.rsyslog.com/e/2175 ]
Feb 26 13:42:11 host rsyslogd[24298]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="24298" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:42:24 host rsyslogd[24298]: imfile: internal error? inotify provided watch descriptor 8 which we could not find in our tables - ignored [v8.2001.0 try https://www.rsyslog.com/e/2175 ]
Feb 26 13:42:24 host rsyslogd[24298]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="24298" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:42:32 host rsyslogd[24298]: imfile: internal error? inotify provided watch descriptor 9 which we could not find in our tables - ignored [v8.2001.0 try https://www.rsyslog.com/e/2175 ]
Feb 26 13:42:32 host rsyslogd[24298]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="24298" x-info="https://www.rsyslog.com"] rsyslogd was HUPed

When we use polling, we don't see the warning. But files are created as before.

module(
  load="imfile"
  mode="polling"
  PollingInterval="1"
)
Feb 26 13:56:05 host rsyslogd[1232]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1232" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:56:17 host rsyslogd[1232]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1232" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:56:33 host rsyslogd[1232]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1232" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:56:42 host rsyslogd[1232]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1232" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Feb 26 13:56:56 host rsyslogd[1232]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1232" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
-rw------- 1 root root 1331 Feb 26 13:54 imfile-state:2503:2af90740afcd4e76
-rw------- 1 root root 2068 Feb 26 13:54 imfile-state:2521:974ea1fc83d42e97
-rw------- 1 root root 2251 Feb 26 13:54 imfile-state:2503:32ae111acde701ba
-rw------- 1 root root 1991 Feb 26 13:54 imfile-state:2521:d57cd2f52023a64f
-rw------- 1 root root 1747 Feb 26 13:55 imfile-state:2503:3e89dfd92e1c4d8d
-rw------- 1 root root  814 Feb 26 13:55 imfile-state:2521:885f498b07e065ec
-rw------- 1 root root 3546 Feb 26 13:55 imfile-state:2503:a62f83617e475762
-rw------- 1 root root  664 Feb 26 13:55 imfile-state:2521:1ca4b607bb96ae0d
-rw------- 1 root root 3779 Feb 26 13:55 imfile-state:2503:47a09a7394e9ef22
-rw------- 1 root root 1070 Feb 26 13:56 imfile-state:2521:a3ebf4ae3f0310b5
-rw------- 1 root root 2220 Feb 26 13:56 imfile-state:2503:968f398d61b69416
-rw------- 1 root root 1015 Feb 26 13:56 imfile-state:2521:ef3f47f8c85e2539
-rw------- 1 root root 2231 Feb 26 13:56 imfile-state:2503:0ecacc5b15470d9b
-rw------- 1 root root  544 Feb 26 13:56 imfile-state:2521:049e49fb20319368

I like to mention that it is one log file and it is rotate multiple times within a minute.

@jvymazal
Copy link
Contributor

@rgerhards please correct me if I am wrong but sending HUP should force persisting of imfile statefiles, right?

@rgerhards
Copy link
Member

@jvymazal think so, but not 100% sure - sorry, heading from teleconf to teleconf today... Will check ASAP.

@ChenQianCQ
Copy link

ChenQianCQ commented May 20, 2020

i am also hit this issue with version 8.2002.0 .
it is very simple to reproduce.

  1. write a log file by bash like this.
    #! /bin/bash
    i=0
    while [ 1 ];do
    i=expr $i + 1
    echo "$i" >> /opt/1.log
    done

  2. remove the log file "/opt/1.log" .
    rm -rf /opt/1.log

  3. in /var/lib/rsyslog
    -rw------- 1 root root 88 May 20 14:29 imfile-state:88536:1588fefbf7f8559e
    -rw------- 1 root root 84 May 20 14:29 imfile-state:88536:2e25ab1d834616f2
    -rw------- 1 root root 88 May 20 14:27 imfile-state:88536:32f8dad678a397a5

@jvymazal
Copy link
Contributor

jvymazal commented May 21, 2020

i am also hit this issue with version 8.2002.0 .
it is very simple to reproduce.

...

Would you please also post relevant configuration snippet you use in reproducing?
Also, do you have rsyslog running all the time or do you stop/restart it any time?

Thanks

@ChenQianCQ
Copy link

i am also hit this issue with version 8.2002.0 .
it is very simple to reproduce.
...

Would you please also post relevant configuration snippet you use in reproducing?
Also, do you have rsyslog running all the time or do you stop/restart it any time?

Thanks

i am add my own configuration in /etc/rsyslog.d directory. the configure file is like this (it is syslogmgr-collect.conf, but can't upload, so i change to text format)
syslogmgr-collect.txt

and, the rsyslog process is always running , NO excute stop/restart/kill/...... operation.

@jvymazal
Copy link
Contributor

May thanks for info, I will look into it once I have a moment.

@ChenQianCQ
Copy link

@jvymazal ,do you have any update on this issue ?

@zbukhari-apex
Copy link

zbukhari-apex commented Dec 21, 2023

I'm running into this as well. A slew of versions but using Ubuntu:

No state file surprisingly but behaving as it should:
Trusty (i.e. 14.04) - 7.4.4-1ubuntu2.7
Xenial (i.e. 16.04) - 8.16.0-1ubuntu3.1

State file behaving as it should:
Bionic (i.e. 18.04) - 8.32.0-1ubuntu4.2

/var/spool/rsyslog contains imfile-state:-var-log-audit-audit.log it does not leave stale files around.

Problem started:
Focal (i.e. 20.04) - 8.2001.0-1ubuntu1.3
Jammy (i.e. 22.04) - 8.2112.0-2ubuntu2.2

/var/spool/rsyslog contains ... a lot of imfile-state:8620608:ffa5e1b5ced716b0 with the file ID hash changing and the old files not being cleaned up. Documentation says deleteStateOnFileDelete is set to on. It doesn't cleanup. I tried to set it explicitly, it still doesn't clean up. I tried legacy syntax, that doesn't work either. I tried different modes, still doesn't work.

I have debug output if desired. Current for 8.2112.0 but if it would help for another version, let me know and I can do that for you.

This became an issue as we use rsyslog to process and send auditd (i.e. linux-audit) logs. Some of our servers have a lot of activity and this caused us to run out of inodes. Very rare overall but a thing I'm sure many Linux admins do. There used to be an rsyslog wiki page which showed how to use rsyslog for auditd logging. That's pretty much our setup. I don't have that link anymore but it's probably going to be similar to this. We use UDP but pretty much the same setup.

https://linux-audit.com/central-audit-logging-configuration-collecting-linux-audit-events/

@zbukhari-apex
Copy link

I have noticed that the unlink call via strace does not have the actual file name. It shows it in the rsyslog debug logs but when performing strace -f -e trace=file -p PID_OF_RSYSLOG, the actual call is incomplete. It's like the filename is missing the last colon. Looking at the source I can't see where this is incorrect as the debug log prints out statefn. There is an unlink call below it unlink((char *)statefn);.

https://github.com/rsyslog/rsyslog/blob/master/plugins/imfile/imfile.c#L1063

In the rsyslog debug logs essentially (20 lines before and 20 lines after "deleting state file"):

4156.793236863:main Q:Reg/w0  : ruleset.c: Filter: check for property 'msg' (value 'type=SYSCALL msg=audit(1703184156.787:401897213): arch=c000003e syscall=59 success=no exit=-2 a0=7fba5f17b810 a1=7fba5f1bcfb0 a2=7ffe40ae9130 a3=0 items=1 ppid=3177174 pid=3177264 auid=4294967295 uid=1000 gid=1000 euid=1000 su
id=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=4294967295 comm="python3" exe="/opt/omd/versions/2.2.0p8.cre/bin/python3.11" subj=? key="execve"') contains '[UFW ': FALSE
4156.793243671:main Q:Reg/w0  : ruleset.c: PROPFILT condition result is 0
4156.793247013:main Q:Reg/w0  : rainerscript.c:     PROPFILT
4156.793256812:main Q:Reg/w0  : rainerscript.c:         Property.: 'syslogtag'
4156.793266428:main Q:Reg/w0  : rainerscript.c:         Operation: 'isequal'
4156.793279165:main Q:Reg/w0  : rainerscript.c:         Value....: '[CLOUDINIT]'
4156.793288874:main Q:Reg/w0  : ruleset.c: Filter: check for property 'syslogtag' (value 'tag_audit_log') isequal '[CLOUDINIT]': FALSE
4156.793295697:main Q:Reg/w0  : ruleset.c: PROPFILT condition result is 0
4156.793299055:main Q:Reg/w0  : rainerscript.c:     PRIFILT 'local6.info'
4156.793308913:main Q:Reg/w0  : rainerscript.c:     pmask: 
4156.793317010:imfile.c       : imfile.c: removing old state file: '/var/spool/rsyslog/imfile-state:8620502'

4156.793318595:main Q:Reg/w0  : rainerscript.c:  X  X  X  X  X  X 4156.793340063:main Q:Reg/w0  : rainerscript.c:  X 
4156.793340122:imfile.c       : imfile.c: trying to delete no longer valid statefile '/var/spool/rsyslog/imfile-state:8620502' which no longer exists (probably already deleted)

4156.793343682:main Q:Reg/w0  : rainerscript.c:  X 4156.793346708:imfile.c       : strm 0x7fe920003280: stream.c: file 16(/var/log/audit/audit.log) closing, bDeleteOnClose 0

4156.793347651:main Q:Reg/w0  : rainerscript.c:  X 4156.793351455:imfile.c       : strm 0x7fe920003280: stream.c: file 16(/var/log/audit/audit.log) closing

4156.793352379:main Q:Reg/w0  : rainerscript.c:  X  X  X 
4156.793361611:imfile.c       : imfile.c: act_obj_destroy: deleting state file /var/spool/rsyslog/imfile-state:8620502:324c212b17bcc6a5

4156.793363474:main Q:Reg/w0  : rainerscript.c:  X  X  X  X  X  X  X  X  X  X 7F  X  X  X 
4156.793409474:main Q:Reg/w0  : ruleset.c: PRIFILT condition result is 1
4156.793412946:main Q:Reg/w0  : rainerscript.c:     ACTION 2 [builtin:omfwd:@rubberducky1.quack.quack:867]
4156.793422520:main Q:Reg/w0  : ruleset.c: executing action 2
4156.793426251:main Q:Reg/w0  : ../action.c: action 'action-2-builtin:omfwd': called, logging to builtin:omfwd (susp 0/0, direct q 1)
4156.793430284:main Q:Reg/w0  : ../action.c: action 'action-2-builtin:omfwd': is transactional - executing in commit phase
4156.793433705:main Q:Reg/w0  : ../action.c: actionPrepare[action-2-builtin:omfwd]: enter
4156.793437112:main Q:Reg/w0  : ../action.c: checking external state file
4156.793440629:main Q:Reg/w0  : ../action.c: done checking external state file, iRet=0
4156.793444108:main Q:Reg/w0  : omfwd.c: omfwd: beginTransaction
4156.793447611:main Q:Reg/w0  : omfwd.c: omfwd: doTryResume rubberducky1.quack.quack iRet 0
4156.793451074:main Q:Reg/w0  : ../action.c: action[action-2-builtin:omfwd] transitioned to state: itx
4156.793454689:main Q:Reg/w0  : ../action.c: action 'action-2-builtin:omfwd': set suspended state to 0
4156.793458102:main Q:Reg/w0  : rainerscript.c:     ACTION 3 [builtin:omfwd:@rubberducky2.quack.quack:5309]
4156.793467738:main Q:Reg/w0  : ruleset.c: executing action 3
4156.793471516:main Q:Reg/w0  : ../action.c: action 'action-3-builtin:omfwd': called, logging to builtin:omfwd (susp 0/0, direct q 1)
4156.793475204:main Q:Reg/w0  : ../action.c: action 'action-3-builtin:omfwd': is transactional - executing in commit phase
4156.793478688:main Q:Reg/w0  : ../action.c: actionPrepare[action-3-builtin:omfwd]: enter
4156.793483849:main Q:Reg/w0  : ../action.c: checking external state file

However strace never shows that with unlink. These aren't the same files as they were ran at different times but we only have state files for audit logs. /var/log/audit/audit.log:

[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8620608:7c55a701dd8e041e", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC|O_CLOEXEC, 0600) = 22
[pid 3173847] unlink("/var/spool/rsyslog/imfile-state:8620608") = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8615038:ff0da1b277cd29f3", O_RDONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8615038", O_RDONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 3173847] newfstatat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:-var-log-audit-audit.log", 0x7fe936e283f0, 0) = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8615038:ff0da1b277cd29f3", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC|O_CLOEXEC, 0600) = 22
[pid 3173847] unlink("/var/spool/rsyslog/imfile-state:8615038") = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8615038:ff0da1b277cd29f3", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC|O_CLOEXEC, 0600) = 22
[pid 3173847] unlink("/var/spool/rsyslog/imfile-state:8615038") = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8620502:20b52b2c9d03a8cc", O_RDONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8620502", O_RDONLY|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 3173847] newfstatat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:-var-log-audit-audit.log", 0x7fe936e283f0, 0) = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8620502:20b52b2c9d03a8cc", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC|O_CLOEXEC, 0600) = 22
[pid 3173847] unlink("/var/spool/rsyslog/imfile-state:8620502") = -1 ENOENT (No such file or directory)
[pid 3173847] openat(AT_FDCWD, "/var/spool/rsyslog/imfile-state:8620502:20b52b2c9d03a8cc", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC|O_CLOEXEC, 0600) = 22

Here via strace you can see the argument supplied to unlink is missing the file ID hash.

I found this to be similar - updated a month ago - trying to see if this will work: #5258

@zbukhari-apex
Copy link

The fix used in #5258 did not work for me. Stale files are still left and the unlink call is not given the right argument.

@powerfulkelly
Copy link

I am also hit this issue with version 8.2312.0.
Has this issue been resolved or is there any workaround that can support this “deleteStateOnFileDelete="on"”?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants