error starting up action queue #3622

Closed
foxpluto opened this issue Apr 15, 2019 · 10 comments

@foxpluto

Dear support,

I have a remote rsyslogd configured to forward logs to a logcollector over RELP with TLS. The purpose of the remote installation is to send all logs to the logcollector securely over the internet while coping with disconnections and power outages of the server.

For this reason I have configured the sending action with a local disk queue, as shown in the configuration below.

I have experienced a couple of strange problems. After a reboot I found this error in /var/log/messages:

Apr 11 09:26:45 datalog01 rsyslogd: error starting up action queue [v8.1901.0 try https://www.rsyslog.com/e/2026 ]
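
Since the failure is otherwise silent, one option is to watch the system log for this message. The following is a minimal sketch, not part of rsyslog: the function name is hypothetical and the pattern is inferred only from the single log line quoted above, so other rsyslog versions may word the message differently.

```python
import re

# Pattern taken from the one log line quoted in this report (assumption:
# the wording is stable enough to match on).
QUEUE_ERROR = re.compile(r"rsyslogd: error starting up action queue")


def find_queue_startup_errors(lines):
    """Return the log lines that report a failed action-queue startup."""
    return [line.rstrip("\n") for line in lines if QUEUE_ERROR.search(line)]
```

It could be fed an open file handle of the system log (for example `/var/log/messages` on a default CentOS 7 install) from a cron job or monitoring check, and any returned lines treated as an alert.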

After this error no more logs are sent to the logcollector; all logs are silently dropped.
Restarting rsyslog does not clear the error. The only way to restore normal operation is to stop rsyslog, delete these two files:

-rw-------.  1 root root 4400886 Apr 11 09:25 outputToLogCollectorJsonDataQueue.00005894
-rw-------.  1 root root       0 Apr 11 09:25 outputToLogCollectorJsonDataQueue.qi

and restart rsyslog.
As you can see, outputToLogCollectorJsonDataQueue.qi is strangely empty (0 bytes).
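
The state shown in the listing, a non-empty numbered queue file next to a zero-byte `.qi` file, can be detected mechanically. This is a rough sketch under the assumption that queue files follow the naming seen here (`<queue.filename>.qi` plus `<queue.filename>.<digits>`); `find_stale_queues` is a hypothetical helper, not an rsyslog tool.

```python
from pathlib import Path


def find_stale_queues(spool_dir):
    """Return (qi_file, segment_files) pairs for empty .qi files with data files.

    A pair is reported only when the .qi file is zero bytes AND at least one
    numbered segment file with the same base name exists beside it.
    """
    spool = Path(spool_dir)
    suspects = []
    for qi in sorted(spool.glob("*.qi")):
        if qi.stat().st_size != 0:
            continue  # a non-empty .qi is not the symptom described here
        segments = sorted(
            p for p in spool.iterdir()
            if p.stem == qi.stem and p.suffix[1:].isdigit()
        )
        if segments:
            suspects.append((qi, segments))
    return suspects
```

Calling `find_stale_queues("/var/lib/rsyslog")` on the spool directory from the configuration in this report would list the affected queue, if any, without modifying anything.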

The outputToLogCollectorJsonDataQueue.00005894 file doesn't seem to be corrupted; at its end I can read:

.
<Obj:1:msg:1:
+iProtocolVersion:2:1:0:
+iSeverity:2:1:7:
+iFacility:2:1:5:
+msgFlags:2:1:0:
+ttGenTime:2:10:1554967498:
+tRcvdAt:3:34:2:2019:4:11:9:24:58:646501:6:+:2:0:
+tTIMESTAMP:3:34:2:2019:4:11:9:24:58:646501:6:+:2:0:
+pszTAG:1:16:rsyslogd-pstats::
+pszRawMsg:1:105:main Q: origin=core.queue size=15 enqueued=14272662 full=0 discarded.full=0 discarded.nf=0 maxqsize=3782 :
+pszHOSTNAME:1:9:datalog01:
+pszInputName:1:8:impstats:
+pszRcvFrom:1:9:datalog01:
+pszRcvFromIP:1:9:127.0.0.1:
+offMSG:2:1:0:
>End
.
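
To check a queue file's tail less by eye, the record layout visible in the dump can be parsed. The sketch below assumes, purely from the excerpt above, that each property line has the form `+name:type:length:value:` and that a record is closed by `>End`; it is not based on rsyslog's own serialization code, and it treats the length as a character count, which only matches a byte count for ASCII content.

```python
def parse_property(line):
    """Split one '+name:type:length:value:' line into (name, type, value)."""
    if not line.startswith("+"):
        raise ValueError("not a property line")
    # The value may itself contain colons, so split only on the first three.
    name, vtype, length, rest = line[1:].split(":", 3)
    n = int(length)
    value, tail = rest[:n], rest[n:]
    if tail != ":":
        raise ValueError("length field does not match the value")
    return name, int(vtype), value


def ends_with_complete_record(text):
    """True if the last meaningful line is '>End' (blank and '.' lines ignored)."""
    lines = [ln for ln in text.splitlines() if ln.strip() not in ("", ".")]
    return bool(lines) and lines[-1] == ">End"
```

For example, `parse_property("+pszTAG:1:16:rsyslogd-pstats::")` yields the name `pszTAG`, type `1`, and the 16-character value `rsyslogd-pstats:`. A file cut off by a power failure in the middle of a record would make `ends_with_complete_record` return `False`.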

Expected behavior

I expect rsyslog to keep sending messages to the logcollector and to recover from the problem by itself.

Actual behavior

Sometimes the remote rsyslog stops sending logs to the logcollector server.

Steps to reproduce the behavior

Unknown; a power outage can probably trigger the problem sometimes.

Environment

  • rsyslog version:
rsyslogd  8.1901.0 (aka 2019.01) compiled with:
	PLATFORM:				x86_64-redhat-linux-gnu
	PLATFORM (lsb_release -d):
	FEATURE_REGEXP:				Yes
	GSSAPI Kerberos 5 support:		Yes
	FEATURE_DEBUG (debug build, slow code):	No
	32bit Atomic operations supported:	Yes
	64bit Atomic operations supported:	Yes
	memory allocator:			system default
	Runtime Instrumentation (slow code):	No
	uuid support:				Yes
	systemd support:			Yes
	Number of Bits in RainerScript integers: 64
  • platform:
CentOS Linux release 7.6.1810 (Core)
  • the config file of the queue:
   action
   (
        type="omrelp"
        target="87.26.192.115"
        port="514"
        name="outputToLogCollector"
        tls="on"
        tls.compression="on"
        tls.caCert="/etc/pki/tls/certs/ca.pem"
        tls.myCert="/etc/pki/tls/certs/datalog01-cert.pem"
        tls.myPrivKey="/etc/pki/tls/private/datalog01-key.pem"
        tls.authmode="name"
        tls.permittedpeer=["logcollector"]
        Action.ResumeRetryCount="-1"
        Action.ResumeInterval="5"
        queue.type="Disk"
        queue.size="250000"
        queue.filename="outputToLogCollectorJsonDataQueue"
        queue.spoolDirectory="/var/lib/rsyslog"
        queue.maxFileSize="16m"
        queue.maxDiskSpace="10G"
        queue.checkpointInterval="1"
        queue.saveOnShutdown="on"
        queue.workerThreads="4"
        queue.workerThreadMinimumMessages="60000"
        queue.timeoutEnqueue="100000"
    )

Thanks for your time and support.

Regards,
Stefano

@davidelang
Contributor

davidelang commented Apr 15, 2019 via email

@foxpluto
Author

Yes,

I don't care about slowness; I am actually writing to an SSD, which is good enough for my purpose.
What I am trying to achieve is to never lose a line of log under any circumstances.

Regards,
S.

P.S. As indicated above, my question is not related to slowness anyway.

@davidelang
Contributor

davidelang commented Apr 15, 2019 via email

@foxpluto
Author

Hi David Lang,

I think you missed the point.
I agree with you that the queue is slow, but:

  • it can never miss a log: my log source is a reliable TCP source and, as described in the documentation, slows down in case of congestion (see the chapter Filled-Up Queues in https://www.rsyslog.com/doc/v8-stable/concepts/queues.html)
  • even if the queue fills up and no slow-down mechanism is in place, there is no reason why a restart should not recover from the problem!

Please could someone answer these questions:

  • is it an error for outputToLogCollectorJsonDataQueue.qi to be empty?
  • why does restarting rsyslogd not fix the problem, so that in messages I still read: rsyslogd: error starting up action queue [v8.1901.0 try https://www.rsyslog.com/e/2026 ]
  • why does just manually deleting the queue files fix the problem?

Thanks,
S.

@foxpluto
Author

"then you need to do more: look at the syncing/commit options. Also, you set the queue size rather small; when the queue fills up, it will stop processing logs, so you will then either lose logs on the sender, or the sending systems will stop running (including you being unable to log in to them, as the login process writes logs)." David Lang

OK, after reading your post more carefully....
Yes, you are right: if the queue fills up, some problems will occur. But let me be clearer about the software I have in place:

  • I have software written by me that uses rsyslog with RELP as a reliable way to move log lines (actually these logs are data read from a source) to a log collector;
  • the same queue is also used to collect the system logs.

Yes, you are right, I need two different queues: one for the system logs (i.e. with the possibility of dropping messages) and another for my application (i.e. with the slow-down mechanism).

OK, thanks for the suggestion.

Anyway, I think the problem I am facing has a different root cause.

Regards,
S.

@davidelang
Contributor

davidelang commented Apr 15, 2019 via email

@foxpluto
Author

Hi David Lang,

thanks for your quick and very detailed answer.
Could you point me to the utility for rebuilding the cache that you mention in your post?
Do you think it is possible to run this utility every time rsyslogd starts?
In my installation I cannot avoid power outages, so this problem will reappear at some point in the future. But if the cache is cleaned by the utility you mentioned every time the system starts up, I will never face losing log lines.
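
As an illustration of what a pre-start step could look like, here is a hedged sketch that automates the manual workaround from this report (remove the queue files when the `.qi` file is empty) but moves the files aside instead of deleting them, so the queued data stays available for later recovery. It is not the utility referred to in the thread and does not rebuild the `.qi` file; `quarantine_stale_queue` and the quarantine directory are hypothetical, and it must only run while rsyslogd is stopped.

```python
import shutil
import time
from pathlib import Path


def quarantine_stale_queue(spool_dir, queue_name, quarantine_root):
    """Move a queue's files aside if its .qi file exists but is empty.

    Returns the list of new paths, or an empty list if nothing was moved.
    """
    spool = Path(spool_dir)
    qi = spool / (queue_name + ".qi")
    if not qi.exists() or qi.stat().st_size > 0:
        return []  # queue state looks usable (or absent): leave it alone
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = Path(quarantine_root) / f"{queue_name}-{stamp}"
    dest.mkdir(parents=True, exist_ok=False)
    moved = []
    for p in sorted(spool.iterdir()):
        # Match only "<queue_name>.qi" and "<queue_name>.<digits>".
        if p.stem == queue_name and (p.suffix == ".qi" or p.suffix[1:].isdigit()):
            shutil.move(str(p), str(dest / p.name))
            moved.append(dest / p.name)
    return moved
```

With the configuration from this report, the call would be `quarantine_stale_queue("/var/lib/rsyslog", "outputToLogCollectorJsonDataQueue", "/var/lib/rsyslog-quarantine")`, where the last path is an arbitrary choice. On a systemd host it could be wired in through an `ExecStartPre=` line of the rsyslog unit, so that it runs before every start.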

Thanks,
S.

@foxpluto
Author

OK,

the tool you are referring to is probably this one on GitHub: https://gist.github.com/wilrnh/9373137

I read that a lot of people have been facing the very same problem as mine for many years.
Rsyslog should perform a queue check at startup and recover by itself from the main problems regarding queue corruption.
Starting up with only a warning in the messages, without any other indication that the queue is corrupted and will never be usable without human intervention, is not good behaviour.

Will there be some queue check mechanism at startup in the future?
Do you have a plan to develop this feature, or is there a pull request for it?

Regards,
S.

@davidelang
Contributor

davidelang commented Apr 16, 2019 via email

@lock

lock bot commented Dec 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Dec 24, 2019
3 participants