error starting up action queue #3622
Queue type 'disk' is very slow: it means no messages are kept in memory, everything is written to disk, and all queue activity requires disk I/O. If you have a very fast disk (say a PCIe-attached SSD), this may only be a slowdown of 1000x or so; if you don't have a fast disk, it could be much slower.
Is this what you are intending?
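For reference, a pure disk queue like the one under discussion is attached to an action roughly like this. This is a sketch only: the target host, port, and size are illustrative, not taken from the reporter's actual configuration (the queue file name is from later in this report):

```
action(type="omrelp" target="logcollector.example.com" port="2514"
       queue.type="Disk"           # no in-memory part: every message hits disk
       queue.filename="outputToLogCollectorJsonDataQueue"
       queue.maxdiskspace="1g")
```

A disk-assisted queue (`queue.type="LinkedList"` plus a `queue.filename`) keeps messages in memory and spills to disk only under pressure, which avoids most of the slowdown described above.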
David Lang
Yes, I don't care about slowness; I am actually writing to an SSD, which is good enough for my purpose. Regards. P.S.: As indicated above, my question is not related to slowness anyway.
Then you need to do more; look at the syncing/commit options.
Also, you set the queue size rather small. When the queue fills up, it will stop
processing logs, so you will then either lose logs on the sender, or the
sending systems will stop running (including you being unable to log in to them,
as the login process writes logs).
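The syncing/commit and size options mentioned here are queue parameters on the action. A sketch of the relevant knobs (target and values are illustrative assumptions, not the reporter's configuration):

```
action(type="omrelp" target="logcollector.example.com" port="2514"
       queue.type="Disk"
       queue.filename="outputToLogCollectorJsonDataQueue"
       queue.size="100000"          # messages the queue may hold before it is full
       queue.syncqueuefiles="on"    # fsync queue files after writes: safer, but slower
       queue.timeoutenqueue="2000") # ms a sender is stalled on a full queue before drop
```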
David Lang
Hi David Lang, I think you missed the point.
Could someone please answer these questions:
Thanks,
OK, on a better reading of your post...
Yes, you are right, I need two different queues: one for the system logs (i.e. with the possibility to drop messages) and another for my application (i.e. with the slow-down mechanism). OK, thanks for the suggestion. Anyway, I think the problem I am facing has a different root cause. Regards,
On Mon, 15 Apr 2019, foxpluto wrote:
Hi David Lang,
I think you missed the point.
I agree with you, the queue is slow, but:
* it should never miss a log: my log source is a reliable TCP source, and as described in the documentation it performs a slow-down in case of congestion (see the chapter "Filled-Up Queues" in https://www.rsyslog.com/doc/v8-stable/concepts/queues.html)
note that TCP logging is not always reliable
https://rainer.gerhards.net/2008/04/on-unreliability-of-plain-tcp-syslog.html
* even if the queue fills up and no slow-down mechanism is in place, there's no reason why a restart shouldn't recover the problem!
Please could someone answer these questions:
* is it an error to have `outputToLogCollectorJsonDataQueue.qi` empty?
No, that file is created on shutdown or as needed (I think there is an option in
the most recent versions to write this file sooner). It sounds as if you may have
had an unclean shutdown.
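The option referred to here is presumably `queue.checkpointinterval`, which makes rsyslog update the `.qi` housekeeping file periodically instead of only at shutdown; that this is the exact option David means is my assumption. A sketch (target and interval are illustrative):

```
action(type="omrelp" target="logcollector.example.com" port="2514"
       queue.type="Disk"
       queue.filename="outputToLogCollectorJsonDataQueue"
       queue.checkpointinterval="1000")  # update the .qi file every 1000 records
```

With checkpointing on, an unclean shutdown loses at most the records since the last checkpoint rather than leaving the `.qi` file empty.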
* why does restarting rsyslogd not recover the problem? In the messages I still read: `rsyslogd: error starting up action queue [v8.1901.0 try https://www.rsyslog.com/e/2026 ]`
there is a utility that will rebuild the .qi file.
* why does just deleting the queue files manually recover the problem?
because it throws away the queue data that is corrupted and lets it start from
scratch.
David Lang
Hi David Lang, thanks for your quick and very detailed answer.
OK, the tool you are referring to is probably this one on GitHub: https://gist.github.com/wilrnh/9373137 I read that a lot of people have been facing the very same problem as mine for many years. Will there be some queue-check startup mechanism in the future? Regards,
On Tue, 16 Apr 2019, foxpluto wrote:
the tool you are referring to is probably this one on GitHub: https://gist.github.com/wilrnh/9373137
yes, that's the one
I read that a lot of people have been facing the very same problem as mine for many years.
Rsyslog should perform a queue check at startup and recover by itself from the main queue-corruption problems.
the recovery can take quite a while to run (especially if it's got a large disk
queue)
Starting up with only a warning in the messages, and no other indication that the queue is corrupted and will never be usable without human intervention, is not good behaviour.
Will there be some queue-check startup mechanism in the future?
Do you have a plan to develop this feature, or should it come as a pull request?
It's not clear that automatically recovering is the right thing to do every time.
There are no plans to add this at this time.
I would welcome a pull request that does something more than report a
problem when a disk queue is corrupt.
I would suggest multiple actions:
1. delete the queue files
2. move the queue files out of the way (add a suffix, but first check whether something
with that suffix already exists; if so, move those files to .suffix.suffix,
after checking...)
3. run a rebuild, even though this can delay the start of rsyslog by a
significant amount of time.
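Option 2 above can be sketched as a small shell function. This is a hypothetical illustration of the suffix-shifting idea, not code from rsyslog: before renaming a file to `file.suffix`, it first (recursively) shifts any existing `file.suffix` to `file.suffix.suffix`, so nothing is ever overwritten.

```shell
#!/bin/sh
# set_aside FILE SUFFIX: rename FILE to FILE$SUFFIX, first pushing any file
# already occupying that name one suffix deeper (recursively).
set_aside() {
    local file="$1"
    local suffix="$2"
    if [ -e "$file$suffix" ]; then
        # Free the target name before we take it.
        set_aside "$file$suffix" "$suffix"
    fi
    mv "$file" "$file$suffix"
}
```

For example, `set_aside q.qi .bak` with an existing `q.qi.bak` first moves `q.qi.bak` to `q.qi.bak.bak`, then `q.qi` to `q.qi.bak`.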
David Lang
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Dear support,
I have a remote rsyslogd configured to log to a logcollector via RELP TLS. The purpose of the remote installation is to send all the logs to the logcollector in a secure way over the internet, coping with disconnections and power outages of the server.
For this reason I have configured a sending action with a local disk queue, as indicated above.
I have experienced a couple of strange problems. After a reboot I found this error in /var/log/messages:
After this problem no more logs are sent to the logcollector; all the logs are silently dropped.
Restarting rsyslog doesn't recover the error; the only way to restore normal operation is to stop rsyslog, delete these two files:
and restart rsyslog.
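The manual recovery described above could be scripted roughly as follows. This is a sketch under assumptions: the spool directory is rsyslog's common default `/var/spool/rsyslog`, and the queue file prefix is taken from the file names mentioned later in this report; adjust both to match your `workDirectory` and `queue.filename`. Stop rsyslog before running it and start rsyslog afterwards.

```shell
#!/bin/sh
# recover_queue SPOOL PREFIX: remove the on-disk queue files for one queue
# (the .qi housekeeping file and the numbered data segments) so rsyslog
# starts with a fresh, empty disk queue. Queued messages are lost.
recover_queue() {
    spool="$1"    # e.g. /var/spool/rsyslog
    prefix="$2"   # e.g. outputToLogCollectorJsonDataQueue
    rm -f "$spool/$prefix".qi "$spool/$prefix".[0-9]*
}
```

Between `systemctl stop rsyslog` and `systemctl start rsyslog`, one would call e.g. `recover_queue /var/spool/rsyslog outputToLogCollectorJsonDataQueue`.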
As you can see, the
outputToLogCollectorJsonDataQueue.qi
file is strangely empty. The
outputToLogCollectorJsonDataQueue.00005894
file doesn't seem corrupted; at the end I can read:
Expected behavior
I expect rsyslog not to stop sending messages to the logcollector, and to recover from the problem by itself.
Actual behavior
Sometimes the remote rsyslog stops sending logs to the logcollector server.
Steps to reproduce the behavior
Unknown; a power outage can probably sometimes recreate the problem.
Environment
thanks for your time and support.
Regards,
Stefano