Race condition on startup with janitor process #26
Comments
Scratch that: if there's a failure, the file will not be picked up again, because its extension is changed before the archiving attempt and is not changed back on error. So files will slowly accumulate beyond the rotation threshold whenever these errors occur.
Before logfiles are rotated (i.e., removed), a series of processing tasks is performed on them. The first of these is to archive the logfile into the NGAS server itself, if the server has been configured for this. During normal operations this is not a problem, but on the very first try, when the janitor process has just been created and the HTTP server might not be bound yet, it might result in an ECONNREFUSED error. This produced big error messages in the logs, while in reality this is a transient error that should disappear on the next try.

A second, more general problem was found while inspecting this code: logfiles were not renamed back to their original ".unsaved" extension when errors happened. This meant that when errors in general were found (and in particular when ECONNREFUSED was raised), logfiles were not picked up by successive janitor cycles.

This commit addresses these problems, improving the handling of the ECONNREFUSED error in particular, and of errors in general. On the one hand, when the ECONNREFUSED error is encountered we simply issue a warning log statement instead of letting the exception propagate up through the stack. On the other hand, if *any* error happens during archiving we rename the file back to its original *.unsaved name so it gets picked up again in the next janitor cycle.

To make the code and error handling a bit simpler I took the chance to move the archiving of files into a separate try_archiving() function, whose invocation is then surrounded by the error handling block. Additionally, I added a sorted() call to process unsaved logfiles in time order, which until now wasn't guaranteed (and is a nice property to have).

This commit addresses #26.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
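The error handling described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the actual NGAS code: `process_unsaved_logfiles`, the `archive` callable, and the convention of stripping the `.unsaved` suffix while a file is being archived are all assumptions made for the example. It also assumes that logfile names embed a timestamp, so that `sorted()` yields time order.

```python
import errno
import logging
import os

logger = logging.getLogger(__name__)

UNSAVED_EXT = ".unsaved"


def process_unsaved_logfiles(logdir, archive):
    """Try to archive every *.unsaved file in logdir, oldest name first.

    `archive` is a hypothetical callable that uploads a single file and
    raises an exception on failure. Returns the paths archived successfully.
    """
    archived = []
    # sorted() gives time order only if the names embed a timestamp (assumed)
    for name in sorted(os.listdir(logdir)):
        if not name.endswith(UNSAVED_EXT):
            continue
        unsaved = os.path.join(logdir, name)
        # Assumed naming scheme: drop the suffix while archiving is attempted
        in_progress = unsaved[: -len(UNSAVED_EXT)]
        os.rename(unsaved, in_progress)
        try:
            archive(in_progress)
        except Exception as e:
            # On *any* error, restore the original name so that the next
            # janitor cycle picks the file up again
            os.rename(in_progress, unsaved)
            if isinstance(e, OSError) and e.errno == errno.ECONNREFUSED:
                # Transient at startup: the HTTP server may not be bound yet
                logger.warning("Server not ready, will retry %s later", name)
            else:
                logger.exception("Error while archiving %s", name)
            continue
        archived.append(in_progress)
    return archived
```

With this shape, a refused connection on the first janitor cycle costs only a warning line, and the same files are retried on the following cycle.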
On startup NGAS creates a janitor process. The janitor process reports an error when it tries to archive a rotated log file. The error occurs because the NGAS HTTP server is not yet ready to handle requests. Here are the log messages...
EDIT by rtobar to format error message