LoggingThread should recover from temporary outage on hub #60
rohanpm added a commit to rohanpm/kobo that referenced this issue on Jan 4, 2019
Previously, LoggingThread would recover from any XML-RPC fault, but would stop when any other type of exception was encountered. That is a problem, as it means the worker will permanently give up sending messages to the hub when all kinds of temporary issues occur (e.g. a temporary network disruption between worker and hub). The task underneath may continue running for hours, with all log messages being discarded. Given the nature of this thread, it makes more sense to attempt recovering from *all* kinds of errors, as we should try hard not to lose log messages from a task. Fixes release-engineering#60
rohanpm added a commit to rohanpm/kobo that referenced this issue on Jan 13, 2019
rohanpm added a commit to rohanpm/kobo that referenced this issue on Mar 9, 2022
When a task runs, LoggingThread is responsible for sending all task logs to hub via XML-RPC. The handling of errors during this process was:

1) if an XML-RPC fault: retry an unlimited amount of times
2) if anything else: thread silently exits and logs stop forever

This commit aims to improve the behavior in case (2). If the LoggingThread is about to die (which does happen sometimes in practice), we should at least try logging the relevant exception to the worker's local log file.

This relates to issue release-engineering#60 which suggests that case (2) should also retry. That might still make sense, but I'm reluctant to have this retry on all kinds of exceptions without first understanding what exceptions can be hit in practice. So, let's first fix up the logging, then maybe go back and adjust retry behavior later based on what we find.
crungehottman added a commit to crungehottman/kobo that referenced this issue on Feb 6, 2024
Previously, LoggingThread would recover from any XML-RPC fault, but would stop when any other type of exception was encountered. That is a problem, as it means the worker will permanently give up sending messages to the hub when all kinds of temporary issues occur (e.g. a temporary network disruption between worker and hub). The task underneath may continue running for hours, with all log messages being discarded. Given the nature of this thread, it makes more sense to attempt recovering from all kinds of errors, as we should try hard not to lose log messages from a task. Fixes release-engineering#60 (This commit is a reimplementation of release-engineering#106)
lzaoral pushed a commit to lzaoral/kobo that referenced this issue on Mar 12, 2024
When a task is running in a kobo worker, it's the responsibility of the LoggingThread to send log messages back to the hub.
This is done by the following loop:
Problem: if self._hub.upload_task_log raises an error (other than an XML-RPC Fault), then the logging thread simply stops. However, the main thread of the task being executed doesn't necessarily stop.
The end result of this is that if the kobo hub has a temporary outage while a task is in progress, that task might continue executing but all logs after the outage would be lost.
It would be better if the LoggingThread kept retrying the task log uploads, for as long as the task's main thread is alive.
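As a rough illustration of the behavior requested above, here is a minimal sketch of a logging thread that keeps unsent data buffered and retries after any exception, for as long as the task is alive. This is not kobo's actual `LoggingThread`: the class names `RetryingLoggingThread` and `FlakyHub`, the `task_alive` callable, the `retry_delay` parameter, and the single-argument `upload_task_log(data)` signature are all assumptions made for the example.

```python
import logging
import queue
import threading
import time

log = logging.getLogger(__name__)


class FlakyHub:
    """Hypothetical hub stand-in: raises the given errors in order, then succeeds."""

    def __init__(self, errors):
        self._errors = list(errors)
        self.received = []

    def upload_task_log(self, data):
        if self._errors:
            raise self._errors.pop(0)
        self.received.append(data)


class RetryingLoggingThread(threading.Thread):
    """Illustrative sketch only; not kobo's real LoggingThread."""

    def __init__(self, hub, task_alive, retry_delay=5.0):
        super().__init__(daemon=True)
        self._hub = hub                  # assumed to expose upload_task_log(data)
        self._task_alive = task_alive    # callable: is the task's main thread alive?
        self._retry_delay = retry_delay  # seconds to wait between attempts
        self._queue = queue.Queue()
        self._stop_event = threading.Event()

    def write(self, data):
        self._queue.put(data)

    def stop(self):
        self._stop_event.set()

    def _drain(self):
        """Return everything currently queued as one string, without blocking."""
        chunks = []
        try:
            while True:
                chunks.append(self._queue.get_nowait())
        except queue.Empty:
            pass
        return "".join(chunks)

    def run(self):
        pending = ""
        while True:
            pending += self._drain()
            if pending:
                try:
                    self._hub.upload_task_log(pending)
                    pending = ""
                    continue  # look for more queued data right away
                except Exception:
                    # XML-RPC faults and all other errors are treated alike:
                    # keep the buffer and try again later.
                    log.warning("log upload failed, will retry", exc_info=True)
                    if not self._task_alive():
                        return  # the task is gone, so stop retrying
            elif self._stop_event.is_set():
                return  # asked to stop and nothing is left to send
            time.sleep(self._retry_delay)
```

With `FlakyHub([ConnectionError("hub down")])` as the hub, a thread built this way holds on to the queued text through the failed attempt and delivers it on a later one. Whether retrying on every exception type is appropriate is a design choice; a narrower `except` clause or a capped buffer size may be preferable in practice.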