-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make logging of RPC retries configurable #2486
Conversation
This simplifies away unnecessary propagation, cleans up argument lists. As far as I can tell, log_exceptions is True in all execution paths. It seems that 6dfe9af removed the last occasion of it being (spuriously) set to False.
0978652
to
d2d3944
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
@@ -116,31 +116,34 @@ def __init__(self, url='http://localhost:8082/', connect_timeout=None): | |||
|
|||
self._rpc_retry_attempts = config.getint('core', 'rpc-retry-attempts', 3) | |||
self._rpc_retry_wait = config.getint('core', 'rpc-retry-wait', 30) | |||
self._rpc_log_retries = config.getboolean('core', 'rpc-log-retries', True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this part of the code still doesn't use the config class syntax? Oh well maybe better done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, better be a separated PR.
try: | ||
self.get_work(fetch_results) | ||
self.fail("get_work should have thrown RPCError") | ||
except luigi.rpc.RPCError as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you use self.assertRaises?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done to assert about e.sub_exception.
Motivation and Context
Logging of intermittent RPC exceptions and retries as they happen makes logs spammy and obscures other errors, so we want a way to disable it. If the exceptions persist, the final RPCError does log the last exception anyway.
Especially when long running (hanging, even) tasks are considered, with periods of silence in the logs, any intermittent "Failed connecting to remote scheduler" logging from KeepAliveThread looks very flashy and misleads troubleshooting often, in our experience.
The health of communication channels (presence of intermittent failures) probably should be easy to monitor through dedicated tooling, as opposed to logging it interspersed with task logs.
Have you tested this? If so, how?
Ought to be covered with unit tests.