Monitoring - Look for specific messages in retries #2451
@@ -207,9 +207,14 @@ def _query_timeseries_with_retries():
    def _has_timeseries(result):
        return len(list(result)) > 0

    def _unknown_metric(result):
        return ('The provided filter doesn\'t refer to any known '
                'metric.' in result.message)
    retry_result = RetryResult(_has_timeseries,
                               max_tries=MAX_RETRIES)(client.query)
    return RetryErrors(BadRequest, max_tries=MAX_RETRIES)(retry_result)
    return RetryErrors(BadRequest, _unknown_metric,
RE: "generic messages"
Sometimes the error payload isn't even JSON:
Sometimes it's a gRPC error (which is usually full of good info):
But usually we can get specific error info from the error response:
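To illustrate the JSON versus non-JSON cases, here is a rough sketch of pulling a message out of an error payload. The helper name `extract_error_message` is hypothetical, and the `{"error": {"message": ...}}` shape is an assumption about the JSON case, not something taken from this PR.

```python
import json


def extract_error_message(payload):
    """Return a readable message from an error payload (hypothetical helper)."""
    if isinstance(payload, bytes):
        payload = payload.decode('utf-8', errors='replace')
    try:
        data = json.loads(payload)
    except ValueError:
        # Not JSON at all (for example an HTML error page): keep the raw text.
        return payload.strip()
    # Assumed shape: {"error": {"message": "..."}}.
    if isinstance(data, dict):
        error = data.get('error')
        if isinstance(error, dict) and 'message' in error:
            return error['message']
    # Valid JSON, but not in the assumed shape: fall back to the raw text.
    return payload.strip()
```

A retry predicate could then match on the returned string instead of on the status code alone.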
Review comments addressed. I am looking at the other retries; some of them are a bit harder to repro, so I am still playing with them.
@@ -29,6 +29,21 @@
retry_500 = RetryErrors(InternalServerError)
retry_503 = RetryErrors(ServiceUnavailable)

# Retry predicates
@dhermes I also don't see anything in the
That's why I mentioned
@waprin I can just take over that issue if you like. Wasn't trying to make it an undue burden on you.
@dhermes definitely not an undue burden, but you seem to understand what you want better, so I'm happy to punt it to you. If you change your mind, I am more than happy to do it.
@waprin You wrote:
That's incorrect.
I'm not understanding the problem you are running into. Please don't modify the class. It's working as intended.
Yes, I was totally confused and misunderstood the problem I had previously encountered.
Looked at it again and realized this is the issue: by re-creating the Query object I was getting a new
This issue may no longer be relevant due to its age. Feel free to re-open.
This starts to address #2415.
This was the most obvious check to add. As for the other errors, the 404s, 500s, and 503s all provide very generic error messages (and really we need the API team to just fix the 500s).
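For readers who want the idea in isolation, here is a minimal sketch of retrying only when the error message matches. It is not the actual `RetryErrors` helper; `retry_errors`, `unknown_metric`, the stand-in `BadRequest` class, and the injectable `sleep` parameter are all hypothetical names used for illustration.

```python
import time


class BadRequest(Exception):
    """Stand-in for the real exception; assumed to carry a ``message``."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


def retry_errors(exception, error_predicate=lambda exc: True,
                 max_tries=4, delay=1, backoff=2, sleep=time.sleep):
    """Retry the wrapped callable while ``exception`` passes the predicate."""
    def decorator(func):
        def wrapped(*args, **kwargs):
            wait = delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exception as caught:
                    # Re-raise at once if this is not the error we expect,
                    # or if this was the last allowed attempt.
                    if not error_predicate(caught) or attempt == max_tries:
                        raise
                    sleep(wait)
                    wait *= backoff
        return wrapped
    return decorator


def unknown_metric(exc):
    """Match only the specific message this PR checks for."""
    return ("The provided filter doesn't refer to any known metric."
            in exc.message)
```

Wrapping a call as `retry_errors(BadRequest, unknown_metric)(some_query)` would then retry the "unknown metric" case and let any other `BadRequest` fail immediately.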
As for the retry logic, I'm not convinced it can be significantly improved: using a base of 3 makes the jumps too big. Maybe we could start from a higher number, but I think that would complicate the retry logic to save a few seconds at best.
So I am voting to just close #2415 after this is merged, but let me know if you disagree.
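To put rough numbers on the base comparison, here is a small worked example. The one-second first delay and the four waits are assumptions chosen for illustration, not the values used in the test helpers, and `backoff_delays` is a hypothetical function.

```python
def backoff_delays(base, tries, first_delay=1):
    """Return the waits (in seconds) before each retry, growing by ``base``."""
    return [first_delay * base ** n for n in range(tries)]


doubling = backoff_delays(2, 4)  # [1, 2, 4, 8]
tripling = backoff_delays(3, 4)  # [1, 3, 9, 27]

# Total time spent waiting across all four retries.
print(sum(doubling), sum(tripling))  # 15 40
```

Under these assumptions the last wait alone jumps from 8 to 27 seconds when the base goes from 2 to 3.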