Context.canceled handling changes for slo and receiver shim #3505
Closed
@@ -1,6 +1,8 @@
 package frontend

 import (
+	"context"
+	"errors"
 	"net/http"
 	"time"

@@ -66,8 +68,10 @@ func sloHook(allByTenantCounter, withinSLOByTenantCounter *prometheus.CounterVec

 		// most errors are SLO violations
 		if err != nil {
-			// however, if this is a grpc resource exhausted error (429) then we are within SLO
-			if status.Code(err) == codes.ResourceExhausted {
+			// However these errors are considered within SLO:
+			// * grpc resource exhausted error (429)
+			// * context canceled (client disconnected or canceled)
+			if status.Code(err) == codes.ResourceExhausted || errors.Is(err, context.Canceled) {
 				withinSLOByTenantCounter.WithLabelValues(tenant).Inc()
 			}
 			return

Review comment on the changed condition: same thoughts here. Maybe we just log the cancel cause, if one exists, and see whether it's populated?
This brings to mind a difficulty I have on the read path: it's impossible to tell where this context canceled came from. Is it further up in the otel receiver code, due to a client disconnect, or deeper down in the distributor code?
For instance, if we fail to write to 2+ ingesters due to this timeout, I think that would bubble up as a context canceled as well:
tempo/modules/distributor/distributor.go
Lines 392 to 393 in c41b078
context.WithCancelCause was added in Go 1.20 (https://pkg.go.dev/context#WithCancelCause) to allow communicating the reason, but I don't know if it is set correctly in the gRPC server. It's definitely not set in our own code. Maybe we set it in our code and assume that a missing cause means a client disconnect?
We unfortunately cancel contexts in a lot of places and don't have good patterns for when, why, or what is communicated when we do. As is, I think this would mask timeouts to the ingesters.