-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow reporting gRPC codes in "status_code" label of request duration metrics #424
Conversation
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
// This implementation differs from status.FromError() because the | ||
// latter checks only if the given error can be cast to status.Status, | ||
// and doesn't check other errors in the given error's tree. | ||
func ErrorToStatus(err error) (*status.Status, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This current httpgrpc.statusFromError()
function has been renamed to ErrorToStatus()
, exported and moved here.
grpcstatus "google.golang.org/grpc/status" | ||
) | ||
|
||
func TestErrorToStatus(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current httpgrpc.TestStatusFromError()
has been moved here.
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
741ad95
to
d9ac52a
Compare
) | ||
|
||
func TestAppendMessageSizeToOutgoingContext(t *testing.T) { | ||
ctx := context.Background() | ||
|
||
req := &httpgrpc.HTTPRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been replaced with fakeSizer
to avoid a circular dependency. The parameter of AppendMessageSizeToOutgoingContext
should be any implementation of the Sizer
interface, and httpgrpc.HTTPRequest
was just one of them.
grpcutil/cancel.go
Outdated
) | ||
|
||
// IsCanceled checks whether an error comes from an operation being canceled | ||
func IsCanceled(err error) bool { | ||
if errors.Is(err, context.Canceled) { | ||
return true | ||
} | ||
s, ok := status.FromError(err) | ||
s, ok := ErrorToStatus(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing full conversion to gogo's status.Status
seems unnecessary here, if all we care about is code from grpc.Status. Perhaps we can have ErrorToStatusCode
function too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true that we need only the status code here, but in order to get it, we anyway need to call a gogostatus.FromError()
or grpcstatus.FromError()
(the current implementation).
My idea here was to unify the way we extract statuses. Because in most of the places it is done by gogostatus
, and in a couple of places it is done by grpcstatus
.
The idea is, anyway, to get rid of the gogo
-related things, but once we find a valid alternative. If we now start mixing gogo
and grpc
usages, it will be more difficult to do the change one day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we now start mixing gogo and grpc usages, it will be more difficult to do the change one day.
I don't see how my proposal would do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how my proposal would do that?
I was referring to the current implementation: it added a grpcstatus
usage to dskit
. And my motivation with the proposed change was to get rid of it. I meant that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the current implementation: it added a
grpcstatus
usage todskit
. And my motivation with the proposed change was to get rid of it. I meant that.
We need to check for GRPCStatus() *grpcstatus.Status
method, is this what you're referring to? I don't see a way around it.
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
32a09f0
to
f180cf0
Compare
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
…LabelOption structs Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is getting bigger and bigger :/
middleware/grpc_instrumentation.go
Outdated
statusCode := grpcutil.ErrorToStatusCode(err) | ||
|
||
if statusCode == codes.Canceled { | ||
return statusCode | ||
} | ||
|
||
if isHTTPStatusCode(statusCode) { | ||
return statusCode | ||
} | ||
if i.acceptGRPCStatuses { | ||
return statusCode | ||
} | ||
return codes.Unknown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is doing too much -- it should just return grpcutil.ErrorToStatusCode(err)
for non-nil and not-canceled errors. (It also doesn't need to be a method of InstrumentationLabel
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is the only place where we actually take into account whether i.acceptGRPCStatues
is set to true or not. If we move it out from here, we need to put it in statusCodeToString()
function. I can try to see how it would look like, but it will for sure complicate the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try to see how it would look like, but it will for sure complicate the test.
I'm not sure if test is too complicated or not, but I think this function looks good now. Thank you for giving it a try.
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
594186e
to
907aa3c
Compare
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
907aa3c
to
b0e0b74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work 👏
instrumentationOptions []InstrumentationOption | ||
expectedAcceptGRPCStatuses bool | ||
}{ | ||
"Applying no InstrimentationOption sets acceptGRPCStatus to false": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo in InstrimentationOption
func TestApplyInstrumentationLabelOptions(t *testing.T) { | ||
testCases := map[string]struct { | ||
instrumentationOptions []InstrumentationOption | ||
expectedAcceptGRPCStatuses bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's now called "reportGRPCStatus".
CHANGELOG.md
Outdated
@@ -155,6 +155,13 @@ | |||
* [ENHANCEMENT] Memcached: allow to configure write and read buffer size (in bytes). #414 | |||
* [ENHANCEMENT] Server: Add `-server.http-read-header-timeout` option to specify timeout for reading HTTP request header. It defaults to 0, in which case reading of headers can take up to `-server.http-read-timeout`, leaving no time for reading body, if there's any. #423 | |||
* [ENHANCEMENT] Make httpgrpc.Server produce non-loggable errors when a header with key `httpgrpc.DoNotLogErrorHeaderKey` and any value is present in the HTTP response. #421 | |||
* [ENHANCEMENT] Server: Add `-server.report-grpc-codes-in-instrumentation-label` CLI flag to specify whether gRPC status codes should be used in instrumentation labels. It defaults to false, meaning that gRPC status codes are represented with `error` value. #424 | |||
* [ENHANCEMENT] Instrumentation: `middleware.InstrumentationOption` struct, and a special value `middleware.ReportGRPCStatusOption` have been added to allow both server and clients to configure gRPC status code usages in instrumentation labels. This can be optionally used in the following functions: #424 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* [ENHANCEMENT] Instrumentation: `middleware.InstrumentationOption` struct, and a special value `middleware.ReportGRPCStatusOption` have been added to allow both server and clients to configure gRPC status code usages in instrumentation labels. This can be optionally used in the following functions: #424 | |
* [ENHANCEMENT] Instrumentation: Added `middleware.ReportGRPCStatusOption` that can be passed to the following functions to enable reporting of gRPC status codes in "status_code" label (instead of simplified "error", "cancel" or "success" values): #424 |
CHANGELOG.md
Outdated
@@ -155,6 +155,13 @@ | |||
* [ENHANCEMENT] Memcached: allow to configure write and read buffer size (in bytes). #414 | |||
* [ENHANCEMENT] Server: Add `-server.http-read-header-timeout` option to specify timeout for reading HTTP request header. It defaults to 0, in which case reading of headers can take up to `-server.http-read-timeout`, leaving no time for reading body, if there's any. #423 | |||
* [ENHANCEMENT] Make httpgrpc.Server produce non-loggable errors when a header with key `httpgrpc.DoNotLogErrorHeaderKey` and any value is present in the HTTP response. #421 | |||
* [ENHANCEMENT] Server: Add `-server.report-grpc-codes-in-instrumentation-label` CLI flag to specify whether gRPC status codes should be used in instrumentation labels. It defaults to false, meaning that gRPC status codes are represented with `error` value. #424 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* [ENHANCEMENT] Server: Add `-server.report-grpc-codes-in-instrumentation-label` CLI flag to specify whether gRPC status codes should be used in instrumentation labels. It defaults to false, meaning that gRPC status codes are represented with `error` value. #424 | |
* [ENHANCEMENT] Server: Add `-server.report-grpc-codes-in-instrumentation-label` CLI flag to specify whether gRPC status codes should be used in `status_code` label of request duration metric. It defaults to false, meaning that gRPC status codes are represented with `error` value. #424 |
middleware/grpc_instrumentation.go
Outdated
// errorToStatusCode extracts a status code from the given error, and does the following: | ||
// | ||
// - If the error is nil, codes.OK is returned. | ||
// - If the error corresponds to context.Canceled, codes.Canceled is returned. | ||
// - Otherwise, the actual status code of the error is returned. | ||
func errorToStatusCode(err error) codes.Code { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This description of implementation seems unnecessary. Same for next function. (I'd only comment public functions, of functions with tricky behaviour. But this is pretty straightforward)
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
What this PR does:
Before this PR
grpc_instrumentation.go
could recognize only the errors built by thehttpgrpc
package and use their error codes in thestatus_code
label of the request duration metrics. All other gRPC errors were treated as unknown, and used to be labeled aserror
. This PR fixes that erroneous behaviour, and allows all gRPC status codes to be used as thestatus_code
label.Types
middleware.InstrumentationLabel
andmiddleware.InstrumentationOption
, as well as a special value of the latter,middleware.ReportGRPCStatusOption
have been introduced. This special value can be used to enable reporting of gRPC status codes instatus_code
label (instead of simplified "error", "cancel" or "success" values).middleware.ReportGRPCStatusOption
can be passed as an optional argument to the following methods:middleware.UnaryServerInstrumentInterceptor
middleware.StreamServerInstrumentInterceptor
middleware.UnaryClientInstrumentInterceptor
middleware.StreamClientInstrumentInterceptor
middleware.Instrument
.In order to guarantee backwards compatibility, reporting of gRPC codes as labels is disabled by default and could be enabled as follows:
-server.report-grpc-codes-in-instrumentation-label
totrue
, or by settingserver.Config.ReportGRPCCodesInInstrumentationLabel
totrue
.middleware.ReportGRPCStatusOption
tomiddleware.Instrument
.Which issue(s) this PR fixes:
Part of grafana/mimir#6008.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]