New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gRPC status code labels are reported as description strings rather than numbers #11449
Closed
Labels
Comments
thanks for reporting. I'm looking into this. |
adleong
added a commit
to linkerd/linkerd2-proxy
that referenced
this issue
Oct 4, 2023
Fixes linkerd/linkerd2#11449 The `grpc_status` metric label is rendered as a long form, human readable string value in the proxy metrics. For example: ``` response_total{direction="outbound", [...], classification="failure",grpc_status="Unknown error",error=""} 1 ``` This is because of the Display impl for Code. We explicitly convert to an i32 so this renders as a number instead: ``` response_total{direction="outbound", [...] ,classification="failure",grpc_status="2",error=""} 1 ``` Signed-off-by: Alex Leong <alex@buoyant.io>
Merged
hawkw
added a commit
that referenced
this issue
Oct 19, 2023
## edge-23.10.3 This edge release fixes issues in the proxy and destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains other bugfixes and updates dependencies to include patches for the security advisories [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 and GHSA-c827-hfw6-qwvm. * Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination, if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], and [#10590]) * Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unused service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484] and [linkerd2-proxy#2486]) * Added `INFO`-level logging to the proxy when endpoints are added or removed from a load balancer. These logs are enabled by default, and can be disabled by [setting the proxy log level][proxy-log-level] to `warn,linkerd=info,linkerd_proxy_balance=warn` or similar ([linkerd2-proxy#2486]) * Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449]) * Added missing `imagePullSecrets` to `linkerd-jaeger` ServiceAccount ([#11504]) * Updated the control plane's dependency on the `golang.google.org/grpc` Go package to include patches for [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 ([#11496]) * Updated dependencies on `rustix` to include patches for GHSA-c827-hfw6-qwvm ([linkerd2-proxy#2488] and [#11512]). [#10590]: #10590 [#11279]: #11279 [#11483]: #11483 [#11449]: #11449 [#11480]: #11480 [#11504]: #11504 [#11504]: #11512 [linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480 [linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484 [linkerd2-proxy#2486]: linkerd/linkerd2-proxy#2486 [linkerd2-proxy#2488]: linkerd/linkerd2-proxy#2488 [proxy-log-level]: https://linkerd.io/2.14/tasks/modifying-proxy-log-level/ [CVE-2023-44487]: GHSA-qppj-fm5r-hxr3
hawkw
pushed a commit
to linkerd/linkerd2-proxy
that referenced
this issue
Oct 25, 2023
Fixes linkerd/linkerd2#11449 The `grpc_status` metric label is rendered as a long form, human readable string value in the proxy metrics. For example: ``` response_total{direction="outbound", [...], classification="failure",grpc_status="Unknown error",error=""} 1 ``` This is because of the Display impl for Code. We explicitly convert to an i32 so this renders as a number instead: ``` response_total{direction="outbound", [...] ,classification="failure",grpc_status="2",error=""} 1 ``` Signed-off-by: Alex Leong <alex@buoyant.io>
hawkw
pushed a commit
to linkerd/linkerd2-proxy
that referenced
this issue
Oct 25, 2023
Fixes linkerd/linkerd2#11449 The `grpc_status` metric label is rendered as a long form, human readable string value in the proxy metrics. For example: ``` response_total{direction="outbound", [...], classification="failure",grpc_status="Unknown error",error=""} 1 ``` This is because of the Display impl for Code. We explicitly convert to an i32 so this renders as a number instead: ``` response_total{direction="outbound", [...] ,classification="failure",grpc_status="2",error=""} 1 ``` Signed-off-by: Alex Leong <alex@buoyant.io>
hawkw
pushed a commit
to linkerd/linkerd2-proxy
that referenced
this issue
Oct 26, 2023
Fixes linkerd/linkerd2#11449 The `grpc_status` metric label is rendered as a long form, human readable string value in the proxy metrics. For example: ``` response_total{direction="outbound", [...], classification="failure",grpc_status="Unknown error",error=""} 1 ``` This is because of the Display impl for Code. We explicitly convert to an i32 so this renders as a number instead: ``` response_total{direction="outbound", [...] ,classification="failure",grpc_status="2",error=""} 1 ``` Signed-off-by: Alex Leong <alex@buoyant.io>
hawkw
pushed a commit
to linkerd/linkerd2-proxy
that referenced
this issue
Oct 26, 2023
Fixes linkerd/linkerd2#11449 The `grpc_status` metric label is rendered as a long form, human readable string value in the proxy metrics. For example: ``` response_total{direction="outbound", [...], classification="failure",grpc_status="Unknown error",error=""} 1 ``` This is because of the Display impl for Code. We explicitly convert to an i32 so this renders as a number instead: ``` response_total{direction="outbound", [...] ,classification="failure",grpc_status="2",error=""} 1 ``` Signed-off-by: Alex Leong <alex@buoyant.io>
mateiidavid
added a commit
that referenced
this issue
Oct 26, 2023
This stable release fixes issues in the proxy and Destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains a bug fix for profile resolutions for pods bound on host ports and includes patches for security advisory [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 * Control Plane * Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination, if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], [#10590]) * Fixed an issue where the Destination controller would not update pod metadata for profile resolutions for a pod accessed via the host network (e.g. HostPort endpoints) ([#11334]) * Addressed [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 by upgrading several dependencies (including Go's gRPC and net libraries) * Proxy * Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449]) * Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unusred service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484]) [#10590]: #10590 [#11279]: #11279 [#11483]: #11483 [#11480]: #11480 [#11334]: #11334 [#11449]: #11449 [CVE-2023-44487]: GHSA-qppj-fm5r-hxr3 [linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480 [linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484 Signed-off-by: Matei David <matei@buoyant.io>
Merged
mateiidavid
added a commit
that referenced
this issue
Oct 26, 2023
This stable release fixes issues in the proxy and Destination controller which can result in Linkerd proxies sending traffic to stale endpoints. In addition, it contains a bug fix for profile resolutions for pods bound on host ports and includes patches for security advisory [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 * Control Plane * Fixed an issue where the Destination controller could stop processing changes in the endpoints of a destination, if a proxy subscribed to that destination stops reading service discovery updates. This issue results in proxies attempting to send traffic for that destination to stale endpoints ([#11483], fixes [#11480], [#11279], [#10590]) * Fixed an issue where the Destination controller would not update pod metadata for profile resolutions for a pod accessed via the host network (e.g. HostPort endpoints) ([#11334]) * Addressed [CVE-2023-44487]/GHSA-qppj-fm5r-hxr3 by upgrading several dependencies (including Go's gRPC and net libraries) * Proxy * Fixed a regression where the proxy rendered `grpc_status` metric labels as a string rather than as the numeric status code ([linkerd2-proxy#2480]; fixes [#11449]) * Fixed a regression introduced in stable-2.13.0 where proxies would not terminate unusred service discovery watches, exerting backpressure on the Destination controller which could cause it to become stuck ([linkerd2-proxy#2484]) [#10590]: #10590 [#11279]: #11279 [#11483]: #11483 [#11480]: #11480 [#11334]: #11334 [#11449]: #11449 [CVE-2023-44487]: GHSA-qppj-fm5r-hxr3 [linkerd2-proxy#2480]: linkerd/linkerd2-proxy#2480 [linkerd2-proxy#2484]: linkerd/linkerd2-proxy#2484 --------- Signed-off-by: Matei David <matei@buoyant.io> Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io> Co-authored-by: Oliver Gould <ver@buoyant.io>
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
What is the issue?
After upgrading linkerd-proxy to
cr.l5d.io/linkerd/proxy:stable-2.14.0
, thegrpc_status
label for metric becomes a full-sentence description with whitespace like""The operation completed successfully"
rather than a numerical code like"2"
. Likely this was introduced earlier.This change breaks existing dashboards and monitoring that expect the numeric gPRC response codes. These description strings also have whitespace and other characters that don't work well in tags.
How can it be reproduced?
It can be reproduced by running version
stable-2.14.0
of the linkerd-proxy and making gRPC requests to a meshed service. When viewing the output from the /metrics endpoint of prometheus/openmetrics, the output will look something like:I think it'd also be pretty easy to reproduce this in a failing unit test case here: https://github.com/linkerd/linkerd2-proxy/blob/45b324f7b4bc5221b5ca796f79830d04ecce8e79/linkerd/app/core/src/metrics.rs#L425C48-L425C48
Logs, error output, etc
output of
linkerd check -o short
Environment
Possible solution
Add call to
.code()
here so that a numeric code is emitted? https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/core/src/metrics.rs#L425C48-L425C48Additional context
No response
Would you like to work on fixing this bug?
yes
The text was updated successfully, but these errors were encountered: