Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add labels on 1xx, 2xx, etc metrics #33545

Closed
realzero0 opened this issue Apr 16, 2024 · 8 comments
Closed

Add labels on 1xx, 2xx, etc metrics #33545

realzero0 opened this issue Apr 16, 2024 · 8 comments
Labels
area/http area/stats enhancement Feature requests. Not bugs or questions. stale stalebot believes this issue/PR has not been touched recently

Comments

@realzero0
Copy link

realzero0 commented Apr 16, 2024

Title: Add labels on 1xx, 2xx, etc metics

Description:

envoy provides upstream_rq_<*xx> metrics, but in some cases, I wanna aggregate the data using request headers or response headers. If I know that some requests in latest app version fail, I can easily find out that something wrong in latest app version..

Could you add some labels on rq_*xx metrics? labels can be req, res headers..

Actually, in nginx, I can easily add that kind of aggregation using nginx vts module. https://github.com/vozlt/nginx-module-vts?tab=readme-ov-file#vhost_traffic_status_filter
Please consider to add this kind of feature..

[optional Relevant Links:]

Any extra documentation required to understand the issue.

@realzero0 realzero0 added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Apr 16, 2024
@realzero0 realzero0 changed the title Add labels on 1xx, 2xx, etc metics Add labels on 1xx, 2xx, etc metrics Apr 16, 2024
@adisuissa
Copy link
Contributor

Thanks for the suggestion!
I wonder if this can be achieved with access-logs?
I'm not sure if adding labels to stats makes sense, and what the implications will be. cc @jmarantz who may have some prior experience with this.

@realzero0
Copy link
Author

realzero0 commented Apr 17, 2024

@adisuissa
Thank you for your reply!
For your point of view, I wonder why you make upstream_rq_<*xx> kind of metrics in first place.
In my point of view, upstream_rq_<*xx> data are aggregated data. this means I don't have to aggregate data from requests again..
If I want to get some alerts like if 10% of all requests(maybe all 5xx requests) that have 5xx status in specific request header(app version) occur, make an alert, if I only have access log, I have to make some other applications that can aggregate all requests and make metrics that have labels that are response status code(2xx, 3xx, 4xx, etc) and req, res headers...

@adisuissa
Copy link
Contributor

I look at this from an Envoy perspective which isn't looking at specific headers, because there may be many and it will not scale well.
I think you can achieve what you are looking for with a custom filter that defines the metrics, and tracks them on the response path.

@adisuissa adisuissa added area/http area/stats and removed triage Issue requires triage labels Apr 19, 2024
@realzero0
Copy link
Author

@adisuissa
I want to know that many, not scale.. means.
let me know that I did understand it correctly.
many -> headers may be many
not scale -> envoy is not prepared for metrics that have many headers..? I saw the metric logic and it might be not scale right?

envoy is proxy and this is for company level proxy in almost all times.. that means headers might differ per corp by corp.. and headers are restricted in corps. envoy might use control plane (like Istio) for configuration.. If I make a CRD (header metric configuration) for Istio, I can easily collect the metrics (including header information) from envoy proxy.. In this case, what many, not scale problems happen?

sorry I totally don't understand your explanation..

And for the custom filter, should I create a lua filter right..? Do you know any other filters that I can use?

@adisuissa
Copy link
Contributor

Maybe it is I that don't understand your original request.
Can you give a specific example of what you are trying to achieve?

@realzero0
Copy link
Author

realzero0 commented Apr 23, 2024

@adisuissa

let me explain it In prometheus metrics,

right now there are metrics like this.

envoy_cluster_upstream_rq_xx{envoy_response_code_class="2",envoy_cluster_name="a_inbound_cluster"} 10
envoy_cluster_upstream_rq_xx{envoy_response_code_class="3",envoy_cluster_name="a_inbound_cluster"} 20
envoy_cluster_upstream_rq_xx{envoy_response_code_class="5",envoy_cluster_name="a_inbound_cluster"} 30

but I wanna add some configurations and make some labels..

if I set headers like "app_version", "os_version"

envoy_cluster_upstream_rq_xx_filtered{envoy_response_code_class="2", app_version="1.0.0", os_version="Android10", ,envoy_cluster_name="a_inbound_cluster"} 5
envoy_cluster_upstream_rq_xx_filtered{envoy_response_code_class="3", app_version="1.0.0" , os_version="Android10", ,envoy_cluster_name="a_inbound_cluster"} 20
envoy_cluster_upstream_rq_xx_filtered{envoy_response_code_class="5", app_version="1.0.0" , os_version="Android10", ,envoy_cluster_name="a_inbound_cluster"} 29

envoy_cluster_upstream_rq_xx_filtered{envoy_response_code_class="2", app_version="1.0.1" , os_version="Android10", ,envoy_cluster_name="a_inbound_cluster"} 5
envoy_cluster_upstream_rq_xx_filtered{envoy_response_code_class="3", app_version="1.0.1" , os_version="Android10", ,envoy_cluster_name="a_inbound_cluster"} 0
envoy_cluster_upstream_rq_xx_filtered{envoy_response_code_class="5", app_version="1.0.1" , os_version="Android10", ,envoy_cluster_name="a_inbound_cluster"} 1

This is the one that I suggested..
headers (req, res) can be the labels of rq_*xx metrics...

Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label May 23, 2024
Copy link

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/http area/stats enhancement Feature requests. Not bugs or questions. stale stalebot believes this issue/PR has not been touched recently
Projects
None yet
Development

No branches or pull requests

2 participants