Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp upstream_rq_timeout virtual cluster counter #23867

Open
Augustyniak opened this issue Nov 7, 2022 · 2 comments
Open

Revamp upstream_rq_timeout virtual cluster counter #23867

Augustyniak opened this issue Nov 7, 2022 · 2 comments
Labels
area/stats enhancement Feature requests. Not bugs or questions. no stalebot Disables stalebot from closing an issue

Comments

@Augustyniak
Copy link
Contributor

Description:
Envoy Mobile does not increment upstream_rq_timeout virtual cluster counter when any of the request it perform times out. For the record, the Envoy Mobile library enables the two following types of timeouts: per_try_timeout and per_try_idle_timeout.

Instead of incrementing upstream_rq_timeout stat, every time Envoy Mobile's request times out upstream_rx_5xx and upstream_rq_504 virtual cluster counters are incremented. From the perspective of mobile clients (Envoy Mobile library) it's not desired to treat timeouts as 504 failures - mobile clients want to see HTTP status codes that come from the server only and do not want Envoy Mobile (or Envoy) to emit its own status codes.

Proposal:

  • Change the implementation of upstream_rq_timeout stat so that it gets incremented when a request timeouts due to either per_try_timeout or per_try_idle_timeout timeout.
  • Stop incrementing upstream_rx_5xx and upstream_rq_504 virtual cluster stats when an Envoy Mobile request timeouts due to either of the aforementioned timeout types.
  • Hide the change behind a runtime flag or make it a part of the config - allow customers to decide whether they want to treat certain types of timeouts as 5xx or not.

Relevant Links:
Currently available virtual cluster stats are listed at https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#virtual-clusters.

@Augustyniak Augustyniak added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Nov 7, 2022
@Augustyniak Augustyniak changed the title Revamp upstream_rq_timeout virtual cluster stat Revamp upstream_rq_timeout virtual cluster counter Nov 7, 2022
@phlax phlax added area/stats and removed triage Issue requires triage labels Nov 8, 2022
@phlax
Copy link
Member

phlax commented Nov 8, 2022

cc @jmarantz

related to #23866

@github-actions
Copy link

github-actions bot commented Dec 8, 2022

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 8, 2022
Augustyniak added a commit that referenced this issue Dec 12, 2022
Commit Message:
Additional Description: Disable `upstream_rq_timeout` (this one does not work, see #23867), `upstream_rq_retry_success`, `upstream_rq_retry_overflow` and `upstream_rq_retry_limit_exceeded` stats to limit the number of stats that's created per every virtual cluster that Envoy Mobile is configured to start with. The list is somehow arbitrary and should eventually be moved out of the main config #24459. Reducing their number because of the performance concerns as stats as created as part of the virtual cluster creation which happens on app launch.
Risk Level: Low
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional [API Considerations](https://github.com/envoyproxy/envoy/blob/main/api/review_checklist.md):]
@Augustyniak Augustyniak added the no stalebot Disables stalebot from closing an issue label Dec 12, 2022
@github-actions github-actions bot removed the stale stalebot believes this issue/PR has not been touched recently label Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/stats enhancement Feature requests. Not bugs or questions. no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

No branches or pull requests

2 participants