
HTTP idleTimeout only apply to Clusters, not to the upstream service's HTTP Filters #40619

Closed
bvandewalle opened this issue Aug 23, 2022 · 9 comments
Labels
area/networking
lifecycle/automatically-closed: Indicates a PR or issue that has been closed automatically.
lifecycle/stale: Indicates a PR or issue hasn't been manipulated by an Istio team member for a while

Comments

@bvandewalle

bvandewalle commented Aug 23, 2022

Bug Description

I'm trying to raise the HTTP idle timeout for a specific service above the one-hour default by adding the following DestinationRule (set to 4 hours, i.e. 14400s, in this example):

spec:
  host: myHost
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 4h
    tls:
      mode: ISTIO_MUTUAL

As a result, the upstream clusters for this host are correctly updated across the mesh, with the following Envoy configuration (idle_timeout set to 14400s):

    {
     "version_info": "2022-08-23T19:37:17Z/21992",
     "cluster": {
      "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
      "name": "outbound|8443||[redacted]",
      "type": "EDS",
      [.....]
      "typed_extension_protocol_options": {
       "envoy.extensions.upstreams.http.v3.HttpProtocolOptions": {
        "@type": "type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions",
        "common_http_protocol_options": {
         "idle_timeout": "14400s"
        },
        "explicit_http_config": {
         "http_protocol_options": {}
        }
     [.....]
   }

However, when I look at the sidecar of the upstream service, idle_timeout is not configured on its HTTP connection manager, so the default idle_timeout of 1 hour still applies to connections coming from the gateways.

So, if my understanding is correct, for the Gateway <-> istio-sidecar HTTP connections the Gateway correctly raised its timeout to 4 hours in the cluster configuration, while the istio-sidecar did not adjust that option on its HTTP connection manager, so those connections are destroyed by the sidecar after 1 hour.

So it seems the idleTimeout option in the DestinationRule works fine when it is less than one hour (decreasing it from the default), because it only needs to be configured on one side of the connection. However, to increase it beyond one hour, both sides of the connection need that option adjusted in Envoy.

Version

istiod 1.14.3

Additional Information

No response

bvandewalle changed the title "HTTP idleTimeout only apply to Clusters, not to HTTP Filters" to "HTTP idleTimeout only apply to Clusters, not to the upstream service's HTTP Filters" on Aug 23, 2022
@hzxuzhonghu
Member

I think we apply this value to the inbound cluster but not to the inbound listener. This may be related to the pod belonging to multiple services, each bound to a different DestinationRule.

@ramaraochavali @howardjohn Do you know exactly?

@bvandewalle
Author

Even if the pod belongs to multiple services, the listener is still per pod, so I would expect a single service per pod, which also maps nicely to a single listener/http_connection_manager?

@ramaraochavali
Contributor

For the inbound listener, you have to set it through the node metadata IDLE_TIMEOUT. The DR is only used for the upstream cluster because it is intended as a client idle timeout, not a server idle timeout.

@bvandewalle
Author

What is the reason why this wouldn't happen for the client? If we want to make the sidecar "transparent", it should happen both for the client AND the server IMO.

If not, we start running into inconsistent timeouts across the mesh (which is what we are seeing).

@ramaraochavali
Contributor

> What is the reason why this wouldn't happen for the client? If we want to make the sidecar "transparent", it should happen both for the client AND the server IMO.

This does happen at the client: when service A calls service B, the idle timeout of that connection is applied on the client side (service A). It can also be set on the server side, but through a different config, as I mentioned. From an API perspective, a DR comes into the picture when a client makes a call. Brief history on why it was implemented this way: https://github.com/istio/istio/pull/13515/files#r277489637

@hzxuzhonghu
Member

To be clear, the DR idleTimeout works as below; the httpConnectionManager timeout can be specified through the node metadata IDLE_TIMEOUT.

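[Diagram: where the DR idleTimeout and the IDLE_TIMEOUT node metadata take effect along the request path]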

I have to admit this model does not match users' expectations; it is not a perfect solution for the whole flow.

@bvandewalle
Author

Thanks @hzxuzhonghu, that's exactly what we are seeing. IMO the timeout should be consistent along the whole path to a service, so sidecar B's http_connection_manager should at least apply the same timeout.

This is trickier on the source sidecar's listener (sidecar A), as that listener may be shared by multiple destinations.

@hzxuzhonghu
Member

Yes, it is tricky for a shared listener.

istio-policy-bot added the lifecycle/stale label on Nov 25, 2022
@istio-policy-bot

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2022-08-27. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.

istio-policy-bot added the lifecycle/automatically-closed label on Dec 10, 2022
Projects
None yet
Development

No branches or pull requests

4 participants