-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude connection setup in per try timeout #4903
Comments
To exclude the connection setup I can imagine starting the timeout timer in |
@snowp yes this seems reasonable. |
Would the best approach here be to add another header to set the new timeout? I'll be working on this, this is pretty high priority for us. |
No, I would probably just start the per-try timeout in onPoolReady instead of where it is being set now. I think that's probably fine. I don't think you need a new header. Note that this won't help with H2, since onPoolReady returns immediately IIRC. |
How would one approach this for H2 then? We're primarily using H2 within the mesh |
I can't think of anything other than changing the h2 connection pool to have logic as to whether there is a connected primary connection, and if not, having a pending request queue like we do for h1. Then you would also do the change of starting per-try-timeout in onPoolReady() while leaving the overall timeout to include everything including possible connection. IMO this makes the most sense, but is non-trivial. |
I think changing how per try timeouts work + consistency between h/1.1 and h/2 would be good here, so I'll give this a go. I'll update how per try timeouts work first and then look into updating the h2 conn pool. |
@mattklein123 Just to clarify: are you suggesting just modifying the existing behavior? Or introduce an option on the retry policy to specify this? I read it as just modifying the existing behavior, but that will involve straight up deleting existing tests that cover the case where the connect times out, so I wanted to check first. |
I'm OK with just modifying the existing behavior (and release noting it) since I think what you are proposing makes more sense for the intention of the timeout, as long as the outer timeout continues to cover the entire thing. @envoyproxy/maintainers any opinions here? |
I think we can get away with it for now but in the long run we should probably have policy around non-breaking but behavior altering changes. I don't want to spam envoy-announce to the point folks filter it out but we don't have a good way of engaging folks running envoy in production who might prefer the existing behavior and might want to weigh in asking for a config option or even an easy way of saying "what has changed by default" between hash X and hash Y since most of the relnotes are config-guarded additions rather than functional changes |
@alyssawilk agreed. In this case, I think the new behavior is better than the old behavior in all cases, which is why I recommended that we just change it, but am happy to revisit if folks think that is not the right way to go. |
Per try timeouts should now exclude connection setup for both h/1 and h/2. |
We've been seeing per try timeouts trigger needlessly during boot due to per try timeout being less than the connection setup (including TLS). To get around this, we need to increase the per try timeout to the point where it becomes meaningless for some of our fast endpoints.
It would be nice to be able to specify the per try timeout for the request/response itself, not including the connection setup.
The text was updated successfully, but these errors were encountered: