Exclude connection setup in per try timeout #4903

Closed
snowp opened this issue Oct 29, 2018 · 12 comments

@snowp
Contributor

snowp commented Oct 29, 2018

We've been seeing per try timeouts trigger needlessly during boot because the per try timeout is shorter than the time needed for connection setup (including TLS). To get around this, we need to increase the per try timeout to the point where it becomes meaningless for some of our fast endpoints.

It would be nice to be able to specify the per try timeout for the request/response itself, not including the connection setup.

@snowp
Contributor Author

snowp commented Oct 29, 2018

To exclude the connection setup, I can imagine starting the timeout timer in onPoolReady instead of starting it when the entire downstream request has been written to UpstreamRequest.
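A minimal sketch of the difference this makes, written in Python rather than Envoy's C++ and using made-up timings. `per_try_timed_out` is a hypothetical helper that only models when the timer is armed; it is not Envoy code, and it assumes the downstream request is fully written before connection setup finishes.

```python
def per_try_timed_out(connect_ms, response_ms, per_try_timeout_ms, start_at_pool_ready):
    """Return True if the per try timeout would fire before the response arrives.

    connect_ms: time spent on connection setup (TCP + TLS), an assumed value.
    response_ms: time from the connection being ready to the response arriving.
    start_at_pool_ready: if True, the timer is armed once the connection is
        ready (the proposal); if False, it is armed before connection setup
        completes (a simplified model of the behavior described above).
    """
    if start_at_pool_ready:
        elapsed_under_timer = response_ms
    else:
        elapsed_under_timer = connect_ms + response_ms
    return elapsed_under_timer > per_try_timeout_ms


# Cold start: 80 ms handshake, 10 ms endpoint, 25 ms per try timeout.
print(per_try_timed_out(80, 10, 25, start_at_pool_ready=False))  # True
print(per_try_timed_out(80, 10, 25, start_at_pool_ready=True))   # False
```

With the timer armed at pool-ready time, the fast endpoint can keep a tight per try timeout even when the first request has to pay for the handshake.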

@mattklein123 mattklein123 added enhancement Feature requests. Not bugs or questions. help wanted Needs help! labels Oct 29, 2018
@mattklein123
Member

@snowp yes this seems reasonable.

@snowp
Contributor Author

snowp commented Oct 29, 2018

Would the best approach here be to add another header to set the new timeout?

I'll be working on this; it is pretty high priority for us.

@mattklein123
Member

No, I would probably just start the per-try timeout in onPoolReady instead of where it is being set now. I think that's probably fine. I don't think you need a new header. Note that this won't help with H2, since onPoolReady returns immediately IIRC.

@snowp
Contributor Author

snowp commented Oct 29, 2018

How would one approach this for H2 then? We're primarily using H2 within the mesh.

@mattklein123
Member

I can't think of anything other than changing the h2 connection pool to track whether there is a connected primary connection and, if not, to keep a pending request queue like we do for h1. Then you would also make the change of starting the per-try timeout in onPoolReady() while leaving the overall timeout to cover everything, including any connection setup. IMO this makes the most sense, but is non-trivial.
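The pending request queue idea can be sketched roughly as follows. This is a toy model in Python, not Envoy's connection pool: `SketchPool`, `new_stream`, and `on_connected` are hypothetical names, and the callback stands in for onPoolReady.

```python
from collections import deque


class SketchPool:
    """Toy pool that holds requests back until the connection is established."""

    def __init__(self):
        self.connected = False
        self.pending = deque()  # callbacks waiting for the connection

    def new_stream(self, on_pool_ready):
        # If the connection is already up, the request is ready immediately;
        # otherwise it waits in the pending queue.
        if self.connected:
            on_pool_ready()
        else:
            self.pending.append(on_pool_ready)

    def on_connected(self):
        # Connection setup (including TLS) finished: drain the queue in order.
        self.connected = True
        while self.pending:
            self.pending.popleft()()


events = []
pool = SketchPool()
pool.new_stream(lambda: events.append("start per try timer for request 1"))
events.append("connection established")
pool.on_connected()
pool.new_stream(lambda: events.append("start per try timer for request 2"))
print(events)
```

Because the callback only runs once the connection exists, a per try timer started inside it never counts connection setup time, for the first request or for later ones.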

@snowp
Copy link
Contributor Author

snowp commented Oct 29, 2018

I think changing how per try timeouts work + consistency between h/1.1 and h/2 would be good here, so I'll give this a go. I'll update how per try timeouts work first and then look into updating the h2 conn pool.

@mattklein123 mattklein123 added this to the 1.9.0 milestone Oct 29, 2018
@snowp
Contributor Author

snowp commented Oct 29, 2018

@mattklein123 Just to clarify: are you suggesting just modifying the existing behavior, or introducing an option on the retry policy to specify this? I read it as just modifying the existing behavior, but that will involve straight up deleting existing tests that cover the case where the connect times out, so I wanted to check first.

@mattklein123
Member

I'm OK with just modifying the existing behavior (and release noting it) since I think what you are proposing makes more sense for the intention of the timeout, as long as the outer timeout continues to cover the entire thing. @envoyproxy/maintainers any opinions here?
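To illustrate how the two timers would interact under this proposal, here is a small Python model of a single attempt with no retries. `first_event` is a hypothetical helper with made-up timings, not Envoy code: the outer timeout runs from the start of the request, while the per try timeout only starts once the connection is ready.

```python
def first_event(connect_ms, response_ms, per_try_timeout_ms, global_timeout_ms):
    """Return which event happens first for one attempt (assumed model)."""
    response_at = connect_ms + response_ms
    per_try_deadline = connect_ms + per_try_timeout_ms  # armed at pool-ready
    events = [
        (response_at, "response"),
        (per_try_deadline, "per try timeout"),
        (global_timeout_ms, "global timeout"),  # armed at time zero
    ]
    # Pick the earliest event; ties resolve in the order listed above.
    return min(events, key=lambda e: e[0])[1]


print(first_event(80, 10, 25, 1000))    # response
print(first_event(80, 200, 25, 1000))   # per try timeout
print(first_event(2000, 10, 25, 1000))  # global timeout
```

The last case is the one the outer timeout is there for: a connection that never becomes ready is still bounded, even though the per try timer has not started.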

@alyssawilk
Contributor

I think we can get away with it for now, but in the long run we should probably have a policy around non-breaking but behavior-altering changes. I don't want to spam envoy-announce to the point that folks filter it out, but we don't have a good way of engaging folks running Envoy in production who might prefer the existing behavior and might want to weigh in asking for a config option, or even an easy way of saying "what has changed by default" between hash X and hash Y, since most of the relnotes are config-guarded additions rather than functional changes.

@mattklein123
Member

@alyssawilk agreed. In this case, I think the new behavior is better than the old behavior in all cases, which is why I recommended that we just change it, but am happy to revisit if folks think that is not the right way to go.

@snowp
Contributor Author

snowp commented Nov 12, 2018

Per try timeouts should now exclude connection setup for both h/1 and h/2.
