Exclude connection setup in per try timeout #4903

snowp · 2018-10-29T22:32:54Z

We've been seeing per try timeouts trigger needlessly during boot because the per try timeout is shorter than connection setup (including TLS). To get around this, we have to increase the per try timeout to the point where it becomes meaningless for some of our fast endpoints.

It would be nice to be able to specify the per try timeout for the request/response itself, not including the connection setup.

snowp · 2018-10-29T22:38:26Z

To exclude the connection setup, I can imagine starting the timeout timer in onPoolReady instead of starting it when the entire downstream request has been written to UpstreamRequest.
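A minimal sketch of the difference between the two timer start points, under stated assumptions. This is a toy model, not Envoy's router code: `TryTimeline`, `TimerStart`, and `perTryTimeoutFires` are hypothetical names, and the model assumes the downstream request is fully written before the upstream connection is established.

```cpp
#include <chrono>

using Ms = std::chrono::milliseconds;

// Hypothetical model of one upstream attempt (not an Envoy type).
struct TryTimeline {
  Ms connect_duration;   // connection setup, including TLS
  Ms response_duration;  // time from connection ready until the response arrives
};

// Where the per try timer is armed.
enum class TimerStart {
  OnDownstreamRequestComplete,  // old behavior: connection setup counts against the timeout
  OnPoolReady                   // proposed behavior: connection setup is excluded
};

// Returns true if the per try timeout would fire before the response arrives.
bool perTryTimeoutFires(const TryTimeline& t, Ms per_try_timeout, TimerStart start) {
  Ms covered = t.response_duration;
  if (start == TimerStart::OnDownstreamRequestComplete) {
    // Assumption: the request was complete before connect finished,
    // so the whole connect duration is charged to the try.
    covered += t.connect_duration;
  }
  return covered > per_try_timeout;
}
```

With a 300 ms connection setup, a 20 ms response, and a 100 ms per try timeout, this model times out under the old start point and succeeds when the timer starts at pool ready.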

mattklein123 · 2018-10-29T22:49:59Z

@snowp yes this seems reasonable.

snowp · 2018-10-29T23:03:12Z

Would the best approach here be to add another header to set the new timeout?

I'll be working on this; it's pretty high priority for us.

mattklein123 · 2018-10-29T23:07:50Z

No, I would probably just start the per-try timeout in onPoolReady instead of where it is being set now. I think that's probably fine. I don't think you need a new header. Note that this won't help with H2, since onPoolReady returns immediately IIRC.

snowp · 2018-10-29T23:12:12Z

How would one approach this for H2 then? We're primarily using H2 within the mesh.

mattklein123 · 2018-10-29T23:18:56Z

I can't think of anything other than changing the h2 connection pool to track whether there is a connected primary connection and, if not, to hold requests in a pending request queue like we do for h1. Then you would also make the change of starting the per-try timeout in onPoolReady() while leaving the overall timeout to cover everything, including any connection setup. IMO this makes the most sense, but is non-trivial.
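The pending request queue idea can be sketched roughly as follows. This is a simplified illustration, not Envoy's connection pool: `SketchH2Pool`, `newStream`, `onConnected`, and `pendingCount` are hypothetical names, and the real pool also has to handle connection failure, draining, and stream limits, which are omitted here.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical, simplified h2-style pool with a single primary connection.
class SketchH2Pool {
public:
  using ReadyCallback = std::function<void()>;

  // Hands out a stream immediately if the primary connection is up;
  // otherwise parks the request until the connection is established.
  void newStream(ReadyCallback on_pool_ready) {
    if (connected_) {
      on_pool_ready();
    } else {
      pending_.push_back(std::move(on_pool_ready));
    }
  }

  // Called once connection setup (including TLS) has completed.
  // Flushes queued requests in arrival order.
  void onConnected() {
    connected_ = true;
    while (!pending_.empty()) {
      ReadyCallback cb = std::move(pending_.front());
      pending_.pop_front();
      cb();  // a caller would arm its per-try timer here
    }
  }

  std::size_t pendingCount() const { return pending_.size(); }

private:
  bool connected_{false};
  std::deque<ReadyCallback> pending_;
};
```

Because the ready callback only runs after the connection is up, a per-try timer armed inside it no longer includes connection setup, which is the consistency with h1 discussed above.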

snowp · 2018-10-29T23:24:02Z

I think changing how per try timeouts work + consistency between h/1.1 and h/2 would be good here, so I'll give this a go. I'll update how per try timeouts work first and then look into updating the h2 conn pool.

snowp · 2018-10-29T23:57:04Z

@mattklein123 Just to clarify: are you suggesting just modifying the existing behavior, or introducing an option on the retry policy to specify this? I read it as just modifying the existing behavior, but that will involve straight up deleting existing tests that cover the case where the connect times out, so I wanted to check first.

mattklein123 · 2018-10-30T00:00:20Z

I'm OK with just modifying the existing behavior (and release noting it) since I think what you are proposing makes more sense for the intention of the timeout, as long as the outer timeout continues to cover the entire thing. @envoyproxy/maintainers any opinions here?
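To illustrate the constraint that the outer timeout still covers the whole attempt, here is a small single-try model under stated assumptions. `Outcome` and `classifyTry` are hypothetical names, retries are ignored, and the global timer is assumed to start at time zero while the per-try timer starts when the connection is ready.

```cpp
#include <chrono>

using Ms = std::chrono::milliseconds;

enum class Outcome { Completed, PerTryTimeout, GlobalTimeout };

// Classifies a single try: the global timer starts at time zero and spans
// connection setup, while the per-try timer starts once the connection is ready.
Outcome classifyTry(Ms connect, Ms response, Ms per_try, Ms global) {
  const Ms done_at = connect + response;          // when the response arrives
  const Ms per_try_fires_at = connect + per_try;  // armed at pool ready
  const Ms global_fires_at = global;              // armed at request start

  if (done_at <= per_try_fires_at && done_at <= global_fires_at) {
    return Outcome::Completed;
  }
  // Whichever timer fires first decides the failure mode.
  return global_fires_at <= per_try_fires_at ? Outcome::GlobalTimeout
                                             : Outcome::PerTryTimeout;
}
```

In this model a slow connection setup can no longer trip the per-try timeout, but it can still exhaust the outer timeout.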

alyssawilk · 2018-10-30T13:50:12Z

I think we can get away with it for now, but in the long run we should probably have a policy around non-breaking but behavior-altering changes. I don't want to spam envoy-announce to the point that folks filter it out, but we don't have a good way of engaging folks running Envoy in production who might prefer the existing behavior and might want to weigh in, asking for a config option or even an easy way of saying "what has changed by default" between hash X and hash Y, since most of the relnotes are config-guarded additions rather than functional changes.

mattklein123 · 2018-10-30T15:43:37Z

@alyssawilk agreed. In this case, I think the new behavior is better than the old behavior in all cases, which is why I recommended that we just change it, but am happy to revisit if folks think that is not the right way to go.

snowp · 2018-11-12T06:45:52Z

Per try timeouts should now exclude connection setup for both h/1 and h/2.

mattklein123 added the enhancement and help wanted labels Oct 29, 2018

mattklein123 added this to the 1.9.0 milestone Oct 29, 2018

snowp mentioned this issue Oct 30, 2018

router: start per-try timeout timer in onPoolReady #4905

Merged

snowp mentioned this issue Oct 30, 2018

http: use a request queue in the http2 conn pool #4917

Merged

mattklein123 assigned snowp Oct 31, 2018

snowp closed this as completed Nov 12, 2018

snowp mentioned this issue Nov 24, 2018

connect timeout overrides route timeout & per try timeout #5097

Closed
