
Allow disabling connection pooling and use 1:1 connection with upstream and downstream #19458

Open
howardjohn opened this issue Jan 10, 2022 · 11 comments
Labels: enhancement (Feature requests. Not bugs or questions.), help wanted (Needs help!)

Comments

@howardjohn
Contributor

Title: Allow disabling connection pooling and use 1:1 connection with upstream and downstream

Description:
It would be great to have an option (on the cluster?) to allow HTTP proxying to follow connection properties like tcp_proxy does. That is (see the sketch after this list):

  • When downstream opens a connection, we open a connection to upstream.
  • We never share an upstream connection with multiple downstream connections, and the reverse
  • If the upstream closes, so does the downstream, and the reverse.
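
For illustration only, here is a minimal sketch of the requested semantics in plain Go (not Envoy code; the listen address and the upstream address are hypothetical stand-ins for the capture port and the original destination): exactly one upstream connection per downstream connection, with a close or connect failure on either side tearing down the other.

package main

import (
    "io"
    "net"
)

func main() {
    // Hypothetical listen address standing in for the capture port.
    ln, err := net.Listen("tcp", "127.0.0.1:15001")
    if err != nil {
        panic(err)
    }
    for {
        downstream, err := ln.Accept()
        if err != nil {
            return
        }
        go func(d net.Conn) {
            // Open exactly one upstream connection per downstream connection
            // (hypothetical address standing in for the original destination).
            upstream, err := net.Dial("tcp", "127.0.0.1:8080")
            if err != nil {
                d.Close() // a connect failure surfaces as a downstream connection error
                return
            }
            // A close on either side tears down the other.
            go func() { io.Copy(upstream, d); upstream.Close() }()
            io.Copy(d, upstream)
            d.Close()
        }(downstream)
    }
}

This is essentially what tcp_proxy already does at L4; the request is to optionally keep the same connection lifecycle when traffic goes through the HCM.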

Example motivating use case: istio/istio#36768

Basically the proxy captures 100% of traffic. We have some known services for which we do special routing, but the rest we pass through with an ORIGINAL_DST cluster. The problem is that users see unexpected behavior because of the connection pooling in Envoy. The most common case I have seen is when the upstream is doing DNS load balancing. Because Envoy keeps the downstream connection alive and sends 503s, the user's HTTP client will reuse the connection and never retry DNS, so it will be stuck getting 503s forever.
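
To make that client-side behavior concrete, here is a sketch (plain Go, hypothetical hostname, not Envoy code): a typical HTTP client only re-resolves DNS when it has to dial a new connection, so as long as the proxy keeps the downstream connection alive, the client keeps reusing it.

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    // The default transport keeps the TCP connection to the proxy alive and
    // reuses it for every request; DNS is only resolved on a new dial.
    client := &http.Client{Timeout: 5 * time.Second}
    for i := 0; i < 3; i++ {
        // Hypothetical captured hostname. If its original destination has gone
        // away, each attempt rides the same kept-alive connection and gets
        // another 503 from the proxy instead of triggering a fresh DNS lookup.
        resp, err := client.Get("http://some-service.example.com/")
        if err != nil {
            fmt.Println("request error:", err)
            continue
        }
        fmt.Println("status:", resp.StatusCode)
        resp.Body.Close()
    }
}

If Envoy instead closed the downstream connection, the client's next attempt would dial, and therefore resolve DNS, again.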

Alternatives considered:

  • Use a tcp_proxy. Unfortunately, we don't know which destination will be used until we process HTTP, so we need to send it through the HCM.
  • Close the downstream connection only when Envoy cannot connect to the upstream. This is a subset of the above, which would probably fix some issues but not all of them.
@alyssawilk
Contributor

Can you comment a bit more about how this differs from setting max_requests_per_connection to 1?

https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto#envoy-v3-api-field-config-cluster-v3-cluster-max-requests-per-connection

@alyssawilk removed the triage (Issue requires triage) label on Jan 10, 2022
@howardjohn
Contributor Author

Yes, the difference is we want the downstream/upstream to explicitly control connection handling, not Envoy.
For example, consider this client:

con := connect(example.com)
con.sendReq()
con.sendReq()
con.disconnect() # this should be propagated to upstream

max_requests_per_connection will result in 2 different upstream connections. With this proposal, there would be a single connection.

@alyssawilk
Contributor

ah, totally makes sense. I think this is a dup of #12370 then?

@fatedier

fatedier commented Jan 11, 2022

@alyssawilk I'm not sure if #12370 is already complete and solves this issue.

A connection_pool_per_downstream_connection flag was added to the cluster config to use a separate connection pool for every downstream connection. But if the upstream connection closes, the downstream connection still won't be closed.

#12370 has not been active for 14 months. Are there any further development plans?

@alyssawilk
Contributor

It's not complete; it just has more explanation and discussion, so it's better to have that issue tracking the desired feature.
AFAIK no one is actively working on it. If your company has a need for it, it'd be great for you to pick it up!

@fatedier

It's not complete; it just has more explanation and discussion, so it's better to have that issue tracking the desired feature. AFAIK no one is actively working on it. If your company has a need for it, it'd be great for you to pick it up!

I'd love to, but I am not familiar with C++.

@alyssawilk
Contributor

Ah, fair. Generally in Envoy it's not going to get picked up until someone who needs the feature either writes the new code or pays a company (e.g. Tetrate) to do so. But at least you can follow that open issue for updates.

@lambdai
Contributor

lambdai commented Apr 15, 2022

@alyssawilk Can you reopen this? 1 downstream connection with 1 upstream connection pool does not fully resolve the problem.

We need to convert some upstream failures (e.g. connect failures) into downstream connection errors.

This behavior may not make sense for all cluster types, but it aligns with the original_dst cluster as the upstream.

@mattklein123 reopened this on Apr 15, 2022
@kyessenov
Contributor

I think we can narrow this down to a specific problem: an ORIGINAL_DST (IP-based) HCM should be able to appear transparent at L4 by propagating connection events downstream. This is because the common usage of ORIGINAL_DST is to avoid any upstream decision-making by Envoy, so there is no need for Envoy to manage upstream connections. This can be used, for example, to collect pure L7 telemetry for all L4 traffic using inspection and a more "lenient" HCM filter.

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions bot added the stale (stalebot believes this issue/PR has not been touched recently) label on May 15, 2022
@kyessenov added the help wanted (Needs help!) label and removed the stale (stalebot believes this issue/PR has not been touched recently) label on May 16, 2022
@Fishrock123

Would it be possible, instead of having a 1:1 connection for every request, to create a new backing connection and retry when a 503 is encountered?

That kind of middle option seems like it would give most of the benefit without a dramatic performance reduction.
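
To illustrate the idea, here is a client-side analogue only (in the suggestion Envoy would do this internally on the upstream side; the helper name and hostname below are hypothetical): on a 503, drop the pooled backing connection and retry once over a fresh one, which also forces a fresh DNS lookup on the new dial.

package main

import (
    "fmt"
    "net/http"
)

// getWithRetry is a hypothetical helper: on a 503 it discards the pooled
// connection and retries once over a fresh backing connection.
func getWithRetry(client *http.Client, url string) (*http.Response, error) {
    resp, err := client.Get(url)
    if err != nil || resp.StatusCode != http.StatusServiceUnavailable {
        return resp, err
    }
    resp.Body.Close()
    client.CloseIdleConnections() // throw away the stale backing connection
    return client.Get(url)        // the retry dials (and resolves DNS) again
}

func main() {
    resp, err := getWithRetry(http.DefaultClient, "http://some-service.example.com/")
    if err != nil {
        fmt.Println("request error:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.StatusCode)
}

The key point is that the retry must not reuse the stale backing connection; otherwise it just sees the same 503 again.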

lucamilanesio pushed a commit to GerritCodeReview/k8s-gerrit that referenced this issue Apr 10, 2024
A bug in Envoy keeps connections open even if the IP address the target
hostname resolves to changed, which happens during rolling restarts [1].
This meant that the communication between Gerrit containers using the high-availability plugin was only working in one direction almost all the time.

This change adds ServiceEntries for the primary Gerrit pods. This works
around the issue.

[1] envoyproxy/envoy#19458

Change-Id: I899eca647a7294e0deb877ee51533d6147e1d9d6