
Allow disabling connection pooling and use 1:1 connection with upstream and downstream #19458

Open
howardjohn opened this issue Jan 10, 2022 · 11 comments
Labels: enhancement (Feature requests. Not bugs or questions.), help wanted (Needs help!)

Comments

@howardjohn
Contributor

Title: Allow disabling connection pooling and use 1:1 connection with upstream and downstream

Description:
It would be great to have an option (on the cluster?) to allow HTTP proxying to follow connection properties like tcp_proxy does. That is (see the sketch after this list):

  • When downstream opens a connection, we open a connection to upstream.
  • We never share an upstream connection with multiple downstream connections, and the reverse
  • If the upstream closes, so does the downstream, and the reverse.
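
For illustration only, here is a minimal sketch of the requested semantics in plain Go (not Envoy code; the listen address and the upstream address are hypothetical stand-ins for the capture port and the original destination): exactly one upstream connection per downstream connection, with a close or connect failure on either side tearing down the other.

package main

import (
    "io"
    "net"
)

func main() {
    // Hypothetical listen address standing in for the capture port.
    ln, err := net.Listen("tcp", "127.0.0.1:15001")
    if err != nil {
        panic(err)
    }
    for {
        downstream, err := ln.Accept()
        if err != nil {
            return
        }
        go func(d net.Conn) {
            // Open exactly one upstream connection per downstream connection
            // (hypothetical address standing in for the original destination).
            upstream, err := net.Dial("tcp", "127.0.0.1:8080")
            if err != nil {
                d.Close() // a connect failure surfaces as a downstream connection error
                return
            }
            // A close on either side tears down the other.
            go func() { io.Copy(upstream, d); upstream.Close() }()
            io.Copy(d, upstream)
            d.Close()
        }(downstream)
    }
}

This is essentially what tcp_proxy already does at L4; the request is to optionally keep the same connection lifecycle when traffic goes through the HCM.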

Example motivating use case: istio/istio#36768

Basically the proxy captures 100% of traffic. We have some known services for which we do special routing, but the rest we pass through with an ORIGINAL_DST cluster. The problem is that users see unexpected behavior because of the connection pooling in Envoy. The most common case I have seen is when the upstream is doing DNS load balancing. Because Envoy keeps the downstream connection alive and sends 503s, the user's HTTP client will reuse the connection and never retry DNS, so it will be stuck getting 503s forever.
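
To make that client-side behavior concrete, here is a sketch (plain Go, hypothetical hostname, not Envoy code): a typical HTTP client only re-resolves DNS when it has to dial a new connection, so as long as the proxy keeps the downstream connection alive, the client keeps reusing it.

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    // The default transport keeps the TCP connection to the proxy alive and
    // reuses it for every request; DNS is only resolved on a new dial.
    client := &http.Client{Timeout: 5 * time.Second}
    for i := 0; i < 3; i++ {
        // Hypothetical captured hostname. If its original destination has gone
        // away, each attempt rides the same kept-alive connection and gets
        // another 503 from the proxy instead of triggering a fresh DNS lookup.
        resp, err := client.Get("http://some-service.example.com/")
        if err != nil {
            fmt.Println("request error:", err)
            continue
        }
        fmt.Println("status:", resp.StatusCode)
        resp.Body.Close()
    }
}

If Envoy instead closed the downstream connection, the client's next attempt would dial, and therefore resolve DNS, again.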

Alternatives considered:

  • Use a tcp_proxy. Unfortunately, we don't know which destination will be used until we process HTTP, so we need to send it through the HCM.
  • Close the downstream connection only when Envoy cannot connect to the upstream. This is a subset of the above, which would probably fix some issues but not all of them.
@alyssawilk
Contributor

Can you comment a bit more about how this differs from setting max_requests_per_connection to 1?

https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto#envoy-v3-api-field-config-cluster-v3-cluster-max-requests-per-connection

@alyssawilk removed the triage (Issue requires triage) label on Jan 10, 2022
@howardjohn
Contributor Author

Yes, the difference is we want the downstream/upstream to explicitly control connection handling, not Envoy.
For example, consider this client:

con := connect(example.com)
con.sendReq()
con.sendReq()
con.disconnect() # this should be propagated to upstream

max_requests_per_connection will result in 2 different upstream connections. With this proposal, there would be a single connection.

@alyssawilk
Contributor

ah, totally makes sense. I think this is a dup of #12370 then?

@fatedier

fatedier commented Jan 11, 2022

@alyssawilk I'm not sure if #12370 is already complete and solves this issue.

A connection_pool_per_downstream_connection flag was added to the cluster config to use a separate connection pool for every downstream connection. But if the upstream connection closes, the downstream connection still won't be closed.

#12370 has not been active for 14 months. Are there any further development plans?

@alyssawilk
Contributor

It's not complete; it just has more explanation and discussion, so it's better to have that issue tracking the desired feature.
AFAIK no one is actively working on it. If your company has a need for it, it'd be great for you to pick it up!

@fatedier

It's not complete; it just has more explanation and discussion, so it's better to have that issue tracking the desired feature. AFAIK no one is actively working on it. If your company has a need for it, it'd be great for you to pick it up!

I'd love to, but I am not familiar with C++.

@alyssawilk
Contributor

Ah, fair. Generally in Envoy it's not going to get picked up until someone who needs the feature either writes the new code or pays a company (e.g. Tetrate) to do so. But at least you can follow that open issue for updates.

@lambdai
Contributor

lambdai commented Apr 15, 2022

@alyssawilk Can you reopen this? 1 downstream connection with 1 upstream connection pool does not fully resolve the problem.

We need to convert some upstream failures (e.g. connect failures) into downstream connection errors.

This behavior may not make sense for all cluster types, but it aligns with the original_dst cluster as the upstream.

@mattklein123 reopened this on Apr 15, 2022
@kyessenov
Contributor

I think we can narrow this down to a specific problem: an ORIGINAL_DST (IP-based) HCM should be able to appear transparent at L4 by propagating connection events downstream. This is because the common usage of ORIGINAL_DST is to avoid any upstream decision-making by Envoy, so there is no need for Envoy to manage upstream connections. This can be used, for example, to collect pure L7 telemetry for all L4 traffic using inspection and a more "lenient" HCM filter.

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions bot added the stale (stalebot believes this issue/PR has not been touched recently) label on May 15, 2022
@kyessenov added the help wanted (Needs help!) label and removed the stale (stalebot believes this issue/PR has not been touched recently) label on May 16, 2022
@Fishrock123

Would it be possible, instead of having a 1:1 connection for every request, to create a new backing connection and retry when a 503 is encountered?

That kind of middle option seems like it would give most of the benefit without a dramatic performance reduction.
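
To illustrate the idea, here is a client-side analogue only (in the suggestion Envoy would do this internally on the upstream side; the helper name and hostname below are hypothetical): on a 503, drop the pooled backing connection and retry once over a fresh one, which also forces a fresh DNS lookup on the new dial.

package main

import (
    "fmt"
    "net/http"
)

// getWithRetry is a hypothetical helper: on a 503 it discards the pooled
// connection and retries once over a fresh backing connection.
func getWithRetry(client *http.Client, url string) (*http.Response, error) {
    resp, err := client.Get(url)
    if err != nil || resp.StatusCode != http.StatusServiceUnavailable {
        return resp, err
    }
    resp.Body.Close()
    client.CloseIdleConnections() // throw away the stale backing connection
    return client.Get(url)        // the retry dials (and resolves DNS) again
}

func main() {
    resp, err := getWithRetry(http.DefaultClient, "http://some-service.example.com/")
    if err != nil {
        fmt.Println("request error:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.StatusCode)
}

The key point is that the retry must not reuse the stale backing connection; otherwise it just sees the same 503 again.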

lucamilanesio pushed a commit to GerritCodeReview/k8s-gerrit that referenced this issue Apr 10, 2024
A bug in Envoy keeps connections open even if the IP address the target
hostname resolves to changed, which happens during rolling restarts [1].
This meant that the communication between Gerrit containers using the high-availability plugin was only working in one direction almost all the time.

This change adds ServiceEntries for the primary Gerrit pods. This works
around the issue.

[1] envoyproxy/envoy#19458

Change-Id: I899eca647a7294e0deb877ee51533d6147e1d9d6