Support for multiple proxy addresses in remote cluster connection #82366

Open
DaveCTurner opened this issue Jan 10, 2022 · 9 comments
Labels
:Distributed/Network Http and internode communication implementations >enhancement Team:Distributed Meta label for distributed team

Comments

@DaveCTurner
Contributor

DaveCTurner commented Jan 10, 2022

A highly available setup running with proxy-mode remote cluster connections needs to be able to handle a failure of the proxy. Today a proxy-mode remote cluster connection accepts only a single address for the proxy. If the entry is a DNS name then it is resolved afresh on each connection attempt, but we only use the first resolved address each time. In practice this works OK if DNS is configured to return multiple addresses in a different order on each request, but it's not ideal: it may take a long time to re-establish connectivity if DNS happens to select an address to which connection attempts time out instead of actively failing. Some users configure additional middleware, or orchestrate IP address migrations, to work around this limitation.
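
To make the limitation concrete, here is a minimal Python sketch (not Elasticsearch code) contrasting "use only the first resolved address" with "try every resolved address". The functions `connect_first_only` and `connect_any`, and the injected `resolve` and `connect` callables, are hypothetical names introduced purely for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical callable types: `Resolver` maps a name to its resolved
# addresses, `Connector` returns True if a connection attempt succeeds.
Resolver = Callable[[str], List[str]]
Connector = Callable[[str], bool]


def connect_first_only(resolve: Resolver, connect: Connector, name: str) -> Optional[str]:
    """Resolve afresh, then try only the first resolved address."""
    addresses = resolve(name)
    if addresses and connect(addresses[0]):
        return addresses[0]
    return None


def connect_any(resolve: Resolver, connect: Connector, name: str) -> Optional[str]:
    """Resolve afresh, then try each resolved address until one succeeds."""
    for address in resolve(name):
        if connect(address):
            return address
    return None
```

If the first resolved address is unreachable, `connect_first_only` fails even though a later address would have worked, whereas `connect_any` falls through to it.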

I believe we should support multiple proxy addresses in remote cluster connections to improve the availability of remote clusters without needing additional middleware or complex orchestration steps. We should accept a list of addresses or names in the config, and recognise that each DNS name may resolve to multiple addresses too. Each connection attempt should distribute across multiple addresses properly, and ideally could keep track of connection failures and avoid known-bad addresses.
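
One possible shape for "distribute across multiple addresses and avoid known-bad ones" is sketched below in Python. This is an illustration under stated assumptions, not a proposed implementation: the class `ProxyAddressSelector`, its cooldown policy, and the 30-second default are all hypothetical and do not come from this issue.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class ProxyAddressSelector:
    """Orders candidate addresses, placing recently failed ones last.

    Assumption: an address counts as "known bad" for `cooldown_seconds`
    after its most recent recorded failure.
    """

    def __init__(
        self,
        addresses: List[str],
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not addresses:
            raise ValueError("at least one address is required")
        self._addresses = list(addresses)
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._rng = rng or random.Random()
        self._failed_at: Dict[str, float] = {}

    def _is_bad(self, address: str, now: float) -> bool:
        failed = self._failed_at.get(address)
        return failed is not None and now - failed < self._cooldown

    def candidates(self) -> List[str]:
        """Return all addresses: healthy ones shuffled first, known-bad ones last."""
        now = self._clock()
        healthy = [a for a in self._addresses if not self._is_bad(a, now)]
        bad = [a for a in self._addresses if self._is_bad(a, now)]
        self._rng.shuffle(healthy)  # spread load across healthy addresses
        self._rng.shuffle(bad)
        return healthy + bad  # known-bad addresses remain a last resort

    def record_failure(self, address: str) -> None:
        self._failed_at[address] = self._clock()

    def record_success(self, address: str) -> None:
        self._failed_at.pop(address, None)
```

Keeping known-bad addresses at the end of the list, rather than dropping them, means a connection attempt can still succeed if every address has failed recently.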

@DaveCTurner DaveCTurner added >enhancement :Distributed/Network Http and internode communication implementations labels Jan 10, 2022
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Jan 10, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@justincr-elastic
Contributor

@tbrooks8 mentioned that during Proxy Mode design, a Direct Mode was also considered.

For Remote Cluster security design, I think we need Direct Mode and Remote Sniff Mode.

Proxy Mode Questions:

  1. Are you planning to enhance the existing Proxy Mode, or create a new mode?
  2. Will Multiple Proxy Mode be equivalent to the previously planned Direct Mode? In other words, would customers be able to configure FQDNs, short hostnames, and IP addresses? No dependency on configuring DNS is desirable.

Remote Sniff Mode Question:

  1. Is there any way to distinguish local vs remote listeners now? I can't think of any, other than Transport Profiles, but I am not sure if we want to reuse Transport Profiles for the new Cross Cluster security model. In other words, I think Remote Sniff Mode is desirable, but it will directly depend on new inbound and outbound contexts in elasticsearch.yml.

@DaveCTurner
Contributor Author

I don't know what Direct Mode would have done exactly, but yes we'd continue to support all flavours of address lookup. I don't think we'd have a separate mode here, we'd just make it so that cluster.remote.*.proxy_address can also accept a list.

Let's discuss the remote sniff mode question elsewhere to keep this conversation on-topic for this issue about proxy mode.

@Tim-Brooks
Contributor

When I originally wrote the design document with @ywelsch I called the mode "direct" or "simple" mode, and it accepted a list of socket addresses to which we open direct TCP connections, with no sniffing or knowledge of the remote cluster topology.

I had originally named it that since it did not matter whether we were going through a proxy or connecting directly to remote nodes. It was just a list of addresses that we connect to in round-robin order.
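
As a rough illustration of "a list of socket addresses that we round robin connect to", the Python sketch below parses `host:port` strings and cycles through them. The helpers `parse_socket_address` and `round_robin` are hypothetical names for this example only; the original design document's actual parsing rules are not described in this thread.

```python
import itertools
from typing import Iterator, List, Tuple


def parse_socket_address(text: str) -> Tuple[str, int]:
    """Split "host:port" into (host, port).

    Assumption: IPv6 literals are bracketed, e.g. "[::1]:9400".
    Unbracketed IPv6 literals are not handled by this sketch.
    """
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {text!r}")
    return host.strip("[]"), int(port)


def round_robin(addresses: List[str]) -> Iterator[Tuple[str, int]]:
    """Yield the parsed addresses in order, repeating indefinitely."""
    return itertools.cycle([parse_socket_address(a) for a in addresses])
```

Each new connection would take the next address from the iterator, regardless of whether that address belongs to a proxy or to a node of the remote cluster.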

But then there was the decision that since we were specifically designing this for a proxy we would call it proxy mode and only support a single address.

We can modify cluster.remote.*.proxy_address to accept multiple addresses. It does raise the question of whether we want it to be cluster.remote.*.proxy_addresses. And if we think that this mode would be used for non-proxy use cases, it might raise the question of whether we are still happy with the name proxy.

@DaveCTurner
Contributor Author

That's a good point, although we're not exactly strict about singular/plural things in other settings' names either. Even when using a proxy we've encountered users who want to use multiple for resiliency, and that's really the case I think we should address here.

Technically you don't need a proxy today to use proxy mode, you can just point it to one of the nodes in your cluster (or a DNS alias that resolves to a list of multiple nodes). OTOH if you're able to connect directly to the nodes of the remote cluster then you could reasonably use sniff mode, so if you're not doing that then this kind of implies that you're using something like a proxy.

@justincr-elastic
Contributor

Note, sniff mode won't work for the new Remote Cluster Security. The remote will need to be on a new port, with the option to use an API key instead of a TLS client cert.

@justincr-elastic
Contributor

Is further discussion required, or can we agree to add cluster.remote.*.proxy_addresses?

Tagging @gwbrown @jakelandis @n1v0lg.

@justincr-elastic
Contributor

justincr-elastic commented Sep 14, 2022

I am planning to start a PR to add cluster.remote.*.proxy_addresses. I am proposing it will be behind an RCS 2.0 feature flag.

@gwbrown
Contributor

gwbrown commented Sep 14, 2022

Agreed to add support for multiple proxy addresses. I think we should just extend the existing setting though, rather than requiring a plural here - we support singleton strings as well as lists of strings for the same field in many other APIs, so it aligns well there and will make backwards compatibility easier. I'm also not sure why this should be behind the feature flag, as it's not reliant on any other bits from RCS 2.0 to function and has value in the existing security model as well.
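
The "singleton string or list of strings" idea can be sketched as a small normalisation step. The Python below is illustrative only: `normalize_proxy_addresses` is a hypothetical helper, and treating a comma-separated string as a list is an assumption of this sketch rather than something stated in this thread.

```python
from typing import List, Union


def normalize_proxy_addresses(value: Union[str, List[str]]) -> List[str]:
    """Accept either a single string or a list of strings; return a list.

    Assumption: a single string may hold several comma-separated addresses.
    """
    if isinstance(value, str):
        items = value.split(",")
    else:
        items = list(value)
    # Trim whitespace and drop empty entries.
    result = [item.strip() for item in items if item.strip()]
    if not result:
        raise ValueError("at least one proxy address is required")
    return result
```

With this shape, an existing configuration holding a single address keeps working unchanged, which is the backwards-compatibility benefit mentioned above.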


5 participants