
Clusterbus extensions and hostname support #9530

Merged (1 commit, Jan 3, 2022)

Conversation

madolson
Contributor

@madolson madolson commented Sep 21, 2021

This PR introduces two changes: a clusterbus extension system that lets us attach additional metadata to messages, and hostname support built on top of it. I've been very slow, and it's been sitting on my laptop for a while, so I would rather publish it with context before I get hit by a bus. I will try to iterate more this week and get the code into better shape, but would appreciate input from @redis/core-team / @ShooterIT / @zuiderkwast.

Clusterbus extension

Now we can send extra metadata after the end of the gossip information. This is a backwards compatible change, in that you can't use it between nodes running different cluster versions, but you can upgrade all the nodes in a cluster to support it and then start using it. The point of this is that we want to send a consistent version of the cluster state along with the message, instead of introducing a separate type of message.
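To make the mechanism concrete, here is a minimal, self-contained sketch in C of walking a type-length-value extension region appended after the gossip entries. The struct and field names are hypothetical, not the actual layout in cluster.h:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical TLV record, packed one after another at the end of the
 * packet. The real implementation keeps a flag and an extension count
 * in the message header; an old node that doesn't know the flag never
 * looks at this region, which keeps the change backward compatible. */
typedef struct {
    uint16_t type;        /* e.g. 1 = hostname */
    uint16_t reserved;
    uint32_t length;      /* total record length, header included */
    unsigned char data[]; /* extension payload */
} msgExt;

/* Walk the extension region and count well-formed records, stopping at
 * the first record that would run past the end of the buffer. */
static int count_extensions(const unsigned char *buf, size_t buflen) {
    int count = 0;
    size_t off = 0;
    while (off + sizeof(msgExt) <= buflen) {
        msgExt ext;
        memcpy(&ext, buf + off, sizeof(ext)); /* avoid unaligned access */
        if (ext.length < sizeof(msgExt) || off + ext.length > buflen) break;
        count++;
        off += ext.length;
    }
    return count;
}
```

An upgraded node that never sets the extension flag simply treats the packet as ending at the gossip section, so mixed-version clusters stay healthy until everyone supports the extension.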

An alternative to this would be to add a new type of message, a hostname message. The reason I don't want to introduce this is that it adds yet another way to propagate information throughout the cluster, and introduces windows of time where one of the messages (the MEET, for example) was received but we still don't know the node's hostname, so we might have to show the IP to incoming clients. A new message also introduces more overhead.

Another use of this extension is that I want to add display names/context names that can be printed in place of the node ID (the 40-character hex blob). When debugging, the 40-character hex blob is really annoying.

Hostname support

I've added a new config, "cluster-announce-hostname", which is a hostname that an externally facing client can use to connect to this node. Using the new mechanism, we send a hostname extension to all nodes, so that eventually every node in the cluster will know our hostname. NOTE: This is not gossiped; we don't tell other nodes about other nodes' hostnames, which keeps message volume down. NOTE: Nodes do not talk to each other using the hostname.
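As a sketch of how this might be configured (the hostname value is a made-up example):

```
# redis.conf (illustrative)
cluster-enabled yes
cluster-announce-hostname redis-node-1.example.com
```

The intent is that the config can also be applied at runtime with CONFIG SET, which is how a hostname can be introduced into an already-running cluster.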

You can also add a hostname to a node in an existing cluster, and it will eventually be propagated to all nodes.

This hostname will be added as the 4th field of the CLUSTER SLOTS output, which is the primary way clients will discover it. I'm also proposing we introduce a "cluster-preferred-endpoint-type" option to configure what type of endpoint is shown by default.
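Illustratively (the reply shape is a sketch of the proposal, with made-up values), a CLUSTER SLOTS node entry would gain a fourth element carrying networking metadata such as the hostname:

```
127.0.0.1:7000> CLUSTER SLOTS
1) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7000
      3) "8d0e1b09f6e6812c8e99e1eed82653d4d38a43e4"
      4) 1) "hostname"
         2) "redis-node-1.example.com"
```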

The hostname will be committed to the cluster nodes file, appended to the IP/port/cport information. I think it was done in a way that supports clients, and it's actually easier to place it there than to throw it at the end of the line as a positional argument.
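As a sketch (illustrative values, not an authoritative file format), a line in the cluster nodes file would then look roughly like this, with the hostname riding along after the cport:

```
8d0e1b09f6e6812c8e99e1eed82653d4d38a43e4 127.0.0.1:7000@17000,redis-node-1.example.com myself,master - 0 0 1 connected 0-5460
```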

Considerations

  1. CLUSTER SLOTS will be considered a first-class citizen, but CLUSTER NODES will be able to support it if clients want to do special work. Right now I am adding the hostname to the cluster nodes file so that it's loaded on restart, but I'm not trying to make it terribly easy to parse. I know some clients use that output to discover the topology, but I don't want to do anything special for them with regards to hostname support. A follow-up item will be to expose a variant of CLUSTER NODES that is more client friendly.
  2. Extensions are only added to PING/PONG/MEET right now, but there is nothing blocking future implementation work to add them for other messages.

Out of scope:

  • Intra-node DNS resolution. All of the Redis cluster nodes should be in the same network (or at least I haven't heard a compelling reason to assume otherwise), so all connection establishment is still done through IPs.
  • Doing TLS verification between nodes within the cluster based on the hostname. This might be a useful check, but I'm not convinced as of right now.

Tasks punted to other PRs:

  • All of the tooling should support DNS resolution, especially the redis-cli --cluster commands. Not a critical requirement for the main release.
  • There was a follow up ask to make a version of CLUSTER NODES that is more human readable, like CLUSTER HEALTH or CLUSTER STATUS. It should be able to show the hostnames, but not necessarily be used by clients.
  • A note to myself: a lot of the tests set up non-contiguous slots, which makes CLUSTER SLOTS really slow. We might want to optimize this in the tests.

@madolson madolson linked an issue Sep 21, 2021 that may be closed by this pull request
@zuiderkwast
Contributor

Nice! If you ever get hit by a bus, it better be a cluster bus. :-)

I haven't looked at the code yet.

I think SNI verification between nodes might be useful, just as it is useful between client and cluster, for deploying a system in an untrusted network. We use mutual authentication instead though.

cluster-prefer-hostnames sounds good to me. If redirects use hostnames, that can already break clients, so if that's enabled, we may as well enable it for the first element in CLUSTER SLOTS. But it may be even better to let the client announce its capabilities (e.g. HELLO 3 hostnames).

If we ever want to add more fields to CLUSTER SLOTS, perhaps consider making the last element a map. It could hold secondary IP addresses (IPv6 and IPv4). A hostname can resolve to multiple IP addresses if DNS is used, though, so it might not be needed for that use case.

@yossigo
Member

yossigo commented Sep 23, 2021

@madolson great to see this making progress!

I didn't look at the implementation yet, but I suppose that any approach we take to support cluster bus upgrades should be flexible enough to support additional upgrades in the future. I'm sure we'll need that when we proceed with the ClusterV2 plans.

I support the cluster-prefer-hostnames all-or-nothing approach, so if we use hostnames we use them for everything. There will definitely be some client breakage but I think it's an opportunity to refresh them, and also migrate all to CLUSTER SLOTS while doing so.

Agree about not using DNS names for intra-node connectivity, and I think SNI validation is also not really that important there (adding other basic cert validation configuration is easier and just as good IMHO).

@dmitrypol

It would be nice if we could also use hostnames in the cluster creation process. Right now you need to use IPs for that.

Member

@yossigo yossigo left a comment


@madolson I had a quick look at the code (don't consider it a full review yet) and have a couple of small comments.
Some other thoughts/questions:

  • You mention this is a semi-breaking change, but IIUC if one enables hostnames and delivers extensions to old nodes, they'll just be ignored, resulting in inconsistent behavior but no other breakage - right?
  • I think there's something a bit confusing about the way extensions are implemented. On one hand it's a generic mechanism with a packet-level flag and extensions count. On the other hand, extensions specifically extend the gossip section. Maybe we should consider going all the way to a more generic extensions mechanism?
  • The argument for extensions vs. new commands is atomicity of updates, but IIUC that's not the case when a node joins - it will initially receive information about other nodes without hostnames, and only later have hostnames propagated to it directly from other nodes.

@madolson
Contributor Author

@yossigo

You mention this is a semi-breaking change, but IIUC if one enables hostnames and delivers extensions to old nodes, they'll just be ignored, resulting in inconsistent behavior but no other breakage - right?

Yeah, I said it's breaking, but it's really not as long as you're deliberate about the upgrade. The danger here is that the extension is grouped with the ping/pong message itself, so a failure to parse the extensions means the entire ping will also be rejected.

I think there's something a bit confusing about the way extensions are implemented. On one hand it's a generic mechanism with a packet-level flag and extensions count. On the other hand, extensions specifically extend the gossip section. Maybe we should consider going all the way to a more generic extensions mechanism?

Is there something specific you have in mind that would be useful? It is an extension, but it's meant to piggyback on the existing ping/pong structure that already spreads data around. The module interface is already extensible in that you can add new messages if you want (we could implement hostnames that way as well). There is also no strong reason this mechanism couldn't be generalized to add arbitrary additional data to any of the other existing messages.

I'll also mention that I think long term this type of gossip isn't very efficient, and we probably want to figure out a better way to distribute this information in the cluster for cluster V2.

The argument for extensions vs. new commands is atomicity of updates, but IIUC that's not the case when a node joins - it will initially receive information about other nodes without hostnames, and only later have hostnames propagated to it directly from other nodes.

This is mostly right, but we do get atomicity because gossip data isn't that comprehensive. The gossiped information (IP, node name, flags, health information) is just enough for nodes learning about a new node to reach out and ping it; it's not enough to know detailed information about the node. Specifically, slots are missing, which disqualifies it from showing up in CLUSTER SLOTS. Once a node has exchanged a single ping/pong message, it knows all the information it needs to display the new node in CLUSTER SLOTS, and that exchange is where we can inject the new hostname.

This is why I made a very specific point about CLUSTER NODES as well as SNI for intra-node communication. CLUSTER NODES requires very deliberate parsing to understand the state, which most clients don't do very well, and the node will show up there immediately, without the hostname. We also can't do SNI for intra-node connections with the current implementation, since we reach out to a node before knowing its hostname. There is no hard blocker to gossiping the hostname; it just seems like extra data.

@madolson madolson marked this pull request as ready for review October 1, 2021 22:47
@dmitrypol

@madolson - also, any thoughts on that idea you and I discussed to create a cluster health command, so that users would not have to parse CLUSTER NODES looking for failures?

@madolson
Contributor Author

madolson commented Oct 3, 2021

@dmitrypol It's in one of the checkboxes ;)

There was a follow up ask to make a version of CLUSTER NODES that is more human readable, like CLUSTER HEALTH or CLUSTER STATUS. It should be able to show the hostnames, but not necessarily be used by clients.

My thought was to decouple your ask from this specific PR. This is mostly code complete to my satisfaction for the core. (Also, I'll be out for a couple of weeks, so won't respond quickly)

@dmitrypol

> @dmitrypol It's in one of the checkboxes ;) […]

My mistake, did not notice

@yossigo
Member

yossigo commented Oct 6, 2021

@madolson

The only issue I had with the extensions is that it's a bit weird to have the flag and count at the clusterMsg, but still have to deal with extensions per clusterMsgData, but I suppose that's really the easiest way to maintain backwards compatible ping payloads. And I agree we'll probably want to move away from the gossip as it works right now anyway.

Copy link
Contributor Author

@madolson madolson left a comment


Some thoughts that came to me, I'll fix them when I'm back and have access to my laptop.

@yossigo
Member

yossigo commented Oct 25, 2021

@madolson
Something related to this work came up in a recent discussion: a primary use case for hostnames is to deal with network topologies where the cluster does not have good visibility into what addresses are exposed to clients, but assumes that a hostname will resolve to the right address on the client side.

If we stretch the scenario further: the hostname itself may also not be known, or may be dynamic and different for different clients. In that case, it could be useful to return something like -MOVED ::<port> (just an example) and expect a well-behaved client to reuse the same address/hostname with just a different port.

There is an inherent assumption here that the client only uses ports to distinguish between cluster nodes, and that the hostname/address is identical - but I believe that is becoming the case with some network topologies that involve a service mesh proxy / load balancer / gateway / etc.

There's practically no work on the server side for this, it's only about setting a convention and communicating it to clients as part of the hostname support change. Any thoughts about this?
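To make the proposed convention concrete, here is a hedged client-side sketch in C. The parse_moved helper is hypothetical, and the single-colon empty-endpoint form used here is just one possible reading of the proposal, not an implemented server behavior: when the endpoint portion is empty, the client keeps the hostname it is already using and only switches ports.

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Parse "MOVED <slot> <endpoint>:<port>". If <endpoint> is empty, fall
 * back to the host the client is already connected to, so only the
 * port changes. Returns the slot, or -1 on a malformed reply. */
static int parse_moved(const char *reply, const char *current_host,
                       char *host_out, size_t hostlen, int *port_out) {
    int slot;
    char endpoint[256];
    if (sscanf(reply, "MOVED %d %255s", &slot, endpoint) != 2) return -1;
    char *colon = strrchr(endpoint, ':');
    if (!colon) return -1;
    *port_out = atoi(colon + 1);
    if (colon == endpoint) {
        /* Empty endpoint: reuse the locally known hostname. */
        snprintf(host_out, hostlen, "%s", current_host);
    } else {
        *colon = '\0';
        snprintf(host_out, hostlen, "%s", endpoint);
    }
    return slot;
}
```

The design assumption matches the comment above: ports are the only thing distinguishing cluster nodes behind a shared proxy, so the server never needs to know the client-facing hostname at all.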

@dmitrypol

@yossigo - you are absolutely correct, hostname can be different per node. Cluster can be composed of server1.domain.com:6379, server2.domain.com:6379 and server3.domain.com:6379.

@madolson
Contributor Author

madolson commented Oct 25, 2021

@yossigo That is a good insight. An alternative to what you proposed is that we could add a client config, so that a client can tell the cluster the hostname/IP that it should always respond with. I think that would require fewer client changes, as clients would just send an additional command on startup instead of changing how they interpret CLUSTER SLOTS and redirects.

@dmitrypol Not sure I followed your comment; is having the server side not know the hostname a better solution for what we talked about?

I'm going to rebase and address my changes today in either case. We should be able to quickly add the changes outlined. Once this has general buy-in, I'll close out the tooling improvement.

@madolson madolson requested a review from yossigo October 26, 2021 01:49
@madolson madolson added approval-needed Waiting for core team approval to be merged state:major-decision Requires core team consensus labels Oct 26, 2021
@madolson madolson added this to Backlog in 7.0 via automation Oct 26, 2021
@madolson madolson moved this from Backlog to In progress in 7.0 Oct 26, 2021
@yossigo yossigo moved this from In progress to To Do in 7.0 Oct 26, 2021
@madolson madolson moved this from To Do to In progress in 7.0 Oct 26, 2021
@yossigo
Member

yossigo commented Oct 26, 2021

@dmitrypol The example you provide is already part of this work, I was actually referring to something else. For example assume there are clients A and B behind different load balancers, both pointing to the same Redis Cluster. The clients may use different, locally known and locally resolved hostnames to reach those load balancers, but the cluster does not know where to redirect each client.

+-------------+              +-----------+          +----------------+               
|             | hostnameA    |           |--------->|                |               
|  Client A   |------------->|   LB A    |          |                |               
|             |              |           |          |                |               
+-------------+              +-----------+          |      Redis     |               
                                                    |     Cluster    |               
+-------------+              +-----------+          |                |               
|             | hostnameB    |           |          |                |               
|  Client B   |------------->|   LB B    |          |                |               
|             |              |           |--------->|                |               
+-------------+              +-----------+          +----------------+               

@madolson This is a good point; it involves fewer parsing changes. On the other hand, we're introducing parsing changes anyway, not just due to hostnames but potentially also by pushing clients to finally move from CLUSTER NODES to CLUSTER SLOTS, so we could try to get this all done together. I don't feel strongly either way, though.

@dmitrypol

thank you for clarifying @yossigo. I misunderstood.

panjf2000 pushed a commit to panjf2000/redis that referenced this pull request Feb 3, 2022
…edis#9530)

Implement the ability for cluster nodes to advertise their location with extension messages.
@liuchong

This is on 7.0-rc2

root@redis-node-1:/data# redis-cli --cluster create redis-node-1:7001 redis-node-2:7002 redis-node-3:7003 redis-node-4:7004 redis-node-5:7005 redis-node-6:7006 --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica redis-node-5:7005 to redis-node-1:7001
Adding replica redis-node-6:7006 to redis-node-2:7002
Adding replica redis-node-4:7004 to redis-node-3:7003
M: 8d0e1b09f6e6812c8e99e1eed82653d4d38a43e4 redis-node-1:7001
   slots:[0-5460] (5461 slots) master
M: fbc7c451d2681a24deefa06f78e4929b90634bf9 redis-node-2:7002
   slots:[5461-10922] (5462 slots) master
M: b4feae38993df2e1073cfc43d0e6f9ba5a014833 redis-node-3:7003
   slots:[10923-16383] (5461 slots) master
S: 4c3a58c2cd0217967e82f04ee3df187c78f5a84f redis-node-4:7004
   replicates b4feae38993df2e1073cfc43d0e6f9ba5a014833
S: 3a4ad7c47a3f07146581f1ad38fab7e648305865 redis-node-5:7005
   replicates 8d0e1b09f6e6812c8e99e1eed82653d4d38a43e4
S: 30379a24dd87f9b4016d3a25398a02150c1a6977 redis-node-6:7006
   replicates fbc7c451d2681a24deefa06f78e4929b90634bf9
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Node redis-node-2:7002 replied with error:
ERR Invalid node address specified: redis-node-1:7001

Mentioning myself @liuchong for issue filtering 👀

@zuiderkwast
Contributor

@liuchong It seems CLUSTER MEET does not accept a hostname. I guess we need to implement that.

@FarhanSajid1

> This is on 7.0-rc2 […]
> ERR Invalid node address specified: redis-node-1:7001

Also seeing this

@oranagra
Member

@zuiderkwast this is resolved by #10436, right?

@zuiderkwast
Contributor

@oranagra That's right. I created the issue #10433 to track it too.

oranagra pushed a commit that referenced this pull request Jul 26, 2022
Gossip the cluster node blacklist in ping and pong messages.
This means that CLUSTER FORGET doesn't need to be sent to all nodes in a cluster.
It can be sent to one or more nodes and then be propagated to the rest of them.

For each blacklisted node, its node id and its remaining blacklist TTL is gossiped in a
cluster bus ping extension (introduced in #9530).
@oranagra
Member

oranagra commented Sep 4, 2022

Seen a failure in a test introduced here; I assume a timing issue.
https://github.com/redis/redis-extra-ci/runs/8173232234?check_suite_focus=true

*** [err]: Verify the nodes configured with prefer hostname only show hostname for new nodes in tests/unit/cluster/hostnames.tcl
Expected '' to be equal to 'shard-2.com' (context: type eval line 39 cmd {assert_equal [lindex [get_slot_field $slot_result 0 2 3] 1] "shard-2.com"} proc ::test)

Mixficsol pushed a commit to Mixficsol/redis that referenced this pull request Apr 12, 2023
madolson added a commit that referenced this pull request Jun 18, 2023
This PR adds a human-readable name to nodes in a cluster, visible as part of error logs. This is useful so that admins and operators of Redis clusters have better visibility into failures without having to cross-reference the generated ID with some logical identifier (such as a pod ID or an EC2 instance ID). This is mentioned in #8948. Specific node names can be set using the config cluster-announce-human-nodename. The node name is gossiped using the clusterbus extension introduced in #9530.

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
Labels
approval-needed (waiting for core team approval to be merged), release-notes (needs to be mentioned in the release notes), state:major-decision (requires core team consensus), state:needs-doc-pr (requires a PR to the redis-doc repository)
Successfully merging this pull request may close these issues.

Will hostnames be supported ?
10 participants