Conflicts appears when changing node_name on agents #3974
Comments
There are many existing (closed) tickets about allowing Consul servers to change IP address; most of them were closed thanks to Raft protocol v3 (and possibly the node ID).
@hashicorp: what is the reason for restricting changes to node names now that there is a node ID? Is it simply because the implementation was complicated (updating existing services and checks, etc.), or is it a defensive measure to avoid clashes?
What is the use case for having a client agent's name change without its ID changing? There may well be one, but it's worth understanding why it's needed before considering the change, which at least has some subtleties to think through. The main value of that on servers is that they have persistent state and participate in Raft, where identity and state both matter for correctness. My guess is we didn't extend renaming to work for client agents just because it's not very clear why you would need to rename a client agent (e.g. change its hostname) without also letting it get a new ID (i.e. wiping its persistent state). I could be wrong, but I don't think there is any problem if a client agent leaves and comes back with a different name AND ID but the same IP, right?
Personally I don't have a strong use case, but I can report that we had a "mini-incident" due to this bug. We also experienced the blocked Consul servers behaviour that @kamaradclimber mentioned. We usually don't rename nodes, but we ended up hitting this issue due to a race condition in our provisioning pipeline. I think it would be nice for Consul to handle this event gracefully.
On our cluster, the Consul node name is the FQDN of the machine. Some of our users change their domain name, which leads to an attempt to change the Consul node name.
As a side note, we don't touch the node ID and let Consul generate it using its deterministic method.
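To illustrate what a "deterministic" node ID means here, the following is a minimal sketch that assumes the ID is derived by hashing some stable host identifier and formatting the result like a UUID. The function name `deterministicNodeID` and the `hostID` input are hypothetical and chosen for illustration; this is not presented as Consul's actual code. The point is only that the same machine yields the same ID across restarts, even if its node name changes.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// deterministicNodeID is a hypothetical helper: it hashes a stable host
// identifier (assumed to survive reboots and hostname changes) and formats
// the first 16 bytes of the digest in the usual 8-4-4-4-12 UUID layout.
func deterministicNodeID(hostID string) string {
	sum := sha512.Sum512([]byte(hostID))
	// %x on a byte slice prints two hex digits per byte.
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		sum[0:4], sum[4:6], sum[6:8], sum[8:10], sum[10:16])
}
```

Calling `deterministicNodeID` twice with the same host identifier returns the same string, which is why a renamed agent on the same machine would keep its ID under this assumption.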
For my on-premises setup, I have a cron job which names the machine based on its IP address and the Proxmox VMID. I am currently on v1.0.7. If a machine is offline for a few days, it gets a new IP and the name change goes through smoothly. Recently I changed the naming scheme a little bit. A VM which had been off for a few months (Consul 0.9) came online with the old naming scheme. After updating the Consul agent and updating the cron files, I had two entries in my I just let it be and the next day, the old name was gone from the list.
@shantanugadgil
@banks, would you have feedback on this issue?
We had to revert #3983 as it caused problems in testing, and we discovered it's a breaking change which we can't include in the current release cycle. We still think this is close and will add some extra details about what we need to do to get this into 1.3.
@banks OK, I'll give you more details about our incident as well.
…#4413

This change allows any well-behaved recent agent that has an ID to be renamed safely, i.e. without taking the name of another node (names are compared case-insensitively).

Deprecated behaviour warning
----------------------------

For backward compatibility, it is still possible to "take" the name of another node by not providing any ID.

Note that when no ID is provided, two nodes can end up with names that differ only by case, e.g. myNode and mynode, which might corrupt the database on the Consul server side and prevent the server from restarting properly.

See hashicorp#3983 and hashicorp#4399 for context about this change.

Disabling registration of nodes without IDs, as specified in hashicorp#4414, should probably be the way to go eventually.
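The rule described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Node` type, the `checkRename` function, and the in-memory `catalog` slice are hypothetical and are not Consul's real types or API. A node that has an ID may take a new name only if no other node (one with a different ID) already holds that name under case-insensitive comparison; a registration without an ID skips the check, mirroring the deprecated behaviour the message warns about.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified catalog entry (not Consul's real type).
type Node struct {
	ID   string // empty for legacy agents that register without an ID
	Name string
}

// checkRename reports whether the node identified by id may register under
// newName. All names and the catalog representation are assumptions.
func checkRename(catalog []Node, id, newName string) error {
	if id == "" {
		// Deprecated path: without an ID the rename cannot be verified,
		// so it is allowed and may "take" another node's name.
		return nil
	}
	for _, n := range catalog {
		// A different node already owns this name, ignoring case.
		if n.ID != id && strings.EqualFold(n.Name, newName) {
			return fmt.Errorf("name %q conflicts with node %q (ID %q)",
				newName, n.Name, n.ID)
		}
	}
	return nil
}
```

With a catalog holding `myNode` under ID `a`, a request from ID `b` to become `mynode` is rejected, while ID `a` changing only the case of its own name is accepted.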
\o/
Finally!!! 👍👍👍
Is this fixed in 1.3.0? Because I keep getting errors similar to this all over my stack:
Sometimes it goes away with a service restart, sometimes it doesn't.
Description of the Issue (and unexpected/desired result)
Reproduction steps
On the consul server:
Output of consul members:
consul version for both Client and Server
Client:
1.0.6
Server:
1.0.6 (with some patches)
consul info for both Client and Server
Client:
Server:
Operating system and Environment details
centos7.3