
Minion ID changes from 0.16.4 to 0.17.0, loses domain name part #7558

Closed
jkur opened this issue Oct 2, 2013 · 28 comments

Comments

@jkur commented Oct 2, 2013

This patch (#6734) seems to break the id grain on debian systems (and maybe many others)!

We just got an update from 0.16.4 to 0.17.0 through unattended upgrades (maybe a bad idea for Salt).
Now all our minions are gone, because the ID changed and new minion keys were generated, which of course are unknown to the master.
These IDs no longer include the domain part.
This is really bad. It's not the first time the ID generation has changed and broken everything. For such a crucial part of the system, I think more care should be taken.

Sorry for the angry tone, but this is a bit frustrating.

@terminalmage (Collaborator) commented Oct 2, 2013

Do you have an /etc/hostname on these hosts?

@jkur (Author) commented Oct 2, 2013

Yes, and this is the reason the minion ID changes. But on Debian, /etc/hostname does not include the domain name.
Before #6734, the next rule would match (fqdn = socket.getfqdn()) and the full name including the domain part was used.
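The two ID sources under discussion can be sketched like this (a simplified illustration with my own function names, not Salt's actual code):

```python
import socket

# On Debian, /etc/hostname typically holds only the short hostname ("web1"),
# while socket.getfqdn() resolves the full name ("web1.example.com").

def id_from_etc_hostname(path="/etc/hostname"):
    """Read the short hostname, roughly as the #6734 change does."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

def id_from_fqdn():
    """Resolve via DNS/hosts, which includes the domain part."""
    fqdn = socket.getfqdn()
    return fqdn if fqdn not in ("localhost", "localhost.localdomain") else None
```

With #6734 the first function wins when /etc/hostname exists, which is why Debian minions lost their domain part.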

@basepi (Contributor) commented Oct 2, 2013

For the record, I totally agree with your frustration. I was under the impression that the minion cached its ID after its first run, and that is apparently incorrect.

I think the desired behavior will be to guess the ID on the first run of the minion, then store that ID in a file somewhere. On subsequent runs, unless the ID is specified in the minion config (which should always be the authoritative source of minion ID), it should use the minion ID that was cached.

@terminalmage WRT that pull req, on systems like Arch, was socket.getfqdn() not working? What was the rationale? Should we remove the /etc/hostname check, make it lower priority, or something else?

@basepi (Contributor) commented Oct 2, 2013

I actually labeled this as high severity -- I think this ID caching is important, since the master does all its key management based on minion ID.

@terminalmage (Collaborator) commented Oct 2, 2013

We were falling back to localhost on Arch Linux machines. socket.getfqdn() was returning localhost, which Salt tries to avoid, but if all else fails it settles on localhost rather than simply bailing.

@basepi (Contributor) commented Oct 2, 2013

Ah, so what if we were to try socket.getfqdn() first, and then try /etc/hostname if it returns localhost?

@basepi (Contributor) commented Oct 2, 2013

Just switch the priority, basically.
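The swapped priority could look roughly like this (a hedged sketch with assumed names, not the actual patch):

```python
import socket

LOCAL_NAMES = ("localhost", "localhost.localdomain", "")

def guess_minion_id(hostname_file="/etc/hostname"):
    # Try DNS resolution first -- on Debian this keeps the domain part.
    fqdn = socket.getfqdn()
    if fqdn not in LOCAL_NAMES:
        return fqdn
    # Arch case: getfqdn() gave localhost, so consult /etc/hostname instead.
    try:
        with open(hostname_file) as f:
            name = f.read().strip()
        if name not in LOCAL_NAMES:
            return name
    except OSError:
        pass
    # Last resort, as Salt already does today.
    return "localhost"
```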

@basepi (Contributor) commented Oct 3, 2013

Alright, just got the go-ahead from @thatch45.

The fix will be two-fold. First, we'll make socket.getfqdn() the first source for guessing the minion ID, and if it returns localhost, we'll move on to /etc/hostname. This will solve this original regression.

Second, we'll add caching. Once the minion has guessed its ID, it will read that ID from the cache on future starts, preventing future regressions. And if you define a minion ID in the config, that will override all other guessing and caching -- the minion config is the source of truth for minion IDs.

If nobody has any objections, I'll probably start on this tomorrow or Friday.
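The described resolution order might be sketched as follows (the cache path and the guess callable are assumptions for illustration, not Salt's actual code):

```python
import os

def resolve_minion_id(config, cache_file, guess):
    # 1. An explicit ID in the minion config always wins.
    if config.get("id"):
        return config["id"]
    # 2. A previously cached guess is reused on every later start.
    if os.path.isfile(cache_file):
        with open(cache_file) as f:
            cached = f.read().strip()
        if cached:
            return cached
    # 3. First run: guess the ID and cache it for next time.
    guessed = guess()
    with open(cache_file, "w") as f:
        f.write(guessed)
    return guessed
```

Note that once step 3 has run, a later hostname change no longer affects the ID -- which is exactly the trade-off debated further down this thread.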

@ghost assigned basepi Oct 3, 2013
@jkur (Author) commented Oct 3, 2013

Sounds great. The caching in particular is highly welcome. Many thanks for all your effort.

@joehealy (Contributor) commented Oct 3, 2013

@basepi - if you finish the patch and 0.17.1 is delayed, I'd be happy to do a 0.17.0-2 for Debian that contains the patch.

Let me know if this would be worthwhile.

@joehealy (Contributor) commented Oct 3, 2013

@jkur It's probably also worth having a look at the "Release Streams" section on debian.saltstack.com.

It aims to give users more control over the timing of upgrades.

@basepi (Contributor) commented Oct 3, 2013

@joehealy There are a few show-stopping bugs in salt-ssh in 0.17.0, so I expect we will be doing a 0.17.1 next week. I don't think a -2 release will be required.

@malinoff commented Oct 7, 2013

@basepi what about my earlier comment?

Right now, a minion's ID is the only way to target minions without being afraid that some minion has changed its ID and remains authenticated -- unlike grains. If the minion's ID is cached, this approach won't work.

@basepi (Contributor) commented Oct 7, 2013

@malinoff Wait, I'm confused.

So you're saying targeting by ID is the only way to target without being afraid that the ID has changed? That doesn't make sense.

Can you clarify your use case?

Note that this ID "caching" is only on the minion, so that it keeps the same ID each time it starts up. Looks like you might be thinking it's on the master.

@malinoff commented Oct 8, 2013

@basepi No, I wasn't talking about that.
Imagine that an attacker knows we use the G@roles:project-node notation, and he has access to a minion that lacks that grain. When targeting by grains, the attacker can set a roles:project-node grain himself and get all information about the project. When targeting by ID, that can never happen.

But yes, I see that caching will not break this. Thanks for the clarification.
I had mixed up caching the ID with caching the minion's key.

@joehealy (Contributor) commented Oct 8, 2013

Andrii Senkovych (@jollyroger) has prepared a temporary packaging fix of this. It has been uploaded to debian.saltstack.com for squeeze, wheezy and unstable. The version is 0.17.0-2.

@basepi (Contributor) commented Oct 8, 2013

Opened a pull req with the fix: #7678

Feedback welcome.

thatch45 added a commit that referenced this issue Oct 9, 2013
Cache minion ID after first time guessing it, Fixes #7558
@LeTink commented Oct 10, 2013

I can confirm that the freshly applied update on Ubuntu 12.04 has the same result. I had to re-accept two Ubuntu minions, both now without the FQDN. One CentOS minion was unaffected.

@drags commented Oct 10, 2013

Appreciate the quick response and re-prioritizing socket.getfqdn(). 👍

I would like to present a counter argument that minion IDs should not be cached.

Minion IDs are set to the system hostname. They are also used as the most basic (and common to new users) form of targeting jobs. Having the minion ID drift from the hostname is jarring and unexpected from a user perspective.

Two scenarios come to mind where the hostname is likely to change:

  1. Machines that do not have their final hostname during bootstrap, when the minion is most likely to be installed.
  2. Clusters that are renamed as they move from staging/beta to production.

@basepi (Contributor) commented Oct 11, 2013

Right, but even if a minion ID changes, the difficulty comes because the master doesn't smoothly transition to the new minion ID. Rather, it sees the new ID and assumes it's a new minion. You have to re-accept the key and delete the old one to prevent delays because of the old key. In most cases, the headaches caused by this change happening without you triggering it are greater than the slight annoyance of having to either define the minion ID explicitly or delete the cache to get the minion to fetch the correct hostname.

In the end, we always recommend that for most deployments you should really just be defining your minion IDs explicitly in the minion config -- this will prevent any sort of problem along these lines.

That being said, it's really easy to get the minion to guess its new ID after bootstrapping is complete -- just delete the minion_id file from /etc/salt (or wherever your config files are stored).
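For example (a minimal sketch, assuming the default /etc/salt config dir; adjust the path and service manager to your setup):

```shell
# Forget the cached ID so the minion re-guesses on its next start.
MINION_ID_FILE="${MINION_ID_FILE:-/etc/salt/minion_id}"
rm -f "$MINION_ID_FILE"
# Then restart the minion, e.g.:
#   systemctl restart salt-minion   (or: service salt-minion restart)
```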

@korylprince commented Oct 14, 2013

Just upgraded the ubuntu packages today and ran into this issue. Was about to post about it to the mailing list and saw this. Glad to see it's already fixed. Will this be a fix that goes into the package repos (0.17.0.2 or something) or will we have to wait until 0.18.0?

Also, I like the cached minion_id. It should help with deployment also.

@basepi (Contributor) commented Oct 15, 2013

It will be in 0.17.1, which we will hopefully cut tonight or tomorrow morning, and it should be in the PPA by the end of the week.

@DaveQB (Contributor) commented Jun 16, 2014

Interesting. I just got bitten by this. I set up a new SOE image for us. In doing so, I deleted the SSH and Salt keys, but I was not aware of this /etc/salt/minion_id file. You can imagine the hair-pulling and frustration I have been facing, wondering why my new hosts' keys were not appearing on the Salt master.

A bunch of md5sum checking and debugging finally found they were all turning up with the identical hostname of the original machine the SOE was created from.

I don't think I am adding anything to this conversation, just getting it off my chest. :)

@basepi (Contributor) commented Jun 16, 2014

Oh no! Sorry for the inconvenience, @DaveQB -- in the current version of salt, we report where the minion is getting its ID at the INFO log level, to try to make it blatantly obvious that we're using a cached minion ID. Guess we need to see if we can message it better.

@DaveQB (Contributor) commented Jun 17, 2014

@basepi I didn't notice this in the log. I used the debug level too.

What has happened for me is that I updated our SOE image. The previous one had Salt 0.13.x, so the updated one has 0.17.6. Now Salt is getting and setting this /etc/salt/minion_id file very early in the startup process, before the final hostname is set, which is causing me trouble.

I am thinking about setting an empty, immutable file at /etc/salt/minion_id in the image, so it can't be set at all.

Would an empty /etc/salt/minion_id file mean salt-minion skips this caching lookup and uses the system's FQDN?

@basepi (Contributor) commented Jun 17, 2014

Sorry, I should have mentioned, you can turn off minion ID caching completely by setting minion_id_caching: False in the minion config.
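In the minion config that looks like this (option name as given above; with caching off, the minion re-guesses its ID on every start):

```yaml
# /etc/salt/minion
minion_id_caching: False
```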

@DaveQB (Contributor) commented Jun 17, 2014

@basepi Oh great. I was meaning to put that question in my comment. I'll be adding that to my SOE.
Thanks!

@Alivefish commented Jan 24, 2018

If one doesn't have bootstrap set up, and remote Windows execution via smbclient/WinRM (TCP 445) for salt-cloud isn't allowed through any firewall besides the one in your basement (or past AD and security tools), is it possible to create a minion config, for the sole purpose of baking into a golden Windows image, that would produce a dynamic minion ID based on hostname_ip rather than just hostname at startup?

Using the hostname/FQDN as standard (understandable by design, and as intended for normal ops), you end up with tens or hundreds of VMs carrying the minion ID of the lab host you built the image from, which doesn't exist in your production DNS, and it just goes downhill from there. The master gets flooded with minions sharing the same ID, with no easy way to track down the source node and automate remediation without parsing a very high-volume event bus or using the bad remote-execution tool we are trying to rid ourselves of. I'm sure that is up for debate, but bear with me.

Of course, this wouldn't be the go-forward production minion config. We'd create some kind of beacon or watcher to scan the enterprise (or query Postgres) for IDs containing an IPv4 address in the name (e.g. labsvr_10.0.0.5), target the host by IP, push the standard config, restart the agent, delete the old labsvr_10.0.0.5 key, and voilà. Yes, this is dumb, dangerous, and different, but it eliminates a few dependencies, as well as selling TCP 445 (which every Windows exploit on the planet targets, is locked down by group policy in most cases, and requires an act of congress to open through a firewall). I'm also assuming TCP 445 is only required by the winrepo method for the Windows VMware module, not by the Linux methods of the same IaaS. Laughs, comments, bad jokes anyone?
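A hypothetical sketch of such a hostname_ip generator (this is not a Salt feature -- a startup script could write its output to /etc/salt/minion_id; the 8.8.8.8 probe address is an assumption):

```python
import socket

def dynamic_minion_id():
    """Build an ID like 'labsvr_10.0.0.5' from hostname and outbound IP."""
    hostname = socket.gethostname()
    # Route-based trick: connecting a UDP socket picks the outbound
    # interface without actually sending any packets.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        ip = s.getsockname()[0]
    except OSError:
        ip = "127.0.0.1"  # no route available; fall back
    finally:
        s.close()
    return f"{hostname}_{ip}"
```

A watcher could then match IDs against an IPv4-suffix pattern to find image clones awaiting their real config.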
