Minion id changes from 0.16.4 to 0.17.0, loses domain name part #7558

Closed
jkur opened this Issue Oct 2, 2013 · 28 comments

@jkur

jkur commented Oct 2, 2013

This patch (#6734) seems to break the id grain on Debian systems (and maybe many others)!

We just got an update from 0.16.4 to 0.17.0 through unattended upgrades (maybe a bad idea for salt).
Now all our minions are gone: because the id changed, new minion keys were generated, which of course are unknown to the master.
These ids no longer include the domain part.
This is really bad. It's not the first time the id generation has changed and broken everything. With such a crucial part of the system, I think more care should be taken.

Sorry for the angry tone, but this is a bit frustrating.

@terminalmage

Member

terminalmage commented Oct 2, 2013

Do you have an /etc/hostname on these hosts?

@jkur

jkur commented Oct 2, 2013

Yes, and that is why the minion id changes. But on Debian, /etc/hostname does not include the domain name.
Before #6734, the next rule would match (fqdn = socket.getfqdn()), and the full name including the domain part was used.
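
To illustrate the difference (a minimal sketch; the "web01" names are hypothetical):

```python
import socket

# On Debian, /etc/hostname typically holds only the short name, e.g. "web01".
with open('/etc/hostname') as f:
    short_name = f.read().strip()

# socket.getfqdn() resolves the fully qualified name, e.g. "web01.example.com".
fqdn = socket.getfqdn()

print(short_name)  # "web01" -- what 0.17.0 now uses as the minion id
print(fqdn)        # "web01.example.com" -- what 0.16.4 used
```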

@basepi

Collaborator

basepi commented Oct 2, 2013

For the record, I totally agree with your frustration. I was under the impression that the minion cached its ID after its first run, and that is apparently incorrect.

I think the desired behavior will be to guess the ID on the first run of the minion, then store that ID in a file somewhere. On subsequent runs, unless the ID is specified in the minion config (which should always be the authoritative source of minion ID), it should use the minion ID that was cached.

@terminalmage WRT that pull req: on systems like Arch, was socket.getfqdn() not working? What was the rationale? Should we remove the /etc/hostname check, or make it lower priority, or what?

@basepi

Collaborator

basepi commented Oct 2, 2013

I actually labeled this as high severity -- I think this ID caching is important, since the master does all its key management based on minion ID.

@terminalmage

Member

terminalmage commented Oct 2, 2013

We were failing back to localhost on Arch Linux machines. socket.getfqdn() was returning localhost, which Salt tries to avoid, but if all else fails it does settle on localhost rather than simply bailing.

@basepi

Collaborator

basepi commented Oct 2, 2013

Ah, so what if we were to try socket.getfqdn() first, and then try /etc/hostname if it returns localhost?
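
Something like this, perhaps (a rough sketch of the proposed lookup order, not Salt's actual implementation):

```python
import socket

def guess_minion_id():
    # Prefer the FQDN, which keeps the domain part on Debian-style hosts.
    fqdn = socket.getfqdn()
    if fqdn and fqdn not in ('localhost', 'localhost.localdomain'):
        return fqdn
    # Fall back to /etc/hostname, which helps hosts (e.g. Arch) where
    # getfqdn() comes back as localhost.
    try:
        with open('/etc/hostname') as f:
            hostname = f.read().strip()
    except IOError:
        hostname = ''
    if hostname and hostname != 'localhost':
        return hostname
    # Last resort: settle on localhost rather than bailing, as Salt does today.
    return 'localhost'
```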

@basepi

Collaborator

basepi commented Oct 2, 2013

Just switch the priority, basically.

@basepi

Collaborator

basepi commented Oct 3, 2013

Alright, just got the go-ahead from @thatch45.

The fix will be twofold. First, we'll make socket.getfqdn() the first source for guessing the minion ID; if it returns localhost, we'll move on to /etc/hostname. This solves the original regression.

Second, we'll add caching. Once the minion has guessed its ID, it will read that ID from the cache on future starts, which will prevent future regressions. And if you define a minion ID in the config, that overrides all other guessing/caching: the minion config is the source of truth for minion IDs.

If nobody has any objections, I'll probably start on this tomorrow or Friday.
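
In other words, the planned precedence might look roughly like this (a minimal sketch reusing guess_minion_id() from the sketch above; the cache file is /etc/salt/minion_id, as later comments in this thread confirm):

```python
import os

MINION_ID_FILE = '/etc/salt/minion_id'

def resolve_minion_id(config):
    # 1. An id set in the minion config is always authoritative.
    if config.get('id'):
        return config['id']
    # 2. Reuse a previously cached guess, if one exists.
    if os.path.isfile(MINION_ID_FILE):
        with open(MINION_ID_FILE) as f:
            cached = f.read().strip()
        if cached:
            return cached
    # 3. Guess once, then cache the result for future starts.
    minion_id = guess_minion_id()
    with open(MINION_ID_FILE, 'w') as f:
        f.write(minion_id)
    return minion_id
```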

@ghost assigned basepi Oct 3, 2013

@jkur

jkur commented Oct 3, 2013

Sounds great. The caching in particular is highly welcome. Many thanks for all your effort.

@joehealy

Contributor

joehealy commented Oct 3, 2013

@basepi - if you finish the patch and 0.17.1 is delayed, I'd be happy to do a 0.17.0-2 for Debian that contains the patch.

Let me know if this would be worthwhile.

@joehealy

Contributor

joehealy commented Oct 3, 2013

@jkur It's probably also worth having a look at the "Release Streams" section on debian.saltstack.com. It aims to give users more control over the timing of upgrades.

@basepi

Collaborator

basepi commented Oct 3, 2013

@joehealy There are a few show-stopping bugs in salt-ssh in 0.17.0, so I expect we will be doing a 0.17.1 next week. I don't think a -2 release will be required.

@malinoff

malinoff commented Oct 7, 2013

@basepi what about my note?

Right now, targeting by minion id (rather than by grains) is the only way to target minions without worrying that some minion has changed its id and remains authenticated.
If minion ids are cached, that approach won't work.

@basepi

Collaborator

basepi commented Oct 7, 2013

@malinoff Wait, I'm confused.

So you're saying targeting by ID is the only way to target without being afraid that the ID has changed? That doesn't make sense.

Can you clarify your use case?

Note that this ID "caching" is only on the minion, so that it keeps the same ID each time it starts up. Looks like you might be thinking it's on the master.

@malinoff

malinoff commented Oct 8, 2013

@basepi No, I wasn't talking about that.
Imagine that a malefactor knows that we're using the G@roles:project-node notation, and he has access to a minion that has no such grain. When targeting with grains, the malefactor can set a roles:project-node grain himself and get all the information about the project. When targeting by ID, that could never happen.

But yes, I see that caching will not break this. Thanks for clarifying.
I had mixed up caching the ID with caching the minion's key.

@joehealy

Contributor

joehealy commented Oct 8, 2013

Andrii Senkovych (@jollyroger) has prepared a temporary packaging fix of this. It has been uploaded to debian.saltstack.com for squeeze, wheezy and unstable. The version is 0.17.0-2.

@basepi

Collaborator

basepi commented Oct 8, 2013

Opened a pull req with the fix: #7678

Feedback welcome.

@thatch45 closed this in #7678 Oct 9, 2013

thatch45 added a commit that referenced this issue Oct 9, 2013

Merge pull request #7678 from basepi/idcache7558
Cache minion ID after first time guessing it, Fixes #7558
@LeTink

LeTink commented Oct 10, 2013

I can confirm that the freshly applied update on Ubuntu 12.04 has the same result. I had to re-accept 2 Ubuntu minions, both now without the FQDN. 1 CentOS minion was unharmed.

@drags

drags commented Oct 10, 2013

Appreciate the quick response and re-prioritizing socket.getfqdn(). 👍

I would like to present a counter argument that minion IDs should not be cached.

Minion IDs are set to the system hostname. They are also used as the most basic (and common to new users) form of targeting jobs. Having the minion ID drift from the hostname is jarring and unexpected from a user perspective.

Two scenarios come to mind where the hostname is likely to change:

  1. Machines that do not have their final hostname during bootstrap, when the minion is most likely to be installed.
  2. Clusters that are renamed as they move from staging/beta to production.
@basepi

Collaborator

basepi commented Oct 11, 2013

Right, but even if a minion ID changes, the difficulty comes because the master doesn't smoothly transition to the new minion ID. Rather, it sees the new ID and assumes it's a new minion. You have to re-accept the key and delete the old one to prevent delays because of the old key. In most cases, the headaches caused by this change happening without you triggering it are greater than the slight annoyance of having to either define the minion ID explicitly or delete the cache to get the minion to fetch the correct hostname.

In the end, we always recommend that for most deployments you should really just be defining your minion IDs explicitly in the minion config -- this will prevent any sort of problem along these lines.

That being said, it's really easy to get the minion to guess its new ID after bootstrapping is complete -- just delete the minion_id file from /etc/salt (or wherever your config files are stored).

@korylprince

korylprince commented Oct 14, 2013

Just upgraded the Ubuntu packages today and ran into this issue. I was about to post about it to the mailing list when I saw this. Glad to see it's already fixed. Will this fix go into the package repos (0.17.0.2 or something), or will we have to wait until 0.18.0?

Also, I like the cached minion_id. It should help with deployment too.

@basepi

Collaborator

basepi commented Oct 15, 2013

It will be in 0.17.1, which we will hopefully cut tonight or tomorrow morning, to be in the PPA by end of the week.

@DaveQB

Contributor

DaveQB commented Jun 16, 2014

Interesting. I just got bitten by this. I set up a new SOE image for us. In doing so, I deleted SSH and Salt keys, but I was not aware of this /etc/salt/minion_id file. You can imagine the hair-pulling and the frustration I have been facing wondering why my new hosts' keys are not appearing on the salt master.

A bunch of md5sum checking and debugging finally found they were all turning up with the identical hostname of the original machine the SOE was created from.

I don't think I am adding anything to this conversation, just getting it off my chest. :)

@basepi

Collaborator

basepi commented Jun 16, 2014

Oh no! Sorry for the inconvenience, @DaveQB -- in the current version of salt, we report where the minion is getting its ID at the INFO log level, to try to make it blatantly obvious that we're using a cached minion ID. Guess we need to see if we can message it better.

@DaveQB

Contributor

DaveQB commented Jun 17, 2014

@basepi I didn't notice this in the log. I used debug level too.

What has happened for me is that I have updated our SOE image. The previous one had Salt 0.13.x, so the updated one has 0.17.6. Now salt is getting and setting this /etc/salt/minion_id file very early in the startup process, before the final hostname is set, which causes me trouble.

I am thinking about putting an empty, immutable file at /etc/salt/minion_id in the image, so it can't be set at all.

Would an empty /etc/salt/minion_id file mean salt-minion skips this cache lookup and uses the system's FQDN?

@basepi

Collaborator

basepi commented Jun 17, 2014

Sorry, I should have mentioned, you can turn off minion ID caching completely by setting minion_id_caching: False in the minion config.

@DaveQB

Contributor

DaveQB commented Jun 17, 2014

@basepi Oh great. I was meaning to put that question in my comment. I'll be adding that to my SOE.
Thanks!

@Alivefish

Alivefish commented Jan 24, 2018

If one doesn't have bootstrap set up, and remote Windows execution via smbclient/WinRM (tcp/445) for salt-cloud isn't allowed through any firewall besides the one in your basement (or for AD and security tools), is it possible to create a minion config, for the sole purpose of baking into a golden Windows image, that would produce a dynamic minion id of the form hostname_ip (rather than hostname) at startup? Using the hostname/FQDN standard (understandable by design and as intended for normal ops), you end up with tens or hundreds of VMs whose minion id is the lab host the image was built from, which doesn't exist in your production DNS, and it goes downhill from there. The master gets flooded with minions sharing the same id, with no easy way to track down the source node and automate remediation without parsing a very high-volume event bus or using the bad remote execution tool we are trying to rid ourselves of; I'm sure that is up for debate, but bear with me.

Of course, this wouldn't be your go-forward production minion config. We'd create some kind of beacon or watcher to scan Enterprise (or stat Postgres) for ids containing a 32-bit address in the name (e.g. labsvr_10.0.0.5), target the host by ip, push the standard config, restart the agent, delete the old newsvr_10.0.0.5, and voilà. Yes, this is dumb, dangerous, and different, but it eliminates a few dependencies and avoids selling tcp/445 (which every Windows exploit on the planet targets, is locked down by group policy in most cases, and requires an act of congress to open up through a firewall). Also, I'm assuming tcp/445 is only required when using the winrepo method for Windows VMs in the vmware module, not for Linux methods of the same IaaS. Laughs, comments, bad jokes anyone?
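
For what it's worth, a minimal sketch of the hostname_ip idea (purely hypothetical; Salt's id guessing does not do this out of the box, and the route-based ip lookup is just one way to avoid getting 127.0.0.1 when DNS isn't set up):

```python
import socket

def dynamic_minion_id():
    host = socket.gethostname()
    # A route-based lookup avoids returning 127.0.0.1 on hosts without
    # working DNS; connecting a UDP socket sends no actual packets.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(('8.8.8.8', 80))
        ip = s.getsockname()[0]
    finally:
        s.close()
    return '{0}_{1}'.format(host, ip)

print(dynamic_minion_id())  # e.g. labsvr_10.0.0.5
```

Something like this could run at first boot and write its result to /etc/salt/minion_id before the minion starts.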
