Minion-master authentication fails if minions added concurrently #5768
Labels
Bug
broken, incorrect, or confusing behavior
severity-low
4th level, cosmetic problems, workaround exists
Original thread - https://groups.google.com/forum/#!topic/salt-users/wU9Xz7QyhwQ
I'm working on a project that provides a light web interface on top of Salt (and other things). One of the things the web interface can do is add and configure a host for Salt control using SSH.
When it adds a host for Salt control, the web interface deletes all existing keys on the salt master for that host (using salt-api), configures and starts up the minion on the host (over SSH), waits for the new key to appear, accepts it, and then pings the host to confirm (all with salt-api). The web interface can do this on multiple hosts concurrently. However, when I try that, nearly all the minions crash or become unresponsive.
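For readers trying to reproduce this, the per-host sequence described above can be sketched roughly as follows. This is only an illustrative outline, not the actual web interface code: `add_host` and every callable passed to it (`delete_key`, `start_minion`, `pending_keys`, `accept_key`, `ping`) are hypothetical stand-ins for the salt-api and SSH calls, and the timeout values are assumptions.

```python
import time
from typing import Callable, Iterable


def add_host(
    host: str,
    delete_key: Callable[[str], None],          # hypothetical: remove existing keys for the host on the master
    start_minion: Callable[[str], None],        # hypothetical: configure and start the minion over SSH
    pending_keys: Callable[[], Iterable[str]],  # hypothetical: list minion IDs with unaccepted keys
    accept_key: Callable[[str], None],          # hypothetical: accept the pending key on the master
    ping: Callable[[str], bool],                # hypothetical: ping the minion, True on a reply
    timeout: float = 60.0,                      # assumed value, not from the report
    poll_interval: float = 1.0,                 # assumed value, not from the report
) -> bool:
    """Run the add-host steps in the order the report describes."""
    delete_key(host)
    start_minion(host)

    # Wait for the freshly started minion to submit its key.
    deadline = time.monotonic() + timeout
    while host not in set(pending_keys()):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no pending key for {host} after {timeout}s")
        time.sleep(poll_interval)

    accept_key(host)
    return ping(host)
```

Running several such sequences at once, for example one thread per host, corresponds to the concurrent case in which the failures appear.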
On the unresponsive minions, I'll see the following exception repeated -
On the hosts where the minions have died (but left the pidfile behind), if I try to start the minion manually on the console, I'll get the following before it dies -
When this happens, on the master, I'll start seeing the following line appear multiple times in the log -
[DEBUG] Failed to authenticate message
Restarting the minions does not help. However, if I restart the master (after they've all been added) and then restart the minions, everything just works.
The 'add host' process also works fine if I add each host serially (i.e. one at a time). If I add some hosts serially and then add others in parallel, the hosts I added serially become uncontactable as well, with the same symptoms. Adding 3 or more hosts in parallel consistently produces this problem (it occasionally works for 2 hosts).
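Since the serial case works, one possible stopgap on the caller's side is to force every add-host operation through a single lock, so that requests may arrive concurrently but never touch the master's keys at the same time. The sketch below is an assumption-laden illustration, not a fix for Salt itself: `add_hosts_serialized` is a hypothetical helper, and `add_host` stands for whatever callable performs the per-host steps.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# One process-wide lock: every add-host operation must hold it, including
# hosts added "serially", since the report shows those are affected too.
_add_host_lock = threading.Lock()


def add_hosts_serialized(hosts: Sequence[str], add_host: Callable[[str], T]) -> List[T]:
    """Accept hosts concurrently but run add_host for one host at a time."""

    def _guarded(host: str) -> T:
        with _add_host_lock:
            return add_host(host)

    # Worker threads mimic concurrent web requests; the lock serializes the work.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return list(pool.map(_guarded, hosts))
```

This trades throughput for safety and only helps if all key operations go through the same process. It has not been verified against the failure described here.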
I am therefore led to believe there is a threading issue somewhere inside Salt where the minion keys are stored: if the keys are modified by separate calls within a very small timeframe, the entire in-memory store may become corrupted and stay that way until it is re-read when the master is restarted.
The version details are as follows (all hosts tested are identical) -
I'm not familiar enough with the encryption innards of Salt to debug there, but let me know if there's anything else I can provide that may help.
This issue may be related to #5599.