[BUG] Windows minion hangs loading disk.disk grain #57621
Comments
A couple out of thousands are hanging. How many thousands? Are they all reporting to the same master? And how many are hanging: one? Ten?
Right now, 1800 Windows servers, a mix from 2008R2 to 2016, all running 3000.3. I've only seen this hang happen on two servers. The root cause is almost certainly with those servers, but I'd like to see if it's an error condition (e.g. WMI corruption) that salt should catch/handle instead of hanging and never completing. Last year I was also part of a deployment of 2018.3.3 to approx. 6000 Windows servers in the same data center and never came across this issue there either. They aren't all reporting to the same master, but they all report to a set of masters with the same configuration/version, running on the same server in Docker containers. I'm not sure the master is too relevant since I can reproduce this masterless.
Hmmm, and it's just the same 2 servers. Sometimes you need a sanity check: have you considered swapping out the disks from the 2 servers that are hanging and putting them into other servers that aren't hanging? Just thinking outside the box here, considering you have 1790-some other servers working just fine. And this is certainly something we could do here, such as skipping the particular grains after some time goes by, or filling them with a null value. There have been a few other grains we've gated recently that were causing hiccups, but this seems too specific. There's another log that the minion should have, sitting inside C:\salt\var\log\minion. Is there anything interesting in there?
What happens when you run the following command on one of the failing servers?
It should return something like the following:
On the offending machines:
On a similar machine in the same data center:
Looks like some WMI repository corruption. While definitely not salt's fault, it'd be nice if salt handled this more gracefully by timing out and logging an error instead of hanging.
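For illustration, the "time out and log an error" behavior requested here could look roughly like the sketch below. This is not Salt's loader code: `load_grain_with_timeout`, its parameters, and the fallback value are hypothetical names chosen for the example, and it assumes the grain is a plain callable that returns a dict.

```python
import logging
import threading

log = logging.getLogger(__name__)


def load_grain_with_timeout(grain_func, timeout, fallback=None):
    """Run grain_func in a daemon thread and give up after `timeout` seconds.

    Hypothetical helper, not part of Salt. Returns the grain's value on
    success, or `fallback` if the call times out or raises.
    """
    name = getattr(grain_func, "__name__", repr(grain_func))
    result = {}

    def _worker():
        try:
            result["value"] = grain_func()
        except Exception as exc:  # a failing grain should not abort loading
            result["error"] = exc

    # daemon=True so a permanently stuck call cannot block interpreter exit
    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        log.error("Grain %s did not return within %s seconds; skipping", name, timeout)
        return fallback
    if "error" in result:
        log.error("Grain %s raised an error: %s", name, result["error"])
        return fallback
    return result["value"]
```

Note the limitation: a thread blocked inside a native call (such as a stuck WMI query) cannot be killed from Python, so this only lets the caller move on and report the problem. The stuck thread stays around until the process exits. Running the grain in a separate process would allow it to be terminated, at the cost of more overhead.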
Description
A couple of freshly installed Windows minions (out of thousands) hang during a test.ping, neither completing nor returning an error.
The last line in a trace log is:
Setup
Minion configuration
There were no modifications to C:\salt\conf\minion
Modifications in minion.d:
Steps to Reproduce the behavior
The minion hangs. Python doesn't appear to be doing anything in Task Manager.
Trace log output
Trace log
Expected behavior
Minion either completes loading the grain or fails with an error
Screenshots
If applicable, add screenshots to help explain your problem.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)