Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Windows minion hangs loading disk.disk grain #57621

Open
cruscio opened this issue Jun 10, 2020 · 6 comments
Open

[BUG] Windows minion hangs loading disk.disk grain #57621

cruscio opened this issue Jun 10, 2020 · 6 comments
Labels
Bug broken, incorrect, or confusing behavior severity-high 2nd top severity, seen by most users, causes major problems Windows
Projects

Comments

@cruscio
Copy link

cruscio commented Jun 10, 2020

Description
A couple freshly installed windows minions (out of thousands) are hanging up during a test.ping and neither completing nor erroring.

The last line in a trace log is:

[TRACE   ] Loading disks.disks grain

Setup

Minion configuration

There were no modificaiton to C:\salt\conf\minion
Modifications in minion.d:

PS C:\salt> gci C:\salt\conf\minion.d\ | foreach {"$_`n------"; $_ | get-content; "------`n"}
loglevel.conf
------
log_level: quiet
log_level_logfile: warning
log_rotate_max_bytes: 512000
log_rotate_backup_count: 10
------

master_id.conf
------
master: *********
------

minion_id.conf
------
id: *********
minion_id_caching: False
------

recon.conf
------
recon_default: 1000
recon_max: 120000
recon_randomize: True
random_reauth_delay: 120
------

startupdelay.conf
------
#random_startup_delay: 30
------

transport.conf
------
transport: tcp
ssl:
  ssl_version: PROTOCOL_TLSv1_2
publish_port: ****
master_port: ****
------

transport_keepalive.conf
------
schedule:
  minion_keepalive_ping:
    function: test.ping
    seconds: 120
    splay: 120
master_alive_interval: 230
ping_interval: 2
------

_schedule.conf
------
schedule:
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false, run_on_start: true}
  minion_keepalive_ping: {function: test.ping, seconds: 120, splay: 120}
------

Steps to Reproduce the behavior

C:\Windows\system32>c:\salt\salt-call.bat --local --log-level=trace test.ping

Minion hangs. Doesn't look like Python is doing anything in task manager

Trace log output

Trace log

C:\Windows\system32>c:\salt\salt-call.bat --local --log-level=trace test.ping
[DEBUG   ] Reading configuration from c:\salt\conf\minion
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\_schedule.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\_schedule.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\loglevel.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\loglevel.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\master_id.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\master_id.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\minion_id.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\minion_id.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\recon.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\recon.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\startupdelay.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\startupdelay.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\transport.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\transport.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\transport_keepalive.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\transport_keepalive.conf
[TRACE   ] The required configuration section, 'fluent_handler', was not found the in the configuration. Not loading the fluent logging handlers module.
[TRACE   ] None of the required configuration sections, 'logstash_udp_handler' and 'logstash_zmq_handler', were found in the configuration. Not loading the Logstash logging handlers module.
[DEBUG   ] Configuration file path: c:\salt\conf\minion
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from c:\salt\conf\minion
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\_schedule.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\_schedule.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\loglevel.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\loglevel.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\master_id.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\master_id.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\minion_id.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\minion_id.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\recon.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\recon.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\startupdelay.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\startupdelay.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\transport.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\transport.conf
[DEBUG   ] Including configuration from 'c:\salt\conf\minion.d\transport_keepalive.conf'
[DEBUG   ] Reading configuration from c:\salt\conf\minion.d\transport_keepalive.conf
[TRACE   ] 'dig' could not be found in the following search path: '['C:\\Windows\\system32', 'C:\\Windows', 'C:\\Windows\\System32\\Wbem', 'C:\\Windows\\System32\\WindowsPowerShell
\\v1.0\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\110\\Tools\\Binn\\', 'C:\\Program Files (x86)\\M
icrosoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\Tools\\Binn\\Managemen
tStudio\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\salt', 'c:\\salt\\bin\\lib\\site-packages\\pywin32_system32']'
[TRACE   ] 'drill' could not be found in the following search path: '['C:\\Windows\\system32', 'C:\\Windows', 'C:\\Windows\\System32\\Wbem', 'C:\\Windows\\System32\\WindowsPowerShe
ll\\v1.0\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\110\\Tools\\Binn\\', 'C:\\Program Files (x86)\
\Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\Tools\\Binn\\Managem
entStudio\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\salt', 'c:\\salt\\bin\\lib\\site-packages\\pywin32_system32']'
[TRACE   ] 'host' could not be found in the following search path: '['C:\\Windows\\system32', 'C:\\Windows', 'C:\\Windows\\System32\\Wbem', 'C:\\Windows\\System32\\WindowsPowerShel
l\\v1.0\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\110\\Tools\\Binn\\', 'C:\\Program Files (x86)\\
Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\Tools\\Binn\\Manageme
ntStudio\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\salt', 'c:\\salt\\bin\\lib\\site-packages\\pywin32_system32']'
[TRACE   ] 'esxcli' could not be found in the following search path: '['C:\\Windows\\system32', 'C:\\Windows', 'C:\\Windows\\System32\\Wbem', 'C:\\Windows\\System32\\WindowsPowerSh
ell\\v1.0\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\110\\Tools\\Binn\\', 'C:\\Program Files (x86)
\\Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\Tools\\Binn\\Manage
mentStudio\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\salt', 'c:\\salt\\bin\\lib\\site-packages\\pywin32_system32']'
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[... snip ...]
][TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False, False) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Depends decorator instantiated with dep list of (False,) and kwargs {}
[TRACE   ] Error loading grains.napalm: "napalm"" (c:\salt\bin\lib\site-packages\salt-3000.3-py3.5.egg\salt\grains\napalm.py) cannot be loaded: NAPALM is not installed: ``pip insta
ll napalm``
[TRACE   ] Error loading grains.zfs: ZFS: Not available on Windows
[TRACE   ] Loading core.append_domain grain
[TRACE   ] Loading core.cwd grain
[TRACE   ] Loading core.default_gateway grain
[TRACE   ] 'ip' could not be found in the following search path: '['C:\\Windows\\system32', 'C:\\Windows', 'C:\\Windows\\System32\\Wbem', 'C:\\Windows\\System32\\WindowsPowerShell\
\v1.0\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\110\\Tools\\Binn\\', 'C:\\Program Files (x86)\\Mi
crosoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files\\Microsoft SQL Server\\120\\Tools\\Binn\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\Tools\\Binn\\Management
Studio\\', 'C:\\Program Files (x86)\\Microsoft SQL Server\\120\\DTS\\Binn\\', 'C:\\salt', 'c:\\salt\\bin\\lib\\site-packages\\pywin32_system32']'
[TRACE   ] Loading core.dns grain
[TRACE   ] Loading core.fqdns grain
[TRACE   ] Loading core.get_machine_id grain
[TRACE   ] Loading core.get_master grain
[TRACE   ] Loading core.get_server_id grain
[TRACE   ] Loading core.hostname grain
[TRACE   ] Loading core.hwaddr_interfaces grain
[TRACE   ] Loading core.id_ grain
[TRACE   ] Loading core.ip4_interfaces grain
[TRACE   ] Loading core.ip6_interfaces grain
[TRACE   ] Loading core.ip_fqdn grain
[TRACE   ] Loading core.ip_interfaces grain
[TRACE   ] Loading core.linux_distribution grain
[TRACE   ] Loading core.locale_info grain
[TRACE   ] Loading core.os_data grain
[TRACE   ] Loading core.path grain
[TRACE   ] Loading core.pythonexecutable grain
[TRACE   ] Loading core.pythonpath grain
[TRACE   ] Loading core.pythonversion grain
[TRACE   ] Loading core.saltpath grain
[TRACE   ] Loading core.saltversion grain
[TRACE   ] Loading core.saltversioninfo grain
[TRACE   ] Loading core.zmqversion grain
[TRACE   ] Loading disks.disks grain

Expected behavior
Minion either completes loading the grain or fails with an error

Screenshots
If applicable, add screenshots to help explain your problem.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
C:\Windows\system32>c:\salt\salt-call.bat --versions-report
Salt Version:
           Salt: 3000.3

Dependency Versions:
           cffi: 1.12.2
       cherrypy: 17.4.1
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: 2.0.5
      gitpython: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.7
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: Not Installed
   pycryptodome: 3.8.1
         pygit2: Not Installed
         Python: 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]
   python-gnupg: 0.4.4
         PyYAML: 5.1.2
          PyZMQ: 18.0.1
          smmap: 2.0.5
        timelib: 0.2.4
        Tornado: 4.5.3
            ZMQ: 4.3.1

System Versions:
           dist:
         locale: cp1252
        machine: AMD64
        release: 2012ServerR2
         system: Windows
        version: 2012ServerR2 6.3.9600 SP0 Multiprocessor Free
@cruscio cruscio added the Bug broken, incorrect, or confusing behavior label Jun 10, 2020
@frogunder
Copy link
Contributor

@cruscio Thank you for reporting this issue.

@twangboy @xeacott Do you guys have any thoughts on this issue?

Thanks.

@frogunder frogunder added this to the Approved milestone Jun 11, 2020
@frogunder frogunder added severity-high 2nd top severity, seen by most users, causes major problems Windows labels Jun 11, 2020
@sagetherage sagetherage added the Magnesium Mg release after Na prior to Al label Jun 15, 2020
@xeacott
Copy link
Contributor

xeacott commented Jun 15, 2020

A couple out of thousands are hanging, how many thousands? Are they all reporting to the same master? And how many are hanging, like one? 10?

@cruscio
Copy link
Author

cruscio commented Jun 16, 2020

Right now, 1800 windows servers mixed from 2008R2 to 2016. All running 3000.3.

I've only seen this hangup happen on two servers. The root cause is almost certainly with those servers but I'd like to see if it's an error condition (e.g. WMI corruption) that salt should catch/handle instead of hanging and never completing.

Last year I was also part of a deployment of 2018.3.3 to approx. 6000 Windows servers in the same data center and never came across this issue there either.

They aren't all reporting to the same master, but they are all reporting back to a set of masters all the same configuration/version and running on the same server in docker containers. I'm not sure the master is too relevant since I can reproduce this masterless.

@xeacott
Copy link
Contributor

xeacott commented Jun 17, 2020

Hmmm, and its just the same 2 servers. Sometimes you need a sanity check; have you considered swapping out the disk from the 2 servers that are hanging and putting them into other servers that aren't hanging?

Just thinking outside the box here considering you have 1790-some other servers working just fine. And this is certainly something we could do here such as skip the particular grains after sometime goes by, or fill them with a null value. There's been a few other grains we've gated recently that are causing hiccups, but this seems too specific.

There's another log that the minion should have, sitting inside C:\salt\var\log\minion. Is there anything interesting in there?

@twangboy
Copy link
Contributor

twangboy commented Jun 17, 2020

What happens when you run the following command on one of the failing servers?

wmic /namespace:\\root\microsoft\windows\storage path MSFT_PhysicalDisk get DeviceID,MediaType /format:table

It should return something like the following:

DeviceId  MediaType
0         4

@cruscio
Copy link
Author

cruscio commented Jul 1, 2020

On the offending machines:

PS> wmic /namespace:\\root\microsoft\windows\storage path MSFT_PhysicalDisk get DeviceID,MediaType /format:table
Invalid GET Expression.

On a similar machine in the same data center:

PS> wmic /namespace:\\root\microsoft\windows\storage path MSFT_PhysicalDisk get DeviceID,MediaType /format:table
DeviceId  MediaType
0         0
1         0

Looks like some WMI repository corruption.

While definitely not salt's fault, it'd be nice if salt handled this more gracefully, by timing out and logging an error, instead of hanging.

@sagetherage sagetherage modified the milestones: Approved, Magnesium Jul 14, 2020
@sagetherage sagetherage removed the Magnesium Mg release after Na prior to Al label Oct 8, 2020
@sagetherage sagetherage modified the milestones: Magnesium, Approved Oct 8, 2020
@xeacott xeacott added this to To do in Windows Jan 11, 2021
@sagetherage sagetherage added the Phosphorus v3005.0 Release code name and version label Jun 18, 2021
@sagetherage sagetherage modified the milestones: Approved, Phosphorus Jun 18, 2021
@Ch3LL Ch3LL removed the Phosphorus v3005.0 Release code name and version label Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior severity-high 2nd top severity, seen by most users, causes major problems Windows
Projects
Windows
  
To do
Development

No branches or pull requests

7 participants