Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] On Windows minion, status.master used by master_alive polling does not work when default language is not en_US #60508

Closed
johnl2323 opened this issue Jul 8, 2021 · 2 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Windows
Projects
Milestone

Comments

@johnl2323
Copy link

johnl2323 commented Jul 8, 2021

Description
We have our masters setup in multimaster mode with failover. Windows minions have master_alive_interval set to 60 seconds to detect failed masters. The master_alive pooling calls the status.master module to determine if the minion is connected to the master.

The status.master method calls Windows netstat to check for connections to the master with the "ESTABLISHED" state, but when the default language is not en_US, Windows translates "ESTABLISHED" into the local language. In our case, we have multiple Windows minions with de_DE set as the system default language.

The code in question:

if "ESTABLISHED" not in line:

Output of netstat with de_DE default language:

C:\ netstat -n -p TCP
                                                                                              
    Aktive Verbindungen                                                                       
                                                                                              
      Proto  Lokale Adresse         Remoteadresse          Status     
      TCP    10.68.7.151:3389     204.5.18.248:59558   SCHLIESSEN_WARTEN
      TCP    10.68.7.151:54657    10.14.26.98:445        HERGESTELLT
      TCP    10.68.7.151:56234    185.4.22.88:80         SCHLIESSEN_WARTEN
      TCP    10.68.7.151:58648    51.13.5.159:443       HERGESTELLT
      TCP    10.68.7.151:59000    10.14.1.124:50964    HERGESTELLT
      TCP    10.68.7.151:59339    185.4.22.88:80        HERGESTELLT
      TCP    10.68.7.151:59341    185.4.22.88:80        HERGESTELLT
      TCP    10.68.7.151:59379    10.15.5.107:4506     WARTEND    
      TCP    10.68.7.151:59388    10.15.5.107:4506     WARTEND    
      TCP    127.0.0.1:4510         127.0.0.1:54217        HERGESTELLT

The net effect is that the master_alive polling thinks that the minion is not connected to the master and re-connects. We also have configured an orchestration to trigger the highstate on minion start event. Thus these systems will run a highstate every minute 24hrs per day due to status.master not working correctly

Setup
Relevant configs in minion.conf

master:
  - saltha-1.foo.com
  - saltha-2.foo.com
random_master: True
master_type: failover
master_alive_interval: 60

Steps to Reproduce the behavior

  1. Set Windows minion to connect to 2 multi masters with configuration setting above.
  2. Setup orchestration to highstate on minion start event
  3. Verify that highstate is triggered on minion start and only runs once. You could monitor the event bus watching for minion start events and highstate returns.
  4. Change Wndows default language to de_DE and restart salt-minion
  5. Monitor salt event bus or minion.log and see that highstates are being triggered every minute.

Expected behavior
status.master should work correctly no matter what your default language is set to. Instead of status.master shelling out to run netstat, the same information is available from psutils and it's return data is not changed based on locale/language settings.

Versions Report

salt --versions-report

Our minions and masters running 3002.6

   Salt Version:
              Salt: 3002.6
     
    Dependency Versions:
              cffi: 1.12.2
          cherrypy: 17.4.1
          dateutil: 2.8.0
         docker-py: Not Installed
             gitdb: 2.0.5
         gitpython: 2.1.10
            Jinja2: 2.10.1
           libgit2: Not Installed
          M2Crypto: Not Installed
              Mako: 1.0.7
           msgpack: 1.0.0
      msgpack-pure: Not Installed
      mysql-python: Not Installed
         pycparser: 2.19
          pycrypto: Not Installed
      pycryptodome: 3.9.8
            pygit2: Not Installed
            Python: 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]
      python-gnupg: 0.4.4
            PyYAML: 5.3.1
             PyZMQ: 18.0.1
             smmap: 2.0.5
           timelib: 0.2.4
           Tornado: 4.5.3
               ZMQ: 4.3.1
     
    System Versions:
              dist:   
            locale: cp1252
           machine: AMD64
           release: 10
            system: Windows
           version: 10 10.0.19041 SP0

Additional context
Workaround: Instead of highstating every 60 seconds (master_alive_interval) we set the master_alive_interval to 12 hours on systems where the grain locale_info.defaultlanguage is not en_US.

@johnl2323 johnl2323 added Bug broken, incorrect, or confusing behavior needs-triage labels Jul 8, 2021
@welcome
Copy link

welcome bot commented Jul 8, 2021

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey.
Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar.
If you have additional questions, email us at saltproject@vmware.com. We’re glad you’ve joined our community and look forward to doing awesome things with you!

@sagetherage sagetherage added this to To do in Windows via automation Jul 12, 2021
@xeacott
Copy link
Contributor

xeacott commented Jul 13, 2021

Thanks for bringing this up, definitely an oversight here with how it checks. Ill leave this is a medium severity since the workout makes this a little more bearable.

@xeacott xeacott added severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around and removed needs-triage labels Jul 13, 2021
@xeacott xeacott added this to the Approved milestone Jul 13, 2021
Windows automation moved this from To do to Done Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Windows
Projects
Windows
  
Done
Development

No branches or pull requests

4 participants