Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orchestration runner output on Success vs Failures in 2016.3.0 #33915

Closed
mattglv opened this issue Jun 9, 2016 · 9 comments
Closed

Orchestration runner output on Success vs Failures in 2016.3.0 #33915

mattglv opened this issue Jun 9, 2016 · 9 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt fixed-pls-verify fix is linked, bug author to confirm fix P2 Priority 2 Runners severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around ZRELEASED - 2016.3.2
Milestone

Comments

@mattglv
Copy link

mattglv commented Jun 9, 2016

I've run into an issue with the orchestration runner in the new 2016.3.0 version.

We use the orchestration runner heavily and the fixes to this runner are the main reason we're trying to upgrade ASAP; however, this issue is somewhat of a blocker.

The orchestration runner seems to not return the 'highstate' formatted output when everything succeeds but it does return it when there is a failure in one of the states.

Quick example:

install_vim.sls contains:

vim-enhanced:
  pkg.installed

test_orch.sls contains:

test_orchestartion:
  salt.state:
    - tgt: '*'
    - sls: install_vim

This is what I get when I run it:

salt-run state.orchestrate test_orch
[INFO    ] Found minion id from generate_minion_id(): master
[INFO    ] Found minion id from generate_minion_id(): master
[INFO    ] Loading fresh modules for state activity
[INFO    ] Fetching file from saltenv 'base', ** skipped ** latest already in cache 'salt://test_orch.sls'
[INFO    ] Running state [test_orchestartion] at time 15:32:06.558517
[INFO    ] Executing state salt.state for test_orchestartion
[INFO    ] {'ret': {'minion1.releng.net': {'pkg_|-vim-enhanced_|-vim-enhanced_|-installed': {'comment': 'The following packages were installed/updated: vim-enhanced', 'name': 'vim-enhanced', 'start_time': '10:32:09.946535', 'result': True, 'duration': 6298.6019999999999, '__run_num__': 0, 'changes': {'vim-enhanced': {'new': '2:7.4.629-5.el6', 'old': ''}}, '__id__': 'vim-enhanced'}}}, 'out': 'highstate'}
[INFO    ] Completed state [test_orchestartion] at time 15:32:16.329084 duration_in_ms=9770.567
retcode:
    0
[INFO    ] Runner completed: 20160609153206097107

Now if I change the vim package name in install_vim.sls to 'bad_package_name':

salt-run state.orchestrate test_orch
[INFO    ] Found minion id from generate_minion_id(): master
[INFO    ] Found minion id from generate_minion_id(): master
[INFO    ] Loading fresh modules for state activity
[INFO    ] Fetching file from saltenv 'base', ** skipped ** latest already in cache 'salt://test_orch.sls'
[INFO    ] Running state [test_orchestartion] at time 15:25:15.963114
[INFO    ] Executing state salt.state for test_orchestartion
[ERROR   ] Run failed on minions: minion1.releng.net
Failures:
    minion1.releng.net:
    ----------
              ID: bad_package_name
        Function: pkg.installed
          Result: False
         Comment: Error occurred installing package(s). Additional info follows:

                  errors:
                      - Loaded plugins: fastestmirror
                        Setting up Install Process
                        Loading mirror speeds from cached hostfile
                         * base: mirrors.gigenet.com
                         * epel: mirror.nexcess.net
                         * extras: centos.chicago.waneq.com
                         * updates: mirror.tzulo.com
                        No package bad_package_name available.
                        Error: Nothing to do
         Started: 10:25:19.533628
        Duration: 4618.207 ms
         Changes:  

    Summary for minion1.releng.net
    ------------
    Succeeded: 0
    Failed:    1
    ------------
    Total states run:     1

[INFO    ] Completed state [test_orchestartion] at time 15:25:24.288673 duration_in_ms=8325.559
retcode:
    1
[INFO    ] Runner completed: 20160609152515482353

Master:

           Salt: 2016.3.0

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.3
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.20.2
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.6.6 (r266:84292, Jul 23 2015, 15:22:56)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.5.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: centos 6.7 Final
        machine: x86_64
        release: 2.6.32-573.el6.x86_64
         system: Linux
        version: CentOS 6.7 Final

Minion:

    Salt Version:
               Salt: 2016.3.0

    Dependency Versions:
               cffi: Not Installed
           cherrypy: Not Installed
           dateutil: Not Installed
              gitdb: Not Installed
          gitpython: Not Installed
              ioflo: Not Installed
             Jinja2: 2.7.3
            libgit2: Not Installed
            libnacl: Not Installed
           M2Crypto: 0.20.2
               Mako: Not Installed
       msgpack-pure: Not Installed
     msgpack-python: 0.4.6
       mysql-python: Not Installed
          pycparser: Not Installed
           pycrypto: 2.6.1
             pygit2: Not Installed
             Python: 2.6.6 (r266:84292, Jul 23 2015, 15:22:56)
       python-gnupg: Not Installed
             PyYAML: 3.11
              PyZMQ: 14.5.0
               RAET: Not Installed
              smmap: Not Installed
            timelib: Not Installed
            Tornado: 4.2.1
                ZMQ: 4.0.5

    System Versions:
               dist: centos 6.8 Final
            machine: x86_64
            release: 2.6.32-573.el6.x86_64
             system: Linux
            version: CentOS 6.8 Final
@Ch3LL
Copy link
Contributor

Ch3LL commented Jun 11, 2016

@mattglv I am able to replicate this behavior:

➜  issues sudo salt-run state.orchestrate orch.33915 -lwarning
retcode:
    0

We definitely need to get this fixed up. Thanks for reporting it.

@Ch3LL Ch3LL added P2 Priority 2 Runners severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around labels Jun 11, 2016
@Ch3LL Ch3LL added this to the Approved milestone Jun 11, 2016
@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt labels Jun 11, 2016
@Ch3LL
Copy link
Contributor

Ch3LL commented Jun 11, 2016

Also did some more testing it looks like it is just the highstate outputter. Looks like it works with yaml and json: sudo salt-run state.orchestrate orch.33915 -lwarning --output=json

@mattglv
Copy link
Author

mattglv commented Jun 11, 2016

Megan, thanks for looking into this! hope we can get this fixed soon.

@meggiebot meggiebot modified the milestones: C 7, Approved Jun 14, 2016
@meggiebot
Copy link

Need this fixed in 2016.3.2

@Ch3LL
Copy link
Contributor

Ch3LL commented Jul 1, 2016

hey @mattglv in the process of trying to re-replicate this issue to try to fix it it was found that my box was the only one that could replicate it. On a clean install of salt no one could replicate it. After a lot of troubleshooting I found that the only thing that ended up fixing it was wiping away /etc/salt/. (I don't recommend wiping this entire directory, I was merely trying to troubleshoot and was able to safely do this) Unfortunately we are not sure what in the directory was causing this issue, and I can no longer replicate the issue.

So I was wondering if you could provide us with some more information to try to help replicate this and find out what the culprit is. I did find out what the issue was but not why it was happening. It was here that I realized retcode: 0 was being appending into the beginning of the data ret dict. for example here was a bad ret dict where I saw the issue:

{'outputter': 'highstate', 'data': {'retcode': 0, 'ch3ll-master_master': {'salt_|-test_orchestartion_|-test_orchestartion_|-state': {'comment': 'States ran successfully. No changes made to ch3ll-master.', 'name': 'test_orchestartion', 'start_time': '16:58:03.734900', 'result': True, 'duration': 3302.591, '__run_num__': 0, 'changes': {}, '__id__': 'test_orchestartion'}}}}

and here was a good one where it worked:

{'outputter': 'highstate', 'data': {'testbox': {'pkg_|-vim-enhanced_|-vim-enhanced_|-installed': {'comment': 'Package vim-enhanced is already installed', 'name': 'vim-enhanced', 'start_time': '22:58:48.442570', 'result': True, 'duration': 659.39800000000002, '__run_num__': 0, 'changes': {}, '__id__': 'vim-enhanced'}}, 'retcode': 0}}

As you can see retcode is in different places which causes the highstate outputter to assign retcode as the minion id, so now we have to figure out what is causing this.

Can you provide us with a sanitized version of your master and local minion on the master configurations? Also just take a look at your /etc/salt directory and see if you notice anything unusual in there. For example maybe some keys were they do not belong, etc.

@mattglv
Copy link
Author

mattglv commented Jul 1, 2016

Megan,
I've put together a minimal vagrant setup (master/minion VM) to reproduce this, hopefully you're able to use it. Attaching the .zip file.

Assuming vagrant and virtualbox are installed, if not: apt-get install -y virtualbox vagrant

unzip vagrant_33915.zip
cd ../vagrant_33915
vagrant up
ssh root@190.160.100.101 (password: vagrant)

salt * test.ping
salt-run state.orchestrate test_orch

Let me know if I can help in any other way, really looking forward to this fix.

Thanks,
Matt

vagrant_33915.zip

@terminalmage
Copy link
Contributor

terminalmage commented Jul 5, 2016

Thanks for the vagrant box, I was able to use a python debugger inside of it to get to the bottom of this.

The issue was that in 2016.3.0 we added the ability for runners to specify a return code which salt-run reads and uses to exit with a nonzero exit status when there are errors. This retcode value is added to the return dictionary. However, due to this block of code we simply return the formatted result of the first key/value pair iterated in the data dictionary. So, depending on the minion ID the order of dictionary iteration would either process the retcode value (as in your vagrant box) or the actual return data from the orchestration (as in my attempts to reproduce).

What I think happened with @Ch3LL's box is that the minion ID was named differently when she could reproduce, from when she couldn't. This could have happened because she had an /etc/salt/minion_id file (which Salt uses to determine the minion ID to prevent the minion ID from flapping if the hostname changes), and clearing /etc/salt caused salt to determine the minion ID on the next time the salt-minion service was started, changing the minion ID and affecting the iteration order when the highstate outputter was invoked.

#34459 fixes this by ignoring the retcode key as its value is not itself a dictionary.

@terminalmage terminalmage added the fixed-pls-verify fix is linked, bug author to confirm fix label Jul 5, 2016
@mattglv
Copy link
Author

mattglv commented Jul 5, 2016

@terminalmage Thanks for the info and the fix. I've applied your patch on my local dev environment and it seems to work as expected!

@terminalmage
Copy link
Contributor

No problem, thanks for confirming!

cachedout pushed a commit to cachedout/salt that referenced this issue Sep 8, 2016
This regressed here: 82a70e0

The above incorrect assumed that all data flowing into the highstate outputter is expected to be a dictionary. However, looking at _format_host() shows that it does expect to receive multiple types and does handle them accordingly.

I believe that the issue reported in saltstack#33915 may be fixed by saltstack#35559 instead. This should allow the highstate outputter to display all results instead of stopping after the first one. I would like @terminalmage and @mattglv and possibly @Ch3LL to please confirm this before we get this in.

Resolves saltstack#35423
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt fixed-pls-verify fix is linked, bug author to confirm fix P2 Priority 2 Runners severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around ZRELEASED - 2016.3.2
Projects
None yet
Development

No branches or pull requests

4 participants