Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote state execution is much slower on 2016.11.1 compared to 2016.3.4 #38758

Closed
bobrik opened this issue Jan 17, 2017 · 16 comments
Closed

Remote state execution is much slower on 2016.11.1 compared to 2016.3.4 #38758

bobrik opened this issue Jan 17, 2017 · 16 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE Core relates to code central or existential to Salt fixed-pls-verify fix is linked, bug author to confirm fix P2 Priority 2 severity-critical top severity, seen by most users, serious issues severity-high 2nd top severity, seen by most users, causes major problems State-Module ZRELEASED - 2016.11.4
Milestone

Comments

@bobrik
Copy link
Contributor

bobrik commented Jan 17, 2017

Description of Issue/Question

Master:

Salt Version:
           Salt: 2016.11.1

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.2
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
   python-gnupg: 0.3.6
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.6
        machine: x86_64
        release: 4.4.36-cloudflare
         system: Linux
        version: debian 8.6

Minion on 2016.3.4:

Salt Version:
           Salt: 2016.3.4

Dependency Versions:
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: 2.2
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: Not Installed
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.6
        machine: x86_64
        release: 4.4.36-cloudflare
         system: Linux
        version: debian 8.6

Minion on 2016.11.1:

Salt Version:
           Salt: 2016.11.1

Dependency Versions:
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: 2.2
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: Not Installed
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.9 (default, Jun 29 2016, 13:08:31)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.6
        machine: x86_64
        release: 4.4.36-cloudflare
         system: Linux
        version: debian 8.6

Setup

Both machines are identical apart from salt minion versions.

Consider the following state:

/tmp/test.txt:
  file.managed:
    - source: salt://test/templates/test.txt

Both machines have comparable timings with salt-call:

ivan@salt-master:~$ sudo salt-master-pdx salt --output highstate -L salt-2016-11-1,salt-2016-3-4 cmd.run 'salt-call state.apply test'
salt-2016-11-1:
    local:
    ----------
              ID: /tmp/test.txt
        Function: file.managed
          Result: True
         Comment: File /tmp/test.txt is in the correct state
         Started: 00:31:58.971728
        Duration: 13.96 ms
         Changes:

    Summary for local
    ------------
    Succeeded: 1
    Failed:    0
    ------------
    Total states run:     1
    Total run time:  13.960 ms
salt-2016-3-4:
    local:
    ----------
              ID: /tmp/test.txt
        Function: file.managed
          Result: True
         Comment: File /tmp/test.txt is in the correct state
         Started: 00:31:59.359181
        Duration: 37.646 ms
         Changes:

    Summary for local
    ------------
    Succeeded: 1
    Failed:    0
    ------------
    Total states run:     1
    Total run time:  37.646 ms

Timings are dramatically different with salt call from master:

ivan@salt-master:~$ sudo salt-master-pdx salt --output highstate -L salt-2016-11-1,salt-2016-3-4 state.apply test
salt-2016-3-4:

Summary for salt-2016-3-4
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  35.249 ms
salt-2016-11-1:

Summary for salt-2016-11-1
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:   1.568 s

But they get the same without template:

/tmp/test.txt:
  file.managed:
    - contents: so sad
ivan@salt-master:~$ sudo salt-master-pdx salt --output highstate -L salt-2016-11-1,salt-2016-3-4 state.apply test
salt-2016-3-4:

Summary for salt-2016-3-4
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:   8.456 ms
salt-2016-11-1:

Summary for salt-2016-11-1
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:   6.039 ms

This gets very bad very quickly: 71s vs 750s for local vs remote mode on 2016.11.1.

Debug logs from minions:

  • 2016.3.4:
[INFO    ] Starting a new job with PID 3226
[DEBUG   ] LazyLoaded saltutil.find_job
[DEBUG   ] LazyLoaded direct_call.get
[DEBUG   ] Minion return retry timer set to 10 seconds (randomized)
[INFO    ] Returning information for job: 20170117003546157530
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-2016-3-4', 'tcp://10.36.11.18:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-2016-3-4', 'tcp://10.36.11.18:4506')
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] Loading fresh modules for state activity
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] Could not find file from saltenv 'base', 'salt://test.sls'
[DEBUG   ] In saltenv 'base', looking at rel_path 'test/init.sls' to resolve 'salt://test/init.sls'
[DEBUG   ] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/test/init.sls' to resolve 'salt://test/init.sls'
[DEBUG   ] Fetching file from saltenv 'base', ** attempting ** 'salt://test/init.sls'
[DEBUG   ] No dest file found
[INFO    ] Fetching file from saltenv 'base', ** done ** 'test/init.sls'
[DEBUG   ] compile template: /var/cache/salt/minion/files/base/test/init.sls
[DEBUG   ] Jinja search path: ['/var/cache/salt/minion/files/base']
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/test/init.sls' using 'jinja' renderer: 0.00490188598633
[DEBUG   ] Rendered data from file: /var/cache/salt/minion/files/base/test/init.sls:
/tmp/test.txt:
  file.managed:
#    - contents: so sad
    - source: salt://test/templates/test.txt

[DEBUG   ] LazyLoaded config.get
[DEBUG   ] Results of YAML rendering:
OrderedDict([('/tmp/test.txt', OrderedDict([('file.managed', [OrderedDict([('source', 'salt://test/templates/test.txt')])])]))])
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/test/init.sls' using 'yaml' renderer: 0.0020899772644
[DEBUG   ] LazyLoaded file.managed
[INFO    ] Running state [/tmp/test.txt] at time 00:35:53.662503
[INFO    ] Executing state file.managed for /tmp/test.txt
[DEBUG   ] LazyLoaded file.source_list
[DEBUG   ] LazyLoaded cp.hash_file
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-2016-3-4', 'tcp://10.36.11.18:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-2016-3-4', 'tcp://10.36.11.18:4506')
[DEBUG   ] In saltenv 'base', looking at rel_path 'test/templates/test.txt' to resolve 'salt://test/templates/test.txt'
[DEBUG   ] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/test/templates/test.txt' to resolve 'salt://test/templates/test.txt'
[DEBUG   ] Fetching file from saltenv 'base', ** attempting ** 'salt://test/templates/test.txt'
[DEBUG   ] No dest file found
[INFO    ] Fetching file from saltenv 'base', ** done ** 'test/templates/test.txt'
[INFO    ] File /tmp/test.txt is in the correct state
[INFO    ] Completed state [/tmp/test.txt] at time 00:35:53.706404 duration_in_ms=43.901
[DEBUG   ] File /var/cache/salt/minion/accumulator/140439475305104 does not exist, no need to cleanup.
[DEBUG   ] Minion return retry timer set to 9 seconds (randomized)
[INFO    ] Returning information for job: 20170117003531051037
  • 2016.11.1:
[INFO    ] Starting a new job with PID 20435
[DEBUG   ] LazyLoaded saltutil.find_job
[DEBUG   ] LazyLoaded direct_call.get
[DEBUG   ] Minion return retry timer set to 9 seconds (randomized)
[INFO    ] Returning information for job: 20170117003556166921
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-2016-11-1', 'tcp://10.36.11.18:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-2016-11-1', 'tcp://10.36.11.18:4506')
[DEBUG   ] In saltenv 'base', looking at rel_path 'test/init.sls' to resolve 'salt://test/init.sls'
[DEBUG   ] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/test/init.sls' to resolve 'salt://test/init.sls'
[DEBUG   ] Fetching file from saltenv 'base', ** attempting ** 'salt://test/init.sls'
[DEBUG   ] No dest file found
[INFO    ] Fetching file from saltenv 'base', ** done ** 'test/init.sls'
[DEBUG   ] compile template: /var/cache/salt/minion/files/base/test/init.sls
[DEBUG   ] Jinja search path: ['/var/cache/salt/minion/files/base']
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/test/init.sls' using 'jinja' renderer: 0.00338506698608
[DEBUG   ] Rendered data from file: /var/cache/salt/minion/files/base/test/init.sls:
/tmp/test.txt:
  file.managed:
#    - contents: so sad
    - source: salt://test/templates/test.txt

[DEBUG   ] LazyLoaded config.get
[DEBUG   ] Results of YAML rendering:
OrderedDict([('/tmp/test.txt', OrderedDict([('file.managed', [OrderedDict([('source', 'salt://test/templates/test.txt')])])]))])
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/test/init.sls' using 'yaml' renderer: 0.00159597396851
[DEBUG   ] LazyLoaded file.managed
[INFO    ] Running state [/tmp/test.txt] at time 00:35:58.212000
[INFO    ] Executing state file.managed for /tmp/test.txt
[DEBUG   ] LazyLoaded file.source_list
[DEBUG   ] LazyLoaded cp.hash_file
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-2016-11-1', 'tcp://10.36.11.18:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-2016-11-1', 'tcp://10.36.11.18:4506')
[DEBUG   ] In saltenv 'base', looking at rel_path 'test/templates/test.txt' to resolve 'salt://test/templates/test.txt'
[DEBUG   ] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/test/templates/test.txt' to resolve 'salt://test/templates/test.txt'
[INFO    ] Fetching file from saltenv 'base', ** skipped ** latest already in cache 'salt://test/templates/test.txt', mode up-to-date
[INFO    ] File /tmp/test.txt is in the correct state
[INFO    ] Completed state [/tmp/test.txt] at time 00:35:59.820315 duration_in_ms=1608.315
[DEBUG   ] File /var/cache/salt/minion/accumulator/140688738910288 does not exist, no need to cleanup.
[DEBUG   ] Minion return retry timer set to 10 seconds (randomized)
[INFO    ] Returning information for job: 20170117003531051037
@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 18, 2017

@bobrik would you mind giving #38167 a try and see if there are any improvements?

@Ch3LL Ch3LL added the info-needed waiting for more info label Jan 18, 2017
@Ch3LL Ch3LL added this to the Blocked milestone Jan 18, 2017
@bobrik
Copy link
Contributor Author

bobrik commented Jan 18, 2017

@Ch3LL do I need that on master, on minion or on both?

@bobrik
Copy link
Contributor Author

bobrik commented Jan 19, 2017

Adding that on minion did not help.

@bobrik
Copy link
Contributor Author

bobrik commented Jan 19, 2017

I did some profiling and here's what's happening on CPU-heavy state.apply path:

Thu Jan 19 07:00:02 2017    /tmp/profile.out

         187068524 function calls (165547465 primitive calls) in 68.070 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
20146000/10098   18.206    0.000   39.307    0.004 /usr/lib/python2.7/copy.py:145(deepcopy)
      898   11.086    0.012   11.086    0.012 {zmq.backend.cython._poll.zmq_poll}
 19239393    6.557    0.000    8.542    0.000 /usr/lib/python2.7/copy.py:267(_keep_alive)
       41    3.745    0.091    3.745    0.091 {built-in method poll}
1080896/10098    3.147    0.000   39.275    0.004 /usr/lib/python2.7/copy.py:253(_deepcopy_dict)
695008/670667    3.076    0.000   17.583    0.000 /usr/lib/python2.7/copy.py:226(_deepcopy_list)
 39465726    2.938    0.000    2.938    0.000 {method 'get' of 'dict' objects}
 41181240    1.943    0.000    1.943    0.000 {id}
      140    1.891    0.014    1.891    0.014 {posix.read}
 27940135    1.653    0.000    1.653    0.000 {method 'append' of 'list' objects}
 17456430    1.155    0.000    1.155    0.000 /usr/lib/python2.7/copy.py:198(_deepcopy_atomic)
   438282    0.792    0.000    0.826    0.000 /usr/lib/python2.7/dist-packages/yaml/reader.py:99(forward)
  1137728    0.709    0.000    1.313    0.000 /usr/lib/python2.7/dist-packages/yaml/scanner.py:142(need_more_tokens)
  1087676    0.563    0.000    0.563    0.000 {method 'iteritems' of 'dict' objects}
   687874    0.541    0.000    5.992    0.000 /usr/lib/python2.7/dist-packages/yaml/scanner.py:113(check_token)
    81024    0.482    0.000    1.262    0.000 /usr/lib/python2.7/dist-packages/yaml/scanner.py:1187(scan_flow_scalar_non_spaces)
    87242    0.449    0.000    1.756    0.000 /usr/lib/python2.7/dist-packages/yaml/parser.py:273(parse_node)
  1213513    0.416    0.000    0.506    0.000 /usr/lib/python2.7/dist-packages/yaml/scanner.py:276(stale_possible_simple_keys)
   422720    0.403    0.000    0.655    0.000 /usr/lib/python2.7/dist-packages/yaml/reader.py:114(get_mark)
  2528245    0.386    0.000    0.386    0.000 /usr/lib/python2.7/dist-packages/yaml/reader.py:87(peek)
   171787    0.346    0.000    4.315    0.000 /usr/lib/python2.7/dist-packages/yaml/scanner.py:153(fetch_more_tokens)
    81281    0.328    0.000    0.328    0.000 /usr/lib/python2.7/dist-packages/yaml/events.py:65(__init__)
    79136    0.289    0.000    2.098    0.000 /usr/lib/python2.7/dist-packages/yaml/scanner.py:1137(scan_flow_scalar)
1845687/1845686    0.277    0.000    0.300    0.000 {isinstance}
     3234    0.275    0.000   39.211    0.012 /usr/lib/python2.7/dist-packages/salt/utils/context.py:130(__init__)
   171787    0.255    0.000    0.520    0.000 /usr/lib/python2.7/dist-packages/yaml/scanner.py:749(scan_to_next_token)
        1    0.253    0.253    0.276    0.276 /usr/lib/python2.7/dist-packages/apt/cache.py:143(open)
   422720    0.252    0.000    0.252    0.000 /usr/lib/python2.7/dist-packages/yaml/error.py:6(__init__)
Thu Jan 19 07:00:02 2017    /tmp/profile.out

         187068524 function calls (165547465 primitive calls) in 68.070 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   68.101   68.101 <string>:1(<module>)
        1    0.000    0.000   68.101   68.101 /usr/lib/python2.7/dist-packages/salt/executors/direct_call.py:27(execute)
        1    0.000    0.000   68.101   68.101 /usr/lib/python2.7/dist-packages/salt/modules/state.py:415(apply_)
        1    0.000    0.000   68.101   68.101 /usr/lib/python2.7/dist-packages/salt/modules/state.py:834(sls)
      180    0.002    0.000   50.681    0.282 /usr/lib/python2.7/dist-packages/salt/utils/async.py:70(wrap)
      180    0.001    0.000   50.635    0.281 /usr/lib/python2.7/dist-packages/salt/utils/async.py:82(_block_future)
      180    0.000    0.000   50.633    0.281 /usr/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py:149(start)
      180    0.013    0.000   50.633    0.281 /usr/local/lib/python2.7/dist-packages/tornado/ioloop.py:729(start)
        1    0.000    0.000   45.189   45.189 /usr/lib/python2.7/dist-packages/salt/state.py:2246(call_high)
        2    0.002    0.001   44.825   22.413 /usr/lib/python2.7/dist-packages/salt/state.py:1799(call_chunks)
      175    0.004    0.000   44.821    0.256 /usr/lib/python2.7/dist-packages/salt/state.py:1987(call_chunk)
      175    0.011    0.000   44.080    0.252 /usr/lib/python2.7/dist-packages/salt/state.py:1645(call)
      175    0.003    0.000   43.993    0.251 /usr/lib/python2.7/dist-packages/salt/loader.py:1699(wrapper)
     1078    0.006    0.000   39.471    0.037 /usr/local/lib/python2.7/dist-packages/tornado/stack_context.py:281(wrapped)
20146000/10098   18.206    0.000   39.307    0.004 /usr/lib/python2.7/copy.py:145(deepcopy)
1080896/10098    3.147    0.000   39.275    0.004 /usr/lib/python2.7/copy.py:253(_deepcopy_dict)
     1078    0.002    0.000   39.243    0.036 /usr/local/lib/python2.7/dist-packages/tornado/stack_context.py:117(enter)
     1078    0.006    0.000   39.225    0.036 /usr/lib/python2.7/dist-packages/salt/minion.py:1303(ctx)
     3234    0.006    0.000   39.217    0.012 /usr/lib/python2.7/dist-packages/salt/utils/context.py:88(clone)
     3234    0.275    0.000   39.211    0.012 /usr/lib/python2.7/dist-packages/salt/utils/context.py:130(__init__)
       34    0.001    0.000   37.538    1.104 /usr/lib/python2.7/dist-packages/salt/states/file.py:1115(managed)
       34    0.002    0.000   37.530    1.104 /usr/lib/python2.7/dist-packages/salt/modules/file.py:4235(check_managed_changes)
     1258    0.002    0.000   32.697    0.026 /usr/local/lib/python2.7/dist-packages/tornado/ioloop.py:586(_run_callback)
       34    0.001    0.000   22.293    0.656 /usr/lib/python2.7/dist-packages/salt/modules/file.py:3619(get_managed)
695008/670667    3.076    0.000   17.583    0.000 /usr/lib/python2.7/copy.py:226(_deepcopy_list)
       38    0.000    0.000   16.893    0.445 /usr/lib/python2.7/dist-packages/salt/fileclient.py:173(cache_file)
       38    0.000    0.000   16.893    0.445 /usr/lib/python2.7/dist-packages/salt/fileclient.py:459(get_url)
       38    0.001    0.000   16.892    0.445 /usr/lib/python2.7/dist-packages/salt/fileclient.py:978(get_file)
       74    0.001    0.000   16.877    0.228 /usr/lib/python2.7/dist-packages/salt/fileclient.py:1289(hash_and_stat_file)
      108    0.000    0.000   15.984    0.148 /usr/lib/python2.7/dist-packages/salt/fileclient.py:1281(hash_file)
      108    0.001    0.000   15.983    0.148 /usr/lib/python2.7/dist-packages/salt/fileclient.py:1259(__hash_and_stat_file)
       34    0.021    0.001   15.226    0.448 /usr/lib/python2.7/dist-packages/salt/modules/file.py:3477(source_list)
       33    0.000    0.000   14.699    0.445 /usr/lib/python2.7/dist-packages/salt/modules/cp.py:353(cache_file)
        1    0.000    0.000   11.568   11.568 /usr/lib/python2.7/dist-packages/salt/state.py:3264(render_highstate)
      3/1    0.000    0.000   11.567   11.567 /usr/lib/python2.7/dist-packages/salt/state.py:2942(render_state)
        1    0.000    0.000   11.306   11.306 /usr/lib/python2.7/dist-packages/salt/state.py:3509(__init__)
      898    0.002    0.000   11.091    0.012 /usr/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py:117(poll)
      898    0.002    0.000   11.089    0.012 /usr/lib/python2.7/dist-packages/zmq/sugar/poll.py:77(poll)
      898   11.086    0.012   11.086    0.012 {zmq.backend.cython._poll.zmq_poll}

I can give you the whole profile for inspection if you want.

@bobrik
Copy link
Contributor Author

bobrik commented Jan 19, 2017

I suspect it was #37378. cc @skizunov

     3234    0.006    0.000   39.217    0.012 /usr/lib/python2.7/dist-packages/salt/utils/context.py:88(clone)
     3234    0.275    0.000   39.211    0.012 /usr/lib/python2.7/dist-packages/salt/utils/context.py:130(__init__)

@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 24, 2017

@bobrik thanks for the profiling information. Before we further troubleshoot this can you try making this change on the master side and seeing if there are performance improvements. That PR was meant to tackle an issue between the the time it takes between hitting "enter" and the master getting a publish command.

@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 24, 2017

I'm going to setup a test scenario to see if i can't replicate this as well. I think I might possibly be mistaken with the fix for this issue.

@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 24, 2017

@bobrik okay it looks like I can replicate this and the PR does not fix the issue. I realized when re-reading your issue a litte more careful that this is something on the minion side, because you were querying both minions at the same time. Apologies on that, but I can indeed replicate this and we will need to get this fixed up.

[root@ip-10-167-165-222 ~]# salt '*' state.apply test 
ch3ll_2016.3.4:
----------
          ID: /tmp/test.txt
    Function: file.managed
      Result: True
     Comment: File /tmp/test.txt is in the correct state
     Started: 21:23:27.901700
    Duration: 42.375 ms
     Changes:   

Summary for ch3ll_2016.3.4
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  42.375 ms
ch3ll_2016.11.1:
----------
          ID: /tmp/test.txt
    Function: file.managed
      Result: True
     Comment: File /tmp/test.txt is in the correct state
     Started: 21:23:28.072819
    Duration: 92.916 ms
     Changes:   

Summary for ch3ll_2016.11.1
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  92.916 ms

And when removing the source and adding contents to that file state i see about hte same times just as you stated:

[root@ip-10-167-165-222 ~]# salt '*' state.apply test 
ch3ll_2016.3.4:
----------
          ID: /tmp/test.txt
    Function: file.managed
      Result: True
     Comment: File /tmp/test.txt exists with proper permissions. No changes made.
     Started: 21:25:03.664477
    Duration: 15.93 ms
     Changes:   

Summary for ch3ll_2016.3.4
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  15.930 ms
ch3ll_2016.11.1:
----------
          ID: /tmp/test.txt
    Function: file.managed
      Result: True
     Comment: File /tmp/test.txt exists with proper permissions. No changes made.
     Started: 21:25:03.818538
    Duration: 15.705 ms
     Changes:   

Summary for ch3ll_2016.11.1
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:  15.705 ms

So it seems there is indeed an issue with the file state when looking for source.

ping @terminalmage any idea on this one? I believe I saw some issues a while back about slow performance with file.maanged on 2016.3 but I cannot find those for the life of me in the issue backlog. But it looks like there was even more of a performance hit in 2016.11.1

@Ch3LL Ch3LL added Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt severity-high 2nd top severity, seen by most users, causes major problems State-Module P2 Priority 2 Confirmed Salt engineer has confirmed bug/feature - often including a MCVE ZRELEASED - 2016.11.3 severity-critical top severity, seen by most users, serious issues and removed info-needed waiting for more info labels Jan 24, 2017
@Ch3LL
Copy link
Contributor

Ch3LL commented Jan 24, 2017

Nevermind @terminalmage ...@cachedout is gonna take a look

@bobrik
Copy link
Contributor Author

bobrik commented Jan 25, 2017

I think the more pillars (?) you have, the more #37378 will hit you. We have around 500kb of fairly complex structures per minion in this test case.

@meggiebot meggiebot modified the milestones: Nitrogen 3, Blocked Jan 25, 2017
@danielmotaleite
Copy link
Contributor

I just updated the salt-master and one salt-minion and the highstate to one machine jumped from ~5s to ~140s, yet salt-call on the remote host is still about 5s

Doing a salt host pillar.items i have 1.2MB pillar

Reverting salt-minion to 2016.3.4 i get the old performance back (still with salt-master in 2016.11.1)

@bobrik
Copy link
Contributor Author

bobrik commented Jan 31, 2017

In SaltStack, speed isn’t a byproduct, it is a design goal. SaltStack was created as an extremely fast, lightweight communication bus to provide the foundation for a remote execution engine.

Sad to see the there's no fix in 2016.11.2.

@ninja-
Copy link
Contributor

ninja- commented Jan 31, 2017

Same issue here :(
I'll try to confirm the issue is because of context PR and revert that.
Also, how about disabling context copy if multiprocessing is set to true which is now the default?

EDIT: context is not the problem. But I still see file.maaged being slow with remote file client instead of local. 40-50ms is clearly too much when master is on localhost

@ClaudiuLSH
Copy link
Contributor

2016.11.2:

Total states run:     254
Total run time:   344.886 s

2016.11.2 with the following changes (which I am sure will mess up some salt functionality):

vim /usr/lib/python2.7/dist-packages/salt/utils/context.py +142
                #self._data[k] = copy.deepcopy(v)
                self._data[k] = v

Same states applied:

Total states run:     254
Total run time:    73.726 s

Downgraded to 2016.3.5:

Total states run:     254
Total run time:    56.004 s

All tests were done on the same master/minion.

@cachedout
Copy link
Contributor

cachedout commented Feb 19, 2017

Could any of you please measure any improvements made by applying #39505? Thanks.

@cachedout cachedout added the fixed-pls-verify fix is linked, bug author to confirm fix label Feb 19, 2017
@bobrik
Copy link
Contributor Author

bobrik commented Feb 19, 2017

@cachedout, I did a test run for a state with many templates and yuge complex pillar.

  • 2016.11.1 without the patch applied (I almost fell asleep):

    --------------
    Succeeded: 175 (unchanged=2, changed=1)
    Failed:      0
    --------------
    Total states run:     175
    Total run time:   240.048 s
    
    real	5m5.030s
    user	0m4.075s
    sys	0m0.375s
    
  • 2016.11.1 with the patch applied:

    --------------
    Succeeded: 175 (unchanged=2, changed=1)
    Failed:      0
    --------------
    Total states run:     175
    Total run time:     8.280 s
    
    real	0m58.984s
    user	0m1.094s
    sys	0m0.219s
    

Looking forward to upgrading to 2016.11.3 with this patch applied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE Core relates to code central or existential to Salt fixed-pls-verify fix is linked, bug author to confirm fix P2 Priority 2 severity-critical top severity, seen by most users, serious issues severity-high 2nd top severity, seen by most users, causes major problems State-Module ZRELEASED - 2016.11.4
Projects
None yet
Development

No branches or pull requests

7 participants