[BUG] Orchestration with "parallel: True" does not behave properly with napalm/deltaproxy #61439

@COvirtNetwork

Description
A state file that runs a series of 17 commands against napalm devices (network switches) fails intermittently when the orchestrate runner, the napalm proxy, and deltaproxy are used together.

With 12 or fewer commands, no issues are observed. Above that, the share of orchestration runs in which at least 1 command fails rises with the number of commands: approximately 20% with 13-14 commands, approximately 40% with 15 commands, approximately 60% with 16 commands, and 100% with 17 commands.

No issues are observed when running the commands in serial.

Setup

All devices are on-prem. The Salt master runs as a VM in on-prem infrastructure. A second VM runs as a Salt minion, and the delta proxy service runs on that minion VM. Pillar data is stored in an external MongoDB running on a third VM.

Below is the srv/salt/nxos_ssh_commands_org/init.sls file, which contains the example state file.

{% set target = salt['pillar.get']('target') %}
{% do salt.log.debug('xxx: ' ~ target) %}
{% for cmd in ['show version','show interfaces description','show ip bgp all','show module','show vlan','show mac address','show ip arp','show interfaces status','show cdp nei','show etherchannel summary','show file systems','show bootvar','show spanning-tree','show inventory','show redundancy','show switch virtual','show run'] %}
run {{ cmd }} :
  salt.function:
    - name: net.cli
    - tgt: '{{ target }}'
    - tgt_type: "compound"
    - arg:
      - {{ cmd }}
    - parallel: True
{% endfor %}

Steps to Reproduce the behavior
Run the attached .sls file using the following command:
salt-run state.orch nxos_ssh_commands_org pillar="{'target': 'L@<minion_id>'}"
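
Because the failure is intermittent, a single run says little. The following is a minimal Python sketch for repeating the command above and estimating how often a run fails. The helper names `run_orchestration` and `failure_rate` are hypothetical, and the sketch assumes that `salt-run` exits with a non-zero status when at least one orchestration step fails; if that does not hold in a given setup, the check inside `run_orchestration` needs to inspect the output instead.

```python
import subprocess
from typing import Callable, Sequence

# Command taken from the report; <minion_id> is a placeholder and must be
# replaced with a real proxy minion ID before running.
ORCH_CMD = [
    "salt-run",
    "state.orch",
    "nxos_ssh_commands_org",
    "pillar={'target': 'L@<minion_id>'}",
]


def run_orchestration(cmd: Sequence[str] = ORCH_CMD) -> bool:
    """Run the orchestration once and return True if it succeeded.

    Assumption: salt-run exits non-zero when any orchestration step fails.
    """
    completed = subprocess.run(
        list(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    return completed.returncode == 0


def failure_rate(runs: int, run_once: Callable[[], bool] = run_orchestration) -> float:
    """Return the fraction of runs in which at least one command failed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = sum(1 for _ in range(runs) if not run_once())
    return failures / runs
```

On the Salt master, `failure_rate(10)` would run the orchestration ten times and return a value between 0.0 and 1.0. Passing a different `run_once` callable allows the counting logic to be checked without a Salt installation.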

Expected behavior
All 17 commands included in the .sls file should successfully return output.

Screenshots
NA

Versions Report

salt --versions-report
Salt Version:
          Salt: 3004

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.0.1
       libgit2: Not Installed
      M2Crypto: 0.35.2
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: 3.10.1
        pygit2: Not Installed
        Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 17.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.4

Salt Extensions:
        SSEAPE: 8.6.1.4

System Versions:
          dist: centos 7 Core
        locale: UTF-8
       machine: x86_64
       release: 3.10.0-1160.42.2.el7.x86_64
        system: Linux
       version: CentOS Linux 7 Core

Additional context
The following tracebacks are observed on command failures:

              The minion function caused an exception: Traceback (most recent call last):
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 935, in establish_connection
                  self.remote_conn_pre.connect(**ssh_connect_params)
                File "/usr/local/lib/python3.6/site-packages/paramiko/client.py", line 412, in connect
                  server_key = t.get_remote_server_key()
                File "/usr/local/lib/python3.6/site-packages/paramiko/transport.py", line 834, in get_remote_server_key
                  raise SSHException("No existing session")
              paramiko.ssh_exception.SSHException: No existing session

              During handling of the above exception, another exception occurred:

              Traceback (most recent call last):
                File "/usr/local/lib/python3.6/site-packages/napalm/base/base.py", line 92, in _netmiko_open
                  **netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/netmiko/ssh_dispatcher.py", line 326, in ConnectHandler
                  return ConnectionClass(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/cisco/cisco_nxos_ssh.py", line 12, in __init__
                  return super().__init__(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 350, in __init__
                  self._open()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 355, in _open
                  self.establish_connection()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 980, in establish_connection
                  raise NetmikoTimeoutException(msg)
              netmiko.ssh_exception.NetmikoTimeoutException: Paramiko: 'No existing session' error: try increasing 'conn_timeout' to 10 seconds or larger.

              During handling of the above exception, another exception occurred:

              Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 359, in get_device
                  network_device.get("DRIVER").open()
                File "/usr/local/lib/python3.6/site-packages/napalm/nxos_ssh/nxos_ssh.py", line 446, in open
                  device_type="cisco_nxos", netmiko_optional_args=self.netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/napalm/base/base.py", line 95, in _netmiko_open
                  raise ConnectionException("Cannot connect to {}".format(self.hostname))
              napalm.base.exceptions.ConnectionException: Cannot connect to nxs1-ork3.vmware.com

              During handling of the above exception, another exception occurred:

              Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/metaproxy/deltaproxy.py", line 595, in thread_return
                  opts, data, func, args, kwargs
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
                  return func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 535, in func_wrapper
                  ret = func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/napalm_network.py", line 698, in cli
                  **{"commands": list(commands), "force_reconnect":True}
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 160, in call
                  napalm_device = get_device(opts)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 386, in get_device
                  raise napalm_base.exceptions.ConnectionException(base_err_msg)
              napalm.base.exceptions.ConnectionException: Cannot connect to nxs1-ork3.vmware.com as svc.salt.net.

              The minion function caused an exception: Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/metaproxy/deltaproxy.py", line 595, in thread_return
                  opts, data, func, args, kwargs
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
                  return func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 535, in func_wrapper
                  ret = func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/napalm_network.py", line 698, in cli
                  **{"commands": list(commands), "force_reconnect":True}
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 160, in call
                  napalm_device = get_device(opts)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 359, in get_device
                  network_device.get("DRIVER").open()
                File "/usr/local/lib/python3.6/site-packages/napalm/nxos_ssh/nxos_ssh.py", line 446, in open
                  device_type="cisco_nxos", netmiko_optional_args=self.netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/napalm/base/base.py", line 92, in _netmiko_open
                  **netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/netmiko/ssh_dispatcher.py", line 326, in ConnectHandler
                  return ConnectionClass(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/cisco/cisco_nxos_ssh.py", line 12, in __init__
                  return super().__init__(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 350, in __init__
                  self._open()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 356, in _open
                  self._try_session_preparation()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 792, in _try_session_preparation
                  self.session_preparation()
                File "/usr/local/lib/python3.6/site-packages/netmiko/cisco/cisco_nxos_ssh.py", line 20, in session_preparation
                  command="terminal width 511", pattern=r"terminal width 511"
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 1126, in set_terminal_width
                  self.write_channel(command)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 459, in write_channel
                  self._write_channel(out_data)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 417, in _write_channel
                  self.remote_conn.sendall(write_bytes(out_data, encoding=self.encoding))
                File "/usr/local/lib/python3.6/site-packages/paramiko/channel.py", line 846, in sendall
                  sent = self.send(s)
                File "/usr/local/lib/python3.6/site-packages/paramiko/channel.py", line 801, in send
                  return self._send(s, m)
                File "/usr/local/lib/python3.6/site-packages/paramiko/channel.py", line 1198, in _send
                  raise socket.error("Socket is closed")
              OSError: Socket is closed

              The minion function caused an exception: Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/metaproxy/deltaproxy.py", line 595, in thread_return
                  opts, data, func, args, kwargs
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
                  return func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 535, in func_wrapper
                  ret = func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/napalm_network.py", line 698, in cli
                  **{"commands": list(commands), "force_reconnect":True}
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 157, in call
                  opts["proxy"].update(**kwargs)
              KeyError: 'proxy'
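
The three tracebacks above end in three different exceptions: `napalm.base.exceptions.ConnectionException`, `OSError: Socket is closed`, and `KeyError: 'proxy'`. When collecting output from many runs, it may help to tally which of these occurs how often. The sketch below is one way to do that in Python; the function names are hypothetical, and it assumes the captured text contains only tracebacks, each introduced by the "The minion function caused an exception:" line as shown above.

```python
from collections import Counter
from typing import List

# Line that introduces each failure in the captured output.
MARKER = "The minion function caused an exception:"


def final_exceptions(log_text: str) -> List[str]:
    """Return the last line of every traceback chain that follows MARKER."""
    results = []
    # Text before the first marker is not part of a traceback, so skip it.
    for chunk in log_text.split(MARKER)[1:]:
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        if lines:
            # In a chained traceback, the last line is the final exception.
            results.append(lines[-1])
    return results


def count_exception_types(log_text: str) -> Counter:
    """Count failures by exception type (the text before the first colon)."""
    return Counter(line.split(":", 1)[0] for line in final_exceptions(log_text))
```

Applied to the tracebacks in this report, `final_exceptions` would return the `ConnectionException`, `OSError`, and `KeyError` lines, one per failure.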
