Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

localhost wait_for does not return a failure nor exception #1941

Closed
stepanblyschak opened this issue Jul 21, 2020 · 3 comments · Fixed by #2063
Closed

localhost wait_for does not return a failure nor exception #1941

stepanblyschak opened this issue Jul 21, 2020 · 3 comments · Fixed by #2063
Assignees

Comments

@stepanblyschak
Copy link
Contributor

Description

Steps to reproduce the issue:

  1. res = localhost.wait_for(host=duthost.hostname, port=22, state='stopped', timeout=1)
  2. Do not reboot device
  3. res.is_failed == False
  4. 'exception' not in res

Describe the results you received:
localhost.wait_for does not give a way to know if there was a failure.

        res = localhost.wait_for(host=duthost.hostname, 

                port=22, state="stopped", delay=5, 

                timeout=watchdog_timeout + TIMEOUT_DEVIATION, 

                module_ignore_errors=True) 

 

>       assert 'exception' in res 

E       AssertionError 

Describe the results you expected:
localhost.wait_for sets is_failed flag or adds 'exception' into res.

Additional information you deem important:

Previous version of sonic-mgmt docker image has localhost.wait_for working and wait_for set 'exception' in result. Recent sonic-mgmt docker image broke this.

**Output of `show version`:**

```
(paste your output here)
```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
@stepanblyschak
Copy link
Contributor Author

The sonic-mgmt version that works in this case is based on master.0-3ab4b716.

Output of pip list:

Package                            Version
---------------------------------- -----------
adal                               1.2.2
ansible                            2.8.7
atomicwrites                       1.3.0
attrs                              19.3.0
azure-common                       1.1.23
azure-core                         1.1.1
azure-kusto-data                   0.0.13
azure-kusto-ingest                 0.0.13
azure-nspkg                        3.0.2
azure-storage-blob                 12.1.0
azure-storage-common               2.1.0
azure-storage-nspkg                3.1.0
azure-storage-queue                12.1.0
backports.shutil-get-terminal-size 1.0.0
certifi                            2019.11.28
cffi                               1.10.0
chardet                            3.0.4
configparser                       4.0.2
contextlib2                        0.6.0.post1
cryptography                       2.8
decorator                          4.4.1
dpkt                               1.9.2
enum34                             1.1.6
funcsigs                           1.0.2
futures                            3.3.0
idna                               2.8
importlib-metadata                 1.3.0
ipaddr                             2.2.0
ipaddress                          1.0.23
ipython                            5.4.1
ipython-genutils                   0.2.0
isodate                            0.6.0
Jinja2                             2.7.2
lxml                               4.4.2
MarkupSafe                         1.1.1
more-itertools                     5.0.0
msrest                             0.6.10
natsort                            6.2.0
netaddr                            0.7.19
nnpy                               1.4.2
oauthlib                           3.1.0
packaging                          19.2
paramiko                           2.1.2
passlib                            1.7.2
pathlib2                           2.3.5
pexpect                            4.7.0
pickleshare                        0.7.5
pip                                20.1.1
pluggy                             0.13.1
prompt-toolkit                     1.0.18
ptf                                0.9.1
ptyprocess                         0.6.0
py                                 1.8.0
pyasn1                             0.1.9
pycparser                          2.19
pycrypto                           2.6.1
Pygments                           2.5.2
PyJWT                              1.7.1
pyparsing                          2.4.5
pysnmp                             4.2.5
pytest                             4.6.7
pytest-ansible                     2.2.2
python-dateutil                    2.8.1
PyYAML                             5.2
requests                           2.22.0
requests-oauthlib                  1.3.0
scandir                            1.10.0
scapy                              2.2.0.dev0
setuptools                         20.7.0
simplegeneric                      0.8.1
six                                1.13.0
traitlets                          4.3.3
typing                             3.7.4.1
urllib3                            1.25.7
wcwidth                            0.1.7
zipp                               0.6.0

@lguohan
Copy link
Contributor

lguohan commented Jul 31, 2020

@wangxin , can you take a look?

@wangxin wangxin self-assigned this Aug 7, 2020
@wangxin
Copy link
Collaborator

wangxin commented Aug 7, 2020

This is a complicated and tricky issue. There are multiple problems behind this issue.

Firstly, for your specific case, please check if you can directly issue ssh admin@<duthost.hostname> to access the duthost. In ansible testing, the host argument of the wait_for module is set to IP address of the DUT host using {{ ansible_host }}. In pytest, if we use argument host=duthost.hostname, the sonic-mgmt docker need to be able to resolve the IP address of the duthost just from its hostname. Otherwise, the wait_for module will consider that the connection is stopped and return success.

The other issue is within pytest_ansible "ansible/pytest-ansible#47 The is_failed property of failed result of some ansible modules is False".

For some reason I don't know yet, ansible removes the keys like 'failed' and 'skipped' in failed test result dict before calling the callback functions of ansible plugins. Ansible calls different callback functions for succeeded and failed ansible module results. However, the pytest_ansible plugin uses the same callback function for both succeeded and failed ansible modules. So, in case an ansible module failed, the pytest_ansible is only able to tell if it is failed only when the module result dict has key 'rc' with non-zero value. A PR "#1786 Workaround for no Exception issue for some failed ansible modules" has been submitted as a workaround for this issue. PR #17686 enhanced the sonic-mgmt library to consider any ansible module result with 'exception' field as failed. But the problem is still not fully resolved.

In some cases, result of localhost.wait_for may or may not have 'exception' field. Let's assume IP address '10.250.0.102' is a host reachable from both localhost and duthost:

If we run wait_for on localhost and on duthost, the results may be different:

  • Run localhost.wait_for
    res = localhost.wait_for(host='10.250.0.101', port=22, state='stopped', timeout=1, module_ignore_errors=True)
    logger.info(json.dumps(res, indent=2))

The log:

09:10:14 DEBUG devices.py:_run:61: /var/johnar/code/sonic-mgmt/tests/test_wait.py::test_wait1#11: [localhost] AnsibleModule::wait_for, args=[], kwargs={"host": "10.250.0.102", "module_ignore_errors": true, "port": 22, "timeout": 1, "state": "stopped"}
09:10:15 DEBUG devices.py:_run:75: /var/johnar/code/sonic-mgmt/tests/test_wait.py::test_wait1#11: [localhost] AnsibleModule::wait_for Result => {"msg": "Timeout when waiting for 10.250.0.102:22 to stop.", "invocation": {"module_args": {"active_connection_states": ["ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2", "SYN_RECV", "SYN_SENT", "TIME_WAIT"], "host": "10.250.0.102", "connect_timeout": 5, "delay": 0, "search_regex": null, "state": "stopped", "sleep": 1, "timeout": 1, "exclude_hosts": null, "msg": null, "path": null, "port": 22}}, "changed": false, "_ansible_no_log": false, "elapsed": 1}
09:10:15 INFO test_wait.py:test_wait1:12: {
  "msg": "Timeout when waiting for 10.250.0.102:22 to stop.", 
  "invocation": {
    "module_args": {
      "active_connection_states": [
        "ESTABLISHED", 
        "FIN_WAIT1", 
        "FIN_WAIT2", 
        "SYN_RECV", 
        "SYN_SENT", 
        "TIME_WAIT"
      ], 
      "host": "10.250.0.102", 
      "connect_timeout": 5, 
      "delay": 0, 
      "search_regex": null, 
      "state": "stopped", 
      "sleep": 1, 
      "timeout": 1, 
      "exclude_hosts": null, 
      "msg": null, 
      "path": null, 
      "port": 22
    }
  }, 
  "changed": false, 
  "_ansible_no_log": false, 
  "elapsed": 1
}
  • Run duthost.wait_for
    res = duthost.wait_for(host='10.250.0.102', port=22, state='stopped', timeout=1, module_ignore_errors=True)
    logger.info(json.dumps(res, indent=2))

The log:

09:11:27 DEBUG devices.py:_run:61: /var/johnar/code/sonic-mgmt/tests/test_wait.py::test_wait1#11: [vlab-01] AnsibleModule::wait_for, args=[], kwargs={"host": "10.250.0.102", "module_ignore_errors": true, "port": 22, "timeout": 1, "state": "stopped"}
09:11:29 DEBUG devices.py:_run:75: /var/johnar/code/sonic-mgmt/tests/test_wait.py::test_wait1#11: [vlab-01] AnsibleModule::wait_for Result => {"msg": "Timeout when waiting for 10.250.0.102:22 to stop.", "exception": "WARNING: The below traceback may *not* be related to the actual failure.\n  File \"/tmp/ansible_wait_for_payload_SP4hBD/__main__.py\", line 239, in <module>\n    import psutil\n", "_ansible_no_log": false, "invocation": {"module_args": {"active_connection_states": ["ESTABLISHED", "FIN_WAIT1", "FIN_WAIT2", "SYN_RECV", "SYN_SENT", "TIME_WAIT"], "host": "10.250.0.102", "connect_timeout": 5, "delay": 0, "search_regex": null, "state": "stopped", "sleep": 1, "timeout": 1, "exclude_hosts": null, "msg": null, "path": null, "port": 22}}, "changed": false, "elapsed": 1}
09:11:29 INFO test_wait.py:test_wait1:12: {
  "msg": "Timeout when waiting for 10.250.0.102:22 to stop.", 
  "exception": "WARNING: The below traceback may *not* be related to the actual failure.\n  File \"/tmp/ansible_wait_for_payload_SP4hBD/__main__.py\", line 239, in <module>\n    import psutil\n", 
  "_ansible_no_log": false, 
  "invocation": {
    "module_args": {
      "active_connection_states": [
        "ESTABLISHED", 
        "FIN_WAIT1", 
        "FIN_WAIT2", 
        "SYN_RECV", 
        "SYN_SENT", 
        "TIME_WAIT"
      ], 
      "host": "10.250.0.102", 
      "connect_timeout": 5, 
      "delay": 0, 
      "search_regex": null, 
      "state": "stopped", 
      "sleep": 1, 
      "timeout": 1, 
      "exclude_hosts": null, 
      "msg": null, 
      "path": null, 
      "port": 22
    }
  }, 
  "changed": false, 
  "elapsed": 1
}

Both results have "msg": "Timeout when waiting for 10.250.0.102:22 to stop.". But the second result has the exception field. The difference is because that the localhost has python package psutil installed.

On sonic-mgmt docker:

johnar@4c1f7b289839:~/code/sonic-mgmt/tests$ pip list | grep psutil
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
psutil                             5.7.0

On duthost:

admin@vlab-01:~$ pip list | grep psutil
admin@vlab-01:~$

According to the implementation of wait_for, if we wait for ssh port of a reachable host to be 'stopped', the module will call module.fail_json to return failed result. The wait_for module itself does not add 'exception' field in such results: https://github.com/ansible/ansible/blob/v2.8.7/lib/ansible/modules/utilities/logic/wait_for.py#L559

The 'exception' field is added by the module.fail_json method: https://github.com/ansible/ansible/blob/v2.8.7/lib/ansible/module_utils/basic.py#L2064

According to the logic, when traceback is generated while executing the ansible module, an exception field will be added to the returned result.

The wait_for module tries to import psutil somehow: https://github.com/ansible/ansible/blob/v2.8.7/lib/ansible/modules/utilities/logic/wait_for.py#L243
If importing the psutil module failed, exception will be raised and caught. Meanwhile some traceback will also be generated. When the module.fail_json method detect that trace back, it will add an exception field to the ansible module result. Because the duthost does not have the psutil package installed, that's why the result of duthost.wait_for has exception field.

Depending on 'exception' field in the ansible module result is not reliable enough.

To fully fix the problem, there are two options:

  1. Fix the pytest_ansible issue The is_failed property of failed result of some ansible modules is False ansible/pytest-ansible#47
  2. Hack the code to avoid ansible to remove the 'failed' key from result dict.

It seems that option 2 would be easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants