Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitogen runner will sometimes mask playbook execution errors #527

Closed
cloudnull opened this issue Feb 9, 2019 · 8 comments
Closed

Mitogen runner will sometimes mask playbook execution errors #527

cloudnull opened this issue Feb 9, 2019 · 8 comments

Comments

@cloudnull
Copy link

@cloudnull cloudnull commented Feb 9, 2019

While running the openstack-ansible playbooks I ran into the following error: https://pasted.tech/pastes/3cbdf40acfc84ef8311be3475c2cc9636233078e.raw

The task being executed is through a handler which can be found here: https://github.com/logan2211/ansible-haproxy-endpoints/blob/90603684ba258516d10c33f8a01852d59e755d56/handlers/main.yml#L17

This issues was able to be reproduced consistently across playbook runs, however, its not clear to me what the actual issue is?

This is the focused stacktrace

The full traceback is:
Traceback (most recent call last):
  File "/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 106, in run
    item_results = self._run_loop(items)
  File "/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 343, in _run_loop
    res = self._execute(variables=task_vars)
  File "/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 612, in _execute
    result = self._handler.run(task_vars=variables)
  File "/etc/ansible/roles/mitogen/ansible_mitogen/mixins.py", line 117, in run
    return super(ActionModuleMixin, self).run(tmp, task_vars)
  File "/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 46, in run
    result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
  File "/etc/ansible/roles/mitogen/ansible_mitogen/mixins.py", line 346, in _execute_module
    timeout_secs=self.get_task_timeout_secs(),
  File "/etc/ansible/roles/mitogen/ansible_mitogen/planner.py", line 495, in invoke
    kwargs=planner.get_kwargs(),
  File "/etc/ansible/roles/mitogen/ansible_mitogen/connection.py", line 466, in call
    return self._rethrow(recv)
  File "/etc/ansible/roles/mitogen/ansible_mitogen/connection.py", line 452, in _rethrow
    return recv.get().unpickle()
  File "/etc/ansible/roles/mitogen/mitogen/core.py", line 797, in unpickle
    raise obj
CallError: socket.error: [Errno 111] Connection refused
  File "<stdin>", line 3047, in _dispatch_one
  File "master:/etc/ansible/roles/mitogen/ansible_mitogen/target.py", line 423, in run_module
    return impl.run()
  File "master:/etc/ansible/roles/mitogen/ansible_mitogen/runner.py", line 379, in run
    return self._run()
  File "master:/etc/ansible/roles/mitogen/ansible_mitogen/runner.py", line 870, in _run
    self._run_code(code, mod)
  File "master:/etc/ansible/roles/mitogen/ansible_mitogen/runner.py", line 849, in _run_code
    exec('exec code in vars(mod)')
  File "<string>", line 1, in <module>
  File "master:/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/modules/net_tools/haproxy.py", line 449, in <module>
  File "master:/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/modules/net_tools/haproxy.py", line 445, in main
  File "master:/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/modules/net_tools/haproxy.py", line 394, in act
  File "master:/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/modules/net_tools/haproxy.py", line 325, in get_state_for
  File "master:/opt/ansible-runtime/local/lib/python2.7/site-packages/ansible/modules/net_tools/haproxy.py", line 245, in execute
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)


fatal: [utility1_galera_container-9d5dac72]: FAILED! => {
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}

It makes sense the issue is within the exec in runner.py however I'm simply not sure why.

@cloudnull
Copy link
Author

@cloudnull cloudnull commented Feb 9, 2019

I'm able to recreate this issue using the HEAD of master.

Using commit 79c6d758f3081b7c199e6c8087ecb76cef1eb603 + the Issue510 branch + PR #515

@cloudnull
Copy link
Author

@cloudnull cloudnull commented Feb 9, 2019

Looks like the underlying error is simply being masked. This is what I get when going back to stock, which is likely an environmental issue.

Traceback (most recent call last):
  File "/root/.ansible/tmp/ansible-tmp-1549734302.13-244668050956889/AnsiballZ_haproxy.py", line 113, in <module>
    _ansiballz_main()
  File "/root/.ansible/tmp/ansible-tmp-1549734302.13-244668050956889/AnsiballZ_haproxy.py", line 105, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/root/.ansible/tmp/ansible-tmp-1549734302.13-244668050956889/AnsiballZ_haproxy.py", line 48, in invoke_module
    imp.load_module('__main__', mod, module, MOD_DESC)
  File "/tmp/ansible_haproxy_payload_hWZRdj/__main__.py", line 449, in <module>
  File "/tmp/ansible_haproxy_payload_hWZRdj/__main__.py", line 445, in main
  File "/tmp/ansible_haproxy_payload_hWZRdj/__main__.py", line 394, in act
  File "/tmp/ansible_haproxy_payload_hWZRdj/__main__.py", line 325, in get_state_for
  File "/tmp/ansible_haproxy_payload_hWZRdj/__main__.py", line 245, in execute
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 111] Connection refused
@cloudnull
Copy link
Author

@cloudnull cloudnull commented Feb 9, 2019

Just to circle back, once I corrected this environmental issue everything worked as expected.

@cloudnull cloudnull changed the title Stacktrace when running net_tools "haproxy" Mitogen runner will sometimes mask playbook execution errors Feb 9, 2019
@dw
Copy link
Member

@dw dw commented Feb 9, 2019

That doesn't look good. Do you think it's Mitogen raising the wrong exception type perhaps? I'll setup a local reproduction, should be easy to fix

@dw
Copy link
Member

@dw dw commented Feb 9, 2019

ooh! It's because the module output doesn't contain the error, right

@dw
Copy link
Member

@dw dw commented Feb 9, 2019

I've seen this a million times and never stopped to think about it -- its because there is nothing in ansible_mitogen/runner.py NewStyleRunner._run() bothering to catch any exception from the module aside from SystemExit. So that propagates back up to the call dispatcher which returns it as a CallError to the worker process, which rethrows it, and again nothing catches CallError so it turns into a crash of the action plug-in.

This is just some missed emulation of vanilla, thanks for reporting it!

@dw dw closed this in 8a2dcbf Feb 9, 2019
dw added a commit that referenced this issue Feb 9, 2019
* origin/issue527:
  docs: update Changelog; closes #527.
  issue #527: catch new-style module tracebacks like vanilla.
@dw
Copy link
Member

@dw dw commented Feb 9, 2019

Thanks for reporting this! If I've misunderstood the problem please don't hesitate to reopen

@cloudnull
Copy link
Author

@cloudnull cloudnull commented Feb 10, 2019

The "issue527" branch seems to have fixed it. Thanks for smashing that so quickly!

dw added a commit that referenced this issue Feb 10, 2019
* origin/master: (661 commits)
  Bump version for release.
  docs: update Changelog; closes #481
  issue #481: core: preserve stderr TTY FD if one is present.
  issue #481: avoid crash if disconnect occurs during forward_modules()
  Add a few more important modules to preamble_size.py.
  .ci: add verbiage for run_batches() too.
  .ci: add README.md.
  docs: update thanks
  docs: lose "approaching stability" language, we're pretty good now
  docs: fix changelog syntax/order/"20KB"
  tests: add new compression parameter to mitogen_get_stack results
  tests: disable affinity_test on Travis :/
  issue #508: fix responder stats test due to new smaller parent.py.
  issue #508: tests: skip minify_test Py2.4/2.5 for profiler.py.
  tests: fix fallout from 36fb318
  issue #520: add AIX auth failure string to su.
  tests: move affinity_test to Ansible tests.
  core: cProfile is not available in 2.4.
  issue #505: docs: add new detail graph for one scenario.
  docs: update and re-record profile graphs in docs; closes #505
  service: fix PushFileService exception
  tests: pad out localhost-*
  service: start pool shutdown on broker shutdown.
  master: .encode() needed for Py3.
  ansible: stash PID files in CWD if requested for debugging.
  issue #508: master: minify_safe_re must be bytes for Py3.
  bench: tidy up and cpu-pin some more files.
  tests: add localhost-x100
  ansible: double the default pool size.
  ansible: raise error with correct exception type.
  issue #508: master: minify all Mitogen/ansible_mitogen sources.
  parent: PartialZlib docstrings.
  ansible: hacky parser to alow bools to be specified on command line
  parent: pre-cache bootstrap if possible.
  docs: update Changelog.
  ansible: add mitogen_ssh_compression variable.
  service: PushFileService never recorded a file as sent.
  parent: synchronize get_core_source()
  service: use correct profile aggregation name.
  SyntaxError.
  ansible: don't pin controller if <4 cores.
  tests: make soak testing work reliably on vanilla.
  docs: changelog tidyups.
  ansible: document and make affinity stuff portable to non-Linux
  ansible: fix affinity.py test failure on 2 cores.
  ansible: preheat PluginLoader caches before fork.
  tests: make mitogen_shutdown_all be run_once by default.
  docs: update Changelog.
  ansible: use Poller for WorkerProcess; closes #491.
  ansible: new multiplexer/workers configuration
  docs: update Changelog.
  docs: update Changelog.
  ansible: pin connection multiplexer to a single core
  utils: pad out reset_affinity() and integrate with detach_popen()
  utils: import reset_affinity() function.
  master: set Router.profiling if MITOGEN_PROFILING variable present.
  parent: don't kill children when profiling is active.
  ansible: hook strategy and worker processes into profiler
  profiler: import from linear2 branch
  core: tidy up existing profiling code and support MITOGEN_PROFILE_FMT
  issue #260: redundant if statement.
  ansible: ensure MuxProcess MITOGEN_PROFILING results reach disk.
  ansible/bench: make end= configurable.
  master: cache sent/forwarded module names
  Aggregate code coverage data across tox all runs
  Allow independant control of coverage erase and reporting
  Fix incorrect attempt to use coverage
  docs: update Changelog; closes #527.
  issue #527: catch new-style module tracebacks like vanilla.
  Fix DeprecationWarning in mitogen.utils.run_with_router()
  Generate coverage report even if some tests fail
  ci: fix incorrect partition/rpartition from 8a4caea
  issue #260: hide force-disconnect messages.
  issue #498: fix shutdown crash
  issue #260: avoid start_transmit()/on_transmit()/stop_transmit()
  core: ensure broker profiling output reaches disk
  master: keep is_stdlib_path() result as negative cache entry
  ci: Allow DISTROS="debian*32" variable, and KEEP=1
  Use develop mode in tox
  issue #429: fix sudo regression.
  misc: rename to scripts. tab completion!!
  core: Latch._wake improvements
  issue #498: prevent crash on double 'disconnect' signal.
  issue #413: don't double-propagate DEL_ROUTE to parent.
  issue #498: wrap Router dict mutations in a lock
  issue #429: enable en_US locale to unbreak debops test.
  issue #499: fix another mind-numbingly stupid vanilla inconsistency
  issue #497: do our best to cope with crap upstream code
  ssh: fix test to match updated log format.
  issue #429: update Changelog.
  issue #429: update Changelog.
  issue #429: teach sudo about every know i18n password string.
  issue #429: install i18n-related bits in test images.
  ssh: tidy up logs and stream names.
  tests: ensure file is closed in connection_test.
  gcloud: small updates
  tests: give ansible/gcloud/ its own requirements file.
  issue #499: another totally moronic implementation difference
  issue #499: disable new test on vanilla.
  docs: update Changelog; closes #499.
  ...
dw added a commit that referenced this issue Feb 10, 2019
* origin/v024: (662 commits)
  docs: update Changelog release date.
  Bump version for release.
  docs: update Changelog; closes #481
  issue #481: core: preserve stderr TTY FD if one is present.
  issue #481: avoid crash if disconnect occurs during forward_modules()
  Add a few more important modules to preamble_size.py.
  .ci: add verbiage for run_batches() too.
  .ci: add README.md.
  docs: update thanks
  docs: lose "approaching stability" language, we're pretty good now
  docs: fix changelog syntax/order/"20KB"
  tests: add new compression parameter to mitogen_get_stack results
  tests: disable affinity_test on Travis :/
  issue #508: fix responder stats test due to new smaller parent.py.
  issue #508: tests: skip minify_test Py2.4/2.5 for profiler.py.
  tests: fix fallout from 36fb318
  issue #520: add AIX auth failure string to su.
  tests: move affinity_test to Ansible tests.
  core: cProfile is not available in 2.4.
  issue #505: docs: add new detail graph for one scenario.
  docs: update and re-record profile graphs in docs; closes #505
  service: fix PushFileService exception
  tests: pad out localhost-*
  service: start pool shutdown on broker shutdown.
  master: .encode() needed for Py3.
  ansible: stash PID files in CWD if requested for debugging.
  issue #508: master: minify_safe_re must be bytes for Py3.
  bench: tidy up and cpu-pin some more files.
  tests: add localhost-x100
  ansible: double the default pool size.
  ansible: raise error with correct exception type.
  issue #508: master: minify all Mitogen/ansible_mitogen sources.
  parent: PartialZlib docstrings.
  ansible: hacky parser to alow bools to be specified on command line
  parent: pre-cache bootstrap if possible.
  docs: update Changelog.
  ansible: add mitogen_ssh_compression variable.
  service: PushFileService never recorded a file as sent.
  parent: synchronize get_core_source()
  service: use correct profile aggregation name.
  SyntaxError.
  ansible: don't pin controller if <4 cores.
  tests: make soak testing work reliably on vanilla.
  docs: changelog tidyups.
  ansible: document and make affinity stuff portable to non-Linux
  ansible: fix affinity.py test failure on 2 cores.
  ansible: preheat PluginLoader caches before fork.
  tests: make mitogen_shutdown_all be run_once by default.
  docs: update Changelog.
  ansible: use Poller for WorkerProcess; closes #491.
  ansible: new multiplexer/workers configuration
  docs: update Changelog.
  docs: update Changelog.
  ansible: pin connection multiplexer to a single core
  utils: pad out reset_affinity() and integrate with detach_popen()
  utils: import reset_affinity() function.
  master: set Router.profiling if MITOGEN_PROFILING variable present.
  parent: don't kill children when profiling is active.
  ansible: hook strategy and worker processes into profiler
  profiler: import from linear2 branch
  core: tidy up existing profiling code and support MITOGEN_PROFILE_FMT
  issue #260: redundant if statement.
  ansible: ensure MuxProcess MITOGEN_PROFILING results reach disk.
  ansible/bench: make end= configurable.
  master: cache sent/forwarded module names
  Aggregate code coverage data across tox all runs
  Allow independant control of coverage erase and reporting
  Fix incorrect attempt to use coverage
  docs: update Changelog; closes #527.
  issue #527: catch new-style module tracebacks like vanilla.
  Fix DeprecationWarning in mitogen.utils.run_with_router()
  Generate coverage report even if some tests fail
  ci: fix incorrect partition/rpartition from 8a4caea
  issue #260: hide force-disconnect messages.
  issue #498: fix shutdown crash
  issue #260: avoid start_transmit()/on_transmit()/stop_transmit()
  core: ensure broker profiling output reaches disk
  master: keep is_stdlib_path() result as negative cache entry
  ci: Allow DISTROS="debian*32" variable, and KEEP=1
  Use develop mode in tox
  issue #429: fix sudo regression.
  misc: rename to scripts. tab completion!!
  core: Latch._wake improvements
  issue #498: prevent crash on double 'disconnect' signal.
  issue #413: don't double-propagate DEL_ROUTE to parent.
  issue #498: wrap Router dict mutations in a lock
  issue #429: enable en_US locale to unbreak debops test.
  issue #499: fix another mind-numbingly stupid vanilla inconsistency
  issue #497: do our best to cope with crap upstream code
  ssh: fix test to match updated log format.
  issue #429: update Changelog.
  issue #429: update Changelog.
  issue #429: teach sudo about every know i18n password string.
  issue #429: install i18n-related bits in test images.
  ssh: tidy up logs and stream names.
  tests: ensure file is closed in connection_test.
  gcloud: small updates
  tests: give ansible/gcloud/ its own requirements file.
  issue #499: another totally moronic implementation difference
  issue #499: disable new test on vanilla.
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants