Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

freeze on task with xml module #454

Closed
LuckySB opened this issue Dec 13, 2018 · 15 comments
Closed

freeze on task with xml module #454

LuckySB opened this issue Dec 13, 2018 · 15 comments

Comments

@LuckySB
Copy link

@LuckySB LuckySB commented Dec 13, 2018

  • Have you tried the latest master version from Git?
    yes

  • Mention your host and target OS and versions
    Centos 7.5 on host and Centos 7.6 on target

  • Mention your host and target Python versions
    python 2.7 on host and target

Hang up with mitogen v0.2.3, crash with mitogen from master branch

Role to create kvm VM:
Ansible task with xml module

- name: Get pool type (xml)
  xml:
    path: "/etc/libvirt/storage/{{ item.pool_name }}.xml"
    xpath: "/pool"
    content: attribute
    attribute: type
  when: >
    (item.name not in kvm_existing_vms.list_vms and item.state|d('running') != 'undefined') or
    (item.name in kvm_existing_vms.list_vms and item.state|d('running') == 'undefined')
  with_items: "{{ kvm_vm }}"
  loop_control:
    label: "{{ item.pool_name }}"
  register: kvm_pool_xml_type
  changed_when: kvm_pool_xml_type.matches[0].pool.type is defined

Error on /var/log/messages

Dec 13 13:41:55 bigtest kernel: python[20653]: segfault at 10 ip 00007f1e55bd8afd sp 00007ffd2f5e2550 error 4 in etree.so[7f1e558a1000+547000]

ansible output -vvvv

log.txt

@dw
Copy link
Member

@dw dw commented Dec 13, 2018

Thanks for reporting this.

Would it be possible to attach an example snippet of the pool XML file, and also indicate how many items exist in the loop variable (kvm_vm).

You may find setting:

vars:
    mitogen_task_isolation: fork

Could have an effect on this, although from a first glance of the module source, it is not holding any LXML state across invocations. Looks like the module is somehow triggering a segfault in lxml, possibly by reusing some state.

Can you please also confirm this does not happen at all with vanilla? I guess not -- memory corruption issues often don't present until the bad code has run a few iterations

@dw
Copy link
Member

@dw dw commented Dec 13, 2018

Ok, I have reproduced it locally. :)

@dw
Copy link
Member

@dw dw commented Dec 13, 2018

And to confirm, setting mitogen_task_isolation works around the problem. Still investigating root cause

@dw
Copy link
Member

@dw dw commented Dec 13, 2018

It looks like somehow a constructor is crashing:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007f32d0499246 in ?? () from /usr/lib/python2.7/dist-packages/lxml/etree.x86_64-linux-gnu.so
(gdb) bt
#0  0x00007f32d0499246 in ?? () from /usr/lib/python2.7/dist-packages/lxml/etree.x86_64-linux-gnu.so
#1  0x00007f32d0499556 in ?? () from /usr/lib/python2.7/dist-packages/lxml/etree.x86_64-linux-gnu.so
#2  0x0000560af7ca0795 in type_call.lto_priv () at ../Objects/typeobject.c:749
#3  0x0000560af7cb7420 in PyObject_Call (kw=0x0, arg=('/pool',), func=<optimized out>) at ../Objects/abstract.c:2547
#4  do_call (nk=<optimized out>, na=<optimized out>, pp_stack=0x7ffed29feb00, func=<optimized out>) at ../Python/ceval.c:4589
#5  call_function (oparg=<optimized out>, pp_stack=0x7ffed29feb00) at ../Python/ceval.c:4394
#6  PyEval_EvalFrameEx () at ../Python/ceval.c:3009
#7  0x0000560af7cafd0a in PyEval_EvalCodeEx () at ../Python/ceval.c:3604
#8  0x0000560af7cb7c38 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7ffed29fece0, func=<optimized out>) at ../Python/ceval.c:4467
#9  call_function (oparg=<optimized out>, pp_stack=0x7ffed29fece0) at ../Python/ceval.c:4392
#10 PyEval_EvalFrameEx () at ../Python/ceval.c:3009
#11 0x0000560af7cafd0a in PyEval_EvalCodeEx () at ../Python/ceval.c:3604
#12 0x0000560af7cb639d in PyEval_EvalCode () at ../Python/ceval.c:669
#13 exec_statement (
    locals={'_RE_SPLITSIMPLEATTRLAST':

And the crash is due to a NULL pointer dereference:

(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x10

So at first take, the lxml C code is internally calling some CPython API that returns NULL without checking it, and somehow only when mitogen_task_isolation!=fork.

This could potentially be related to the importer. Will try again with a debug build

@dw
Copy link
Member

@dw dw commented Dec 13, 2018

This is very weird, somehow an XPath constructor is being invoked with a NULL args tuple:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007f1760281bfc in __pyx_pw_4lxml_5etree_19_XPathEvaluatorBase_1__cinit__ (__pyx_v_self=0x55e301b9a880, __pyx_args=0x0, __pyx_kwds=0x0) at src/lxml/etree.c:174798
174798    if (unlikely(PyTuple_GET_SIZE(__pyx_args) > 0)) {
(gdb) bt
#0  0x00007f1760281bfc in __pyx_pw_4lxml_5etree_19_XPathEvaluatorBase_1__cinit__ (__pyx_v_self=0x55e301b9a880, __pyx_args=0x0, __pyx_kwds=0x0) at src/lxml/etree.c:174798
#1  0x00007f17602f3b7f in __pyx_tp_new_4lxml_5etree__XPathEvaluatorBase (t=0x7f17605d83e0 <__pyx_type_4lxml_5etree_XPath>, a=0x55e30173d250, k=0x0)
    at src/lxml/etree.c:219507
#2  0x00007f17602f431b in __pyx_tp_new_4lxml_5etree_XPath (t=0x7f17605d83e0 <__pyx_type_4lxml_5etree_XPath>, a=0x55e30173d250, k=0x0) at src/lxml/etree.c:219806
#3  0x000055e2ffbf58ac in type_call (type=0x7f17605d83e0 <__pyx_type_4lxml_5etree_XPath>, args=0x55e30173d250, kwds=0x0) at Objects/typeobject.c:749
#4  0x000055e2ffb8cf22 in PyObject_Call (func=0x7f17605d83e0 <__pyx_type_4lxml_5etree_XPath>, arg=0x55e30173d250, kw=0x0) at Objects/abstract.c:2544
#5  0x000055e2ffc4f310 in do_call (func=0x7f17605d83e0 <__pyx_type_4lxml_5etree_XPath>, pp_stack=0x7fff12d68c28, na=1, nk=0) at Python/ceval.c:4589
#6  0x000055e2ffc4e6a3 in call_function (pp_stack=0x7fff12d68c28, oparg=1) at Python/ceval.c:4394
@dw
Copy link
Member

@dw dw commented Dec 13, 2018

__pyx_tp_new_4lxml_5etree__XPathEvaluatorBase:219507 calls

  if (unlikely(__pyx_pw_4lxml_5etree_19_XPathEvaluatorBase_1__cinit__(o, __pyx_empty_tuple, NULL) < 0)) goto bad;

The problem is __pyx_empty_tuple is somehow NULL.

@dw
Copy link
Member

@dw dw commented Dec 13, 2018

__pyx_empty_tuple is constructed in __pyx_pymod_exec_etree via __pyx_moduledef_slots (PyModuleDef_Slot, CYTHON_PEP489_MULTI_PHASE_INIT)

__pyx_module_cleanup is responsible for Py_CLEARing it, it is either the m_free function of PyModuleDef on 3.x, or WTF installed as an atexit callback on 2.x.

The problem is related to Mitogen firing atexit handlers after module execution.

@dw
Copy link
Member

@dw dw commented Dec 13, 2018

At least 2 bugs here:

  • All Cython code on 2.x becomes memory-unsafe during the Python-accessible atexit phase of interpreter shutdown
  • Mitogen can't blindly run all atexit handlers, we need a deeper hack to cope with Ansible's recent use of atexit to clean up tmpdirs
@dw
Copy link
Member

@dw dw commented Dec 13, 2018

For the time being, use either mitogen_task_isolation=fork on just those XML tasks, downgrade to 0.2.2 if possible, or simply wait for 0.2.4 :)

dw added a commit that referenced this issue Dec 13, 2018
@dw
Copy link
Member

@dw dw commented Dec 13, 2018

This is now on the master branch and will make it into the next release. To be updated when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/

Thanks for reporting this!

@dw dw closed this Dec 13, 2018
@LuckySB
Copy link
Author

@LuckySB LuckySB commented Dec 14, 2018

i am sorry, but please reopen issue
i try with -e mitogen_task_isolation=fork and with this task - the error remains

  xml:
    path: "/etc/libvirt/storage/{{ item.pool_name }}.xml"
    xpath: "/pool/target/path"
    content: text
  when: >
    (item.name not in kvm_existing_vms.list_vms and item.state|d('running') != 'undefined') or
    (item.name in kvm_existing_vms.list_vms and item.state|d('running') == 'undefined')
  with_items: "{{ kvm_vm }}"
  loop_control:
    label: "{{ item.pool_name }}"
  register: kvm_pool_xml_path
  changed_when: kvm_pool_xml_path.matches[0].path is defined
  vars:
    mitogen_task_isolation: fork

i try with latest master 1eb08fb
and with commit cd01957

my playbook fail on task with module yum

  yum:
    name: sudo
    state: present

log: https://www.pastiebin.com/5c13efce28e9e

PS: with version 0.2.2 all works fine

@dw dw reopened this Dec 14, 2018
@dw
Copy link
Member

@dw dw commented Dec 14, 2018

Thanks for letting me know. I will investigate shortly

dw added a commit that referenced this issue Dec 15, 2018
@dw
Copy link
Member

@dw dw commented Dec 15, 2018

Hi there,

This should now work on master, but there are some seemingly unrelated test failures at present. It may be better to wait for the next release. Thanks again for letting me know the fix was bad!

@LuckySB
Copy link
Author

@LuckySB LuckySB commented Dec 17, 2018

Thank you so much.
Playbook worked quickly and without errors.
We are waiting for release 2.4 )

@dw
Copy link
Member

@dw dw commented Jan 19, 2019

Hi, I am satisfied the test failures were not related to this change. Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants