Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ovirtlago collect artificates get stuck (and does not print on which VM?) #126

Closed
mykaul opened this issue Feb 21, 2016 · 5 comments
Closed

Comments

@mykaul
Copy link

mykaul commented Feb 21, 2016

It probably needs some kind of timeout as well.

Relevant log:

2016-02-21 15:33:21,659::log_utils.py::start_log_task::617::nose::INFO::  # add_hosts: 
2016-02-21 15:43:23,731::log_utils.py::__enter__::574::ovirtlago::INFO::    * Collect artifacts: 
2016-02-21 15:43:23,731::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-21 15:43:23,732::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-21 15:43:23,732::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-21 15:43:23,733::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-21 15:43:23,733::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-21 15:43:23,983::virt.py::ssh::691::lago.virt::DEBUG::Running 13e5cb5c on lago_basic_suite_3_6_engine: /sbin/service qemu-ga status
2016-02-21 15:43:24,169::virt.py::ssh::715::lago.virt::DEBUG::Command 13e5cb5c on lago_basic_suite_3_6_engine returned with 0
2016-02-21 15:43:24,169::virt.py::ssh::723::lago.virt::DEBUG::Command 13e5cb5c on lago_basic_suite_3_6_engine output:
 qemu-ga (pid  1120) is running...

2016-02-21 15:43:24,259::virt.py::ssh::691::lago.virt::DEBUG::Running 14100854 on lago_basic_suite_3_6_host0: /usr/bin/systemctl status --lines=0 qemu-guest-agent
2016-02-21 15:43:24,260::virt.py::ssh::691::lago.virt::DEBUG::Running 14100a66 on lago_basic_suite_3_6_storage-nfs: /usr/bin/systemctl status --lines=0 qemu-guest-agent
2016-02-21 15:43:24,261::virt.py::ssh::691::lago.virt::DEBUG::Running 141042e2 on lago_basic_suite_3_6_storage-iscsi: /usr/bin/systemctl status --lines=0 qemu-guest-agent
2016-02-21 15:43:24,393::virt.py::ssh::715::lago.virt::DEBUG::Command 141042e2 on lago_basic_suite_3_6_storage-iscsi returned with 0
2016-02-21 15:43:24,393::virt.py::ssh::723::lago.virt::DEBUG::Command 141042e2 on lago_basic_suite_3_6_storage-iscsi output:
 qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static)
   Active: active (running) since Sun 2016-02-21 15:30:06 IST; 13min ago
 Main PID: 494 (qemu-ga)
   CGroup: /system.slice/qemu-guest-agent.service
           └─494 /usr/bin/qemu-ga


2016-02-21 15:43:24,397::virt.py::ssh::715::lago.virt::DEBUG::Command 14100854 on lago_basic_suite_3_6_host0 returned with 0
2016-02-21 15:43:24,397::virt.py::ssh::723::lago.virt::DEBUG::Command 14100854 on lago_basic_suite_3_6_host0 output:
 ● qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static; vendor preset: enabled)
   Active: active (running) since Sun 2016-02-21 08:30:14 EST; 1h 46min left
 Main PID: 613 (qemu-ga)
   CGroup: /system.slice/qemu-guest-agent.service
           └─613 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open guest-file-close guest-file-read guest-file-write guest-file-seek guest-file-flush -F/etc/qemu-ga/fsfreeze-hook

2016-02-21 15:43:24,548::virt.py::ssh::715::lago.virt::DEBUG::Command 14100a66 on lago_basic_suite_3_6_storage-nfs returned with 0
2016-02-21 15:43:24,548::virt.py::ssh::723::lago.virt::DEBUG::Command 14100a66 on lago_basic_suite_3_6_storage-nfs output:
 qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static)
   Active: active (running) since Sun 2016-02-21 15:30:12 IST; 13min ago
 Main PID: 484 (qemu-ga)
   CGroup: /system.slice/qemu-guest-agent.service
           └─484 /usr/bin/qemu-ga


2016-02-21 15:43:26,801::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:03)
2016-02-21 15:43:27,163::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:03)
2016-02-21 15:43:27,202::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:03)
2016-02-21 15:43:27,219::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:03)
@mykaul
Copy link
Author

mykaul commented Feb 22, 2016

This is the stdout without any information:

@ Collect artifacts: 
  # [Thread-1] lago_basic_suite_3_6_host1: 
  # [Thread-2] lago_basic_suite_3_6_engine: 
  # [Thread-3] lago_basic_suite_3_6_storage-iscsi: 
  # [Thread-4] lago_basic_suite_3_6_storage-nfs: 
  # [Thread-5] lago_basic_suite_3_6_host0: 
  # [Thread-4] lago_basic_suite_3_6_storage-nfs: Success (in 0:00:09)
  # [Thread-2] lago_basic_suite_3_6_engine: Success (in 0:00:09)
  # [Thread-3] lago_basic_suite_3_6_storage-iscsi: Success (in 0:00:36)
  # [Thread-5] lago_basic_suite_3_6_host0: Success (in 0:00:36)
  # [Thread-1] lago_basic_suite_3_6_host1: Success (in 0:00:37)

Adding something basic already makes a difference:

@ Collect artifacts: 
  # [Thread-1] _collect_artifacts for VM: lago_basic_suite_3_6_host1: 
  # [Thread-2] _collect_artifacts for VM: lago_basic_suite_3_6_engine: 
  # [Thread-3] _collect_artifacts for VM: lago_basic_suite_3_6_storage-iscsi: 
  # [Thread-4] _collect_artifacts for VM: lago_basic_suite_3_6_storage-nfs: 
  # [Thread-5] _collect_artifacts for VM: lago_basic_suite_3_6_host0: 
  # [Thread-3] _collect_artifacts for VM: lago_basic_suite_3_6_storage-iscsi: Success (in 0:00:06)
  # [Thread-4] _collect_artifacts for VM: lago_basic_suite_3_6_storage-nfs: Success (in 0:00:06)
  # [Thread-2] _collect_artifacts for VM: lago_basic_suite_3_6_engine: Success (in 0:00:06)
  # [Thread-5] _collect_artifacts for VM: lago_basic_suite_3_6_host0: Success (in 0:00:06)
  # [Thread-1] _collect_artifacts for VM: lago_basic_suite_3_6_host1: Success (in 0:00:06)

Will send a patch (via gerrit).

@david-caro
Copy link
Member

I see the same info (you are in the collect artifacts task, running on each of the vms, there's a header char that differenciates the subtask from the task and should be an indentation too that is not seen here).

If that's what you see before getting stuck, then the issue is not in the artifact collection (there I see that all of them finished already)

@mykaul
Copy link
Author

mykaul commented Feb 22, 2016

Added a print in #129 .

@mykaul
Copy link
Author

mykaul commented Feb 22, 2016

And it got stuck again. To me it's quite clear it did not complete artifact collection - which can always happen - if indeed a test failed, something might have happened to a host!

looks like I have 5 calls (::574) and only 4 succeeded (::581)

2016-02-22 10:04:24,495::log_utils.py::start_log_task::617::nose::INFO::  # add_hosts: 
2016-02-22 10:14:27,258::log_utils.py::__enter__::574::ovirtlago::INFO::    * Collect artifacts: 
2016-02-22 10:14:27,258::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-22 10:14:27,259::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-22 10:14:27,259::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-22 10:14:27,259::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-22 10:14:27,260::log_utils.py::__enter__::574::ovirtlago::INFO::
2016-02-22 10:14:27,971::virt.py::ssh::691::lago.virt::DEBUG::Running 4abb2898 on lago_basic_suite_3_6_storage-iscsi: /usr/bin/systemctl status --lines=0 qemu-guest-agent
2016-02-22 10:14:28,008::virt.py::ssh::715::lago.virt::DEBUG::Command 4abb2898 on lago_basic_suite_3_6_storage-iscsi returned with 0
2016-02-22 10:14:28,008::virt.py::ssh::723::lago.virt::DEBUG::Command 4abb2898 on lago_basic_suite_3_6_storage-iscsi output:
 qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static)
   Active: active (running) since Mon 2016-02-22 10:00:50 IST; 13min ago
 Main PID: 495 (qemu-ga)
   CGroup: /system.slice/qemu-guest-agent.service
           └─495 /usr/bin/qemu-ga


2016-02-22 10:14:28,055::virt.py::ssh::691::lago.virt::DEBUG::Running 4ac7ef60 on lago_basic_suite_3_6_host0: /usr/bin/systemctl status --lines=0 qemu-guest-agent
2016-02-22 10:14:28,086::virt.py::ssh::691::lago.virt::DEBUG::Running 4accbca2 on lago_basic_suite_3_6_storage-nfs: /usr/bin/systemctl status --lines=0 qemu-guest-agent
2016-02-22 10:14:28,138::virt.py::ssh::691::lago.virt::DEBUG::Running 4ad4a962 on lago_basic_suite_3_6_engine: /sbin/service qemu-ga status
2016-02-22 10:14:28,193::virt.py::ssh::715::lago.virt::DEBUG::Command 4ac7ef60 on lago_basic_suite_3_6_host0 returned with 0
2016-02-22 10:14:28,193::virt.py::ssh::723::lago.virt::DEBUG::Command 4ac7ef60 on lago_basic_suite_3_6_host0 output:
 ● qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static; vendor preset: enabled)
   Active: active (running) since Mon 2016-02-22 03:01:08 EST; 1h 46min left
 Main PID: 603 (qemu-ga)
   CGroup: /system.slice/qemu-guest-agent.service
           └─603 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open guest-file-close guest-file-read guest-file-write guest-file-seek guest-file-flush -F/etc/qemu-ga/fsfreeze-hook

2016-02-22 10:14:28,200::virt.py::ssh::715::lago.virt::DEBUG::Command 4accbca2 on lago_basic_suite_3_6_storage-nfs returned with 0
2016-02-22 10:14:28,200::virt.py::ssh::723::lago.virt::DEBUG::Command 4accbca2 on lago_basic_suite_3_6_storage-nfs output:
 qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; static)
   Active: active (running) since Mon 2016-02-22 10:00:54 IST; 13min ago
 Main PID: 485 (qemu-ga)
   CGroup: /system.slice/qemu-guest-agent.service
           └─485 /usr/bin/qemu-ga


2016-02-22 10:14:29,501::virt.py::ssh::715::lago.virt::DEBUG::Command 4ad4a962 on lago_basic_suite_3_6_engine returned with 0
2016-02-22 10:14:29,501::virt.py::ssh::723::lago.virt::DEBUG::Command 4ad4a962 on lago_basic_suite_3_6_engine output:
 qemu-ga (pid  1120) is running...

2016-02-22 10:14:36,054::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:08)
2016-02-22 10:14:36,055::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:08)
2016-02-22 10:14:36,055::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:08)
2016-02-22 10:14:36,057::log_utils.py::__exit__::581::ovirtlago::INFO::Success (in 0:00:08)

gdb'ing the process:
#0  0x00007f8406a9fe57 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x560218fd9770) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x560218fd9770, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f8406a9ff04 in __new_sem_wait_slow (sem=0x560218fd9770, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f8406a9ffaa in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007f8406dbcd55 in PyThread_acquire_lock (lock=0x560218fd9770, waitflag=1) at /usr/src/debug/Python-2.7.10/Python/thread_pthread.h:324
#5  0x00007f8406dc0a92 in lock_PyThread_acquire_lock (self=0x7f83ec25c5d0, args=<optimized out>) at /usr/src/debug/Python-2.7.10/Modules/threadmodule.c:52
#6  0x00007f8406d908be in call_function (oparg=<optimized out>, pp_stack=0x7ffebd7e9300) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4112
#7  PyEval_EvalFrameEx (
    f=f@entry=Frame 0x5602193becf0, for file /usr/lib64/python2.7/threading.py, line 340, in wait (self=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f83ec25c6b0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f83ec25c6b0>, _Condition__waiters=[<thread.lock at remote 0x7f83ec25c5d0>], release=<built-in method release of thread.lock object at remote 0x7f83ec25c6b0>) at remote 0x7f83e96e6050>, timeout=None, balancing=True, waiter=<thread.lock at remote 0x7f83ec25c5d0>, saved_state=None), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:2755
#8  0x00007f8406d916b4 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=1, kws=<optimized out>, kwcount=0, 
    defs=0x7f83faa30c80, defcount=2, closure=0x0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:3344
#9  0x00007f8406d905c6 in fast_function (nk=0, na=1, n=<optimized out>, pp_stack=0x7ffebd7e94f0, func=<function at remote 0x7f83fa7990c8>) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4208
#10 call_function (oparg=<optimized out>, pp_stack=0x7ffebd7e94f0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4133
#11 PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7f83ea7aabc0, for file /usr/lib64/python2.7/threading.py, line 952, in join (self=<Thread(_Thread__ident=140204551210752, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f83ec25c6b0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f83ec25c6b0>, _Condition__waiters=[<thread.lock at remote 0x7f83ec25c5d0>], release=<built-in method release of thread.lock object at remote 0x7f83ec25c6b0>) at remote 0x7f83e96e6050>, _Thread__name='Thread-6', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f83ec25c650>, acquire=<built-in method acquire of thread.lock object at remote 0x7f83ec25c650>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f83ec25c650>) at remote 0x7f83e96e2f90>) at remote 0x7f83e96e2f50>, _Thread__stderr=<Tee(_encoding='UTF-8', _streams=(<StringIO(sof...(truncated), 
    throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:2755
#12 0x00007f8406d916b4 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=1, kws=<optimized out>, kwcount=0, 
    defs=0x7f83faa0c920, defcount=2, closure=0x0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:3344
#13 0x00007f8406d905c6 in fast_function (nk=0, na=1, n=<optimized out>, pp_stack=0x7ffebd7e96e0, func=<function at remote 0x7f83fa799f50>) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4208
#14 call_function (oparg=<optimized out>, pp_stack=0x7ffebd7e96e0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4133
#15 PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7f83e9f0f830, for file /usr/lib/python2.7/site-packages/lago/utils.py, line 73, in join_all (self=<VectorThread(targets=[<functools.partial at remote 0x7f83e96e36d8>, <functools.partial at remote 0x7f83e96e3730>, <functools.partial at remote 0x7f83e96e3788>, <functools.partial at remote 0x7f83e96e37e0>, <functools.partial at remote 0x7f83e96e3838>], thread_handles=[(<Thread(_Thread__ident=140204551210752, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f83ec25c6b0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f83ec25c6b0>, _Condition__waiters=[<thread.lock at remote 0x7f83ec25c5d0>], release=<built-in method release of thread.lock object at remote 0x7f83ec25c6b0>) at remote 0x7f83e96e6050>, _Thread__name='Thread-6', _Thread__daemonic=False, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f83ec25c650>, acquire=<built-in ...(truncated), 
    throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:2755
#16 0x00007f8406d916b4 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=1, kws=<optimized out>, kwcount=0, 
    defs=0x7f83fa0f61e8, defcount=1, closure=0x0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:3344
#17 0x00007f8406d905c6 in fast_function (nk=0, na=1, n=<optimized out>, pp_stack=0x7ffebd7e98d0, func=<function at remote 0x7f83f93dd6e0>) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4208
#18 call_function (oparg=<optimized out>, pp_stack=0x7ffebd7e98d0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4133
#19 PyEval_EvalFrameEx (
    f=f@entry=Frame 0x7f83e9f1ed70, for file /usr/lib/python2.7/site-packages/lago/utils.py, line 87, in invoke_in_parallel (func=<function at remote 0x7f83e96dd500>, args_sequences=([<HostVM(_env=<OvirtVirtEnv(_engine_vm=<EngineVM(_env=<...>, _ssh_client=None, _service_class=<classobj at remote 0x7f83eee85a10>, _api=None, _spec={u'name': u'lago_basic_suite_3_6_engine', u'nics': [{u'ip': u'192.168.200.3', u'net': u'lago_basic_suite_3_6_lago'}], u'disks': [{u'path': u'/home/zram/3.6/images/lago_basic_suite_3_6_engine_root.qcow2', u'metadata': {u'distro': u'el6', u'root-password': u'123456'}, u'type': u'template', u'dev': u'vda', u'format': u'qcow2'}], u'guest-agent': u'qemu-ga', u'snapshots': {}, u'memory': u'4096', u'service_class': u'sysvinit', u'root-password': u'123456', u'metadata': {u'ovirt-scripts': [u'/home/mini/ovirt-system-tests/basic_suite_3.6/deploy-scripts/add_local_repo.sh', u'/home/mini/ovirt-system-tests/basic_suite_3.6/deploy-scripts/setup_engine.sh'], u'ovirt-engine-password': u'123', u'ovirt-role': u'engi...(truncated), 
    throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:2755
#20 0x00007f8406d916b4 in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=argcount@entry=2, kws=<optimized out>, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:3344
#21 0x00007f8406d905c6 in fast_function (nk=0, na=2, n=<optimized out>, pp_stack=0x7ffebd7e9ac0, func=<function at remote 0x7f83f93dd578>) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4208
#22 call_function (oparg=<optimized out>, pp_stack=0x7ffebd7e9ac0) at /usr/src/debug/Python-2.7.10/Python/ceval.c:4133

@david-caro
Copy link
Member

Maybe we should not rely on ovirt-guest-agent for the log collection, as it might not be running or responding at all at that moment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants