Hangs and prints "main-loop: WARNING: I/O thread spun for 1000 iterations" #3

Closed
manuelafm opened this Issue Jan 17, 2015 · 8 comments

@manuelafm

With riscv-qemu at commit bfd1ee7 (29 December 2014) and the kernel currently offered for download on the website (version 3.14.15-g4073e84-dirty (skarandikar@a8)), the emulation appears to hang after printing this message:

main-loop: WARNING: I/O thread spun for 1000 iterations

Sometimes it happens very early; other times it takes hours.

Once it has happened, I can still quit with "ctrl-a x", but I cannot cancel running programs with "ctrl-c", nor type new commands if the hang occurs at the shell prompt.

This is possibly a bug in riscv-linux instead of riscv-qemu. Sadly I cannot provide any useful information, except possibly a backtrace of qemu-system-riscv when this happens, if you think that it's useful.

@NCommander

I can confirm this behavior; for me it seems related to heavy network I/O.

@NCommander

I applied this patch manually to the QEMU source tree and rebuilt: http://comments.gmane.org/gmane.comp.emulators.qemu/236931. This seems to let it get past the deadlock. You'll still get the I/O blocking messages occasionally, but after a delay QEMU recovers and continues where it left off. It seems to be an OK workaround for now; I was able to native-compile gmake this way with an NFS root.

@NCommander

Ok, I wasn't 100% right about that being a fix. The issue appears to come from something setting up blocking I/O that then seems to become non-blocking I/O, which causes the mutexes to deadlock. My attempts to add debugging printfs seem to have "fixed" the problem, so debugging this is stalled ...

@lambdafu
lambdafu commented Mar 4, 2015

+1. this is a real show stopper.

@NCommander

Eek, I forgot to post that I found a workaround. If you force everything to be blocking, the issue goes away. To do that, you have to set the variable in the main loop that controls blocking.

In vl.c:

Find:

    nonblocking = !kvm_enabled() && !xen_enabled() && last_io > 0;

and set nonblocking to 0 instead of that expression.

I think. It might be 1. I don't have my machine handy to verify which value fixes it. Performance takes a bit of a dive, but it works; I was able to build GCC natively like this.
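
For anyone wondering why forcing blocking would stop the "spun for 1000 iterations" warning, here is a minimal standalone sketch of the general idea. It is not QEMU's code: pick_timeout_ms, count_idle_polls, and MAX_SPIN are hypothetical names, and it assumes (not checked against the QEMU source) that the flag ends up choosing between a zero and a nonzero poll() timeout. With a timeout of 0, poll() returns immediately, so an idle loop spins; with a positive timeout, the thread sleeps between checks.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical constant, chosen to echo the "1000 iterations" in the warning. */
#define MAX_SPIN 1000

/* Timeout handed to poll(): 0 means "return immediately" (non-blocking),
 * a positive value lets the calling thread sleep for up to that many ms. */
static int pick_timeout_ms(bool nonblocking, int blocking_timeout_ms)
{
    return nonblocking ? 0 : blocking_timeout_ms;
}

/* Poll fd up to max_iters times and count how many polls in a row
 * came back with nothing to read. Stops at the first poll that reports data.
 * The blocking case uses a 1 ms timeout so the sketch never hangs. */
static int count_idle_polls(int fd, bool nonblocking, int max_iters)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int idle = 0;

    assert(max_iters >= 0);
    for (int i = 0; i < max_iters; i++) {
        int ready = poll(&pfd, 1, pick_timeout_ms(nonblocking, 1));
        if (ready > 0) {
            break;          /* data is available, the loop has real work */
        }
        idle++;             /* timeout or error: nothing to do this round */
    }
    return idle;
}
```

Calling count_idle_polls() on the read end of an empty pipe with nonblocking set to true burns through all MAX_SPIN iterations almost instantly, which is the kind of busy loop the warning describes. With nonblocking set to false, each idle iteration sleeps instead, at the cost of some latency, which would fit the performance dip mentioned above.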

@lambdafu
lambdafu commented Mar 5, 2015

yeah, that helps a lot! thanks!

@sagark
Contributor
sagark commented Feb 11, 2016

Closing this since I haven't seen it happen with the privileged update + rebase. Let me know if it reappears.

@sagark sagark closed this Feb 11, 2016
@visbits
visbits commented Apr 17, 2016 edited

Seeing this currently.

root@osc-1011.prd > rpm -qa | grep qemu
qemu-kvm-1.5.3-105.el7_2.3.x86_64
qemu-kvm-common-1.5.3-105.el7_2.3.x86_64
qemu-img-1.5.3-105.el7_2.3.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
root@osc-1011.prd > rpm -qa | grep libvirt
libvirt-client-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-storage-1.2.17-13.el7_2.4.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-network-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-interface-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-driver-secret-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-1.2.17-13.el7_2.4.x86_64
libvirt-daemon-kvm-1.2.17-13.el7_2.4.x86_64

Linux osc-1011.prd 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

@sagark sagark pushed a commit that referenced this issue Sep 23, 2016
@elmarco @ehabkost elmarco + ehabkost linux-user-i386: Fix crash on cpuid
Running cpuid instructions with a simple run like:
i386-linux-user/qemu-i386 tests/tcg/sha1-i386

Results in the following assert:
 #0  0x00007ffff64246f5 in raise () from /lib64/libc.so.6
 #1  0x00007ffff64262fa in abort () from /lib64/libc.so.6
 #2  0x00007ffff7937ec5 in g_assertion_message () from /lib64/libglib-2.0.so.0
 #3  0x00007ffff7937f5a in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
 #4  0x000055555561b54c in apicid_bitwidth_for_count (count=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:58
 #5  0x000055555561b58a in apicid_smt_width (nr_cores=0, nr_threads=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:67
 #6  0x000055555561b5c3 in apicid_core_offset (nr_cores=0, nr_threads=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:82
 #7  0x000055555561b5e3 in apicid_pkg_offset (nr_cores=0, nr_threads=0) at /home/elmarco/src/qemu/include/hw/i386/topology.h:89
 #8  0x000055555561dd86 in cpu_x86_cpuid (env=0x555557999550, index=4, count=3, eax=0x7fffffffcae8, ebx=0x7fffffffcaec, ecx=0x7fffffffcaf0, edx=0x7fffffffcaf4) at /home/elmarco/src/qemu/target-i386/cpu.c:2405
 #9  0x0000555555638e8e in helper_cpuid (env=0x555557999550) at /home/elmarco/src/qemu/target-i386/misc_helper.c:106
 #10 0x000055555599dc5e in static_code_gen_buffer ()
 #11 0x00005555555952f8 in cpu_tb_exec (cpu=0x5555579912d0, itb=0x7ffff4371ab0) at /home/elmarco/src/qemu/cpu-exec.c:166
 #12 0x0000555555595c8e in cpu_loop_exec_tb (cpu=0x5555579912d0, tb=0x7ffff4371ab0, last_tb=0x7fffffffd088, tb_exit=0x7fffffffd084, sc=0x7fffffffd0a0) at /home/elmarco/src/qemu/cpu-exec.c:517
 #13 0x0000555555595e50 in cpu_exec (cpu=0x5555579912d0) at /home/elmarco/src/qemu/cpu-exec.c:612
 #14 0x00005555555c065b in cpu_loop (env=0x555557999550) at /home/elmarco/src/qemu/linux-user/main.c:297
 #15 0x00005555555c25b2 in main (argc=2, argv=0x7fffffffd848, envp=0x7fffffffd860) at /home/elmarco/src/qemu/linux-user/main.c:4803

The fields are set in qemu_init_vcpu() with softmmu, but it's a stub
with linux-user.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
fa5376d
@sagark sagark pushed a commit that referenced this issue Sep 23, 2016
@morecache @bonzini morecache + bonzini msmouse: Fix segfault caused by free the chr before chardev cleanup.
Segfault happens when leaving qemu with msmouse backend:

 #0  0x00007fa8526ac975 in raise () at /lib64/libc.so.6
 #1  0x00007fa8526add8a in abort () at /lib64/libc.so.6
 #2  0x0000558be78846ab in error_exit (err=16, msg=0x558be799da10 ...
 #3  0x0000558be7884717 in qemu_mutex_destroy (mutex=0x558be93be750) at ...
 #4  0x0000558be7549951 in qemu_chr_free_common (chr=0x558be93be750) at ...
 #5  0x0000558be754999c in qemu_chr_free (chr=0x558be93be750) at ...
 #6  0x0000558be7549a20 in qemu_chr_delete (chr=0x558be93be750) at ...
 #7  0x0000558be754a8ef in qemu_chr_cleanup () at qemu-char.c:4643
 #8  0x0000558be755843e in main (argc=5, argv=0x7ffe925d7118, ...

The chr was freed by msmouse close callback before chardev cleanup,
Then qemu_mutex_destroy triggered raise().

Because freeing chr is handled by qemu_chr_free_common, Remove the free from
msmouse_chr_close to avoid double free.

Fixes: c1111a2
Cc: qemu-stable@nongnu.org
Signed-off-by: Lin Ma <lma@suse.com>
Message-Id: <20160915143158.4796-1-lma@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
9e14037