qemu coroutines get confused about their thread identity #1260

citrus-it · 2022-11-22T16:31:41Z

I'm filing this here while I gather information. It may be an upstream illumos bug in which case I'll file it there.

TL,TR; It appears that a thread restarted via siglongjmp() is confused about its identity.

This started while trying to get qemu 7 working on OmniOS. When starting an aarch64 guest, the process aborts:

Assertion failed: qemu_in_coroutine(),
    file ../util/qemu-coroutine-lock.c, line 274,
    function qemu_co_mutex_unlock

Looking into the qemu_in_coroutine() function, it retrieves a pointer to a Coroutine struct from thread local storage (TLS) and checks that it indicates that a coroutine is active.

I used dtrace to watch writes and reads to that pointer in TLS, and print out the thread ID alongside the value being set or retrieved. The functions involved are set_current() and get_current(). The thread ID (dtrace variable tid) is shown in the square brackets. This showed:

  9  79542               get_current:return [3] - 2327f50
  9  79541                set_current:entry [3] - fffffc7fed7d24d0
 28  79542               get_current:return [1] - fffffc7fed7d04d0
 28  79541                set_current:entry [1] - 2327f50
 28  79542               get_current:return [1] - fffffc7fed7d24d0
 28  79542               get_current:return STOP return fffffc7fed7d24d0 != 2327f50

That's weird. Thread 3 reads the value and then sets a new one, and reads it back. So far so good. But then thread 1 writes the value to its own TLS and reads it back, but the read returns a different value - definitely not the one which was just set! It's actually the value that was previously set on thread 3.

At this point I broke out fprintf(stderr and added a few lines to the code. This tells a slightly different story to dtrace. The following is from the same run as the dtrace output above, so the addresses match. Again, the thread ID (this time retrieved via pthread_self()) is shown in the square brackets.

<- [3] - 2327f50
-> [3] - fffffc7fed7d24d0 (was 2327f50)
<- [1] - fffffc7fed7d04d0
-> [1] - 2327f50 (was fffffc7fed7d04d0)
<- [3] - fffffc7fed7d24d0

This says that it was thread 3 that did the last read, and not thread 1 as shown by dtrace. This is the root cause of the assertion failure - it's apparently getting data from the wrong thread. After blinking a couple of times, I went and explicitly printed the value of %fs:0 in the routine that is calling get_current, specifically:

        __asm__ __volatile__("movq %%fs:0, %0" : "=r" (__value));

and saw (different run, different address, sorry):

[3] qemu_in_coroutine
<- [3] - fffffc7fed6824d0
[3] qemu_in_coroutine self=fffffc7fed6824d0 self->caller=0 thread=fffffc7fed682940

With the process stopped by dtrace, I can take a look via mdb:

> ::walk ulwp
0xfffffc7fed680940
0xfffffc7fed681940
0xfffffc7fed682940
0xfffffc7fed683940

The thread address printed by the assembly matches thread 3.

The variable in TLS looks right for thread 1 (indicating that a co-routine is in progress) and wrong for thread 3:

> 1::tls xxx_current | /J | ::print Coroutine entry caller
entry = blk_aio_read_entry
caller = 0xfffffc7fed6804d0
> 3::tls xxx_current | /J | ::print Coroutine entry caller
entry = 0
caller = 0

I'm looking at thread 1 in the debugger and the qemu_coroutine_self() function is the top of the stack:

> $l
1
> $C
fffffc7fec91ba50 qemu_coroutine_self+0x29()
fffffc7fec91bad0 qemu_co_mutex_lock+0x2e()
fffffc7fec91baf0 tracked_request_end+0x21()
fffffc7fec91bc40 bdrv_co_preadv_part+0x18d()
fffffc7fec91bce0 bdrv_driver_preadv+0x110()
fffffc7fec91bdd0 bdrv_aligned_preadv+0x8c8()
fffffc7fec91bf20 bdrv_co_preadv_part+0x17e()
fffffc7fec91bfa0 blk_co_do_preadv_part+0x9e()
fffffc7fec91bfc0 blk_aio_read_entry+0x4d()
fffffc7fec91bff0 coroutine_trampoline+0x64()

Thread 3 doesn't seem to be doing much:

> ::stacks
THREAD           STATE    SOBJ                COUNT
3                UNPARKED <NONE>                  1

1                UNPARKED <NONE>                  1
                 qemu_coroutine_self+0x29
                 qemu_co_mutex_unlock+0x159

Well, look at this.. %fsbase is the same for threads 1 and 3 and, based on the output of ::walk ulwp above, thread 1 is wrong (should be 0xfffffc7fed680940)

> ::walk thread | ::regs ! grep %fsbase
%fsbase = 0xfffffc7fed682940
%fsbase = 0xfffffc7fed681940
%fsbase = 0xfffffc7fed682940
%fsbase = 0xfffffc7fed683940

The text was updated successfully, but these errors were encountered:

citrus-it · 2022-11-24T13:18:13Z

Opened upstream as https://www.illumos.org/issues/15206

The coroutine ucontext backend uses a combination of sigsetjmp/siglongjmp and set/getcontext(). It stores the current coroutine data in thread local storage. When a context is restored into a different thread than the one in which it last ran, the %fsbase is also restored, leading to a thread that thinks it is a different one, and critically for this code, a thread that accesses the TLS of the other. This patch copies %fsbase from the existing thread context to the new one before switching so that it resumes with the right identity. If relies on this being an amd64 binary, and insider knowledge of the libc implementations. It is not yet understood why this code works on other platforms without this change, or if there is a better fix. It would be slightly cleaner to use set/getcontext() throughout, but that adds additional system call overhead. For more analysis, see omniosorg/illumos-omnios#1260

citrus-it closed this as completed Nov 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

qemu coroutines get confused about their thread identity #1260

qemu coroutines get confused about their thread identity #1260

citrus-it commented Nov 22, 2022 •

edited

Loading

citrus-it commented Nov 24, 2022

qemu coroutines get confused about their thread identity #1260

qemu coroutines get confused about their thread identity #1260

Comments

citrus-it commented Nov 22, 2022 • edited Loading

citrus-it commented Nov 24, 2022

citrus-it commented Nov 22, 2022 •

edited

Loading