Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qemu coroutines get confused about their thread identity #1260

Closed
citrus-it opened this issue Nov 22, 2022 · 1 comment
Closed

qemu coroutines get confused about their thread identity #1260

citrus-it opened this issue Nov 22, 2022 · 1 comment

Comments

@citrus-it
Copy link
Member

citrus-it commented Nov 22, 2022

I'm filing this here while I gather information. It may be an upstream illumos bug in which case I'll file it there.

TL,TR; It appears that a thread restarted via siglongjmp() is confused about its identity.

This started while trying to get qemu 7 working on OmniOS. When starting an aarch64 guest, the process aborts:

Assertion failed: qemu_in_coroutine(),
    file ../util/qemu-coroutine-lock.c, line 274,
    function qemu_co_mutex_unlock

Looking into the qemu_in_coroutine() function, it retrieves a pointer to a Coroutine struct from thread local storage (TLS) and checks that it indicates that a coroutine is active.

I used dtrace to watch writes and reads to that pointer in TLS, and print out the thread ID alongside the value being set or retrieved. The functions involved are set_current() and get_current(). The thread ID (dtrace variable tid) is shown in the square brackets. This showed:

  9  79542               get_current:return [3] - 2327f50
  9  79541                set_current:entry [3] - fffffc7fed7d24d0
 28  79542               get_current:return [1] - fffffc7fed7d04d0
 28  79541                set_current:entry [1] - 2327f50
 28  79542               get_current:return [1] - fffffc7fed7d24d0
 28  79542               get_current:return STOP return fffffc7fed7d24d0 != 2327f50

That's weird. Thread 3 reads the value and then sets a new one, and reads it back. So far so good. But then thread 1 writes the value to its own TLS and reads it back, but the read returns a different value - definitely not the one which was just set! It's actually the value that was previously set on thread 3.

At this point I broke out fprintf(stderr and added a few lines to the code. This tells a slightly different story to dtrace. The following is from the same run as the dtrace output above, so the addresses match. Again, the thread ID (this time retrieved via pthread_self()) is shown in the square brackets.

<- [3] - 2327f50
-> [3] - fffffc7fed7d24d0 (was 2327f50)
<- [1] - fffffc7fed7d04d0
-> [1] - 2327f50 (was fffffc7fed7d04d0)
<- [3] - fffffc7fed7d24d0

This says that it was thread 3 that did the last read, and not thread 1 as shown by dtrace. This is the root cause of the assertion failure - it's apparently getting data from the wrong thread. After blinking a couple of times, I went and explicitly printed the value of %fs:0 in the routine that is calling get_current, specifically:

        __asm__ __volatile__("movq %%fs:0, %0" : "=r" (__value));

and saw (different run, different address, sorry):

[3] qemu_in_coroutine
<- [3] - fffffc7fed6824d0
[3] qemu_in_coroutine self=fffffc7fed6824d0 self->caller=0 thread=fffffc7fed682940

With the process stopped by dtrace, I can take a look via mdb:

> ::walk ulwp
0xfffffc7fed680940
0xfffffc7fed681940
0xfffffc7fed682940
0xfffffc7fed683940

The thread address printed by the assembly matches thread 3.

The variable in TLS looks right for thread 1 (indicating that a co-routine is in progress) and wrong for thread 3:

> 1::tls xxx_current | /J | ::print Coroutine entry caller
entry = blk_aio_read_entry
caller = 0xfffffc7fed6804d0
> 3::tls xxx_current | /J | ::print Coroutine entry caller
entry = 0
caller = 0

I'm looking at thread 1 in the debugger and the qemu_coroutine_self() function is the top of the stack:

> $l
1
> $C
fffffc7fec91ba50 qemu_coroutine_self+0x29()
fffffc7fec91bad0 qemu_co_mutex_lock+0x2e()
fffffc7fec91baf0 tracked_request_end+0x21()
fffffc7fec91bc40 bdrv_co_preadv_part+0x18d()
fffffc7fec91bce0 bdrv_driver_preadv+0x110()
fffffc7fec91bdd0 bdrv_aligned_preadv+0x8c8()
fffffc7fec91bf20 bdrv_co_preadv_part+0x17e()
fffffc7fec91bfa0 blk_co_do_preadv_part+0x9e()
fffffc7fec91bfc0 blk_aio_read_entry+0x4d()
fffffc7fec91bff0 coroutine_trampoline+0x64()

Thread 3 doesn't seem to be doing much:

> ::stacks
THREAD           STATE    SOBJ                COUNT
3                UNPARKED <NONE>                  1

1                UNPARKED <NONE>                  1
                 qemu_coroutine_self+0x29
                 qemu_co_mutex_unlock+0x159

Well, look at this.. %fsbase is the same for threads 1 and 3 and, based on the output of ::walk ulwp above, thread 1 is wrong (should be 0xfffffc7fed680940)

> ::walk thread | ::regs ! grep %fsbase
%fsbase = 0xfffffc7fed682940
%fsbase = 0xfffffc7fed681940
%fsbase = 0xfffffc7fed682940
%fsbase = 0xfffffc7fed683940
@citrus-it
Copy link
Member Author

Opened upstream as https://www.illumos.org/issues/15206

citrus-it added a commit to omniosorg/qemu that referenced this issue Dec 1, 2022
The coroutine ucontext backend uses a combination of sigsetjmp/siglongjmp
and set/getcontext(). It stores the current coroutine data in thread local
storage. When a context is restored into a different thread than the one in
which it last ran, the %fsbase is also restored, leading to a thread that
thinks it is a different one, and critically for this code, a thread that
accesses the TLS of the other.

This patch copies %fsbase from the existing thread context to the new one
before switching so that it resumes with the right identity. If relies on this
being an amd64 binary, and insider knowledge of the libc implementations.

It is not yet understood why this code works on other platforms without this
change, or if there is a better fix. It would be slightly cleaner to use
set/getcontext() throughout, but that adds additional system call overhead.

For more analysis, see omniosorg/illumos-omnios#1260
citrus-it added a commit to omniosorg/qemu that referenced this issue Dec 28, 2022
The coroutine ucontext backend uses a combination of sigsetjmp/siglongjmp
and set/getcontext(). It stores the current coroutine data in thread local
storage. When a context is restored into a different thread than the one in
which it last ran, the %fsbase is also restored, leading to a thread that
thinks it is a different one, and critically for this code, a thread that
accesses the TLS of the other.

This patch copies %fsbase from the existing thread context to the new one
before switching so that it resumes with the right identity. If relies on this
being an amd64 binary, and insider knowledge of the libc implementations.

It is not yet understood why this code works on other platforms without this
change, or if there is a better fix. It would be slightly cleaner to use
set/getcontext() throughout, but that adds additional system call overhead.

For more analysis, see omniosorg/illumos-omnios#1260
citrus-it added a commit to omniosorg/qemu that referenced this issue Dec 29, 2022
The coroutine ucontext backend uses a combination of sigsetjmp/siglongjmp
and set/getcontext(). It stores the current coroutine data in thread local
storage. When a context is restored into a different thread than the one in
which it last ran, the %fsbase is also restored, leading to a thread that
thinks it is a different one, and critically for this code, a thread that
accesses the TLS of the other.

This patch copies %fsbase from the existing thread context to the new one
before switching so that it resumes with the right identity. If relies on this
being an amd64 binary, and insider knowledge of the libc implementations.

It is not yet understood why this code works on other platforms without this
change, or if there is a better fix. It would be slightly cleaner to use
set/getcontext() throughout, but that adds additional system call overhead.

For more analysis, see omniosorg/illumos-omnios#1260
citrus-it added a commit to omniosorg/qemu that referenced this issue Jan 5, 2023
The coroutine ucontext backend uses a combination of sigsetjmp/siglongjmp
and set/getcontext(). It stores the current coroutine data in thread local
storage. When a context is restored into a different thread than the one in
which it last ran, the %fsbase is also restored, leading to a thread that
thinks it is a different one, and critically for this code, a thread that
accesses the TLS of the other.

This patch copies %fsbase from the existing thread context to the new one
before switching so that it resumes with the right identity. If relies on this
being an amd64 binary, and insider knowledge of the libc implementations.

It is not yet understood why this code works on other platforms without this
change, or if there is a better fix. It would be slightly cleaner to use
set/getcontext() throughout, but that adds additional system call overhead.

For more analysis, see omniosorg/illumos-omnios#1260
citrus-it added a commit to omniosorg/qemu that referenced this issue Apr 22, 2023
The coroutine ucontext backend uses a combination of sigsetjmp/siglongjmp
and set/getcontext(). It stores the current coroutine data in thread local
storage. When a context is restored into a different thread than the one in
which it last ran, the %fsbase is also restored, leading to a thread that
thinks it is a different one, and critically for this code, a thread that
accesses the TLS of the other.

This patch copies %fsbase from the existing thread context to the new one
before switching so that it resumes with the right identity. If relies on this
being an amd64 binary, and insider knowledge of the libc implementations.

It is not yet understood why this code works on other platforms without this
change, or if there is a better fix. It would be slightly cleaner to use
set/getcontext() throughout, but that adds additional system call overhead.

For more analysis, see omniosorg/illumos-omnios#1260
hadfl pushed a commit to omniosorg/qemu that referenced this issue Aug 25, 2023
The coroutine ucontext backend uses a combination of sigsetjmp/siglongjmp
and set/getcontext(). It stores the current coroutine data in thread local
storage. When a context is restored into a different thread than the one in
which it last ran, the %fsbase is also restored, leading to a thread that
thinks it is a different one, and critically for this code, a thread that
accesses the TLS of the other.

This patch copies %fsbase from the existing thread context to the new one
before switching so that it resumes with the right identity. If relies on this
being an amd64 binary, and insider knowledge of the libc implementations.

It is not yet understood why this code works on other platforms without this
change, or if there is a better fix. It would be slightly cleaner to use
set/getcontext() throughout, but that adds additional system call overhead.

For more analysis, see omniosorg/illumos-omnios#1260
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant