-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a test for an epoll+signalfd issue encountered in wayland #32
Conversation
The wayland event_loop_signal test fails non-deterministically on FreeBSD due to a missing SIGUSR1 notification. This test case is a minimal reproducer for the issue: it passes on Linux but fails when run on FreeBSD.
I tried to debug this and include a fix, but so far I've failed to narrow it down further than the |
Thanks for the test! It looks like we are hitting a race in the FreeBSD kernel. I've managed to reproduce it with #ifdef NDEBUG
#undef NDEBUG
#endif
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/event.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <poll.h>
#include <unistd.h>
int
main(void)
{
int rv;
sigset_t set;
rv = sigemptyset(&set);
assert(rv == 0);
rv = sigaddset(&set, SIGUSR1);
assert(rv == 0);
rv = sigprocmask(SIG_BLOCK, &set, NULL);
assert(rv == 0);
int skq = kqueue();
assert(skq >= 0);
struct kevent kev;
EV_SET(&kev, SIGUSR1, EVFILT_SIGNAL, EV_ADD, 0, 0, 0);
rv = kevent(skq, &kev, 1, NULL, 0, NULL);
assert(rv == 0);
int kq = kqueue();
assert(kq >= 0);
EV_SET(&kev, skq, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, 0);
rv = kevent(kq, &kev, 1, NULL, 0, NULL);
assert(rv == 0);
for (;;) {
rv = kill(getpid(), SIGUSR1);
assert(rv == 0);
/* Turn this into `#if 1` to avoid the race. */
#if 1
rv = kevent(kq, NULL, 0, &kev, 1, NULL);
#else
rv = kevent(kq, NULL, 0, &kev, 1, &(struct timespec) { 0, 0 });
#endif
assert(rv == 1);
rv = kevent(kq, NULL, 0, &kev, 1, &(struct timespec) { 0, 0 });
assert(rv == 0);
rv = kevent(skq, NULL, 0, &kev, 1, &(struct timespec) { 0, 0 });
assert(rv == 1);
rv = kevent(skq, NULL, 0, &kev, 1, &(struct timespec) { 0, 0 });
assert(rv == 0);
siginfo_t siginfo;
rv = sigtimedwait(&set, &siginfo, &(struct timespec) { 0, 0 });
assert(rv == SIGUSR1);
rv = sigtimedwait(&set, &siginfo, &(struct timespec) { 0, 0 });
assert(rv < 0);
assert(errno == EAGAIN);
}
}
So the notification from the I think I recall from having looked at the FreeBSD kernel sources some time ago that this happens because various parts of the involved signal/notification mechanisms can be scheduled on different CPU cores. So it isn't 100% guaranteed that right after the |
I'm curious if Linux guarantees that the |
I would assume that at least a single-thread program always sees the signal immediately (even if it was migrated to another CPU between the kill and epoll_wait call). Similarly I'd also assume it to be visible immediately for a signal sent to the current thread using pthread_kill/thr_kill/tkill. |
No, I haven't -- but I'm also not sure if this is an actual bug in the kernel or "by design". One might interpret this behavior as violating the "spirit of POSIX", because they say:
Of course there is no kqueue/epoll in POSIX, but to me it reads that all side effects of the signal should be observable after the kill() returned. |
I merged your test case, but I changed it so that it won't fail outright, but only skip. I guess this falls under the category of edge cases that cannot be 100% faithfully emulated right now. |
Thanks again! |
Reported as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258310 since I'm not sure if this is intended or an implementation bug. |
Fixed in freebsd/freebsd-src@98168a6 |
Fantastic! I'll update the text in the |
This test (based on jiixyj/epoll-shim#32 (comment)) fails reproducibly on QEMU with KVM and `-smp 2` prior to D31858 (committed as 98168a6) and passes with the patch applied. Reviewed By: kib, imp Differential Revision: https://reviews.freebsd.org/D31862
This test (based on jiixyj/epoll-shim#32 (comment)) fails reproducibly on QEMU with KVM and `-smp 2` prior to D31858 (committed as 98168a6) and passes with the patch applied. Reviewed By: kib, imp Differential Revision: https://reviews.freebsd.org/D31862
This test (based on jiixyj/epoll-shim#32 (comment)) fails reproducibly on QEMU with KVM and `-smp 2` prior to D31858 (committed as 98168a6) and passes with the patch applied. Reviewed By: kib, imp Differential Revision: https://reviews.freebsd.org/D31862 (cherry picked from commit d7d962e)
The wayland event_loop_signal test fails non-deterministically on FreeBSD
due to a missing SIGUSR1 notification. This test case is a minimal
reproducer for the issue: it passes on Linux but fails when run on FreeBSD.