Conversation
futex_waitv (SYS 449) is SYSCALL_DEFINE5 in the kernel: (waiters, nr_futexes, flags, timeout, clockid). The dispatch wrapper sc_futex_waitv was forwarding only x0..x3, dropping x4. The implementation hardcoded a CLOCK_MONOTONIC->CLOCK_REALTIME deadline conversion, so a guest asking for CLOCK_REALTIME got monotonic semantics regardless of what it passed. Wire x4 through and give sys_futex_waitv a clockid parameter. Validate against CLOCK_REALTIME or CLOCK_MONOTONIC when timeout is non-NULL (Linux returns EINVAL otherwise) and branch the absolute-deadline conversion on clockid. The pre-existing waitv loop slept on a private cond that no wake site ever signalled; forward progress relied on a 50ms poll. Add optional group_lock / group_cond pointers to futex_waiter_t plus a futex_waiter_notify_group helper. Each wake site (futex_wake, futex_requeue, futex_wake_op both passes, futex_unlock_pi) calls it after marking woken=1 under the bucket lock. waitv blocks on shared.cond directly; the loop's bounded sleep is now 500ms, kept only so exit_group and timeout edges are still observed when no signal arrives. Lock ordering is bucket -> group_lock; futex_waiter_notify_group is only called by wake sites that already hold the bucket lock, and the waitv thread never holds shared.lock while taking a bucket lock. Stack lifetime of shared.lock and shared.cond is protected by the bucket-lock pairing in waitv_unlink, which synchronizes with every wake's notify_group before the destroys run. Locked in by tests/test-futex-waitv.c (19 cases): single-element wake returning index 0, multi-element wake returning the woken index, EAGAIN on stale val, eight EINVAL paths covering nr=0, nr>128, top-level flags, reserved!=0, element flags reserved bits, size!=U32, malformed CLOCK_REALTIME / CLOCK_MONOTONIC nsec, unaligned uaddr, NULL waiters_gva, bad clockid with timeout; two EFAULT paths via PROT_NONE pages; the inclusive nr_futexes==128 boundary; and ETIMEDOUT under both CLOCK_MONOTONIC and CLOCK_REALTIME deadlines. The CLOCK_REALTIME case exists specifically to catch a regression of the dropped-clockid bug. Verified against Linux ground truth via tests/qemu-runner.sh; all 19 cases match.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
futex_waitv (SYS 449) is SYSCALL_DEFINE5 in the kernel: (waiters, nr_futexes, flags, timeout, clockid). The dispatch wrapper sc_futex_waitv was forwarding only x0..x3, dropping x4. The implementation hardcoded a CLOCK_MONOTONIC->CLOCK_REALTIME deadline conversion, so a guest asking for CLOCK_REALTIME got monotonic semantics regardless of what it passed. Wire x4 through and give sys_futex_waitv a clockid parameter. Validate against CLOCK_REALTIME or CLOCK_MONOTONIC when timeout is non-NULL (Linux returns EINVAL otherwise) and branch the absolute-deadline conversion on clockid.
The pre-existing waitv loop slept on a private cond that no wake site ever signalled; forward progress relied on a 50ms poll. Add optional group_lock / group_cond pointers to futex_waiter_t plus a futex_waiter_notify_group helper. Each wake site (futex_wake, futex_requeue, futex_wake_op both passes, futex_unlock_pi) calls it after marking woken=1 under the bucket lock. waitv blocks on shared.cond directly; the loop's bounded sleep is now 500ms, kept only so exit_group and timeout edges are still observed when no signal arrives.
Lock ordering is bucket -> group_lock; futex_waiter_notify_group is only called by wake sites that already hold the bucket lock, and the waitv thread never holds shared.lock while taking a bucket lock. Stack lifetime of shared.lock and shared.cond is protected by the bucket-lock pairing in waitv_unlink, which synchronizes with every wake's notify_group before the destroys run.
Locked in by tests/test-futex-waitv.c (19 cases): single-element wake returning index 0, multi-element wake returning the woken index, EAGAIN on stale val, eight EINVAL paths covering nr=0, nr>128, top-level flags, reserved!=0, element flags reserved bits, size!=U32, malformed CLOCK_REALTIME / CLOCK_MONOTONIC nsec, unaligned uaddr, NULL waiters_gva, bad clockid with timeout; two EFAULT paths via PROT_NONE pages; the inclusive nr_futexes==128 boundary; and ETIMEDOUT under both CLOCK_MONOTONIC and CLOCK_REALTIME deadlines. The CLOCK_REALTIME case exists specifically to catch a regression of the dropped-clockid bug. Verified against Linux ground truth via tests/qemu-runner.sh; all 19 cases match.
Summary by cubic
Fixes futex_waitv to honor the clockid argument and switches waitv to signal-driven wakeups, removing the 50ms poll and matching Linux behavior. This improves correctness (timeouts) and reduces wake latency.
Bug Fixes
clockidtosys_futex_waitv; supportCLOCK_REALTIMEandCLOCK_MONOTONIC, return EINVAL for others when a timeout is provided.waiters, and enforce 1..128nr_futexes.Refactors
waitv; keep a 500ms bounded sleep to observe exit/timeouts.tests/test-futex-waitv(19 cases) with a Makefile rule using-lpthread, and register it intests/manifest.txt.Written for commit 1423b4a. Summary will update on new commits.