
sys_close does not clear epoll registrations — stale regs[fd].active leaves the PR #73 ABA guard load-bearing #78

@Max042004

Summary

Closing a guest fd does not remove that fd from any epoll instance's interest
table. Each epoll instance keeps its own epoll_reg_t regs[FD_TABLE_SIZE]
indexed by guest fd number (src/syscall/poll.c:640-642), but sys_close()
(src/syscall/fs.c:423) — and every other close path that funnels through
fd_cleanup_entry() (dup2-over-existing, the execve CLOEXEC sweep) — only tears
down the fd_table[fd] slot. It never touches the regs[fd] entry of the epoll
instances that registered that fd, so regs[fd].active stays true after the
fd is gone.
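The sketch below is a deliberately simplified C model of the two tables, meant to make the drift concrete. It is an illustration under stated assumptions, not elfuse's code. The names fd_table, regs, active, oneshot_armed, generation, and FD_TABLE_SIZE come from this issue. The model_* types and functions are hypothetical. model_close() reproduces only the behavior described above: it frees fd_table[fd] and leaves every regs[fd] alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FD_TABLE_SIZE 1024 /* value stated in this issue */

/* Hypothetical per-fd slot; elfuse's fd_entry_t has more fields. */
typedef struct {
    bool in_use;
    uint64_t generation; /* assumption: changes whenever the slot is reused */
} model_fd_t;

/* Hypothetical registration; the real epoll_reg_t also carries events/data. */
typedef struct {
    bool active;
    bool oneshot_armed;
    uint64_t generation; /* generation stamped at EPOLL_CTL_ADD time */
} model_reg_t;

/* One epoll instance: its own interest table indexed by guest fd number. */
typedef struct {
    model_reg_t regs[FD_TABLE_SIZE];
} model_epoll_t;

static model_fd_t fd_table[FD_TABLE_SIZE];
static uint64_t next_generation = 1;

/* Allocate the lowest free fd number, as POSIX open() does. */
static int model_open(void)
{
    for (int fd = 0; fd < FD_TABLE_SIZE; fd++) {
        if (!fd_table[fd].in_use) {
            fd_table[fd].in_use = true;
            fd_table[fd].generation = next_generation++;
            return fd;
        }
    }
    return -1;
}

static void model_epoll_add(model_epoll_t *inst, int fd)
{
    inst->regs[fd].active = true;
    inst->regs[fd].generation = fd_table[fd].generation;
}

/* Mirrors the close path described above: only fd_table[fd] is torn down.
 * Nothing here can reach the epoll instances that registered fd. */
static void model_close(int fd)
{
    fd_table[fd].in_use = false;
}
```

Calling model_open(), then model_epoll_add(), then model_close() on the same fd leaves fd_table[fd].in_use false while regs[fd].active stays true. This is the state-level symptom listed under Reproduction.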

On Linux, closing the last descriptor that refers to an open file description
auto-removes it from every epoll interest list. elfuse leaves the registration
behind.
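The reference Linux behavior can be checked with a short guest-side program that uses the real epoll(7) and eventfd(2) APIs. This is a sketch with a hypothetical helper name. It assumes a single-threaded process, so that eventfd() hands back the fd number that was just closed. If that assumption does not hold, the helper returns -1. Because closing the last descriptor drops the registration, EPOLL_CTL_DEL on the reopened number is expected to fail with ENOENT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Linux-only. Returns the errno from EPOLL_CTL_DEL after close+reopen of
 * the same fd number, 0 if DEL unexpectedly succeeds, or -1 if setup fails
 * or the kernel did not reuse the fd number. */
static int linux_close_reopen_del_errno(void)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    int fd = eventfd(0, 0);
    if (fd < 0) {
        close(epfd);
        return -1;
    }
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(fd);
        close(epfd);
        return -1;
    }

    /* Last descriptor for this open file description: the kernel removes
     * it from every epoll interest list. */
    close(fd);

    /* Lowest-free-fd allocation should hand back the same number. */
    int fd2 = eventfd(0, 0);
    if (fd2 != fd) {
        if (fd2 >= 0)
            close(fd2);
        close(epfd);
        return -1;
    }

    int rc = epoll_ctl(epfd, EPOLL_CTL_DEL, fd2, NULL);
    int err = (rc < 0) ? errno : 0;
    close(fd2);
    close(epfd);
    return err;
}
```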

This was flagged by jserv in the PR #73 review as an adjacent, pre-existing
issue, with an explicit request to file it separately:

Adjacent (pre-existing, but more observable after this change): sys_close
does not clear inst->regs[fd].active for epoll instances holding the fd, so
a closed-then-reopened guest fd still looks active to epoll. Worth filing
separately.

Root cause

There is no reverse index from a target fd to the set of epoll instances that
registered it. Epoll instances are reachable only through their own epfd in
fd_table[epfd].dir. A close has the fd number in hand but no cheap way to find
"which epoll instances watch this fd," so it does nothing — the registration
tables drift out of sync with reality and only get corrected lazily, the next
time epoll_ctl() happens to touch that exact (epfd, fd) pair.
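The missing reverse index could take the shape sketched below. This is one possible design, not a proposal taken from the codebase. Only regs, active, oneshot_armed, and FD_TABLE_SIZE come from this issue. Everything else is hypothetical: the sketch_* names, the global instances[] registry, the 64-instance cap, and the per-fd watchers[] bitmask. The sketch omits locking, which a real fix would need given the multi-threaded guests that PR #73 addresses. It also does not clear bits when an epoll instance itself is destroyed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FD_TABLE_SIZE 1024     /* value stated in this issue */
#define MAX_EPOLL_INSTANCES 64 /* hypothetical cap so one uint64_t mask suffices */

typedef struct {
    bool active;
    bool oneshot_armed;
} sketch_reg_t;

typedef struct {
    sketch_reg_t regs[FD_TABLE_SIZE];
} sketch_epoll_t;

/* Hypothetical registry of live epoll instances, indexed by slot. */
static sketch_epoll_t *instances[MAX_EPOLL_INSTANCES];

/* Hypothetical reverse index: bit `slot` of watchers[fd] is set while
 * instances[slot] holds a registration for fd. */
static uint64_t watchers[FD_TABLE_SIZE];

static void sketch_epoll_add(int slot, int fd)
{
    instances[slot]->regs[fd].active = true;
    watchers[fd] |= UINT64_C(1) << slot;
}

static void sketch_epoll_del(int slot, int fd)
{
    instances[slot]->regs[fd].active = false;
    instances[slot]->regs[fd].oneshot_armed = false;
    watchers[fd] &= ~(UINT64_C(1) << slot);
}

/* Would be called from the shared close path (e.g. fd_cleanup_entry()).
 * It touches only the instances that actually watch fd. */
static void sketch_drop_watchers_on_close(int fd)
{
    uint64_t mask = watchers[fd];
    for (int slot = 0; slot < MAX_EPOLL_INSTANCES; slot++) {
        if (mask & (UINT64_C(1) << slot)) {
            instances[slot]->regs[fd].active = false;
            instances[slot]->regs[fd].oneshot_armed = false;
        }
    }
    watchers[fd] = 0;
}
```

The bitmask keeps the close-time cost bounded by the number of instance slots rather than by scanning every fd_table entry for epoll descriptors. A per-fd linked list would lift the cap at the cost of allocation on EPOLL_CTL_ADD.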

Current mitigation (PR #73) and why it is not a fix

PR #73 added an fd_entry_t.generation stamp and a cross-call ABA guard in
sys_epoll_ctl() (src/syscall/poll.c:743). At the head of every
epoll_ctl(), a registration whose stamped generation no longer matches
fd_table[fd].generation is treated as stale and dropped, so DEL/MOD report
ENOENT and ADD starts fresh after a close+reopen of the same fd number. This
makes the observable close+reopen ABA correct and is covered by
tests/test-epoll-aba.c.
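The guard amounts to a check at the top of epoll_ctl(), modeled below. This is a simplified model with assumed names: the sketch_* types and the OP_* codes stand in for the real EPOLL_CTL_* values, and the function returns Linux-style negative errno values. It is not the code at src/syscall/poll.c:743. It reproduces only the behavior this issue describes: a generation mismatch drops the entry, so DEL/MOD report ENOENT and ADD starts fresh.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FD_TABLE_SIZE 1024

typedef enum { OP_ADD, OP_MOD, OP_DEL } sketch_op_t; /* hypothetical op codes */

typedef struct { bool in_use; uint64_t generation; } sketch_fd_t;
typedef struct { bool active; uint64_t generation; } sketch_reg_t;
typedef struct { sketch_reg_t regs[FD_TABLE_SIZE]; } sketch_epoll_t;

static sketch_fd_t fd_table[FD_TABLE_SIZE];

/* Returns 0 on success or a negative errno. */
static int sketch_epoll_ctl(sketch_epoll_t *inst, sketch_op_t op, int fd)
{
    if (fd < 0 || fd >= FD_TABLE_SIZE || !fd_table[fd].in_use)
        return -EBADF;

    sketch_reg_t *reg = &inst->regs[fd];

    /* ABA guard: a registration stamped with an older generation belongs
     * to a file that has since been closed; drop it before acting. */
    if (reg->active && reg->generation != fd_table[fd].generation)
        reg->active = false;

    switch (op) {
    case OP_ADD:
        if (reg->active)
            return -EEXIST;
        reg->active = true;
        reg->generation = fd_table[fd].generation;
        return 0;
    case OP_MOD:
        return reg->active ? 0 : -ENOENT; /* event-mask update omitted */
    case OP_DEL:
        if (!reg->active)
            return -ENOENT;
        reg->active = false;
        return 0;
    }
    return -EINVAL;
}
```

The guard repairs the entry only when epoll_ctl() is called on that exact (epfd, fd) pair. Until then, regs[fd] keeps its stale contents.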

But the guard is a point-of-use patch at the symptom layer, not a fix for
the root cause:

  • The registration table still lies: regs[fd].active == true for a fd that
    is closed (and possibly reopened as a different file). Correctness now depends
    on every reader re-deriving truth from generation.
  • sys_epoll_ctl() does consult generation. sys_epoll_pwait() does not.
    The readiness loop maps kevents[i].udata (the guest fd number) straight back
    into inst->regs[gfd].active / .oneshot_armed with no generation check
    (src/syscall/poll.c:1010-1012 and 1046-1051). Today this is masked because
    a closed host fd's knote is already gone from kqueue, so no event carries the
    stale udata. That is an indirect kqueue side effect, not an invariant the code
    enforces: any future change that re-adds a host fd, or any path that reuses a
    udata value, would resurface the stale entry as a readiness report carrying
    mismatched data.
  • The guard is therefore load-bearing: remove or bypass the generation check
    in any one epoll read path and the original wrong-knote / wrong-data bug
    returns. That is fragile for a correctness-critical path.
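The second bullet can be illustrated with a minimal model of the readiness loop's lookup. It uses assumed names: sketch_report() and the check_generation flag are hypothetical. The real loop also translates events and handles .oneshot_armed, and both are omitted here. The only point is that when a kevent's udata names a guest fd that was closed and then reopened as a different file, the current lookup (no generation comparison) accepts it as a live registration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FD_TABLE_SIZE 1024

typedef struct { bool in_use; uint64_t generation; } sketch_fd_t;
typedef struct { bool active; uint64_t generation; } sketch_reg_t;
typedef struct { sketch_reg_t regs[FD_TABLE_SIZE]; } sketch_epoll_t;

static sketch_fd_t fd_table[FD_TABLE_SIZE];

/* One iteration of the readiness loop: udata is the guest fd number that
 * kqueue hands back. Returns true if the event would be reported. */
static bool sketch_report(sketch_epoll_t *inst, uintptr_t udata,
                          bool check_generation)
{
    if (udata >= FD_TABLE_SIZE)
        return false;
    int gfd = (int)udata;
    const sketch_reg_t *reg = &inst->regs[gfd];

    if (!reg->active)
        return false;

    /* The comparison epoll_pwait currently lacks: the stamped generation no
     * longer matches, so this fd number now names a different file. */
    if (check_generation && reg->generation != fd_table[gfd].generation)
        return false;

    return true;
}
```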

Stale entries also linger in regs[] (FD_TABLE_SIZE == 1024 entries per
instance) until the next epoll_ctl() on that exact (epfd, fd) pair or until the
epoll instance is destroyed, never reflecting that the fd is gone.

Out of scope (related, separate divergence)

elfuse keys epoll state on the guest fd number, while Linux keys on the open
file description. So dup()-then-close-the-original (registration should
survive via the surviving descriptor) is a distinct modeling gap that neither
the generation guard nor eager-cleanup-on-close addresses. Track separately;
do not conflate with this issue.
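On Linux the dup()-then-close case can be observed directly with the real epoll(7) API. The sketch below uses a hypothetical helper name and is Linux-only. It registers an eventfd, dup()s it, closes the original, and then signals through the duplicate. Because the registration lives on the open file description, epoll_wait() still reports one event. That event carries the data.fd recorded at EPOLL_CTL_ADD time, which is the number of the now-closed descriptor. A table keyed on the guest fd number cannot represent this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Linux-only. Returns the epoll_wait() result (expected 1), or -1 on setup
 * failure. *orig_fd receives the original fd number and *reported_fd the
 * data.fd of the reported event. */
static int linux_dup_close_wait(int *orig_fd, int *reported_fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    int fd = eventfd(0, 0);
    if (fd < 0) {
        close(epfd);
        return -1;
    }
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    int fd2 = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0 || (fd2 = dup(fd)) < 0) {
        close(fd);
        close(epfd);
        return -1;
    }

    *orig_fd = fd;
    close(fd); /* not the last reference: fd2 keeps the description alive */

    uint64_t one = 1;
    if (write(fd2, &one, sizeof one) != (ssize_t)sizeof one) {
        close(fd2);
        close(epfd);
        return -1;
    }

    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, 0); /* 0 ms timeout: poll only */
    if (n == 1)
        *reported_fd = out.data.fd;
    close(fd2);
    close(epfd);
    return n;
}
```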

Reproduction (state-level; observable symptom is currently masked by the guard)

  1. epfd = epoll_create1(); fd = eventfd().
  2. epoll_ctl(epfd, EPOLL_CTL_ADD, fd, …), which sets regs[fd].active = true.
  3. close(fd).
  4. Inspect the instance: regs[fd].active is still true (Linux: the
    registration is gone).

With the PR #73 guard in place the observable DEL/MOD/ADD behavior after a
reopen is correct (see test-epoll-aba.c); this issue is about the underlying
state being wrong and the guard being the only thing standing between that wrong
state and a user-visible bug.

References

  • PR #73 "Fix epoll_ctl dropping registrations in multi-threaded guests": adds
    the generation guard and test-epoll-aba.c.
  • src/syscall/poll.c:743: cross-call ABA guard (consults generation).
  • src/syscall/poll.c:1010-1012, 1046-1051: sys_epoll_pwait reads
    regs[].active / .oneshot_armed without a generation check.
  • src/syscall/poll.c:640-642: per-instance regs[FD_TABLE_SIZE] table.
  • src/syscall/fs.c:423 (sys_close), src/syscall/fdtable.c:457
    (fd_cleanup_entry): close teardown that does not touch epoll state.
