
Honor CLOCK_BOOTTIME and TFD_NONBLOCK in timerfd #87

Open
jserv wants to merge 4 commits into main from timerfd


jserv commented Jun 6, 2026

sys_timerfd_create had two known gaps:

  • The clockid allow-list only accepted CLOCK_{REALTIME,MONOTONIC}, so CLOCK_BOOTTIME (7) returned -EINVAL. Linux's CLOCK_BOOTTIME has no suspend-aware equivalent on macOS; treating BOOTTIME as MONOTONIC matches the existing translate_clockid mapping in src/syscall/time.c.
  • TFD_NONBLOCK was applied via fd_set_nonblock(kq), which issues fcntl(F_SETFL, O_NONBLOCK) on the kqueue host fd. macOS rejects that with ENOTTY (errno 25), so every timerfd_create(..., TFD_NONBLOCK) call failed regardless of clockid. The "Inappropriate ioctl for device" string in foot's log is glibc strerror for ENOTTY leaking through.
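
The create-path change can be sketched as follows. This is a minimal illustration, not elfuse's code: `timerfd_create_flags`, `enum host_clock`, and the `LINUX_*` macro names are hypothetical, and the numeric values assume Linux's generic ABI (CLOCK_BOOTTIME = 7, TFD_NONBLOCK = O_NONBLOCK = 04000, TFD_CLOEXEC = O_CLOEXEC = 02000000). It shows BOOTTIME folding into MONOTONIC and TFD_NONBLOCK landing in a shadow flag word instead of being pushed to the kqueue fd with fcntl().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed Linux guest ABI values; the LINUX_ names are hypothetical. */
#define LINUX_CLOCK_REALTIME  0
#define LINUX_CLOCK_MONOTONIC 1
#define LINUX_CLOCK_BOOTTIME  7
#define LINUX_O_RDWR          02
#define LINUX_O_NONBLOCK      04000
#define LINUX_O_CLOEXEC       02000000
#define LINUX_TFD_NONBLOCK    LINUX_O_NONBLOCK
#define LINUX_TFD_CLOEXEC     LINUX_O_CLOEXEC

/* Hypothetical host clock selector (elfuse's real mapping is
 * translate_clockid in src/syscall/time.c). */
enum host_clock { HOST_CLOCK_REALTIME, HOST_CLOCK_MONOTONIC };

/* Validate clockid/flags and compute the shadow linux_flags word.
 * Returns 0 or -EINVAL. No fcntl() is ever issued on the kqueue fd. */
int timerfd_create_flags(int clockid, int flags, enum host_clock *clk,
                         uint32_t *linux_flags)
{
    switch (clockid) {
    case LINUX_CLOCK_REALTIME:
        *clk = HOST_CLOCK_REALTIME;
        break;
    case LINUX_CLOCK_MONOTONIC:
    case LINUX_CLOCK_BOOTTIME: /* no suspend-aware host clock: use MONOTONIC */
        *clk = HOST_CLOCK_MONOTONIC;
        break;
    default:
        return -EINVAL;
    }
    if (flags & ~(LINUX_TFD_NONBLOCK | LINUX_TFD_CLOEXEC))
        return -EINVAL;
    /* Access mode plus the requested CLOEXEC/NONBLOCK bits, kept in the shadow. */
    *linux_flags = LINUX_O_RDWR | (uint32_t)flags;
    return 0;
}
```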

The non-blocking flag now lives in fd_table[gfd].linux_flags alongside the existing CLOEXEC bit, and timerfd_read consults that field after snapshotting it under fd_lock (order 3) before acquiring sfd_lock (order 5a). The lock-order snapshot matches the documented discipline in src/syscall/internal.h and the eventfd_dup_fd pattern.
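
A sketch of the snapshot-then-lock read path, assuming pthread mutexes stand in for elfuse's fd_lock and sfd_lock, and that `fd_table`, `timers`, and `timerfd_read_sketch` are hypothetical simplifications. The flag word is copied under fd_lock and that lock is released before sfd_lock is taken, so the two locks are never held together; the blocking wait on the kqueue is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define LINUX_O_NONBLOCK 04000
#define MAX_FDS 64

/* Hypothetical stand-ins for the fd table slot and the timerfd shadow. */
struct fd_entry { uint32_t linux_flags; };
struct timerfd_shadow { uint64_t expirations; };

static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;  /* order 3 */
static pthread_mutex_t sfd_lock = PTHREAD_MUTEX_INITIALIZER; /* order 5a */
static struct fd_entry fd_table[MAX_FDS];
static struct timerfd_shadow timers[MAX_FDS];

/* Returns the number of expirations consumed, -EAGAIN if none are pending
 * and the fd is non-blocking, or 0 meaning "caller must wait" (omitted). */
int64_t timerfd_read_sketch(int gfd)
{
    /* Snapshot the status flags under fd_lock, then drop it. */
    pthread_mutex_lock(&fd_lock);
    uint32_t flags = fd_table[gfd].linux_flags;
    pthread_mutex_unlock(&fd_lock);

    /* Only now take sfd_lock to consume pending expirations. */
    pthread_mutex_lock(&sfd_lock);
    uint64_t n = timers[gfd].expirations;
    timers[gfd].expirations = 0;
    pthread_mutex_unlock(&sfd_lock);

    if (n > 0)
        return (int64_t)n;
    if (flags & LINUX_O_NONBLOCK)
        return -EAGAIN;
    return 0; /* blocking wait on the kqueue would go here */
}
```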

Close #82


Summary by cubic

Adds support for CLOCK_BOOTTIME and makes timerfd's non-blocking and fcntl behavior match Linux on macOS. Also unifies linux_flags handling so reads, dup, and FUSE behave correctly.

  • New Features

    • Accept CLOCK_BOOTTIME in timerfd_create (mapped to MONOTONIC).
  • Bug Fixes

    • Track O_NONBLOCK in fd_table[].linux_flags; timerfd_read snapshots it. fcntl(F_GETFL) for timerfd returns O_RDWR|{O_APPEND,O_NONBLOCK,O_NOATIME}, and F_SETFL updates those bits; O_DIRECT now returns -EINVAL.
    • Serialize all linux_flags writes via fd_publish_linux_flags; sys_fcntl snapshots and revalidates before F_SETFD/F_SETFL updates under fd_lock.
    • Preserve O_NONBLOCK and access mode across dup (fs and FUSE); FUSE F_SETFL now preserves access mode.
    • Add LINUX_O_ASYNC and LINUX_O_NOATIME; translate helpers use these symbols.
    • Expanded tests for BOOTTIME+NONBLOCK, EAGAIN on armed timers, F_GETFL/F_SETFL semantics, and dup flag preservation.

Written for commit 0a2199c. Summary will update on new commits.


cubic-dev-ai[bot]

This comment was marked as resolved.

jserv added 4 commits June 7, 2026 04:01
foot fails at startup under elfuse with "failed to create keyboard repeat
timer FD: Inappropriate ioctl for device" because timerfd_create(
CLOCK_BOOTTIME, TFD_CLOEXEC | TFD_NONBLOCK) returns -1. Two root causes:
- The clockid allow-list only accepted CLOCK_REALTIME and CLOCK_MONOTONIC,
  so CLOCK_BOOTTIME (7) returned -EINVAL. Linux's CLOCK_BOOTTIME has no
  suspend-aware equivalent on macOS; treating BOOTTIME as MONOTONIC
  matches the existing translate_clockid mapping in src/syscall/time.c.

- TFD_NONBLOCK was applied via fd_set_nonblock(kq), which issues
  fcntl(F_SETFL, O_NONBLOCK) on the kqueue host fd. macOS rejects that
  with ENOTTY (errno 25), so every timerfd_create(..., TFD_NONBLOCK) call
  failed regardless of clockid. The "Inappropriate ioctl for device"
  string foot logs is glibc strerror for ENOTTY leaking through.

The non-blocking flag now lives in fd_table[gfd].linux_flags alongside
the existing CLOEXEC bit, and timerfd_read consults that field after
snapshotting it under fd_lock (order 3) before acquiring sfd_lock
(order 5a). The lock-order snapshot matches the documented discipline
in src/syscall/internal.h and the eventfd_dup_fd pattern.

Tests cover create succeeding with CLOCK_BOOTTIME+TFD_NONBLOCK and an
armed-but-unfired non-blocking read returning EAGAIN through the shadow
rather than the unarmed-timer EAGAIN path. Verified against an ARM64
Linux host that read-before-fire on a non-blocking timerfd returns
errno=EAGAIN, matching the elfuse behavior.
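
The EAGAIN behavior can be checked with a standalone Linux program along these lines (an illustration, not the PR's test file; `check_armed_nonblocking_read` is a hypothetical name). It creates a CLOCK_BOOTTIME timerfd with TFD_CLOEXEC | TFD_NONBLOCK, arms it 10 seconds out, and expects the immediate read to fail with EAGAIN. On a real kernel it should return 0; under an unpatched elfuse the create call itself fails and it returns 1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Returns 0 if reading an armed-but-unfired non-blocking BOOTTIME timerfd
 * fails with EAGAIN; otherwise a code naming the step that went wrong. */
int check_armed_nonblocking_read(void)
{
    int fd = timerfd_create(CLOCK_BOOTTIME, TFD_CLOEXEC | TFD_NONBLOCK);
    if (fd < 0)
        return 1; /* the create-path failure reported in #82 */

    struct itimerspec its;
    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = 10; /* far enough out that it cannot fire yet */
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return 2;
    }

    uint64_t expirations;
    ssize_t n = read(fd, &expirations, sizeof expirations);
    int saved = errno;
    close(fd);
    return (n == -1 && saved == EAGAIN) ? 0 : 3;
}
```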

Close #82

Without an FD_TIMERFD branch, fcntl(timerfd, F_GETFL) routed to the
kqueue host fd and surfaced macOS-side flags, while F_SETFL hit the
same ENOTTY rejection that broke the create path. Wire both branches
through fd_table[fd].linux_flags so the shadow is the source of truth.

  F_GETFL returns O_RDWR plus the writable bits Linux honors on a
  timerfd inode: O_APPEND, O_NONBLOCK, and O_NOATIME (Linux's full
  SETFL_MASK minus O_ASYNC, which timerfd_fops drops because it lacks
  ->fasync, and minus O_DIRECT). O_RDWR is hard-coded because Linux
  opens the inode O_RDWR via anon_inode_getfd in fs/timerfd.c, and is
  also stamped into linux_flags at create time so the access mode is
  visible to other consumers.

  F_SETFL accepts O_APPEND, O_NONBLOCK, and O_NOATIME; silently drops
  access-mode, CLOEXEC, and other non-writable bits, matching how Linux
  F_SETFL treats them; and returns -EINVAL on O_DIRECT, mirroring Linux's
  vfs_set_direct_io_flags rejecting it on an inode without
  FMODE_CAN_ODIRECT.
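
These F_GETFL/F_SETFL rules reduce to two mask operations on the shadow flag word. A minimal sketch, assuming hypothetical `timerfd_getfl`/`timerfd_setfl` helpers and Linux arm64 octal values: most bits come from asm-generic/fcntl.h, but O_DIRECT is 0200000 on arm64 rather than the generic 040000. Locking is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed Linux arm64 guest values (octal). */
#define LINUX_O_ACCMODE  03
#define LINUX_O_WRONLY   01
#define LINUX_O_RDWR     02
#define LINUX_O_APPEND   02000
#define LINUX_O_NONBLOCK 04000
#define LINUX_O_DIRECT   0200000
#define LINUX_O_NOATIME  01000000
#define LINUX_O_CLOEXEC  02000000

/* Bits a timerfd honors on F_SETFL: SETFL_MASK minus O_ASYNC and O_DIRECT. */
#define TIMERFD_SETFL_BITS (LINUX_O_APPEND | LINUX_O_NONBLOCK | LINUX_O_NOATIME)

/* F_GETFL: access mode is always O_RDWR, plus the writable status bits. */
uint32_t timerfd_getfl(uint32_t linux_flags)
{
    return LINUX_O_RDWR | (linux_flags & TIMERFD_SETFL_BITS);
}

/* F_SETFL: reject O_DIRECT, replace only the writable bits, and leave
 * everything else (access mode, CLOEXEC) untouched. */
int timerfd_setfl(uint32_t *linux_flags, uint32_t arg)
{
    if (arg & LINUX_O_DIRECT)
        return -EINVAL;
    *linux_flags = (*linux_flags & ~(uint32_t)TIMERFD_SETFL_BITS) |
                   (arg & TIMERFD_SETFL_BITS);
    return 0;
}
```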

dup(2) preservation: the install_fd_alias_metadata_atomic preserved
mask now includes LINUX_O_NONBLOCK, and fuse_dup_fd does the same.
Without this, a duplicated non-blocking fd of any kind that stores
NONBLOCK in linux_flags rather than on the host fd (FUSE files, now
also timerfds) silently reverted to blocking.
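
The dup(2) fix amounts to widening the mask of shadow bits copied to the new descriptor. A hedged sketch: `DUP_PRESERVED_MASK` and `dup_linux_flags` are hypothetical, and dropping CLOEXEC follows dup(2)'s rule that the new descriptor starts with FD_CLOEXEC clear, not any detail confirmed from install_fd_alias_metadata_atomic.

```c
#include <assert.h>
#include <stdint.h>

#define LINUX_O_ACCMODE  03
#define LINUX_O_RDWR     02
#define LINUX_O_NONBLOCK 04000
#define LINUX_O_CLOEXEC  02000000

/* Hypothetical mask: keep the access mode and the shadow-only NONBLOCK bit,
 * which the host fd does not carry for FUSE files and timerfds. */
#define DUP_PRESERVED_MASK (LINUX_O_ACCMODE | LINUX_O_NONBLOCK)

uint32_t dup_linux_flags(uint32_t src_flags)
{
    return src_flags & DUP_PRESERVED_MASK;
}
```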

abi.h adds LINUX_O_ASYNC=0x2000 and LINUX_O_NOATIME=0x40000 (octal
020000 and 01000000 from asm-generic/fcntl.h). The translate helpers
now use the LINUX_O_ASYNC symbol instead of the 0x2000 inline literal.
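
A compile-time cross-check of the octal-to-hex conversions (020000 = 2 x 8^4 = 8192 = 0x2000, and 01000000 = 8^6 = 262144 = 0x40000), assuming the C11 `static_assert` macro from <assert.h>:

```c
#include <assert.h>

#define LINUX_O_ASYNC   0x2000
#define LINUX_O_NOATIME 0x40000

/* Same values as the octal constants in asm-generic/fcntl.h. */
static_assert(LINUX_O_ASYNC == 020000, "O_ASYNC is octal 020000");
static_assert(LINUX_O_NOATIME == 01000000, "O_NOATIME is octal 01000000");
```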

Tests cover the full fcntl coherence surface: O_RDWR access mode is
visible, accepted-plus-stray F_SETFL persists O_APPEND while dropping
O_WRONLY and O_CLOEXEC, O_DIRECT returns EINVAL, O_NOATIME round-trips,
and dup preserves both NONBLOCK and the access mode.
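
A standalone Linux program exercising much of the same fcntl surface on a real timerfd, as an illustration rather than the PR's test (`check_timerfd_fcntl` is a hypothetical name). Run natively, it shows the kernel behavior elfuse is matching; run under elfuse, it checks the emulation. It covers the access mode, stray O_WRONLY in F_SETFL, O_DIRECT, and dup; O_NOATIME is not exercised.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Returns 0 on success, or a code identifying the first failed check. */
int check_timerfd_fcntl(void)
{
    int rc = 0, dupfd = -1, fl;
    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
    if (fd < 0)
        return 1;

    /* F_GETFL reports the O_RDWR access mode. */
    if ((fcntl(fd, F_GETFL) & O_ACCMODE) != O_RDWR) { rc = 2; goto out; }

    /* A stray O_WRONLY is ignored; O_APPEND and O_NONBLOCK stick. */
    if (fcntl(fd, F_SETFL, O_APPEND | O_NONBLOCK | O_WRONLY) != 0) { rc = 3; goto out; }
    fl = fcntl(fd, F_GETFL);
    if ((fl & O_ACCMODE) != O_RDWR ||
        (fl & (O_APPEND | O_NONBLOCK)) != (O_APPEND | O_NONBLOCK)) { rc = 4; goto out; }

    /* O_DIRECT is rejected with EINVAL. */
    if (fcntl(fd, F_SETFL, O_DIRECT) != -1 || errno != EINVAL) { rc = 5; goto out; }

    /* dup() keeps O_NONBLOCK and the access mode. */
    dupfd = dup(fd);
    if (dupfd < 0) { rc = 6; goto out; }
    fl = fcntl(dupfd, F_GETFL);
    if (!(fl & O_NONBLOCK) || (fl & O_ACCMODE) != O_RDWR)
        rc = 7;
out:
    if (dupfd >= 0)
        close(dupfd);
    close(fd);
    return rc;
}
```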

The FUSE F_SETFL branch's preserved mask omitted LINUX_O_ACCMODE, while
the assignment OR'd in arg bits outside the preserved set. As a result,
fcntl(fuse_rdwr_fd, F_SETFL, 0) silently turned an O_RDWR FUSE shadow
into O_RDONLY, and a subsequent fcntl(fd, F_GETFL) reported the wrong
access mode -- breaking the Linux contract that F_SETFL cannot change
the access mode.

Add LINUX_O_ACCMODE to both the preserved mask and the strip applied
to the incoming arg, matching how Linux generic_setfl() preserves the
access mode bits outside its SETFL_MASK.
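
The bug and the fix can be contrasted in a few lines. This sketch assumes the preserved set held only LINUX_O_CLOEXEC before the fix (the real mask may hold more bits); `fuse_setfl_old` and `fuse_setfl_new` are hypothetical names. With arg = 0 the old form clears the access-mode bits, which reads back as O_RDONLY.

```c
#include <assert.h>
#include <stdint.h>

#define LINUX_O_ACCMODE  03
#define LINUX_O_RDONLY   00
#define LINUX_O_WRONLY   01
#define LINUX_O_RDWR     02
#define LINUX_O_NONBLOCK 04000
#define LINUX_O_CLOEXEC  02000000

/* Assumed preserved sets before and after the fix. */
#define OLD_PRESERVED (LINUX_O_CLOEXEC)
#define NEW_PRESERVED (LINUX_O_CLOEXEC | LINUX_O_ACCMODE)

/* Before: cur's access mode is not preserved, and arg's (possibly zero)
 * access-mode bits are OR'd in, so O_RDWR can silently become O_RDONLY. */
uint32_t fuse_setfl_old(uint32_t cur, uint32_t arg)
{
    return (cur & OLD_PRESERVED) | (arg & ~(uint32_t)OLD_PRESERVED);
}

/* After: keep cur's access mode and strip arg's, so F_SETFL cannot change it. */
uint32_t fuse_setfl_new(uint32_t cur, uint32_t arg)
{
    return (cur & NEW_PRESERVED) | (arg & ~(uint32_t)NEW_PRESERVED);
}
```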

Several callers wrote fd_table[gfd].linux_flags under different locks
or none at all, so a concurrent fcntl(F_SETFL/F_SETFD) on fd_lock could
race a creator's bare assignment. sys_fcntl read the slot's type and
flags outside fd_lock and mutated them without revalidation, so a
close+reopen between the read and the write could update an unrelated
fd. This commit unifies both concerns under fd_lock.

fd_publish_linux_flags helper
  New fdtable helper takes fd_lock around a single linux_flags write.
  Replaces bare assignments in sys_timerfd_create, sys_eventfd,
  sys_signalfd, sys_inotify_init1, the FUSE dev mount, and fuse_open.
  fuse_dup_fd takes fd_lock once for both the source read and the
  destination write so the preserved-flags snapshot stays consistent
  with a racing F_SETFL on either fd. The result: every write to
  fd_table[*].linux_flags is now serialized on the same lock, with no
  fuse_lock<->fd_lock nesting introduced.
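
One plausible shape for the helper and the single-hold dup copy, assuming a pthread mutex for fd_lock. The `fd_publish_linux_flags(int gfd, uint32_t flags)` signature and `fd_copy_linux_flags` are hypothetical; only the helper's name and its take-fd_lock-around-one-write behavior come from the commit message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_FDS 64

struct fd_entry { uint32_t linux_flags; };

static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fd_entry fd_table[MAX_FDS];

/* Publish a creator's flags with exactly one locked write. */
void fd_publish_linux_flags(int gfd, uint32_t flags)
{
    pthread_mutex_lock(&fd_lock);
    fd_table[gfd].linux_flags = flags;
    pthread_mutex_unlock(&fd_lock);
}

/* Dup-style copy: read the source and write the destination in one
 * fd_lock hold, so an F_SETFL on either fd cannot land in between. */
void fd_copy_linux_flags(int src, int dst, uint32_t preserved_mask)
{
    pthread_mutex_lock(&fd_lock);
    fd_table[dst].linux_flags = fd_table[src].linux_flags & preserved_mask;
    pthread_mutex_unlock(&fd_lock);
}
```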

sys_fcntl snapshot-then-revalidate
  sys_fcntl now takes a single fd_snapshot at entry and uses it for
  F_GETFD, F_GETFL, and F_DUPFD source reads. F_SETFD and the FUSE /
  timerfd F_SETFL writers reacquire fd_lock and revalidate against
  fd_snap.generation before mutating linux_flags. fd_alloc bumps a
  monotonic generation counter per slot reuse, so close+reopen between
  snapshot and lock is caught and returns EBADF rather than mutating an
  unrelated fd.
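
The generation check can be sketched with a per-slot counter bumped on every allocation. Everything here (`fd_alloc_slot`, `fd_snapshot_take`, `fd_setflags_revalidated`, the slot layout) is a hypothetical simplification that assumes every reopen of a slot goes through the allocator; it shows why a close+reopen between snapshot and write yields EBADF instead of touching the new fd.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_FDS 64

enum fd_type { FD_FREE, FD_TIMERFD, FD_OTHER };
struct fd_entry {
    enum fd_type type;
    uint32_t linux_flags;
    uint64_t generation; /* bumped on every slot (re)allocation */
};
struct fd_snapshot {
    enum fd_type type;
    uint32_t linux_flags;
    uint64_t generation;
};

static pthread_mutex_t fd_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fd_entry fd_table[MAX_FDS];

void fd_alloc_slot(int fd, enum fd_type type, uint32_t flags)
{
    pthread_mutex_lock(&fd_lock);
    fd_table[fd].type = type;
    fd_table[fd].linux_flags = flags;
    fd_table[fd].generation++;
    pthread_mutex_unlock(&fd_lock);
}

struct fd_snapshot fd_snapshot_take(int fd)
{
    pthread_mutex_lock(&fd_lock);
    struct fd_snapshot s = { fd_table[fd].type, fd_table[fd].linux_flags,
                             fd_table[fd].generation };
    pthread_mutex_unlock(&fd_lock);
    return s;
}

/* Mutate linux_flags only if the slot still holds the fd the snapshot saw. */
int fd_setflags_revalidated(int fd, const struct fd_snapshot *snap,
                            uint32_t keep_mask, uint32_t set_bits)
{
    int rc = 0;
    pthread_mutex_lock(&fd_lock);
    if (fd_table[fd].type == FD_FREE ||
        fd_table[fd].generation != snap->generation)
        rc = -EBADF;
    else
        fd_table[fd].linux_flags =
            (fd_table[fd].linux_flags & keep_mask) | set_bits;
    pthread_mutex_unlock(&fd_lock);
    return rc;
}
```

Because the type is re-read under the same lock, a type-specific check such as the timerfd O_DIRECT rejection can sit inside that critical section and see the current slot.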

The timerfd F_SETFL O_DIRECT EINVAL check moves inside the lock so a
stale-snapshot race cannot report EINVAL based on an fd that is no
longer a timerfd; the revalidation returns EBADF first instead.

A new test exercises the cross-cutting fd_lock RMW: F_SETFL stamps the
writable status bits, then F_SETFD toggles CLOEXEC, and F_GETFL must
still surface the status bits unperturbed.
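
The same scenario can be checked on a real kernel with a short program (hypothetical name `check_setfd_keeps_status_bits`). F_SETFD touches only the per-descriptor FD_CLOEXEC flag, so the status bits set by F_SETFL must survive it; under elfuse this exercises the read-modify-write of the shared linux_flags word.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Returns 0 if toggling FD_CLOEXEC leaves the F_SETFL status bits intact. */
int check_setfd_keeps_status_bits(void)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return 1;
    int rc = 0;
    if (fcntl(fd, F_SETFL, O_APPEND | O_NONBLOCK) != 0)
        rc = 2;
    else if (fcntl(fd, F_SETFD, FD_CLOEXEC) != 0)
        rc = 3;
    else if (!(fcntl(fd, F_GETFD) & FD_CLOEXEC))
        rc = 4;
    else if ((fcntl(fd, F_GETFL) & (O_APPEND | O_NONBLOCK)) != (O_APPEND | O_NONBLOCK))
        rc = 5;
    close(fd);
    return rc;
}
```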

jserv changed the title from "Honor CLOCK_BOOTTIME and TFD_NONBLOCK" to "Honor CLOCK_BOOTTIME and TFD_NONBLOCK in timerfd" on Jun 6, 2026

jserv commented Jun 6, 2026

@doanbaotrung, please confirm whether this PR helps.

@doanbaotrung

Dear @jserv,

I've just built the code from this branch and tried running the application. The timerfd issue is gone; it works now.

Thanks,
Trung



Development

Successfully merging this pull request may close these issues.

timerfd_create crashed
