ipc/mqueue: add fcntl(F_MQ_PEEK) for non-destructive message inspection#911

Open
vfsci-bot[bot] wants to merge 3 commits into vfs.base.ci from pw/1072498/vfs.base.ci

vfsci-bot[bot] commented Mar 25, 2026

Series: https://patchwork.kernel.org/project/linux-fsdevel/list/?series=1072498
Submitter: Shaurya Rane
Version: 1
Patches: 3/3
Message-ID: <20260325190025.40312-1-ssrane_b23@ee.vjti.ac.in>
Base: vfs.base.ci
Lore: https://lore.kernel.org/linux-fsdevel/20260325190025.40312-1-ssrane_b23@ee.vjti.ac.in


Automated by ml2pr

Add the user-visible interface for non-destructive POSIX message queue
inspection via fcntl(2).

POSIX message queues have no way to inspect queued messages without
consuming them: mq_receive() always dequeues the message it returns.
This makes it impossible for checkpoint/restore tools such as CRIU to
save and replay message queue contents without destroying the queue
state in the process.

struct mq_peek_attr describes the request: the caller specifies an
index into the queue in receive order (0 = next message that
mq_receive() would return, i.e. highest priority, FIFO within same
priority) and a buffer to receive the payload.  On return, msg_prio is
filled with the message priority and the return value is the number of
bytes copied.

F_MQ_PEEK = F_LINUX_SPECIFIC_BASE + 17 is the new fcntl command that
accepts a pointer to struct mq_peek_attr.
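
To make the calling convention concrete, here is a minimal userspace sketch of a wrapper around the new command. The struct layout and the field names `index`, `buf`, `buf_len` and `reserved` are assumptions made for illustration; the description above only names `msg_prio`. `mq_peek_attr_sketch` and `mq_peek_sketch()` are hypothetical names, not part of the series. The sketch also relies on `mqd_t` being a plain file descriptor, which holds on Linux but is not required by POSIX.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

/* Value used by the Linux UAPI headers; glibc's <fcntl.h> does not export the name. */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE 1024
#endif

/* Taken from the patch description; absent from released kernel headers. */
#ifndef F_MQ_PEEK
#define F_MQ_PEEK (F_LINUX_SPECIFIC_BASE + 17)
#endif

/*
 * ASSUMED layout, for illustration only. The series defines the real
 * struct mq_peek_attr; only msg_prio is named in the description.
 */
struct mq_peek_attr_sketch {
    uint64_t index;     /* 0 = message mq_receive() would return next */
    uint64_t buf;       /* userspace buffer address */
    uint64_t buf_len;   /* size of that buffer in bytes */
    uint32_t msg_prio;  /* filled in by the kernel on success */
    uint32_t reserved;  /* hypothetical padding field */
};

/*
 * Returns the number of bytes copied, or -1 with errno set.
 * On kernels without the series, a valid descriptor is expected to fail
 * with EINVAL because the fcntl command is unknown.
 */
static long mq_peek_sketch(int mqd, uint64_t index, void *buf, size_t len,
                           unsigned int *prio)
{
    struct mq_peek_attr_sketch attr = {
        .index = index,
        .buf = (uint64_t)(uintptr_t)buf,
        .buf_len = len,
    };
    int ret = fcntl(mqd, F_MQ_PEEK, &attr);

    if (ret >= 0 && prio)
        *prio = attr.msg_prio;
    return ret;
}
```

A caller would pass the descriptor returned by mq_open() and increase the index from 0 until the call fails. The errno reported for an out-of-range index is not stated in the description, so callers should not assume a particular value.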

Link: checkpoint-restore/criu#2285
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>

struct msg_msgseg and the DATALEN_MSG / DATALEN_SEG macros are
currently private to ipc/msgutil.c.  struct msg_msg (already in the
public kernel header include/linux/msg.h) carries a pointer to
struct msg_msgseg, which is therefore an incomplete type for all
code outside msgutil.c.

Move the definition of struct msg_msgseg and the two DATALEN macros to
include/linux/msg.h so that other IPC code can safely copy
multi-segment message payloads into a kernel buffer under a spinlock,
without calling store_msg() which performs copy_to_user() and therefore
cannot be used under a spinlock.

ipc/msgutil.c already includes <linux/msg.h>, so it picks up the
definitions from the header with no functional change.
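
As an illustration of why the segment definition must be visible to other IPC code, the following userspace model copies a multi-segment payload into one flat buffer using only memcpy(). It is a sketch, not the kernel code: the `toy_` types and helpers are hypothetical, the chunk sizes are tiny constants rather than values derived from PAGE_SIZE, and truncating to the destination size is a choice made for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy chunk sizes; the kernel derives DATALEN_MSG / DATALEN_SEG from PAGE_SIZE. */
#define TOY_DATALEN_MSG 8
#define TOY_DATALEN_SEG 8

struct toy_msgseg {
    struct toy_msgseg *next;
    /* payload bytes follow the header immediately */
};

struct toy_msg {
    size_t ts;                /* total payload size */
    struct toy_msgseg *next;  /* first overflow segment, or NULL */
    /* first payload chunk follows the header immediately */
};

static void toy_free(struct toy_msg *msg)
{
    struct toy_msgseg *seg = msg ? msg->next : NULL;

    while (seg) {
        struct toy_msgseg *tmp = seg->next;
        free(seg);
        seg = tmp;
    }
    free(msg);
}

/* Split a flat payload into a head chunk plus a chain of segments. */
static struct toy_msg *toy_load(const void *src, size_t len)
{
    const char *p = src;
    size_t chunk = len < TOY_DATALEN_MSG ? len : TOY_DATALEN_MSG;
    struct toy_msg *msg = malloc(sizeof(*msg) + chunk);
    struct toy_msgseg **tail;

    if (!msg)
        return NULL;
    msg->ts = len;
    msg->next = NULL;
    memcpy(msg + 1, p, chunk);
    p += chunk;
    len -= chunk;
    tail = &msg->next;

    while (len > 0) {
        struct toy_msgseg *seg;

        chunk = len < TOY_DATALEN_SEG ? len : TOY_DATALEN_SEG;
        seg = malloc(sizeof(*seg) + chunk);
        if (!seg) {
            toy_free(msg);
            return NULL;
        }
        seg->next = NULL;
        memcpy(seg + 1, p, chunk);
        *tail = seg;
        tail = &seg->next;
        p += chunk;
        len -= chunk;
    }
    return msg;
}

/* Walk the chain and copy up to dst_len bytes; returns bytes copied. */
static size_t toy_flatten(const struct toy_msg *msg, void *dst, size_t dst_len)
{
    char *out = dst;
    size_t left = msg->ts < dst_len ? msg->ts : dst_len;
    size_t total = left;
    size_t chunk = left < TOY_DATALEN_MSG ? left : TOY_DATALEN_MSG;
    const struct toy_msgseg *seg;

    memcpy(out, msg + 1, chunk);
    out += chunk;
    left -= chunk;
    for (seg = msg->next; seg && left > 0; seg = seg->next) {
        chunk = left < TOY_DATALEN_SEG ? left : TOY_DATALEN_SEG;
        memcpy(out, seg + 1, chunk);
        out += chunk;
        left -= chunk;
    }
    return total;
}
```

Because toy_flatten() touches only memory that is already resident and never calls into a user-copy routine, the same shape of loop can run while a spinlock is held, which is the property the commit message relies on.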

Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>

…spection

Add support for F_MQ_PEEK, a new fcntl command that reads a POSIX
message queue message by index without removing it from the queue.

Background:
CRIU (Checkpoint/Restore In Userspace) supports live container migration
and process checkpoint/restore.  POSIX message queues are a widely used
IPC mechanism, but CRIU cannot checkpoint processes that hold open mqueue
file descriptors: there is no kernel interface to inspect queued messages
non-destructively.  The SysV IPC analogue (MSG_COPY for msgrcv) was
introduced specifically for CRIU in commit 4a674f3 ("ipc: introduce
message queue copy feature").  This patch provides the equivalent for
POSIX mqueues.

Implementation:
The queue stores messages in a red-black tree (info->msg_tree) keyed
by priority, with each tree node holding a FIFO list of messages at
that priority level.  mq_peek_at_offset() walks this structure in
receive order (highest priority first, FIFO within priority) to locate
the message at the requested index without modifying any state.
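
The receive-order walk can be sketched in userspace as follows. This is a simplified model under stated assumptions: a singly linked list of priority levels, sorted from highest to lowest, stands in for the reverse in-order traversal of the red-black tree, and all `toy_` names are hypothetical. It is not the body of mq_peek_at_offset().

```c
#include <stddef.h>

/* One queued message; 'next' points at the message sent after it. */
struct toy_peek_msg {
    struct toy_peek_msg *next;
    const char *data;
};

/* One priority level; 'lower' stands in for stepping to the previous tree node. */
struct toy_prio_node {
    struct toy_prio_node *lower;
    unsigned int priority;
    struct toy_peek_msg *fifo_head;  /* oldest message at this priority */
};

/*
 * Return the message at 'index' in receive order (highest priority first,
 * FIFO within a priority), or NULL if the queue holds fewer messages.
 * Nothing is unlinked or modified, so the walk is non-destructive.
 */
static const struct toy_peek_msg *
toy_peek_at(const struct toy_prio_node *highest, size_t index,
            unsigned int *prio)
{
    const struct toy_prio_node *node;
    const struct toy_peek_msg *msg;

    for (node = highest; node; node = node->lower) {
        for (msg = node->fifo_head; msg; msg = msg->next) {
            if (index == 0) {
                if (prio)
                    *prio = node->priority;
                return msg;
            }
            index--;
        }
    }
    return NULL;
}
```

With two messages at priority 9 and one at priority 1, indices 0 and 1 select the priority-9 messages in the order they were sent, index 2 selects the priority-1 message, and index 3 yields NULL.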

Message payload is copied into a kvmalloc'd kernel buffer under
info->lock using pure memcpy() (no page faults possible).  This
correctly handles multi-segment messages by walking the msg_msgseg
chain.  The lock is released before copy_to_user() transfers the
kernel buffer to userspace.

A new include/linux/mqueue.h kernel header is added to declare
do_mq_peek() for use from fs/fcntl.c, following the same pattern as
include/linux/memfd.h for memfd_fcntl().

Concurrency:
Each F_MQ_PEEK call sees a consistent snapshot because the lookup and
the copy into the kernel buffer both run under info->lock.
Between two F_MQ_PEEK calls the queue may change (messages may be sent
or received).  This is documented snapshot semantics, analogous to
/proc entries.  CRIU freezes the target process via ptrace before
dumping, so in practice the queue is stable for the entire checkpoint
sequence.

Link: checkpoint-restore/criu#2285
Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in>