RingCore is a minimal, zero-abstraction async runtime built directly on Linux’s io_uring.
Add this to your `Cargo.toml`:

```toml
[dependencies]
ringcore = "0.1"
```

While mainstream runtimes offer "black-box" convenience, RingCore provides transparency and surgical control. It is a "white-box" implementation designed to expose exactly how Rust's Future model maps to real kernel operations through:
- Surgical Task Scheduling: A minimalist executor that reveals the raw mechanics of the Rust Waker system.
- Direct-to-Kernel Mapping: Every operation maps directly to an `io_uring` Submission Queue Entry (SQE) with no intermediate overhead.
- Batch Submission: Harnesses `io_uring` to group multiple I/O requests into a single system call, nearly eliminating context-switching costs.
RingCore prioritizes insight over abstraction, making it the definitive reference for understanding how async/await actually works under the hood.
The project is structured into four distinct layers:
The lowest layer handles the "raw" interaction with the Linux kernel.
- Syscalls: Manually invokes `SYS_IO_URING_SETUP` and `SYS_IO_URING_ENTER` via `libc`.
- Memory Mapping: Uses `mmap` to map the kernel's Submission Queue (SQ) and Completion Queue (CQ) directly into the process's address space.
- Synchronization: Uses `std::sync::atomic` to manage the head and tail pointers of the rings, ensuring thread-safe (or in this case, single-threaded but re-entrant) access to the shared kernel memory (sketched below).
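A minimal sketch of this layer's plumbing, assuming only the `libc` crate. The flag constant is redefined locally because `libc` does not export io_uring definitions, and the function names are illustrative rather than RingCore's exact code:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// io_uring_enter(2) flag: block until at least `min_complete` CQEs exist.
const IORING_ENTER_GETEVENTS: u32 = 1 << 0;

/// Thin wrapper over the raw io_uring_enter(2) syscall: hand `to_submit`
/// queued SQEs to the kernel and optionally wait for completions.
unsafe fn io_uring_enter(ring_fd: i32, to_submit: u32, min_complete: u32, flags: u32) -> i64 {
    libc::syscall(
        libc::SYS_io_uring_enter,
        ring_fd,
        to_submit,
        min_complete,
        flags,
        std::ptr::null::<libc::sigset_t>(), // no signal mask
        0usize,                             // size of that (absent) mask
    ) as i64
}

/// Publish one SQE index into the mmap'ed SQ ring. The Release store on the
/// tail guarantees the kernel observes the SQE contents before the new tail.
unsafe fn publish_sqe(sq_tail: &AtomicU32, ring_mask: u32, sq_array: *mut u32, sqe_index: u32) {
    let tail = sq_tail.load(Ordering::Relaxed);
    *sq_array.add((tail & ring_mask) as usize) = sqe_index;
    sq_tail.store(tail.wrapping_add(1), Ordering::Release);
}
```

Calling `io_uring_enter` with `min_complete = 1` and `IORING_ENTER_GETEVENTS` is how the executor described below puts the thread to sleep until the kernel reports a completion.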
This layer translates io_uring operations into Rust Futures.
- Lazy Submission: An `Op` future only writes its Submission Queue Entry (SQE) to the ring when it is first polled (sketched below).
- Waker Management: Each operation is assigned a unique `user_data` ID. Before returning `Poll::Pending`, the future stores its `Waker` in a global map.
- Completion: When the executor finds a Completion Queue Entry (CQE) with a matching ID, it retrieves the result and triggers the `Waker`.
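A hedged sketch of that lifecycle. The `Op` struct, the commented-out `write_sqe` step, and the thread-local maps are illustrative stand-ins for RingCore's internals (the text above calls the waker store a global map; a thread-local one plays the same role here):

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

thread_local! {
    // Wakers parked by pending operations, keyed by their user_data ID.
    static WAKERS: RefCell<HashMap<u64, Waker>> = RefCell::new(HashMap::new());
    // CQE results copied here by the executor, keyed by the same ID.
    static RESULTS: RefCell<HashMap<u64, i32>> = RefCell::new(HashMap::new());
}

struct Op {
    user_data: u64, // unique ID stamped into the SQE and echoed by the CQE
    submitted: bool,
}

impl Future for Op {
    type Output = i32; // the CQE `res` field (bytes transferred or -errno)

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        // Lazy submission: the SQE is written to the ring only on first poll.
        if !self.submitted {
            // write_sqe(self.user_data);   // build + enqueue the SQE (omitted)
            self.submitted = true;
        }
        // If the executor has already harvested our CQE, we are done.
        if let Some(res) = RESULTS.with(|r| r.borrow_mut().remove(&self.user_data)) {
            return Poll::Ready(res);
        }
        // Otherwise park our Waker so the executor can wake us on completion.
        WAKERS.with(|w| w.borrow_mut().insert(self.user_data, cx.waker().clone()));
        Poll::Pending
    }
}
```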
The "heart" of the runtime.
- Task Queue: A `VecDeque` of tasks ready to be polled.
- Run Loop (sketched in simplified form below):
  - Polls all ready tasks.
  - Enters the ring to submit pending SQEs and harvest CQEs.
  - Wakes tasks associated with finished CQEs.
  - Blocks on `io_uring_enter` (with `min_complete = 1`) if there is no work to do, effectively putting the thread to sleep until the kernel notifies it of a completion.
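To make those Waker mechanics concrete, here is a self-contained toy executor in the same spirit: a `VecDeque` ready queue driven by `std::task::Wake`. It deliberately omits the ring interaction (submitting SQEs, harvesting CQEs, and blocking in `io_uring_enter`), which the real run loop performs whenever the ready queue drains:

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Wake, Waker};

type BoxedFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
type ReadyQueue = Arc<Mutex<VecDeque<Arc<Task>>>>;

struct Task {
    future: Mutex<Option<BoxedFuture>>,
}

/// Waking a task simply pushes it back onto the ready queue.
struct TaskWaker {
    task: Arc<Task>,
    ready: ReadyQueue,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.ready.lock().unwrap().push_back(self.task.clone());
    }
}

fn block_on(fut: impl Future<Output = ()> + Send + 'static) {
    let ready: ReadyQueue = Arc::new(Mutex::new(VecDeque::new()));
    let boxed: BoxedFuture = Box::pin(fut);
    let task = Arc::new(Task { future: Mutex::new(Some(boxed)) });
    ready.lock().unwrap().push_back(task);

    loop {
        // Pop the next ready task (the queue lock is released right away).
        let next = ready.lock().unwrap().pop_front();
        let Some(task) = next else { break };

        let mut slot = task.future.lock().unwrap();
        if let Some(mut fut) = slot.take() {
            // Build a Waker that re-queues this exact task when invoked.
            let waker: Waker =
                Arc::new(TaskWaker { task: task.clone(), ready: ready.clone() }).into();
            let mut cx = Context::from_waker(&waker);
            if fut.as_mut().poll(&mut cx).is_pending() {
                *slot = Some(fut); // keep the future alive until its Waker fires
            }
        }
    }
}

fn main() {
    block_on(async { println!("driven to completion by the toy executor") });
}
```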
The top layer provides high-level, idiomatic wrappers for system resources.
- Thread-Local Storage: Uses a `thread_local!` macro to provide access to the `IoUring` instance, allowing `TcpListener` and `TcpStream` to submit operations without carrying around references or handles (see the sketch below).
- Async API: Provides `async fn` methods for `accept`, `read`, and `write` that feel like standard Rust networking but use the underlying `io_uring` ops.
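A sketch of that pattern with placeholder types; `IoUring`, `ReadOp`, and the commented `push_read_sqe` call are illustrative, not RingCore's real API. The point is that `TcpStream::read` can reach the ring through thread-local storage instead of threading a handle through every call:

```rust
use std::cell::RefCell;
use std::io;

struct IoUring;     // stand-in for the real ring wrapper
struct ReadOp;      // stand-in for the reactor's read future

impl std::future::Future for ReadOp {
    type Output = io::Result<usize>;
    fn poll(
        self: std::pin::Pin<&mut Self>,
        _cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Self::Output> {
        std::task::Poll::Ready(Ok(0)) // stub: the real future parks a Waker until its CQE lands
    }
}

thread_local! {
    // One ring per thread; wrappers borrow it on demand instead of holding handles.
    static RING: RefCell<IoUring> = RefCell::new(IoUring);
}

struct TcpStream {
    fd: i32,
}

impl TcpStream {
    /// Feels like std networking, but enqueues an IORING_OP_READ SQE under the hood.
    async fn read(&self, _buf: &mut [u8]) -> io::Result<usize> {
        RING.with(|ring| {
            let _ring = ring.borrow_mut();
            // _ring.push_read_sqe(self.fd, _buf);   // build the SQE (omitted)
        });
        ReadOp.await
    }
}
```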
- OS: Linux 5.10+ (for stable `IORING_OP_ACCEPT` support).
- Architecture: x86_64.
- Dependencies: `libc` and `std` only.
In a standard threaded model, every connection incurs the overhead of a thread stack and context switching. More importantly, every read and write is a separate system call that triggers a user-to-kernel mode switch.
With io_uring:
- System Call Elision: Multiple operations (SQEs) can be submitted with a single `io_uring_enter` call.
- Zero-Copy (Potential): While this minimal runtime uses plain buffers, `io_uring` supports registered buffers for even higher performance.
- Single-Threaded Efficiency: We handle thousands of connections on a single thread without the overhead of thread management.
We compared RingCore against the Rust Standard Library and Tokio on a Debian 13 (Kernel 6.12) system.
Sequential read and write performance.
| Runtime | Real Time | System Time |
|---|---|---|
| Standard (`std::fs`) | 0.057s | 0.016s |
| Tokio (epoll + thread pool) | 0.461s | 0.376s |
| RingCore (`io_uring`) | 0.088s | 0.036s |
Note: RingCore significantly outperforms Tokio on file I/O because it uses true asynchronous kernel operations instead of a blocking thread pool.
Total time for sequential and high-concurrency workloads.
| Test Case | Std (Threaded) | Tokio (Epoll) | RingCore (io_uring) |
|---|---|---|---|
| 100 Seq Requests | 12.8ms | 14.9ms | 7.5ms |
| 1000 Stress Requests | 48.3ms | 1.08s | 67.9ms |
RingCore's single-threaded architecture excels at high-concurrency without the task-scheduling overhead seen in multi-threaded runtimes.
Using `IOSQE_IO_LINK`, RingCore can chain dependent operations (like Read -> Write) so the kernel executes them back-to-back without returning to userspace.
- Transparency: Shows how to eliminate userspace "ping-pong" for dependent I/O tasks.
- Efficiency: Batches multiple SQEs into a single `io_uring_enter` call (a minimal sketch follows).
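A hedged sketch of what linking looks like at the SQE level. The `Sqe` struct is heavily simplified and the constants are redefined here (their values match the kernel's uapi headers); building the ring and submitting the entries is omitted:

```rust
// Kernel opcode and flag values from <linux/io_uring.h>.
const IORING_OP_READ: u8 = 22;
const IORING_OP_WRITE: u8 = 23;
const IOSQE_IO_LINK: u8 = 1 << 2; // "don't start the next SQE until this one finishes"

/// A drastically trimmed stand-in for io_uring_sqe.
struct Sqe {
    opcode: u8,
    flags: u8,
    fd: i32,
    addr: u64,
    len: u32,
    user_data: u64,
}

/// Queue a read followed by a write; the link flag makes the kernel run them
/// back-to-back, with no return to userspace between the two operations.
fn queue_linked_read_write(queue: &mut Vec<Sqe>, in_fd: i32, out_fd: i32, buf: &mut [u8]) {
    queue.push(Sqe {
        opcode: IORING_OP_READ,
        flags: IOSQE_IO_LINK, // chain to the SQE pushed next
        fd: in_fd,
        addr: buf.as_mut_ptr() as u64,
        len: buf.len() as u32,
        user_data: 1,
    });
    queue.push(Sqe {
        opcode: IORING_OP_WRITE,
        flags: 0,
        fd: out_fd,
        addr: buf.as_ptr() as u64,
        len: buf.len() as u32, // fixed up-front: a linked write cannot see the read's result
        user_data: 2,
    });
}
```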
Run any example using `cargo run --example <name>`.
- Echo Server: `cargo run --example echo` - Chained Accept -> Read -> Write.
- Async Cat: `cargo run --example cat -- <file>` - File I/O in isolation.
- Async Timer: `cargo run --example timer` - Non-I/O task parking and waking.
- Ping Pong: `cargo run --example ping_pong` - Task synchronization over a `socketpair`.
- Concurrent Reads: `cargo run --example concurrent_downloads` - Submitting 100 SQEs simultaneously.
- Tee Utility: `cargo run --example tee -- <file1> <file2>` - Fan-out writes to multiple files.
- Timeout Race: `cargo run --example timeout_race` - Demonstrates operation cancellation (`IORING_OP_ASYNC_CANCEL`).
- HTTP Server: `cargo run --example http_server` - High-concurrency "Hello World" benchmark.
- File Server: `cargo run --example file_server` - Serving static files over TCP.
- Logger: `cargo run --example logger` - Batch writes using Scatter-Gather (`writev`).
- SQPOLL: `sudo cargo run --example sqpoll` - Kernel-side SQ polling (requires `sudo` for `CAP_SYS_ADMIN`).
- Linked Cat: `cargo run --example linked_cat -- <file>` - Chaining Read + Write operations at the kernel level via `IOSQE_IO_LINK`.
- Multishot Accept: `cargo run --example multishot_accept` - One SQE generating infinite connection CQEs.
Every `io_uring_sqe` field and `io_uring_cqe` field is documented in `src/sys.rs` to explain how the kernel interprets the submission and reports results.
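For orientation, here is a simplified sketch of the two structures. Field names follow the kernel's uapi definitions, but the real `io_uring_sqe` folds several of these into unions and carries trailing fields that `src/sys.rs` spells out in full:

```rust
/// Simplified view of a Submission Queue Entry.
#[repr(C)]
struct IoUringSqe {
    opcode: u8,      // which IORING_OP_* the kernel should perform
    flags: u8,       // IOSQE_* modifiers (e.g. IOSQE_IO_LINK)
    ioprio: u16,     // request priority (op-specific meaning for some opcodes)
    fd: i32,         // file descriptor the operation targets
    off: u64,        // file offset (a union with other per-op values in the kernel)
    addr: u64,       // buffer or iovec address (also a union)
    len: u32,        // buffer length or iovec count
    op_flags: u32,   // per-opcode flags: rw_flags, accept_flags, ... (a union)
    user_data: u64,  // opaque ID copied verbatim into the matching CQE
    // buf_index, personality, splice_fd_in, and padding omitted here
}

/// Completion Queue Entry: the kernel writes one per finished SQE.
#[repr(C)]
struct IoUringCqe {
    user_data: u64,  // echoes the SQE's user_data so results map back to futures
    res: i32,        // syscall-style result: bytes transferred or a negative errno
    flags: u32,      // IORING_CQE_F_* flags (e.g. "more CQEs coming" for multishot)
}
```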