A reimplementation of `ls` designed to answer one specific question:

Why is `ls -l` so much slower than `ls` on a directory with a million files, and can we make it faster?
The blog post that spawned this code is The Layer Below: Inodes on nazquadri.dev. Read it first if you want the full story — this repo is the code that produced the numbers in that post.
`ls-alpha` implements three different stat strategies and lets you switch between them with a CLI flag:

- `--mode=classic`: the boring single-threaded version. `readdir` every entry, then `stat` each one sequentially. This is roughly what GNU `ls -l` does.
- `--mode=par`: the same work, but with crossbeam scoped threads doing the `stat` calls in parallel, one batch per CPU core.
- `--mode=iou`: `io_uring` with batched `statx` submissions, because I wanted to know whether the fancy async I/O interface could beat the classic approach on warm cache. Spoiler in the blog post: it couldn't, but the reason why is interesting.
It also implements a `-f` flag (readdir only, no stat) as a control, and a uid/gid name cache, because once the post-mortem started it turned out `getpwuid()` was a bigger bottleneck than `stat` itself.
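For readers who want the shape of the simplest strategy and the `-f` control in code, here is a minimal sketch using only `std::fs`. It is not the repo's implementation: `classic` and `readdir_only` are hypothetical names, it keeps only names and sizes, and it assumes `fs::symlink_metadata` (an `lstat`) is a fair stand-in for the per-entry call `ls -l` makes.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical "classic" strategy: readdir everything first, then lstat each
/// entry one at a time on the calling thread, then sort by name.
fn classic(dir: &Path) -> io::Result<Vec<(String, u64)>> {
    // Phase 1: readdir. Only names are collected; no metadata yet.
    let names: Vec<OsString> = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.file_name()))
        .collect::<io::Result<Vec<_>>>()?;

    // Phase 2: one lstat per entry, strictly sequential (symlinks not followed).
    let mut rows = Vec::with_capacity(names.len());
    for name in names {
        let meta = fs::symlink_metadata(dir.join(&name))?;
        rows.push((name.to_string_lossy().into_owned(), meta.len()));
    }

    // Phase 3: sort by name (plain byte order, not locale collation).
    rows.sort_by(|a, b| a.0.cmp(&b.0));
    Ok(rows)
}

/// Hypothetical "-f" control: readdir only, no stat and no sort.
fn readdir_only(dir: &Path) -> io::Result<Vec<String>> {
    fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.file_name().to_string_lossy().into_owned()))
        .collect()
}

fn main() -> io::Result<()> {
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let dir = Path::new(&dir);
    for (name, size) in classic(dir)? {
        println!("{size:>10} {name}");
    }
    eprintln!("{} entries via readdir only", readdir_only(dir)?.len());
    Ok(())
}
```

The gap between these two functions is exactly the stat phase: same directory, same readdir, but one of them issues an extra syscall per entry.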
| Mode | stat phase | Sort | Total | vs classic |
|---|---|---|---|---|
| `classic` | 84.7 ms | 6.2 ms | 119.0 ms | 1.0× |
| `par` (crossbeam) | 36.7 ms | 5.8 ms | 72.5 ms | 1.6× faster |
| `iou` (io_uring) | 92.0 ms | 6.7 ms | 126.8 ms | 0.94× (slightly slower) |
| `-f` (readdir only, no stat) | n/a | n/a | 31.8 ms | n/a (baseline floor) |
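The table breaks each run into phases. The repo's `--timing` code isn't reproduced here; as a sketch under the assumption that per-phase wall-clock timing with `std::time::Instant` is enough, the hypothetical `PhaseTimes` struct and `timed` helper below time readdir, a sequential stat pass, and the sort separately. Note that in the table Total is larger than stat plus sort (119.0 ms vs 90.9 ms for `classic`), so the repo's total evidently covers more than those two phases; this sketch only sums the three it times.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical per-phase breakdown, loosely mirroring the table's columns.
#[derive(Debug, Default)]
struct PhaseTimes {
    readdir: Duration,
    stat: Duration,
    sort: Duration,
}

impl PhaseTimes {
    /// Sum of the phases this sketch measures (not output formatting).
    fn total(&self) -> Duration {
        self.readdir + self.stat + self.sort
    }
}

/// Runs `f`, adds its wall-clock time to `slot`, and returns its result.
fn timed<T>(slot: &mut Duration, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    *slot += start.elapsed();
    out
}

fn list_with_timing(dir: &Path) -> io::Result<(Vec<(String, u64)>, PhaseTimes)> {
    let mut t = PhaseTimes::default();

    let names = timed(&mut t.readdir, || -> io::Result<Vec<OsString>> {
        fs::read_dir(dir)?.map(|e| e.map(|e| e.file_name())).collect()
    })?;

    let mut rows = timed(&mut t.stat, || -> io::Result<Vec<(String, u64)>> {
        names
            .iter()
            .map(|n| -> io::Result<(String, u64)> {
                let len = fs::symlink_metadata(dir.join(n))?.len();
                Ok((n.to_string_lossy().into_owned(), len))
            })
            .collect()
    })?;

    timed(&mut t.sort, || rows.sort_by(|a, b| a.0.cmp(&b.0)));
    Ok((rows, t))
}

fn main() -> io::Result<()> {
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".into());
    let (rows, t) = list_with_timing(Path::new(&dir))?;
    println!("{} entries", rows.len());
    println!(
        "readdir {:?}  stat {:?}  sort {:?}  total {:?}",
        t.readdir, t.stat, t.sort, t.total()
    );
    Ok(())
}
```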
And for comparison, the system ls:
| Command | Total |
|---|---|
| `ls -l` | 266 ms |
| `ls` (no flags) | 74 ms |
| `ls -U` (no sort, no stat) | 55 ms |
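The speedups quoted in this README are plain ratios of total wall-clock time: baseline divided by candidate, so a value above 1.0 means the candidate is faster. A tiny sketch reproducing them from the totals in the two tables; `speedup` is a hypothetical helper, not part of `ls-alpha`.

```rust
/// Ratio of total times: > 1.0 means `candidate_ms` is faster than `baseline_ms`.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

fn main() {
    // Totals taken from the benchmark tables (warm cache).
    let classic = 119.0;
    let par = 72.5;
    let iou = 126.8;
    let system_ls_l = 266.0;

    println!("par vs classic:      {:.1}x", speedup(classic, par));
    println!("iou vs classic:      {:.2}x", speedup(classic, iou));
    println!("par vs system ls -l: {:.1}x", speedup(system_ls_l, par));
}
```

This prints 1.6x, 0.94x and 3.7x, matching the "vs classic" column and the comparison against the system `ls -l`.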
Two things to notice:

- `par` beats GNU `ls -l` by ~3.7× on warm cache. The parallelism is real, crossbeam scoped threads are the right tool, and there's a significant speedup sitting on the floor that GNU `ls` doesn't pick up because it is designed to be portable to systems without native threading primitives.
- `iou` is slower than `classic` on warm cache. This surprised me the first time. On warm cache, every `stat` hits the kernel's inode cache and returns in microseconds, so the overhead of building `io_uring` submission queue entries, submitting them, and reaping completion queue entries is larger than the syscall overhead it's supposed to eliminate. `io_uring` shines when there's real device latency to overlap; on a cold inode cache with a spinning disk, it's a completely different story. On warm cache on NVMe, the ring management costs more than the syscall.
This is the payoff of the blog post. The obvious optimisation (parallelise the stat calls) worked. The shiny modern optimisation (use io_uring) didn't — and understanding why tells you something real about how to pick the right tool.
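The `par` idea (split the stat calls into one batch per core and run the batches concurrently) can be sketched with the standard library alone. This swaps in `std::thread::scope` for the crossbeam scoped threads the repo actually uses; `par_stat` is a hypothetical name, and `thread::available_parallelism()` is assumed as the stand-in for the CPU-core count.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;
use std::thread;

/// Hypothetical parallel stat: split `names` into one contiguous batch per core
/// and lstat each batch on its own scoped thread. Sizes come back in input order.
fn par_stat(dir: &Path, names: &[OsString]) -> io::Result<Vec<u64>> {
    if names.is_empty() {
        return Ok(Vec::new());
    }
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division: every name lands in exactly one non-empty batch.
    let batch = (names.len() + cores - 1) / cores;

    // Scoped threads may borrow `dir` and `names`; all are joined before return.
    let results = thread::scope(|s| {
        let handles: Vec<_> = names
            .chunks(batch)
            .map(|chunk| {
                s.spawn(move || -> io::Result<Vec<u64>> {
                    chunk
                        .iter()
                        .map(|n| fs::symlink_metadata(dir.join(n)).map(|m| m.len()))
                        .collect()
                })
            })
            .collect();
        // Join in spawn order so the output stays aligned with `names`.
        handles
            .into_iter()
            .map(|h| h.join().expect("stat thread panicked"))
            .collect::<Vec<_>>()
    });

    let mut sizes = Vec::with_capacity(names.len());
    for part in results {
        sizes.extend(part?);
    }
    Ok(sizes)
}

fn main() -> io::Result<()> {
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".into());
    let dir = Path::new(&dir);
    let names: Vec<OsString> = fs::read_dir(dir)?
        .map(|e| e.map(|e| e.file_name()))
        .collect::<io::Result<Vec<_>>>()?;
    let sizes = par_stat(dir, &names)?;
    println!("{} entries, {} bytes total", names.len(), sizes.iter().sum::<u64>());
    Ok(())
}
```

Contiguous batches keep the merge trivial: joining the handles in spawn order and concatenating their results preserves the original entry order, so the sort phase sees the same input as in the sequential version.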
Before I added a uid/gid name cache, `ls-alpha -l` took 715 ms, nearly 3× slower than GNU `ls`. The stat phase was 91 ms. The other 624 ms? `getpwuid()`. Every call to `users::get_user_by_uid()` does an NSS lookup: reading `/etc/passwd`, or hitting LDAP or SSSD. Called 100,000 times, that's the real bottleneck. One `HashMap<u32, String>` cache brought the total from 715 ms to 130 ms.
The obvious optimisation (parallel stat) gave 2×. The non-obvious one (cache uid→name) gave 5.5×. Profile before you parallelise.
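The cache itself is small. A minimal sketch, assuming a memoizing wrapper around whatever resolver does the real NSS lookup: `NameCache` and the `resolve` closure are hypothetical (in the repo the expensive call is `users::get_user_by_uid()`), and falling back to the numeric id for unknown owners is an assumption that mirrors what `ls` prints. The same struct works unchanged for gids.

```rust
use std::collections::HashMap;

/// Hypothetical uid -> name memo. `resolve` stands in for the expensive lookup
/// (getpwuid / NSS); it runs at most once per distinct id.
struct NameCache<F: FnMut(u32) -> Option<String>> {
    names: HashMap<u32, String>,
    resolve: F,
}

impl<F: FnMut(u32) -> Option<String>> NameCache<F> {
    fn new(resolve: F) -> Self {
        Self { names: HashMap::new(), resolve }
    }

    fn name(&mut self, id: u32) -> &str {
        let resolve = &mut self.resolve;
        self.names
            .entry(id)
            // On a miss, do the slow lookup once; fall back to the number itself.
            .or_insert_with(|| resolve(id).unwrap_or_else(|| id.to_string()))
    }
}

fn main() {
    let mut lookups = 0;
    // Fake resolver for illustration: only uid 0 has a name.
    let mut cache = NameCache::new(|uid| {
        lookups += 1; // count how often the "slow" path runs
        (uid == 0).then(|| "root".to_string())
    });
    let owners = [1000, 0, 1000, 1000, 0, 4242];
    for uid in owners {
        println!("{uid:>5} -> {}", cache.name(uid));
    }
    drop(cache);
    println!("{} lookups for {} entries", lookups, owners.len());
}
```

In a directory where almost every file has the same owner, the number of slow lookups drops from one per entry to one per distinct id.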
```sh
cargo build --release
```

Rust 2024 edition, no unusual dependencies. The `io-uring` crate needs Linux: `io_uring` itself arrived in kernel 5.1, but the `statx` opcode that `--mode=iou` submits was only added in 5.6. The other modes work on any Unix.
```sh
# Plain usage: defaults to --mode=classic -l
./target/release/ls-alpha /some/directory

# Force a specific strategy
./target/release/ls-alpha /some/directory -l --mode=par
./target/release/ls-alpha /some/directory -l --mode=iou

# readdir-only (no stat calls, the baseline floor)
./target/release/ls-alpha /some/directory -f

# Show phase timings (what the benchmark table uses)
./target/release/ls-alpha /some/directory -l --mode=par --timing
```

```sh
cargo build --release
./bench.sh
```

`bench.sh` creates 100,000 files in `/tmp/inode-bench`, runs each mode on warm cache, and compares against the system `ls`. If passwordless `sudo` is available for `drop_caches`, it also runs cold-cache variants. It cleans up after itself.
Override the file count with `COUNT=500000 ./bench.sh`.
If you're on a recent Ubuntu, your system `ls` might be uutils coreutils (the Rust rewrite) rather than GNU coreutils. They mostly match, but `ls -f`, the GNU way to say "no sort, no stat", is not implemented in uutils. Use `ls -U` instead, which works on both; `bench.sh` uses `-U` for this reason.
"ls-alpha" started life as an alphabetic-sort-order ls variant and then grew into a benchmarking tool. The name stuck because I couldn't think of a better one. If you're looking for a production-ready modern ls replacement, try eza or lsd — this repo is a research companion, not a daily driver.
MIT — see LICENSE.
- The Layer Below: Inodes — the blog post this code was built for.
- io-uring-bench — the other experiment from the same series, answering the same question for file reads instead of stats.