ls-alpha

A reimplementation of ls designed to answer one specific question:

Why is ls -l so much slower than ls on a directory with a million files, and can we make it faster?

The blog post that spawned this code is The Layer Below: Inodes on nazquadri.dev. Read it first if you want the full story — this repo is the code that produced the numbers in that post.

What it does

ls-alpha implements three different stat strategies and lets you switch between them with a CLI flag:

--mode=classic — the boring single-threaded version. readdir every entry, then stat each one sequentially. This is roughly what GNU ls -l does.
--mode=par — the same work, but with crossbeam scoped threads doing the stat calls in parallel. One batch per CPU core.
--mode=iou — io_uring with batched statx submissions, because I wanted to know if the fancy async I/O interface could beat the classic approach on warm cache. Spoiler in the blog post: it couldn't, but the reason why is interesting.
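
The classic and par strategies above can be sketched with the standard library alone. This is a minimal illustration, not the code in this repo: it swaps crossbeam's scoped threads for std::thread::scope so it has no dependencies, and the function names (list_paths, stat_sequential, stat_parallel) are made up for the example. It returns only file sizes, where a real ls -l needs the full metadata.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Readdir-only pass: collect entry paths without calling stat.
fn list_paths(dir: &Path) -> io::Result<Vec<PathBuf>> {
    fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect()
}

/// Classic strategy: one lstat per entry, sequentially. Returns file sizes.
fn stat_sequential(paths: &[PathBuf]) -> Vec<io::Result<u64>> {
    paths
        .iter()
        .map(|p| fs::symlink_metadata(p).map(|m| m.len()))
        .collect()
}

/// Parallel strategy: one batch per core, each batch stat'ed on its own
/// scoped thread. Uses std::thread::scope as a stand-in for crossbeam.
fn stat_parallel(paths: &[PathBuf]) -> Vec<io::Result<u64>> {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Ceiling division so every path lands in a batch; chunks() needs a size >= 1.
    let batch = ((paths.len() + cores - 1) / cores).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = paths
            .chunks(batch)
            .map(|chunk| s.spawn(move || stat_sequential(chunk)))
            .collect();
        // Joining in spawn order keeps results aligned with the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("stat worker panicked"))
            .collect()
    })
}
```

Calling stat_parallel(&list_paths(dir)?) returns one result per entry, in the same order as the input, because the workers are joined in the order they were spawned.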

It also implements a -f flag (readdir only, no stat) as a control, and a uid/gid name cache, because once I started profiling, getpwuid() turned out to be a bigger bottleneck than stat itself.

The results (100,000 files, warm cache, NVMe ext4)

| Mode | `stat` phase | Sort | Total | vs `classic` |
| --- | --- | --- | --- | --- |
| `classic` | 84.7 ms | 6.2 ms | 119.0 ms | 1.0× |
| `par` (crossbeam) | 36.7 ms | 5.8 ms | 72.5 ms | 1.6× faster |
| `iou` (io_uring) | 92.0 ms | 6.7 ms | 126.8 ms | 0.94× (slightly slower) |
| `-f` (readdir only, no stat) | n/a | n/a | 31.8 ms | n/a (baseline floor) |

And for comparison, the system ls:

| Command | Total |
| --- | --- |
| `ls -l` | 266 ms |
| `ls` (no flags) | 74 ms |
| `ls -U` (no sort, no stat) | 55 ms |

Two things to notice:

par beats GNU ls -l by ~3.7× on warm cache. The parallelism is real, crossbeam scoped threads are the right tool, and there's a significant speedup sitting on the floor that GNU ls doesn't pick up because ls is designed to be portable to systems without native threading primitives.
iou is slower than classic on warm cache. This surprised me the first time. The reason is that on warm cache, every stat hits the kernel's inode cache and returns in microseconds. The overhead of building io_uring submission queue entries, submitting them, and reaping completion queue entries is larger than the syscall overhead it's supposed to eliminate. io_uring shines when there's real device latency to overlap — on a cold inode cache with a spinning disk, it's a completely different story. On warm cache on NVMe, the ring management costs more than the syscall.

This is the payoff of the blog post. The obvious optimisation (parallelise the stat calls) worked. The shiny modern optimisation (use io_uring) didn't — and understanding why tells you something real about how to pick the right tool.

The biggest surprise wasn't about `stat` at all

Before I added a uid/gid name cache, ls-alpha -l took 715 ms — nearly 3× slower than GNU ls. The stat phase was 91 ms. The other 624 ms? getpwuid(). Every call to users::get_user_by_uid() does an NSS lookup — reading /etc/passwd or hitting LDAP or SSSD. Called 100,000 times, that's the real bottleneck. One HashMap<u32, String> cache brought the total from 715 ms to 130 ms.
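
A cache along these lines is enough to collapse 100,000 lookups into one per distinct uid. This is a sketch under assumptions, not the repo's code: NameCache and its lookup closure are hypothetical, and the closure stands in for users::get_user_by_uid() so the example needs no external crate.

```rust
use std::collections::HashMap;

/// Memoises uid -> user name so the expensive lookup runs once per distinct uid.
/// `lookup` is a placeholder for the real NSS-backed call.
struct NameCache<F: FnMut(u32) -> Option<String>> {
    lookup: F,
    names: HashMap<u32, String>,
}

impl<F: FnMut(u32) -> Option<String>> NameCache<F> {
    fn new(lookup: F) -> Self {
        Self { lookup, names: HashMap::new() }
    }

    /// Returns the cached name, resolving it the first time a uid is seen.
    /// Unknown uids fall back to the numeric id, as ls -l does.
    fn name(&mut self, uid: u32) -> &str {
        let lookup = &mut self.lookup;
        self.names
            .entry(uid)
            .or_insert_with(|| lookup(uid).unwrap_or_else(|| uid.to_string()))
            .as_str()
    }
}
```

In a directory where every file belongs to the same owner, the lookup closure runs once no matter how many entries are listed; every later call is a hash-map hit.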

The obvious optimisation (parallel stat) gave about 1.6× end to end (2.3× on the stat phase alone). The non-obvious one (cache uid→name) gave 5.5×. Profile before you parallelise.

Building

```sh
cargo build --release
```

Rust 2024 edition (so Rust 1.85 or newer), no unusual dependencies. The io-uring crate needs Linux: kernel 5.1+ for basic io_uring support, and 5.6+ for the statx opcode that --mode=iou relies on. Other modes work on any Unix.

Running

```sh
# Plain usage: defaults to --mode=classic -l
./target/release/ls-alpha /some/directory

# Force a specific strategy
./target/release/ls-alpha /some/directory -l --mode=par
./target/release/ls-alpha /some/directory -l --mode=iou

# readdir-only (no stat calls; the baseline floor)
./target/release/ls-alpha /some/directory -f

# Show phase timings (what the benchmark table uses)
./target/release/ls-alpha /some/directory -l --mode=par --timing
```
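
Phase timings like those in the results table can be collected by wrapping each phase in a wall-clock timer. The sketch below is one way to do it with std::time::Instant; timed and fmt_ms are hypothetical helpers, and the actual --timing output format may differ.

```rust
use std::time::{Duration, Instant};

/// Runs `phase` and returns its result together with the wall-clock time it took.
fn timed<T>(phase: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = phase();
    (out, start.elapsed())
}

/// Formats a duration as milliseconds with one decimal, matching the table's units.
fn fmt_ms(d: Duration) -> String {
    format!("{:.1} ms", d.as_secs_f64() * 1000.0)
}
```

Each phase (readdir, stat, sort) would be passed to timed as a closure, and the durations summed for the total.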

Reproducing the benchmark

```sh
cargo build --release
./bench.sh
```

bench.sh creates 100,000 files in /tmp/inode-bench, runs each mode on warm cache, compares to the system ls, and optionally runs cold-cache variants if passwordless sudo is available for drop_caches. It cleans up after itself.

Override the file count with COUNT=500000 ./bench.sh.
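
For readers who would rather script the setup step in Rust, here is a rough equivalent of what bench.sh is described as doing: populate a directory with empty files, honouring a COUNT override, then remove it. The function names and the f0, f1, ... file naming are assumptions made for this sketch; bench.sh itself may name its files differently.

```rust
use std::env;
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// File count for the run: the COUNT environment variable if it parses,
/// otherwise the default of 100,000.
fn file_count() -> usize {
    env::var("COUNT")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(100_000)
}

/// Creates `dir` if needed and fills it with `count` empty files (f0, f1, ...).
fn populate(dir: &Path, count: usize) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    for i in 0..count {
        File::create(dir.join(format!("f{i}")))?;
    }
    Ok(())
}

/// Removes the benchmark directory and everything in it.
fn cleanup(dir: &Path) -> io::Result<()> {
    fs::remove_dir_all(dir)
}
```

A run would call populate(dir, file_count()), execute the measurements, and finish with cleanup(dir).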

A note on uutils coreutils

If you're on a recent Ubuntu, your system ls might be uutils coreutils (the Rust rewrite), not GNU coreutils. They mostly match, but ls -f — the GNU way to say "no sort, no stat" — is not implemented in uutils. Use ls -U instead, which works on both. bench.sh uses -U for this reason.

Why the name

"ls-alpha" started life as an alphabetic-sort-order ls variant and then grew into a benchmarking tool. The name stuck because I couldn't think of a better one. If you're looking for a production-ready modern ls replacement, try eza or lsd — this repo is a research companion, not a daily driver.

License

MIT — see LICENSE.
