
jamesgober/page-db

page-db
PAGING SUBSTRATE FOR STORAGE ENGINES


page-db is the paging substrate that sits beneath B-tree and heap storage engines. It owns the unglamorous layer every database has to get exactly right: fixed-size pages on disk, each with a header carrying a CRC32C integrity check and an LSN slot for write-ahead-log coordination, read and written through cross-platform Direct I/O that bypasses the OS page cache.
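The per-page integrity check can be pictured with a small std-only sketch. The header layout below (magic at bytes 0..4, checksum at 4..8, page id at 8..16, LSN at 16..24) and the magic value are hypothetical, chosen for illustration only; page-db's real 32-byte header is versioned and its field order is not given here. The bitwise CRC32C loop favours clarity over speed (the crate exposes its own `crc32c`).

```rust
/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // If the bit shifted out is 1, xor in the polynomial.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

// Hypothetical 32-byte header (NOT page-db's real format):
// [0..4) magic, [4..8) crc32c, [8..16) page id, [16..24) LSN, [24..32) reserved.
const MAGIC: u32 = 0x5047_4442; // hypothetical tag value
const HEADER_LEN: usize = 32;

#[derive(Debug, PartialEq)]
pub enum CheckError {
    TooShort,
    BadMagic,
    BadChecksum,
    Misdirected { expected: u64, found: u64 },
}

/// Copy an N-byte header field out of the page.
fn field<const N: usize>(page: &[u8], at: usize) -> [u8; N] {
    page[at..at + N].try_into().unwrap()
}

/// CRC over the whole page with the checksum field itself treated as zero.
fn checksum(page: &[u8]) -> u32 {
    let mut copy = page.to_vec();
    copy[4..8].fill(0);
    crc32c(&copy)
}

/// Fill in the header and stamp the checksum before the page is written.
pub fn seal(page: &mut [u8], id: u64, lsn: u64) {
    assert!(page.len() >= HEADER_LEN, "page smaller than header");
    page[0..4].copy_from_slice(&MAGIC.to_le_bytes());
    page[8..16].copy_from_slice(&id.to_le_bytes());
    page[16..24].copy_from_slice(&lsn.to_le_bytes());
    let sum = checksum(page);
    page[4..8].copy_from_slice(&sum.to_le_bytes());
}

/// Verify a page read from slot `expected`; returns its LSN if it is intact.
pub fn check(page: &[u8], expected: u64) -> Result<u64, CheckError> {
    if page.len() < HEADER_LEN {
        return Err(CheckError::TooShort);
    }
    if u32::from_le_bytes(field(page, 0)) != MAGIC {
        return Err(CheckError::BadMagic);
    }
    if u32::from_le_bytes(field(page, 4)) != checksum(page) {
        return Err(CheckError::BadChecksum); // torn or corrupt
    }
    let found = u64::from_le_bytes(field(page, 8));
    if found != expected {
        return Err(CheckError::Misdirected { expected, found });
    }
    Ok(u64::from_le_bytes(field(page, 16)))
}
```

Comparing the stored page id with the slot the page was read from is what turns a misdirected write (a valid page at the wrong offset) into an error instead of silently wrong data.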

Above the file sits an LRU buffer pool (clock, or second-chance, eviction) with pinning and dirty tracking: hot pages stay resident, in-flight pages are pinned against eviction, and dirty pages are written back before their frame is reused or when the engine flushes or checkpoints. The engine above asks for a page by id and gets back a pinned, checksum-verified frame.
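The pinning and write-back rules can be sketched with clock (second-chance) eviction, the policy the feature list names for the pool. The `Frame` and `Clock` types and the `flush` callback are hypothetical, not page-db internals, and the page bytes are left out so only the bookkeeping shows.

```rust
/// Bookkeeping for one cached frame (page bytes omitted).
#[derive(Debug, Default)]
pub struct Frame {
    pub page_id: Option<u64>, // hypothetical id type
    pub pin_count: u32,       // > 0 means a caller holds this frame
    pub referenced: bool,     // second-chance bit, set on each access
    pub dirty: bool,          // modified since it was last written back
}

/// Clock victim selection over a fixed ring of frames.
pub struct Clock {
    frames: Vec<Frame>,
    hand: usize,
}

impl Clock {
    pub fn new(n: usize) -> Self {
        Clock { frames: (0..n).map(|_| Frame::default()).collect(), hand: 0 }
    }

    pub fn frame_mut(&mut self, i: usize) -> &mut Frame {
        &mut self.frames[i]
    }

    /// Returns a frame index that may be reused, calling `flush` first if
    /// that frame is dirty. Returns None when every frame is pinned.
    pub fn evict(&mut self, mut flush: impl FnMut(&Frame)) -> Option<usize> {
        let n = self.frames.len();
        // Two sweeps: the first may do nothing but clear reference bits.
        for _ in 0..2 * n {
            let i = self.hand;
            self.hand = (self.hand + 1) % n;
            let f = &mut self.frames[i];
            if f.pin_count > 0 {
                continue; // pinned frames are never evicted
            }
            if f.referenced {
                f.referenced = false; // give it a second chance
                continue;
            }
            if f.dirty {
                flush(&*f); // write back before the frame is reused
                f.dirty = false;
            }
            return Some(i);
        }
        None
    }
}
```

Clock approximates LRU without reordering a list on every hit: an access only sets a bit, and the hand pays the cost at eviction time.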



MSRV is 1.85+ (Rust 2024 edition). Fixed-size pages. CRC32C + LSN headers. Cross-platform Direct I/O.

Status: pre-1.0, API frozen. As of v0.5.0 the page format, the durable Direct I/O file, the LRU buffer pool with pinning and dirty tracking, and the page-id allocator are all implemented; the parse and recovery paths are fuzzed, and the public API is frozen for 1.0. What remains before 1.0 is integration soak testing and the on-disk-format freeze described in dev/ROADMAP.md; until then, the on-disk format may change between releases.


What it does

  • Fixed-size pages — configurable page size (4 KiB–1 MiB); a versioned 32-byte header with magic, CRC32C, page id, and an LSN slot
  • CRC32C integrity — every page is checksummed; a torn, corrupt, or misdirected page is detected on read and returned as a typed error, never silently trusted
  • Cross-platform Direct I/O — O_DIRECT (Linux), F_NOCACHE (macOS), FILE_FLAG_NO_BUFFERING (Windows), into buffers aligned to the page size, with a buffered fallback for filesystems that reject it
  • Durable on demand: write_page places the bytes, sync makes them durable (fdatasync / FlushFileBuffers / macOS F_FULLFSYNC)
  • LRU buffer pool — a bounded frame cache over the file with clock (second-chance) eviction
  • Pinning & dirty tracking — a pinned page is never evicted; a dirty page is always flushed before its frame is reused — both verified by property tests and loom model checks
  • Page-id allocator — an on-disk free-list that hands out unused ids and reclaims freed ones; allocate and free are pure in-memory operations
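O_DIRECT and FILE_FLAG_NO_BUFFERING require the memory buffer, file offset, and transfer length to be aligned to the device's logical block size; aligning buffers to the page size, as the list above describes, satisfies that whenever the page size is a multiple of the block size. A minimal std-only sketch of such a buffer, with `AlignedBuf` as a hypothetical name and the assumption that the page size is a non-zero power of two:

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// A zeroed heap buffer whose size and alignment both equal the page size.
pub struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    pub fn new(page_size: usize) -> Self {
        let layout = Layout::from_size_align(page_size, page_size)
            .expect("page size must be a non-zero power of two");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        AlignedBuf { ptr, layout }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr is valid for layout.size() zero-initialized bytes.
        unsafe { std::slice::from_raw_parts(self.ptr, self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and &mut self guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: allocated in `new` with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```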



Installation

[dependencies]
page-db = "0.5"

Usage

use page_db::{PageFile, PageId, Lsn, DEFAULT_PAGE_SIZE};

fn main() -> Result<(), page_db::PageError> {
    // A 4 KiB-page file, Direct I/O, created if absent.
    let file = PageFile::open("data.pages", DEFAULT_PAGE_SIZE)?;

    // Fill a page, tag it with a log sequence number, write it to slot 0.
    let mut page = file.allocate_page();
    page.set_lsn(Lsn::new(1));
    page.payload_mut()[..5].copy_from_slice(b"hello");
    file.write_page(PageId::new(0), &mut page)?;
    file.sync()?;

    // Read it back — the header and checksum are verified on the way out.
    let got = file.read_page(PageId::new(0))?;
    assert_eq!(&got.payload()[..5], b"hello");
    assert_eq!(got.lsn(), Lsn::new(1));
    Ok(())
}

On a filesystem that rejects O_DIRECT (some overlay and network mounts), open with PageFileOptions::new().direct_io(false). The API and the durability provided by sync stay the same; the only difference is that reads and writes go through the OS page cache.
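On either path the contract is the same: a write places the bytes, a sync makes them durable. A std-only sketch of that contract on the buffered path, assuming (hypothetically) that page `id` lives at byte offset `id * page_size`; `write_page_at` and `read_page_at` are illustrative free functions, not the crate's methods.

```rust
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Write one fixed-size page at its slot, then force it to stable storage.
pub fn write_page_at(path: &Path, page_size: usize, id: u64, bytes: &[u8]) -> std::io::Result<()> {
    assert_eq!(bytes.len(), page_size, "pages are fixed-size");
    let mut file = OpenOptions::new().read(true).write(true).create(true).truncate(false).open(path)?;
    file.seek(SeekFrom::Start(id * page_size as u64))?;
    file.write_all(bytes)?; // places the bytes (they may sit in the page cache)
    file.sync_data()?;      // makes them durable (fdatasync on Linux)
    Ok(())
}

/// Read one fixed-size page back from its slot.
pub fn read_page_at(path: &Path, page_size: usize, id: u64) -> std::io::Result<Vec<u8>> {
    let mut file = OpenOptions::new().read(true).open(path)?;
    file.seek(SeekFrom::Start(id * page_size as u64))?;
    let mut buf = vec![0u8; page_size];
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

This sketch syncs inside every write for brevity; keeping write_page and sync separate, as page-db does, lets many page writes share one sync.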

Through the buffer pool, hot pages stay resident and a fetch returns a pinned frame:

use page_db::{BufferPool, PageId, Lsn, DEFAULT_PAGE_SIZE};

fn main() -> Result<(), page_db::PageError> {
    // 256 frames cached over a 4 KiB-page file.
    let pool = BufferPool::open("data.pages", DEFAULT_PAGE_SIZE, 256)?;

    // Create page 0; writing through the guard marks the frame dirty.
    {
        let guard = pool.new_page(PageId::new(0))?;
        guard.write().set_lsn(Lsn::new(1));
    }
    pool.checkpoint()?;   // flush dirty frames, then make the file durable

    // Fetch it — a cache hit, served without touching the disk.
    let guard = pool.fetch(PageId::new(0))?;
    assert_eq!(guard.read().lsn(), Lsn::new(1));
    Ok(())
}

To assemble a complete engine layer, let the allocator choose page ids and the pool cache the pages at those ids, both over one shared file:

use std::sync::Arc;
use page_db::{BufferPool, PageAllocator, PageFile, DEFAULT_PAGE_SIZE};

fn main() -> Result<(), page_db::PageError> {
    let store = Arc::new(PageFile::open("data.pages", DEFAULT_PAGE_SIZE)?);
    let alloc = PageAllocator::new(Arc::clone(&store))?;
    let pool = BufferPool::new(Arc::clone(&store), 128);

    let id = alloc.allocate()?;          // allocator chooses the id
    {
        let guard = pool.new_page(id)?;  // pool caches the page there
        guard.write().payload_mut()[0] = 0x7;
    }
    pool.flush_all()?;
    alloc.sync()?;                       // persist allocator state + page data
    Ok(())
}
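How an allocator chooses ids can be modelled in memory: reuse a freed id if there is one, otherwise grow the file by one page. This is a hypothetical sketch; the on-disk free-list and its persistence via sync are omitted, and lowest-id-first reuse is this sketch's choice, not documented page-db behaviour.

```rust
use std::collections::BTreeSet;

/// In-memory model of a page-id allocator with a free-list.
#[derive(Debug, Default)]
pub struct FreeListAllocator {
    free: BTreeSet<u64>, // ids released by `free`, available for reuse
    next: u64,           // high-water mark: first id never handed out
}

impl FreeListAllocator {
    /// Hand out the lowest freed id, or extend past the high-water mark.
    pub fn allocate(&mut self) -> u64 {
        if let Some(id) = self.free.pop_first() {
            return id;
        }
        let id = self.next;
        self.next += 1;
        id
    }

    /// Returns false (changing nothing) for an id that was never allocated
    /// or is already free, which catches double frees.
    pub fn free(&mut self, id: u64) -> bool {
        id < self.next && self.free.insert(id)
    }
}
```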

API Overview

For the complete reference with examples, see docs/API.md.

  • BufferPool — the bounded page cache with pinning and dirty tracking
  • PageGuard — an RAII pin on a cached page; read / write borrows
  • PageAllocator — the page-id allocator with an on-disk free-list
  • PageFile / PageFileOptions — the durable page store and its open options
  • Page — a fixed-size page: header accessors, payload, checksummed framing
  • PageId / Lsn / PageSize — the value types
  • PageStore — the storage seam the pool and allocator sit on
  • PageError — typed integrity and I/O failures
  • crc32c — the CRC32C checksum, exposed directly
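The PageGuard idea (an RAII pin with read / write borrows) can be sketched single-threaded with `Cell` and `RefCell`. `CachedFrame` and `PinGuard` are hypothetical names; the real pool is concurrent and would need atomics and locks instead. Marking the frame dirty on a write borrow follows the buffer-pool usage example's comment.

```rust
use std::cell::{Cell, Ref, RefCell, RefMut};

/// A cached page plus its pin count and dirty flag.
pub struct CachedFrame {
    data: RefCell<Vec<u8>>,
    pins: Cell<u32>,
    dirty: Cell<bool>,
}

impl CachedFrame {
    pub fn new(page_size: usize) -> Self {
        CachedFrame { data: RefCell::new(vec![0; page_size]), pins: Cell::new(0), dirty: Cell::new(false) }
    }

    /// Pinning returns a guard; the pin lasts exactly as long as the guard.
    pub fn pin(&self) -> PinGuard<'_> {
        self.pins.set(self.pins.get() + 1);
        PinGuard { frame: self }
    }

    pub fn is_pinned(&self) -> bool {
        self.pins.get() > 0
    }

    pub fn is_dirty(&self) -> bool {
        self.dirty.get()
    }
}

pub struct PinGuard<'a> {
    frame: &'a CachedFrame,
}

impl PinGuard<'_> {
    pub fn read(&self) -> Ref<'_, Vec<u8>> {
        self.frame.data.borrow()
    }

    /// Taking a write borrow is what marks the frame dirty.
    pub fn write(&self) -> RefMut<'_, Vec<u8>> {
        self.frame.dirty.set(true);
        self.frame.data.borrow_mut()
    }
}

impl Drop for PinGuard<'_> {
    fn drop(&mut self) {
        // Unpin on scope exit, including early returns and panics.
        self.frame.pins.set(self.frame.pins.get() - 1);
    }
}
```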



Where It Fits

page-db is the lowest layer of the storage-engine stack. The layers built on it are:

  • index-db — B+tree nodes are pages allocated and cached here
  • lock-db — the concurrency-control sibling over the same paged store
  • wal-db — the LSN slot in each page header coordinates with the write-ahead log
  • heap / B-tree engines — any storage engine that needs durable, cached, fixed-size pages
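The usual coordination between a page's LSN and the log is the write-ahead rule: a dirty page may reach disk only after the log is durable up to that page's LSN. This README does not say which layer enforces it; the sketch below, with a hypothetical `Lsn` newtype and `flush_eligible` helper, only shows the check.

```rust
/// Hypothetical LSN type mirroring the header's LSN slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

/// WAL rule: a page may be written back only once the log is durable
/// at least up to the page's LSN.
pub fn may_flush(page_lsn: Lsn, wal_durable_lsn: Lsn) -> bool {
    page_lsn <= wal_durable_lsn
}

/// Split dirty pages into those that may be flushed now and those that
/// must wait for the log to catch up.
pub fn flush_eligible(dirty: &[(u64, Lsn)], wal_durable_lsn: Lsn) -> (Vec<u64>, Vec<u64>) {
    let mut now = Vec::new();
    let mut later = Vec::new();
    for &(id, lsn) in dirty {
        if may_flush(lsn, wal_durable_lsn) {
            now.push(id);
        } else {
            later.push(id);
        }
    }
    (now, later)
}
```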

It depends on no sibling crates — only thiserror (error types) and, on Unix, libc (for O_DIRECT and the macOS durability syscalls) — so it builds and tests standalone today.


Cross-Platform Support

Linux (x86_64, aarch64), macOS (x86_64, Apple Silicon), and Windows (x86_64) are first-class and verified by the CI matrix.


Contributing

See CONTRIBUTING.md and dev/DIRECTIVES.md. Before a PR: cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean.


License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

COPYRIGHT © 2026 JAMES GOBER.