merutable

An embeddable Rust table engine. LSM writes, Parquet storage, Iceberg-compatible metadata.

Writes go through a WAL and a skip-list memtable; flushes land as Apache Parquet-based SSTables. Call db.export_iceberg(path) when you need an Iceberg v2 view: DuckDB, Spark, Trino, Snowflake, and pyiceberg read it with no format conversion.

use merutable::{MeruDB, OpenOptions};
use merutable::schema::{ColumnDef, ColumnType, TableSchema};
use merutable::value::{FieldValue, Row};

#[tokio::main]
async fn main() -> merutable::error::Result<()> {
    let schema = TableSchema {
        table_name: "events".into(),
        columns: vec![
            ColumnDef { name: "id".into(),      col_type: ColumnType::Int64,     nullable: false, ..Default::default() },
            ColumnDef { name: "payload".into(), col_type: ColumnType::ByteArray, nullable: true,  ..Default::default() },
        ],
        primary_key: vec![0],
        ..Default::default()
    };

    let db = MeruDB::open(OpenOptions::new(schema)).await?;

    db.put(Row::new(vec![
        Some(FieldValue::Int64(1)),
        Some(FieldValue::Bytes(b"hello"[..].into())),
    ])).await?;

    let row = db.get(&[FieldValue::Int64(1)])?;
    println!("{row:?}");

    db.close().await?;   // flush + fsync + seal; reads remain until drop
    Ok(())
}

When merutable fits

Structured data that is both write-heavy (agent memory, session state, audit logs, feature stores, embedded time-series) and readable by analytical engines without an ETL job. The LSM gives you fast writes; the Iceberg-compatible metadata layer gives you analytical reads.

What's in the box

  • Durable LSM write path. Write-ahead log with 32 KiB block framing and CRC32, crossbeam skip-list memtable, graduated writer backpressure on L0-file buildup. visible_seq advances only after the memtable apply, so readers never observe a torn write.
  • Leveled compaction. Full-rewrite compactions run in parallel on disjoint level sets, with bounded per-job memory and fsync-before-commit. GC is version-pinned, so a long scan never sees a file disappear mid-read.
  • Iceberg export on demand. db.export_iceberg(path) writes a spec-clean Iceberg v2 chain (metadata.json + manifest-list Avro + manifest Avro) that DuckDB iceberg_scan, pyiceberg, Spark, Trino, and Athena consume as-is. You call export_iceberg only when you want the view, so the efficiency of merutable's own metadata layer is not bound by the Iceberg spec.
  • Change feed. Committed operations are exposed as a change feed table provider with seq > N predicate pushdown and per-DELETE pre-image reconstruction.
  • Read-only replica (opt-in). Base + tail replayed from the change feed; rebase hot-swaps behind ArcSwap so in-flight readers never see a torn state.
  • Schema evolution. db.add_column(ColumnDef) extends the schema: reopen accepts the extension, reads of pre-evolution files fill defaults, and writes pad short rows with write_default.
  • Python bindings (via PyO3). crates/merutable-python/.
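The ordering described in the first bullet (log first, apply to the memtable, publish visible_seq last) can be illustrated with a toy model. This is a minimal sketch using only the standard library, not merutable's implementation: ToyEngine, its fields, and get_at are hypothetical names, a Vec stands in for the WAL, a BTreeMap stands in for the skip-list memtable, and a single writer is assumed.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical toy engine: NOT merutable's types, just the publish ordering.
pub struct ToyEngine {
    wal: RwLock<Vec<(u64, i64, Vec<u8>)>>,            // stand-in for the on-disk WAL
    memtable: RwLock<BTreeMap<(i64, u64), Vec<u8>>>,  // (key, seq) -> value
    next_seq: AtomicU64,
    visible_seq: AtomicU64,
}

impl ToyEngine {
    pub fn new() -> Self {
        ToyEngine {
            wal: RwLock::new(Vec::new()),
            memtable: RwLock::new(BTreeMap::new()),
            next_seq: AtomicU64::new(0),
            visible_seq: AtomicU64::new(0),
        }
    }

    /// Single-writer put: 1) log, 2) apply to memtable, 3) publish the sequence.
    pub fn put(&self, key: i64, value: &[u8]) -> u64 {
        let seq = self.next_seq.fetch_add(1, Ordering::Relaxed) + 1;
        self.wal.write().unwrap().push((seq, key, value.to_vec()));
        self.memtable.write().unwrap().insert((key, seq), value.to_vec());
        // Publishing last means a reader that observes `seq` also finds the entry.
        self.visible_seq.store(seq, Ordering::Release);
        seq
    }

    pub fn visible_seq(&self) -> u64 {
        self.visible_seq.load(Ordering::Acquire)
    }

    pub fn wal_len(&self) -> usize {
        self.wal.read().unwrap().len()
    }

    /// Newest version of `key` whose sequence is <= `snapshot`.
    pub fn get_at(&self, key: i64, snapshot: u64) -> Option<Vec<u8>> {
        let mem = self.memtable.read().unwrap();
        let found = mem
            .range((key, 0)..=(key, snapshot))
            .next_back()
            .map(|(_, v)| v.clone());
        found
    }

    /// Read at the currently published sequence.
    pub fn get(&self, key: i64) -> Option<Vec<u8>> {
        self.get_at(key, self.visible_seq())
    }
}
```

In this model, two calls to put on the same key leave both versions in the memtable; get returns the newer one, while get_at with the first sequence number still returns the older value, which is the property a snapshot reader relies on.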

Install

[dependencies]
merutable = "0.0.2"

Architecture at a glance

          ┌──────── your process ────────┐
writes ──▶│ WAL → memtable → flush → SST │
reads  ◀──│   memtable  ∪  L0  ∪  L1…    │
          └─────────────┬────────────────┘
                        │  Parquet files on disk
                        ▼
              db.export_iceberg(path)
                        │
                        ▼
           DuckDB / Spark / Trino / pyiceberg
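The reads line in the diagram (memtable ∪ L0 ∪ L1…) follows the usual LSM lookup pattern: consult the newest source first and stop at the first one that knows the key. The sketch below shows that generic pattern with the standard library only. Run and lookup are hypothetical names, and representing a delete as a None tombstone is an assumption for illustration, not a statement about merutable's on-disk format.

```rust
use std::collections::BTreeMap;

/// One sorted run (memtable or an SST level). `None` marks a tombstone.
/// Hypothetical type for illustration only.
pub type Run = BTreeMap<i64, Option<String>>;

/// Search runs newest-first; the first run that contains the key decides the result.
pub fn lookup(runs_newest_first: &[Run], key: i64) -> Option<String> {
    for run in runs_newest_first {
        if let Some(entry) = run.get(&key) {
            // Either the live value, or None when the newest entry is a delete.
            return entry.clone();
        }
    }
    None // no run has ever seen this key
}
```

With a memtable that holds a fresh value for key 1 and a tombstone for key 2, and an older run that holds stale values for both, lookup returns the fresh value for key 1 and nothing for key 2, because the older run is never consulted for either key.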

Deeper reads: docs/architecture.svg · docs/SEMANTICS.md · docs/EXTERNAL_READS.md · docs/MIRROR.md · docs/SCALE_OUT_REPLICA.md · docs/TAXONOMY.md · DEVELOPER.md

Status

Area            0.0.1
Storage format  LSM tree layout optimized for both row and columnar. Iceberg v2-compatible.
Durability      fsync on SST write, fsync on WAL, fsync on manifest commit.
Concurrency     Designed for one primary writer per catalog (not yet lock-enforced); many concurrent readers via version pinning.
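Version pinning, mentioned in the Concurrency row, can be pictured with reference counting: a reader holds a handle to the file set it started with, and garbage collection may only remove files that no live handle references. The following is a minimal sketch of that idea using Arc and Weak from the standard library. Version, VersionSet, pin, install, and collectable are hypothetical names and do not describe merutable's internals.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex, Weak};

/// Immutable snapshot of the files that make up the table (hypothetical).
pub struct Version {
    pub files: BTreeSet<String>,
}

pub struct VersionSet {
    current: Mutex<Arc<Version>>,
    history: Mutex<Vec<Weak<Version>>>, // superseded versions readers may still hold
}

impl VersionSet {
    pub fn new(files: &[&str]) -> Self {
        let v = Version { files: files.iter().map(|f| f.to_string()).collect() };
        VersionSet { current: Mutex::new(Arc::new(v)), history: Mutex::new(Vec::new()) }
    }

    /// Reader side: pin the current version for the duration of a scan.
    pub fn pin(&self) -> Arc<Version> {
        let guard = self.current.lock().unwrap();
        Arc::clone(&*guard)
    }

    /// Writer side: a compaction commit swaps in a new file set.
    pub fn install(&self, files: &[&str]) {
        let new = Arc::new(Version { files: files.iter().map(|f| f.to_string()).collect() });
        let old = {
            let mut cur = self.current.lock().unwrap();
            std::mem::replace(&mut *cur, new)
        };
        self.history.lock().unwrap().push(Arc::downgrade(&old));
        // `old` drops here; it stays alive only if some reader still pins it.
    }

    /// Files referenced by the current version or by any still-pinned old version.
    pub fn live_files(&self) -> BTreeSet<String> {
        let mut live = self.current.lock().unwrap().files.clone();
        let mut history = self.history.lock().unwrap();
        history.retain(|weak| match weak.upgrade() {
            Some(v) => {
                live.extend(v.files.iter().cloned());
                true
            }
            None => false, // no reader left, forget this version
        });
        live
    }

    /// Files on disk that GC may delete right now.
    pub fn collectable(&self, on_disk: &[&str]) -> Vec<String> {
        let live = self.live_files();
        let mut out = Vec::new();
        for f in on_disk {
            if !live.contains(*f) {
                out.push((*f).to_string());
            }
        }
        out
    }
}
```

While a reader holds the handle returned by pin, the files of its version stay out of the collectable list even after install replaces them; once the handle is dropped, the same files become eligible for deletion.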

Named after Mount Meru — the axis around which the cosmos is ordered in Indian cosmology.

About

Embedded single-table engine in Rust, where the data is both row and columnar and the metadata is Iceberg-compatible. Write rows to a table that can then be queried via SQL from DuckDB/Spark/Trino/Snowflake/SFDataCloud with zero ETL.
