
icepick


Experimental client for Apache Iceberg in Rust

icepick provides simple access to Apache Iceberg tables in AWS S3 Tables and Cloudflare R2 Data Catalog. Built on the official iceberg-rust library, icepick handles authentication, REST API details, and platform compatibility so you can focus on working with your data.


Why icepick?

Why not use iceberg-rust directly? icepick supports WASM as a compilation target (not yet supported by iceberg-rust) and focuses on "serverless" catalogs that implement a subset of the overall Iceberg specification.

Features

Catalog Support

  • AWS S3 Tables — Full support with SigV4 authentication (native platforms only)
  • Cloudflare R2 Data Catalog — Full support with bearer token auth (WASM-compatible)
  • Generic REST Catalog — Build clients for any Iceberg REST endpoint (Nessie, Glue REST, custom)
  • Direct S3 Parquet Writes — Write Arrow data directly to S3 without Iceberg metadata

Developer Experience

  • Clean API — Simple factory methods, no complex builders
  • Type-safe errors — Comprehensive error handling with context (see the error-chain sketch after this list)
  • Zero-config auth — Uses AWS credential chain and Cloudflare API tokens
  • Production-ready — Used in real applications with real data
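
Since catalog calls in the Quick Start examples below return errors as Box<dyn std::error::Error>, the attached context can be recovered by walking the source() chain. A minimal, library-agnostic sketch:

use std::error::Error;

/// Print an error and every underlying cause in its source() chain.
fn print_error_chain(err: &dyn Error) {
    eprintln!("error: {err}");
    let mut source = err.source();
    while let Some(cause) = source {
        eprintln!("  caused by: {cause}");
        source = cause.source();
    }
}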

Platform Support

| Catalog | Linux/macOS/Windows | WASM (browser/Cloudflare Workers) |
| --- | --- | --- |
| S3 Tables | ✅ | ❌ (requires AWS SDK) |
| R2 Data Catalog | ✅ | ✅ |
| No Catalog (direct Parquet to object storage) | ✅ | ✅ |

Note: R2 Data Catalog and direct Parquet writes are fully WASM-compatible, making them suitable for Cloudflare Workers, browser applications, and other WASM environments.

Installation

Add to your Cargo.toml:

[dependencies]
icepick = "0.1"

Quick Start

AWS S3 Tables

use icepick::S3TablesCatalog;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create catalog from S3 Tables ARN
    let catalog = S3TablesCatalog::from_arn(
        "my-catalog",
        "arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket"
    ).await?;

    // Load a table
    let table = catalog.load_table(
        &"namespace.table_name".parse()?
    ).await?;

    Ok(())
}

Cloudflare R2 Data Catalog

use icepick::R2Catalog;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create catalog for R2
    let catalog = R2Catalog::new(
        "my-catalog",
        "account-id",
        "bucket-name",
        "api-token"
    ).await?;

    // Load a table
    let table = catalog.load_table(
        &"namespace.table_name".parse()?
    ).await?;

    Ok(())
}

Generic Iceberg REST Catalog

use icepick::{FileIO, RestCatalog};
use opendal::Operator;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure your FileIO (S3, R2, filesystem, etc.)
    let operator = Operator::via_iter(opendal::Scheme::Memory, [])?;
    let file_io = FileIO::new(operator);

    // Build a catalog for any Iceberg REST endpoint (Nessie, Glue REST, custom services)
    let catalog = RestCatalog::builder("nessie", "https://nessie.example.com/api/iceberg")
        .with_prefix("warehouse")
        .with_file_io(file_io)
        .with_bearer_token(std::env::var("NESSIE_TOKEN")?)
        .build()?;

    let table = catalog.load_table(&"namespace.table".parse()?).await?;
    Ok(())
}
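
Once a table is loaded (from any of the catalogs above), a typical next step is scanning it into Arrow record batches. The sketch below assumes icepick exposes iceberg-rust's scan API, since it is built on that library; the scan(), select_all(), and to_arrow() names come from iceberg-rust and may differ between versions. It also assumes the futures crate for stream iteration. Inside the async main above, after load_table:

use futures::TryStreamExt;

let mut batches = table
    .scan()
    .select_all()   // project every column
    .build()?
    .to_arrow()     // stream of Arrow RecordBatches
    .await?;

while let Some(batch) = batches.try_next().await? {
    println!("read a batch with {} rows", batch.num_rows());
}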

Authentication

AWS S3 Tables

Uses the AWS default credential provider chain in the following order:

  1. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  2. AWS credentials file (~/.aws/credentials)
  3. ECS task role
  4. EC2 IAM instance profile (IMDS)

Important: Ensure your credentials have S3 Tables permissions.
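
In practice no credentials appear in code. A minimal sketch assuming environment variables (step 1 above) are the source in play; the warning loop is purely illustrative:

use icepick::S3TablesCatalog;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative only: note which static credentials are absent, so a
    // fallback to the credentials file or a role is expected.
    for var in ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] {
        if std::env::var(var).is_err() {
            eprintln!("{var} not set; later steps in the chain will be tried");
        }
    }

    // No credentials are passed explicitly; the chain resolves them.
    let _catalog = S3TablesCatalog::from_arn(
        "my-catalog",
        "arn:aws:s3tables:us-west-2:123456789012:bucket/my-bucket",
    ).await?;

    Ok(())
}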

Cloudflare R2 Data Catalog

Uses Cloudflare API tokens. To set up:

  1. Log into the Cloudflare dashboard
  2. Navigate to My Profile → API Tokens
  3. Create a token with R2 read/write permissions
  4. Pass the token when constructing the catalog (see the sketch below)
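
For example, reading the token from an environment variable and passing it to the constructor shown in the Quick Start (the R2_API_TOKEN variable name is just a convention for this sketch):

use icepick::R2Catalog;

// Inside an async context, as in the Quick Start above:
let token = std::env::var("R2_API_TOKEN")?;
let catalog = R2Catalog::new(
    "my-catalog",
    "account-id",
    "bucket-name",
    token.as_str(),
).await?;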

Direct S3 Parquet Writes

Need to write Parquet files directly to S3 for external tools (Spark, DuckDB, etc.) without Iceberg metadata? Use the arrow_to_parquet function:

use icepick::{arrow_to_parquet, FileIO, io::AwsCredentials};
use arrow::array::{Int32Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::basic::Compression;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Setup FileIO with AWS credentials
    let file_io = FileIO::from_aws_credentials(
        AwsCredentials {
            access_key_id: "your-key".to_string(),
            secret_access_key: "your-secret".to_string(),
            session_token: None,
        },
        "us-west-2".to_string()
    );

    // Create Arrow data
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("name", DataType::Utf8, false),
    ]));

    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(Int32Array::from(vec![1, 2, 3])),
            Arc::new(StringArray::from(vec!["a", "b", "c"])),
        ],
    )?;

    // Simple write with defaults
    arrow_to_parquet(&batch, "s3://my-bucket/output.parquet", &file_io).await?;

    // With compression
    arrow_to_parquet(&batch, "s3://my-bucket/compressed.parquet", &file_io)
        .with_compression(Compression::ZSTD(parquet::basic::ZstdLevel::default()))
        .await?;

    // Manual partitioning (Hive-style or any structure)
    let date = "2025-01-15";
    let path = format!("s3://my-bucket/data/date={}/data.parquet", date);
    arrow_to_parquet(&batch, &path, &file_io).await?;

    Ok(())
}

Note: This writes standalone Parquet files without Iceberg metadata. For writing to Iceberg tables, use the Transaction API instead.
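
To sanity-check a file produced this way, it can be read back with the parquet crate (already used in the example above). A minimal sketch against a local copy of the file; reading directly from S3 would need an object-store-backed reader instead of File:

use std::fs::File;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

fn read_back(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
    for batch in reader {
        let batch = batch?;
        println!("{} rows, {} columns", batch.num_rows(), batch.num_columns());
    }
    Ok(())
}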

Examples

Explore complete working examples in the examples/ directory:

| Example | Description | Command |
| --- | --- | --- |
| s3_tables_basic.rs | Complete S3 Tables workflow | cargo run --example s3_tables_basic |
| r2_basic.rs | Complete R2 Data Catalog workflow | cargo run --example r2_basic |

Development

Running Tests

cargo test

WASM Build

Verify R2Catalog compiles for WASM:

cargo build --target wasm32-unknown-unknown
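
If the target is not installed yet, add it first with rustup:

rustup target add wasm32-unknown-unknown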

Code Quality

# Format code
cargo fmt

# Run linter
cargo clippy -- -D warnings

# Check documentation
cargo doc --no-deps --all-features

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

Acknowledgments

Built on the official iceberg-rust library from the Apache Iceberg project.
