parquet-rs

An Apache Parquet implementation in Rust (work in progress)

Usage

Add this to your Cargo.toml:

[dependencies]
parquet = "0.4"

and this to your crate root:

extern crate parquet;

Example usage of reading data:

use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};

// Open the Parquet file and wrap it in a serialized file reader.
let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();

// Passing None as the projection reads every column of each record.
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
  println!("{}", record);
}

See the crate documentation for the available API.
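Before handing a file to the reader, it can be useful to check that it looks like a Parquet file at all. Unencrypted Parquet files begin and end with the 4-byte magic PAR1. The following is a minimal sketch of such a check using only the standard library; has_parquet_magic is a hypothetical helper written for illustration, not part of this crate's API.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Magic bytes found at the start and end of an unencrypted Parquet file.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper (not part of the parquet crate): returns true if
/// `source` starts and ends with the Parquet magic bytes.
fn has_parquet_magic<R: Read + Seek>(source: &mut R) -> io::Result<bool> {
    // A file must hold at least the leading and trailing magic.
    let len = source.seek(SeekFrom::End(0))?;
    if len < 8 {
        return Ok(false);
    }

    let mut head = [0u8; 4];
    let mut tail = [0u8; 4];

    source.seek(SeekFrom::Start(0))?;
    source.read_exact(&mut head)?;
    source.seek(SeekFrom::End(-4))?;
    source.read_exact(&mut tail)?;

    // Rewind so the caller can keep using the source from the beginning.
    source.seek(SeekFrom::Start(0))?;

    Ok(&head == PARQUET_MAGIC && &tail == PARQUET_MAGIC)
}

fn main() -> io::Result<()> {
    // An in-memory stand-in for a file; a std::fs::File works the same way.
    let mut fake = Cursor::new(b"PAR1....PAR1".to_vec());
    println!("looks like Parquet: {}", has_parquet_magic(&mut fake)?);
    Ok(())
}
```

This only inspects the magic bytes and says nothing about whether the footer metadata is valid, so the reader can still return an error for a file that passes the check.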

Supported Parquet Version

  • Parquet-format 2.4.0

To update the Parquet format to a newer version, check whether a newer version of the parquet-format crate is available, then update the version of the parquet-format crate in Cargo.toml.

Features

  • All encodings supported
  • All compression codecs supported
  • Read support
    • Primitive column value readers
    • Row record reader
    • Arrow record reader
  • Statistics support
  • Write support
    • Primitive column value writers
    • Row record writer
    • Arrow record writer
  • Predicate pushdown
  • Parquet format 2.5 support
  • HDFS support

Requirements

  • Rust nightly

See "Working with nightly Rust" for how to install the nightly toolchain and set it as the default.

Build

Run cargo build for a debug build, or cargo build --release to build in release mode. Some features take advantage of SSE4.2 instructions, which can be enabled by setting RUSTFLAGS="-C target-feature=+sse4.2" before the cargo build command.
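To confirm that the RUSTFLAGS setting took effect, a small standalone program can report whether SSE4.2 was enabled at compile time and whether the current CPU supports it. This is a sketch using only the standard library; the function names are made up for illustration and are not part of this crate.

```rust
/// True if the binary was compiled with SSE4.2 enabled,
/// for example via RUSTFLAGS="-C target-feature=+sse4.2".
fn sse42_compiled_in() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// True if the CPU running this program supports SSE4.2 (x86/x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn sse42_available_at_runtime() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// On other architectures SSE4.2 does not exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn sse42_available_at_runtime() -> bool {
    false
}

fn main() {
    println!("compiled with SSE4.2: {}", sse42_compiled_in());
    println!("CPU supports SSE4.2:  {}", sse42_available_at_runtime());
}
```

Running it once with and once without the RUSTFLAGS setting should change only the first line of output on an SSE4.2-capable machine.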

Test

Run cargo test for unit tests.

Binaries

The following binaries are provided (use cargo install to install them):

  • parquet-schema prints the schema and metadata of a Parquet file. Usage: parquet-schema <file-path> [verbose], where file-path is the path to a Parquet file and the optional verbose is a boolean flag that selects between printing the full metadata and printing the schema only (when not specified, only the schema is printed).

  • parquet-read reads records from a Parquet file. Usage: parquet-read <file-path> [num-records], where file-path is the path to a Parquet file and num-records is the number of records to read from the file (when not specified, all records are printed).
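The usage string for parquet-read above describes one required argument and one optional one. A rough sketch of how such a command line could be parsed with the standard library alone is shown here; ReadArgs and parse_read_args are hypothetical names, and this is not the actual implementation of the binary.

```rust
use std::env;

/// Parsed command line for a `parquet-read`-style tool (illustrative only).
#[derive(Debug, PartialEq)]
struct ReadArgs {
    file_path: String,
    /// None means "read all records".
    num_records: Option<usize>,
}

/// Parses `<file-path> [num-records]` from the arguments that follow the
/// program name. Returns an error message on bad input.
fn parse_read_args(args: &[String]) -> Result<ReadArgs, String> {
    const USAGE: &str = "Usage: parquet-read <file-path> [num-records]";
    match args {
        [path] => Ok(ReadArgs {
            file_path: path.clone(),
            num_records: None,
        }),
        [path, count] => {
            let parsed = count
                .parse::<usize>()
                .map_err(|e| format!("invalid num-records '{}': {}", count, e))?;
            Ok(ReadArgs {
                file_path: path.clone(),
                num_records: Some(parsed),
            })
        }
        // Zero arguments or more than two: show the usage string.
        _ => Err(USAGE.to_string()),
    }
}

fn main() {
    // Skip the program name, then parse the remaining arguments.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_read_args(&args) {
        Ok(parsed) => println!("{:?}", parsed),
        Err(msg) => {
            eprintln!("{}", msg);
            std::process::exit(1);
        }
    }
}
```

Modelling the optional argument as Option<usize> keeps "not specified" distinct from an explicit count of zero.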

If you see a "Library not loaded" error, make sure LD_LIBRARY_PATH is set properly:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib

Benchmarks

Run cargo bench for benchmarks.

Docs

To build the documentation, run cargo doc --no-deps. To build it and open it in the browser, run cargo doc --no-deps --open.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.