jerolba/parquet-carpet

Carpet: Parquet Serialization and Deserialization Library for Java

A Java library for serializing and deserializing Parquet files efficiently using Java records. This library provides a simple and user-friendly API for working with Parquet files, making it easy to read and write data in the Parquet format in your Java applications.

For comprehensive documentation, please visit our full documentation site.

Features

  • Serialize Java records to Parquet files
  • Deserialize Parquet files to Java records
  • Support nested data structures
  • Support nested Collections and Maps
  • Very simple API
  • Low-level configuration of Parquet properties
  • Low-overhead file processing
  • Minimized parquet-java and hadoop transitive dependencies

Installation

You can include this library in your Java project using Maven:

<dependency>
    <groupId>com.jerolba</groupId>
    <artifactId>carpet-record</artifactId>
    <version>0.4.0</version>
</dependency>

or using Gradle:

implementation 'com.jerolba:carpet-record:0.4.0'

Carpet includes only the essential transitive dependencies required for file read and write operations.

Basic Usage

To serialize and deserialize Parquet files in your Java application, you just need Java records. You don't need to generate classes or inherit from Carpet classes.

record MyRecord(long id, String name, int size, double value, double percentile) { }

Carpet provides a writer and a reader with a default configuration and convenience methods.

Serialization

Using reflection, Carpet derives the Parquet file schema from your record class and writes the content of your objects to the file:

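// calculateDataToPersist() stands in for your own code that produces the records
List<MyRecord> data = calculateDataToPersist();

// Both resources are closed automatically, the writer first and then the stream
try (OutputStream outputStream = new FileOutputStream("my_file.parquet");
     CarpetWriter<MyRecord> writer = new CarpetWriter<>(outputStream, MyRecord.class)) {
    writer.write(data);
}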
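
To give a rough idea of what deriving a schema by reflection involves, the sketch below lists the components of a record with the standard `Class.getRecordComponents()` method. It is only an illustration: `RecordSchemaSketch` and `describe` are hypothetical names, and this is not Carpet's actual implementation, which also has to map Java types to Parquet types.

```java
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not part of Carpet.
class RecordSchemaSketch {

    record MyRecord(long id, String name, int size, double value, double percentile) { }

    // Returns component name -> Java type name, in declaration order.
    static Map<String, String> describe(Class<? extends Record> recordClass) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (RecordComponent component : recordClass.getRecordComponents()) {
            fields.put(component.getName(), component.getType().getSimpleName());
        }
        return fields;
    }

    public static void main(String[] args) {
        describe(MyRecord.class)
                .forEach((name, type) -> System.out.println(name + ": " + type));
    }
}
```

Because the names and types come from the record header itself, no generated classes or base classes are needed to describe the columns.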

Deserialization

To read a file, you just need to provide a File and a record class that matches the Parquet schema:

List<MyRecord> data = new CarpetReader<>(new File("my_file.parquet"), MyRecord.class).toList();

If you don't know the schema of the file, or a Map is good enough for your use case, you can deserialize to Map<String, Object>:

List<Map> data = new CarpetReader<>(new File("my_file.parquet"), Map.class).toList();
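
Rows read as maps are untyped, so values have to be checked before use. The following sketch shows one defensive way to do that with plain Java. The rows are built by hand as a stand-in for what the reader returns; `MapRowsSketch`, `totalSize`, and `row` are hypothetical names, and the exact value types Carpet produces for each column are an assumption here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not part of Carpet.
class MapRowsSketch {

    // Sums the "size" column, skipping rows where it is missing or not numeric.
    static long totalSize(List<Map<String, Object>> rows) {
        long total = 0;
        for (Map<String, Object> row : rows) {
            if (row.get("size") instanceof Number size) {
                total += size.longValue();
            }
        }
        return total;
    }

    // Hand-built stand-in for one deserialized row (assumed shape).
    static Map<String, Object> row(long id, String name, int size) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", id);
        row.put("name", name);
        row.put("size", size);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(row(1L, "a", 10), row(2L, "b", 32));
        System.out.println(totalSize(rows));
    }
}
```

Matching against `Number` rather than a specific boxed type keeps the code working whether a column arrives as an `Integer` or a `Long`.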

Advanced Usage

Carpet offers a rich set of features for advanced scenarios. For detailed explanations, API references, and examples, please refer to our comprehensive documentation site.

Build

To run the unit tests:

./gradlew test

To build the jars:

./gradlew assemble

The build runs in GitHub Actions.

Contribute

Feel free to dive in! Open an issue or submit PRs.

All contributors and maintainers of this project follow the Contributor Covenant Code of Conduct.

License

Apache 2 © Jerónimo López

About

Java Parquet serialization and deserialization library using Java 17 Records
