A Java library for serializing and deserializing Parquet files efficiently using Java records. This library provides a simple and user-friendly API for working with Parquet files, making it easy to read and write data in the Parquet format in your Java applications.
For comprehensive documentation, please visit our full documentation site.
- Serialize Java records to Parquet files
- Deserialize Parquet files to Java records
- Support nested data structures
- Support nested Collections and Maps
- Very simple API
- Low-level configuration of Parquet properties
- Low overhead when processing files
- Minimized parquet-java and Hadoop transitive dependencies
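To make the nested-structure features more concrete, here is a minimal sketch of the kind of record model they refer to: a record holding another record, a List, and a Map. The names `NestedModelSketch`, `Address`, `Customer`, and `sample` are hypothetical and used only for illustration; check the documentation site for the exact set of supported types.

```java
import java.util.List;
import java.util.Map;

// Hypothetical domain model for illustration; none of these names come from Carpet.
final class NestedModelSketch {

    // A record that is nested inside another record.
    record Address(String street, String city) { }

    // Top-level record combining a nested record, a collection, and a map.
    record Customer(long id, String name, Address address,
                    List<String> tags, Map<String, Integer> visitsByYear) { }

    // Builds one sample value so the structure can be inspected.
    static Customer sample() {
        return new Customer(1L, "Ada",
                new Address("1 Main St", "Madrid"),
                List.of("vip", "newsletter"),
                Map.of("2023", 4, "2024", 7));
    }

    public static void main(String[] args) {
        Customer customer = sample();
        System.out.println(customer.address().city());            // Madrid
        System.out.println(customer.tags().size());               // 2
        System.out.println(customer.visitsByYear().get("2024"));  // 7
    }
}
```

The model is plain Java: no annotations, base classes, or generated code are involved.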
You can include this library in your Java project using Maven:
<dependency>
    <groupId>com.jerolba</groupId>
    <artifactId>carpet-record</artifactId>
    <version>0.4.0</version>
</dependency>
or using Gradle:
implementation 'com.jerolba:carpet-record:0.4.0'
Carpet includes only the essential transitive dependencies required for file read and write operations.
To serialize and deserialize Parquet files in your Java application, you just need Java records. You don't need to generate classes or inherit from Carpet classes.
record MyRecord(long id, String name, int size, double value, double percentile) { }
Carpet provides a writer and a reader with a default configuration and convenience methods.
Using reflection, Carpet derives the Parquet file schema from your record class and writes all the content of your objects into the file:
List<MyRecord> data = calculateDataToPersist();
try (OutputStream outputStream = new FileOutputStream("my_file.parquet")) {
    try (CarpetWriter<MyRecord> writer = new CarpetWriter<>(outputStream, MyRecord.class)) {
        writer.write(data);
    }
}
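The reflection step mentioned above can be pictured with the JDK alone. The sketch below is not Carpet's implementation and does not produce a Parquet schema; it only shows that a record's component names and types are available at runtime through `Class.getRecordComponents()`. The class `RecordSchemaSketch` and the method `describe` are hypothetical names.

```java
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: shows how record metadata is reachable through reflection.
// This is not Carpet's code, and the result is not a Parquet schema.
final class RecordSchemaSketch {

    record MyRecord(long id, String name, int size, double value, double percentile) { }

    // Returns component name -> Java type name, in declaration order.
    static Map<String, String> describe(Class<? extends Record> recordClass) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (RecordComponent component : recordClass.getRecordComponents()) {
            fields.put(component.getName(), component.getType().getSimpleName());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints {id=long, name=String, size=int, value=double, percentile=double}
        System.out.println(describe(MyRecord.class));
    }
}
```

Because the names and types come from the record itself, no separate schema definition has to be kept in sync with the class.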
To read a file, provide a File and a record class that matches the Parquet schema:
List<MyRecord> data = new CarpetReader<>(new File("my_file.parquet"), MyRecord.class).toList();
If you don't know the schema of the file, or a Map is a suitable representation, you can deserialize to Map<String, Object>:
List<Map> data = new CarpetReader<>(new File("my_file.parquet"), Map.class).toList();
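Rows read as maps are untyped, so values need a type check before use. The following sketch uses hand-built rows as a stand-in for the reader's output so that it runs without a Parquet file. It assumes the map keys are column names, and the value types shown (Long, String, Double) are assumptions rather than a statement of what Carpet returns. `GenericRowSketch` and `sumColumn` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

// Hand-built rows stand in for the reader's output so the sketch runs without a file.
final class GenericRowSketch {

    // Sums a numeric column, skipping rows where it is absent or not a number.
    static double sumColumn(List<Map<String, Object>> rows, String column) {
        double total = 0;
        for (Map<String, Object> row : rows) {
            if (row.get(column) instanceof Number number) {
                total += number.doubleValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.of("id", 1L, "name", "a", "value", 2.5),
                Map.of("id", 2L, "name", "b", "value", 4.0),
                Map.of("id", 3L, "name", "c")); // no "value" column in this row
        System.out.println(sumColumn(rows, "value")); // 6.5
    }
}
```

Checking with `instanceof` rather than casting keeps the code tolerant of missing columns and of numeric columns that arrive as different boxed types.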
Carpet offers a rich set of features for advanced scenarios. For detailed explanations, API references, and examples, please refer to our comprehensive documentation site.
Key advanced topics include:
- API Details
- Schema and Data Handling
- Configuration & Low-Level Access
To run the unit tests:
./gradlew test
To build the jars:
./gradlew assemble
The build runs in GitHub Actions.
Feel free to dive in! Open an issue or submit PRs.
All contributors and maintainers of this project follow the Contributor Covenant Code of Conduct.
Apache 2 © Jerónimo López