ed5d423 Mar 8, 2018
@sandy-may @grbinho @aloneguid @dominiqueplante

Getting started with Parquet development

If you are completely new to Parquet, we recommend starting with these documents:

Encodings and types

If you are looking for a description of Parquet encodings, please follow this link.

To understand how Parquet represents rich logical types, read this.

Reference implementations

There are already working implementations in other languages. We find them useful for checking that we are doing things right, or when we are stuck understanding how a particular feature is supposed to work.

parquet-format is the official specification repository. It contains the Thrift definitions for the data structures within a Parquet file. This spec is referenced by any library that implements Parquet.
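As a small illustration of the file layout the specification describes, here is a minimal Python sketch that locates the Thrift-encoded file metadata (the footer) in an unencrypted Parquet file. It relies only on the layout from the spec: the file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. The function name `locate_footer` is ours and purely illustrative; decoding the Thrift structure itself is left out.

```python
import struct

MAGIC = b"PAR1"  # 4-byte magic at both ends of an unencrypted Parquet file


def locate_footer(data: bytes):
    """Return (offset, length) of the Thrift-encoded file metadata.

    `data` is the full content of a Parquet file. Hypothetical helper,
    not part of any library.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic")
    # Footer length is a 4-byte little-endian integer just before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length exceeds file size")
    return footer_start, footer_len
```

The bytes in `data[offset:offset + length]` are what a Thrift compact-protocol reader would then decode using the definitions from the specification repository.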

parquet-mr is the official Java implementation. It is somewhat over-engineered, but it is the most stable.

fastparquet is probably the best implementation for Python, and it is extremely easy to follow. It was also our library of choice for working with the Parquet format (before parquet-dotnet was created, of course :) ).

parquet-cpp is an awful implementation in C++ that struggles with both code quality and compatibility. We would not recommend looking at it if you are new to Parquet.

3rd Party Libraries

Snappy Sharp is used to compress and decompress data with the Snappy algorithm.