Hadoop MapReduce tool to convert Avro data files to Parquet format.
Latest commit 11f24f9 @laserson Added readme


Hadoop MapReduce program to convert Avro data files to Parquet format.


git clone
cd avro2parquet
mvn clean package

This will generate the jar files in the target/ directory.


This tool works on Avro container files (which I believe is just the standard Avro data file format). The input is read with the Avro GenericRecord objects as the key and a NullWritable as the value.
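
If it is unclear whether an input file really is an Avro container file, a quick check of its header can help. The sketch below is not part of this tool; the function name `is_avro_container` is made up for illustration. It relies only on the fact that the Avro specification starts every object container file with the four bytes `O`, `b`, `j`, followed by a byte with value 1.

```shell
# Hypothetical helper, not part of avro2parquet.
# Reads a file on stdin and succeeds if it starts with the Avro
# object container magic bytes: 'O' 'b' 'j' 0x01.
is_avro_container() {
  # Dump the first 4 bytes as hex and strip od's spacing and newline.
  magic=$(od -An -tx1 -N4 | tr -d ' \n')
  [ "$magic" = "4f626a01" ]
}
```

Because it reads from stdin, it can be used as `is_avro_container < some-local-file`, or fed from HDFS by piping the output of `hadoop fs -cat` for a single data file into it. It only inspects the header, so it does not validate the schema or the data blocks.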

The tool is currently hardcoded to output Snappy-compressed Parquet. It is simply a MapReduce job using the Tool interface.

The command looks like this:

hadoop jar <avro2parquet jar file> \ \
<and generic options to the JVM> \
hdfs:///path/to/avro/schema.avsc \
hdfs:///path/to/avro/data \

For example:

hadoop jar avro2parquet-0.1.0-jar-with-dependencies.jar \ \
-D \
hdfs:///user/lasersou/schemas/data.avsc \
hdfs:///user/lasersou/data \