snappycat

A command line tool to decompress snappy files produced by Hadoop.

The snappy files produced by Hadoop contain block headers written by BlockCompressorStream. Each block is prefixed with two 32-bit integers giving the size of the decompressed data and the size of the compressed data, respectively. Therefore, the snappy library cannot decompress these files directly. This command line utility handles the headers and decompresses the data without any dependency on the Hadoop libraries.
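
The block layout described above can be illustrated with a minimal Python sketch. It is not snappycat's actual implementation: the function name `read_blocks` is hypothetical, big-endian byte order is an assumption, and the sketch follows the simplified layout described here (one compressed chunk per pair of size fields). It only splits the stream into blocks and leaves decompression to a snappy library.

```python
import io
import struct


def read_blocks(stream):
    """Yield (decompressed_size, compressed_bytes) for each block.

    Assumes each block starts with two big-endian unsigned 32-bit
    integers: decompressed size, then compressed size (an assumption
    based on the description above, not on the snappycat source).
    """
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of input; an empty file yields nothing
        if len(header) < 8:
            raise ValueError("truncated block header")
        decompressed_size, compressed_size = struct.unpack(">II", header)
        payload = stream.read(compressed_size)
        if len(payload) < compressed_size:
            raise ValueError("truncated block payload")
        yield decompressed_size, payload


# Example with a hand-built stream; the payload bytes are placeholders,
# not real snappy data.
sample = struct.pack(">II", 5, 3) + b"abc"
blocks = list(read_blocks(io.BytesIO(sample)))
```

Each yielded payload could then be passed to a raw snappy decompressor, and the result checked against the recorded decompressed size.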

Build

snappycat requires the snappy library to build.

To install it under Ubuntu, run:

sudo apt-get install libsnappy-dev

Usage

List the files to decompress as arguments; snappycat writes the concatenated results to standard output.

./snappycat DIRECTORY/*.snappy

When no arguments are given, standard input is used as the source of the compressed data.

cat DIRECTORY/*.snappy | ./snappycat

snappycat handles the input correctly even if some files contain no records.

To save the output as a file:

./snappycat DIRECTORY/*.snappy > output.txt