Sparser: Raw Filtering for Faster Analytics over Raw Data


This repository implements Sparser, which uses raw filtering to speed up analytics over raw data. Sparser can parse JSON, Avro, and Parquet data up to 22x faster than state-of-the-art parsers. For more details, see our paper published at VLDB 2018.
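To give a rough feel for what raw filtering means, here is a minimal sketch in C. It is not Sparser's API or its actual algorithm: the names `raw_contains`, `filter_then_parse`, `parse_fn`, and `stub_parse` are hypothetical, and the input is assumed to be newline-delimited records. The idea is to run a cheap substring search over the raw bytes of each record and hand only the matching records to the expensive parser. Such a filter can let false positives through (the parser still has to check them), but it never drops a record that contains the search string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the byte span [hay, hay + hay_len) contains needle, else 0.
 * The span does not need to be NUL-terminated. */
static int raw_contains(const char *hay, size_t hay_len, const char *needle) {
    assert(hay != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0) return 1;
    if (n > hay_len) return 0;
    const char *last = hay + (hay_len - n); /* last possible match start */
    for (const char *p = hay; p <= last; p++) {
        /* Jump to the next occurrence of the first needle byte. */
        p = memchr(p, needle[0], (size_t)(last - p) + 1);
        if (p == NULL) return 0;
        if (memcmp(p, needle, n) == 0) return 1;
    }
    return 0;
}

/* Hypothetical callback type standing in for a full (expensive) parser. */
typedef int (*parse_fn)(const char *record, size_t len);

/* Stub parser for illustration: it only counts how often it is called. */
static size_t parse_calls = 0;
static int stub_parse(const char *record, size_t len) {
    (void)record;
    (void)len;
    parse_calls++;
    return 1;
}

/* Walks newline-delimited records in buf and calls parse only on records
 * whose raw bytes contain needle. Returns the number of records parsed. */
static size_t filter_then_parse(const char *buf, size_t len,
                                const char *needle, parse_fn parse) {
    size_t passed = 0;
    const char *p = buf;
    const char *end = buf + len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t rec_len = nl ? (size_t)(nl - p) : (size_t)(end - p);
        if (rec_len > 0 && raw_contains(p, rec_len, needle)) {
            parse(p, rec_len); /* only survivors pay the parsing cost */
            passed++;
        }
        p = nl ? nl + 1 : end;
    }
    return passed;
}
```

Calling `filter_then_parse(buf, len, "alice", stub_parse)` on a buffer of JSON lines would invoke the parser only for lines containing the bytes `alice`, including lines where the match sits in an unrelated field. That is the false-positive case the full parser has to resolve.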

See the demo-repl directory for a brief example. To run it:

# update rapidjson submodule
git submodule init
git submodule update
cd demo-repl
./bench /path/to/large/file.json

Then enter 1 at the Sparser> prompt.

Sparser itself is just a header file and only depends on standard C libraries available on most systems.