SQL on S3

A data engineering project that compares distributed SQL processing frameworks.

Provides timings for basic SQL queries (aggregations and filters) and a self-join on the Reddit comments dataset, converted from JSON to Parquet.
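To make the three query shapes concrete, here is a minimal sketch of a timing harness. It is not the project's benchmark code: it uses an in-memory SQLite database from Python's standard library as a stand-in for Presto, Spark, or Drill, and the table name `comments`, the columns (`id`, `parent_id`, `author`, `subreddit`, `score`), and the five toy rows are assumptions made up for illustration. The timings it prints say nothing about the engines compared here.

```python
import sqlite3
import time

# Toy rows standing in for the Reddit comments data.
# Column names and values are assumptions for illustration only.
ROWS = [
    ("c1", None, "alice", "python", 10),
    ("c2", "c1", "bob", "python", 3),
    ("c3", "c1", "carol", "python", -1),
    ("c4", None, "bob", "sql", 7),
    ("c5", "c4", "alice", "sql", 2),
]

# One example per query shape named above: aggregation, filter, self-join.
QUERIES = {
    "aggregation": (
        "SELECT subreddit, COUNT(*) AS n, AVG(score) AS avg_score "
        "FROM comments GROUP BY subreddit ORDER BY subreddit"
    ),
    "filter": "SELECT id FROM comments WHERE score > 5 ORDER BY id",
    "self_join": (
        "SELECT p.id, COUNT(c.id) AS replies "
        "FROM comments p JOIN comments c ON c.parent_id = p.id "
        "GROUP BY p.id ORDER BY p.id"
    ),
}


def build_demo_db():
    """Create an in-memory SQLite database holding the toy comments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        "id TEXT, parent_id TEXT, author TEXT, subreddit TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def run_timed(conn, sql):
    """Run one query, fetch all rows, and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_demo_db()
    for name, sql in QUERIES.items():
        rows, elapsed = run_timed(conn, sql)
        print(f"{name}: {len(rows)} rows in {elapsed:.6f}s")
```

The harness fetches every result row before stopping the clock, so the measurement covers the whole query rather than only its submission. The same query strings could in principle be pointed at another engine through its own client, though dialect differences may require small changes.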

Requirements

Tested with:
Presto 0.136
Spark 1.5.2
Drill 1.4
Hive 1.2.1 (optional)
