BlinkDB: Sub-Second Approximate Queries on Very Large Data.
Branch: alpha-0.2.0

BlinkDB

Queries with Bounded Errors and Bounded Response Times on Very Large Data

BlinkDB is a large-scale data warehouse system built on Shark and Spark and designed to be compatible with Apache Hive. It can answer HiveQL queries up to 200-300 times faster than Hive by executing them on user-specified samples of the data and returning approximate answers annotated with meaningful error bars. BlinkDB 0.1.0 is an alpha developer release. It supports creating and deleting samples on any input table or materialized view, and executing approximate HiveQL queries whose aggregates have closed-form error estimates (i.e., AVG, SUM, COUNT, VAR, and STDEV).
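The restriction to aggregates with closed-form error estimates is what lets an answer computed on a sample carry an error bar without extra passes over the data. The Scala sketch below illustrates that idea for AVG, SUM, and COUNT. It is not BlinkDB's API or its actual implementation: the object and method names are hypothetical, and it assumes a uniform random sample drawn without replacement, a known table size, and a normal approximation at roughly 95% confidence.

```scala
// Hypothetical sketch, not BlinkDB's API: estimates full-table AVG, SUM,
// and COUNT from a uniform random sample (drawn without replacement),
// with normal-approximation error bars from closed-form standard errors.
object ClosedFormEstimates {

  /** An estimate with a symmetric error bar: value +/- halfWidth. */
  case class Estimate(value: Double, halfWidth: Double)

  // z-value for an approximately 95% two-sided confidence interval.
  val Z95 = 1.96

  // Unbiased sample variance (divides by n - 1).
  private def sampleVariance(xs: Seq[Double]): Double = {
    val n = xs.size
    val mean = xs.sum / n
    xs.map(x => (x - mean) * (x - mean)).sum / (n - 1)
  }

  // Finite population correction for sampling n of N rows without replacement.
  private def fpc(n: Int, populationSize: Long): Double =
    math.sqrt((populationSize - n).toDouble / (populationSize - 1))

  /** AVG: sample mean with standard error s / sqrt(n), times the FPC. */
  def avg(sample: Seq[Double], populationSize: Long, z: Double = Z95): Estimate = {
    require(sample.size >= 2, "need at least two sampled rows")
    require(populationSize >= sample.size, "sample cannot exceed the table")
    val n = sample.size
    val se = math.sqrt(sampleVariance(sample) / n) * fpc(n, populationSize)
    Estimate(sample.sum / n, z * se)
  }

  /** SUM: table size times AVG; the error bar scales by the same factor. */
  def sum(sample: Seq[Double], populationSize: Long, z: Double = Z95): Estimate = {
    val a = avg(sample, populationSize, z)
    Estimate(a.value * populationSize, a.halfWidth * populationSize)
  }

  /** COUNT of rows matching a predicate: SUM over 0/1 indicator values. */
  def count(sample: Seq[Double], populationSize: Long,
            pred: Double => Boolean, z: Double = Z95): Estimate = {
    val indicators = sample.map(x => if (pred(x)) 1.0 else 0.0)
    sum(indicators, populationSize, z)
  }
}
```

For example, `ClosedFormEstimates.avg(Seq(1.0, 2.0, 3.0, 4.0, 5.0), 1000000L)` yields a mean of 3.0 with a half-width of about 1.39. SUM and COUNT reuse the AVG formula because both are scaled means, which is one reason such aggregates admit cheap closed-form error bars; VAR and STDEV also have closed forms but need higher sample moments and are omitted here.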

BlinkDB requires:

  • Scala 2.10.x
  • Spark 0.9.x

For current documentation, see the BlinkDB Wiki.

For more information about the BlinkDB Project, see the BlinkDB Website.
