GitHub - mindis/squall: A streaming / online query processing / analytics engine based on Apache Storm

# Squall

Squall is an online query processing engine built on top of Storm. Similar to how Hive provides SQL syntax on top of Hadoop for batch processing, Squall executes SQL queries on top of Storm for online processing. Squall supports a wide class of SQL analytics, ranging from simple aggregations to more advanced UDF join predicates and adaptive rebalancing of load. It is being actively developed by several contributors from the EPFL DATA lab. Squall is under continuous development; it currently supports the following:

Example:

Consider the following SQL query:

    SELECT C_MKTSEGMENT, COUNT(O_ORDERKEY)
    FROM CUSTOMER JOIN ORDERS ON C_CUSTKEY = O_CUSTKEY
    GROUP BY C_MKTSEGMENT
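
To make the meaning of this query concrete, here is a minimal plain-Java sketch that evaluates the same join and grouped count over small in-memory lists. It does not use Squall or Storm; the class name, the row layout, and the sample rows are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java collections, no Squall or Storm classes.
final class SegmentOrderCount {

    // customers: rows of {C_CUSTKEY, C_MKTSEGMENT}; C_CUSTKEY is assumed unique.
    // orders:    rows of {O_ORDERKEY, O_CUSTKEY}; O_ORDERKEY is assumed non-null.
    static Map<String, Long> countOrdersBySegment(List<String[]> customers,
                                                  List<String[]> orders) {
        // Build side of the join: C_CUSTKEY -> C_MKTSEGMENT.
        Map<String, String> segmentByCustKey = new HashMap<>();
        for (String[] customer : customers) {
            segmentByCustKey.put(customer[0], customer[1]);
        }
        // Probe side: every matching order adds one to its customer's segment.
        Map<String, Long> counts = new TreeMap<>();
        for (String[] order : orders) {
            String segment = segmentByCustKey.get(order[1]);
            if (segment != null) { // inner join: unmatched orders are dropped
                counts.merge(segment, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> customers = Arrays.asList(
                new String[] {"1", "BUILDING"},
                new String[] {"2", "MACHINERY"},
                new String[] {"3", "BUILDING"});
        List<String[]> orders = Arrays.asList(
                new String[] {"100", "1"},
                new String[] {"101", "1"},
                new String[] {"102", "2"},
                new String[] {"103", "3"},
                new String[] {"104", "9"}); // no such customer
        System.out.println(countOrdersBySegment(customers, orders));
        // prints {BUILDING=3, MACHINERY=1}
    }
}
```

Unlike this batch-style sketch, an online engine sees the rows arrive as streams and updates the counts incrementally.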

We provide several interfaces for running this query:

Declarative

A Declarative interface that directly parses this SQL query and creates an efficient Storm topology. This module comes with a built-in cost-based optimizer.

Functional

A Functional Scala interface that leverages the brevity, productivity, convenience, and syntactic sugar of functional programming. For example, the previous query is represented (full code) as follows:

    val customers = Source[customer]("customer").map { t => Tuple2(t._1, t._7) }
    val orders = Source[orders]("orders").map { t => t._2 }
    val join = customers.join(orders)(k1=> k1._1)(k2 => k2) //key1=key2
    val agg = join.groupByKey(x => 1, k => k._1._2) //count and groupby
    agg.execute(conf)

Imperative

An Imperative Java interface that facilitates the design and construction of online distributed query plans. For example, the previous query is represented (full code) as follows:

    Component customer = new DataSourceComponent("customer", conf)
                                .add(new ProjectOperator(0, 6));
    Component orders = new DataSourceComponent("orders", conf)
                                .add(new ProjectOperator(1));
    Component custOrders = new EquiJoinComponent(customer, 0, orders, 0) //key1 (index 0) = key2 (index 0)
                                .add(new AggregateCountOperator(conf).setGroupByColumns(1));

Queries are mapped to operator trees in the spirit of the query plans of relational database systems. These are in turn mapped to Storm workers. (There is a parallel implementation of each operator, so in general an operator is processed by multiple workers.) Some operations of relational algebra, such as selections and projections, are quite simple, and assigning them to separate workers is inefficient. Rather than requiring the predecessor operator to send its output over the network to the workers implementing these simple operations, the simple operations can be integrated into the predecessor operators and postprocess the output there. This is typically also done in classical relational database systems, but in a distributed environment the benefits are even greater.

In the Squall API, query plans are built bottom-up from operators (called components or super-operators) such as data source scans and joins; these components can then be extended by postprocessing operators such as projections.
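
The idea of folding simple operators into their predecessor can be sketched in plain Java. This is a hypothetical illustration, not Squall's implementation: `FusedComponent`, `select`, and `project` are names invented here, and a tuple is modeled as a `List<String>`. The chain runs locally on each tuple the component produces, so no extra network hop is needed for the selection or the projection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of operator fusion; these are not Squall's classes.
final class FusedComponent {
    // Each operator maps a tuple to a transformed tuple, or to empty to drop it.
    private final List<Function<List<String>, Optional<List<String>>>> chain = new ArrayList<>();

    FusedComponent add(Function<List<String>, Optional<List<String>>> op) {
        chain.add(op);
        return this; // allows component.add(...).add(...)
    }

    // Runs the whole chain in-process on one tuple the component produced.
    Optional<List<String>> postprocess(List<String> tuple) {
        Optional<List<String>> current = Optional.of(tuple);
        for (Function<List<String>, Optional<List<String>>> op : chain) {
            if (!current.isPresent()) {
                break; // an earlier selection dropped the tuple
            }
            current = op.apply(current.get());
        }
        return current;
    }

    // Projection: keep only the listed column indexes, in the given order.
    static Function<List<String>, Optional<List<String>>> project(int... columns) {
        return tuple -> {
            List<String> out = new ArrayList<>();
            for (int column : columns) {
                out.add(tuple.get(column));
            }
            return Optional.of(out);
        };
    }

    // Selection: keep the tuple only if the column equals the given value.
    static Function<List<String>, Optional<List<String>>> select(int column, String value) {
        return tuple -> value.equals(tuple.get(column))
                ? Optional.of(tuple)
                : Optional.<List<String>>empty();
    }

    public static void main(String[] args) {
        FusedComponent customer = new FusedComponent()
                .add(select(2, "BUILDING"))
                .add(project(0, 2));
        System.out.println(customer.postprocess(Arrays.asList("7", "Alice", "BUILDING")));
        // prints Optional[[7, BUILDING]]
        System.out.println(customer.postprocess(Arrays.asList("8", "Bob", "MACHINERY")));
        // prints Optional.empty
    }
}
```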

Window Semantics Example

Squall also provides out-of-the-box functionality for window semantics. That is, the user does not have to be concerned with the internal details of assigning timestamps, data distribution, state maintenance, or result consistency and correctness. Final results and aggregations are stored in key-value stores that expose window identifiers and the corresponding timestamp ranges. The interface exposes the following semantics:

Sliding Window Semantics:

    //Examples
    Agg.onWindow(20, 5) //Range 20 secs and slide every 5 seconds
    Join.onSlidingWindow(10) // Range 10 seconds and slide every 1 second
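
As a rough illustration of what a range and a slide imply, the sketch below lists the sliding windows a single timestamp falls into. It is not Squall code: the helper name is invented, and it assumes windows start at multiples of the slide, cover `[start, start + range)`, and that timestamps are non-negative seconds. How Squall actually aligns and identifies windows is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Squall: lists the start times (in seconds)
// of every sliding window that contains a given timestamp.
final class SlidingWindows {

    // Assumes windows start at multiples of `slide` and cover [start, start + range).
    static List<Long> windowStarts(long timestamp, long range, long slide) {
        if (range <= 0 || slide <= 0 || timestamp < 0) {
            throw new IllegalArgumentException(
                    "range and slide must be positive, timestamp non-negative");
        }
        List<Long> starts = new ArrayList<>();
        // Latest window that has already opened at this timestamp.
        long latestStart = timestamp - (timestamp % slide);
        // Walk back one slide at a time while the window still covers the timestamp.
        for (long start = latestStart; start > timestamp - range && start >= 0; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(windowStarts(23, 20, 5)); // prints [20, 15, 10, 5]
        System.out.println(windowStarts(3, 10, 1));  // prints [3, 2, 1, 0]
    }
}
```

With a range of 20 and a slide of 5, a tuple generally belongs to 20 / 5 = 4 overlapping windows, which is why sliding windows need more state than tumbling ones.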

Tumbling Window Semantics:

    //Examples
    Agg.onTumblingWindow(20) // Tumble aggregations every 20 seconds
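
A tumbling aggregation can be pictured as a count kept per non-overlapping window, stored under a window identifier that maps back to a timestamp range. The sketch below is a plain-Java assumption of that idea, not Squall's implementation; the class and method names are invented, and window ids are taken to be `timestamp / size` for non-negative timestamps in seconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Squall's implementation: a count kept per tumbling
// window in a key-value map keyed by window identifier.
final class TumblingCount {
    private final long size; // window length in seconds
    private final Map<Long, Long> countByWindowId = new TreeMap<>();

    TumblingCount(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    // Non-negative timestamps are assumed, so integer division yields the window id.
    void add(long timestamp) {
        countByWindowId.merge(timestamp / size, 1L, Long::sum);
    }

    // Timestamp range [start, end) covered by a window id.
    long[] rangeOf(long windowId) {
        return new long[] {windowId * size, (windowId + 1) * size};
    }

    Map<Long, Long> results() {
        return countByWindowId;
    }

    public static void main(String[] args) {
        TumblingCount counts = new TumblingCount(20);
        for (long ts : new long[] {1, 19, 20, 45}) {
            counts.add(ts);
        }
        System.out.println(counts.results()); // prints {0=2, 1=1, 2=1}
    }
}
```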

Landmark Window Semantics.

Here is an example of a fully running query with window semantics.

Documentation

Detailed documentation can be found on the Squall wiki.

Contributing to Squall

We'd love your help in making Squall better. If you're interested, please send us your suggestions and get your name added to the Contributors list.

License

Squall is licensed under Apache License v2.0.
