RPM Add Shard-Query Sep 14, 2013
bin last fixes to provisioner. added starting message to other workers Nov 17, 2014
bundle DooPHP HTTP Digest Fix Aug 6, 2014
doc Add Shard-Query Sep 14, 2013
include Fix problem with ICE + limit 0 Jun 12, 2016
log Need log dir Apr 7, 2014
proxy Add Shard-Query Sep 14, 2013
testsuite Add window function tests Sep 22, 2014
tools Fix QGEN and add SQ client Apr 28, 2015
ui fix user.inc path May 6, 2014
INFOBRIGHT_README Add special infobright message Feb 17, 2014
INSTALL
README.md Typo fix May 6, 2014
Vagrantfile Make vagrant multi-core and increase memory size Oct 12, 2013
bootstrap.ini.example update example config Apr 19, 2014
install_config_repo.php Add support for storage engine selection by editing one line in .sql … Feb 13, 2014
point_at_config_repo.php
provision.sh last fixes to provisioner. added starting message to other workers Nov 17, 2014
setup_virtual_schema.php - added 'batch' option suport to common.php and setup_virtual_schema.php Nov 13, 2014
shard_query.sql
sq_helper.sql

README.md

Shard-Query: Easy to use massively parallel processing OLAP scale-out (grid computing) for MySQL

Shard-Query is a high-performance MySQL query engine which offers increased parallelism compared to stand-alone MySQL. This increased parallelism is achieved by taking advantage of MySQL [partitioning](http://dev.mysql.com/doc/refman/5.5/en/partitioning.html), MySQL sharding, common MySQL query clauses like BETWEEN and IN, or some combination of the above.

The primary goal of Shard-Query is to enable low-latency query access to extremely large volumes of data using commodity hardware and open source database software. Shard-Query is a federated query engine designed to perform as much work in parallel as possible over a sharded MySQL dataset, that is, one that is split over multiple servers (shards) or partitioned tables.

###What kind of interfaces does Shard-Query have?

  • A RESTful UI which allows you to submit queries and examine results as well as configure Shard-Query
  • A MySQL proxy script
  • A PHP Object Oriented interface

###What kind of queries are supported?

  • You can run just about all SQL queries over your dataset:
  • For SELECT queries:
    • All aggregate functions are supported.
      • SUM,COUNT,MIN,MAX,AVG,STD,VAR are the fastest aggregate operations
      • SUM/COUNT/AVG(DISTINCT ..) are supported, but are slower
      • Custom aggregate functions are also supported.
        • PERCENTILE(expr, N) - take a percentile, for example percentile(score,90)
    • JOINs are supported (unsharded tables are duplicated on all nodes to support JOINs)
    • ORDER BY, GROUP BY, HAVING, WITH ROLLUP, and LIMIT are supported
  • Also supports INSERT, UPDATE, DELETE
  • Also supports DDL such as CREATE TABLE, ALTER TABLE and DROP TABLE
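The list above notes that plain aggregates are the fastest and that the DISTINCT variants are slower. One plausible reason, which is an assumption here and not something this README states, is that plain aggregates can be merged from small per-shard partial results, while DISTINCT aggregates need the distinct values themselves to be brought together. The Python sketch below illustrates that idea with in-memory lists standing in for shards; `Partial`, `shard_partial`, and `merge` are hypothetical names and are not part of Shard-Query.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate state computed by one shard (hypothetical structure)."""
    total: float         # SUM(x) over this shard's rows
    count: int           # COUNT(x) over this shard's rows
    lo: float            # MIN(x)
    hi: float            # MAX(x)
    distinct: frozenset  # distinct values of x, only needed for DISTINCT aggregates


def shard_partial(values):
    """Compute the partial state for one shard. Assumes the shard has at least one row."""
    return Partial(sum(values), len(values), min(values), max(values), frozenset(values))


def merge(partials):
    """Combine per-shard partial states into final aggregate results."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    # DISTINCT cannot be merged from counts alone: the value sets must be unioned.
    distinct = frozenset().union(*(p.distinct for p in partials))
    return {
        "SUM": total,
        "COUNT": count,
        "AVG": total / count,  # AVG is derived from the merged SUM and COUNT
        "MIN": min(p.lo for p in partials),
        "MAX": max(p.hi for p in partials),
        "COUNT_DISTINCT": len(distinct),
    }


# Three in-memory "shards" standing in for real servers.
shards = [[1, 2, 2], [2, 3], [10]]
result = merge([shard_partial(rows) for rows in shards])
```

In this sketch each shard returns a handful of numbers for SUM, COUNT, MIN, and MAX, but a full set of values for the DISTINCT case, which is why the latter moves more data to the coordinating node.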

###Key Features

  • MPP - distributed query engine runs fragments of queries in parallel, combining the results at the end.
  • Supports almost all MySQL features
  • Virtual Schema - All shards are treated as one virtual database.
  • Automatic Sharding
  • Massively parallel loader
  • Shard Elimination - When possible, Shard-Query sends queries only to the shards containing the requested data.
  • Shared Nothing Architecture - Aggregation, joins and filtering are always performed at the shard level, which fully distributes the work.
  • Works similarly to map/reduce, except that it understands complex SQL.
  • Supports asynchronous queries for long running jobs
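Shard elimination, as described in the list above, can be pictured with a small routing function. The Python sketch below assumes an integer shard key mapped to shards by a simple modulo; that mapping, the cluster size, and the names `shard_for` and `target_shards` are all illustrative assumptions, not Shard-Query's actual implementation.

```python
NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key):
    """Map an integer shard-key value to a shard number (assumed modulo scheme)."""
    return key % NUM_SHARDS


def target_shards(shard_key_values=None):
    """Return the shard numbers a query has to visit.

    If the WHERE clause pins the shard key to known values, only the shards
    owning those values are contacted. Otherwise the query is broadcast to
    every shard.
    """
    if not shard_key_values:
        return list(range(NUM_SHARDS))
    return sorted({shard_for(v) for v in shard_key_values})


single = target_shards([5])         # e.g. WHERE customer_id = 5
several = target_shards([1, 5, 6])  # e.g. WHERE customer_id IN (1, 5, 6)
broadcast = target_shards()         # no predicate on the shard key
```

With these assumptions, a query that filters on one shard-key value touches a single shard, while a query without such a filter fans out to all of them.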

###Massively Parallel Query

The following SQL features enable parallel query execution:

  • Data level parallelism

    • partitioning
    • sharding
  • Operator level

    • UNION
    • UNION ALL
    • IN clauses
    • BETWEEN (with integer or date operands)
    • subqueries in the FROM clause
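To illustrate how a BETWEEN predicate with integer operands can give rise to parallel work, the Python sketch below splits one range into contiguous sub-ranges, evaluates each sub-range in its own worker, and adds up the partial counts. This is a conceptual sketch only: the helper names, the even-split strategy, and the in-memory `rows` list are assumptions made for illustration, and the README does not say how Shard-Query chooses its sub-ranges.

```python
from concurrent.futures import ThreadPoolExecutor


def split_between(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into at most `parts` contiguous sub-ranges."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    size = hi - lo + 1
    parts = max(1, min(parts, size))
    base, extra = divmod(size, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` sub-ranges are one element wider to absorb the remainder.
        width = base + (1 if i < extra else 0)
        ranges.append((start, start + width - 1))
        start += width
    return ranges


rows = list(range(1, 101))  # stand-in for an integer column


def count_between(bounds):
    """Emulates SELECT COUNT(*) ... WHERE col BETWEEN a AND b on the stand-in data."""
    a, b = bounds
    return sum(1 for v in rows if a <= v <= b)


# One logical query (BETWEEN 10 AND 59) becomes four smaller ones run concurrently.
sub_ranges = split_between(10, 59, 4)
with ThreadPoolExecutor() as pool:
    total = sum(pool.map(count_between, sub_ranges))
```

Because the sub-ranges are disjoint and together cover the original range, summing the partial counts gives the same answer as the single unsplit query. An IN list could be divided in the same spirit, one group of values per worker.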