The rewritten Hyrise, in contrast to its predecessor, offers an SQL interface. Typically, there is no need for research database systems to support the processing of SQL queries. In order to enable simpler benchmarking and also opening up opportunitites for research in the area of query optimization, we decided to include an SQL interface.
The SQL interface's modular architecture consists of four core components each producing intermediary results as shown in above figure. We describe each component in more detail in this article. The input to the interface is an SQL string and its output being a physical query plan that is executed by our execution engine. Throughout the next subsections we will look at the following example statement:
SELECT l_linenumber FROM lineitem WHERE l_shipdate >= '1994-01-01' AND l_quantity < 24;
The SQL parser transforms a given string into C++ data structures that form an abstract syntax tree (AST).
The parser was developed as a standalone module and is integrated into Hyrise as a git submodule. However, it could (and is) used independently in other environments. Further information and development documentation can be found at its GitHub repository.
The SQL translator takes the formerly produced AST and transforms it into a Logical Query Plan (LQP) thereby changing the abstraction level. A LQP representation is closer to the relational algebra as the aforementioned AST. LQPs are directed acyclic graphs consisting of LQPNodes. The nodes of an LQP represent operations like the application of predicates or joins. While LQPs contain information about which logical operations to apply to data they do not specifically state which physical operator should handle the data. An example can be seen in the following figure.
The optimizer reads and transforms the formerly produced LQP. During this phase rules and optimization strategies are applied. These can influence the order of nodes. Also, new nodes can be added and old nodes removed. The output of this step is a new LQP. Furthermore, some nodes might be enriched with more detailed information which come into play during the next step, LQP translation. Let's take a look at our example which contains multiple predicate nodes which means that multiple attributes are filtered. Currently, Hyrise can choose between a couple of alternatives to filter data: a full column scan, an index scan (if an index is present on the particular attribute) or a JIT scan. The optimizer determines which of these alternatives is most efficient to filter the data based on statistics and cost functions. As a result, the predicates are annotated with information that is later used to translate from logical to specific physical operators. The following figure shows an optimized LQP. In contrast to the first figure, the predicate nodes have been reordered for efficiency reasons because the predicate on
l_quantity is more selective.
The LQP translator takes a logical query plan and translates the contained nodes to physical operators, resulting in a so-called physical query plan (PQP). Physical operators are, in contrast to abstract logical operations, specific implementations. For example, join nodes from the LQP can in Hyrise's current development state be translated to either hash joins, sort-merge joins, or (index) nested loop joins. The PQP is handed to the execution engine where it is going to be executed by the scheduler. The following figure shows a physical query plan. It depicts also the effect of annotated LQP nodes since one of the predicate nodes got translated to an IndexScan.