ComparisonToHyrise1

Markus Dreseler edited this page Nov 13, 2018 · 4 revisions

Comparison to the old version of Hyrise

On this page, we will refer to the old version as Hyrise1 and to the new version as Hyrise2. In the rest of this wiki, we simply use Hyrise to refer to the current version.

In 2016, we started to rebuild Hyrise from scratch. While we left most of its proven concepts in place, we wrote an entirely new codebase that will serve as the basis for our future research projects.

Reasons for the Rewrite

This effort was driven by a number of both functional and nonfunctional requirements:

  • The original code was developed at a time when modern C++ features like smart pointers and move semantics were not common. Integrating these features into an existing codebase is cumbersome and prone to errors. With the new version of Hyrise, we introduced C++17 features in order to improve readability, maintainability, and performance of the database.

  • Rewriting core data structures such as tables and internal data stores gave us the opportunity to apply lessons learned over the past years. We found that the high number of virtual method calls in the previous version of Hyrise caused a significant overhead during execution. This number has been considerably reduced in Hyrise2.

  • The design of Hyrise2 takes NUMA effects into account from the start and leaves more room for NUMA-related optimizations.

  • In the previous version of Hyrise, query plans had to be written by hand, creating and connecting physical operators using our own JSON format. This was a tiresome and error-prone process that resulted in long files that were hard to maintain.

  • For better reproducibility of our research results and to allow for the use of Hyrise in practical exercises of our database lectures, we established a simple setup process. Using the install script and explicit dependency management, Hyrise2 can be set up in three steps and in under ten minutes.

New Features

  • We have built a full SQL pipeline that starts with our own SQL Parser, translates the SQL query into an abstract syntax tree (AST), performs a number of rule-based optimizations, and returns an executable query plan.

TODO:

  • chunks
  • psql networking
  • plan visualization

Still missing

Most features of Hyrise1 are supported by Hyrise2 and many more have been added. In the following areas, we have not (yet) reimplemented some features that were discussed in prior publications:

Hybrid Column Layouts

One of the earliest research ideas evaluated on Hyrise1 was that of hybrid column layouts. The idea was that when certain columns are almost always accessed together, they should also be physically stored together. For example, think of the quantity of a product, which is often stored as a number and a unit (e.g., 15 kg). By storing these two values in the same cache line, they can be loaded and stored at the same time. Other, unrelated columns are still stored separately with the known benefits (and drawbacks) of the columnar layout.
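To illustrate the layout idea, the following C++ sketch keeps the amount and the unit of a quantity next to each other, while an unrelated column stays in its own vector. All names (Quantity, HybridChunk, total_amount_in_unit) are hypothetical and are not taken from Hyrise1 or Hyrise2; this is only a minimal sketch of the concept, not how Hyrise1 implemented it.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: the two values that are almost always accessed together.
struct Quantity {
  int32_t amount;
  int32_t unit_id;  // e.g., an id that stands for "kg"
};

// Hypothetical hybrid layout: one grouped pair of columns plus one plain column.
struct HybridChunk {
  std::vector<Quantity> quantities;  // amount and unit of a row are adjacent in memory
  std::vector<int64_t> product_ids;  // unrelated column, stored separately
};

// Sums all amounts that are given in one unit. Every iteration reads both
// values of a row from adjacent memory; product_ids is never touched.
int64_t total_amount_in_unit(const HybridChunk& chunk, int32_t unit_id) {
  int64_t total = 0;
  for (const auto& quantity : chunk.quantities) {
    if (quantity.unit_id == unit_id) total += quantity.amount;
  }
  return total;
}
```

A query that only needs product_ids still scans a dense vector, which is the columnar benefit mentioned above.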

Currently, we are not pursuing any further research into this area, which is why we have not yet implemented hybrid layouts in Hyrise2. From a conceptual point of view, this would be straightforward, as we can make use of our architecture for compressed columns. A simple approach would be to add a ProxySegment : public BaseEncodedSegment that shares physical storage with the other proxy segments in the same hybrid segment group. On the execution side, the proxy segment would behave just like any other encoded segment, so that operators do not need to change.
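The proxy segment approach could look roughly like the sketch below. Note that BaseEncodedSegment is reduced here to a hypothetical two-method stand-in (size and get); the actual Hyrise2 class has a different and larger interface. HybridGroupStorage and the row-major int32_t layout are likewise assumptions made only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the encoded segment interface (not the real Hyrise2 API).
class BaseEncodedSegment {
 public:
  virtual ~BaseEncodedSegment() = default;
  virtual std::size_t size() const = 0;
  virtual int32_t get(std::size_t row) const = 0;
};

// Hypothetical storage shared by all proxy segments of one hybrid segment group.
// Values are laid out row by row: row 0 col 0, row 0 col 1, row 1 col 0, ...
struct HybridGroupStorage {
  std::size_t column_count;
  std::vector<int32_t> values;
};

// Exposes one column of the shared storage through the regular segment interface.
class ProxySegment : public BaseEncodedSegment {
 public:
  ProxySegment(std::shared_ptr<const HybridGroupStorage> storage, std::size_t column_index)
      : _storage(std::move(storage)), _column_index(column_index) {}

  std::size_t size() const override { return _storage->values.size() / _storage->column_count; }

  int32_t get(std::size_t row) const override {
    return _storage->values[row * _storage->column_count + _column_index];
  }

 private:
  std::shared_ptr<const HybridGroupStorage> _storage;  // shared with the sibling proxies
  std::size_t _column_index;                           // position of this column within the group
};
```

Code that only knows the BaseEncodedSegment interface can read from a ProxySegment without being aware of the shared storage, which is the property that would keep operators unchanged.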

While the storage- and execution-layer implementations of hybrid columns could be done with relatively little effort, more steps would be needed for them to be meaningful. These include automatically deciding how to group columns, making the optimizer aware of hybrid columns, adding configuration options for import and export, supporting DDL commands, and more. In the spirit of keeping Hyrise2 a lightweight platform for research experiments, we will thus only revisit hybrid columns once new research ideas emerge.
