Statistics Component

Fabian Wiebe edited this page Aug 11, 2017 · 2 revisions

Architecture

The statistics component is used by the query optimizer in Hyrise.

The statistics for a table stored in the storage manager can be accessed through a getter on the corresponding table. The storage manager attaches a table statistics object to a table when the table is added to the storage manager. Each table statistics object holds the column statistics for all columns of the table. Column statistics are created lazily, on first access.

Column statistics currently track only the minimum, maximum and distinct count of a column. These values are gathered on demand with the help of the aggregate operator.
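The lazy creation described above can be sketched roughly as follows. This is a minimal illustration, not the Hyrise code: the names ColumnStats, LazyTableStats and column_stats are hypothetical, columns are plain integer vectors, and standard algorithms stand in for the aggregate operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column's statistics; not the Hyrise class.
struct ColumnStats {
  int min;
  int max;
  std::size_t distinct_count;
};

// Hypothetical table statistics that builds column statistics on first access.
class LazyTableStats {
 public:
  explicit LazyTableStats(std::vector<std::vector<int>> columns) : _columns(std::move(columns)) {}

  const ColumnStats& column_stats(std::size_t column_id) {
    auto it = _cache.find(column_id);
    if (it == _cache.end()) {
      // First access: gather min, max and distinct count now and cache them.
      const std::vector<int>& values = _columns.at(column_id);
      assert(!values.empty());
      const auto min_max = std::minmax_element(values.begin(), values.end());
      const std::set<int> distinct(values.begin(), values.end());
      it = _cache.emplace(column_id, ColumnStats{*min_max.first, *min_max.second, distinct.size()}).first;
      ++computations;
    }
    return it->second;
  }

  std::size_t computations = 0;  // how often statistics were actually gathered

 private:
  std::vector<std::vector<int>> _columns;
  std::map<std::size_t, ColumnStats> _cache;
};
```

Repeated calls to column_stats for the same column reuse the cached entry, so the cost of gathering is paid only for columns that are actually queried.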

In the long run, the statistics component should offer a prediction of the result row count for each operator in Hyrise. The idea is that the statistics component offers the same interface for predicting the result size of an operator as the operator itself. Where an operator takes other operators as its table inputs, the statistics take other table statistics. The statistics therefore need to be nested in the same way as the operators in order to predict the final operator's output size.
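To make the nesting idea concrete, here is a small sketch under simplifying assumptions. TableStats, predicate_statistics and estimate_two_scans are hypothetical names for this illustration, and the predicate is reduced to an already known selectivity rather than a column, scan type and value.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified table statistics: only a row count is tracked.
class TableStats {
 public:
  explicit TableStats(double row_count) : _row_count(row_count) {}

  double row_count() const { return _row_count; }

  // Mirrors a table scan: the input is this statistics object instead of an
  // input operator, and the output is a new statistics object instead of a table.
  std::shared_ptr<TableStats> predicate_statistics(double selectivity) const {
    assert(selectivity >= 0.0 && selectivity <= 1.0);
    return std::make_shared<TableStats>(_row_count * selectivity);
  }

 private:
  double _row_count;
};

// Estimate for scan(scan(table)): the statistics calls nest like the operators.
inline double estimate_two_scans(const TableStats& base, double first_selectivity, double second_selectivity) {
  return base.predicate_statistics(first_selectivity)->predicate_statistics(second_selectivity)->row_count();
}
```

Because each call returns a new statistics object, the estimate for a whole operator chain falls out of chaining the calls in the same order as the operators.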

When looking into the code, start off at the table statistics header: table_statistics.hpp

Assumptions within statistics component

  • Uniform value distribution is assumed within a column.
  • No dependencies between different columns.

Currently, only the prediction for predicates (table scans) is implemented.
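The two assumptions lead to simple selectivity estimates. The sketch below shows one plausible way to derive them for an integer column. The struct and function names are hypothetical and the formulas are textbook estimates, not necessarily the exact ones used in Hyrise.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of an integer column: the three tracked values.
struct IntColumnStats {
  int min;
  int max;
  std::size_t distinct_count;
};

// column == value: with a uniform distribution, each distinct value is equally likely.
inline double equals_selectivity(const IntColumnStats& stats, int value) {
  assert(stats.distinct_count > 0);
  if (value < stats.min || value > stats.max) return 0.0;
  return 1.0 / static_cast<double>(stats.distinct_count);
}

// column < value: fraction of the integer range [min, max] that lies below value.
inline double less_than_selectivity(const IntColumnStats& stats, int value) {
  assert(stats.min <= stats.max);
  if (value <= stats.min) return 0.0;
  if (value > stats.max) return 1.0;
  const double range = static_cast<double>(stats.max) - stats.min + 1.0;
  return (static_cast<double>(value) - stats.min) / range;
}

// Predicates on different columns: independence lets the selectivities multiply.
inline double combined_selectivity(double first, double second) { return first * second; }
```

Multiplying the estimated selectivity by the input row count gives the predicted output row count of a table scan.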

Supported features in statistics for predicates

Column data type    Scan type                      AllParameterVariant type  Supported
int, float, double  ==, !=, <, <=, >, >=, between  AllTypeVariant            [x]
string              ==, !=                         AllTypeVariant            [x]
int, float, double  ==, !=, <, <=, >, >=, between  ValuePlaceholder          [x]
string              ==, !=, <, <=, >, >=, between  ValuePlaceholder          [x]
int, float, double  ==, !=, <, <=, >, >=           ColumnName                [x]
string              (none listed)                  ColumnName                [ ]

There is no support for between with ColumnName, because the table scan does not support it either. If a combination is not supported, a selectivity of 1 is assumed for the operation. This way the statistics component can handle all requests and does not fail on unimplemented types.
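The fallback behaviour might look roughly like this for a string column compared against a literal value. ScanKind and string_literal_selectivity are hypothetical names, and the != estimate (one minus the == estimate) is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of scan types, for illustration only.
enum class ScanKind { Equals, NotEquals, LessThan, Between };

// Hypothetical estimator for a string column compared against a literal value.
// Following the support table, only == and != get a real estimate here.
inline double string_literal_selectivity(ScanKind kind, std::size_t distinct_count) {
  assert(distinct_count > 0);  // empty columns are out of scope for this sketch
  const double equals = 1.0 / static_cast<double>(distinct_count);
  switch (kind) {
    case ScanKind::Equals:
      return equals;
    case ScanKind::NotEquals:
      return 1.0 - equals;
    default:
      // Unsupported combination: assume every row qualifies instead of failing.
      return 1.0;
  }
}
```

Returning 1 keeps the predicted row count equal to the input row count, so an unsupported predicate never makes the estimate smaller than the true result.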

Updating statistics

Statistics are currently not updated after inserts, updates or deletes.