Quantifying TPC-H Choke Points and Their Optimizations

Welcome to the online resources for our paper "Quantifying TPC-H Choke Points and Their Optimizations". In this repository, we share instructions and the code needed for reproducing the results presented in the paper. Additionally, we provide the raw benchmark results as generated by the hyriseBenchmarkTPCH binary.

For any questions about setting up the environment, reproducing the benchmarks, or just to discuss the paper as such, please feel free to contact us.

Setting up Hyrise

Hyrise can be retrieved from Github. Our Step-by-Step Guide will take you through the steps needed to set up Hyrise, starting the console, or running the TPC-H benchmark.

The patch files in this repository are based on the paper/tpch tag. For better reproducibility, that tag will not be updated with more recent Hyrise developments. To use Hyrise beyond the experiments presented here, please use the current master branch.

To execute a benchmark as presented in the paper, first generate baseline results using

./hyriseBenchmarkTPCH -s 10 -o baseline.json

Next, apply the provided patch file using git apply patchfile.diff in the repositories root folder. After recompiling, re-run the benchmarks and store the results in a different file. You can then compare two runs using the ./scripts/compare_benchmarks.py script.

Section 1. Motivation

We crawled Google Scholar using this script from Strobel and Hofmann. Raw data is contained in the downloadable repository.

Section 3. Experimental Setup

The performance breakdown of Hyrise operators (Figure 2) can be obtained by running the benchmark with the --visualize parameter and calling the plot_performance_breakdown.py script.
For re-running the Hyrise TPC-H Comparison shown in Figure 3, execute hyriseBenchmarkTPCH. The average time for each benchmark query is given on the console.
Instructions for executing TPC-H on MonetDB can be found here. We used MonetDB 5 server 11.35.9, which was obtained using Ubuntu's package manager.
DuckDB was built from source (70c20f28f) using a patch to enable SF 10.

Section 4. Plan-Level Choke Points

Please find the mentioned diff files and the raw results in the respective folder or linked below.

4.1 Join Ordering

Baseline: original_join_order.diff (raw data not linked here, see folder)
(1) DPccp: unmodified Hyrise code
(2) Greedy: greedy.diff

4.2 Predicate Placement and Ordering

Baseline: no_pushdown_no_ordering.diff
(1) Pushdown without ordering: pushdown_no_ordering.diff
(2) Pushdown and ordering: pushdown_ordering.diff

4.3 Between Composition

Baseline: no_between_composition.diff
(1) Enabled composition: unmodified Hyrise code

4.4 Join-Dependent Predicate Duplication

Baseline: no_duplication.diff
(1) Enabled duplication: unmodified Hyrise code

4.5 Physical Locality

Baseline: unsorted.diff
(1) Shuffled lineitem and orders tables: shuffled.diff
(2) Physical locality without DBMS optimizations: sorted_no_pruning.diff
(3) Physical locality with DBMS optimizations but without usage of correlations: sorted_pruning_no_correlation.diff

4.6 Correlated Columns

Baseline: same as 4.5 (3)
(1) Exploitation of correlation: unmodified Hyrise code

4.7 Flattening Subqueries

Baseline: no_flattening.diff
(1) With subquery flattening: unmodified Hyrise code

4.8 Semi Join Reduction

Baseline: no_reduction.diff
(1) With indiscriminate reduction: indiscriminate_reduction.diff
(2) With cardinality-based reduction: unmodified Hyrise code

4.9 Subplan Reuse

Baseline: no_reuse.diff
(1) With subplan reuse and enabled column pruning: unmodified Hyrise code
(2) With subplan reuse and disabled column pruning: no_column_pruning.diff

Section 5. Logical Operator Choke Points

5.1 Dependent Group-By Keys

Baseline: no_dependent_groupby_detection.diff
Automatic group-by detection: unmodified Hyrise code
Manual group-by detection for foreign keys: manual_groupby_detection.diff

5.2 Large IN Clauses

Baseline: no_in_reformulation.diff
Reformulation of IN clauses: unmodified Hyrise code

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
1_motivation		1_motivation
3_experimental_setup		3_experimental_setup
4_plan_level_choke_points		4_plan_level_choke_points
5_logical_operator_choke_points		5_logical_operator_choke_points
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1_motivation

1_motivation

3_experimental_setup

3_experimental_setup

4_plan_level_choke_points

4_plan_level_choke_points

5_logical_operator_choke_points

5_logical_operator_choke_points

README.md

README.md

Repository files navigation

Quantifying TPC-H Choke Points and Their Optimizations

Setting up Hyrise

Section 1. Motivation

Section 3. Experimental Setup

Section 4. Plan-Level Choke Points

4.1 Join Ordering

4.2 Predicate Placement and Ordering

4.3 Between Composition

4.4 Join-Dependent Predicate Duplication

4.5 Physical Locality

4.6 Correlated Columns

4.7 Flattening Subqueries

4.8 Semi Join Reduction

4.9 Subplan Reuse

Section 5. Logical Operator Choke Points

5.1 Dependent Group-By Keys

5.2 Large IN Clauses

About

Releases

Packages

mweisgut/tpch_paper

Folders and files

Latest commit

History

Repository files navigation

Quantifying TPC-H Choke Points and Their Optimizations

Setting up Hyrise

Section 1. Motivation

Section 3. Experimental Setup

Section 4. Plan-Level Choke Points

4.1 Join Ordering

4.2 Predicate Placement and Ordering

4.3 Between Composition

4.4 Join-Dependent Predicate Duplication

4.5 Physical Locality

4.6 Correlated Columns

4.7 Flattening Subqueries

4.8 Semi Join Reduction

4.9 Subplan Reuse

Section 5. Logical Operator Choke Points

5.1 Dependent Group-By Keys

5.2 Large IN Clauses

About

Resources

Stars

Watchers

Forks