Note: This repo is forked from the original Join Order Benchmark. The data sources referenced in the original README are no longer accessible. See the updated setup instructions below.
The IMDB data can be downloaded from CedarDB's mirror: https://cedardb.com/docs/example_datasets/job
```sh
mkdir data && cd data
curl -OL https://bonsai.cedardb.com/job/imdb.tgz
tar -zxvf imdb.tgz
cd ..
```

Use `setup.sh` to create and populate the database:
```sh
# Without join stats
./setup.sh

# With explicit join stats (CREATE STATISTICS for joins) - recommended
./setup.sh -s explicit

# With implicit join stats (FK constraints + multi-column stats)
./setup.sh -s implicit
```

Results should be similar between the `explicit` and `implicit` modes. If you just want to test with join stats, `explicit` is the more straightforward choice.
Options:

- `-d <dbname>` - database name (default: `imdb`)
- `-s <mode>` - join stats mode: `none`, `implicit`, or `explicit` (default: `none`)
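As a rough illustration of what the `implicit` mode involves, here is a minimal sketch in plain PostgreSQL syntax. The table and column names come from the JOB schema; the constraint and statistics names are made up for this example, and the exact statements `setup.sh` issues may differ:

```sql
-- Foreign-key constraint: tells the optimizer about the join relationship
-- between movie_info.movie_id and title.id (JOB schema).
ALTER TABLE movie_info
    ADD CONSTRAINT movie_info_movie_id_fk
    FOREIGN KEY (movie_id) REFERENCES title (id);

-- Multi-column (extended) statistics on columns that are filtered or joined
-- together, so the planner can estimate their combined selectivity.
CREATE STATISTICS movie_info_stats (ndistinct, dependencies)
    ON movie_id, info_type_id FROM movie_info;

ANALYZE movie_info;  -- populate the new statistics
```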
Use `run_queries.sh` to run the benchmark queries:

```sh
./run_queries.sh -o <output_dir> -n <on|off>
```

Options:

- `-o <dir>` - output directory for EXPLAIN ANALYZE results (default: `explain_results`)
- `-n <on|off>` - enable or disable nested loop joins via the `enable_nestloop` GUC (default: `on`)
Example:

```sh
./run_queries.sh -o explain_results_run_0 -n on
```
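Under the hood, each query run boils down to an `EXPLAIN ANALYZE` issued through `psql`. A minimal sketch of one pass follows; the database name, query file locations, and output naming are assumptions about the repo layout, and the real script is authoritative:

```sh
#!/bin/sh
# Illustrative sketch of a single benchmark pass, not run_queries.sh itself.
# DB name, query path, and output layout are assumptions.
DB=imdb
OUT=explain_results
NESTLOOP=off   # corresponds to -n off

mkdir -p "$OUT"
for q in queries/*.sql; do
    name=$(basename "$q" .sql)
    # SET persists for the psql session, so the EXPLAIN in the second
    # -c command runs with enable_nestloop already applied.
    psql "$DB" \
        -c "SET enable_nestloop = $NESTLOOP;" \
        -c "EXPLAIN ANALYZE $(cat "$q")" \
        > "$OUT/$name.txt"
done
```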
For cold runs, restart the database and clear system caches before each run:

```sh
pg_ctl stop && sync && purge && pg_ctl start
```

Note that `purge` is macOS-specific; on Linux, drop the page cache by writing to `/proc/sys/vm/drop_caches` instead.

This package contains the Join Order Benchmark (JOB) queries from:
"How Good Are Query Optimizers, Really?"
by Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, Thomas Neumann
PVLDB Volume 9, No. 3, 2015
http://www.vldb.org/pvldb/vol9/p204-leis.pdf
This repository is not maintained by the original authors of the Join Order Benchmark. The purpose is to ease the distribution of Join Order Benchmark queries (e.g., as a git submodule).
Please be aware that the queries assume the data set from the original paper (see the CWI link below). The queries might yield unexpected results when used on the "Frozen Data Set" generated by the step-by-step instructions below.
The CSV files used in the paper, which are from May 2013, can be found at http://event.cwi.nl/da/job/imdb.tgz
The license and links to the current version of the IMDB data set can be found at http://www.imdb.com/interfaces
To generate the "Frozen Data Set", follow these step-by-step instructions:

- download the `*gz` files (unpacking is not necessary):

  ```sh
  wget ftp://ftp.fu-berlin.de/misc/movies/database/frozendata/*gz
  ```

- download and unpack `imdbpy` and the `imdbpy2sql.py` script:

  ```sh
  wget https://bitbucket.org/alberanid/imdbpy/get/5.0.zip
  ```

- create a PostgreSQL database (e.g., named imdbload):

  ```sh
  createdb imdbload
  ```

- transform the `*gz` files to the relational schema (takes a while):

  ```sh
  imdbpy2sql.py -d PATH_TO_GZ_FILES -u postgres://username:password@hostname/imdbload
  ```

Now you should have a PostgreSQL database named imdbload with the IMDB data. Note that this database has some secondary indexes (but not on all foreign key attributes). You can export all tables to CSV from within `psql`:
```
\copy aka_name to 'PATH/aka_name.csv' csv
\copy aka_title to 'PATH/aka_title.csv' csv
\copy cast_info to 'PATH/cast_info.csv' csv
\copy char_name to 'PATH/char_name.csv' csv
\copy comp_cast_type to 'PATH/comp_cast_type.csv' csv
\copy company_name to 'PATH/company_name.csv' csv
\copy company_type to 'PATH/company_type.csv' csv
\copy complete_cast to 'PATH/complete_cast.csv' csv
\copy info_type to 'PATH/info_type.csv' csv
\copy keyword to 'PATH/keyword.csv' csv
\copy kind_type to 'PATH/kind_type.csv' csv
\copy link_type to 'PATH/link_type.csv' csv
\copy movie_companies to 'PATH/movie_companies.csv' csv
\copy movie_info to 'PATH/movie_info.csv' csv
\copy movie_info_idx to 'PATH/movie_info_idx.csv' csv
\copy movie_keyword to 'PATH/movie_keyword.csv' csv
\copy movie_link to 'PATH/movie_link.csv' csv
\copy name to 'PATH/name.csv' csv
\copy person_info to 'PATH/person_info.csv' csv
\copy role_type to 'PATH/role_type.csv' csv
\copy title to 'PATH/title.csv' csv
```

To import the CSV files into another database, create all tables (see schema.sql and, optionally, fkindexes.sql) and run the same `\copy` statements as above, replacing the keyword `to` with `from`.
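For example, the import counterpart of the first export statement above, run from within `psql` in the target database:

```
\copy aka_name from 'PATH/aka_name.csv' csv
```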
Contact Viktor Leis (leis@in.tum.de) if you have any questions.