l-wang/join-order-benchmark
Join Order Benchmark

Note: This repo is forked from the original Join Order Benchmark. The data sources referenced in the original README are no longer accessible. See the updated setup instructions below.

Quick Start

1. Download the data

The IMDB data can be downloaded from CedarDB's mirror: https://cedardb.com/docs/example_datasets/job

mkdir data && cd data
curl -OL https://bonsai.cedardb.com/job/imdb.tgz
tar -zxvf imdb.tgz
cd ..

2. Setup the database

Use setup.sh to create and populate the database:

# Without join stats
./setup.sh

# With explicit join stats (CREATE STATISTICS for joins) - recommended
./setup.sh -s explicit

# With implicit join stats (FK constraints + multi-column stats)
./setup.sh -s implicit

Results should be similar between explicit and implicit modes. If you just want to test with join stats, explicit is the more straightforward choice.

Options:

  • -d <dbname> - database name (default: imdb)
  • -s <mode> - join stats mode: none, implicit, or explicit (default: none)

3. Run benchmark queries

./run_queries.sh -o <output_dir> -n <on|off>

Options:

  • -o <dir> - output directory for EXPLAIN ANALYZE results (default: explain_results)
  • -n <on|off> - enable or disable nested loop joins via enable_nestloop GUC (default: on)

Example:

./run_queries.sh -o explain_results_run_0 -n on
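To compare plans with and without nested-loop joins, the two documented flags can be combined in a small wrapper (a sketch; the output-directory names are illustrative, and the leading echo makes this a dry run):

```shell
# Run the benchmark once with nested-loop joins enabled and once with
# them disabled, writing each run to its own output directory.
# Remove the echo to actually execute each run.
for nl in on off; do
    echo ./run_queries.sh -o "explain_results_nestloop_${nl}" -n "${nl}"
done
```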

Tips for benchmarking

For cold runs, restart the database and clear the OS page cache before each run (purge is macOS-specific; on Linux, run sync; echo 3 > /proc/sys/vm/drop_caches as root instead):

pg_ctl stop && sync && purge && pg_ctl start

Original README

This package contains the Join Order Benchmark (JOB) queries from:
"How Good Are Query Optimizers, Really?"
by Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, Thomas Neumann
PVLDB Volume 9, No. 3, 2015
http://www.vldb.org/pvldb/vol9/p204-leis.pdf

This repository is not maintained by the original authors of the Join Order Benchmark. The purpose is to ease the distribution of Join Order Benchmark queries (e.g., as a git submodule).

Please be aware that the queries assume the data set from the original paper (see below for the CWI link). The queries might yield unexpected results when used on the "Frozen Data Set" generated by the step-by-step instructions below.

IMDB Data Set

The CSV files used in the paper, which are from May 2013, can be found at http://event.cwi.nl/da/job/imdb.tgz

The license and links to the current version of the IMDB data set can be found at http://www.imdb.com/interfaces

Step-by-step instructions

  1. Download the *gz files (unpacking not necessary):
wget ftp://ftp.fu-berlin.de/misc/movies/database/frozendata/*gz
  2. Download and unpack imdbpy and the imdbpy2sql.py script:
wget https://bitbucket.org/alberanid/imdbpy/get/5.0.zip
  3. Create a PostgreSQL database (e.g., named imdbload):
createdb imdbload
  4. Transform the *gz files to the relational schema (takes a while):
imdbpy2sql.py -d PATH_TO_GZ_FILES -u postgres://username:password@hostname/imdbload

Now you should have a PostgreSQL database named imdbload containing the IMDB data. Note that this database has some secondary indexes (but not on all foreign-key attributes). You can export all tables to CSV:

\copy aka_name to 'PATH/aka_name.csv' csv
\copy aka_title to 'PATH/aka_title.csv' csv
\copy cast_info to 'PATH/cast_info.csv' csv
\copy char_name to 'PATH/char_name.csv' csv
\copy comp_cast_type to 'PATH/comp_cast_type.csv' csv
\copy company_name to 'PATH/company_name.csv' csv
\copy company_type to 'PATH/company_type.csv' csv
\copy complete_cast to 'PATH/complete_cast.csv' csv
\copy info_type to 'PATH/info_type.csv' csv
\copy keyword to 'PATH/keyword.csv' csv
\copy kind_type to 'PATH/kind_type.csv' csv
\copy link_type to 'PATH/link_type.csv' csv
\copy movie_companies to 'PATH/movie_companies.csv' csv
\copy movie_info to 'PATH/movie_info.csv' csv
\copy movie_info_idx to 'PATH/movie_info_idx.csv' csv
\copy movie_keyword to 'PATH/movie_keyword.csv' csv
\copy movie_link to 'PATH/movie_link.csv' csv
\copy name to 'PATH/name.csv' csv
\copy person_info to 'PATH/person_info.csv' csv
\copy role_type to 'PATH/role_type.csv' csv
\copy title to 'PATH/title.csv' csv

To import the CSV files into another database, create all tables (see schema.sql and optionally fkindexes.sql) and run the same \copy statements as above, replacing the keyword "to" with "from".
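Rather than editing all 21 statements by hand, the export list can be rewritten mechanically with sed (a sketch; the file names export.sql and import.sql are assumptions, and only two of the statements are shown here):

```shell
# Save the export statements to a file (in practice, paste all 21 from
# the list above), then rewrite each "to" export into a "from" import.
printf '%s\n' \
    "\copy aka_name to 'PATH/aka_name.csv' csv" \
    "\copy title to 'PATH/title.csv' csv" > export.sql
sed "s/ to '/ from '/" export.sql > import.sql
cat import.sql
```

The resulting import.sql can then be fed to psql against the target database.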

Questions

Contact Viktor Leis (leis@in.tum.de) if you have any questions.
