The Join Order Benchmark (JOB) is a set of queries designed to evaluate the performance of database systems. It is meant to be executed against data from IMDb, I have been using the benchmark during my PhD to evaluate query optimisation methods. This repository contains a simple setup for building a PostgreSQL database instance with all the necessary data inside a Docker container. You will then be able to connect to the database via localhost
. A pgweb instance is also made available for querying the data.
You first need to install the community edition of Docker; instructions can be found here.
The docker-compose.yml
file contains the details of the Docker instance that will be built. You can build it using the following command.
docker-compose build
You may then start the services. This will launch PostgreSQL, as well as the pgweb interface.
docker-compose up -d
You now have to provision the database with the IMDB data. This takes a few hours but thankfully you only have to do it once.
docker-compose run --rm job bash setup.sh
You can now access the pgweb interface by navigating to localhost:8081. You can stop the services once you are done.
docker-compose down
The connection details for the PostgreSQL instance are as follows:
- Host:
localhost
- Port:
5432
- Database name:
job
- User name:
postgres
- Password:
postgres
Here is a connection example using SQLAlchemy under Python 3:
import sqlalchemy
from sqlalchemy import orm
uri = 'postgresql://postgres:postgres@localhost:5432/job'
engine = sqlalchemy.create_engine(uri)
session = orm.sessionmaker(bind=engine)()
rows = session.execute('SELECT COUNT(*) FROM cast_info')
You can now run any of the JOB queries available from here. Feel free to get in touch with me at maxhalford25@gmail.com if you have any questions; or even better open an issue.