Scalable Package Builder System

This project contains the code for running experiments with the DIRECT and SKETCHREFINE algorithms presented in the paper:

Matteo Brucato, Juan Felipe Beltran, Azza Abouzied, Alexandra Meliou: Scalable Package Queries in Relational Database Systems. PVLDB 9(7): 576-587 (2016)

Contact Author: Matteo Brucato
Webpage: https://people.cs.umass.edu/~matteo/

Quick Setup

1. Run make from the root of this project (the folder that contains bin, etc.).
2. Install PostgreSQL and create a database that contains your input tables.
3. Execute the SQL script "scripts/prepare_data_db.sql".
4. Set an environment variable called PB_HOME that points to the root of this project, and export it.
5. Modify the file "cfg/example.cfg" (or create a new, similar file) to reflect your database connection. Concentrate on [Data DB] (all settings) and on "dbms_folder" under [Folders]. You don't need to modify the other settings.
6. Run bin/pb set cfg/example.cfg, or pass the settings file that you created in step 5 instead.
7. Create a file containing your PaQL query. Suppose this file is called "query.paql".
8. To solve the query using DIRECT, run:

    bin/pb exprun paql_eval direct -q query.paql

9. To solve the query using SKETCHREFINE, run:

    bin/pb exprun paql_eval sketchrefine -q query.paql -a* -C .10

   Here -a* means "partition the dataset on all of the query attributes" and -C .10 means "keep partitioning until each partition is no more than 10% of the input dataset size".

   The first time you run this command, the system first partitions the dataset. When you re-run SKETCHREFINE with the same options, the partitioning phase is not performed again: the system detects whether the dataset is already partitioned in the correct way. You can always bypass this automatic check by using the option --already-partitioned.
10. Read "src/experiments/paql_eval/sketchrefine.py" to learn the other command-line options available for SKETCHREFINE, and "src/experiments/paql_eval/main.py" for the command-line options shared by DIRECT and SKETCHREFINE. For instance, you can list the exact partitioning columns you want to use, an absolute maximum partition size, an epsilon value for the quality guarantee, time and memory limits, etc.
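
The DIRECT and SKETCHREFINE invocations above can be wrapped in a small script. The following is only a sketch, not part of the project: the function names (pb_run, solve_direct, solve_sketchrefine) and the DRY_RUN switch are hypothetical, and defaulting PB_HOME to the current directory is an assumption. It assumes you run it from the project root. With DRY_RUN=1 (the default here) it prints each command instead of executing it, so you can check the command lines first. Note that -a* is quoted: an unquoted * is subject to shell pathname expansion, and some shells (zsh, by default) reject an unmatched pattern.

```shell
#!/bin/sh
# Dry-run sketch of the Quick Setup commands; set DRY_RUN=0 to execute them.
# Assumption: PB_HOME falls back to the current directory when it is unset.
export PB_HOME="${PB_HOME:-$PWD}"

# Print the command when DRY_RUN is 1 (the default); otherwise run it.
pb_run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Solve the PaQL query file given as $1 with DIRECT.
solve_direct() {
    pb_run bin/pb exprun paql_eval direct -q "$1"
}

# Solve the PaQL query file given as $1 with SKETCHREFINE.
# $2 is the -C value (default .10); '-a*' is quoted to prevent globbing.
solve_sketchrefine() {
    pb_run bin/pb exprun paql_eval sketchrefine -q "$1" '-a*' -C "${2:-.10}"
}

pb_run bin/pb set cfg/example.cfg
solve_direct query.paql
solve_sketchrefine query.paql
```

Run it once as is to inspect the printed commands, then with DRY_RUN=0 to actually execute them.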
