postechdblab/learned-cardinality-estimation

Technical Paper (link)

Installation

Download the GitHub repository

git clone https://github.com/postechdblab/learned-cardinality-estimation.git
cd learned-cardinality-estimation

Set up environments

# environment for FCN, MSCN, and FCN+Pool
bash environments/MSCN/setup.sh

# environment for E2E
bash environments/E2E/setup.sh

# environment for NeuroCard and UAE
bash environments/NeuroCard/setup.sh

# environment for DeepDB, DeepDB-JCT, and DeepDB-JCT-NARU
bash environments/DeepDB/setup.sh
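The activation commands under How to Run expect four conda environments: `mscn`, `e2e`, `neurocard`, and `deepdb`. The following minimal sketch checks which of them exist after the setup scripts have run. It assumes that each `setup.sh` creates a conda environment with exactly those names. The function name `missing_envs` is hypothetical and is not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: report which conda environments used in "How to Run" are missing.
# Assumption: the setup.sh scripts create environments named exactly as the
# `source activate` lines expect (mscn, e2e, neurocard, deepdb).
# Usage: conda env list | missing_envs

missing_envs() {
  local installed env
  # `conda env list` prints "name [*] path"; skip comment and blank lines.
  installed=$(awk '!/^#/ && NF { print $1 }')
  for env in mscn e2e neurocard deepdb; do
    # -x: whole-line match, so "mscn" does not match "mscn-old"
    if ! printf '%s\n' "$installed" | grep -qx -- "$env"; then
      printf '%s\n' "$env"
    fi
  done
}
```

`conda env list | missing_envs` prints nothing when all four environments are present.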

Download datasets and unzip

# IMDB
# Original dataset: http://homepages.cwi.nl/~boncz/job/imdb.tgz
# We used a version exported from PostgreSQL. The original dataset can cause parsing errors in the pandas parsing engine.
link: https://drive.google.com/file/d/1j8DZd0TwZ6fBFN9FqXOzfYsFz_pDGiMp/view?usp=sharing
location: ./datasets/imdb

# TPC-DS
# Original dataset: http://tpc.org/tpc_documents_current_versions/current_specifications5.asp
# We used a version exported from PostgreSQL. The original dataset can cause parsing errors in the pandas parsing engine.
link: https://drive.google.com/file/d/1FIZjv6gsGFq74OXBGicusTfH1AxKRowu/view?usp=sharing
location: ./datasets/tpcds

# Synthetic
link: https://drive.google.com/file/d/1NrLysKrMIZ88Znnpm40HZKcYrMWI1iVF/view?usp=sharing
location: ./datasets/synthetic

# Sampled dataset (for bitmap features in query-driven methods)
link: https://drive.google.com/file/d/1xaJVLR9vcxsbW7Mx3eNVM2sF4PCTUQsN/view?usp=sharing
location: ./samples

Download models

# FCN
link: https://drive.google.com/file/d/19bVvA8_Yj9tsTrLvlMagtZ68U5EvLoDr/view?usp=sharing
location: ./models/FCN

# MSCN
link: https://drive.google.com/file/d/1YRmodqlRFPkoqBcDaT7wiKi2orkm84hl/view?usp=sharing
location: ./models/MSCN

# FCN+Pool
link: https://drive.google.com/file/d/1JT7PSl8J0Jjqk29Dkamq8SIgVtxoIm1r/view?usp=sharing
location: ./models/FCN+Pool

# E2E
link: https://drive.google.com/file/d/1G6C5xIZQMLbRWLcCqF8c70wU_eRShOru/view?usp=sharing
location: ./models/E2E

# NeuroCard
link: https://drive.google.com/file/d/1lH1SpNJFj9eXHbCBc372mFnJFtYEOdoK/view?usp=sharing
location: ./models/NeuroCard

# UAE
link: https://drive.google.com/file/d/18vmBRTUwOE-z9p4oKolev4tlNygcy7fL/view?usp=sharing
location: ./models/UAE

# DeepDB, DeepDB-JCT, and DeepDB-JCT-NARU
link: https://drive.google.com/file/d/1aIuMcl9dp6uaZ7NYadbzM29S-xkwe7_t/view?usp=sharing
location: ./models/DeepDB

Download training queries and unzip

link: https://drive.google.com/file/d/1-O-fckKGuea09x5IQoANhQxAmKh5eRb2/view?usp=sharing
location: ./train

Download workloads and unzip

link: https://drive.google.com/file/d/1nyYk_fYg5uBe0wpu8b9l_OwTK0GDudUM/view?usp=sharing
location: ./workloads

Download pre-trained word embeddings and unzip

link: https://drive.google.com/file/d/10-RvdESO6Z4OtlLPZ4EqGrl22Am4Bemv/view?usp=sharing
location: ./wordvectors
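Each download above has the same form: a Google Drive sharing link and a target directory. That makes the steps easy to script. The sketch below makes three assumptions:

- the third-party `gdown` CLI is installed (`pip install gdown`);
- every archive is a `.zip` file;
- the script runs from the repository root.

`drive_id`, `fetch`, and `fetch_all` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: download every archive listed above into its "location" and unzip it.
# Assumptions: the third-party `gdown` CLI is installed (pip install gdown),
# each archive is a .zip file, and this runs from the repository root.

# Extract the file ID from a ".../file/d/<ID>/view?usp=sharing" link.
drive_id() {
  local rest
  rest=${1#*/file/d/}
  printf '%s\n' "${rest%%/*}"
}

# fetch <sharing-link> <target-dir>: download, unzip into the target, delete the archive.
fetch() {
  local id archive
  id=$(drive_id "$1")
  mkdir -p "$2" || return 1
  archive="$2/$id.zip"
  gdown "https://drive.google.com/uc?id=$id" -O "$archive" &&
    unzip -q -o "$archive" -d "$2" &&
    rm -f "$archive"
}

# Manifest: "<location> <link>", copied from the download steps above.
fetch_all() {
  local dir link
  while read -r dir link; do
    [ -n "$dir" ] || continue
    # </dev/null keeps the downloader from reading the manifest on stdin
    fetch "$link" "$dir" </dev/null || echo "failed: $dir" >&2
  done <<'EOF'
./datasets/imdb       https://drive.google.com/file/d/1j8DZd0TwZ6fBFN9FqXOzfYsFz_pDGiMp/view?usp=sharing
./datasets/tpcds      https://drive.google.com/file/d/1FIZjv6gsGFq74OXBGicusTfH1AxKRowu/view?usp=sharing
./datasets/synthetic  https://drive.google.com/file/d/1NrLysKrMIZ88Znnpm40HZKcYrMWI1iVF/view?usp=sharing
./samples             https://drive.google.com/file/d/1xaJVLR9vcxsbW7Mx3eNVM2sF4PCTUQsN/view?usp=sharing
./models/FCN          https://drive.google.com/file/d/19bVvA8_Yj9tsTrLvlMagtZ68U5EvLoDr/view?usp=sharing
./models/MSCN         https://drive.google.com/file/d/1YRmodqlRFPkoqBcDaT7wiKi2orkm84hl/view?usp=sharing
./models/FCN+Pool     https://drive.google.com/file/d/1JT7PSl8J0Jjqk29Dkamq8SIgVtxoIm1r/view?usp=sharing
./models/E2E          https://drive.google.com/file/d/1G6C5xIZQMLbRWLcCqF8c70wU_eRShOru/view?usp=sharing
./models/NeuroCard    https://drive.google.com/file/d/1lH1SpNJFj9eXHbCBc372mFnJFtYEOdoK/view?usp=sharing
./models/UAE          https://drive.google.com/file/d/18vmBRTUwOE-z9p4oKolev4tlNygcy7fL/view?usp=sharing
./models/DeepDB       https://drive.google.com/file/d/1aIuMcl9dp6uaZ7NYadbzM29S-xkwe7_t/view?usp=sharing
./train               https://drive.google.com/file/d/1-O-fckKGuea09x5IQoANhQxAmKh5eRb2/view?usp=sharing
./workloads           https://drive.google.com/file/d/1nyYk_fYg5uBe0wpu8b9l_OwTK0GDudUM/view?usp=sharing
./wordvectors         https://drive.google.com/file/d/10-RvdESO6Z4OtlLPZ4EqGrl22Am4Bemv/view?usp=sharing
EOF
}
```

To use it, source the file and call `fetch_all`. After the first download, check whether the archive already contains its own top-level folder. If it does, unzip into the parent directory instead so the path is not doubled. If an archive turns out not to be a zip file, replace the `unzip` step.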

How to Run

Reproduce the FCN results in the paper

# Activate environment
source activate mscn
# Run experiment script
bash scripts/FCN.sh

# After the script finishes, the results are stored in ./results/FCN/<workload>_<database>.csv.

Reproduce the MSCN results in the paper

# Activate environment
source activate mscn
# Run experiment script
bash scripts/MSCN.sh

# After the script finishes, the results are stored in ./results/MSCN/<workload>_<database>.csv.

Reproduce the FCN+Pool results in the paper

# Activate environment
source activate mscn
# Run experiment script
bash scripts/FCN+Pool.sh

# After the script finishes, the results are stored in ./results/FCN+Pool/<workload>_<database>.csv.

Reproduce the E2E results in the paper

# Activate environment
source activate e2e
# Run experiment script
bash scripts/E2E.sh

# After the script finishes, the results are stored in ./results/E2E/<workload>_<database>.csv.

Reproduce the NeuroCard results in the paper

# Activate environment
source activate neurocard
# Run experiment script
bash scripts/NeuroCard.sh

# After the script finishes, the results are stored in ./results/NeuroCard/<workload>_<database>.csv.

Reproduce the UAE results in the paper

# Activate environment
source activate neurocard
# Run experiment script
bash scripts/UAE.sh

# After the script finishes, the results are stored in ./results/UAE/<workload>_<database>.csv.

Reproduce the DeepDB results in the paper

# Activate environment
source activate deepdb
# Run experiment script
bash scripts/DeepDB.sh

# After the script finishes, the results are stored in ./results/DeepDB/<workload>_<database>.csv.

Reproduce the DeepDB-JCT results in the paper

# Activate environment
source activate deepdb
# Run experiment script
bash scripts/DeepDB-JCT.sh

# After the script finishes, the results are stored in ./results/DeepDB-JCT/<workload>_<database>.csv.

Reproduce the DeepDB-JCT-NARU results in the paper

# Activate environment
source activate deepdb
# Run experiment script
bash scripts/DeepDB-JCT-NARU.sh

# After the script finishes, the results are stored in ./results/DeepDB-JCT-NARU/<workload>_<database>.csv.
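All nine reproduction steps above share one pattern: activate the method's conda environment, then run `scripts/<method>.sh`. The sketch below runs all of them in sequence. Its method-to-environment pairs are copied from the activation lines above.

It assumes a conda version that provides `conda run`. That command replaces `source activate` here, so each script gets its environment without the calling shell being changed. `env_for`, `run_all`, and the `DRY_RUN` variable are hypothetical names and are not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: run every reproduction script with its conda environment.
# Method -> environment pairs come from the "Activate environment" lines above.
# Assumption: `conda run` is available. Set DRY_RUN=1 to only print the commands.

env_for() {
  case "$1" in
    FCN|MSCN|FCN+Pool) echo mscn ;;
    E2E) echo e2e ;;
    NeuroCard|UAE) echo neurocard ;;
    DeepDB|DeepDB-JCT|DeepDB-JCT-NARU) echo deepdb ;;
    *) return 1 ;;
  esac
}

run_all() {
  local m env
  for m in FCN MSCN FCN+Pool E2E NeuroCard UAE DeepDB DeepDB-JCT DeepDB-JCT-NARU; do
    env=$(env_for "$m")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "conda run -n $env bash scripts/$m.sh"
    else
      # Keep going if one method fails, but report it on stderr.
      conda run -n "$env" bash "scripts/$m.sh" || echo "failed: $m" >&2
    fi
  done
}
```

`DRY_RUN=1 run_all` prints the nine commands without running them. Plain `run_all` executes them from the repository root. If `conda run` is not available, use the per-method commands above instead.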

Synthetic dataset index

Syn-Single databases

Varying domain size:

Index	Domain size	Skewness	Correlation
01	10	1.0	0.8
02	100	1.0	0.8
00	1k	1.0	0.8
23	10k	1.0	0.8

Varying skewness:

Index	Domain size	Skewness	Correlation
03	1k	0.0	0.8
04	1k	0.2	0.8
05	1k	0.4	0.8
06	1k	0.6	0.8
07	1k	0.8	0.8
00	1k	1.0	0.8
18	1k	1.2	0.8
19	1k	1.4	0.8
20	1k	1.6	0.8
21	1k	1.8	0.8
22	1k	2.0	0.8

Varying correlation:

Index	Domain size	Skewness	Correlation
08	1k	1.0	0.0
09	1k	1.0	0.1
10	1k	1.0	0.2
11	1k	1.0	0.3
12	1k	1.0	0.4
13	1k	1.0	0.5
14	1k	1.0	0.6
15	1k	1.0	0.7
00	1k	1.0	0.8
16	1k	1.0	0.9
17	1k	1.0	1.0

Syn-Multi databases

Varying domain size:

Index	Fanout domain size	Fanout skewness
02	10	1.0
00	100	1.0
13	1k	1.0

Varying skewness:

Index	Fanout domain size	Fanout skewness
03	100	0.0
04	100	0.2
05	100	0.4
06	100	0.6
07	100	0.8
00	100	1.0
08	100	1.20
14	100	1.24
15	100	1.28
16	100	1.32
17	100	1.36
09	100	1.40
18	100	1.44
19	100	1.48
20	100	1.52
21	100	1.56
10	100	1.6
11	100	1.8
12	100	2.0
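For scripting a sweep over the synthetic databases, the index tables can be transcribed into a lookup. The sketch below uses hypothetical function names (`syn_single`, `syn_multi`). This README does not describe how the repository's scripts consume these indices.

Within each database family, index `00` is the baseline shared by every sweep. Indices are also not ordered by parameter value.

```shell
#!/usr/bin/env bash
# Lookup for the synthetic-database index tables above (values transcribed from them).
# Hypothetical helpers; how the repository's scripts use the indices is not shown here.

# syn_single <index> -> "<domain size> <skewness> <correlation>"
syn_single() {
  case "$1" in
    00) echo "1k 1.0 0.8" ;;   # baseline shared by all Syn-Single sweeps
    01) echo "10 1.0 0.8" ;;
    02) echo "100 1.0 0.8" ;;
    23) echo "10k 1.0 0.8" ;;
    03) echo "1k 0.0 0.8" ;;
    04) echo "1k 0.2 0.8" ;;
    05) echo "1k 0.4 0.8" ;;
    06) echo "1k 0.6 0.8" ;;
    07) echo "1k 0.8 0.8" ;;
    18) echo "1k 1.2 0.8" ;;
    19) echo "1k 1.4 0.8" ;;
    20) echo "1k 1.6 0.8" ;;
    21) echo "1k 1.8 0.8" ;;
    22) echo "1k 2.0 0.8" ;;
    08) echo "1k 1.0 0.0" ;;
    09) echo "1k 1.0 0.1" ;;
    10) echo "1k 1.0 0.2" ;;
    11) echo "1k 1.0 0.3" ;;
    12) echo "1k 1.0 0.4" ;;
    13) echo "1k 1.0 0.5" ;;
    14) echo "1k 1.0 0.6" ;;
    15) echo "1k 1.0 0.7" ;;
    16) echo "1k 1.0 0.9" ;;
    17) echo "1k 1.0 1.0" ;;
    *) return 1 ;;
  esac
}

# syn_multi <index> -> "<fanout domain size> <fanout skewness>"
syn_multi() {
  case "$1" in
    00) echo "100 1.0" ;;   # baseline shared by both Syn-Multi sweeps
    02) echo "10 1.0" ;;
    13) echo "1k 1.0" ;;
    03) echo "100 0.0" ;;
    04) echo "100 0.2" ;;
    05) echo "100 0.4" ;;
    06) echo "100 0.6" ;;
    07) echo "100 0.8" ;;
    08) echo "100 1.20" ;;
    14) echo "100 1.24" ;;
    15) echo "100 1.28" ;;
    16) echo "100 1.32" ;;
    17) echo "100 1.36" ;;
    09) echo "100 1.40" ;;
    18) echo "100 1.44" ;;
    19) echo "100 1.48" ;;
    20) echo "100 1.52" ;;
    21) echo "100 1.56" ;;
    10) echo "100 1.6" ;;
    11) echo "100 1.8" ;;
    12) echo "100 2.0" ;;
    *) return 1 ;;
  esac
}
```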
