Experimental Framework for "DEANN - Density Estimation from Approximate Nearest Neighbors"

This repository contains the infrastructure for benchmarking different kernel density estimation (KDE) implementations.

Implementations

DEANN: ANN-based KDE, Random Sampling
HBE: LSH-based KDE, Random Sampling
scikit-learn
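To make the difference between the estimators concrete, here is a minimal Python sketch of three of them: exact KDE, plain random sampling, and the DEANN idea of combining exact kernel contributions from the nearest neighbors with random sampling for the remaining points. It is not the code benchmarked in this repository. It assumes an unnormalized Gaussian kernel exp(-||q - x||^2 / (2h^2)), uses brute-force nearest neighbors in place of an ANN index, and samples only from the non-neighbor points. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(q, X, h):
    # Unnormalized Gaussian kernel exp(-||q - x||^2 / (2 h^2)) for every row x of X.
    sq_dists = np.sum((X - q) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * h * h))


def exact_kde(q, X, h):
    # Exact KDE value: mean kernel value over all n data points.
    return gaussian_kernel(q, X, h).mean()


def random_sampling_kde(q, X, h, m, rng):
    # Random-sampling estimate: mean kernel value over m uniformly drawn points.
    idx = rng.integers(0, len(X), size=m)
    return gaussian_kernel(q, X[idx], h).mean()


def deann_style_kde(q, X, h, k, m, rng):
    # Exact contribution of the k nearest neighbors plus a scaled
    # random-sampling estimate of the other n - k points (requires k >= 1).
    # Brute-force k-NN stands in for an approximate nearest neighbor index.
    n = len(X)
    if k >= n:
        return exact_kde(q, X, h)
    sq_dists = np.sum((X - q) ** 2, axis=1)
    nn = np.argpartition(sq_dists, k - 1)[:k]  # indices of the k closest points
    near_sum = np.exp(-sq_dists[nn] / (2.0 * h * h)).sum()
    rest = np.setdiff1d(np.arange(n), nn)  # the n - k non-neighbors
    sample = rng.choice(rest, size=m, replace=True)
    far_sum = gaussian_kernel(q, X[sample], h).sum() * (n - k) / m
    return (near_sum + far_sum) / n
```

Setting k to the dataset size makes the sketch return the exact KDE. Smaller k shifts work from exact neighbor evaluations to sampling, which pays off when most of the density mass comes from a few close points.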

Datasets

Dataset	Size	Dimensions
ALOI	108,000	128
Census	2,458,285	68
Covtype	581,012	54
GloVe	1,193,514	100
last.fm	292,385	65
MNIST	60,000	784
MSD	515,345	90
Shuttle	58,000	9
SVHN	531,131	3072

Each dataset is preprocessed automatically from its raw definition. Preprocessing includes bandwidth selection for target mean KDE values from 0.00001 to 0.01 in multiplicative steps of 10 (0.00001, 0.0001, 0.001, 0.01). This happens the first time an experiment is run on a dataset, or it can be invoked explicitly by running

$ python preprocess_datasets.py --dataset shuttle --compute-bandwidth
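The bandwidth-selection step can be illustrated with a short sketch. It assumes an unnormalized Gaussian kernel and a set of query points held out from the data. Under those assumptions the mean KDE value grows monotonically with the bandwidth h, so a bisection on log h finds the h that hits each target. This only illustrates the idea: the actual procedure in preprocess_datasets.py (kernel, query set, search method) may differ, and mean_kde, find_bandwidth and TARGETS are hypothetical names.

```python
import numpy as np


def mean_kde(queries, X, h):
    # Mean over all queries of the unnormalized Gaussian KDE value.
    # Pairwise squared distances have shape (num_queries, n).
    sq = ((queries[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * h * h)).mean()


def find_bandwidth(queries, X, target, lo=1e-3, hi=1e3, iters=100):
    # Mean KDE increases monotonically with h, so bisect on log(h).
    # Assumes mean_kde(lo) < target < mean_kde(hi).
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint of the bracket
        if mean_kde(queries, X, mid) < target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)


# Target mean KDE values 0.00001, 0.0001, 0.001, 0.01 (factor-10 steps).
TARGETS = [10.0 ** e for e in range(-5, -1)]
```

Calling find_bandwidth(Q, X, t) for each t in TARGETS yields one bandwidth per target, and the bandwidths increase with t. Because the sketch materializes the full pairwise-distance matrix, it is only suitable for small samples.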

Intended workflow

Installation

Requirements:

Python3
Docker

The framework is meant to be run through Docker. After cloning the repository, build the container as follows:

$ pip install -r requirements.txt
$ python install.py

The containers set up all libraries required to run the experiments.

After building the container, mount the repository and connect to the container as follows:

Running experiments

You can run individual experiments using run_exp.py. Run python run_exp.py --help for an overview of the available arguments. For example, the command

$ python run_exp.py --dataset shuttle

takes care of (1) preprocessing the shuttle dataset, including bandwidth selection for the target KDE values, and (2) running all algorithms listed in algos.yaml on the dataset with target KDE value 0.01.
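Because run_exp.py handles one dataset per invocation, a small driver can loop over several datasets. The sketch below uses only the --dataset flag shown above and the --no-docker flag described under "Running locally". build_commands and run_all are hypothetical helpers, and any dataset name besides shuttle should be checked against python run_exp.py --help.

```python
import subprocess
import sys

# Only "shuttle" is confirmed by this README; add further dataset names
# as accepted by run_exp.py (check `python run_exp.py --help`).
DATASETS = ["shuttle"]


def build_commands(datasets, no_docker=False):
    # One run_exp.py invocation per dataset; --no-docker is appended on request.
    commands = []
    for name in datasets:
        cmd = [sys.executable, "run_exp.py", "--dataset", name]
        if no_docker:
            cmd.append("--no-docker")
        commands.append(cmd)
    return commands


def run_all(datasets, no_docker=False):
    # Run the experiments sequentially; check=True stops at the first failure.
    for cmd in build_commands(datasets, no_docker):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_all(DATASETS) from the repository root runs the experiments one after another. For a local setup, call run_all(DATASETS, no_docker=True) instead.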

Evaluating experiments

$ python data_export.py -o res.csv

produces a CSV file containing the results of all runs. Use --help to see how to filter the results. The folder additional/... contains scripts for exploring the runs further.
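The exported CSV can then be summarized with a few lines of standard-library Python. The column layout of res.csv is not documented here, so the sketch takes the grouping column and the value column as parameters. summarize is a hypothetical helper; read the header of your res.csv to choose real column names.

```python
import csv
from collections import defaultdict
from statistics import mean


def summarize(lines, group_col, value_col):
    # Average value_col for each distinct value of group_col.
    # `lines` may be an open file or any iterable of CSV text lines.
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        value = row.get(value_col, "")
        if value == "":
            continue  # skip rows with no value in this column
        groups[row[group_col]].append(float(value))
    return {key: mean(vals) for key, vals in groups.items()}
```

To summarize the real export, open res.csv with newline="" and pass the file object together with the two column names taken from its header.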

Running locally

If your setup supports running DEANN and HBE locally, you do not need the Docker environment: simply run all experiments with the --no-docker flag. See ./install for how the libraries are set up.

Evaluation

Reference

If you find this repository useful in academic work, please cite the following paper:

(arXiv link TBA)

mkarppa/deann-experiments

Folders and files

Latest commit

History

Repository files navigation

Experimental Framework for "DEANN - Density Estimation from Approximate Nearest Neighbors"

Implementations

Datasets

Intended workflow

Installation

Running experiments

Evaluating experiments

Running locally

Evaluation

Reference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages