Non-Parametric Estimation in Information Theory

1. Introduction

This repository accompanies our paper "Evaluating Density- and Nearest Neighbor-based Methods to Accurately Estimate Information-Theoretic Quantities from Multi-Dimensional Sample Data".

The project is organized as follows:

├── analysis_results/
│   └── plots/
├── data_evaluation/
│   ├── data/
│   ├── notebooks/
│   ├── results/
│   ├── utils/
│   └── (...) scripts
├── data_generation/
├── README.md
└── .gitignore

2. Installation

The code was written in Python 3.11.5 but should be compatible with other versions, from Python 3.6 onward. Check the requirements.txt file if you run into dependency issues.

We recommend cloning the repository to a local directory and setting up the required environment using venv and pip:

    python -m venv .venv
    source .venv/Scripts/activate  # Windows (Git Bash); on Linux/macOS: source .venv/bin/activate
    pip install -r requirements.txt

3. Generating Data

Data for the experiments is first generated by the script in the data_generation/ directory and stored as an HDF5 database in the data_evaluation/data directory.

From the root directory:

    python data_generation/data_generation.py

Note: because the data.hdf5 file is ~123 GB, we recommend generating it locally. This takes about 12 hours on an Intel Xeon E5-26280 v2 and should not vary much on other modern CPUs.
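
Given the size of the file, it can help to confirm that the target drive has enough free space before starting the long generation run. The following is a minimal sketch and not part of the repository: the helper name `has_enough_space` is hypothetical, the 123 GB default is taken from the note above, and only the Python standard library is used.

```python
import shutil
from pathlib import Path


def has_enough_space(path=".", required_gb=123.0):
    """Return True if the drive holding `path` has at least `required_gb` GB free.

    Hypothetical helper; 123 GB is the approximate file size quoted in the note.
    """
    free_bytes = shutil.disk_usage(str(path)).free
    return free_bytes >= required_gb * 1e9


if __name__ == "__main__":
    target = Path("data_evaluation/data")
    # Fall back to the current directory if the data directory does not exist yet.
    check_dir = target if target.exists() else Path(".")
    if has_enough_space(check_dir):
        print("Enough free space to generate data.hdf5")
    else:
        print("Less than ~123 GB free; consider using another drive")
```

Run it from the root directory before calling data_generation/data_generation.py.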

4. Conducting an Evaluation

The scripts in the data_evaluation/ directory read the data and perform the experiments. Results are stored in the data_evaluation/results/ directory.

Again, from the root directory:

    python data_evaluation/eval_bin_entropy.py

All script names follow the format eval_{estimator}_{quantity}.py. In total, 12 scripts must be run, three for each estimator: binning, KDE, numerical integration of KDE, and k-NN.
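
Since every script follows the same naming format, the full set can be discovered and run in sequence rather than launched one at a time. The sketch below is an assumption about how this might be automated, not something the repository provides: `find_eval_scripts` and `run_all` are hypothetical helpers, and the scripts are located purely by matching the eval_{estimator}_{quantity}.py pattern.

```python
import subprocess
import sys
from pathlib import Path


def find_eval_scripts(directory="data_evaluation"):
    """Return scripts named eval_{estimator}_{quantity}.py, sorted by name."""
    return sorted(Path(directory).glob("eval_*_*.py"))


def run_all(directory="data_evaluation"):
    """Run each evaluation script in turn, stopping at the first failure."""
    for script in find_eval_scripts(directory):
        print(f"Running {script} ...")
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run([sys.executable, str(script)], check=True)
```

Calling `run_all()` from the root directory runs the scripts with the same interpreter as the active environment. Running them sequentially keeps memory use predictable; whether the scripts can safely run in parallel is not something this sketch assumes.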

The notebooks/ directory serves as an archive of the development of the workflow to test each estimator. The contents of each notebook are generally the same as the code in the scripts. Log files describe the history of the project.

5. Visualizing Results

The analysis_results directory contains a notebook to create the plots used in the paper, as well as a script to read the log files and calculate the time per iteration of the different experiments.

The plots are generated using the results from the data_evaluation/results directory. Results are read from .hdf5 files.

6. Contact

All results were produced using the UNITE Toolbox.

Contact: manuel.alvarez-chaves@simtech.uni-stuttgart.de

Repository: manuel-alvarez-chaves/estimators-paper
