Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence

This is the Python implementation accompanying the paper Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence. The repository also includes the code required to replicate the results from the paper, e.g., the implementation of Robust Subgroup Discovery (RSD). We additionally provide a demo in a Jupyter notebook as an easy starting point. All experiments were run on a machine equipped with a DGX A100, an AMD EPYC 7742 CPU, and 256 GB of memory.

1. Required packages

  • PyTorch
  • NumPy
  • pandas
  • Matplotlib
  • flowtorch
  • pysubgroup
  • scikit-learn
  • tqdm
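
All packages are available on PyPI. Assuming the standard PyPI package names (the repository does not pin versions), they can be installed with pip install torch numpy pandas matplotlib flowtorch pysubgroup scikit-learn tqdm.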

2. Folder organization

  • data: The Insurance dataset used in the paper and in the demo
  • experiments: Python scripts used to run all experiments
  • scripts: Bash scripts that run the experiments with various parameter settings
  • src: Source code of Syflow and RSD, as well as further utilities needed to replicate the results

3. Demo

In the Jupyter notebook demo.ipynb, we show how to use Syflow. As an example, we use the Insurance dataset provided in data/ and replicate the plot below. The parameters of Syflow can be set via the SyflowConfig class. For readability, the majority of the code is implemented in src/demo_utils.py.

[Plot: subgroups discovered by Syflow on the Insurance dataset]
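
A minimal sketch of how such a demo run might look is below. Only SyflowConfig, data/, and src/demo_utils.py are named in this repository; the import paths, configuration fields, file name, and helper function in the sketch are illustrative assumptions, so see demo.ipynb for the actual interface.

```python
# Hedged sketch of the demo flow -- the import paths, SyflowConfig fields,
# the CSV file name, and run_syflow are assumptions, not the repo's API.
import pandas as pd

from src.syflow import SyflowConfig      # assumed import path
from src.demo_utils import run_syflow    # hypothetical helper

# Load the Insurance dataset shipped in data/ (file name assumed).
data = pd.read_csv("data/insurance.csv")

# Separate the features from the target variable (column name assumed).
X = data.drop(columns=["charges"])
y = data["charges"]

# Configure Syflow; the field names here are illustrative, not documented.
config = SyflowConfig(n_subgroups=3, epochs=100)

# Discover subgroups whose target distribution diverges from the rest.
subgroups = run_syflow(X, y, config)
```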

4. Reproducibility

To run the experiments from the paper, use the Python scripts in experiments. These scripts can be called individually or via the Bash scripts provided in scripts. In both cases, the scripts generate a results folder with subfolders corresponding to the specific experiment suite. For real-world data, we also output the descriptions of the discovered subgroups.
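
For example, an individual suite can be launched as python experiments/<experiment>.py or via the matching wrapper bash scripts/<experiment>.sh; these names are placeholders, and the actual script names are listed in the two folders.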

5. Citation

If you find our work useful for your research, please consider citing:

@inproceedings{xu2024learningicml,
  title = {Learning Exceptional Subgroups by End-to-End Maximizing KL-divergence},
  author = {Xu, Sascha and Walter, Nils Philipp and Kalofolias, Janis and Vreeken, Jilles},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year = {2024},
  organization = {PMLR},
}

6. License

This work is licensed under CC BY-NC-SA 4.0.
