PCCC is an algorithm for clustering with hard and soft must-link and cannot-link constraints. A detailed description of the algorithm can be found at https://arxiv.org/abs/2212.14437. The data used in the paper are available at https://github.com/phil85/PCCC-Data.
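To make the constraint types concrete: a must-link pair (i, j) asks that objects i and j end up in the same cluster, and a cannot-link pair asks that they end up in different clusters. The minimal sketch below checks a finished labeling against hard constraints. The helper `violated_hard_constraints` is hypothetical and not part of PCCC, and representing constraints as lists of object-index pairs is an assumption made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

Pair = Tuple[int, int]


def violated_hard_constraints(
    labels: Sequence[int],
    ml: Iterable[Pair],
    cl: Iterable[Pair],
) -> List[Tuple[str, int, int]]:
    """Hypothetical helper (not part of PCCC): list every violated hard constraint."""
    violations = []
    # A must-link pair (i, j) requires objects i and j to share a cluster.
    for i, j in ml:
        if labels[i] != labels[j]:
            violations.append(("must-link", i, j))
    # A cannot-link pair (i, j) requires objects i and j to be in different clusters.
    for i, j in cl:
        if labels[i] == labels[j]:
            violations.append(("cannot-link", i, j))
    return violations


labels = [0, 0, 1, 1]
print(violated_hard_constraints(labels, ml=[(0, 1), (1, 2)], cl=[(0, 3), (2, 3)]))
# [('must-link', 1, 2), ('cannot-link', 2, 3)]
```

A feasible solution for hard constraints is one for which this list is empty.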
PCCC depends on:
- pandas==1.5.2
- scikit-learn==1.0.2
- numpy==1.23.5
- networkx==2.7.1
- scipy==1.9.3
- gurobipy==10.0.0
Gurobi is a commercial mathematical programming solver. Free academic licenses are available from Gurobi.
To install PCCC:
- Clone this repository
- Download and install Gurobi (https://www.gurobi.com/downloads/)
- Install the other required packages listed above (e.g., with pip)
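After installation, a short script can confirm that the pinned versions listed above are actually present. This is a hypothetical helper (`check_dependencies`), not part of the repository; it only compares installed distribution versions using the standard library's `importlib.metadata` and does not check whether a valid Gurobi license is configured.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions as listed in this README.
PINNED = {
    "pandas": "1.5.2",
    "scikit-learn": "1.0.2",
    "numpy": "1.23.5",
    "networkx": "2.7.1",
    "scipy": "1.9.3",
    "gurobipy": "10.0.0",
}


def check_dependencies(pinned=PINNED):
    """Return {package: (expected, installed or None)} for every mismatch."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    mismatches = check_dependencies()
    if not mismatches:
        print("All pinned dependencies are installed.")
    for name, (expected, installed) in mismatches.items():
        print(f"{name}: expected {expected}, found {installed or 'nothing'}")
```

Newer versions may work as well; the check only reports deviations from the tested versions.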
The file main.py applies the PCCC algorithm to an illustrative example. The algorithm is called as follows:
```python
labels = pccc(X, n_clusters,
              ml=hard_must_link_constraints,
              cl=hard_cannot_link_constraints,
              sml=soft_must_link_constraints,
              scl=soft_cannot_link_constraints,
              sml_weights=confidence_levels_of_soft_must_link_constraints,
              scl_weights=confidence_levels_of_soft_cannot_link_constraints,
              random_state=24)
```
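The soft constraints passed via `sml` and `scl` may be violated, and their confidence levels (`sml_weights`, `scl_weights`) express how strongly each one should be respected. The sketch below shows one simple way to score a labeling against weighted soft constraints, assuming that each violated soft constraint costs exactly its confidence weight and that constraints are lists of index pairs with one weight each. Both are simplifying assumptions for illustration; the hypothetical `soft_constraint_penalty` is not part of PCCC, and the exact objective PCCC optimizes is defined in the paper.

```python
from typing import Sequence, Tuple

Pair = Tuple[int, int]


def soft_constraint_penalty(
    labels: Sequence[int],
    sml: Sequence[Pair],
    scl: Sequence[Pair],
    sml_weights: Sequence[float],
    scl_weights: Sequence[float],
) -> float:
    """Hypothetical helper: sum the confidence weights of violated soft constraints.

    Assumes each violated soft constraint costs exactly its weight; this is an
    illustration, not PCCC's actual objective.
    """
    if len(sml) != len(sml_weights) or len(scl) != len(scl_weights):
        raise ValueError("each soft constraint needs exactly one confidence weight")
    penalty = 0.0
    for (i, j), w in zip(sml, sml_weights):
        if labels[i] != labels[j]:  # soft must-link broken
            penalty += w
    for (i, j), w in zip(scl, scl_weights):
        if labels[i] == labels[j]:  # soft cannot-link broken
            penalty += w
    return penalty


labels = [0, 0, 1, 1]
penalty = soft_constraint_penalty(
    labels,
    sml=[(0, 1), (1, 2)], sml_weights=[0.9, 0.4],
    scl=[(0, 2), (2, 3)], scl_weights=[0.8, 0.5],
)
print(penalty)  # only (1, 2) and (2, 3) are violated, so the penalty is 0.4 + 0.5
```

Under this scoring, breaking a high-confidence constraint is more expensive than breaking a low-confidence one, which is the intuition behind attaching confidence levels to soft constraints.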
Please cite the following paper if you use this algorithm.
Baumann, P. and Hochbaum, D. S. (2023): PCCC: The Pairwise-Confidence-Constraints-Clustering Algorithm. https://arxiv.org/abs/2212.14437
BibTeX:
@article{baumann2023pccc,
  author = {Baumann, Philipp and Hochbaum, Dorit S.},
  title = {{PCCC}: The Pairwise-Confidence-Constraints-Clustering Algorithm},
  journal = {arXiv preprint arXiv:2212.14437},
  year = {2023},
  url = {https://arxiv.org/abs/2212.14437},
  doi = {10.48550/ARXIV.2212.14437},
}
This project is licensed under the MIT License; see the LICENSE file for details.