This package provides a statistical inference framework for k-means clustering after domain adaptation (DA). It leverages the selective inference (SI) framework and employs a divide-and-conquer strategy to efficiently compute p-values for the selected features. The method controls the false positive rate (FPR) while maximizing the true positive rate (TPR), which is equivalent to minimizing the false negative rate (FNR).
Install the required packages:
pip install -r requirements.txt

We provide several Jupyter notebooks demonstrating how to use SCaDA.
- Example for computing p-values for k-means clustering after DA:
  ex1_compute_pvalue.ipynb
- Check the uniformity of the pivot:
  ex2_validity_of_pvalue.ipynb
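
For readers who want to see what a uniformity check involves before opening the notebook, here is a minimal sketch. It does not use the SCaDA API: the p-values come from a plain two-sided z-test under the null hypothesis, used purely as a stand-in, and the helper names (`simulate_null_pvalues`, `ks_statistic_uniform`, `empirical_fpr`) are hypothetical. The idea it illustrates is general: a valid p-value is uniform on [0, 1] under the null, so the Kolmogorov-Smirnov distance to Uniform(0, 1) should be small and the rejection rate at level alpha should be close to alpha.

```python
import math
import random


def simulate_null_pvalues(n, seed=0):
    """Draw n two-sided z-test p-values under the null (stand-in, not SCaDA)."""
    rng = random.Random(seed)
    pvalues = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # test statistic when the null is true
        # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
        pvalues.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return pvalues


def ks_statistic_uniform(pvalues):
    """Kolmogorov-Smirnov distance between the sample and Uniform(0, 1)."""
    xs = sorted(pvalues)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def empirical_fpr(pvalues, alpha=0.05):
    """Fraction of null p-values that fall below the significance level."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


if __name__ == "__main__":
    pvals = simulate_null_pvalues(2000, seed=0)
    print("KS distance to Uniform(0, 1):", ks_statistic_uniform(pvals))
    print("Empirical FPR at alpha = 0.05:", empirical_fpr(pvals, 0.05))
```

To apply the same check to p-values produced by the package, replace the simulated list with the values collected over repeated null experiments. If SciPy is installed, `scipy.stats.kstest(pvals, "uniform")` reports the same distance together with a p-value for the uniformity test.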
SCaDA is available on PyPI and can be installed as follows:
pip install PySCaDA