# Hyper-parameters for clustering parameters optimization (CPO)

UNAGI has a built-in clustering parameters optimization (CPO) strategy that keeps the number and size of clusters, as well as the distances between cells and their neighbors, consistent across time-points. An improper number of neighbors in the neighborhood graph or an improper resolution setting can lead to over-clustering or under-clustering, which complicates downstream analysis. Consistency in the number and size of clusters is important for tracing the lineage of cell populations through the time-points of development or disease progression. The CPO method has two primary steps.

-   Searching for the optimal number of neighbors to construct graphs with consistent cell-neighbor distances across different time-points. CPO starts by selecting an anchor stage, which is the stage with a cell count closest to the median cell count of all time-points. The number of neighbors used for the anchor stage is denoted as $N_{anchor}$. The average distance between cells and their neighbors in this anchor stage is then calculated to establish the `anchor neighbor distance`. For every other time-point, the goal is to find a number of neighbors that yields a neighbor distance similar to that of the anchor stage. Note that the number of neighbors must lie within the pre-defined range $[N_{min}, N_{max}]$.

-   Determining the optimal clustering resolution. A resolution range $[R_{min},R_{max}]$ should be predefined for the time-points. The CPO strategy then finds a set of resolutions within this range, one per time-point, so that the median number of cells per cluster is similar across time-points.
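The neighbor search in the first step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not UNAGI's implementation: the helper names (`pick_anchor_stage`, `sorted_neighbor_distances`, `mean_neighbor_distance`, `search_neighbors`) are hypothetical, distances are Euclidean on toy 2-D points rather than on a learned embedding, and the search is a brute-force scan over $[N_{min}, N_{max}]$.

```python
import math
import random
import statistics


def pick_anchor_stage(cell_counts):
    """Index of the stage whose cell count is closest to the median count."""
    median_count = statistics.median(cell_counts)
    return min(range(len(cell_counts)),
               key=lambda i: abs(cell_counts[i] - median_count))


def sorted_neighbor_distances(points):
    """For every point, the ascending distances to all other points."""
    return [
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def mean_neighbor_distance(sorted_dists, k):
    """Average distance between each point and its k nearest neighbors."""
    per_cell = [sum(row[:k]) / k for row in sorted_dists]
    return sum(per_cell) / len(per_cell)


def search_neighbors(sorted_dists, anchor_distance, n_min, n_max):
    """k in [n_min, n_max] whose neighbor distance is closest to the anchor's."""
    return min(range(n_min, n_max + 1),
               key=lambda k: abs(mean_neighbor_distance(sorted_dists, k)
                                 - anchor_distance))


# Toy data (assumption): three stages with different cell counts in the unit square.
rng = random.Random(0)
stages = [[(rng.random(), rng.random()) for _ in range(n)]
          for n in (60, 120, 240)]

n_anchor, n_min, n_max = 15, 10, 30
anchor_index = pick_anchor_stage([len(s) for s in stages])
stage_dists = [sorted_neighbor_distances(s) for s in stages]
anchor_distance = mean_neighbor_distance(stage_dists[anchor_index], n_anchor)

# The anchor stage keeps n_anchor; every other stage is searched in [n_min, n_max].
chosen = [
    n_anchor if i == anchor_index
    else search_neighbors(d, anchor_distance, n_min, n_max)
    for i, d in enumerate(stage_dists)
]
print(anchor_index, chosen)
```

In this toy setting the sparser stage should receive fewer neighbors than the denser stage, because fewer neighbors are needed to span the same distance when cells lie farther apart.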

By employing the CPO method, UNAGI ensures that the neighborhood graphs for different stages maintain similar cell-neighbor distances. Additionally, this approach ensures a consistent number and size of clusters across different stages, thereby enhancing the coherence and robustness of our analytical framework.
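The resolution search in the second step can be sketched in the same spirit. The sketch makes several assumptions that are not taken from UNAGI: `toy_cluster` is a stand-in for graph-based clustering (it simply makes the number of clusters grow with the resolution), the target cluster size is taken from the anchor stage at a hypothetical reference resolution, and the range $[R_{min}, R_{max}]$ is scanned on a fixed grid.

```python
import statistics
from collections import Counter


def toy_cluster(n_cells, resolution):
    """Stand-in for a real clustering call: more clusters at higher resolution."""
    n_clusters = max(1, round(resolution * 10))
    return [i * n_clusters // n_cells for i in range(n_cells)]


def median_cluster_size(labels):
    """Median number of cells per cluster."""
    return statistics.median(Counter(labels).values())


def search_resolution(n_cells, target_size, r_min, r_max, n_steps=8):
    """Grid resolution in [r_min, r_max] whose median cluster size is closest
    to target_size."""
    grid = [r_min + (r_max - r_min) * i / (n_steps - 1) for i in range(n_steps)]
    return min(grid, key=lambda r: abs(
        median_cluster_size(toy_cluster(n_cells, r)) - target_size))


stage_sizes = [60, 120, 240]   # cells per time-point (toy values)
r_min, r_max = 0.5, 1.2
anchor_cells = 120             # stage closest to the median cell count
reference_resolution = 0.8     # hypothetical choice for the anchor stage

# Target: the anchor stage's median cluster size (an assumption of this sketch).
target = median_cluster_size(toy_cluster(anchor_cells, reference_resolution))
resolutions = [search_resolution(n, target, r_min, r_max) for n in stage_sizes]
medians = [median_cluster_size(toy_cluster(n, r))
           for n, r in zip(stage_sizes, resolutions)]
print(resolutions, medians)
```

With these toy values the smallest and largest stages end up at the bounds of the range, which illustrates why $[R_{min}, R_{max}]$ needs to be wide enough for the stages to reach comparable cluster sizes.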

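Users can specify the hyper-parameters described above using the function `UNAGI().run_UNAGI`. The parameter `CPO_parameters` should be a **dictionary** with the following keys. The cells below set the same values through `unagi.register_CPO_parameters` before calling `unagi.run_UNAGI`.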

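-   `anchor_neighbors`: $N_{anchor}$, the number of neighbors used for the anchor stage
-   `max_neighbors`: $N_{max}$, the upper bound of the neighbor search range
-   `min_neighbors`: $N_{min}$, the lower bound of the neighbor search range
-   `resolution_min`: $R_{min}$, the lower bound of the resolution range
-   `resolution_max`: $R_{max}$, the upper bound of the resolution range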
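As a minimal sketch, such a dictionary could be checked for consistency before it is handed to UNAGI. The helper `validate_cpo_parameters` is hypothetical and not part of UNAGI, and the rule that the anchor value must lie inside $[N_{min}, N_{max}]$ is an assumption of this sketch.

```python
def validate_cpo_parameters(params):
    """Raise ValueError if the CPO settings are missing or inconsistent."""
    required = ("anchor_neighbors", "max_neighbors", "min_neighbors",
                "resolution_min", "resolution_max")
    missing = [key for key in required if key not in params]
    if missing:
        raise ValueError(f"missing CPO parameters: {missing}")
    n_min, n_max = params["min_neighbors"], params["max_neighbors"]
    if not 0 < n_min <= n_max:
        raise ValueError("need 0 < min_neighbors <= max_neighbors")
    # Assumption: the anchor value should itself lie inside the search range.
    if not n_min <= params["anchor_neighbors"] <= n_max:
        raise ValueError("anchor_neighbors must lie in [min_neighbors, max_neighbors]")
    if not 0 < params["resolution_min"] <= params["resolution_max"]:
        raise ValueError("need 0 < resolution_min <= resolution_max")
    return params


CPO_parameters = validate_cpo_parameters({
    "anchor_neighbors": 15,
    "max_neighbors": 30,
    "min_neighbors": 10,
    "resolution_min": 0.5,
    "resolution_max": 1.2,
})
```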

A larger number of neighbors connects each cell to more distant cells, so the neighbor distances grow and the clusters tend to become larger. A smaller number of neighbors keeps the graph more local, with shorter neighbor distances, which can lead to smaller clusters. For the resolution, increasing the value typically leads to more clusters.



In [None]:
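import warnings
# Silence library warnings to keep the notebook output readable.
warnings.filterwarnings("ignore")
from UNAGI import UNAGI
unagi = UNAGI()

#....... load the data and setup the model architecture and hyperparameters ..........#

# Placeholder: replace with the directory of the iDREM tool.
iDREM_Path = 'directory_to_iDREM_tool'

# Baseline CPO setting.
anchor_neighbors = 15  # N_anchor: neighbors used for the anchor stage
max_neighbors = 30     # N_max: upper bound of the neighbor search range
min_neighbors = 10     # N_min: lower bound of the neighbor search range
resolution_min = 0.5   # R_min: lower bound of the resolution range
resolution_max = 1.2   # R_max: upper bound of the resolution range

unagi.register_CPO_parameters(
    anchor_neighbors=anchor_neighbors,
    max_neighbors=max_neighbors,
    min_neighbors=min_neighbors,
    resolution_min=resolution_min,
    resolution_max=resolution_max,
)
unagi.run_UNAGI(iDREM_Path)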

Increasing the number of neighbors while keeping $R_{min}$ and $R_{max}$ fixed, which can lead to larger clusters:

In [None]:
anchor_neighbors = 30
max_neighbors = 40
min_neighbors = 20
resolution_min = 0.5
resolution_max = 1.2

unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)

Decreasing the number of neighbors while keeping $R_{min}$ and $R_{max}$ fixed, which can lead to smaller clusters:

In [None]:
anchor_neighbors = 10
max_neighbors = 15
min_neighbors = 5
resolution_min = 0.5
resolution_max = 1.2

unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)

Increasing $R_{min}$ and $R_{max}$ while keeping the number of neighbors fixed, which typically leads to more clusters:

In [None]:
anchor_neighbors = 15
max_neighbors = 30
min_neighbors = 10
resolution_min = 0.8
resolution_max = 2.0

unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)

Decreasing $R_{min}$ and $R_{max}$ while keeping the number of neighbors fixed, which typically leads to fewer clusters:

In [None]:
anchor_neighbors = 15
max_neighbors = 30
min_neighbors = 10
resolution_min = 0.2
resolution_max = 0.8

unagi.register_CPO_parameters(anchor_neighbors=anchor_neighbors, max_neighbors=max_neighbors, min_neighbors=min_neighbors, resolution_min=resolution_min, resolution_max=resolution_max)
unagi.run_UNAGI(iDREM_Path)
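The five settings above can also be collected in one place and run in a loop. The sketch below is a convenience wrapper, not part of UNAGI: `CPO_SETTINGS` and `run_cpo_sweep` are hypothetical names, and the only UNAGI calls it relies on are `register_CPO_parameters` and `run_UNAGI` as used in the cells above. Whether successive runs overwrite earlier outputs depends on the data and model setup that is elided in this tutorial.

```python
CPO_SETTINGS = {
    "baseline": dict(anchor_neighbors=15, max_neighbors=30, min_neighbors=10,
                     resolution_min=0.5, resolution_max=1.2),
    "more_neighbors": dict(anchor_neighbors=30, max_neighbors=40, min_neighbors=20,
                           resolution_min=0.5, resolution_max=1.2),
    "fewer_neighbors": dict(anchor_neighbors=10, max_neighbors=15, min_neighbors=5,
                            resolution_min=0.5, resolution_max=1.2),
    "higher_resolution": dict(anchor_neighbors=15, max_neighbors=30, min_neighbors=10,
                              resolution_min=0.8, resolution_max=2.0),
    "lower_resolution": dict(anchor_neighbors=15, max_neighbors=30, min_neighbors=10,
                             resolution_min=0.2, resolution_max=0.8),
}


def run_cpo_sweep(unagi, idrem_path, settings):
    """Register each CPO setting on `unagi` and run UNAGI with it, in order."""
    completed = []
    for name, params in settings.items():
        unagi.register_CPO_parameters(**params)
        unagi.run_UNAGI(idrem_path)
        completed.append(name)
    return completed


# Usage, with the objects defined in the cells above:
# run_cpo_sweep(unagi, iDREM_Path, CPO_SETTINGS)
```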