# Clustering
In this notebook we'll walk through building a nearest neighbor network and running a few different clustering algorithms.

--------

We're going to directly import our pre-processed Giotto object that we've demonstrated in previous notebooks. The code for this import is in this [script](https://github.com/ndelrossi7/r-conda-binder/blob/main/scripts/preprocess.R).


(_Note: The script generates a plot of highly variable genes. You can either disregard it or make additional plots based on the examples in this [notebook](https://github.com/ndelrossi7/r-conda-binder/blob/main/notebooks/3-Dimensionality-Reduction.ipynb)._)

In [None]:
# you may have to run this cell twice if you get an error first
# setwd(dirname(getwd()))
source("scripts/preprocess.R")

### 1. Nearest neighbor network
To start, we'll generate a [nearest neighbor network](https://rubd.github.io/Giotto_site/reference/createNearestNetwork.html).
We'll create a shared nearest neighbor (sNN) network in this example, but you can also use a k-nearest neighbor (kNN) network. The network is built in the dimension reduction space you specify (PCA here). You can also build it without any dimensionality reduction, directly from expression values.

In [None]:
my_giotto_object <- createNearestNetwork(
  my_giotto_object,
  type = "sNN",                      # or "kNN"
  dim_reduction_to_use = "pca",      # build the network in PCA space
  dim_reduction_name = "pca",
  dimensions_to_use = 1:10,          # first 10 principal components
  genes_to_use = NULL,
  expression_values = "normalized",
  name = "sNN.pca",
  return_gobject = TRUE,
  k = 30,                            # neighbors per cell
  minimum_shared = 5,
  top_shared = 3,
  verbose = TRUE
)
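
To make the difference between the two network types concrete, here is a minimal base R sketch on toy data. It is not Giotto's implementation: the coordinates, the `knn` list, and the `shared()` helper are all made up for illustration, and real sNN implementations differ in details such as whether a cell counts as its own neighbor.

```r
# Toy data: 6 "cells" in a 2-D reduced space (hypothetical values)
coords <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),   # tight group A
  c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1)    # tight group B
)
rownames(coords) <- paste0("cell", 1:6)

k <- 2
d <- as.matrix(dist(coords))   # pairwise Euclidean distances
diag(d) <- Inf                 # a cell is not its own neighbor here

# kNN: for each cell, the names of its k closest cells
knn <- lapply(rownames(d), function(cell) names(sort(d[cell, ]))[1:k])
names(knn) <- rownames(d)

# sNN weight (hypothetical helper): number of neighbors two cells share
shared <- function(a, b) length(intersect(knn[[a]], knn[[b]]))

shared("cell1", "cell2")   # cells in the same group share a neighbor
shared("cell1", "cell4")   # cells in different groups share none
```

A kNN edge only says "these two cells are close", while an sNN edge is weighted by how much their neighborhoods overlap, which tends to down-weight links between groups.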

We can now begin clustering.

### 2. Leiden clustering

Below we will use the [Leiden](https://rubd.github.io/Giotto_site/reference/doLeidenCluster.html) community detection algorithm.

In [None]:
# run Leiden cluster
my_giotto_object = doLeidenCluster(
  my_giotto_object, 
  name = 'leiden_clus')

# Plot results
plotUMAP_2D(my_giotto_object, cell_color = 'leiden_clus', point_size = 3)[0]

### 3. Louvain clustering

Next, we cluster with the [Louvain](https://rubd.github.io/Giotto_site/reference/doLouvainCluster.html) community detection algorithm.

In [None]:
# run Louvain cluster
my_giotto_object = doLouvainCluster(my_giotto_object, name = 'louvain_clus')

# plot
plotUMAP_2D(my_giotto_object, cell_color = 'louvain_clus', point_size = 3)[0]
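
Louvain works by searching for a partition of the network that maximizes modularity, and Leiden refines the same kind of search. As a rough illustration of the quantity being optimized, the sketch below scores two partitions of a toy graph in base R. The graph and the `toy_modularity()` helper are hypothetical and unrelated to Giotto's internals.

```r
# Toy undirected graph: two triangles joined by a single edge (hypothetical)
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1   # make the adjacency matrix symmetric

# Modularity: Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j]
toy_modularity <- function(A, membership) {
  m2 <- sum(A)                                  # 2 * number of edges
  deg <- rowSums(A)                             # node degrees k_i
  same <- outer(membership, membership, "==")   # TRUE if same community
  sum((A - outer(deg, deg) / m2) * same) / m2
}

q_good <- toy_modularity(A, c(1, 1, 1, 2, 2, 2))   # one community per triangle
q_bad  <- toy_modularity(A, c(1, 2, 1, 2, 1, 2))   # communities cut across triangles
c(good = q_good, bad = q_bad)
```

The partition that follows the two triangles scores higher, which is the kind of grouping a modularity-based method would prefer.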

### 4. K-means clustering
Perform [K-means](https://rubd.github.io/Giotto_site/reference/doKmeans.html) clustering.

In [None]:
# run k-means
my_giotto_object = doKmeans(my_giotto_object, centers = 4, name = 'kmeans_clus')

# plot
plotUMAP_2D(my_giotto_object, cell_color = 'kmeans_clus', point_size = 3)[0]
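
Unlike the community detection methods above, k-means needs the number of clusters (`centers`) up front. The sketch below shows the idea with base R's `stats::kmeans` on simulated points. The data are made up, and this is only meant to illustrate the algorithm, not how `doKmeans` prepares its input.

```r
set.seed(1)

# Two well-separated groups of 20 simulated points each (hypothetical data)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

# nstart reruns the algorithm from several random starts and keeps the best fit
fit <- kmeans(toy, centers = 2, nstart = 10)

table(fit$cluster)   # cluster sizes
fit$centers          # estimated cluster centers
```

If `centers` does not match the structure in the data, k-means will still return that many clusters, so it is worth comparing the result against the UMAP and the other clustering methods.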

### 5. Hierarchical Clustering
Perform [hierarchical](https://rubd.github.io/Giotto_site/reference/doHclust.html) clustering. 

In [None]:
my_giotto_object = doHclust(my_giotto_object, k = 4, name = 'hier_clus')
plotUMAP_2D(my_giotto_object, cell_color = 'hier_clus', point_size = 3)[0]
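
Hierarchical clustering builds a full tree first and only then cuts it into `k` groups, so the same tree can be cut at several values of `k` without refitting. A minimal base R sketch with simulated points follows. The data and the choice of Ward linkage are assumptions for illustration, not necessarily what `doHclust` uses.

```r
set.seed(1)

# Three well-separated groups of 10 simulated points each (hypothetical data)
toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 4, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 8, sd = 0.3), ncol = 2)
)

# Build the tree once from pairwise distances
tree <- hclust(dist(toy), method = "ward.D2")

# Cut the same tree at two different resolutions
cuts <- cutree(tree, k = 2:3)   # matrix with one column per k
table(cuts[, "3"])              # cluster sizes at k = 3
```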

### 6. Cluster similarities
Here we build a data table of pairwise correlation scores between clusters, together with the number of cells in each cluster, to check for [similarities](https://rubd.github.io/Giotto_site/reference/getClusterSimilarity.html).

In [None]:
# calculate cluster similarities to see how individual clusters are correlated
cluster_similarities = getClusterSimilarity(my_giotto_object,
                                            cluster_column = 'leiden_clus')

# preview the pairwise similarity table
head(cluster_similarities, 10)

# preview the cell metadata, which holds the cluster labels being compared
head(pDataDT(my_giotto_object), 10)
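
To show what a pairwise cluster correlation looks like, the sketch below averages expression per cluster and correlates the averages in base R. The expression matrix and cluster labels are invented, and this is a simplified picture of the idea rather than the exact computation inside `getClusterSimilarity`.

```r
# Hypothetical normalized expression: 4 genes x 6 cells
expr <- matrix(
  c(5, 6, 5, 1, 0, 1,
    4, 5, 4, 0, 1, 0,
    0, 1, 0, 6, 5, 6,
    1, 0, 1, 5, 4, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)

# Hypothetical labels: cell3 resembles cluster 1 but was split off as cluster 2
clusters <- c(1, 1, 2, 3, 3, 3)

# Mean expression per cluster (genes x clusters)
cluster_means <- sapply(
  split(seq_len(ncol(expr)), clusters),
  function(idx) rowMeans(expr[, idx, drop = FALSE])
)

cor_mat <- cor(cluster_means)   # pairwise Pearson correlation between clusters
sizes <- table(clusters)        # number of cells per cluster

round(cor_mat, 2)
sizes
```

A small cluster that correlates strongly with a larger one, like cluster 2 with cluster 1 here, is the kind of pair worth inspecting before deciding whether the two really are distinct.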

### 7. Merge similar clusters
We can also [merge](https://rubd.github.io/Giotto_site/reference/mergeClusters.html) selected clusters based on their pairwise correlation scores and cluster sizes.

In [None]:
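# merge similar clusters based on correlation and size parameters;
# the result is stored in the 'merged_cluster' metadata column
my_giotto_object = mergeClusters(my_giotto_object,
                                 cluster_column = 'leiden_clus',
                                 new_cluster_name = 'merged_cluster',
                                 min_cor_score = 0.7,
                                 force_min_group_size = 4)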

### 8. Dendrogram splits
Next, we can split the cluster [dendrogram](https://rubd.github.io/Giotto_site/reference/getDendrogramSplits.html) at each node, keeping the leaf (cluster label) information for both sides of every split.

In [None]:
splits = getDendrogramSplits(my_giotto_object, cluster_column = 'merged_cluster')
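
As a rough picture of what "splitting at each node" means, the base R sketch below builds a small dendrogram from made-up cluster profiles and lists, for every internal node, which leaves fall on each side. The `profiles` matrix, the `list_splits()` helper, and the output column names are hypothetical, and the actual table returned by `getDendrogramSplits` may be organized differently.

```r
# Hypothetical mean expression profiles for four clusters (genes x clusters)
profiles <- cbind(
  A = c(5, 4, 0, 1),
  B = c(5, 5, 0, 0),
  C = c(0, 1, 6, 5),
  D = c(1, 0, 5, 5)
)

# Cluster the clusters, using 1 - correlation as the distance
tree <- hclust(as.dist(1 - cor(profiles)), method = "average")
dend <- as.dendrogram(tree)

# Walk the tree and record both sides of every internal node
list_splits <- function(node) {
  if (is.leaf(node)) return(NULL)
  side <- function(branch) paste(sort(labels(branch)), collapse = ",")
  rbind(
    data.frame(height = attr(node, "height"),
               tree_1 = side(node[[1]]),
               tree_2 = side(node[[2]])),
    list_splits(node[[1]]),
    list_splits(node[[2]])
  )
}

splits_toy <- list_splits(dend)
splits_toy   # one row per internal node, root first
```

With four leaves there are three internal nodes, and the first row (the root) separates the two pairs of similar profiles.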

### 9. Subclustering

Finally, we can [subcluster](https://rubd.github.io/Giotto_site/reference/doLeidenSubCluster.html) cells using the Leiden algorithm. 

In [None]:
# Perform subclustering on selected merged clusters
my_giotto_object = doLeidenSubCluster(gobject = my_giotto_object,
                                      cluster_column = 'merged_cluster',
                                      resolution = 0.2, k_neighbors = 10,
                                      hvg_param = list(method = 'cov_loess', difference_in_cov = 0.1),
                                      pca_param = list(expression_values = 'normalized', scale_unit = FALSE),
                                      nn_param = list(dimensions_to_use = 1:5),
                                      selected_clusters = c(5, 6, 7),
                                      name = 'sub_leiden_clus_select')

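# set colors for clusters
# note: these subcluster labels depend on the clustering result;
# adjust them to match the labels in your own output
subleiden_order = c(1.1, 5.1, 5.2, 2.1, 3.1,
                    4.1, 6.2, 6.1,
                    7.1, 7.2, 9.1, 8.1)
subleiden_colors = Giotto:::getDistinctColors(length(subleiden_order))
names(subleiden_colors) = subleiden_order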

# visualize
plotUMAP(gobject = my_giotto_object,
         cell_color = 'sub_leiden_clus_select', cell_color_code = subleiden_colors,
         show_NN_network = TRUE, point_size = 2.5, show_center_label = FALSE,
         legend_text = 12, legend_symbol_size = 3,
         save_param = list(save_name = '4_b_UMAP_leiden_subcluster'))[0]