# Notebook for clustering with the Fast Search and Find of Density Peaks algorithm

In [1]:
%matplotlib notebook
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pyfloc

## Reading and preprocessing data

See the *data* notebook for more details about the parameters of the *read_fcs* and *normalize* methods.

In [3]:
B = pyfloc.PyFloc()
B.read_fcs(file_name = '/home/cito/flowc/levine_13dim.fcs', read_mode = 'all')
list_features = ['CD11b', 'CD123', 'CD19', 'CD20', 'CD3', 'CD33', 'CD34', 'CD38', 'CD4', 'CD45', 'CD45RA', 'CD8', 'CD90']
B.clean_samples(features = ['label',], mode = 'nan')
#B.read_fcs(file_name = '/home/cito/flowc/Blood_2_1_037.fcs', read_mode = 10000, conditions = '37')
#B.read_fcs(file_name = '/home/cito/flowc/Blood_2_2_038.fcs', read_mode = 10000, conditions = '38')
#list_features = ['APC-H7-A', 'SSC-W', 'APC-R700-A', 'BB700-A', 'BB515-A', 'PE-A', 'BV711-A', 'PE-CF594-A', 'SSC-A', 'FSC-H', 'SSC-H', 'FSC-A', 'Alexa Fluor 647-A', 'BV421-A', 'PE-Cy7-A', 'BV510-A', 'FSC-W', 'BV786-A']
B.remove_outliers(list_features)
B.normalize(list_features, mode = 'arcsinh')

Reading data from /home/cito/flowc/levine_13dim.fcs with mode all conditions undefined
Read 167044 samples from /home/cito/flowc/levine_13dim.fcs
Removing 85297 samples with feature label equal to nan
Number of samples before outliers removal 81747
Number of samples after outliers removal 67090
Running feature normalization with mode arcsinh
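
As a rough illustration of what an arcsinh normalization does to intensity values, the sketch below applies the transform with plain NumPy. The function name `arcsinh_transform` and the cofactor of 5 are assumptions made for this example; this notebook does not state which cofactor (if any) *normalize* uses internally.

```python
import numpy as np

def arcsinh_transform(values, cofactor=5.0):
    """Inverse hyperbolic sine scaling (hypothetical helper, not part of pyfloc).

    The result is roughly linear for |x| much smaller than the cofactor
    and roughly logarithmic for |x| much larger than it.
    """
    return np.arcsinh(np.asarray(values, dtype=float) / cofactor)

# Illustrative raw intensities spanning several orders of magnitude
raw = np.array([-10.0, 0.0, 5.0, 500.0, 50000.0])
transformed = arcsinh_transform(raw)
```

Unlike a plain logarithm, the transform is defined for zero and negative values, which is why it is a common choice for cytometry intensities.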


## Fitting the clustering algorithm

Here the clustering algorithm is fitted to the data. The parameters of *fit_cluster* are:

- **features** = which features are used for clustering the data
- **percents** = The radius of the kernel used to calculate the density is equal to this percentile of the distances among samples. It can be a single float value or a list of values. When a list is provided, all its values are tested and the one that gives the maximum separation of clusters is chosen
- **ns_clusters** = It is possible to choose among:
    - None: The number of clusters is estimated automatically. When *ns_clusters* is None, it is necessary to define *n_stds_delta*
    - integer value: number of clusters
    - list of integer values: the clustering algorithm is repeated for each number of clusters provided, and the value that gives the maximum separation of clusters is chosen
- **n_stds_delta** = If defined, a sample is considered a cluster center when its delta exceeds the delta value expected at the sample's rho by more than *n_stds_delta* standard deviations
- **manual_refine** = When True, the samples considered as cluster centers can be refined manually on the rho-delta plot
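
The roles of *percents*, rho, delta and an integer *ns_clusters* can be sketched with a small NumPy implementation of the density peaks idea. This is a toy version under stated assumptions, not pyfloc's implementation: the function name `density_peaks`, the Gaussian kernel, and the selection of centers by the largest rho × delta products are choices made for this example only.

```python
import numpy as np

def density_peaks(X, percent=2.0, n_clusters=2):
    """Toy density peaks clustering (hypothetical helper, not pyfloc's code)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Kernel radius: the requested percentile of the pairwise distances
    radius = np.percentile(dist[np.triu_indices(n, k=1)], percent)
    # rho: Gaussian kernel density (assumed kernel), self-contribution removed
    rho = np.exp(-(dist / radius) ** 2).sum(axis=1) - 1.0
    # delta: distance to the nearest sample with higher density
    order = np.argsort(-rho)              # indices from densest to sparsest
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the global peak
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centers: samples with the largest rho * delta; the global density
    # peak has no denser neighbor, so it is always kept as a center
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Remaining samples, densest first, inherit the label of their
    # nearest denser neighbor (which has already been labelled)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, centers, labels

# Two well-separated synthetic blobs as illustrative data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
rho, delta, centers, labels = density_peaks(X, percent=5.0, n_clusters=2)
```

Cluster centers stand out because they combine a high rho with a large delta, which is what the rho-delta plot shows. The full pairwise distance matrix used here grows quadratically with the number of samples, so this sketch is only practical for small data sets.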

In [None]:
# replicate with the 4 modes
B.fit_cluster(features = list_features, mode = 'DP',
              percents = [1.0, 2.0, 5.0, 10.0],
              ns_clusters = np.arange(2, 50, 1),
              n_stds_delta = None,
              manual_refine = True)

## Clustering samples

In [None]:
B.predict_cluster()
B.experiments.show_distributions(list_features)
#print(B.cluster)

In [23]:
B.fit_cluster(features = list_features, mode = 'Kmeans', ns_clusters = 24)

Clustering data with Kmeans algorithm


In [24]:
B.predict_cluster()
print(B.cluster)

Results of clustering with method Kmeans
	Label 20 = 5
		means = [ 0.45736003  0.00865474 -0.08044641  0.00282207 -0.66934774 -0.22779775 -0.02311946  0.73502949 -0.18300513  0.00085941 -0.11561304 -0.16519792  0.49537632]
		stds = [0.88131736 0.57967889 0.04664302 0.36744343 0.63778665 0.07952815 0.049053   0.86380892 0.14424457 0.26218839 0.26491976 0.18976091 0.58712024]
	Label 19 = 32
		means = [ 0.56317261  4.08376454 -0.11480492  0.08145751  0.40053176  2.10992368  0.78575068  4.69541214  1.1345963   3.02690922  1.88495778  0.28720589  0.41010783]
		stds = [0.73365278 0.20192921 0.30821007 0.25694922 1.02297097 1.03475649 0.56583522 0.57453257 1.25358341 0.42251355 0.62018145 0.6079882  0.56523223]
	Label 13 = 39
		means = [ 0.48182074  1.95687956 -0.00732245  0.2430989   1.86125173  2.77531182  2.24383543  5.19849311  0.60974914  2.21981949  0.35410482  0.14903647  0.67610727]
		stds = [0.64644204 0.67371084 0.41954082 0.52002846 1.6285543  1.04559063 0.31101149 0.40426087 1.007