# Using Pyrea with Nutrimouse Data Utilising Hierarchical and Spectral Clustering

In this notebook we demonstrate Pyrea's usage by performing hierarchical and spectral clustering on the Nutrimouse[<sup>1</sup>](#fn1) dataset.

We do this with the Parea_1 structure, which is included in the Pyrea package as a helper function.

## Imports
This notebook requires Pyrea, mvlearn, and NumPy. Let's import them here:

In [1]:
import pyrea
import numpy as np

from mvlearn.datasets import load_nutrimouse

## Load Data

Load the Nutrimouse data from mvlearn:[<sup>2</sup>](#fn2)

In [2]:
nutrimouse_dataset = load_nutrimouse()
data = [nutrimouse_dataset['gene'], nutrimouse_dataset['lipid']]

Preview the shape of the data:

In [3]:
print(f'Number of views: {len(data)}')
print(f'Shape of view 1: {np.shape(data[0])[0]} x {np.shape(data[0])[1]}')
print(f'Shape of view 2: {np.shape(data[1])[0]} x {np.shape(data[1])[1]}')

Number of views: 2
Shape of view 1: 40 x 120
Shape of view 2: 40 x 21


As can be seen, there are 2 views. View 1 has 120 features for each of the 40 mice, while view 2 has 21 features for each of the 40 mice. As this is a multi-view dataset, the 40 samples refer to the same 40 mice in both views.
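
Because every view must describe the same samples, it can be worth checking the sample counts before clustering. The following is a minimal sketch: `check_views` is a hypothetical helper (not part of Pyrea or mvlearn), and the arrays are random placeholders with the shapes printed above rather than the real Nutrimouse values.

```python
import numpy as np


def check_views(views):
    """Return the shared sample count, or raise if the views disagree."""
    sample_counts = {np.shape(view)[0] for view in views}
    if len(sample_counts) != 1:
        raise ValueError(f"Views have different sample counts: {sorted(sample_counts)}")
    return sample_counts.pop()


# Placeholder data with the same shapes as the two views (40 x 120 and 40 x 21).
rng = np.random.default_rng(0)
views = [rng.normal(size=(40, 120)), rng.normal(size=(40, 21))]

n_mice = check_views(views)
print(f"All views share {n_mice} samples")
```

Note that this only compares the number of rows. It cannot confirm that row i refers to the same mouse in every view, which has to be guaranteed by how the data was assembled.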

We will use Parea_1 to perform hierarchical clustering and spectral clustering on this dataset, and use Pyrea's built-in genetic algorithm functionality to find the best hyperparameters to use for this data.
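
To make the idea of a genetic hyperparameter search concrete, here is a minimal, generic sketch. It is not Pyrea's implementation: the search space, the `toy_fitness` function, and all helper names are hypothetical stand-ins, and a real run would score each candidate by clustering the data. Only the argument names `n_population` and `n_generations` mirror those used in this notebook.

```python
import random

# Hypothetical search space: one linkage method and one cluster count per candidate.
LINKAGES = ["single", "complete", "average", "ward"]
K_MIN, K_MAX = 2, 5


def random_candidate(rng):
    return [rng.choice(LINKAGES), rng.randint(K_MIN, K_MAX)]


def toy_fitness(candidate):
    """Stand-in for a clustering quality score; higher is better."""
    linkage, k = candidate
    return 0.2 * LINKAGES.index(linkage) - 0.1 * abs(k - 3)


def crossover(parent_a, parent_b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return [rng.choice([parent_a[0], parent_b[0]]),
            rng.choice([parent_a[1], parent_b[1]])]


def mutate(candidate, rng, rate=0.3):
    # Each gene is resampled from the search space with probability `rate`.
    linkage, k = candidate
    if rng.random() < rate:
        linkage = rng.choice(LINKAGES)
    if rng.random() < rate:
        k = rng.randint(K_MIN, K_MAX)
    return [linkage, k]


def genetic_search(n_population=10, n_generations=3, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(n_population)]
    best = max(population, key=toy_fitness)

    def select():
        # Tournament selection: the fittest of three random candidates.
        return max(rng.sample(population, 3), key=toy_fitness)

    for _ in range(n_generations):
        population = [mutate(crossover(select(), select(), rng), rng)
                      for _ in range(n_population)]
        # Keep track of the best candidate seen in any generation.
        best = max(population + [best], key=toy_fitness)
    return best, toy_fitness(best)


best, score = genetic_search()
print(f"Best candidate: {best}, fitness: {score:.3f}")
```

Small populations and few generations, as used in this notebook, keep the search fast but explore only a small part of the search space, so the result is the best candidate found rather than a guaranteed optimum.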

## Hierarchical Clustering

Run the genetic algorithm as follows to search for the clustering parameters that work best on this data:

In [4]:
params_hierarchical = pyrea.parea_1_genetic(data, k_min=2, k_max=5, n_generations=3, n_population=10)

Silhouette score: 0.44309006211180124
Silhouette score: 0.433197188449848
Silhouette score: 0.22683803763440863
Silhouette score: 0.626736111111111
Silhouette score: 0.5256649688737973
Silhouette score: 0.21063133640552997
Silhouette score: 0.4851190476190476
Silhouette score: 0.7875
Silhouette score: 0.49861111111111106
Silhouette score: 0.6840909090909092
gen	nevals	avg     	std     	min     	max   
0  	10    	0.492148	0.172895	0.210631	0.7875
Silhouette score: 0.49861111111111106
Silhouette score: 0.4851190476190476
Silhouette score: 0.4848214285714286
Silhouette score: 0.36851142558991395
Silhouette score: 0.44309006211180124
Silhouette score: 0.68127091423614
Silhouette score: 0.7136363636363636
Silhouette score: 0.68127091423614
1  	8     	0.557703	0.136442	0.368511	0.7875
Silhouette score: 0.68127091423614
Silhouette score: 0.7875
Silhouette score: 0.7136363636363636
Silhouette score: 0.7136363636363636
Silhouette score: 0.7875
Silhouette score: 0.7875
Silhouette score: 0.681270
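
The log interleaves two kinds of lines: one `Silhouette score` line per evaluated candidate, and one summary row per generation (`gen`, `nevals`, `avg`, `std`, `min`, `max`). As a check on that reading, the sketch below recomputes the generation 0 row from the ten scores printed before it. It assumes that `std` is the population standard deviation (NumPy's default, `ddof=0`), which is an inference from the numbers and not something stated by Pyrea.

```python
import numpy as np

# The ten silhouette scores printed before the "gen 0" summary row.
scores_gen0 = np.array([
    0.44309006211180124, 0.433197188449848, 0.22683803763440863,
    0.626736111111111, 0.5256649688737973, 0.21063133640552997,
    0.4851190476190476, 0.7875, 0.49861111111111106, 0.6840909090909092,
])

summary = {
    "nevals": int(scores_gen0.size),
    "avg": float(scores_gen0.mean()),
    "std": float(scores_gen0.std()),  # population standard deviation (ddof=0)
    "min": float(scores_gen0.min()),
    "max": float(scores_gen0.max()),
}

for name, value in summary.items():
    print(f"{name}: {value:.6g}")
```

The values agree with the generation 0 row of the log to the printed precision. Later generations report fewer evaluations than the population size (for example 8 in generation 1), and the `max` column stays at 0.7875 throughout the visible log.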

Once the search is complete, `params_hierarchical` contains the best parameters found for this data, which we pass to the `parea_1` function:

In [5]:
labels_hierarchical = pyrea.parea_1(data, *params_hierarchical)

print(labels_hierarchical)

for i, label in enumerate(labels_hierarchical, start=1):
    print(f"Mouse {i} assigned cluster {label}")

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0
 0 0 0]
Mouse 1 assigned cluster 1
Mouse 2 assigned cluster 1
Mouse 3 assigned cluster 1
Mouse 4 assigned cluster 1
Mouse 5 assigned cluster 1
Mouse 6 assigned cluster 1
Mouse 7 assigned cluster 1
Mouse 8 assigned cluster 1
Mouse 9 assigned cluster 1
Mouse 10 assigned cluster 1
Mouse 11 assigned cluster 1
Mouse 12 assigned cluster 1
Mouse 13 assigned cluster 1
Mouse 14 assigned cluster 1
Mouse 15 assigned cluster 1
Mouse 16 assigned cluster 1
Mouse 17 assigned cluster 1
Mouse 18 assigned cluster 1
Mouse 19 assigned cluster 1
Mouse 20 assigned cluster 1
Mouse 21 assigned cluster 1
Mouse 22 assigned cluster 0
Mouse 23 assigned cluster 0
Mouse 24 assigned cluster 0
Mouse 25 assigned cluster 0
Mouse 26 assigned cluster 1
Mouse 27 assigned cluster 0
Mouse 28 assigned cluster 0
Mouse 29 assigned cluster 0
Mouse 30 assigned cluster 0
Mouse 31 assigned cluster 1
Mouse 32 assigned cluster 1
Mouse 33 assigned cluster 0
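Mouse 34 assigned cluster 0
Mouse 35 assigned cluster 0
Mouse 36 assigned cluster 1
Mouse 37 assigned cluster 0
Mouse 38 assigned cluster 0
Mouse 39 assigned cluster 0
Mouse 40 assigned cluster 0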
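
A per-mouse listing is hard to scan, so a cluster-size summary is often more useful. The sketch below uses the label array exactly as printed above (copied by hand so that the example runs on its own); in the notebook you would pass `labels_hierarchical` directly.

```python
import numpy as np

# Hierarchical cluster labels as printed above, one per mouse.
labels = np.array([
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0,
])

# Count how many mice fall into each cluster.
clusters, counts = np.unique(labels, return_counts=True)
sizes = dict(zip(clusters.tolist(), counts.tolist()))

for cluster, size in sizes.items():
    print(f"Cluster {cluster}: {size} mice")
```

The cluster numbers themselves are arbitrary identifiers: only the grouping of mice carries meaning, not which group is called 0 or 1.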

We can also print the parameters to see which were selected by the genetic algorithm:

In [6]:
params_hierarchical

['hierarchical',
 'complete',
 2,
 'hierarchical',
 'ward2',
 3,
 'hierarchical',
 'single',
 2,
 'hierarchical',
 'median',
 2,
 'disagreement']
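
The returned list is flat, which makes it hard to read. Judging purely from the printed pattern, it consists of four groups of three values followed by a final fusion method. The sketch below regroups it accordingly. `split_params` is a hypothetical helper, and the grouping is an inference from the output: this notebook does not state how Parea_1 assigns each group to a part of the structure.

```python
# The parameter list exactly as printed above.
params = [
    'hierarchical', 'complete', 2,
    'hierarchical', 'ward2', 3,
    'hierarchical', 'single', 2,
    'hierarchical', 'median', 2,
    'disagreement',
]


def split_params(flat, group_size=3):
    """Split a flat parameter list into fixed-size groups plus a trailing item."""
    *body, last = flat
    if len(body) % group_size != 0:
        raise ValueError("Parameter list does not divide into equal groups")
    groups = [tuple(body[i:i + group_size]) for i in range(0, len(body), group_size)]
    return groups, last


groups, fusion = split_params(params)
for algorithm, method, number in groups:
    print(f"{algorithm:>12}  {method:<8}  {number}")
print(f"fusion: {fusion}")
```

In each group the first value names the clustering algorithm, the second looks like a linkage method, and the third lies within the `k_min=2` to `k_max=5` range passed to the search, so it is plausibly a number of clusters.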

## Spectral Clustering

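Spectral clustering is performed in almost the same way, using `parea_1_genetic_spectral`, which additionally takes bounds on the number of neighbours (`n_neighbors_min` and `n_neighbors_max`):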

In [7]:
params_spectral = pyrea.parea_1_genetic_spectral(data, k_min=2, k_max=5, n_neighbors_min=5, n_neighbors_max=10, n_population=10, n_generations=3)

Silhouette score: -0.4083002854633289
Silhouette score: -0.31631391178266177
Silhouette score: -0.6138333333333333
Silhouette score: -0.4883448337724653
Silhouette score: -0.5068861753034547
Silhouette score: -0.6175945592687879
Silhouette score: -0.585704185520362
Silhouette score: -0.612412677621656
Silhouette score: -0.560596647687017
Silhouette score: -0.47912822420634915
gen	nevals	avg      	std      	min      	max      
0  	10    	-0.518911	0.0946743	-0.617595	-0.316314
Silhouette score: -0.7149242424242424
Silhouette score: -0.31710677699493484
Silhouette score: -0.5552083333333334
1  	3     	-0.464411	0.129284 	-0.714924	-0.316314
Silhouette score: -0.6590066391941392
Silhouette score: -0.6168939393939394
Silhouette score: -0.4339898989898991
Silhouette score: -0.4325513538748833
Silhouette score: -0.5651636934340422
Silhouette score: -0.5500994478251071
Silhouette score: -0.4704683207014863
Silhouette score: -0.6385416666666666
Silhouette score: -0.5205298786181138
2  	9     	
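
Every silhouette score in this log is negative. The silhouette coefficient ranges from -1 to 1, and a negative value means that samples are, on average, closer to another cluster than to the one they were assigned to. The sketch below is a plain NumPy implementation of the standard definition, included to show what such a score measures. It assumes Euclidean distances on raw features, and the notebook does not show which distances Pyrea scores, so treat it as an illustration only.

```python
import numpy as np


def silhouette_score(X, labels):
    """Mean silhouette coefficient using Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("At least two clusters are required")

    # Pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # a singleton cluster scores 0 by convention
        # a: mean distance to the other members of the sample's own cluster.
        a = dist[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


# Two well-separated synthetic blobs (placeholder data, not Nutrimouse).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])

good = silhouette_score(X, [0] * 20 + [1] * 20)  # labels follow the blobs
bad = silhouette_score(X, [0, 1] * 20)           # labels ignore the blobs
print(f"matching labels: {good:.3f}, mismatched labels: {bad:.3f}")
```

Labels that follow the true grouping score close to 1, while labels that cut across it score around zero or below. Consistently negative scores, as in the spectral log, therefore suggest that the candidate clusterings evaluated during the search separate the samples poorly under the metric being scored.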

Again, the genetic algorithm returns the best parameters found, which we then use to perform the clustering with the `parea_1_spectral` function:

In [8]:
labels_spectral = pyrea.parea_1_spectral(data, *params_spectral)

print(labels_spectral)

for i, label in enumerate(labels_spectral, start=1):
    print(f"Mouse {i} assigned cluster {label}")

[1 1 1 2 1 3 3 3 1 2 1 1 3 1 4 5 2 1 1 1 2 4 1 2 2 1 1 2 2 1 1 2 1 1 1 1 1
 3 1 2]
Mouse 1 assigned cluster 1
Mouse 2 assigned cluster 1
Mouse 3 assigned cluster 1
Mouse 4 assigned cluster 2
Mouse 5 assigned cluster 1
Mouse 6 assigned cluster 3
Mouse 7 assigned cluster 3
Mouse 8 assigned cluster 3
Mouse 9 assigned cluster 1
Mouse 10 assigned cluster 2
Mouse 11 assigned cluster 1
Mouse 12 assigned cluster 1
Mouse 13 assigned cluster 3
Mouse 14 assigned cluster 1
Mouse 15 assigned cluster 4
Mouse 16 assigned cluster 5
Mouse 17 assigned cluster 2
Mouse 18 assigned cluster 1
Mouse 19 assigned cluster 1
Mouse 20 assigned cluster 1
Mouse 21 assigned cluster 2
Mouse 22 assigned cluster 4
Mouse 23 assigned cluster 1
Mouse 24 assigned cluster 2
Mouse 25 assigned cluster 2
Mouse 26 assigned cluster 1
Mouse 27 assigned cluster 1
Mouse 28 assigned cluster 2
Mouse 29 assigned cluster 2
Mouse 30 assigned cluster 1
Mouse 31 assigned cluster 1
Mouse 32 assigned cluster 2
Mouse 33 assigned cluster 1
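Mouse 34 assigned cluster 1
Mouse 35 assigned cluster 1
Mouse 36 assigned cluster 1
Mouse 37 assigned cluster 1
Mouse 38 assigned cluster 3
Mouse 39 assigned cluster 1
Mouse 40 assigned cluster 2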
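
With two label vectors for the same 40 mice, a contingency table shows how the two clusterings relate. The sketch below builds one with NumPy from the two label arrays exactly as printed in this notebook (copied by hand so that the example runs on its own); in the notebook you would use `labels_hierarchical` and `labels_spectral` directly.

```python
import numpy as np

# Label arrays as printed in this notebook, one entry per mouse.
hierarchical = np.array([
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0,
])
spectral = np.array([
    1, 1, 1, 2, 1, 3, 3, 3, 1, 2, 1, 1, 3, 1, 4, 5, 2, 1, 1, 1,
    2, 4, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 3, 1, 2,
])

# Map each label to a row or column index, then count co-occurrences.
row_ids, row_idx = np.unique(hierarchical, return_inverse=True)
col_ids, col_idx = np.unique(spectral, return_inverse=True)
table = np.zeros((row_ids.size, col_ids.size), dtype=int)
np.add.at(table, (row_idx, col_idx), 1)

print("rows: hierarchical clusters", row_ids.tolist())
print("columns: spectral clusters", col_ids.tolist())
print(table)
```

Each cell counts the mice that share a given pair of labels. If the two methods agreed, every row would concentrate its mice in columns that no other row uses. Cluster identifiers are arbitrary in both results, so only the pattern of counts is meaningful, not the label values.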

We can also print the parameters to see which were selected by the genetic algorithm:

In [9]:
params_spectral

['spectral',
 10,
 4,
 'spectral',
 10,
 5,
 'spectral',
 9,
 2,
 'spectral',
 5,
 3,
 'disagreement']

## Conclusions

TBC.

## References

<span id="fn1"><sup>1</sup> Nutrimouse data: https://aasldpubs.onlinelibrary.wiley.com/doi/10.1002/hep.21510</span>

<span id="fn2"><sup>2</sup> mvlearn Nutrimouse example: https://mvlearn.github.io/auto_examples/datasets/plot_nutrimouse.html</span>