# Using Pyrea with Nutrimouse Data Utilising Hierarchical and Spectral Clustering

In this notebook we demonstrate Pyrea's usage by performing hierarchical and spectral clustering on the Nutrimouse[<sup>1</sup>](#fn1) dataset.

We will do this using the Parea_1 structure, which the Pyrea software package provides as a helper function.

## Imports and Load Data

This notebook requires Pyrea, mvlearn, and NumPy. Let's import them here:

In [1]:
import pyrea
import mvlearn
import numpy as np
from mvlearn.datasets import load_nutrimouse

Load the Nutrimouse data from mvlearn:[<sup>2</sup>](#fn2)

In [2]:
nutrimouse_dataset = load_nutrimouse()

data = [nutrimouse_dataset['gene'], nutrimouse_dataset['lipid']]

print(f'Number of views: {len(data)}. Shape of each view: {[np.shape(d) for d in data]}')

Number of views: 2. Shape of each view: [(40, 120), (40, 21)]


As can be seen, there are 2 views. View 0 has 120 features for each of the 40 mice, while view 1 has 21 features for each of the 40 mice. As this is a multi-view dataset, the 40 samples refer to the same 40 mice in both views.
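The alignment between views can be checked before clustering. The following is a minimal sketch and not part of Pyrea: `check_views_aligned` is a hypothetical helper, and the zero-filled arrays are stand-ins with the same shapes as the gene and lipid views printed above.

```python
import numpy as np

def check_views_aligned(views):
    """Return the shared sample count, or raise if the views disagree."""
    sample_counts = {np.shape(v)[0] for v in views}
    if len(sample_counts) != 1:
        raise ValueError(f"Views have different sample counts: {sorted(sample_counts)}")
    return sample_counts.pop()

# Stand-in arrays with the shapes reported above (not the real data).
gene_view = np.zeros((40, 120))
lipid_view = np.zeros((40, 21))

n_mice = check_views_aligned([gene_view, lipid_view])
print(f"All views describe the same {n_mice} samples")
```

This only compares row counts. It assumes the rows already appear in the same mouse order in every view.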

Note that we will not use any targets in this notebook, as we will perform unsupervised clustering on the dataset.

We will use Parea_1 to perform hierarchical clustering and spectral clustering on this dataset, and use Pyrea's built-in genetic algorithm functionality to find the best hyperparameters to use for this data.

## Hierarchical Clustering

Run the genetic algorithm as follows to search for the best parameters to use for the clustering:

In [3]:
params_hierarchical = pyrea.parea_1_genetic(data, k_min=2, k_max=5)

Silhouette score: 0.3947840354090354
Silhouette score: 0.5979166666666667
Silhouette score: 0.5311011904761905
Silhouette score: 0.36815476190476193
Silhouette score: 0.49861111111111106
Silhouette score: 0.32134818007662835
Silhouette score: 0.6541554659498207
Silhouette score: 0.6244047619047619
Silhouette score: 0.3532045606608031
Silhouette score: 0.5617943548387097
Silhouette score: 0.452833850931677
Silhouette score: 0.5626016260162602
Silhouette score: 0.452833850931677
Silhouette score: 0.4491988989271598
Silhouette score: 0.21063133640552997
Silhouette score: 0.5186011904761905
Silhouette score: 0.5256649688737973
Silhouette score: 0.633184523809524
Silhouette score: 0.5308333333333334
Silhouette score: 0.3976190476190476
Silhouette score: 0.4226851851851852
Silhouette score: 0.23851025596072933
Silhouette score: 0.28100721153846153
Silhouette score: 1.0
Silhouette score: 0.3254844006568144
Silhouette score: 0.6201178451178452
Silhouette score: 0.1881235679214403
Silhouette sc
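The values printed during the search are silhouette scores, which range from -1 to 1; higher values indicate tighter, better-separated clusters. As a rough illustration of what such a score measures, here is a minimal NumPy sketch of the standard silhouette coefficient with Euclidean distances. Whether this matches Pyrea's internal computation is an assumption; the function `silhouette` and the toy points are hypothetical.

```python
import numpy as np

def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster.
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Toy data: two tight groups far apart.
toy_points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette(toy_points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette(toy_points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

A labelling that follows the two groups scores close to 1, while one that cuts across them scores below 0.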

Once this is complete, `params_hierarchical` contains the best parameters found for this data, which we then pass to the `parea_1` function:

In [13]:
final_labels = pyrea.parea_1(data, *params_hierarchical)

print(final_labels)

for i, label in enumerate(final_labels, start=1):
    print(f"Mouse {i} assigned cluster {label}")

[1 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0
 1 0 0]
Mouse 1 assigned cluster 1
Mouse 2 assigned cluster 1
Mouse 3 assigned cluster 1
Mouse 4 assigned cluster 1
Mouse 5 assigned cluster 0
Mouse 6 assigned cluster 0
Mouse 7 assigned cluster 1
Mouse 8 assigned cluster 1
Mouse 9 assigned cluster 1
Mouse 10 assigned cluster 0
Mouse 11 assigned cluster 1
Mouse 12 assigned cluster 1
Mouse 13 assigned cluster 1
Mouse 14 assigned cluster 1
Mouse 15 assigned cluster 1
Mouse 16 assigned cluster 1
Mouse 17 assigned cluster 0
Mouse 18 assigned cluster 1
Mouse 19 assigned cluster 0
Mouse 20 assigned cluster 1
Mouse 21 assigned cluster 0
Mouse 22 assigned cluster 0
Mouse 23 assigned cluster 0
Mouse 24 assigned cluster 1
Mouse 25 assigned cluster 0
Mouse 26 assigned cluster 0
Mouse 27 assigned cluster 0
Mouse 28 assigned cluster 0
Mouse 29 assigned cluster 1
Mouse 30 assigned cluster 0
Mouse 31 assigned cluster 0
Mouse 32 assigned cluster 0
Mouse 33 assigned cluster 0
Mo
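To summarise the assignment, the cluster sizes can be counted directly from the label array. This sketch copies the labels printed above into a NumPy array (named `printed_labels` here for illustration) rather than re-running Pyrea.

```python
import numpy as np

# Labels copied from the output above (40 mice, clusters 0 and 1).
printed_labels = np.array([
    1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
    0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
])

# np.bincount returns the number of occurrences of each non-negative integer label.
cluster_sizes = np.bincount(printed_labels)
for cluster, size in enumerate(cluster_sizes):
    print(f"Cluster {cluster}: {size} mice")
# Cluster 0: 21 mice
# Cluster 1: 19 mice
```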

## Spectral Clustering

Spectral clustering is performed in almost the same way, using the spectral variants of the two functions:

In [5]:
params_spectral = pyrea.parea_1_genetic_spectral(data, k_min=2, k_max=5)

Again, this returns the best parameters found, which we then use to perform the clustering with the `parea_1_spectral` function:

In [None]:
pyrea.parea_1_spectral(data, *params_spectral)
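Both `parea_1` and `parea_1_spectral` are called in this notebook with `*params`, which unpacks a sequence into positional arguments. The sketch below illustrates the mechanism with stand-in functions. `fake_genetic`, `fake_cluster`, their parameter names, and the values are all invented for illustration and say nothing about the actual structure of the values Pyrea returns.

```python
def fake_genetic(views):
    """Hypothetical stand-in for a search routine: returns parameters as a tuple."""
    return ("method_x", "method_y", 2)  # invented values

def fake_cluster(views, first_method, second_method, k):
    """Hypothetical stand-in for a clustering call taking positional parameters."""
    return f"{len(views)} views, {first_method}, {second_method}, k={k}"

toy_views = [[0.1, 0.2], [0.3, 0.4]]
params = fake_genetic(toy_views)

# These two calls are equivalent: *params expands the tuple in order.
unpacked = fake_cluster(toy_views, *params)
explicit = fake_cluster(toy_views, params[0], params[1], params[2])
print(unpacked)
```

The order of the returned values therefore has to match the order of the positional parameters of the function they are passed to.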

## Conclusions

TBC.

## Random Data Tests

In [None]:
import pyrea
import numpy as np

# Create sample data:
d1 = np.random.rand(100,10)
d2 = np.random.rand(100,10)
d3 = np.random.rand(100,10)
d4 = np.random.rand(100,10)
d5 = np.random.rand(100,10)

data = [d1, d2, d3, d4, d5]

labels = pyrea.parea_1_spectral(data, k_final=6)
print(labels)
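Because `np.random.rand` draws new values on every run, the labels printed by the cell above will differ between runs. Here is a minimal sketch of a reproducible variant, assuming a fixed seed is acceptable for these tests. The helper `make_views` and the seed value 0 are arbitrary choices, and any randomness inside Pyrea itself is not controlled by this.

```python
import numpy as np

def make_views(n_views=5, n_samples=100, n_features=10, seed=0):
    """Generate uniform random views from a seeded generator."""
    rng = np.random.default_rng(seed)
    return [rng.random((n_samples, n_features)) for _ in range(n_views)]

random_views = make_views()
print([view.shape for view in random_views])
```

Calling `make_views()` again with the same seed returns identical arrays, so the input data stays the same from one run to the next.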

Now with the genetic algorithm:

## References

<span id="fn1"><sup>1</sup> Nutrimouse data: https://aasldpubs.onlinelibrary.wiley.com/doi/10.1002/hep.21510</span>

<span id="fn2"><sup>2</sup> mvlearn Nutrimouse example: https://mvlearn.github.io/auto_examples/datasets/plot_nutrimouse.html</span>