# Using Pyrea with Nutrimouse Data Utilising Hierarchical and Spectral Clustering

In this notebook we demonstrate Pyrea's usage by performing hierarchical and spectral clustering on the Nutrimouse[<sup>1</sup>](#fn1) dataset.

We do this using the Parea_1 structure, which is included as a helper function in the Pyrea package.
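Pyrea structures such as Parea_1 combine clusterings of several views through ensembles with a fusion step (the traceback from the hierarchical run, for instance, passes through `Ensemble(views, fuser).execute()`). To give a feel for what "fusing" clusterings can mean, the sketch below builds a generic co-association (consensus) matrix: entry (i, j) is the fraction of clusterings that put samples i and j in the same cluster. This is an assumed, illustrative fusion rule, not necessarily the one Pyrea's fusers use, and `co_association` is a hypothetical helper rather than part of Pyrea's API.

```python
import numpy as np

def co_association(labelings):
    """Fraction of labelings that assign each pair of samples to the same cluster.

    `labelings` is a list of 1-D integer label arrays, one per clustered view,
    all covering the same samples in the same order.
    """
    labelings = [np.asarray(l) for l in labelings]
    n = labelings[0].shape[0]
    m = np.zeros((n, n))
    for labels in labelings:
        # Boolean matrix: True where samples i and j share a cluster in this labeling
        m += labels[:, None] == labels[None, :]
    return m / len(labelings)

# Two hypothetical clusterings of the same 4 samples (e.g. one per view)
view_a = [0, 0, 1, 1]
view_b = [0, 0, 0, 1]
consensus = co_association([view_a, view_b])
print(consensus)
```

Clustering the consensus matrix again (for example, treating `1 - consensus` as a distance) would yield a single fused labeling.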

## Imports and Load Data

This notebook requires Pyrea, mvlearn, and NumPy; let's import them here:

```python
import pyrea
import mvlearn
import numpy as np
from mvlearn.datasets import load_nutrimouse
```

Load the Nutrimouse data from mvlearn:[<sup>2</sup>](#fn2)

```python
nutrimouse_dataset = load_nutrimouse()

data = [nutrimouse_dataset['gene'], nutrimouse_dataset['lipid']]

print(f'Number of views: {len(data)}. Shape of each view: {[np.shape(d) for d in data]}')
```

```text
Number of views: 2. Shape of each view: [(40, 120), (40, 21)]
```


As can be seen, there are 2 views. View 0 (`gene`) has 120 features for each of the 40 mice, while view 1 (`lipid`) has 21 features for each of the 40 mice. As this is a multi-view dataset, the 40 samples refer to the same 40 mice in both views.
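In code, a multi-view dataset is simply a list of arrays that share the sample axis: row i of every view describes the same mouse. The sketch below uses random arrays with the Nutrimouse shapes as stand-ins (the values are not the real data) and checks that alignment before any clustering; `check_views` is a hypothetical helper, not part of Pyrea or mvlearn.

```python
import numpy as np

def check_views(views):
    """Return the shared number of samples, raising if the views disagree on it."""
    n_samples = {v.shape[0] for v in views}
    if len(n_samples) != 1:
        raise ValueError(f"views have different numbers of samples: {sorted(n_samples)}")
    return n_samples.pop()

rng = np.random.default_rng(0)
# Random stand-ins with the same shapes as the gene and lipid views
gene = rng.random((40, 120))
lipid = rng.random((40, 21))

views = [gene, lipid]
n = check_views(views)
print(n, [v.shape[1] for v in views])  # 40 [120, 21]
```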

Note that we will not use any targets in this notebook, as we will perform unsupervised clustering on the dataset.

We will use Parea_1 to perform hierarchical and spectral clustering on this dataset, and use Pyrea's built-in genetic algorithm to find the best hyperparameters for this data.
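Pyrea's genetic search is built on DEAP: the traceback from the hierarchical run passes through `deap.algorithms.eaSimple`. To show the idea without DEAP, here is a minimal hand-written genetic algorithm (tournament selection, uniform crossover, random-reset mutation, and keeping the best individual seen) over two illustrative hyperparameters: the number of clusters `k` in `[k_min, k_max]` and a linkage name. The fitness function is a made-up stand-in; the real search covers Pyrea's own parameter set and, judging by the log it prints, scores candidates by silhouette. `genetic_search` and `toy_fitness` are hypothetical names, not Pyrea's API.

```python
import random

LINKAGES = ["single", "complete", "average", "ward"]

def toy_fitness(ind):
    """Hypothetical score to maximise; it peaks at k=3 with 'average' linkage."""
    k, linkage = ind
    return -abs(k - 3) + (1.0 if linkage == "average" else 0.0)

def genetic_search(fitness, k_min, k_max, pop_size=20, generations=25, p_mut=0.2, seed=0):
    rng = random.Random(seed)

    def random_individual():
        return (rng.randint(k_min, k_max), rng.choice(LINKAGES))

    def mutate(ind):
        # Each gene is independently reset to a random value with probability p_mut
        k, linkage = ind
        if rng.random() < p_mut:
            k = rng.randint(k_min, k_max)
        if rng.random() < p_mut:
            linkage = rng.choice(LINKAGES)
        return (k, linkage)

    def crossover(a, b):
        # Uniform crossover: each gene is copied from one of the two parents
        return (rng.choice([a[0], b[0]]), rng.choice([a[1], b[1]]))

    def tournament(pop, size=3):
        # Pick `size` random individuals and return the fittest
        return max(rng.sample(pop, size), key=fitness)

    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        population = [mutate(crossover(tournament(population), tournament(population)))
                      for _ in range(pop_size)]
        best = max(population + [best], key=fitness)  # keep the best seen so far
    return best

best = genetic_search(toy_fitness, k_min=2, k_max=5)
print(best)
```

Because the search space here has only 16 combinations, an exhaustive search would do; a genetic algorithm pays off when the parameter space is too large to enumerate.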

## Hierarchical Clustering

Parea_1 is applied here to three views, so we first reuse the gene view as a third view and then run the genetic algorithm, which learns the best parameters for the clustering:

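```python
# Three views: gene, lipid, and gene again (`X` was undefined in the original cell)
data = [nutrimouse_dataset['gene'], nutrimouse_dataset['lipid'], nutrimouse_dataset['gene']]
```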

```python
params_hierarchical = pyrea.parea_1_genetic(data, k_min=2, k_max=5)
```

```text
Silhouette score: 0.681149732620321
Silhouette score: 0.5618233618233619
Silhouette score: 0.49129050925925927
Silhouette score: 0.3770131257631258
Silhouette score: 0.6218732693060167
Silhouette score: 1.0
Silhouette score: 0.6470588235294118
Silhouette score: 0.809090909090909
Silhouette score: 0.6355475040257648
Silhouette score: 0.5978260869565218
Silhouette score: 0.4705806489262371
Silhouette score: 0.6426739926739927
Silhouette score: 0.4019749835418038
Silhouette score: 0.49129050925925927
Silhouette score: 0.4028520429435064
Silhouette score: 0.49129050925925927
Silhouette score: 0.2841696535244923
Silhouette score: 0.46143627713749663
Silhouette score: 0.5848613174322688
Silhouette score: 0.6943181818181818
Silhouette score: 0.809090909090909
Silhouette score: 0.6608695652173913
Silhouette score: 0.5253623188405797
Silhouette score: 0.4705806489262371
Silhouette score: 0.49129050925925927
Silhouette score: 0.5585997442455243
Silhouette score: 0.6426739926739927
Silhouette sco

Traceback (most recent call last):
  File "/home/mblo/.pyenv/versions/3.9.15/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_2971/2447375687.py", line 1, in <module>
    params_hierarchical = pyrea.parea_1_genetic(data, k_min=2, k_max=5)
  File "/home/mblo/Dropbox/Pyrea/Pyrea/pyrea/core.py", line 562, in parea_1_genetic
    # Run the genetic algorithm
  File "/home/mblo/.pyenv/versions/3.9.15/lib/python3.9/site-packages/deap/algorithms.py", line 151, in eaSimple
    for ind, fit in zip(invalid_ind, fitnesses):
  File "/home/mblo/Dropbox/Pyrea/Pyrea/pyrea/core.py", line 524, in evaluate
  File "/home/mblo/Dropbox/Pyrea/Pyrea/pyrea/core.py", line 308, in parea_1
    v_ensemble_1 = view(execute_ensemble(views1, f), c1_pre)
  File "/home/mblo/Dropbox/Pyrea/Pyrea/pyrea/core.py", line 202, in execute_ensemble
    return Ensemble(views, fuser).execute()
  File "/home/mblo/Dropbox
```
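Each `Silhouette score` line in the log appears to report the score of one candidate parameter set. The silhouette coefficient of a sample is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster; the overall score is the mean over samples and ranges from -1 to 1. The from-scratch sketch below uses Euclidean distances and gives samples in singleton clusters a score of 0 (the convention scikit-learn uses). Which data and distances Pyrea scores internally is not shown in this notebook, so treat this as an illustration; `silhouette_score` here is our own function and needs at least two clusters.

```python
import numpy as np

def silhouette_score(X, labels):
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: score stays 0
        # Mean distance to the other members of i's own cluster (d[i, i] = 0 is excluded)
        a = d[i, same].sum() / (same.sum() - 1)
        # Mean distance to the closest other cluster
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Two tight, well-separated groups on a line
X = np.array([[0.0], [0.1], [5.0], [5.1]])
print(silhouette_score(X, [0, 0, 1, 1]))  # close to 1: a good clustering
print(silhouette_score(X, [0, 1, 0, 1]))  # negative: samples sit nearer the other cluster
```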

Once the genetic algorithm completes, `params_hierarchical` contains the optimal parameters for this data, which we pass to the `parea_1` function:

```python
pyrea.parea_1(data, *params_hierarchical)
```

```text
NameError: name 'params_hierarchical' is not defined
```
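For intuition about the hierarchical step, here is a naive agglomerative clustering sketch with average linkage: every sample starts as its own cluster, and the two clusters with the smallest mean pairwise distance are merged until `k` clusters remain. It is O(n^3), is not Pyrea's implementation, and `agglomerative` is a hypothetical helper.

```python
import numpy as np

def agglomerative(X, k):
    """Naive average-linkage agglomerative clustering into k clusters."""
    X = np.asarray(X, dtype=float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(X))]  # start: one cluster per sample
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                dist = d[np.ix_(clusters[p], clusters[q])].mean()
                if best is None or dist < best[0]:
                    best = (dist, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [10.0, 0.0]])
print(agglomerative(X, k=3))  # [0 0 1 1 2]
```

Swapping the `.mean()` for `.min()` or `.max()` would give single or complete linkage respectively, which is the kind of choice a hyperparameter search can explore.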

## Spectral Clustering

Spectral clustering is performed in almost the same way, using `parea_1_genetic_spectral`:

```python
params_spectral = pyrea.parea_1_genetic_spectral(data, k_min=2, k_max=5)
```

```text
NameError: name 'linkages' is not defined
```

Again, this returns the optimal parameters, which we then pass to the `parea_1_spectral` function to perform the clustering:

```python
pyrea.parea_1_spectral(data, *params_spectral)
```

```text
NameError: name 'params_spectral' is not defined
```
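Spectral clustering embeds the samples using eigenvectors derived from a similarity graph and then clusters that embedding, usually with k-means. The sketch below follows the normalised-affinity recipe of Ng, Jordan and Weiss with a Gaussian (RBF) affinity and a small deterministic k-means. The kernel width `sigma`, the helper names, and the toy data are assumptions for illustration; this is not Pyrea's implementation.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with farthest-point initialisation (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(X, k, sigma=1.0):
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))  # Gaussian affinity between samples
    np.fill_diagonal(W, 0.0)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    # Normalised affinity D^{-1/2} W D^{-1/2}; its top-k eigenvectors form the embedding
    M = D_inv_sqrt @ W @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = eigvecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalise the embedding
    return kmeans(U, k)

# Two well-separated Gaussian blobs of 20 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)), rng.normal(5.0, 0.3, size=(20, 2))])
labels = spectral_clustering(X, k=2)
print(labels)
```

Unlike the distance-based merges of hierarchical clustering, the spectral approach works on graph connectivity, which is why it can separate clusters that are not compact or convex.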

# Conclusions

TBC.

# Random Data Tests

```python
import pyrea
import numpy as np

# Create sample data:
d1 = np.random.rand(100, 10)
d2 = np.random.rand(100, 10)
d3 = np.random.rand(100, 10)

data = [d1, d2, d3]

labels = pyrea.parea_1_spectral(data, k_final=6)
print(labels)
```

```text
[2 2 5 5 1 2 1 2 2 2 1 3 1 4 2 2 4 2 2 1 2 2 1 1 2 2 2 1 2 1 1 2 2 2 2 2 2
 0 2 2 4 1 2 2 2 1 2 1 2 2 2 2 1 2 2 1 2 2 2 0 0 2 1 2 2 0 2 1 2 5 2 2 1 4
 2 2 2 4 2 0 2 2 4 1 1 1 2 2 2 2 2 1 1 2 2 1 2 2 2 2]
```


Now with the genetic algorithm:

# References

<span id="fn1"><sup>1</sup> Nutrimouse data: https://aasldpubs.onlinelibrary.wiley.com/doi/10.1002/hep.21510</span>

<span id="fn2"><sup>2</sup> mvlearn Nutrimouse example: https://mvlearn.github.io/auto_examples/datasets/plot_nutrimouse.html</span>