# Copyright and License

In [None]:
# This notebook was authored by Kevin P. Murphy (murphyk@gmail.com) and Mahmoud Soliman (mjs@aucegypt.edu)

[![GitHub](https://img.shields.io/github/license/probml/pyprobml)](https://github.com/probml/pml-book/blob/main/LICENSE/)

<a href="https://colab.research.google.com/github/probml/pyprobml/blob/master/notebooks/figures//chapter21_figures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cloning the pyprobml repo

In [None]:
!git clone https://github.com/probml/pyprobml 
%cd pyprobml/scripts

## Figure 21.2:

  (a) An example of single link clustering using city block distance. Pairs (1,3) and (4,5) are both distance 1 apart, so they get merged first. (b) The resulting dendrogram. Adapted from Figure 7.5 of \citep{Alpaydin04}.

In [None]:
!octave -W agglomDemo.m >> _
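
The merge order described in the caption can be reproduced with a few lines of plain Python. This is a minimal sketch, not the code in `agglomDemo.m`: the five 2d points below are hypothetical, chosen only so that pairs (1,3) and (4,5) are at city block distance 1, and the helper names `city_block` and `single_link` are assumptions made for illustration.

```python
from itertools import combinations

def city_block(p, q):
    """City block (L1) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def single_link(points):
    """Naive single link agglomerative clustering.

    points: dict mapping a label to a coordinate tuple.
    Returns a list of merges (cluster_a, cluster_b, height).
    """
    def linkage(pair):
        # Single link: distance between the closest pair of members.
        a, b = pair
        return min(city_block(points[i], points[j]) for i in a for j in b)

    clusters = [frozenset([label]) for label in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical points (not the data used in the figure).
points = {1: (0, 0), 2: (2, 1), 3: (1, 0), 4: (4, 3), 5: (4, 4)}
merges = single_link(points)
for a, b, height in merges:
    print(a, b, height)
```

The list of merge heights is what a dendrogram draws on its vertical axis. The loop above recomputes all pairwise linkages at every step, which is fine for a toy example but far slower than a library implementation.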

## Figure 21.4:

  Hierarchical clustering of yeast gene expression data. (a) Single linkage. (b) Complete linkage. (c) Average linkage. 

In [None]:
!octave -W hclustYeastDemo.m >> _

## Figure 21.5:

  (a) Some yeast gene expression data plotted as a heat map. (b) Same data plotted as a time series. 

In [None]:
!octave -W kmeansYeastDemo.m >> _

## Figure 21.6:

  Hierarchical clustering applied to the yeast gene expression data. (a) The rows are permuted according to a hierarchical clustering scheme (average link agglomerative clustering), in order to bring similar rows close together. (b) 16 clusters induced by cutting the average linkage tree at a certain height. 

In [None]:
!octave -W hclustYeastDemo.m >> _
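
Cutting a linkage tree at a given height amounts to applying only the merges whose height does not exceed the threshold. The sketch below illustrates this with a small union-find structure; the merge list is hypothetical toy data, and `cut_tree` is an illustrative helper, not a function from `hclustYeastDemo.m`.

```python
def cut_tree(leaves, merges, height):
    """Return the flat clusters obtained by cutting the tree at `height`.

    merges: list of (leaf_in_cluster_a, leaf_in_cluster_b, merge_height).
    """
    parent = {leaf: leaf for leaf in leaves}

    def find(x):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, h in merges:
        if h <= height:  # keep only merges below the cut
            parent[find(a)] = find(b)

    clusters = {}
    for leaf in leaves:
        clusters.setdefault(find(leaf), []).append(leaf)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical tree over five leaves (not the yeast data).
leaves = [1, 2, 3, 4, 5]
merges = [(1, 3, 1), (4, 5, 1), (2, 1, 2), (1, 4, 4)]
two_clusters = cut_tree(leaves, merges, height=2.5)
five_clusters = cut_tree(leaves, merges, height=0.5)
print(two_clusters)
print(five_clusters)
```

Lowering the cut height yields more, smaller clusters; raising it eventually returns a single cluster containing every leaf.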

## Figure 21.7:

  Illustration of K-means clustering in 2d. We show the result of using two different random seeds. Adapted from Figure 9.5 of \citep{Geron2019}.

In [None]:
%run ./kmeans_voronoi.py
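
The dependence on the random seed comes from the initialization step of K-means. The following is a minimal sketch of Lloyd's algorithm using only the standard library, assuming random initialization from the data points; it is not the code in `kmeans_voronoi.py`, and the synthetic blobs and the function name `kmeans` are assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(data, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            for x in data]

def kmeans(data, k, seed, n_iters=20):
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # initialization depends on the seed
    for _ in range(n_iters):
        labels = assign(data, centers)
        for j in range(k):
            members = [x for x, l in zip(data, labels) if l == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = assign(data, centers)
    distortion = sum(sq_dist(x, centers[l]) for x, l in zip(data, labels))
    return centers, labels, distortion

# Hypothetical 2d data: three Gaussian blobs.
data_rng = random.Random(0)
data = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]

results = {seed: kmeans(data, k=3, seed=seed) for seed in (0, 1)}
for seed, (centers, labels, distortion) in results.items():
    print(seed, round(distortion, 2))
```

Because the objective is non-convex, different seeds may converge to different local optima with different distortion. A common remedy is to run several restarts and keep the solution with the lowest distortion.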

## Figure 21.8:

  Clustering the yeast data from \cref{fig:yeast} using K-means clustering with $K=16$. (a) Visualizing all the time series assigned to each cluster. (b) Visualizing the 16 cluster centers as prototypical time series.

In [None]:
!octave -W kmeansYeastDemo.m >> _

## Figure 21.9:

  An image compressed using vector quantization with a codebook of size $K$. (a) $K=2$. (b) $K=4$. 

In [None]:
!octave -W vqDemo.m >> _
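
Vector quantization replaces each pixel by the index of its nearest codebook entry, so each pixel costs about $\log_2 K$ bits instead of a full-precision value. The sketch below shows the encode and decode steps on a handful of grayscale values; the codebook here is a hypothetical one (in practice it would be learned, for example with K-means), and the function names are illustrative rather than taken from `vqDemo.m`.

```python
import math

def encode(pixels, codebook):
    """Map each pixel to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda k: abs(p - codebook[k]))
            for p in pixels]

def decode(codes, codebook):
    """Reconstruct pixels by looking up each index in the codebook."""
    return [codebook[k] for k in codes]

def bits_needed(n_pixels, codebook_size):
    """Bits for the indices alone (the codebook itself is not counted)."""
    return n_pixels * math.ceil(math.log2(codebook_size))

# Hypothetical grayscale pixels in [0, 1] and a codebook with K=2.
pixels = [0.1, 0.9, 0.4, 0.7]
codebook = [0.2, 0.8]

codes = encode(pixels, codebook)
reconstruction = decode(codes, codebook)
print(codes, reconstruction, bits_needed(len(pixels), len(codebook)))
```

A larger codebook lowers the reconstruction error at the cost of more bits per pixel, which is the trade-off visible when comparing panels (a) and (b).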

## Figure 21.10:

  Illustration of batch vs mini-batch K-means clustering on the 2d data from \cref{fig:kmeansVoronoi}. Left: distortion vs $K$. Right: training time vs $K$. Adapted from Figure 9.6 of \citep{Geron2019}.

In [None]:
%run ./kmeans_minibatch.py
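
Mini-batch K-means trades some distortion for speed by updating the centers from small random subsets of the data rather than from the full dataset. The sketch below uses a per-center learning rate of one over the number of points assigned so far, which is one common formulation; it is an illustrative variant under that assumption, not the implementation used by `kmeans_minibatch.py`, and the data and function name are hypothetical.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def minibatch_kmeans(data, k, batch_size=10, n_steps=100, seed=0):
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(data, k)]
    counts = [0] * k  # number of points each center has absorbed
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        for x in batch:
            j = min(range(k), key=lambda c: sq_dist(x, centers[c]))
            counts[j] += 1
            eta = 1.0 / counts[j]  # step size shrinks as the center matures
            centers[j] = [(1 - eta) * m + eta * xi
                          for m, xi in zip(centers[j], x)]
    return centers

# Hypothetical 2d data: two Gaussian blobs.
data_rng = random.Random(0)
data = [(data_rng.gauss(c, 0.5), data_rng.gauss(c, 0.5))
        for c in (0.0, 10.0) for _ in range(50)]

centers = minibatch_kmeans(data, k=2)
print(centers)
```

Each step touches only `batch_size` points, so the cost per step is independent of the dataset size. The resulting centers are typically close to, but slightly worse than, those of the batch algorithm.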

## Figure 21.11:

  Performance of K-means and GMM vs $K$ on the 2d dataset from \cref{fig:kmeansVoronoi}. (a) Distortion vs $K$.

In [None]:
%run ./kmeans_silhouette.py

In [None]:
%run ./gmm_2d.py

In [None]:
%run ./kmeans_silhouette.py

## Figure 21.12:

  (a) Synthetic data generated from a mixture of 3 Gaussians in 1d. (b) Fit vs $K$ using K-means. (c) Fit vs $K$ using GMM. 

In [None]:
!octave -W kmeansModelSel1d.m >> _

## Figure 21.13:

  Voronoi diagrams for K-means for different $K$ on the 2d dataset from \cref{fig:kmeansVoronoi}.

In [None]:
%run ./kmeans_silhouette.py

## Figure 21.14:

  Silhouette diagrams for K-means for different $K$ on the 2d dataset from \cref{fig:kmeansVoronoi}.

In [None]:
%run ./kmeans_silhouette.py
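
A silhouette diagram plots one silhouette coefficient per point. For point $i$, let $a_i$ be its mean distance to the other members of its own cluster and $b_i$ the smallest mean distance to the members of any other cluster; the coefficient is $(b_i - a_i) / \max(a_i, b_i)$. The sketch below computes the mean coefficient directly from this definition. It assumes at least two clusters and distinct points, assigns 0 to singleton clusters (a common convention), and is not the code in `kmeans_silhouette.py`.

```python
def silhouette_score(data, labels, dist):
    """Mean silhouette coefficient over all points."""
    n = len(data)
    scores = []
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if i == j:
                continue
            c = labels[j]
            sums[c] = sums.get(c, 0.0) + dist(data[i], data[j])
            counts[c] = counts.get(c, 0) + 1
        own = labels[i]
        if own not in counts:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean intra-cluster distance
        b = min(sums[c] / counts[c] for c in counts if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

# Hypothetical 1d data with two well-separated clusters.
data = [0.0, 1.0, 10.0, 11.0]
good = silhouette_score(data, [0, 0, 1, 1], lambda p, q: abs(p - q))
bad = silhouette_score(data, [0, 1, 0, 1], lambda p, q: abs(p - q))
print(good, bad)
```

Scores near 1 indicate points that sit well inside their cluster, while negative scores indicate points that are closer on average to a different cluster.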

## Figure 21.15:

  Some data in 2d fit using a GMM with $K=5$ components. Left column: marginal distribution $p(\mathbf{x})$. Right column: visualization of each mixture distribution, and the hard assignment of points to their most likely cluster. (a-b) Full covariance. (c-d) Tied full covariance. (e-f) Diagonal covariance. (g-h) Spherical covariance. Color coding is arbitrary.

In [None]:
%run ./gmm_2d.py
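
The four covariance structures differ in how many free covariance parameters the model has to estimate. The sketch below counts them for $K$ components in $D$ dimensions, assuming that "tied" means one full covariance matrix shared by all components and that "spherical" means one variance per component; the function name is illustrative and not part of `gmm_2d.py`.

```python
def covariance_param_count(kind, K, D):
    """Number of free covariance parameters in a K-component GMM in D dims."""
    full_matrix = D * (D + 1) // 2  # symmetric D x D matrix
    if kind == "full":
        return K * full_matrix      # one full matrix per component
    if kind == "tied":
        return full_matrix          # one full matrix shared by all
    if kind == "diag":
        return K * D                # one variance per dimension per component
    if kind == "spherical":
        return K                    # one variance per component
    raise ValueError(f"unknown covariance kind: {kind}")

param_counts = {kind: covariance_param_count(kind, K=5, D=2)
                for kind in ("full", "tied", "diag", "spherical")}
print(param_counts)
```

The gap between the structures grows quickly with $D$, since a full covariance matrix scales quadratically in the dimension while the diagonal and spherical forms scale linearly or not at all.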

## Figure 21.16:

  Clustering data consisting of 2 spirals. (a) K-means. (b) Spectral clustering. 

In [None]:
%run ./spectral_clustering_demo.py