<a href="https://colab.research.google.com/github/marshka/ml-20-21/blob/main/07_unsupervised_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning SP 2020/2021

Prof. Cesare Alippi   
Andrea Cini ([`andrea.cini@usi.ch`](mailto:andrea.cini@usi.ch))   
Ivan Marisca ([`ivan.marisca@usi.ch`](mailto:ivan.marisca@usi.ch))   
Nelson Brochado ([`nelson.brochado@usi.ch`](mailto:nelson.brochado@usi.ch))

---
# Lab 07: Unsupervised learning

In this lab, we will see practical applications of unsupervised learning techniques. 

We will focus on two main tasks: 

1. Clustering;
2. Dimensionality reduction.

We will use two datasets that we are now very familiar with:
 - [Iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris)
 - [MNIST](https://keras.io/api/datasets/mnist/)


## Clustering the Iris dataset

Let's get a sense of the dataset.

In [None]:
from sklearn.datasets import load_iris

# load the data
iris = load_iris()

# list the keys
print(iris.keys())

In [None]:
print(iris['DESCR'])

In [None]:
# read the keys
print('feature_names:\n', iris['feature_names'])
print()
print('target_names:\n', iris['target_names'])
print()
print('data:\n', iris['data'][:10])
print()
print('target:\n', iris['target'])

In [None]:
# extract data
X = iris.data

**Remark:** This is meant to be an _unsupervised_ learning setup. So, even though `iris.target` is available, we pretend that no labels are associated with the data.

#### First of all: shapes!

In [None]:
print('Shape of X:', X.shape)

(n, d) = X.shape
print('d:', d)
print('n:', n)

#### Histograms

One component at a time

In [None]:
import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(18, 4))

for i in range(d):
    # a subplot for each feature
    plt.subplot(1, d, i+1)

    # histogram
    plt.hist(X[:, i], density=True, color=f'C{i}')

    # axis labels
    plt.xlabel('x{}: {}'.format(i, iris.feature_names[i]))
    if i == 0:  plt.ylabel('estimated pdf')

Notice:

* the different ranges
* `x_2` and `x_3` are roughly bimodal 

#### Scatter plots 

More features at the same time.

* We have 4 features, but we can visualize at most 3 of them at a time.


In [None]:
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(15, 6))

# x0, x1, x2
ax = fig.add_subplot(121, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2])
ax.set_xlabel(r'$x_0$')
ax.set_ylabel(r'$x_1$')
ax.set_zlabel(r'$x_2$')

# x0, x2, x3
ax = fig.add_subplot(122, projection='3d')
ax.scatter(X[:, 0], X[:, 2], X[:, 3])
ax.set_xlabel(r'$x_0$')
ax.set_ylabel(r'$x_2$')
ax.set_zlabel(r'$x_3$')

plt.show()

In [None]:
fig = plt.figure(figsize=(15, 6))

# x0, x1, x2
ax = fig.add_subplot(121, projection='3d', elev=-150, azim=110)
ax.scatter(X[:, 0], X[:, 1], X[:, 2])
ax.set_xlabel(r'$x_0$')
ax.set_ylabel(r'$x_1$')
ax.set_zlabel(r'$x_2$')

# x0, x2, x3
ax = fig.add_subplot(122, projection='3d', elev=-150, azim=110)
ax.scatter(X[:, 0], X[:, 2], X[:, 3])
ax.set_xlabel(r'$x_0$')
ax.set_ylabel(r'$x_2$')
ax.set_zlabel(r'$x_3$')

plt.show()

Without the right perspective we may miss important clues.
* 2D plots are usually clearer than 3D ones (personal opinion!) 


In [None]:
def plot_every_pair(X, colors=None, same_axis=False, label_pfx="x"):
    d = X.shape[1]
    if colors is None:
        colors = np.zeros(X.shape[0])
    n_plots = d*(d-1)//2
    plt.figure(figsize=(3 * n_plots, 8))
    ct = 0 
    for i in range(1, d+1):
        for j in range(i+1, d+1):
            ct += 1
            plt.subplot(2, max(1, n_plots//2), ct)
            plt.scatter(X[:, i-1], X[:, j-1], c=colors)
            plt.xlabel(r'${}_{}$'.format(label_pfx, i-1))
            plt.ylabel(r'${}_{}$'.format(label_pfx, j-1))
            if same_axis:
                # Use same axis scaling
                plt.xlim([X.min(), X.max()])
                plt.ylim([X.min(), X.max()])
    plt.show()

plot_every_pair(X)               

Be careful about the different ranges!

In [None]:
plot_every_pair(X, same_axis=True)

### Seaborn

A cool package for data visualization is `seaborn`.

In [None]:
import seaborn as sns
import pandas as pd

sns.pairplot(pd.DataFrame(X, columns=iris.feature_names))

The above visualization becomes hard to read when the number of features is large.

## Principal Component Analysis

Recall the steps

* Subtract the mean
* (should we rescale?)
* Compute the (unnormalized) covariance matrix $\Sigma = X^\top X$ of the centered data
* Eigen-decomposition $ U \Lambda U^\top = \Sigma$ 

**Remark 1:** Eigenvalues and eigenvectors: $\Sigma \vec u = \lambda \vec u$

Now apply the transformation
1. Lossless rotation $U^\top \vec x$ to each vector.
2. Lossy transformation:
    - Discard some eigenvectors $\tilde U\leftarrow U$
    - apply transformation $\tilde U^\top \vec x$ to each vector.

**Remark 2:** To transform the entire dataset, simply do $XU$ or $X\tilde U$.
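
Remark 2 can be checked numerically. The sketch below is a minimal illustration on synthetic data; the array `X_demo`, the `_demo` names, and the choice of keeping 2 eigenvectors are assumptions made for the example, not part of the lab. It checks that rotating each vector with $U^\top \vec x$ gives the same result as the matrix product $XU$, and that the full rotation can be undone exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical toy dataset: 100 samples, 4 correlated features
X_demo = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

# center, then eigen-decompose Sigma = X^T X
X0_demo = X_demo - X_demo.mean(axis=0, keepdims=True)
lam_demo, U_demo = np.linalg.eigh(X0_demo.T @ X0_demo)
# eigh returns ascending eigenvalues: flip to descending order
lam_demo, U_demo = lam_demo[::-1], U_demo[:, ::-1]

# row-by-row rotation U^T x versus the matrix product X U
row_by_row = np.stack([U_demo.T @ x for x in X0_demo])
all_at_once = X0_demo @ U_demo
same_result = np.allclose(row_by_row, all_at_once)

# the rotation is lossless: U is orthogonal, so U^T undoes it
lossless = np.allclose(all_at_once @ U_demo.T, X0_demo)

# lossy version: keep only the first 2 eigenvectors
U_tilde_demo = U_demo[:, :2]
X_red_demo = X0_demo @ U_tilde_demo

print(same_result, lossless, X_red_demo.shape)
```

The matrix form is what the cells of this section use, since it avoids an explicit Python loop over the samples.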



In [None]:
X_mean = np.mean(X, axis=0, keepdims=True)
X0 = X - X_mean

Sigma = (X0.T).dot(X0)
lam, U = np.linalg.eigh(Sigma)

print("shapes:", lam.shape, U.shape)
print("eigenvalues:", lam)

In [None]:
# eigh returns the eigenvalues in ascending order: reverse them (and the
# matching eigenvectors) to get descending order
lam = lam[::-1]
U = U[:, ::-1]

plt.plot(lam, 'o-')
plt.title("eigenvalues")
plt.grid()
plt.xlabel("component");

In [None]:
# Apply rotation
X_rot = X0.dot(U)

plt.figure(figsize=(18, 4))
for i in range(d):
    plt.subplot(1, d, i+1)
    plt.hist(X_rot[:, i])
    plt.xlabel(r'$pc_{}$'.format(i))

plot_every_pair(X_rot, same_axis=True, label_pfx="pc")

In [None]:
# Apply reduced transformation
l = 2  # columns to discard
Utilde = U[:, :d-l]
X_red = X0.dot(Utilde)
# Equivalent to X_red = X_rot[:, :d-l]

plot_every_pair(X_red, same_axis=True, label_pfx="pc")

As usual, sklearn can speed up our work!

In [None]:
# PCA with sklearn
from sklearn.decomposition import PCA

# d:    num of original features (= num of all principal components)
# l:    num of discarded principal components
# d-l:  num of considered principal components
pca = PCA(n_components=d-l)
pca.fit(X0)
X_red = pca.transform(X0)

plot_every_pair(X_red, same_axis=True, label_pfx="pc")
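
One way to decide how many components to keep is to look at the fraction of variance each one explains. The following is a minimal sketch, assuming a 95% threshold (an arbitrary choice for illustration) and using the `explained_variance_ratio_` attribute of a fitted `PCA`; the names `X_iris`, `pca_full`, and `n_keep` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X_iris = load_iris().data

# fit PCA with all components (PCA centers the data internally)
pca_full = PCA().fit(X_iris)

# fraction of the total variance captured by each component
ratio = pca_full.explained_variance_ratio_
cumulative = np.cumsum(ratio)

# smallest number of components reaching the (assumed) 95% threshold
threshold = 0.95
n_keep = int(np.argmax(cumulative >= threshold)) + 1

print("explained variance ratio:", np.round(ratio, 3))
print("components to keep:", n_keep)
```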

### Data reconstruction

$$\vec x \to \vec {\tilde x} \to \vec x_{rec} \approx \vec x$$

- transformation $\vec{\tilde x}=\tilde U^\top \vec x$.
- reconstruction (inverse transformation) $\vec x_{rec} = \tilde U \vec{\tilde x}$.
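
How much is lost in the reconstruction can be quantified. The sketch below, on a synthetic dataset (all `_toy` names and the value `k = 2` are assumptions for illustration), applies the transformation and the reconstruction above and compares the total squared reconstruction error with the sum of the eigenvalues of the discarded eigenvectors; for $\Sigma = X^\top X$ the two quantities coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical toy dataset: 200 samples, 5 features
X_toy = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X0_toy = X_toy - X_toy.mean(axis=0, keepdims=True)

# eigen-decomposition of Sigma = X^T X, sorted in descending order
lam_toy, U_toy = np.linalg.eigh(X0_toy.T @ X0_toy)
lam_toy, U_toy = lam_toy[::-1], U_toy[:, ::-1]

k = 2                                   # number of components kept (assumed)
U_tilde_toy = U_toy[:, :k]

X_red_toy = X0_toy @ U_tilde_toy        # transformation
X_rec_toy = X_red_toy @ U_tilde_toy.T   # reconstruction

# total squared reconstruction error over the dataset
sq_error = np.sum((X0_toy - X_rec_toy) ** 2)

print("squared reconstruction error:", sq_error)
print("sum of discarded eigenvalues:", lam_toy[k:].sum())
```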

In [None]:
# Visualize original vs reconstructed dataset
fig = plt.figure(figsize=(18, 4))
fig.subplots_adjust(wspace=.4)

# Original dataset
ax = fig.add_subplot(131, projection='3d', elev=30, azim=160)

ax.scatter(X0[:, 0], X0[:, 2], X0[:, 3])
ax.set_xlabel('x_0')
ax.set_ylabel('x_2')
ax.set_zlabel('x_3')
ax.set_title("X")

# Principal components
ax = fig.add_subplot(132)
ax.scatter(X_red[:, 0], X_red[:, 1])
ax.set_xlabel('pc_0')
ax.set_ylabel('pc_1')
ax.set_title("Principal Components")
ax.axis("equal")

# Reconstructed dataset
ax = fig.add_subplot(133, projection='3d', elev=30, azim=160)
# reconstruct
X_rec = pca.inverse_transform(X_red)
# which is equivalent (PCA was fit on the centered data X0) to
# X_red_ = X0.dot(Utilde)
# X_rec_ = X_red_.dot(Utilde.T)

ax.scatter(X_rec[:, 0], X_rec[:, 2], X_rec[:, 3])
ax.set_xlabel('x_0')
ax.set_ylabel('x_2')
ax.set_zlabel('x_3')
ax.set_title("X reconstructed")
plt.show()

## Clustering: k-means



In [None]:
from sklearn.cluster import KMeans

k_clusters = 2

k_means = KMeans(n_clusters=k_clusters)
cluster_label = k_means.fit_predict(X_red)

In [None]:
# 3d
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(121, projection='3d', elev=30, azim=160)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=cluster_label)
# 2d PC
ax = fig.add_subplot(122)
ax.scatter(X_red[:, 0], X_red[:, 1], c=cluster_label)
ax.axis("equal");

Since we know there are three classes in `iris.target`... 

In [None]:
# 3d
fig = plt.figure(figsize=(12, 4))
ax = fig.add_subplot(121, projection='3d', elev=30, azim=160)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=iris.target)
plt.title("Classes (not clusters!)")
# 2d PC
ax = fig.add_subplot(122)
ax.scatter(X_red[:, 0], X_red[:, 1], c=iris.target)
ax.axis("equal");


However, k-means (like any other clustering method) does not necessarily recover the classes, because a class need not form a single, well-separated cluster.

In [None]:
from sklearn.cluster import KMeans

k_clusters = 3

k_means = KMeans(n_clusters=k_clusters)
cluster_label = k_means.fit_predict(X_red)

# 3d
fig = plt.figure(figsize=(16, 4))
ax = fig.add_subplot(131, projection='3d', elev=30, azim=160)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=cluster_label)
# 2d PC
ax = fig.add_subplot(132)
ax.scatter(X_red[:, 0], X_red[:, 1], c=cluster_label)
ax.axis("equal")
ax.set_title("clusters")

# classes
ax = fig.add_subplot(133)
ax.scatter(X_red[:, 0], X_red[:, 1], c=iris.target)
ax.axis("equal")
ax.set_title("classes");
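
The mismatch between clusters and classes can also be measured instead of only inspected visually. The following is a minimal sketch, assuming the adjusted Rand index (`adjusted_rand_score` from `sklearn.metrics`) as the agreement measure; this metric is not used elsewhere in the lab. The labels are used here only for evaluation, never for fitting, and `n_init` and `random_state` are fixed only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris_data = load_iris()

# cluster without looking at the labels
labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(iris_data.data)

# contingency table: rows are true classes, columns are clusters
table = np.zeros((3, 3), dtype=int)
for true_class, cluster in zip(iris_data.target, labels_km):
    table[true_class, cluster] += 1
print(table)

# adjusted Rand index: 1 means identical partitions, about 0 means chance-level agreement
ari = adjusted_rand_score(iris_data.target, labels_km)
print("adjusted Rand index:", round(ari, 3))
```

Cluster indices are arbitrary, so the columns of the table may appear in any order; the adjusted Rand index is invariant to such relabelling.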

#### Remember

- We can choose the number of clusters by comparing a validation score across different values of $k$ ([silhouette](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html))
- There is a variety of clustering methods with different behaviours ([comparison](https://scikit-learn.org/stable/modules/clustering.html#clustering))
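
As an illustration of the first point, the sketch below compares the silhouette score of k-means for a few values of $k$ on Iris. The range of $k$ and the fixed `n_init` and `random_state` are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X_iris = load_iris().data

scores = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_iris)
    # mean silhouette over all samples, in [-1, 1]; higher is better
    scores[k] = silhouette_score(X_iris, labels_k)
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("best k according to the silhouette:", best_k)
```

The value of $k$ preferred by the silhouette need not match the number of classes, for the same reason that clusters and classes need not coincide.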



## For fun: MNIST digits compression

In [None]:
from PIL import Image

def plot_sample(imgs, labels, nrows, ncols, resize=None, tograyscale=False, shuffle=True):
    # create a grid of images
    fig, axs = plt.subplots(nrows, ncols, figsize=(4*ncols, 4*nrows))
    # take a random sample of images
    if shuffle:
        indices = np.random.choice(len(imgs), size=nrows*ncols, replace=False)
    else:
        indices = np.arange(nrows*ncols)
    for ax, idx in zip(axs.reshape(-1), indices):
        ax.axis('off')
        # sample an image
        ax.set_title(labels[idx])
        im = imgs[idx]
        if isinstance(im, np.ndarray):
            im = Image.fromarray(im)  
        if resize is not None:
            im = im.resize(resize)
        if tograyscale:
            im = im.convert('L')
        ax.imshow(im, cmap='gray')
    plt.show()


from tensorflow.keras.datasets import mnist

# Load the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

plot_sample(x_train, y_train, 2, 5)

Vectorize the images

In [None]:
print("x_train", x_train.shape) 

# Reshape to vectors
X = x_train.reshape(-1, 28 * 28) /255.
print("X", X.shape)

In [None]:
# PCA
X_mean = X.mean(axis=0, keepdims=True)
X0 = X - X_mean 
pca = PCA(n_components=300)
pca.fit(X0)

plt.plot(pca.singular_values_**2)
plt.show()

In [None]:
# compress
X_red = pca.transform(X0)
# extract
X_rec = pca.inverse_transform(X_red)
X_rec += X_mean

# reshape to image size and range
x_image_rec = 255*X_rec.clip(0, 1).reshape(-1, 28, 28)
x_image_orig = 255*X.clip(0, 1).reshape(-1, 28, 28)

# draw some random images
p = np.random.choice(X.shape[0], size=5)
print("Original images")
plot_sample(x_image_orig[p], y_train[p], 1, 5, shuffle=False)
print("Reconstructed images")
plot_sample(x_image_rec[p], y_train[p], 1, 5, shuffle=False)
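
To get a rough idea of how much is saved by the compression above, we can count the values that must be stored. This is a back-of-the-envelope sketch: it assumes that every number takes the same amount of storage and that the principal components and the mean vector are stored alongside the reduced data.

```python
n_samples, n_features, n_components = 60000, 28 * 28, 300

# values stored for the original dataset
original = n_samples * n_features

# values stored after compression:
# the reduced data, the principal components, and the mean vector
compressed = (n_samples * n_components
              + n_components * n_features
              + n_features)

ratio = compressed / original
print(f"original:   {original} values")
print(f"compressed: {compressed} values")
print(f"ratio:      {ratio:.3f}")
```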

In [None]:
# Reshape to vectors
X = x_train.reshape(-1, 28 * 28) /255.
print(X.shape) # shape: (60000, 784)
print(X.min(), X.max())
# add Gaussian noise (standard deviation 0.2) to every pixel
X += np.random.randn(*X.shape)*.2

# PCA
X_mean = X.mean(axis=0, keepdims=True)
X0 = X - X_mean 
pca = PCA(n_components=50)
pca.fit(X0)

plt.figure()
plt.plot(pca.singular_values_**2)

# compress
X_red = pca.transform(X0)
# extract
X_rec = pca.inverse_transform(X_red)
X_rec += X_mean

# reshape to image size and range
x_image_rec = 255*X_rec.clip(0, 1).reshape(-1, 28, 28)
x_image_orig = 255*X.clip(0, 1).reshape(-1, 28, 28)

# draw some random images
p = np.random.choice(X.shape[0], size=5)
# index the labels with p as well, so that titles match the sampled images
plot_sample(x_image_orig[p], y_train[p], 1, 5, shuffle=False)
plot_sample(x_image_rec[p], y_train[p], 1, 5, shuffle=False)
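
Why does keeping few components reduce the noise? The sketch below illustrates the effect on synthetic data where the clean signal is known, which is not the case for the noisy digits above. All names and sizes are assumptions for the example, and the principal directions are computed with an SVD of the centered data (its right singular vectors are the eigenvectors of $X^\top X$) rather than with `sklearn`.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical clean data lying on a 3-dimensional subspace of R^50
clean = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))
noisy = clean + 0.2 * rng.normal(size=clean.shape)

# PCA via SVD of the centered noisy data
mean = noisy.mean(axis=0, keepdims=True)
_, _, Vt = np.linalg.svd(noisy - mean, full_matrices=False)
V_k = Vt[:3].T                      # first 3 principal directions

# compress and reconstruct
denoised = (noisy - mean) @ V_k @ V_k.T + mean

err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print("MSE noisy:   ", err_noisy)
print("MSE denoised:", err_denoised)
```

The noise is spread over all 50 directions, while the signal lives in only 3 of them, so discarding the remaining directions removes most of the noise and little of the signal.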

In [None]:
plt.figure(figsize=(12, 12))
# use `digit` as loop variable to avoid overwriting d (number of features)
for digit in range(10):
    # indices of the training images labelled with this digit
    ii = np.where(y_train == digit)[0]
    # plot the first two principal components, using the digit as marker
    plt.scatter(X_red[ii, 0], X_red[ii, 1], marker=f"${digit}$", label=digit)
plt.legend()
plt.show()