# ECE 3 : Homework 3

## Instructions

To get started, you should go through the following steps.
- Rename this Jupyter notebook by adding your name, e.g. `ECE3_HW3_<your-name>.ipynb`.
- Complete all the exercises by directly editing your notebook.
- Make sure that the coding portions run without errors.

## Problem 1 - Clustering with the k-means algorithm (Total points 30: 10 + 5 + 10 + 5)




For this exercise we will use the "Digits" dataset from the scikit-learn package. 

The following chunk of code loads the dataset and prints a full description of it. Run it and carefully go through the description.


In [None]:
import numpy as np
from sklearn.datasets import load_digits

dataset = load_digits()
print(dataset.DESCR)

#### (a) The dataset contains 1797 images of handwritten digits (the description says that the number of instances is 5620, but that's a mistake). Each image has a resolution of 8x8 pixels. Here the images have been reshaped to vectors of size 64x1. Confirm this by completing and running the following chunk of code.

In [None]:
X = ... # images go here
y = ... # labels go here

print("The images are included in a matrix of shape:", ...)
print("The labels are included in a vector of length:", ...)

#### (b) Let's take a look at our data and labels. Display the 11th and the 231st images and their respective labels:

In [None]:
import matplotlib.pyplot as plt

# Display the requested digits
plt.figure(1, figsize=(3, 3))

# we have saved the images as 64x1 vectors, for the purpose of plotting we will
# convert them to size 8x8. For the rest of the questions just use array X
images = X.reshape(-1, 8, 8)

print('The 11th image is one of digit {}:'.format(...))
plt.imshow(..., cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

print('The 231st image is one of digit {}:'.format(...))
plt.imshow(..., cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

In this exercise we will ignore the existence of the labels and we will assign our images to different clusters based only on the images themselves.

This is called **clustering** and it's an **unsupervised** learning task, as it's done with no knowledge of the true labels. In contrast, both **classification** and **regression** are **supervised** learning tasks, as to train our models we need to know the true labels/response variable of the training data.

### The k-means algorithm

k-means is an algorithm that performs clustering. k is a parameter that indicates the number of clusters. Once we have chosen a value of k, the algorithm proceeds as follows:

1.  We pick k points from the dataset at random. We call these points the "centroids" or the "representatives" of the clusters.
2.  For each point in the dataset we calculate its distance to each of the k centroids and we assign it to the cluster with the closest centroid.
3. For each cluster, we calculate a new centroid as the mean of its points. These new centroids don't have to belong to the dataset.
4. We repeat steps 2 & 3 until the centroid positions don't change.
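
The four steps above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the function name `kmeans_sketch`, the `max_iter` safeguard, the rule that an empty cluster keeps its previous centroid, and the toy data are all made up for this example, and it is not how sklearn's `KMeans` is implemented internally.

```python
import numpy as np

def kmeans_sketch(points, k, seed=0, max_iter=100):
    """Plain k-means following the four steps (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct points from the dataset as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: distance from every point to every centroid, then
        # assign each point to the cluster with the closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its points
        # (assumption: an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroid positions no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (hypothetical): two well-separated groups of 2-D points.
toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans_sketch(toy, k=2)
print(labels)     # cluster index assigned to each point
print(centroids)  # one centroid per cluster
```

Which group ends up as cluster 0 and which as cluster 1 depends on the random initialization, so only the grouping itself is meaningful, not the cluster numbers.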

In our example, we know that we're dealing with digits so we will set k=10.

#### (c) Use sklearn's KMeans class to perform k-means clustering on the digits dataset. Store the predicted cluster assignments in a NumPy vector.

In [None]:
from sklearn.cluster import KMeans

# Type your solution below

kmeans = KMeans(..., random_state=0)
kmeans.fit(...)
y_pred = kmeans.predict(...)

### Clustering performance metrics

After performing clustering, you have assigned a label to each point in the dataset. However, this isn't necessarily the same label as the true label. Here, for example, you may have correctly grouped all zeros into the same cluster but assigned this cluster the label 5. Thus, accuracy (as it was defined in Problem 1) is not an informative metric for the performance of clustering algorithms.

An appropriate metric for clustering performance is the **Adjusted Rand index**, which is a function that measures the similarity between the true and the predicted label assignments, ignoring permutations.
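
To see what "ignoring permutations" means in practice, here is a small sketch on made-up labels (the lists `true_labels` and `permuted_labels` are hypothetical and unrelated to the digits data). The two assignments describe exactly the same grouping, only with the cluster names shuffled.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels: three groups of two points each.
true_labels = [0, 0, 1, 1, 2, 2]
# Same grouping, but every cluster has been given a different name.
permuted_labels = [2, 2, 0, 0, 1, 1]

# Plain accuracy compares names position by position, so it is 0 here.
accuracy = np.mean(np.array(true_labels) == np.array(permuted_labels))
# The adjusted Rand index compares groupings, so it is 1 here.
ari_toy = adjusted_rand_score(true_labels, permuted_labels)

print("accuracy:", accuracy)
print("adjusted Rand index:", ari_toy)
```

The clustering is perfect, yet accuracy reports a complete failure, while the adjusted Rand index reports a perfect match.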

#### (d) Calculate the algorithm's adjusted Rand index using sklearn's built-in function and print out the result.

In [None]:
from sklearn.metrics import adjusted_rand_score

# type your solution below

ari = adjusted_rand_score(...) 
print(ari)

## Problem 2 - Matrix Norm and Distance (Total points: $20 = 5+8+2+5$)

Let $A=\begin{bmatrix}
2&3&1\\
3&1&5
\end{bmatrix}$ and 
$B=\begin{bmatrix}
1&3&2\\
3&0&4
\end{bmatrix}$.


 a) Calculate the norm (here, the Frobenius norm) of matrix $A$. ***Note: If you are using MATLAB to verify your answer, use norm(A,'fro') instead of norm(A).***

 b) Let us multiply matrix $A$ by the scalar 2. Calculate $2A$, the norm of $2A$, and the ratio $\frac{||2A||}{||A||}$.

 c) For a general scalar $k$, write an expression for $||kA||$ that contains only $k$ and $||A||$.

 d) Calculate the distance between matrices $A$ and $B$.
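
If you prefer to verify hand calculations in NumPy rather than MATLAB, the following sketch shows one way to do it. The matrices `M` and `N` are made up for illustration and are deliberately not the matrices of this problem. It assumes that the norm is the Frobenius norm and that the distance between two matrices is the norm of their difference.

```python
import numpy as np

# Hypothetical matrices, chosen so the results are easy to check by hand.
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])
N = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# Frobenius norm: square root of the sum of squared entries.
norm_M = np.linalg.norm(M, 'fro')        # sqrt(1 + 4 + 4 + 16) = 5
norm_M_manual = np.sqrt(np.sum(M ** 2))  # same quantity, written out

# Distance between M and N: the norm of their difference.
dist_MN = np.linalg.norm(M - N, 'fro')   # only one entry differs, by 3

print(norm_M, norm_M_manual, dist_MN)
```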

**Write your answer here**

## Problem 3 - Matrix Vector Multiplication (Total points: $20 = 5+5+5+5$)

a) Let $I_2$ denote the 2 by 2 identity matrix. What should be the shape of $x$ in order for $I_2x$ to be computable?

b) $I_2$ contains $2$ column vectors $e_1,e_2$. Let $x$ be a $2$-vector $\begin{bmatrix}
x_1\\
x_2
\end{bmatrix}$. Rewrite $I_2x$ as a linear combination of $e_1,e_2$.

c) Part b tells us that the columns of $I_2$ form a basis for $R^2$ because any $2$-vector can be written as a linear combination of the column vectors of $I_2$. However, $I_2$ is not the only basis for $R^2$. Prove that the columns of $B = \begin{bmatrix}
1&1\\
0&1
\end{bmatrix}$ form a basis for $R^2$ by showing that they are linearly independent.

d) Normally, the $2$-vectors we write down use the basis $I_2$. For example, $b = \begin{bmatrix}3\\2\end{bmatrix} = 3\begin{bmatrix}1\\0\end{bmatrix} + 2\begin{bmatrix}0\\1\end{bmatrix}$ uses basis $I_2$. What is vector $b$ in basis $B$? In other words, find scalars $z_1,z_2$ such that $z_1\begin{bmatrix}1\\0\end{bmatrix} + z_2\begin{bmatrix}1\\1\end{bmatrix} = \begin{bmatrix}3\\2\end{bmatrix}$, where the vector $z = \begin{bmatrix}z_1\\z_2\end{bmatrix}$ is how you would write $b$ using basis $B$.
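
A hand-computed change of basis can be checked numerically. The sketch below uses a made-up basis matrix `C` and vector `v`, which are deliberately different from $B$ and $b$ in this problem. It assumes the basis vectors are stored as the columns of `C`, so that the coordinates `z` satisfy `C @ z = v`.

```python
import numpy as np

# Hypothetical basis (as columns) and vector, not the ones from this problem.
C = np.array([[1.0,  1.0],
              [1.0, -1.0]])
v = np.array([4.0, 2.0])

# A nonzero determinant means the two columns are linearly independent.
det_C = np.linalg.det(C)

# Coordinates of v in basis C: solve C z = v for z.
z = np.linalg.solve(C, v)

print("det(C) =", det_C)
print("z =", z)                  # 3*[1, 1] + 1*[1, -1] = [4, 2]
print("check:", C @ z)           # should reproduce v
```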

**Write your answer here**