# 1. Linear Algebra

## Problem 3

In this problem, we explore the reflection of a vector $X$ across the line spanned by a vector $V$, which produces the mirror vector $Z$. See Figure 1.

<img src="./image_files/LA01.png" width="250">
<center>Figure 1</center>

1) Is the reflection transformation linear?

2) If so, find the matrix $M$ such that $Z = MX$ using the concept of projection.

3) If so, find the matrix $M$ such that $Z = MX$ using eigenvalues and eigenvectors. (You can use the results of the previous parts.)

4) For $V = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$, compute $M$ and its eigenvalues and eigenvectors. (Here, $V$ makes a 45-degree angle with the $x$-axis.)
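You can check your answers to 2) to 4) numerically with the NumPy sketch below. NumPy is our choice; the problem does not prescribe a language. The sketch assumes the projection-based construction. With $P = \frac{VV^T}{V^TV}$ the projection onto $V$, the mirror image is $Z = PX + (PX - X) = (2P - I)X$. The helper name `reflection_matrix` is hypothetical.

```python
import numpy as np

def reflection_matrix(v):
    """Reflection across the line spanned by v, built from the projection onto v."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    P = (v @ v.T) / (v.T @ v)          # projection matrix onto span{v}
    return 2.0 * P - np.eye(len(v))    # Z = P X + (P X - X) = (2P - I) X

M = reflection_matrix([1.0, 1.0])
eigvals, eigvecs = np.linalg.eigh(M)   # M is symmetric, so eigh applies
print(M)
print(eigvals)
print(eigvecs)
```

For $V = [1\;1]^T$, the eigenvector returned for eigenvalue $+1$ should be parallel to $V$, and the one for $-1$ should be perpendicular to it. Applying `M` twice should give back the identity, since reflecting twice restores $X$.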

## Problem 5

Let $R = R(\theta)$ be a rotation matrix with a rotational angle of $\theta$ in $\mathbb{R}^n$.

1. Prove that $$R^TR=I \;\text{in}\; \mathbb{R}^n$$
    
    Hint: Euclidean lengths of vectors are preserved by a rotation (_i.e._, $\lVert Rx \rVert = \lVert x \rVert$ for any $x \in \mathbb{R}^n$).
<br><br>
2. Prove that $$R^T(\theta) = R^{-1}(\theta) = R(-\theta)\;\text{in}\;\mathbb{R}^n$$
<br><br>
3. Show that the column vectors of $R$ are orthogonal in $\mathbb{R}^n$.
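The three claims can be checked numerically for the special case $n = 2$, where $R(\theta)$ has the familiar cosine/sine form. This NumPy sketch only illustrates the identities for one angle and is not a proof. The angle and the test vector are arbitrary choices.

```python
import numpy as np

def rotation_2d(theta):
    """Counterclockwise rotation by theta radians in R^2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

theta = 0.7
R = rotation_2d(theta)
x = np.array([3.0, -1.0])

print(np.linalg.norm(R @ x), np.linalg.norm(x))   # lengths agree (the hint of part 1)
print(np.allclose(R.T @ R, np.eye(2)))            # part 1: R^T R = I
print(np.allclose(R.T, rotation_2d(-theta)))      # part 2: R^T = R(-theta)
print(R[:, 0] @ R[:, 1])                          # part 3: inner product of the columns, ~0
```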

## Problem 11

Permutation matrices. A square matrix $A$ is called a permutation matrix if it satisfies the following three properties:

- all elements of $A$ are either zero or one;
- each column of $A$ contains exactly one element equal to one;
- each row of $A$ contains exactly one element equal to one.

The matrices

$$A_1 = \begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 1 \\
1& 0 & 0 
\end{bmatrix}, \quad
A_2 = \begin{bmatrix}
0 & 1 & 0 & 0\\
0 & 0 & 0 & 1\\
0 & 0& 1 & 0 \\
1 & 0 & 0 & 0
\end{bmatrix}$$

are examples of permutation matrices. A less formal definition is the following: a permutation matrix is the identity matrix with its rows reordered.

(a) Let $A$ be an $n \times n$ permutation matrix. Give a simple description in words of the relation between an $n$-vector $x$ and $f(x) = Ax$.

(b) Prove that the columns of a permutation matrix are orthogonal.

(c) We can also define a second linear function $g(x) = A^Tx$ in terms of the same permutation matrix. What is the relation between $g$ and $f$?

Hint: use the result of (b).

(d) Describe the meaning of the columns and rows of a permutation matrix. Then explain the result of (c).
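A small NumPy experiment with $A_1$ from above illustrates (a) to (c). The test vector `x` is an arbitrary choice.

```python
import numpy as np

A1 = np.array([[0, 1, 0],
               [0, 0, 1],
               [1, 0, 0]])
x = np.array([10, 20, 30])

f_x = A1 @ x          # f(x) = A x
g_fx = A1.T @ f_x     # g(f(x)) = A^T A x

print(f_x)            # [20 30 10]: the entries of x reordered
print(A1.T @ A1)      # identity: the columns of A1 are orthonormal
print(g_fx)           # [10 20 30]: g undoes f
```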

# 3. Regression

## Problem 2

The regularized least-squares problem has the form

<br>
$$ \min_{\theta} \;\lVert A\theta -y\rVert_2^2 + \lambda \lVert \theta \rVert_2^2$$

(a) Show that the solution is given by
<br><br>
$$ \hat{\theta} = \left( A^T A + \lambda I_n \right)^{-1} A^T y $$
 (Do not use the method of Lagrange multipliers.)
<br><br>

(b) Write down a gradient descent algorithm for the given optimization problem.

(c) Based on the result of (b), describe the role of the regularization term.

(d) Explain why the results of (a) and (b) have the same meaning.
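For (a), (b) and (d), note that the gradient of the objective is $\nabla_\theta = 2A^T(A\theta - y) + 2\lambda\theta$. It vanishes exactly when $(A^TA + \lambda I_n)\theta = A^Ty$, so the gradient descent loop $\theta \leftarrow \theta - \alpha\nabla_\theta$ should converge to the closed-form solution.

The NumPy sketch below compares the two on random data. None of the following come from the problem; they are assumptions:
- the data;
- $\lambda$;
- the iteration count;
- the step size $\alpha = 1/L$, where $L = 2(\sigma_{\max}(A)^2 + \lambda)$ is the Lipschitz constant of the gradient.

```python
import numpy as np

def ridge_closed_form(A, y, lam):
    """Solve (A^T A + lam I) theta = A^T y, i.e. the formula in (a)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

def ridge_gradient_descent(A, y, lam, n_iter=5000):
    """Gradient descent on ||A theta - y||^2 + lam ||theta||^2."""
    # Lipschitz constant of the gradient; a step of 1/L is guaranteed not to diverge
    L = 2.0 * (np.linalg.norm(A, 2) ** 2 + lam)
    alpha = 1.0 / L
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ theta - y) + 2.0 * lam * theta
        theta = theta - alpha * grad
    return theta

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 4))      # hypothetical data, only for the comparison
y = rng.normal(size=20)
lam = 0.5

print(ridge_closed_form(A, y, lam))
print(ridge_gradient_descent(A, y, lam))   # should agree to many digits
```

For (c), rewrite the update as $\theta \leftarrow (1 - 2\alpha\lambda)\theta - 2\alpha A^T(A\theta - y)$. In this form, every step first shrinks $\theta$ toward zero (often called weight decay) and then applies the data-fitting correction.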

(e) Fit and plot an approximating curve for the given data points in Python using your gradient descent algorithm:

- avoid overfitting
- use radial basis functions (RBF)
- choose a proper value of $\lambda$ on your own

```octave
x = -4.5:1:4.5;
x = x(:);

y = [0.9819 0.7973 1.9737 0.1838 1.3180 -0.8361 -0.6591 -2.4701 -2.8122 -6.2512]';
plot(x,y,'o'), axis([-5 5 -12 6])
```

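For (e), one possible Python version of the data and the fit is sketched below. The loop is the gradient descent update from (b). The RBF centers, the width `sigma = 1.0` and `lam = 0.1` are hypothetical choices for illustration; part of the exercise is to choose $\lambda$ yourself.

```python
import numpy as np

# Data from the MATLAB snippet above (x = -4.5:1:4.5)
x = np.arange(-4.5, 5.0, 1.0)
y = np.array([0.9819, 0.7973, 1.9737, 0.1838, 1.3180,
              -0.8361, -0.6591, -2.4701, -2.8122, -6.2512])

def rbf_features(x, centers, sigma):
    """Gaussian RBF feature matrix: A[i, j] = exp(-(x_i - c_j)^2 / (2 sigma^2))."""
    d = x[:, None] - centers[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

# Hypothetical design choices: 10 centers spread over [-5, 5], width 1, lambda = 0.1
centers = np.linspace(-5.0, 5.0, 10)
sigma = 1.0
lam = 0.1

A = rbf_features(x, centers, sigma)

# Gradient descent on ||A theta - y||^2 + lam ||theta||^2 with step size 1/L
L = 2.0 * (np.linalg.norm(A, 2) ** 2 + lam)
theta = np.zeros(A.shape[1])
for _ in range(20000):
    grad = 2.0 * A.T @ (A @ theta - y) + 2.0 * lam * theta
    theta -= grad / L

# Evaluate the fitted curve on a dense grid for plotting
xp = np.linspace(-5.0, 5.0, 200)
yp = rbf_features(xp, centers, sigma) @ theta
```

To draw the result, something like `plt.plot(x, y, 'o', xp, yp, '-')` with Matplotlib would work. With as many centers as data points, a very small $\lambda$ lets the curve pass (nearly) through every point, which is the overfitting the exercise warns about. Comparing the curves for a few values of `lam` is a simple way to pick one.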

# 4. Classification

## Problem 1

In this problem, we classify handwritten digits. For simplicity, we reduce the task to a binary classification between digit 0 and digit 1.

### Step 1. Load the data

|Data | Data description |
|---|---|
|data0 |1000 images (28×28 pixels) of handwritten digit 0 |
|data1 |1000 images (28×28 pixels) of handwritten digit 1 |

<a href = "https://www.dropbox.com/s/lt6shih09yq71ta/data0?dl=0"> download data 0 </a>/ 
<a href = "https://www.dropbox.com/s/tc9grm8vnl65tk5/data1?dl=0"> download data 1 </a>

To read the files in Matlab, use the following code: 

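```octave
fid = fopen('data0', 'r');                 % open the raw byte file
[t1, N] = fread(fid, [28 28], 'uchar');    % first image: the next 784 bytes as a 28x28 matrix
[t2, N] = fread(fid, [28 28], 'uchar');    % second image, and so on for all 1000 images
fclose(fid);
```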

Load all 1000 images from each of ‘data0’ and ‘data1’ in this way. To display an image, use `imshow(t1, [])` (the `[]` scales the 0 to 255 values for display) or `imagesc(t1)`.

Note: make sure you are reading the files correctly. Check by displaying the first few and the last few images in each class.

Note: calculations are more manageable if you first convert each pixel of the $28\times28$ matrix to a binary value.

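```octave
% Convert to a binary image by thresholding the 0-255 intensities
t1 = t1 > 125;

% or, equivalently, with im2bw (Image Processing Toolbox); the level is given in [0, 1]
t1 = im2bw(uint8(t1), 125/255);
```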

### Step 2. Extract features

Now we select our own features from the image data to distinguish digit 0 from digit 1. Two features are recommended, together with a bias term:

1) The average pixel value in the center of the image (`t1(10:19,10:19)`).

2) The average pixel value over the entire image.

3) A constant 1 as the bias term.


$$\Phi(x) = \begin{bmatrix}\text{feature1}\\ \text{feature2}\\ 1 \end{bmatrix}$$

You should end up with a $2000\times3$ matrix whose first 1000 rows hold the features of all images in ‘data0’ and whose last 1000 rows hold the features of all images in ‘data1’. This is the matrix $\Phi$ introduced in class.
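If you work in Python instead of MATLAB, Steps 1 and 2 could look like the NumPy sketch below. It assumes each file holds 1000 raw 28×28 `uint8` images stored back to back, which is what repeated `fread` calls read. The helpers `load_digits` and `extract_features` are hypothetical names. The threshold 125 comes from the binarization note. With zero-based, end-exclusive indexing, MATLAB's `t1(10:19,10:19)` becomes `[9:19, 9:19]`.

```python
import numpy as np

def load_digits(path, n_images=1000):
    """Read n_images raw 28x28 uint8 images stored back to back (hypothetical helper)."""
    raw = np.fromfile(path, dtype=np.uint8, count=n_images * 28 * 28)
    # Row-major reshape: each image is the transpose of MATLAB's fread(fid,[28 28]) result
    return raw.reshape(n_images, 28, 28)

def extract_features(images, threshold=125):
    """Map (N, 28, 28) images to the (N, 3) rows [center mean, overall mean, 1]."""
    binary = images > threshold                          # binarize as in the note above
    center = binary[:, 9:19, 9:19].mean(axis=(1, 2))    # MATLAB t1(10:19,10:19)
    overall = binary.mean(axis=(1, 2))                  # whole 28x28 image
    bias = np.ones(len(images))                         # constant bias feature
    return np.column_stack([center, overall, bias])
```

Stacking both classes gives the $2000\times 3$ matrix: `Phi = np.vstack([extract_features(load_digits('data0')), extract_features(load_digits('data1'))])`. NumPy reshapes row-major, while `fread` fills columns first, so each image comes out transposed relative to MATLAB. Both features are unaffected, but transpose the image before displaying it if orientation matters.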

### Step 3. Plot the data
Plot the data to check whether the classes are separable. The expected plot is shown below:

<img src="./image_files/classification01.png" width="400">

## Problem 3

We would like to use an SVM instead of the perceptron to classify digit 0 and digit 1 in the training set.

### Step 1. Define variables
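Find a support vector machine classifier, which amounts to solving the following optimization problem:

$$\begin{align*}
\text{minimize } & \lVert \omega \rVert_2 + \gamma(1^Tu + 1^Tv)\\
\text{subject to }& \\
& X \omega \geq 1 - u\\
& Y \omega \leq - (1-v)\\
& u \geq 0\\
& v \geq 0
\end{align*}$$

Here, $X$ and $Y$ are the $1000\times3$ feature matrices of the two classes, and $u, v \in \mathbb{R}^{1000}$ are slack variables. You can obtain $X$ and $Y$ by splitting the matrix $\Phi$ above into its first and last 1000 rows. Every row of $\Phi$ ends with the constant 1, so the third component $\omega_3$ of $\omega$ acts as the bias, matching the decision boundary in Step 3. No separate offset $\omega_0$ is needed.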

### Step 2. CVX

Use CVX to solve the problem and find $\omega$.
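CVX is a MATLAB package. The following is only a NumPy stand-in, not CVX: a plain subgradient method swapped in so the sketch runs without a solver.

At the optimum, each slack equals its hinge value: $u_i = \max(0, 1 - x_i^T\omega)$ and $v_i = \max(0, 1 + y_i^T\omega)$, where $x_i^T$ and $y_i^T$ are rows of $X$ and $Y$. The problem therefore becomes the unconstrained $\min_\omega \lVert\omega\rVert_2 + \gamma\sum_i \max(0,1-x_i^T\omega) + \gamma\sum_i\max(0,1+y_i^T\omega)$. The bias is carried by the last component of $\omega$ through the constant column of $\Phi$.

The function names, $\gamma$, the step schedule and the toy data are assumptions.

```python
import numpy as np

def svm_objective(w, X, Y, gamma):
    """||w||_2 + gamma * (sum of the optimal slacks u and v)."""
    u = np.maximum(0.0, 1.0 - X @ w)   # slack needed so that X w >= 1 - u
    v = np.maximum(0.0, 1.0 + Y @ w)   # slack needed so that Y w <= -(1 - v)
    return np.linalg.norm(w) + gamma * (u.sum() + v.sum())

def svm_subgradient(X, Y, gamma=1.0, n_iter=5000, step0=0.1):
    """Subgradient method on the unconstrained form; returns the best iterate seen."""
    w = np.zeros(X.shape[1])
    best_w, best_f = w.copy(), svm_objective(w, X, Y, gamma)
    for k in range(1, n_iter + 1):
        norm_w = np.linalg.norm(w)
        g = w / norm_w if norm_w > 0 else np.zeros_like(w)  # subgradient of ||w||_2
        active_x = X @ w < 1.0     # rows of X with positive slack
        active_y = Y @ w > -1.0    # rows of Y with positive slack
        g = g - gamma * X[active_x].sum(axis=0) + gamma * Y[active_y].sum(axis=0)
        w = w - (step0 / np.sqrt(k)) * g     # diminishing step size
        f = svm_objective(w, X, Y, gamma)
        if f < best_f:                       # subgradient steps need not decrease f
            best_w, best_f = w.copy(), f
    return best_w

# Toy data (hypothetical): rows are [feature1, feature2, 1]
X = np.array([[2.0, 2.0, 1.0], [3.0, 1.0, 1.0], [1.0, 3.0, 1.0]])
Y = np.array([[-2.0, -2.0, 1.0], [-3.0, -1.0, 1.0], [-1.0, -3.0, 1.0]])
w = svm_subgradient(X, Y, gamma=1.0)
print(w, svm_objective(w, X, Y, 1.0))
```

On the real data you could split, for example, `X = Phi[:1000]` and `Y = Phi[1000:]`. Expect to tune $\gamma$, the iteration count and the step size. A dedicated convex solver (CVX, or CVXPY in Python) will usually reach a more accurate solution.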

### Step 3. Plot the result

Now that you have an optimal $\omega$, plot the decision boundary. Note that the decision boundary is defined as follows:

$$ \omega_1 x_1 + \omega_2 x_2 + \omega_3 = 0$$

The expected graph is: 

<img src="./image_files/classification04.png" width="400">
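To draw the boundary, solve the equation above for $x_2$ and evaluate it over a range of $x_1$ values. The sketch assumes $\omega_2 \neq 0$. With zero-based indexing, $\omega_1, \omega_2, \omega_3$ are `w[0]`, `w[1]`, `w[2]`. The weights shown are placeholders, not a trained result.

```python
import numpy as np

def boundary_x2(w, x1):
    """Solve w[0]*x1 + w[1]*x2 + w[2] = 0 for x2 (assumes w[1] != 0)."""
    x1 = np.asarray(x1, dtype=float)
    return -(w[0] * x1 + w[2]) / w[1]

w = np.array([1.0, 2.0, -4.0])     # hypothetical weights, not a trained result
x1 = np.linspace(0.0, 1.0, 5)
x2 = boundary_x2(w, x1)
print(np.column_stack([x1, x2]))   # points on the line, ready to plot
```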

## Problem 4

Let $f(x_1,x_2) = 2x_1 + 3x_2 + 1$, and let $f(x_1,x_2) = 0$ be the hyperplane (or decision surface in classification problems). Compute the shortest distances from $x_a = \begin{bmatrix} 2 \\1\end{bmatrix}$ and $x_b = \begin{bmatrix} 1 \\ -1\end{bmatrix}$ to this hyperplane.
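For a hyperplane $\omega^Tx + b = 0$, the shortest distance from a point $x_0$ is $\frac{\lvert \omega^Tx_0 + b\rvert}{\lVert \omega \rVert_2}$. Here $\omega = [2\;3]^T$ and $b = 1$. A short NumPy check:

```python
import numpy as np

def distance_to_hyperplane(w, b, x0):
    """Shortest Euclidean distance from x0 to the hyperplane w^T x + b = 0."""
    w = np.asarray(w, dtype=float)
    return abs(w @ np.asarray(x0, dtype=float) + b) / np.linalg.norm(w)

w, b = [2.0, 3.0], 1.0          # f(x1, x2) = 2 x1 + 3 x2 + 1
d_a = distance_to_hyperplane(w, b, [2.0, 1.0])
d_b = distance_to_hyperplane(w, b, [1.0, -1.0])
print(d_a, d_b)
```

By hand:
- $f(x_a) = 2\cdot2 + 3\cdot1 + 1 = 8$, so $d_a = 8/\sqrt{13} \approx 2.22$.
- $f(x_b) = 2 - 3 + 1 = 0$, so $x_b$ lies on the hyperplane and its distance is 0.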


# 5. Clustering

## Problem 1

You will use K-means to compress an image by reducing the number of colors it contains.

1) Image Representation

The data for this exercise contains a 128-pixel by 128-pixel TIFF image named "bird.tiff." It looks like the picture in Figure 1.
<br><br>
<img src="./image_files/clus01.bmp" width="250">
<br><br>
In a straightforward 24-bit color representation of this image, each pixel is represented as three 8-bit numbers (ranging from 0 to 255) that specify red, green and blue (RGB) intensity values. Our bird photo contains thousands of colors, but we'd like to reduce that number to 16. By making this reduction, it would be possible to represent the photo in a more efficient way by storing only the RGB values of the 16 colors present in the image.

In this problem, you will use K-means to reduce the color count to $k = 16$. That is, you will compute 16 colors as the cluster centroids and replace each pixel in the image with its nearest cluster centroid color.

2) K-means in Matlab

<a href = "https://www.dropbox.com/s/tibygtcahpw9hmk/bird.tiff?dl=0"> download bird.tiff </a>

In Matlab, load the image into your program with the following command:

```octave 
im = imread('bird.tiff');
A = double(im);
imshow(im)
```
This creates a three-dimensional matrix $A$ whose first two indices identify a pixel position and whose last index represents red, green, or blue. For example, $A(50, 33, 3)$ gives you the blue intensity of the pixel at position $y = 50, x = 33$. (The y-position is given first, but this does not matter so much in our example because the $x$ and $y$ dimensions have the same size).

Your task is to compute 16 cluster centroids from this image, with each centroid being a vector of length three that holds a set of RGB values. Here is the K-means algorithm as it applies to this problem:

3) K-means algorithm

> 1. For initialization, sample 16 colors randomly from the original picture. These are your $k$ means $\mu_1, \mu_2, \cdots, \mu_k$.
>
> 2. Go through each pixel in the image and find its nearest mean:
>
>    $$c^{(i)} = \arg\min_j \lVert x^{(i)}-\mu_j \rVert^2$$
>
> 3. Update the value of each mean based on the pixels assigned to it:
>
>    $$\mu_j = \frac{\sum_{i=1}^{m} 1\{c^{(i)} = j\}\,x^{(i)}}{\sum_{i=1}^{m} 1\{c^{(i)} = j\}}$$
>
>    where $m$ is the number of pixels and $1\{\cdot\}$ equals 1 when its condition holds and 0 otherwise.
>
> 4. Repeat steps 2 and 3 until convergence. This should take between 30 and 100 iterations. You can either run the loop for a preset maximum number of iterations, or terminate it when the means no longer change by a significant amount.

Note: In Step 3, update a mean only if there are pixels assigned to it. Otherwise, the update divides by zero (in MATLAB this yields NaN values). For example, two of the means may be initialized to the same color (_e.g._, black). Depending on your implementation, all of the pixels closest to that color may then be assigned to one of the means, leaving the other mean with no assigned pixels.
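Below is a NumPy sketch of steps 1 to 4, including the empty-cluster guard from the note. It works on an $(m, 3)$ array of RGB values. The name `kmeans_colors`, the tolerance and the random seed are assumptions. Sampling distinct pixel positions does not prevent duplicate colors, which is why the guard is still needed.

```python
import numpy as np

def kmeans_colors(pixels, k=16, max_iter=100, tol=1e-3, seed=0):
    """K-means on an (m, 3) array of RGB values; returns (means, labels)."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: initialize the k means with colors sampled from the picture
    means = pixels[rng.choice(len(pixels), size=k, replace=False)]

    def assign(means):
        # Step 2: squared distance from every pixel to every mean, shape (m, k)
        d2 = ((pixels[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(means)
        # Step 3: move each mean to the average of its pixels; skip empty clusters
        new_means = means.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                new_means[j] = members.mean(axis=0)
        # Step 4: stop once no mean moves by more than tol
        shift = np.abs(new_means - means).max()
        means = new_means
        if shift < tol:
            break
    return means, assign(means)
```

For the bird image, assume a three-channel RGB array `A` of shape $128\times128\times3$:
- run `means, labels = kmeans_colors(A.reshape(-1, 3), k=16)`;
- rebuild the compressed image with `A16 = means[labels].reshape(A.shape)`.

Reading the TIFF in Python needs an image library such as Pillow (`np.asarray(Image.open('bird.tiff'))`). The temporary distance array has $m \times k \times 3$ entries, which is small for $128\times128$ pixels.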

When you have recalculated the image, you can display it. When you are finished, compare your image to the one in the solutions.

```octave
imshow(uint8(A16));   % A16: the recolored image, each pixel replaced by its nearest centroid color
```

<img src="./image_files/clus02.bmp" width="250">