# Homework 1

## Problem 1

Prove that the matrix $P$ of any orthogonal projection in $\mathbb{R}^n$ satisfies the following two properties:

1) $P^2 = P$

2) $P$ is symmetric
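A quick numerical sanity check (not a proof) can help before writing the argument. The NumPy sketch below assumes the standard formula $P = A(A^TA)^{-1}A^T$ for the orthogonal projection onto the column space of a full-column-rank matrix $A$. The matrix `A` is a random example, and `projection_matrix` is a hypothetical helper name.

```python
import numpy as np

def projection_matrix(A):
    """Orthogonal projection onto the column space of A (A must have full column rank)."""
    return A @ np.linalg.inv(A.T @ A) @ A.T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))      # example: a 2-D subspace of R^5
P = projection_matrix(A)

print(np.allclose(P @ P, P))         # property 1: P^2 = P
print(np.allclose(P, P.T))           # property 2: P is symmetric
```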


## Problem 2

Show that the hyperplane $g(X) = \omega^TX + b = 0$ is perpendicular to $\omega$.
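The claim can be checked numerically. If two points lie on the hyperplane, their difference is a direction inside the hyperplane, and that direction should be orthogonal to $\omega$. The sketch below uses arbitrary example values for `w` and `b`. The helper `point_on_hyperplane` is hypothetical: it moves any vector onto the hyperplane along $\omega$.

```python
import numpy as np

w = np.array([2.0, -1.0, 3.0])   # example normal vector omega
b = 4.0                          # example offset

def point_on_hyperplane(t):
    """Return a point X with w^T X + b = 0 by shifting t along w."""
    return t - (w @ t + b) / (w @ w) * w

X1 = point_on_hyperplane(np.array([1.0, 0.0, 0.0]))
X2 = point_on_hyperplane(np.array([0.0, 5.0, -2.0]))
print(w @ X1 + b, w @ X2 + b)    # both (numerically) zero: the points lie on g(X) = 0
print(w @ (X1 - X2))             # (numerically) zero: X1 - X2 is orthogonal to w
```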

## Problem 3

In this problem, we explore the reflection of a vector $X$ across the line spanned by a vector $V$. The reflection produces the mirror-image vector $Z$ (see Figure 1).

<img src="./image_files/problem01.png" width="250">
<center>Figure 1</center>

1) Is a reflection transformation linear?

2) If yes, find matrix $M$ such that $Z = MX$

3) For $V = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$, compute $M$ and its eigenvalues and eigenvectors (here, $V$ makes a 45-degree angle with the $x$-axis).
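You can check parts 2 and 3 numerically under the standard construction. Reflecting $X$ across the line spanned by $V$ keeps the component of $X$ along $V$ and flips the perpendicular component. This gives $Z = 2PX - X$ with $P = VV^T/(V^TV)$, so $M = 2P - I$. The NumPy sketch below builds $M$ this way. `reflection_matrix` is a hypothetical helper name.

```python
import numpy as np

def reflection_matrix(V):
    """Reflection across the line spanned by V: M = 2 V V^T / (V^T V) - I."""
    V = np.asarray(V, dtype=float).reshape(-1, 1)
    P = V @ V.T / (V.T @ V)            # orthogonal projection onto span{V}
    return 2 * P - np.eye(V.shape[0])

M = reflection_matrix([1, 1])          # V at 45 degrees to the x-axis
eigvals, eigvecs = np.linalg.eigh(M)   # M is symmetric, so eigh applies
print(M)
print(eigvals)                         # eigenvalues in ascending order
print(eigvecs)                         # columns are unit eigenvectors
```

Geometrically, vectors along $V$ are left unchanged by the reflection, and vectors perpendicular to $V$ are flipped. Compare the printed eigenpairs against that picture.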

## Problem 4 

Projection of a vector $x$ onto a given vector $y$. Let $y$ be a given $n$-vector, and consider the function $f : \mathbb{R}^n \to \mathbb{R}^n$, defined as

$$f(x) = \frac{x^Ty}{\lVert y\rVert^2}y$$
 
We know that $f(x)$ is the projection of $x$ onto the (fixed, given) vector $y$. Is $f$ a linear function of $x$? If your answer is yes, give an $n\times n$ matrix $A$ such that $f(x) = Ax$ for all $x$. If your answer is no, show with an example that $f$ does not satisfy the definition of linearity, $f(\alpha u + \beta v) = \alpha f(u) + \beta f(v)$.
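A numerical sketch for checking your answer. The first check tests the linearity definition directly. The second tests a candidate matrix: because $x^Ty$ is a scalar, $f(x) = y\,(y^Tx)/\lVert y\rVert^2 = \frac{yy^T}{\lVert y\rVert^2}x$. The vectors and scalars below are arbitrary examples.

```python
import numpy as np

def f(x, y):
    """Projection of x onto the fixed vector y."""
    return (x @ y) / (y @ y) * y

rng = np.random.default_rng(1)
n = 4
y = rng.standard_normal(n)                      # the fixed, given vector
u, v = rng.standard_normal(n), rng.standard_normal(n)
alpha, beta = 2.5, -0.7                         # arbitrary scalars

# Linearity check: f(alpha*u + beta*v) == alpha*f(u) + beta*f(v)
print(np.allclose(f(alpha * u + beta * v, y), alpha * f(u, y) + beta * f(v, y)))

# Candidate matrix A = y y^T / ||y||^2, so that f(x) = A x
A = np.outer(y, y) / (y @ y)
print(np.allclose(A @ u, f(u, y)))
```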


## Problem 5

(a) Find the solution by setting the partial derivatives of the following objective function to zero:

$$\min \, (2x_1 - 1)^2 + (-x_1 + x_2)^2 + (2x_2 +1)^2$$

(b) Formulate the corresponding least-squares problem (i.e., find $A$ and $b$), and solve it using CVX.
 
$$\min \, \lVert Ax - b\rVert_2$$

Hint: if $x$ minimizes $\lVert Ax - b \rVert^2_2$, then it also minimizes $\lVert Ax - b\rVert_2$, because the square root is increasing on $[0, \infty)$.
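One way to read off $A$ and $b$ is to treat each squared term as a squared residual $(a_i^Tx - b_i)^2$. For example, $(2x_1 - 1)^2$ gives the row $a_1^T = [2,\ 0]$ with $b_1 = 1$, and $(2x_2 + 1)^2 = (2x_2 - (-1))^2$ gives $a_3^T = [0,\ 2]$ with $b_3 = -1$.

The NumPy sketch below stands in for the CVX solution; it is not the CVX code itself. It solves the problem in two ways:

- with the normal equations $A^TAx = A^Tb$, which is what setting the derivatives to zero produces;
- with `np.linalg.lstsq`.

```python
import numpy as np

# Rows of A and entries of b read off from the three squared terms
A = np.array([[2.0, 0.0],    # (2*x1 - 1)^2
              [-1.0, 1.0],   # (-x1 + x2)^2
              [0.0, 2.0]])   # (2*x2 + 1)^2 = (2*x2 - (-1))^2
b = np.array([1.0, 0.0, -1.0])

# Setting the gradient 2 A^T (A x - b) to zero gives the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same minimizer from a least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)
```

Both approaches should agree with your hand calculation from part (a).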



## Problem 6

Suppose you receive a binary signal that was corrupted by noise while being transmitted through a channel. We want to estimate the original signal through $L_1$ optimization. The mathematical problem statement is:

$$
\begin{aligned}
y &= x + \omega\\
x &\in \{ 0, 1 \}\\
\omega &\text{ is noise}
\end{aligned}
\quad \Longrightarrow \quad
\text{Recover the original signal } x \text{ from the corrupted signal } y
$$

Note: this problem will be revisited after the midterm, when we will solve it using probability theory.

### Step 1. Data generation

First, generate the original signal shown in the figure below. This can be done simply with the `ones` and `zeros` commands in MATLAB.

<img src="./image_files/problem06.png" width="400">

Next, corrupt the original signal with Gaussian noise, which can be done with the `randn` command in MATLAB. One realization of the corrupted signal is shown below:

<img src="./image_files/problem07.png" width="400">
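A NumPy sketch of Step 1 (the assignment itself uses MATLAB's `ones`, `zeros`, and `randn`). The length 200 matches `x_con(200)` in the CVX code below. The jump positions and the noise level `sigma` are hypothetical choices, not values read from the figures. `make_binary_signal` is likewise a hypothetical helper name.

```python
import numpy as np

def make_binary_signal(n=200, jumps=(50, 100, 150)):
    """Piecewise-constant 0/1 signal that toggles at each (hypothetical) jump index."""
    x = np.zeros(n)
    level, start = 0.0, 0
    for j in list(jumps) + [n]:
        x[start:j] = level
        level, start = 1.0 - level, j
    return x

rng = np.random.default_rng(0)
sigma = 0.1                                                    # hypothetical noise std
x_orig = make_binary_signal()
x_corrupt = x_orig + sigma * rng.standard_normal(x_orig.size)  # y = x + omega
```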

### Step 2. $L_1$ optimization

Note that the signal is piecewise constant, so its first difference $x_{i+1} - x_i$ is sparse: it is nonzero only at the few jumps. This is why we can apply $L_1$ optimization. We minimize the $L_2$ cost function subject to an $L_1$ constraint on the first differences. Here is a rough CVX sketch:

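```octave
beta = 5;   % L1 budget for the jumps (example value only; vary it as asked below)
cvx_begin quiet
    variable x_con(200)     % reconstructed signal; x_corrupt must be a 200x1 column vector
    % L2 cost: stay close to the corrupted measurements
    minimize norm(x_con - x_corrupt, 2)
    subject to
        % L1 constraint on the first differences (total variation)
        norm(x_con(2:200) - x_con(1:199), 1) <= beta;
cvx_end
```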

Plot the reconstructed signal. In addition, vary `beta` and explain how it affects the optimization result. A reconstructed signal obtained with a suitable value of `beta` is shown below:

<img src="./image_files/problem08.png" width="400">
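Two limiting cases help explain the role of `beta`:

- If `beta` is at least the total variation $\sum_i |y_{i+1} - y_i|$ of the corrupted signal, the constraint is inactive and the solution is `x_corrupt` itself, at zero cost.
- If `beta = 0`, the reconstruction must be constant, and the best constant in the $L_2$ sense is the mean of `x_corrupt`.

Values in between trade fidelity to the data against the number and size of the jumps. The NumPy sketch below only computes these reference points; it does not replace the CVX solve. The example signal is hypothetical.

```python
import numpy as np

def total_variation(x):
    """L1 norm of first differences, i.e. norm(x(2:end) - x(1:end-1), 1) in MATLAB."""
    return np.sum(np.abs(np.diff(x)))

rng = np.random.default_rng(0)
x_orig = np.repeat([0.0, 1.0, 0.0, 1.0], 50)        # hypothetical 200-sample binary signal
x_corrupt = x_orig + 0.1 * rng.standard_normal(200)

print(total_variation(x_orig))      # clean signal: one unit per jump
print(total_variation(x_corrupt))   # noise makes this much larger
# beta >= total_variation(x_corrupt): constraint inactive, solution = x_corrupt
# beta == 0: solution is the constant signal np.mean(x_corrupt) * np.ones(200)
```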

## Problem 7

In this problem, we will classify handwritten digits. For simplicity, we reduce the digit classification problem to a binary classification between digit 0 and digit 1.

### Step 1. Load the data

|Data | Description |
|---|---|
|data0 |1000 images (28×28 pixels) of handwritten digit 0 |
|data1 |1000 images (28×28 pixels) of handwritten digit 1 |

<a href="https://www.dropbox.com/s/2perxs8nf4putzw/data0?dl=0">download data 0</a> / <a href="https://www.dropbox.com/s/s9lo2zjsg3cq3fu/data1?dl=0">download data 1</a>

To read the files in MATLAB, use the following code:

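```octave
fid = fopen('data0', 'r');                % open the raw byte file for reading
% each fread call reads the next 28x28 image, one unsigned byte per pixel;
% N is the number of values actually read (784 for a full image)
[t1, N] = fread(fid, [28 28], 'uchar');   % first image
[t2, N] = fread(fid, [28 28], 'uchar');   % second image
% ...continue (e.g., in a loop) until all 1000 images are read
fclose(fid);                              % close the file when done
```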

Load all 1000 images from each of `data0` and `data1`. To display an image, use `imshow(t1)` or `imagesc(t1)`.

Note: make sure you are reading the files correctly. Check by displaying the first few and the last few images in each class.
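If you want to double-check the files outside MATLAB, here is a NumPy sketch. It assumes, as the `fread` code above implies, that each file is a headerless stream of 28×28 unsigned bytes per image. MATLAB's `fread(fid, [28 28])` fills the matrix column by column, so each image is transposed here to match `t1`. The function name `load_digits` is hypothetical.

```python
import numpy as np

def load_digits(path, n_images=1000):
    """Read n_images raw 28x28 uint8 images, oriented like MATLAB's fread(fid, [28 28])."""
    raw = np.fromfile(path, dtype=np.uint8, count=n_images * 28 * 28)
    if raw.size != n_images * 28 * 28:
        raise ValueError(f"expected {n_images * 784} bytes, got {raw.size}")
    # A C-order reshape gives row-major images; swapping the last two axes mimics
    # MATLAB's column-major fill, so images[k] corresponds to the k-th t1.
    return raw.reshape(n_images, 28, 28).transpose(0, 2, 1)

# Example (hypothetical local path): images0 = load_digits('data0')
```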

Note: calculations are more manageable if you first convert each pixel of the $28\times28$ matrix to a binary value.

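```octave
% Convert to a binary image by thresholding the 0-255 pixel values
t1 = t1 > 125;

% or, instead of the line above (Image Processing Toolbox; im2bw expects intensities in [0, 1]):
t1 = im2bw(t1 / 255, 125/255);
```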

### Step 2. Extract features

Next, select your own features from the image data to distinguish digit 0 from digit 1. Two features are recommended, plus a bias term:

1) The average pixel value over the center of the image (`t1(10:19,10:19)`).

2) The average pixel value over the entire image.

3) A constant 1 as the bias term.


$$\Phi(x) = \begin{bmatrix}\text{feature1}\\ \text{feature2}\\ 1 \end{bmatrix}$$

You should end up with a $2000\times3$ input matrix. Its first 1000 rows contain the features of all images in `data0`, and its last 1000 rows contain the features of all images in `data1`; each row ends with the bias 1. This is the matrix $\Phi$ introduced in class.
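A NumPy sketch of the feature extraction and the construction of $\Phi$. It assumes the images are stored in arrays of shape `(1000, 28, 28)`; the names `images0`, `images1`, `features`, and `build_phi` are hypothetical. MATLAB's `t1(10:19,10:19)` uses 1-based, inclusive indexing, so it corresponds to `img[9:19, 9:19]` in NumPy's 0-based, end-exclusive indexing.

```python
import numpy as np

def features(img):
    """[feature1, feature2, 1] for one 28x28 image (binary or grayscale)."""
    center = img[9:19, 9:19].mean()    # average over the central 10x10 block, t1(10:19,10:19)
    overall = img.mean()               # average over the entire image
    return np.array([center, overall, 1.0])

def build_phi(images0, images1):
    """Stack features: first len(images0) rows for digit 0, remaining rows for digit 1."""
    return np.array([features(im) for im in list(images0) + list(images1)])
```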

### Step 3. Plot the data
Plot the data to see whether the classes are separable. The expected plot is shown below:

<img src="./image_files/problem02.png" width="400">


## Problem 8

We would like to use the perceptron algorithm to classify digit 0 and digit 1 in the training set.
	
### Step 1. Initialization

Initialize $\omega$. A zero vector is the usual initial guess.

### Step 2. Update $\omega$

Update $\omega$ only when a prediction is wrong, using the following rule:

$$ \omega \leftarrow \omega + y \cdot x$$

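Here $y \in \{-1, +1\}$ is the class label of a data point and $x$ is its feature vector (the corresponding row of $\Phi$). The prediction is $\hat{y} = \operatorname{sign}(\omega^T x)$. Repeat this update over randomly chosen data points. Here is the pseudocode: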

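```
w = zero vector (3 x 1)
for k = 1:100
    wrong(k) = 0
    for j = 1:100
        i = random integer between 1 and 2000
        yhat(i) = sign(w' * x(i))          % x(i): i-th row of Phi, as a column
        if yhat(i) ~= y(i)                 % the prediction is wrong
            w = w + y(i) * x(i)
            wrong(k) = wrong(k) + 1
        end
    end
end
```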

For each iteration $k$, count and store how many predictions are wrong, to check whether your classifier converges.
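A NumPy sketch of the pseudocode above. It assumes labels in $\{-1, +1\}$ (which digit gets $+1$ is your choice) and uses the rows of $\Phi$ as inputs. `wrong[k]` counts the mistakes made during outer iteration `k`. On the real digit data the classes may not be perfectly separable, so these counts need not reach zero.

```python
import numpy as np

def perceptron(Phi, y, K=100, J=100, seed=0):
    """Randomly sampled perceptron following the pseudocode.

    Phi: (N, d) matrix whose rows are feature vectors (last entry 1 = bias).
    y:   (N,) labels in {-1, +1}.
    Returns w and wrong, where wrong[k] counts the mistakes made in outer iteration k.
    """
    rng = np.random.default_rng(seed)
    N, d = Phi.shape
    w = np.zeros(d)                       # Step 1: zero initial guess
    wrong = np.zeros(K, dtype=int)
    for k in range(K):
        for _ in range(J):
            i = rng.integers(N)           # random index in 0..N-1
            yhat = np.sign(Phi[i] @ w)    # sign(0) = 0, so a zero score counts as wrong
            if yhat != y[i]:
                w = w + y[i] * Phi[i]     # Step 2: update only on a wrong prediction
                wrong[k] += 1
    return w, wrong
```

Plot `wrong` against `k` for the first graph; the returned `w` defines the decision boundary for the second.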

### Step 3. Plot the result
Plot two graphs. First, plot the number of wrong predictions against the iteration $k$. The expected graph is shown below:

<img src="./image_files/problem03.png" width="400">

Second, plot the classifier (decision boundary), which is given by

$$\omega_1 x_1 + \omega_2x_2 + \omega_3 = 0$$

The expected graph is shown below:

<img src="./image_files/problem04.png" width="400">
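To draw the boundary, solve $\omega_1 x_1 + \omega_2 x_2 + \omega_3 = 0$ for $x_2$, assuming $\omega_2 \neq 0$: $x_2 = -(\omega_1 x_1 + \omega_3)/\omega_2$. Below is a small helper, `boundary_line` (a hypothetical name), that returns points on this line over a range of feature-1 values. Pass the points to your plotting command.

```python
import numpy as np

def boundary_line(w, x1_min, x1_max, num=2):
    """Points on w[0]*x1 + w[1]*x2 + w[2] = 0 for x1 in [x1_min, x1_max] (needs w[1] != 0)."""
    w1, w2, w3 = w
    if w2 == 0:
        raise ValueError("w2 == 0: the boundary is the vertical line x1 = -w3 / w1")
    x1 = np.linspace(x1_min, x1_max, num)
    x2 = -(w1 * x1 + w3) / w2
    return x1, x2
```

The same helper applies to the SVM decision boundary in Problem 9, which has the same form.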

## Problem 9

We would like to use the SVM to classify digit 0 and digit 1 in the training set instead of the perceptron.

### Step 1. Define variables
Find a classifier using the support vector machine, which amounts to solving the following optimization problem:

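$$\begin{align*}
\text{minimize } & \lVert \omega \rVert_2 + \gamma(\mathbf{1}^Tu + \mathbf{1}^Tv)\\
\text{subject to }& \\
& X \omega + \omega_0 \geq \mathbf{1} - u\\
& Y \omega + \omega_0 \leq - (\mathbf{1}-v)\\
& u \geq 0\\
& v \geq 0
\end{align*}$$

Here $u, v \in \mathbb{R}^{1000}$ are slack variables, and $\gamma > 0$ weights the constraint violations against $\lVert\omega\rVert_2$.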

Here, $X$ and $Y$ are the $1000\times3$ feature matrices of the two classes. You can obtain them by splitting the matrix $\Phi$ above into its first and last 1000 rows.

### Step 2. CVX

Run CVX to find  $\omega$.
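You can check the CVX result by evaluating the objective yourself. For a fixed $\omega$ and $\omega_0$, the smallest feasible slacks are, elementwise, $u = \max(0, \mathbf{1} - (X\omega + \omega_0))$ and $v = \max(0, \mathbf{1} + Y\omega + \omega_0)$. The objective therefore reduces to a hinge-loss form. The NumPy sketch below evaluates it, assuming `X` and `Y` hold the class rows of $\Phi$ (so the products are `X @ w`). At your CVX solution, the value should match CVX's `cvx_optval` up to solver tolerance. `svm_objective` is a hypothetical helper name.

```python
import numpy as np

def svm_objective(w, w0, X, Y, gamma):
    """Objective ||w||_2 + gamma*(1^T u + 1^T v) with the smallest feasible slacks u, v."""
    u = np.maximum(0.0, 1.0 - (X @ w + w0))   # from X w + w0 >= 1 - u, u >= 0
    v = np.maximum(0.0, 1.0 + (Y @ w + w0))   # from Y w + w0 <= -(1 - v), v >= 0
    return np.linalg.norm(w) + gamma * (u.sum() + v.sum()), u, v
```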

### Step 3. Plot the result

With the optimal $\omega$ in hand, plot the decision boundary, which is defined as follows:

$$ \omega_1 x_1 + \omega_2 x_2 + \omega_3 = 0$$

The expected graph is: 

<img src="./image_files/problem05.png" width="400">