# Homework 4
### Kyle Hadley

In [2]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

In [3]:
import warnings
warnings.simplefilter('ignore')

## 1. PCA via Successive Deflation

*Note: I worked on this problem with Charlie and Joaquin from class.*

### (a)

For the covariance of the deflated matrix, we are given the relationship $\tilde{X}^T = (I - v_1v_1^T)X^T$. From this relationship, we know that $\tilde{X} = X(I - v_1v_1^T)$.

From these relationships, we start with our covariance of the deflated matrix such that,

$$\tilde{C} = \frac{1}{n} \tilde{X}^T\tilde{X}$$

Substituting our relationships for $\tilde{X}^T$ and $\tilde{X}$,

$$\tilde{C} = \frac{1}{n} (I - v_1v_1^T)X^TX(I - v_1v_1^T)$$

We can substitute $\frac{1}{n}X^TX = C$ such that

$$\tilde{C} = (I - v_1v_1^T)C(I - v_1v_1^T)$$
$$\tilde{C} = C - v_1v_1^TC - Cv_1v_1^T + v_1v_1^TCv_1v_1^T$$

Substituting our relationship $Cv_1 = \lambda_1v_1$ and noting that $\lambda_1$ is a scalar (so it can be moved freely within a product),

$$\tilde{C} = C - v_1v_1^TC - \lambda_1 v_1v_1^T + v_1v_1^T\lambda_1 v_1v_1^T$$
$$\tilde{C} = C - v_1v_1^TC - \lambda_1 v_1v_1^T + \lambda_1 v_1v_1^Tv_1v_1^T$$

Also, given our orthonormality relationship we know that $v_1^Tv_1 = 1$; thus,

$$\tilde{C} = C - v_1v_1^TC - \lambda_1 v_1v_1^T + \lambda_1 v_1v_1^T$$
$$\tilde{C} = C - v_1v_1^TC$$

We can take a "double" transpose of the remaining $v_1v_1^TC$ term (i.e. $a = (a^T)^T$); thus,

$$\tilde{C} = C - ((v_1v_1^TC)^T)^T$$
$$\tilde{C} = C - (C^Tv_1v_1^T)^T$$

From our relationship for $C$, we can see that $C = C^T$ ($C = \frac{1}{n}X^TX = C^T$); thus,

$$\tilde{C} = C - (Cv_1v_1^T)^T$$

Applying our relationship again of $Cv_1 = \lambda_1 v_1$ and $C = \frac{1}{n}X^TX$,

$$\tilde{C} = \frac{1}{n}X^TX - (\lambda_1 v_1v_1^T)^T$$
$$\tilde{C} = \frac{1}{n}X^TX - \lambda_1 v_1v_1^T$$
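
As a sanity check on the derivation above, the identity can be verified numerically. This is a minimal sketch, not part of the assigned solution: the random data, its dimensions, and the variable names (`X_tilde`, `C_tilde`, `deflation_error`) are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: n samples, d features, centered so that C = X^T X / n.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)

C = X.T @ X / n

# eigh returns eigenvalues in ascending order, so the last pair is (lambda_1, v_1).
eigvals, eigvecs = np.linalg.eigh(C)
lam1 = eigvals[-1]
v1 = eigvecs[:, -1]

# Deflate the data: X_tilde = X (I - v1 v1^T).
P = np.eye(d) - np.outer(v1, v1)
X_tilde = X @ P
C_tilde = X_tilde.T @ X_tilde / n

# Compare against the closed form C - lambda_1 v1 v1^T.
deflation_error = np.max(np.abs(C_tilde - (C - lam1 * np.outer(v1, v1))))
print(deflation_error)
```

The printed error should be at the level of floating-point round-off if the derivation holds.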

### (b)

For $j \neq 1$ and $v_j$ a principal eigenvector of $C$, we can show that $v_j$ is also a principal eigenvector of $\tilde{C}$ with the same eigenvalue $\lambda_j$.

Given $\tilde{C} = \frac{1}{n}X^TX - \lambda_1 v_1v_1^T$, we right-multiply both sides by $v_j$,

$$\tilde{C}v_j = \left(\frac{1}{n}X^TX - \lambda_1 v_1v_1^T\right) v_j$$
$$\tilde{C}v_j = \frac{1}{n}X^TXv_j - \lambda_1 v_1v_1^Tv_j$$

Given our orthonormality relationship, we know that if $j \neq 1$ then $v_1^Tv_j = 0$; thus,

$$\tilde{C}v_j = \frac{1}{n}X^TXv_j - \lambda_1 v_1(0)$$
$$\tilde{C}v_j = \frac{1}{n}X^TXv_j$$

Substituting $\frac{1}{n}X^TX = C$,

$$\tilde{C}v_j = Cv_j = \lambda_j v_j$$

Thus, we can see that $v_j$ is also a principal eigenvector of $\tilde{C}$ with the same eigenvalue $\lambda_j$.
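
The result can also be checked numerically. The sketch below is only illustrative: the random matrix, the column scaling (used to keep the eigenvalues well separated), and the name `max_residual` are assumptions, not part of the problem statement.

```python
import numpy as np

# Hypothetical data with distinct column scales so the eigenvalues are well separated.
rng = np.random.default_rng(1)
n, d = 300, 4
X = rng.normal(size=(n, d)) * np.array([3.0, 2.0, 1.0, 0.5])

C = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(C)

# Sort in descending order so index 0 corresponds to (lambda_1, v_1).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
lam1, v1 = eigvals[0], eigvecs[:, 0]

# Deflated covariance from part (a).
C_tilde = C - lam1 * np.outer(v1, v1)

# For every j != 1, C_tilde v_j should equal lambda_j v_j.
residuals = [
    np.linalg.norm(C_tilde @ eigvecs[:, j] - eigvals[j] * eigvecs[:, j])
    for j in range(1, d)
]
max_residual = max(residuals)
print(max_residual)
```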

### (c)



### (d)



## 2. Locality Sensitive Hashing

### (a)

To avoid the third condition, where the magical oracle is unstable, we set $c = 1$ so that $cr = r$. This ensures that we never hit the unstable condition when requesting a nearest neighbor.

After setting $c=1$, we can perform iterative queries by starting with an arbitrary $r_0$. Given an $r_0$ we can ask the oracle for the nearest neighbor of a query point $q$.

If the oracle returns a point $x'$, we halve the radius (rounding down) and query again with $r_1 = \lfloor r_0 / 2 \rfloor$. While the oracle keeps returning a point, we keep halving $r_i$ (as long as that is possible with integer values) until the oracle finds nothing. At that point the answer is bracketed: the oracle fails at $r_i$ but succeeds at $r_{i-1}$, so we set $r_{i+1}$ to the average of $r_{i-1}$ and $r_i$ and query again. We continue this "yo-yo"-ing (i.e. bisecting the bracket) until we find the smallest $r$ for which the oracle still returns an $x'$; the point returned at that $r$ is the nearest neighbor.

We perform a similar series of steps if the oracle doesn't return a point $x'$ for our initial $r_0$, but instead set $r_{i+1} = 2r_i$ until the oracle does return an $x'$. Then we begin the same "yo-yo"-ing as described above until $r$ converges.
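
The search described above can be sketched in code. Everything here is an assumption made for illustration: the oracle is simulated by a brute-force scan over a small random set of binary vectors under Hamming distance, and the names `make_oracle`, `nearest_neighbor`, and `r_max` are hypothetical. The real oracle is a black box; only the halving, doubling, and bisection logic is the point.

```python
import numpy as np

def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    return int(np.sum(a != b))

def make_oracle(points):
    """Hypothetical stand-in for the oracle with c = 1:
    returns some point within distance r of q, or None if there is none."""
    def oracle(q, r):
        for x in points:
            if hamming(x, q) <= r:
                return x
        return None
    return oracle

def nearest_neighbor(oracle, q, r0, r_max):
    """Return (point, radius) for the smallest integer radius at which the oracle succeeds."""
    r = max(int(r0), 1)
    hit = oracle(q, r)
    if hit is not None:
        # Oracle succeeded: halve the radius until it fails (or reaches 0).
        lo, hi, best = -1, r, hit
        while hi > 0:
            hit = oracle(q, hi // 2)
            if hit is None:
                lo = hi // 2
                break
            hi, best = hi // 2, hit
    else:
        # Oracle failed: double the radius (capped at r_max) until it succeeds.
        lo = r
        while True:
            if lo >= r_max:
                return None, None  # no point exists at any radius
            r = min(2 * lo, r_max)
            hit = oracle(q, r)
            if hit is not None:
                hi, best = r, hit
                break
            lo = r
    # "Yo-yo" phase: the oracle fails at lo and succeeds at hi, so bisect.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        hit = oracle(q, mid)
        if hit is None:
            lo = mid
        else:
            hi, best = mid, hit
    return best, hi

rng = np.random.default_rng(0)
points = rng.integers(0, 2, size=(20, 16))  # hypothetical dataset of 16-bit vectors
q = rng.integers(0, 2, size=16)
oracle = make_oracle(points)
best, radius = nearest_neighbor(oracle, q, r0=4, r_max=16)
print(radius, hamming(best, q))
```

Because the radius is halved, doubled, or bisected at every step, the number of oracle calls grows only logarithmically with the range of possible radii, regardless of the starting $r_0$.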

### (b)

The lower bound for the probability that $h$ maps $x_i$ and $x_j$ to the same value is a function of the number of common values within $x_i$ and $x_j$ at the same indices. While we don't have a function or value for the common values, we can use the Hamming distance as this represents where $x_i$ and $x_j$ have differing values at various positions within the vectors.

Given that $d(x_i, x_j) \leq r$, we know that the lower bound for common values (i.e. $h(x_i) = h(x_j)$) is when $d(x_i, x_j) = r$; thus, we can describe the probability of $h(x_i) = h(x_j)$ as,

$$p_1 = Pr(h(x_i) = h(x_j)) = 1 - \frac{r}{m}$$

where $m$ is the size of the binary vectors $x_i$ and $x_j$.

Given $d(x_i, x_j) \geq cr$, we know that the upper bound for common values is attained when $d(x_i, x_j) = cr$; thus, we can describe the probability of $h(x_i) = h(x_j)$ as,

$$p_2 = Pr(h(x_i) = h(x_j)) = 1 - \frac{cr}{m}$$

The inequality relationship between $p_1$ and $p_2$ is $p_2 \leq p_1$, which follows from $c \geq 1$.
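
The two bounds can be illustrated with a small exact computation. This sketch assumes that $h$ samples one coordinate uniformly at random and returns that bit (so two vectors collide exactly when they agree at the sampled index); the values of `m`, `r`, and `c` and the helper names are hypothetical.

```python
import numpy as np

m, r, c = 20, 4, 2  # hypothetical vector length, radius, and approximation factor

def flip_first(x, d):
    """Return a copy of x with its first d bits flipped (Hamming distance d from x)."""
    y = x.copy()
    y[:d] ^= 1
    return y

def collision_probability(a, b):
    """Exact Pr[h(a) = h(b)] when h returns a uniformly chosen coordinate:
    the fraction of coordinates at which a and b agree."""
    return float(np.mean(a == b))

x_i = np.zeros(m, dtype=int)
x_near = flip_first(x_i, r)      # d(x_i, x_near) = r
x_far = flip_first(x_i, c * r)   # d(x_i, x_far) = c * r

p1 = collision_probability(x_i, x_near)
p2 = collision_probability(x_i, x_far)
print(p1, p2)
```

Under these assumptions `p1` matches $1 - \frac{r}{m}$ and `p2` matches $1 - \frac{cr}{m}$, so `p2 <= p1` whenever $c \geq 1$.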



### (c)

Given that $g(x_i) = (h_1(x_i), h_2(x_i), \ldots , h_k(x_i))$, the probability of $g(x_i) = g(x_j)$ is the joint probability of $h_1(x_i) = h_1(x_j)$, $h_2(x_i) = h_2(x_j)$, ..., $h_k(x_i) = h_k(x_j)$. Assuming the $h_t$ are chosen independently, this joint probability factors into a product; thus,

$$Pr(g(x_i) = g(x_j)) = Pr(h_1(x_i) = h_1(x_j)) \times Pr(h_2(x_i) = h_2(x_j)) \times \ldots \times Pr(h_k(x_i) = h_k(x_j))$$

Substituting our $p_1$ for our lower bound condition, the lower bound of our probability is

$$Pr(g(x_i) = g(x_j)) = p_1 \times p_1 \times \ldots \times p_1$$
$$Pr(g(x_i) = g(x_j)) = p_1^k$$

Substituting our $p_2$ for our upper bound condition, the upper bound of our probability is

$$Pr(g(x_i) = g(x_j)) = p_2 \times p_2 \times \ldots \times p_2$$
$$Pr(g(x_i) = g(x_j)) = p_2^k$$
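
The $p^k$ form can be confirmed by exact enumeration on a small case. The sketch assumes each $h_t$ samples a coordinate uniformly and independently (with replacement), so every $k$-tuple of indices is equally likely; the values of `m`, `k`, and `d` are hypothetical.

```python
import itertools
import numpy as np

m, k, d = 10, 3, 2  # hypothetical vector length, number of hashes, Hamming distance

a = np.zeros(m, dtype=int)
b = a.copy()
b[:d] = 1  # b differs from a in exactly d positions

agree = (a == b)  # agree[t] is True where h_t would collide

# Enumerate every equally likely choice of k sampled coordinates.
index_tuples = list(itertools.product(range(m), repeat=k))

# g(a) = g(b) only if the vectors agree at all k sampled coordinates.
hits = sum(all(agree[t] for t in idx) for idx in index_tuples)
p_g = hits / len(index_tuples)

p_h = 1 - d / m  # single-hash collision probability
print(p_g, p_h ** k)
```

Increasing `k` drives both $p_1^k$ and $p_2^k$ toward zero, but $p_2^k$ shrinks faster, which is what widens the gap between near and far points.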

### (d)

### (e)

### (f)

## 3. Programming Problem: Lasso

*Note: Worked on this problem with Niraj and Charlie from class.*