### **Dimensionality Reduction**

Why?
- visualization
- curse of dimensionality: pairwise distances between points tend to become nearly equal as the dimension increases, so near and far neighbors are hard to tell apart
- computational cost
- noise reduction
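
The distance-concentration effect can be observed with a small experiment. This is a minimal sketch assuming points drawn uniformly from the unit cube; the function name `relative_contrast` and the sample sizes are illustrative choices, not part of the notes.

```python
import numpy as np

def relative_contrast(d, n=200, seed=0):
    """(max - min) / min of the distances from one query point
    to n random points drawn uniformly from [0, 1]^d."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast between the nearest and farthest point shrinks as d grows.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

A small contrast means the nearest and the farthest point are at almost the same distance, which is what makes distance-based methods less informative in high dimensions.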

How?
- Eliminating some features directly loses information
- Goal: preserve as much information as possible while reducing dimensionality
- Approach: find a new coordinate system in which fewer coordinates are enough to describe the samples

### **Principal Components Analysis (PCA)**

- PCA is a dimensionality reduction method
  - A linear transformation
  - Finds a new coordinate system for the dataset
  - Uses only a small number of the new coordinates to represent each data point
  - Preserves as much of the data's variance as possible

- Formally, given a dataset of $n$ samples $\{x_1, x_2, ..., x_n\}$ with $x_i \in \mathbb{R}^d$
- Find a linear transformation $W \in \mathbb{R}^{d \times k}$ where $k < d$
  - $d$ = number of features in the original data
  - $k$ = number of new features
- Preserve as much of the variance as possible

**First Principal Component**
- subtract the mean

  $X' = [x_1', x_2', ..., x_n']$   

  $x_i' = x_i - \bar{x}$

- apply the linear transformation
  - get data points in the new coordinate system
  
  $[w^Tx_1', w^Tx_2', ..., w^Tx_n']$

- compute the variance in the new coordinate system

  $\frac{1}{n}\sum_{i=1}^{n}(w^Tx_i')^2$


Loss/Objective function:
- maximize the variance in the new coordinate system

  $\max_{\|w\|_2=1}\frac{1}{n}\sum_{i=1}^{n}(w^Tx_i')^2$

- solving this problem gives $w_1$, the first axis of the new coordinate system
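
The objective can be checked numerically. The sketch below assumes synthetic Gaussian data stored as a $d \times n$ matrix (columns are samples) and uses the top eigenvector of the covariance matrix as the candidate solution; the helper `projected_variance` and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: d = 3 features, n = 500 samples, stored as columns.
X = rng.normal(size=(3, 500)) * np.array([[3.0], [1.0], [0.3]])
X_centered = X - X.mean(axis=1, keepdims=True)

def projected_variance(w, X_centered):
    """(1/n) * sum_i (w^T x_i')^2 for a unit vector w."""
    return np.mean((w @ X_centered) ** 2)

# Candidate solution: eigenvector of the largest eigenvalue of the covariance
# matrix (np.linalg.eigh returns eigenvalues in ascending order).
cov = X_centered @ X_centered.T / X_centered.shape[1]
eigvals, eigvecs = np.linalg.eigh(cov)
w1 = eigvecs[:, -1]

# Compare against many random unit vectors.
candidates = rng.normal(size=(1000, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
best_random = max(projected_variance(w, X_centered) for w in candidates)

print(projected_variance(w1, X_centered), best_random)
```

No random unit vector should beat the eigenvector, and the variance it achieves equals the largest eigenvalue.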

**Second principal component**

- $\max_{\|w\|_2=1}\frac{1}{n}\sum_{i=1}^{n}(w^Tx_i')^2$

- subject to $w$ being orthogonal to the first principal component $w_1$, i.e. $w^Tw_1 = 0$
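
One way to enforce the orthogonality constraint is deflation: remove each sample's component along $w_1$, then solve the same maximization on what remains. This is a sketch of that idea on synthetic data, not necessarily the procedure intended by the notes; `top_direction` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical toy data, columns are samples (d = 4, n = 300).
X = rng.normal(size=(4, 300)) * np.array([[4.0], [2.0], [1.0], [0.5]])
Xc = X - X.mean(axis=1, keepdims=True)
n = Xc.shape[1]

def top_direction(M):
    """Unit vector w maximizing (1/n) * sum_i (w^T m_i)^2 over the columns m_i of M."""
    _, vecs = np.linalg.eigh(M @ M.T / M.shape[1])
    return vecs[:, -1]

w1 = top_direction(Xc)

# Deflation: subtract the projection of every sample onto w1.
X_deflated = Xc - np.outer(w1, w1 @ Xc)
w2 = top_direction(X_deflated)

print(abs(w1 @ w2))  # close to 0: w2 is orthogonal to w1
```

Up to sign, `w2` should coincide with the eigenvector of the second-largest eigenvalue of the original covariance matrix.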


**Principal components analysis (find all components at once)**

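In matrix form, the variance along a direction $w$ (up to the factor $\frac{1}{n}$) is

$\sum_{i=1}^{n}(w^Tx_i')^2 = w^TX'X'^Tw$

Objective function (sum of the variances along the $k$ columns of $W$)

$\max_{W^TW = I} \mathrm{tr}(W^TX'X'^TW)$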

How to optimize this value?

- Eigen-decomposition of the covariance matrix

- the solution is the $k$ eigenvectors with the largest eigenvalues
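
Why eigenvectors? A short sketch for a single component, writing $A = X'X'^T$ and introducing a Lagrange multiplier $\lambda$ (a symbol not used elsewhere in these notes) for the constraint $w^Tw = 1$:

$$
\mathcal{L}(w, \lambda) = w^TAw - \lambda(w^Tw - 1), \qquad \nabla_w\mathcal{L} = 2Aw - 2\lambda w = 0 \;\Rightarrow\; Aw = \lambda w
$$

Every stationary point is therefore an eigenvector of $A$, and its objective value is $w^TAw = \lambda w^Tw = \lambda$, so the maximum is reached at the eigenvector with the largest eigenvalue. Repeating the argument with the added orthogonality constraints gives the remaining eigenvectors in decreasing order of eigenvalue.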



**Steps**

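1. Mean subtraction

   $X' = X - \frac{1}{n}X\mathbf{1}\mathbf{1}^T$, where $X' \in \mathbb{R}^{d \times n}$

2. Compute the covariance matrix of the centered data (the factor $\frac{1}{n}$ is omitted because it does not change the eigenvectors)

   $A = X'X'^T$, where $A \in \mathbb{R}^{d \times d}$

3. Eigen-decomposition of $A$
4. Keep the eigenvectors of the $k$ largest eigenvalues

   $W = [u_1, u_2, ..., u_k]$, where $W \in \mathbb{R}^{d \times k}$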
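
The four steps can be sketched in NumPy as follows. The function name `pca` and the random test data are assumptions for illustration; the data matrix is taken to be $d \times n$ with samples as columns, matching the notation above.

```python
import numpy as np

def pca(X, k):
    """PCA on a d x n data matrix X (columns are samples).
    Returns W (d x k) and the projected data W^T X' (k x n)."""
    d, n = X.shape
    # 1. Mean subtraction: X' = X - (1/n) X 1 1^T
    ones = np.ones((n, 1))
    X_centered = X - (X @ ones @ ones.T) / n
    # 2. Covariance matrix (d x d); dividing by n does not change the eigenvectors
    A = X_centered @ X_centered.T / n
    # 3. Eigen-decomposition (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(A)
    # 4. Keep the eigenvectors of the k largest eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, order]
    return W, W.T @ X_centered

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))  # hypothetical data: d = 5, n = 200
W, Z = pca(X, k=2)
print(W.shape, Z.shape)  # (5, 2) (2, 200)
```

In practice a library routine (for example one based on the SVD) is usually preferred for numerical stability; the explicit version here only mirrors the steps listed above.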

### **Recommender Systems**

1. **Content-based filtering**: recommendations based on item description/features

2. **Collaborative filtering**: look at the ratings of like-minded users to provide recommendations
  
  Assumption: similar feedback implies similar taste between users



### Memory-based Collaborative Filtering

- user-based & item-based

### **User-CF**

Use the similarity-weighted ratings of neighbors (similar users) to predict whether the target user will like the product.

- Pearson correlation:

  $a,b$ = users

  $r_{a,p}$ = rating of user $a$ for item $p$

  $P$ = set of items rated by both $a$ and $b$

  $sim(a,b) = \frac{\sum_{p \in P}(r_{a,p}-\bar{r}_{a})(r_{b,p}-\bar{r}_b)}{\sqrt{\sum_{p \in P}(r_{a,p}-\bar{r}_{a})^2} \sqrt{\sum_{p \in P}(r_{b,p}-\bar{r}_b)^2}}$

  - Similarity values lie between -1 and 1

  Example:

  $r_a = [5, 4, 1, 2]$ ; $r_b = [5, 3, 1, 1]$

  $\bar{r}_a = \frac{5+4+1+2}{4} = 3$ ; $\bar{r}_b = \frac{5+3+1+1}{4} = 2.5$

  $r_a - \bar{r}_a = [2,1,-2,-1]$ ; $r_b - \bar{r}_b = [2.5, 0.5, -1.5, -1.5]$

  $sim(a,b) = \frac{5 + 0.5 + 3 + 1.5}{\sqrt{10}\sqrt{11}} = \frac{10}{\sqrt{110}} \approx 0.95$
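
The example above can be carried through numerically. A minimal sketch, assuming all four items are rated by both users so that the set $P$ is the whole vector; `pearson_sim` is an illustrative name.

```python
import numpy as np

def pearson_sim(r_a, r_b):
    """Pearson correlation between two users' ratings on their co-rated items."""
    da = r_a - r_a.mean()
    db = r_b - r_b.mean()
    return (da @ db) / (np.sqrt(da @ da) * np.sqrt(db @ db))

r_a = np.array([5, 4, 1, 2], dtype=float)
r_b = np.array([5, 3, 1, 1], dtype=float)

sim = pearson_sim(r_a, r_b)
print(round(sim, 3))  # 0.953
```

The two users rate the items in a very similar pattern, so the similarity is close to 1.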


  Predictions:

  - use a similarity threshold or a fixed number of neighbors to select the neighbor set $N$ (users similar to $a$ who rated $p$)

  $pred(a,p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a,b)(r_{b,p}-\bar{r}_b)}{\sum_{b \in N} sim(a,b)}$
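
The prediction formula can be sketched as below. The function `predict_user_cf` and all numbers are hypothetical; the sketch assumes the neighbors have already been selected and have positive similarity, so the denominator is positive.

```python
import numpy as np

def predict_user_cf(target_mean, sims, neighbor_ratings, neighbor_means):
    """pred(a, p) = mean(a) + sum_b sim(a, b) * (r_bp - mean(b)) / sum_b sim(a, b)."""
    sims = np.asarray(sims, dtype=float)
    deviations = (np.asarray(neighbor_ratings, dtype=float)
                  - np.asarray(neighbor_means, dtype=float))
    return target_mean + (sims @ deviations) / sims.sum()

# Hypothetical case: target user with mean rating 3.0 and two neighbors.
pred = predict_user_cf(
    target_mean=3.0,
    sims=[0.9, 0.6],              # similarity of each neighbor to the target user
    neighbor_ratings=[4.0, 2.0],  # each neighbor's rating of item p
    neighbor_means=[2.5, 3.0],    # each neighbor's average rating
)
print(pred)
```

The first neighbor rated the item 1.5 above their own average and the second 1.0 below; weighting by similarity gives a net offset of +0.5, so the prediction is about 3.5.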



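### **Item-CF**
- use the similarity between items (not users) to make predictions

  similarity (cosine similarity between the rating vectors $\vec{a}$ and $\vec{b}$ of two items):

  $sim(\vec{a},\vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}|\,|\vec{b}|}$

  prediction (sums run over the items $i$ that user $u$ has rated):

  $pred(u,p) = \frac{\sum_{i} sim(i,p)\, r_{u,i}}{\sum_{i} sim(i,p)}$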
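
A minimal sketch of item-based prediction, assuming a small made-up rating matrix from other users (rows are users, columns are items) and a target user who has rated items 0, 1 and 2. The names `cosine_sim` and `predict_item_cf` are illustrative.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two item rating vectors."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def predict_item_cf(R, user_ratings, p):
    """Weighted average of the user's own ratings, weighted by the
    similarity of each rated item i to the target item p."""
    sims = np.array([cosine_sim(R[:, i], R[:, p]) for i in user_ratings])
    ratings = np.array(list(user_ratings.values()))
    return (sims @ ratings) / sims.sum()

# Hypothetical ratings by three other users for four items.
R = np.array([
    [3.0, 1.0, 2.0, 3.0],
    [4.0, 3.0, 4.0, 5.0],
    [1.0, 5.0, 5.0, 2.0],
])
user_ratings = {0: 5.0, 1: 3.0, 2: 4.0}  # item index -> target user's rating

pred = predict_item_cf(R, user_ratings, p=3)
print(round(pred, 2))
```

Because all similarities here are positive, the prediction is a weighted average and stays between the user's lowest and highest rating.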

### Evaluation
1. MAE and RMSE (treat rating prediction as a regression task)

2. Precision and Recall

   Precision: a measure of exactness, the fraction of retrieved items that are relevant

   $precision = \frac{TP}{TP+FP} = \frac{\text{good recommendations}}{\text{all recommendations}}$

   Recall: a measure of completeness, the fraction of relevant items that are retrieved

   $recall = \frac{TP}{TP+FN} = \frac{\text{good recommendations}}{\text{all good items}}$
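
The metrics above can be sketched as follows. The function names and the small inputs are made up for illustration; MAE and RMSE use the standard definitions (mean absolute error and root mean squared error).

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def precision_recall(recommended, relevant):
    """Precision = TP / all recommendations, recall = TP / all relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)  # good recommendations
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ratings and recommendation lists.
print(mae([4, 3, 5, 1], [3.5, 3, 4, 2]))   # 0.625
print(rmse([4, 3, 5, 1], [3.5, 3, 4, 2]))  # 0.75
print(precision_recall(["A", "B", "C", "D"], ["A", "C", "E"]))
```

In the last call two of the four recommendations are relevant (precision 0.5), and two of the three relevant items were recommended (recall 2/3).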