# PCA with scikit-learn

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Code
```python
from sklearn.decomposition import PCA
model = PCA(n_components=2)     # parameters: see below
X_new = model.fit_transform(X)  # X: array of shape (n_samples, n_features)
```
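A minimal runnable version of the template above. It assumes a small random data set made up only for illustration (the seed and the shape `(50, 5)` are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 50 samples (rows) with 5 features (columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

# Keep the 2 directions of largest variance
model = PCA(n_components=2)
X_new = model.fit_transform(X)

print(X.shape)      # (50, 5)
print(X_new.shape)  # (50, 2)
```

Each row of `X_new` holds the new 2D coordinates of the corresponding row of `X`.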

[Official Reference](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)

## Parameters
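- `n_components`: target dimension, i.e., the number of principal components to keep (the default `None` keeps `min(n_samples, n_features)` of them)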

## Attributes
- `n_samples_`: number of samples, the height of `X`
- `n_features_in_`: number of features, the width of `X`
- `n_components_`: target dimension
- `components_`: array with `n_components_` rows, one principal component (a unit vector) per row, sorted by decreasing explained variance
- `mean_`: `X.mean(axis=0)`
- `explained_variance_`: variance of the data along each component (the importance of each component)
- `explained_variance_ratio_`: fraction of the total variance explained by each component
- `singular_values_`: singular values of the centered `X` (that is, `X - mean_`)  
(`singular_values_**2 / (n_samples_ - 1) == explained_variance_`)  
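The relations listed above can be checked numerically. The sketch below fits `PCA()` (keeping all components) to an arbitrary correlated random data set made up for illustration; each check prints `True`:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative correlated data: 100 samples, 3 features
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 0.2]])
X = rng.normal(size=(100, 3)) @ A

model = PCA()  # n_components=None keeps min(100, 3) = 3 components
model.fit(X)

# mean_ is the column mean of X
print(np.allclose(model.mean_, X.mean(axis=0)))  # True
# the rows of components_ are orthonormal
print(np.allclose(model.components_ @ model.components_.T, np.eye(3)))  # True
# explained_variance_ uses n_samples_ - 1 degrees of freedom
print(np.allclose(model.singular_values_**2 / (model.n_samples_ - 1),
                  model.explained_variance_))  # True
# with all components kept, the ratios sum to 1
print(np.isclose(model.explained_variance_ratio_.sum(), 1.0))  # True
```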

## Fake data

##### Exercise 1
Let  
```python
mu = np.array([3,4])
cov = np.array([[1.1,1],
                [1,1.1]])
X = np.random.multivariate_normal(mu, cov, 100)
```
Let `model` be a `PCA` object and let `X_new = model.fit_transform(X)` be the result of PCA on `X`.
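As a sanity check that may help interpret the plots: with the default `whiten=False`, `fit_transform` returns the coordinates of the centered points with respect to the rows of `components_`. The sketch below regenerates data as above and checks this; `X_manual` and `X_back` are names made up for the illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Same kind of fake data as in Exercise 1
mu = np.array([3, 4])
cov = np.array([[1.1, 1],
                [1, 1.1]])
X = np.random.multivariate_normal(mu, cov, 100)

model = PCA()  # keeps both components
X_new = model.fit_transform(X)

# Project the centered points onto the principal axes by hand
X_manual = (X - model.mean_) @ model.components_.T
print(np.allclose(X_new, X_manual))  # True

# Since all components are kept, the map can be undone exactly
X_back = X_new @ model.components_ + model.mean_
print(np.allclose(X_back, X))  # True
```

In other words, `X_new` is `X` shifted so that its mean sits at the origin and then rotated (possibly reflected) so that the principal axes become the coordinate axes.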

###### 1(a)
Plot the points (rows) of `X`.  
Plot the points (rows) of `X_new`.

In [None]:
### your answer here

###### 1(b)
On top of the previous figure, draw each row of `model.components_` as a vector with its tail at `model.mean_`.

In [None]:
### your answer here

###### 1(c)
Print `model.explained_variance_ratio_`.  
What percentage of the total variance does the first component explain?

In [None]:
### your answer here

##### Exercise 2
Let  
```python
X = np.genfromtxt('hidden_text.csv', delimiter=',')
```
All points of this data lie on a two-dimensional plane embedded in a much higher-dimensional space.  
Can you find out what the data says?
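A minimal sketch of the underlying idea on synthetic data rather than `hidden_text.csv`. The ellipse, the 50-dimensional embedding, the offset, and the seed are arbitrary choices for illustration only:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# 2D coordinates of 200 points on an ellipse (illustrative shape)
t = rng.uniform(0, 2 * np.pi, 200)
P = np.column_stack([np.cos(t), 2 * np.sin(t)])

# Place the plane inside 50 dimensions: an orthonormal 2-frame plus an offset
Q, _ = np.linalg.qr(rng.normal(size=(50, 2)))  # Q has shape (50, 2), orthonormal columns
offset = rng.normal(size=50)
X = P @ Q.T + offset                            # shape (200, 50)

model = PCA(n_components=2)
X_new = model.fit_transform(X)

# Two components carry (numerically) all of the variance
print(model.explained_variance_ratio_.sum())  # close to 1.0
```

Because the centered data span only a plane, `X_new` reproduces `P` up to a translation, rotation, or reflection, so a scatter plot of its two columns shows the ellipse again.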

In [None]:
### your answer here