### Week 8 - 3/21/17 ###

## Principal Component Analysis (PCA) / Empirical Orthogonal Functions (EOFs) ##

Physical oceanographers usually call this technique EOF analysis, while other disciplines usually call it PCA

Simplest type of PCA - Two variables
   
Represents the same data with new axes: the __principal axes__

__Example - Wind vectors__
- Define natural coordinates: alongshore
<img src="images/PCA_1.png" width="500">

Creates a new coordinate frame that maximizes variability along one axis (PC1), with a corresponding orthogonal axis (PC2)
    - where PC1 and PC2 are uncorrelated
    
<img src="images/PCA_2.png" width="500">

__Example from Emery and Thompson - Current data__
<img src="images/PCA_3.png" width="500">

__Principal component analysis: Steps__
1. Create data matrix (size: N x M, N rows and M columns)
   - standardize if necessary (if data is a combination of different variables)
2. Form the covariance matrix (M x M)
3. Extract eigenvalues and eigenvectors from covariance matrix
4. The eigenvectors are the principal components; the eigenvalues give the variance along each one
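
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference implementation: the function name `pca`, the `standardize` flag, and the synthetic data are assumptions made for the example. `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca(X, standardize=False):
    """Return eigenvalues (largest first) and eigenvectors (as columns)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # step 1: remove each column's mean
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)     # unit variance, so C becomes R
    C = np.cov(Xc, rowvar=False)             # step 2: M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # step 3: eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so PC1 has the largest variance
    return eigvals[order], eigvecs[:, order] # step 4

# Synthetic data matrix (N = 300 rows, M = 2 columns), for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
eigvals, eigvecs = pca(X)
```

The eigenvalues sum to the total variance of the data, so each one divided by the sum gives the fraction of variance along that axis.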

Creating a data matrix, __X__

N rows, M columns

$\begin{bmatrix} u & v 
                \\ u_1 & v_1 
                \\ u_2 & v_2
                \\ u_3 & v_3
                \\ \vdots & \vdots
                \\ u_{300} & v_{300} \end{bmatrix}$
                
Create covariance matrix, C 
or correlation matrix, R

C  -  M rows and M columns - (M x M)
    - Describes the covariance between all of the different variables
$        \begin{bmatrix}
        c_{11} & c_{12} & c_{13} \\
        c_{21} & c_{22} & c_{23} \\
        c_{31} & c_{32} & c_{33}
        \end{bmatrix}$
   
   - Correlation Matrix
   
   $\begin{bmatrix}
        1 & r_{12} & r_{13} \\
        r_{21} & 1 & r_{23} \\
        r_{31} & r_{32} & 1
        \end{bmatrix}$
   
   
$c_{uu}$  variance of u <br>

$c_{vv}$  variance of v<br>

$c_{uv}$  covariance of u and v <br>

$r_{uv} = \frac{c_{uv}}{S_u S_v}$  correlation of u and v, where $S_u$ and $S_v$ are the standard deviations of u and v


Example Matrix

$\begin{bmatrix}
c_{uu} & c_{uv} \\
c_{vu} & c_{vv} \\
\end{bmatrix}$
        
$ \begin{bmatrix}
441 & -302.4\\
-302.4 & 324
\end{bmatrix}$
where units are in $\frac{cm^2}{s^2}$
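
As a quick check of the definitions above, the correlation matrix can be recovered from the example covariance matrix. This is a small sketch; the variable names `S` and `R` are chosen here for illustration.

```python
import numpy as np

# Example covariance matrix from the notes, in cm^2/s^2
C = np.array([[441.0, -302.4],
              [-302.4, 324.0]])

S = np.sqrt(np.diag(C))     # standard deviations: S_u = 21 cm/s, S_v = 18 cm/s
R = C / np.outer(S, S)      # r_ij = c_ij / (S_i * S_j)

print(S)
print(R)                    # off-diagonal r_uv is about -0.8
```

The diagonal of `R` is 1 by construction, and $r_{uv} = -302.4 / (21 \times 18) \approx -0.8$, so u and v are strongly anticorrelated in this example.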


__Extracting the eigenvalues and eigenvectors from the covariance matrix__


$CV =  V \Lambda $ <br>
V = matrix of eigenvectors (M x M)<br>
$        \begin{bmatrix}
v_{11} & v_{12} & v_{13} & \cdots \\
v_{21} & v_{22} & v_{23} & \cdots \\
v_{31} & v_{32} & v_{33} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}$

Each column is one eigenvector



$\Lambda$ = diagonal matrix of eigenvalues (M x M)

$        \begin{bmatrix}
\lambda_{1} & 0 & 0 & \cdots \\
0 & \lambda_{2} & 0 & \cdots \\
0 & 0 & \lambda_{3} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}$

where $\lambda_1$ is eigenvalue #1, paired with the first column of V


### Back to the current meter example ###

Can use:
```python
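import numpy as np
from scipy import linalg
C = np.array([[441.0, -302.4], [-302.4, 324.0]])  # covariance matrix, cm^2/s^2
eigenvalues, V = linalg.eig(C)  # each column of V is one eigenvector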
```
<br>

 V = $ \begin{bmatrix}
0.771 & 0.636\\
-0.636 & 0.771
\end{bmatrix}$<br>
Columns: PC1 (left), PC2 (right)


$\Lambda$ = $ \begin{bmatrix}
690.5 & 0\\
0 & 74.5
\end{bmatrix}$


$\lambda _1 $ = 690.5 $\frac{cm^2}{s^2}$ variance along axis 1 <br>
$\lambda _2 $ = 74.5 $\frac{cm^2}{s^2}$ variance along axis 2  <br>
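
The numbers above can be reproduced with the sketch below. It assumes NumPy's `np.linalg.eigh` (suited to symmetric matrices) rather than any particular routine used in class. Two practical details are handled explicitly: `eigh` returns eigenvalues in ascending order, and the sign of each eigenvector is arbitrary, so a sign convention (first component positive) is imposed here purely for display.

```python
import numpy as np

C = np.array([[441.0, -302.4],
              [-302.4, 324.0]])          # covariance matrix, cm^2/s^2

eigvals, V = np.linalg.eigh(C)           # ascending eigenvalues, eigenvectors as columns
order = np.argsort(eigvals)[::-1]        # put the largest variance first
eigvals, V = eigvals[order], V[:, order]
V = V * np.sign(V[0, :])                 # arbitrary sign convention for each column

Lambda = np.diag(eigvals)                # diagonal eigenvalue matrix
fraction = eigvals / eigvals.sum()       # share of total variance along each axis

print(np.round(V, 3))
print(np.round(eigvals, 1))
print(np.round(fraction, 3))
```

With these values PC1 carries roughly 90% of the total variance (690.5 out of 765 $\frac{cm^2}{s^2}$).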

Time variability of the principal components

$\tau _i = X V_i$, where $V_i$ is the i-th column of V

$\tau _1 $ = u(0.771) + v(-0.636) <br>
$\tau _2 $ = u(0.636) + v(0.771)
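
A short sketch of the projection step follows. The (u, v) series here is synthetic, drawn from a Gaussian with the example covariance matrix, because the original current meter record is not part of these notes. The point is that the projected series are uncorrelated and their variances equal the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(42)
C_true = np.array([[441.0, -302.4],
                   [-302.4, 324.0]])
# Synthetic stand-in for 300 (u, v) observations; not real current data
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C_true, size=300)

Xc = X - X.mean(axis=0)                  # remove the mean of u and v
C = np.cov(Xc, rowvar=False)             # sample covariance matrix
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

T = Xc @ V                               # column i of T is the time series tau_i
C_scores = np.cov(T, rowvar=False)       # covariance of the projected series
print(np.round(C_scores, 2))             # diagonal, with the eigenvalues on the diagonal
```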

3-D visualization of principal components
<img src="images/PCA_4.png" width="700">
[source](http://ordination.okstate.edu/PCA.htm)

http://setosa.io/ev/principal-component-analysis/


__Pacific Decadal Oscillation example__

<img src="images/PCA_PDO.png" width="600">
[source](http://research.jisao.washington.edu/pdo/)

The PDO index is the first principal component of North Pacific SST anomalies

Binned SST data from 1947-1974 at 82 different sites across the Pacific Ocean

<img src="images/PCA_PDO_Bins.png" width="600">

<img src="images/PCA_PDO_STD.png" width="600">
Standard deviation of temperature anomaly [deg C] 

Temperature anomaly = temperature – seasonal monthly mean temperature
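
The anomaly calculation can be sketched as below. The SST values are made up (a hypothetical seasonal cycle plus noise); only the array shapes, 28 years of monthly data at 82 sites, follow the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_sites = 28, 82
months = np.tile(np.arange(12), n_years)            # calendar month (0-11) of each row
seasonal = 3.0 * np.cos(2 * np.pi * months / 12)    # hypothetical seasonal cycle, deg C
sst = 15.0 + seasonal[:, None] + rng.normal(scale=0.5, size=(n_years * 12, n_sites))

# Seasonal monthly mean: average each calendar month over all years, per site
climatology = sst.reshape(n_years, 12, n_sites).mean(axis=0)   # shape (12, 82)

# Subtract the matching calendar-month mean from every row
anomaly = sst - climatology[months]                 # shape (336, 82)
```

After this step, the mean of each calendar month at each site is zero, so the seasonal cycle no longer dominates the variance.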



Create a data matrix:
    - (336 x 82)    -> 28 years * 12 months = 336 rows, 82 sites = 82 columns
    
Create a covariance matrix (what size?) 
    - (82 x 82)
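
The matrix sizes can be checked with placeholder data. In this sketch the anomalies are random numbers, not SST, and the names `eof1` and `pc1` are illustrative. The first eigenvector is a spatial pattern (one value per site) and its projection is a time series (one value per month).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(336, 82))           # placeholder anomalies: 336 months x 82 sites
Xc = X - X.mean(axis=0)

C = np.cov(Xc, rowvar=False)             # covariance between sites
print(C.shape)                           # (82, 82)

eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

eof1 = V[:, 0]                           # first EOF: spatial pattern, length 82
pc1 = Xc @ eof1                          # first PC: time series, length 336
```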
    
First empirical orthogonal function  of North Pacific SST
<img src="images/PCA_PDO_First.png" width="600">


Other Components of North Pacific SST
<img src="images/PCA_PDO_All.png" width="600">



Fraction of variance accounted for by the first M empirical orthogonal functions
<img src="images/PCA_PDO_Var.png" width="600">
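
The cumulative fraction of variance follows directly from the eigenvalues. A minimal sketch, using the two eigenvalues from the current meter example because the PDO eigenvalues are not listed in these notes; the function name is hypothetical.

```python
import numpy as np

def cumulative_variance_fraction(eigvals):
    """Fraction of total variance explained by the first 1, 2, ... components."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest first
    return np.cumsum(eigvals) / eigvals.sum()

# Eigenvalues from the current meter example (cm^2/s^2)
frac = cumulative_variance_fraction([690.5, 74.5])
print(np.round(frac, 3))
```

The curve always ends at 1, and how quickly it rises shows how many components are needed to capture most of the variance.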


<img src="images/PCA_PDO_Index.png" width="600">


__PCA assumptions__
- Relationships between variables are linear
- Mean and variance are sufficient statistics (variables have Gaussian/normal distribution)
- Large variances have important dynamics (high signal-to-noise ratio)
- The principal components are orthogonal

__PCA Issues__

- Does not take into account phase lags between correlated variables, can detect "standing" features but not propagating waves

- Domain dependence: can get a different answer by changing the spatial extent of the data
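
The first issue (propagating waves) can be illustrated with an idealized signal. This sketch assumes a single noise-free wave $\cos(x - t)$ sampled evenly over one full period in space and time. Because $\cos(x - t) = \cos x \cos t + \sin x \sin t$, PCA splits the wave into two standing patterns with equal variance instead of one propagating mode.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 50, endpoint=False)    # positions
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)   # times
X = np.cos(x[None, :] - t[:, None])                  # rows: time, columns: position

Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]   # largest first
frac = eigvals / eigvals.sum()
print(np.round(frac[:3], 3))     # two equal leading fractions, the rest near zero
```

A pair of eigenvalues of nearly equal size, with patterns a quarter wavelength apart, is a common hint that the underlying signal is propagating rather than standing.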

Shlens, J., A tutorial on principal component analysis: Derivation, discussion and singular value decomposition.  https://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf (on Moodle)

Example:
<img src="images/PCA_Issue.png" width="600">
If no dominant pattern exists, PCA will still pick out patterns, but they may not be robust

__Nutrient Profile PCA example__
<img src="images/PCA_Nutrients.png" width="600">

<img src="images/PCA_Factor.png" width="600">

Factor Loadings

$A = V \sqrt{\Lambda}$
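
A minimal sketch of the factor loading formula follows. It uses the covariance matrix from the current meter example as a stand-in, since the nutrient data are not included in these notes. Scaling each eigenvector by the square root of its eigenvalue means the loadings reproduce the covariance matrix: $A A^T = V \Lambda V^T = C$.

```python
import numpy as np

C = np.array([[441.0, -302.4],
              [-302.4, 324.0]])          # stand-in covariance matrix, cm^2/s^2

eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

A = V @ np.diag(np.sqrt(eigvals))        # factor loadings: A = V sqrt(Lambda)
print(np.round(A, 2))
print(np.round(A @ A.T, 1))              # recovers C
```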