
# Performing Principal Component Analysis (PCA) - Lab

## Introduction

Now that you have a high-level overview of PCA and some of the details of the algorithm itself, it's time to practice implementing PCA on your own using the NumPy package.

## Objectives

You will be able to:
    
* Implement PCA from scratch using NumPy

## Import the Data

To start, import the data stored in the file 'foodusa.csv'.

In [6]:
#Your code here
import pandas as pd

data = pd.read_csv('foodusa.csv')
# Drop the non-numeric 'City' label so that only the numeric feature columns remain.
data.drop('City', axis=1, inplace=True)
data.head()

## Normalize the Data

Next, normalize your data by subtracting each feature's mean from its column. This centers every feature at zero.

In [7]:
#Your code here
mean_centered = data - data.mean()
mean_centered.head()

| | Bread | Burger | Milk | Oranges | Tomatoes |
|---|---|---|---|---|---|
| 0 | -0.791304 | 2.643478 | 11.604348 | -22.891304 | -7.165217 |
| 1 | 1.208696 | -0.856522 | 5.204348 | -28.391304 | 4.534783 |
| 2 | 4.408696 | 8.943478 | -0.895652 | 1.008696 | 10.834783 |
| 3 | -2.491304 | -5.256522 | 3.004348 | 15.408696 | 2.434783 |
| 4 | 1.408696 | -5.156522 | 0.404348 | 2.908696 | 2.434783 |
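
A quick sanity check for this step is that every column of the centered data averages to zero. The sketch below shows the same operation on a plain NumPy array, where the column-wise mean has to be requested explicitly with `axis=0`. The values in `X` are made up for illustration and are not taken from `foodusa.csv`.

```python
import numpy as np

# Made-up values standing in for the real table (rows = observations, columns = features).
X = np.array([[24.5, 94.5, 73.9],
              [26.5, 91.0, 67.5],
              [29.7, 100.8, 61.4],
              [22.8, 86.6, 65.3]])

# axis=0 averages down each column, giving one mean per feature.
feature_means = X.mean(axis=0)
X_centered = X - feature_means

# After centering, every column should average to zero (up to rounding error).
print(np.allclose(X_centered.mean(axis=0), 0.0))
```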


## Calculate the Covariance Matrix

The next step for PCA is to calculate the covariance matrix for your mean-centered data. Do so here.

In [8]:
#Your code here
import numpy as np

# rowvar=False treats each column as a feature and each row as an observation.
cov = np.cov(mean_centered, rowvar=False)
cov

array([[  6.2844664 ,  12.91096838,   5.71905138,   1.31037549,
          7.28513834],
       [ 12.91096838,  57.07711462,  17.50752964,  22.69187747,
         36.29478261],
       [  5.71905138,  17.50752964,  48.30588933,  -0.27503953,
         13.44347826],
       [  1.31037549,  22.69187747,  -0.27503953, 202.75628458,
         38.76241107],
       [  7.28513834,  36.29478261,  13.44347826,  38.76241107,
         57.80055336]])
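
For mean-centered data `X_centered` with `n` rows, the sample covariance matrix is `X_centered.T @ X_centered / (n - 1)`. The following is a minimal sketch that checks this formula against `np.cov`, assuming synthetic random data in place of the lab's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))      # synthetic data: 30 observations, 4 features
X_centered = X - X.mean(axis=0)

# Sample covariance computed directly from its definition.
n = X_centered.shape[0]
manual_cov = X_centered.T @ X_centered / (n - 1)

# np.cov with rowvar=False treats columns as features, matching the layout above.
numpy_cov = np.cov(X_centered, rowvar=False)
print(np.allclose(manual_cov, numpy_cov))
```

The result is a square, symmetric matrix with one row and one column per feature, and the feature variances sit on its diagonal.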

## Calculate the Eigenvectors

Next, calculate the eigenvectors for your covariance matrix.

In [12]:
#Your code here
eigen_value, eigen_vector = np.linalg.eig(cov)
print('Eigen Value:',eigen_value)
print('\n Eigen Vector:',eigen_vector)

Eigen Value: [218.99867893  91.72316894   3.02922934  20.81054128  37.66268981]

 Eigen Vector: [[-0.02848905 -0.16532108 -0.96716354 -0.18972574  0.02135748]
 [-0.2001224  -0.63218494  0.24877074 -0.65862454  0.25420475]
 [-0.0416723  -0.44215032  0.03606094  0.10765906 -0.88874949]
 [-0.93885906  0.31435473 -0.01521357 -0.06904699 -0.12135003]
 [-0.27558389 -0.52791603 -0.03429221  0.71684022  0.36100184]]
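
One way to check an eigendecomposition is to confirm that each eigenvector `v` satisfies `cov @ v = value * v`. The sketch below does this for the covariance matrix printed earlier. It uses `np.linalg.eigh` rather than the `np.linalg.eig` call from the cell above: `eigh` is intended for symmetric matrices such as a covariance matrix and returns real eigenvalues in ascending order. Treat it as an alternative to compare against, not as the lab's required approach.

```python
import numpy as np

# Covariance matrix copied from the output earlier in this lab.
cov = np.array([
    [6.2844664, 12.91096838, 5.71905138, 1.31037549, 7.28513834],
    [12.91096838, 57.07711462, 17.50752964, 22.69187747, 36.29478261],
    [5.71905138, 17.50752964, 48.30588933, -0.27503953, 13.44347826],
    [1.31037549, 22.69187747, -0.27503953, 202.75628458, 38.76241107],
    [7.28513834, 36.29478261, 13.44347826, 38.76241107, 57.80055336],
])

# eigh assumes a symmetric input; eigenvalues come back sorted in ascending order.
values, vectors = np.linalg.eigh(cov)

# Eigenvectors are the COLUMNS of `vectors`; broadcasting scales column j by values[j].
residual = cov @ vectors - vectors * values
print(np.abs(residual).max())
```

Note that an eigenvector is only defined up to sign, so individual columns may be negated relative to the `eig` output without being wrong.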


## Sorting the Eigenvectors to Determine Primary Components

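Great! Now that you have the eigenvectors and their associated eigenvalues, sort the eigenvectors in descending order of their eigenvalues. Keep in mind that NumPy returns the eigenvectors as the columns of the matrix, so it is the columns that need reordering.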

In [15]:
#Your code here
e_indices = np.argsort(eigen_value)[::-1]
e_indices

array([0, 1, 4, 3, 2])

In [16]:
# Reorder the columns (one eigenvector per column) to match the sorted eigenvalues.
eigenvectors_sorted = eigen_vector[:, e_indices]
eigenvectors_sorted

array([[-0.02848905, -0.16532108,  0.02135748, -0.18972574, -0.96716354],
       [-0.2001224 , -0.63218494,  0.25420475, -0.65862454,  0.24877074],
       [-0.0416723 , -0.44215032, -0.88874949,  0.10765906,  0.03606094],
       [-0.93885906,  0.31435473, -0.12135003, -0.06904699, -0.01521357],
       [-0.27558389, -0.52791603,  0.36100184,  0.71684022, -0.03429221]])
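
Each eigenvalue is the variance captured along its eigenvector, so dividing by the sum of the eigenvalues gives the share of total variance each component explains. The sketch below applies this to the eigenvalues printed earlier; the names `explained_ratio` and `cumulative_ratio` are illustrative and not part of the lab.

```python
import numpy as np

# Eigenvalues copied from the output earlier in this lab.
eigen_value = np.array([218.99867893, 91.72316894, 3.02922934,
                        20.81054128, 37.66268981])

# Sort in descending order, then express each value as a share of the total variance.
sorted_values = np.sort(eigen_value)[::-1]
explained_ratio = sorted_values / sorted_values.sum()

# Running total: entry k-1 is the variance retained when keeping k components.
cumulative_ratio = np.cumsum(explained_ratio)
print(cumulative_ratio.round(3))
```

For these eigenvalues, the first two components together retain roughly 83% of the total variance, which is why reducing to 2 dimensions is a reasonable choice here.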

## Reprojecting the Data

Finally, reproject the mean-centered dataset down to 2 dimensions by multiplying it by the two eigenvectors with the largest eigenvalues.

In [18]:
#Your code here
# Eigenvectors are stored as columns, so select the first two columns (not rows),
# then project the mean-centered data onto them.
projected = mean_centered.dot(eigenvectors_sorted[:, :2])
projected.head()
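
Putting the steps together, reprojection multiplies the mean-centered data by the leading columns of the sorted eigenvector matrix. Because eigenvectors are stored as columns, the slice is `[:, :k]` rather than `[:k]`. The following is a minimal end-to-end sketch on synthetic data. The helper name `pca_project` is hypothetical, and `np.linalg.eigh` is used here as an assumption since the covariance matrix is symmetric.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto its top principal components (illustrative helper)."""
    X_centered = X - X.mean(axis=0)              # 1. mean-center each feature
    cov = np.cov(X_centered, rowvar=False)       # 2. covariance matrix of the features
    values, vectors = np.linalg.eigh(cov)        # 3. eigendecomposition (symmetric input)
    order = np.argsort(values)[::-1]             # 4. indices of eigenvalues, largest first
    components = vectors[:, order[:n_components]]  # keep the leading eigenvector columns
    return X_centered @ components               # 5. reproject: (n, d) @ (d, k) -> (n, k)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))      # synthetic stand-in for a 5-feature dataset
projected = pca_project(X, 2)
print(projected.shape)
```

The projected data has one row per observation and one column per retained component. The variance of each projected column equals the corresponding eigenvalue, and the columns are uncorrelated with each other.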

## Summary

Well done! You've now coded PCA on your own using NumPy! With that, it's time to look at further applications of PCA.