# Performing Principal Component Analysis (PCA) - Lab

## Introduction

Now that you have a high-level overview of PCA, as well as some of the details of the algorithm itself, it's time to practice implementing PCA on your own using the NumPy package. 

## Objectives

You will be able to:
    
* Implement PCA from scratch using NumPy

## Import the data

- Import the data stored in the file `'foodusa.csv'` (set `index_col=0`)
- Print the first five rows of the DataFrame 

In [1]:
import pandas as pd
data = pd.read_csv('foodusa.csv', index_col=0)
data.head()



| City | Bread | Burger | Milk | Oranges | Tomatoes |
|---|---|---|---|---|---|
| ATLANTA | 24.5 | 94.5 | 73.9 | 80.1 | 41.6 |
| BALTIMORE | 26.5 | 91.0 | 67.5 | 74.6 | 53.3 |
| BOSTON | 29.7 | 100.8 | 61.4 | 104.0 | 59.6 |
| BUFFALO | 22.8 | 86.6 | 65.3 | 118.4 | 51.2 |
| CHICAGO | 26.7 | 86.7 | 62.7 | 105.9 | 51.2 |


## Normalize the data

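Next, normalize your data by centering it: subtract each column's mean from that column. This step only shifts every column to zero mean; it does not rescale the columns to unit variance.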

In [2]:
data = data - data.mean()
data.head()

| City | Bread | Burger | Milk | Oranges | Tomatoes |
|---|---|---|---|---|---|
| ATLANTA | -0.791304 | 2.643478 | 11.604348 | -22.891304 | -7.165217 |
| BALTIMORE | 1.208696 | -0.856522 | 5.204348 | -28.391304 | 4.534783 |
| BOSTON | 4.408696 | 8.943478 | -0.895652 | 1.008696 | 10.834783 |
| BUFFALO | -2.491304 | -5.256522 | 3.004348 | 15.408696 | 2.434783 |
| CHICAGO | 1.408696 | -5.156522 | 0.404348 | 2.908696 | 2.434783 |
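As a quick sanity check on the centering step, the sketch below applies the same subtraction with NumPy to only the five rows printed earlier. Because it uses five rows rather than the full file, its centered values will differ from the table above; the variable names (`sample`, `centered`) are illustrative and not part of the lab.

```python
import numpy as np

# First five rows of the price table printed earlier (illustration only;
# the lab itself centers every row of foodusa.csv).
sample = np.array([
    [24.5, 94.5, 73.9, 80.1, 41.6],    # ATLANTA
    [26.5, 91.0, 67.5, 74.6, 53.3],    # BALTIMORE
    [29.7, 100.8, 61.4, 104.0, 59.6],  # BOSTON
    [22.8, 86.6, 65.3, 118.4, 51.2],   # BUFFALO
    [26.7, 86.7, 62.7, 105.9, 51.2],   # CHICAGO
])

# Subtract each column's mean from that column (broadcast across rows).
centered = sample - sample.mean(axis=0)

# After centering, every column mean is zero up to floating-point error.
means_are_zero = np.allclose(centered.mean(axis=0), 0.0)

# Centering shifts the columns but leaves their spread unchanged.
spread_unchanged = np.allclose(
    centered.std(axis=0, ddof=1), sample.std(axis=0, ddof=1)
)
```

The same check on the lab's DataFrame would be `data.mean()` after centering, which should print values very close to zero for every column.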


## Calculate the covariance matrix

The next step is to calculate the covariance matrix of your mean-centered data.

In [25]:
cov_mat = data.cov()
cov_mat

|  | Bread | Burger | Milk | Oranges | Tomatoes |
|---|---|---|---|---|---|
| Bread | 6.284466 | 12.910968 | 5.719051 | 1.310375 | 7.285138 |
| Burger | 12.910968 | 57.077115 | 17.50753 | 22.691877 | 36.294783 |
| Milk | 5.719051 | 17.50753 | 48.305889 | -0.27504 | 13.443478 |
| Oranges | 1.310375 | 22.691877 | -0.27504 | 202.756285 | 38.762411 |
| Tomatoes | 7.285138 | 36.294783 | 13.443478 | 38.762411 | 57.800553 |
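To see what the covariance step computes, the following sketch builds a sample covariance matrix by hand and compares it with `np.cov`. The random array `X` is a hypothetical stand-in for the price table, so the numbers are not the lab's; only the formula is the point.

```python
import numpy as np

# Hypothetical stand-in for the data: rows are observations, columns are features.
rng = np.random.default_rng(0)
X = rng.normal(size=(23, 5))

# Center the columns, as in the previous step.
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Sample covariance: (Xc^T Xc) / (n - 1), one row and column per feature.
cov_manual = Xc.T @ Xc / (n - 1)

# np.cov treats rows as variables by default, so pass rowvar=False
# to treat columns as variables instead.
cov_numpy = np.cov(X, rowvar=False)

matches = np.allclose(cov_manual, cov_numpy)
is_symmetric = np.allclose(cov_manual, cov_manual.T)
```

Both `np.cov` and pandas' `DataFrame.cov()` divide by `n - 1` by default, which is why the hand-built matrix agrees with the library result.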


## Calculate the eigenvectors

Next, calculate the eigenvectors and eigenvalues for your covariance matrix. 

In [10]:
import numpy as np
eig_values, eig_vectors = np.linalg.eig(cov_mat)
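A covariance matrix is symmetric, so `np.linalg.eigh` is a possible alternative to `np.linalg.eig`: it always returns real eigenvalues, in ascending order. The sketch below is an illustration on a hypothetical random dataset, not the lab's solution; it compares the two functions and checks the defining property of an eigenvector.

```python
import numpy as np

# Hypothetical stand-in data; its covariance matrix is symmetric by construction.
rng = np.random.default_rng(1)
X = rng.normal(size=(23, 5))
cov = np.cov(X, rowvar=False)

# General-purpose solver: eigenvalues come back in no guaranteed order.
vals_eig, vecs_eig = np.linalg.eig(cov)

# Symmetric solver: real eigenvalues in ascending order,
# orthonormal eigenvectors stored as columns.
vals_eigh, vecs_eigh = np.linalg.eigh(cov)

# Both solvers find the same eigenvalues (np.real guards against a complex dtype).
same_eigenvalues = np.allclose(np.sort(np.real(vals_eig)), vals_eigh)

# Each column v of vecs_eigh satisfies cov @ v == eigenvalue * v.
satisfies_definition = np.allclose(cov @ vecs_eigh, vecs_eigh * vals_eigh)

# The eigenvectors of a symmetric matrix are mutually orthonormal.
is_orthonormal = np.allclose(vecs_eigh.T @ vecs_eigh, np.eye(5))
```

In both functions the eigenvectors are the columns of the returned matrix, so the i-th eigenvalue pairs with `vecs[:, i]`, not `vecs[i]`.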

## Sort the eigenvectors 

Great! Now that you have the eigenvectors and their associated eigenvalues, sort the eigenvectors by eigenvalue in descending order to identify the principal components. The eigenvector with the largest eigenvalue is the first principal component.

In [11]:
# Get the indices that order the eigenvalues from largest to smallest
e_indices = np.argsort(eig_values)[::-1]

# Eigenvectors are stored as columns, so reorder the columns to match
eigenvectors_sorted = eig_vectors[:, e_indices]
eigenvectors_sorted

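Sorting matters because each eigenvalue measures how much variance its eigenvector captures. As a rough sketch, the code below takes the covariance matrix printed earlier in this lab (copied at its displayed precision), sorts the eigenpairs, and derives the share of total variance per component. The names `explained_ratio` and `cumulative` are illustrative, and `np.linalg.eigh` is used here as an assumption-free choice for a symmetric matrix.

```python
import numpy as np

# Covariance matrix as printed earlier in the lab (displayed precision).
cov = np.array([
    [6.284466, 12.910968, 5.719051, 1.310375, 7.285138],
    [12.910968, 57.077115, 17.50753, 22.691877, 36.294783],
    [5.719051, 17.50753, 48.305889, -0.27504, 13.443478],
    [1.310375, 22.691877, -0.27504, 202.756285, 38.762411],
    [7.285138, 36.294783, 13.443478, 38.762411, 57.800553],
])

eig_values, eig_vectors = np.linalg.eigh(cov)

# Order the eigenpairs from largest to smallest eigenvalue.
order = np.argsort(eig_values)[::-1]
vals_sorted = eig_values[order]
vecs_sorted = eig_vectors[:, order]  # reorder columns, not rows

# Share of the total variance captured by each principal component.
explained_ratio = vals_sorted / vals_sorted.sum()

# Running total: how much variance the first k components keep.
cumulative = np.cumsum(explained_ratio)
```

Printing `cumulative[1]` gives the fraction of variance retained when the data is reduced to two dimensions.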

## Reprojecting the data

Finally, reproject the dataset down to 2 dimensions by projecting the centered data onto the two eigenvectors with the largest eigenvalues.

In [26]:
# Keep the eigenvectors with the two largest eigenvalues (the first two columns)
top_two = eigenvectors_sorted[:, :2]

# Project the centered data onto those two directions:
# the result has one row per city and one column per principal component
transformed = data.dot(top_two)
transformed.head()
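One way to check a projection like this is to look at the covariance of the projected data: it should be diagonal, with the two largest eigenvalues on the diagonal. The sketch below runs the whole pipeline on a hypothetical random dataset with the same shape conventions (rows are observations, columns are features); it is an illustration under those assumptions, not the lab's reference solution.

```python
import numpy as np

# Hypothetical stand-in data with correlated columns.
rng = np.random.default_rng(42)
X = rng.normal(size=(23, 5)) @ rng.normal(size=(5, 5))

# 1. Center the columns.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix of the features (5 x 5).
cov = np.cov(Xc, rowvar=False)

# 3. Eigendecomposition, sorted by descending eigenvalue.
vals, vecs = np.linalg.eigh(cov)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# 4. Project onto the top two eigenvectors: (23 x 5) @ (5 x 2) -> (23 x 2).
W = vecs[:, :2]
projected = Xc @ W

# The projected columns are uncorrelated, and their variances are
# the two largest eigenvalues of the original covariance matrix.
proj_cov = np.cov(projected, rowvar=False)
is_diagonalized = np.allclose(proj_cov, np.diag(vals[:2]))
```

The shapes are the key design point: the data matrix goes on the left and the matrix of selected eigenvector columns on the right, so each observation becomes a row of two component scores.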

## Summary

Well done! You've now coded PCA on your own using NumPy! With that, it's time to look at further applications of PCA.