Some of the code in this lab was copied from the solution at https://github.com/learn-co-curriculum/dsc-pca-numpy-lab/tree/solution

# Performing Principal Component Analysis (PCA) - Lab

## Introduction

Now that you have a high-level overview of PCA as well as some of the details of the algorithm itself, it's time to practice implementing PCA on your own using the NumPy package.

## Objectives

You will be able to:
    
* Implement PCA from scratch using NumPy

## Import the data

- Import the data stored in the file `'foodusa.csv'` (set `index_col=0`)
- Print the first five rows of the DataFrame 

```python
import pandas as pd

# Load the data, using the first column (City) as the row index
data = pd.read_csv('foodusa.csv', index_col=0)
data.head()
```


| City | Bread | Burger | Milk | Oranges | Tomatoes |
|---|---|---|---|---|---|
| ATLANTA | 24.5 | 94.5 | 73.9 | 80.1 | 41.6 |
| BALTIMORE | 26.5 | 91.0 | 67.5 | 74.6 | 53.3 |
| BOSTON | 29.7 | 100.8 | 61.4 | 104.0 | 59.6 |
| BUFFALO | 22.8 | 86.6 | 65.3 | 118.4 | 51.2 |
| CHICAGO | 26.7 | 86.7 | 62.7 | 105.9 | 51.2 |


## Normalize the data

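Next, normalize your data by subtracting each column's mean from that column, so that every column is centered at zero. (Strictly speaking, this is mean-centering; full standardization would also divide each column by its standard deviation.)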

```python
# Subtract each column's mean; pandas aligns the means with the matching columns
data = data - data.mean()
data.head()
```

| City | Bread | Burger | Milk | Oranges | Tomatoes |
|---|---|---|---|---|---|
| ATLANTA | -0.791304 | 2.643478 | 11.604348 | -22.891304 | -7.165217 |
| BALTIMORE | 1.208696 | -0.856522 | 5.204348 | -28.391304 | 4.534783 |
| BOSTON | 4.408696 | 8.943478 | -0.895652 | 1.008696 | 10.834783 |
| BUFFALO | -2.491304 | -5.256522 | 3.004348 | 15.408696 | 2.434783 |
| CHICAGO | 1.408696 | -5.156522 | 0.404348 | 2.908696 | 2.434783 |
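As a quick illustration of what mean-centering does, here is a minimal NumPy sketch of the same operation. The matrix `X` holds the Bread, Burger and Milk prices of the first four cities from the raw data; because only four rows are used, the centered values differ from the ones shown above. The point is only that every column mean becomes (numerically) zero.

```python
import numpy as np

# Small subset of the raw data: 4 cities (rows) x 3 food items (columns)
X = np.array([
    [24.5, 94.5, 73.9],
    [26.5, 91.0, 67.5],
    [29.7, 100.8, 61.4],
    [22.8, 86.6, 65.3],
])

# Mean-center each column (the NumPy analogue of data - data.mean())
col_means = X.mean(axis=0)   # one mean per column
X_centered = X - col_means   # broadcasting subtracts each column's own mean

# Column means of the centered matrix are zero up to floating-point error
print(X_centered.mean(axis=0))
```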


## Calculate the covariance matrix

The next step is to calculate the covariance matrix for your normalized data. 

```python
import numpy as np

# np.cov treats each row as a variable by default; rowvar=False makes each
# column (food item) a variable, giving a 5 x 5 covariance matrix
cov_mat = np.cov(data, rowvar=False)
cov_mat
```
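For a mean-centered matrix $X$ with $n$ rows, the sample covariance matrix is $C = X^\top X / (n - 1)$. The sketch below checks that formula against `np.cov(..., rowvar=False)`; it uses random data with 23 rows and 5 columns purely for illustration (the random values are an assumption, not the food data).

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 23 observations (rows) of 5 features (columns)
X = rng.normal(size=(23, 5))
X_centered = X - X.mean(axis=0)

n = X_centered.shape[0]
# Sample covariance by hand: features are columns, so use X^T X / (n - 1)
cov_manual = X_centered.T @ X_centered / (n - 1)

# np.cov with rowvar=False treats columns as variables and divides by n - 1
cov_numpy = np.cov(X_centered, rowvar=False)

print(cov_manual.shape)                    # (5, 5)
print(np.allclose(cov_manual, cov_numpy))  # True
```

Leaving out `rowvar=False` would instead treat each of the 23 rows as a variable and return a 23 x 23 matrix.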

## Calculate the eigenvectors

Next, calculate the eigenvalues and eigenvectors of your covariance matrix.

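```python
import numpy as np

# Eigendecomposition of the covariance matrix:
# eig_values[i] is the eigenvalue of the eigenvector stored in column i of eig_vectors
eig_values, eig_vectors = np.linalg.eig(cov_mat)
```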
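Each column `v` of `eig_vectors`, paired with the matching entry of `eig_values`, should satisfy $Cv = \lambda v$. Because a covariance matrix is symmetric, `np.linalg.eigh` is a common alternative: it returns real eigenvalues in ascending order with orthonormal eigenvectors. The sketch below checks both on a small made-up symmetric matrix `C` standing in for a covariance matrix (its values are illustrative only).

```python
import numpy as np

# A small symmetric matrix standing in for a covariance matrix (illustrative values)
C = np.array([
    [4.0, 2.0, 0.6],
    [2.0, 3.0, 0.4],
    [0.6, 0.4, 1.0],
])

eig_values, eig_vectors = np.linalg.eig(C)

# Column i of eig_vectors is the eigenvector for eig_values[i]: C v = lambda v
for i in range(len(eig_values)):
    v = eig_vectors[:, i]
    assert np.allclose(C @ v, eig_values[i] * v)

# eigh is specialised for symmetric matrices: real eigenvalues, ascending order
w, V = np.linalg.eigh(C)
print(np.allclose(np.sort(eig_values.real), w))  # True
```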

## Sort the eigenvectors 

Great! Now that you have the eigenvectors and their associated eigenvalues, sort the eigenvectors by their eigenvalues, largest first, to determine the principal components!

```python
# Get the index values of the eigenvalues, sorted from largest to smallest
e_indices = np.argsort(eig_values)[::-1]

# Sort the eigenvector columns in the same order
eigenvectors_sorted = eig_vectors[:, e_indices]
eigenvectors_sorted
```
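`np.argsort` sorts in ascending order, so reversing it with `[::-1]` puts the largest eigenvalue first. Since the eigenvectors are stored as columns, the sorted indices are applied to the second axis (`eig_vectors[:, e_indices]`), not to the rows. A minimal sketch with a made-up diagonal matrix, whose eigenvalues are simply its diagonal entries:

```python
import numpy as np

# Diagonal matrix: eigenvalues are the diagonal entries, eigenvectors the unit vectors
C = np.diag([2.0, 5.0, 1.0])
eig_values, eig_vectors = np.linalg.eig(C)

# Indices that order the eigenvalues from largest to smallest
e_indices = np.argsort(eig_values)[::-1]

# Reorder the eigenvalues and the matching eigenvector *columns*
eig_values_sorted = eig_values[e_indices]
eigenvectors_sorted = eig_vectors[:, e_indices]

print(eig_values_sorted)  # [5. 2. 1.]
```

The first column of `eigenvectors_sorted` is now the unit vector along the second axis, which belongs to the largest eigenvalue, 5.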

## Reprojecting the data

Finally, reproject the dataset using your eigenvectors. Reproject this dataset down to 2 dimensions.

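```python
# this cell of code was adapted from the solution at https://github.com/learn-co-curriculum/dsc-pca-numpy-lab/tree/solution
# Keep the first two eigenvectors: they are columns, so slice the second axis
top_two = eigenvectors_sorted[:, :2]

# Project the mean-centered data onto them: (rows x features) @ (features x 2) -> (rows x 2)
transformed = data.values @ top_two
transformed
```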
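Putting the steps together, the sketch below wraps them in a hypothetical helper `pca_project` (not part of the lab) and runs it on random, illustrative data. Unlike the lab, it uses `np.linalg.eigh`, which is designed for symmetric matrices. As a sanity check, the covariance of the projected data should be diagonal, with the two largest eigenvalues on the diagonal.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Hypothetical helper: project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)                    # 1. mean-center each column
    cov_mat = np.cov(X_centered, rowvar=False)         # 2. (p x p) feature covariance
    eig_values, eig_vectors = np.linalg.eigh(cov_mat)  # 3. symmetric eigendecomposition
    e_indices = np.argsort(eig_values)[::-1]           # 4. largest eigenvalue first
    keep = e_indices[:n_components]
    top = eig_vectors[:, keep]                         # (p x k) top eigenvector columns
    return X_centered @ top, eig_values[keep]          # 5. (n x k) projected data


rng = np.random.default_rng(42)
# Illustrative data: 23 rows, 5 correlated columns
X = rng.normal(size=(23, 5)) @ rng.normal(size=(5, 5))

projected, top_values = pca_project(X, n_components=2)
print(projected.shape)  # (23, 2)

# The projected data's covariance is diagonal and holds the top eigenvalues
print(np.allclose(np.cov(projected, rowvar=False), np.diag(top_values)))  # True
```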

## Summary

Well done! You've now coded PCA on your own using NumPy! With that, it's time to look at further applications of PCA.