# Lecture 17 LIVE

Topics for Today
- Recap of Pandas DataFrames
- PCA on the Wine Data Set

![wine-glasses](wine-glasses.jpg)

In [1]:
import numpy as np
import pandas as pd

wine_names = ['Class', 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium', 'Total phenols', 'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins', 'Color intensity', 'Hue', 'OD280/OD315', 'Proline']
dataset_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data'
wine_data = pd.read_csv(dataset_url, names=wine_names)

## Check Out the Data

Use `wine_data.head()` and `wine_data.shape` to determine the number of samples and features.

In [3]:
wine_data.shape

(178, 14)

## Drop the Class Column

Use `wine_data.drop` to create a new dataframe `data` in which the column "Class" has been removed from `wine_data`.
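As a minimal sketch of how `drop` behaves, the snippet below uses a tiny made-up dataframe `toy` (hypothetical values, not the real wine data) so that it runs without downloading anything. The same call pattern should carry over to `wine_data`.

```python
import pandas as pd

# Toy stand-in for wine_data (hypothetical values, not the real data set)
toy = pd.DataFrame({
    'Class': [1, 2, 3],
    'Alcohol': [14.2, 12.4, 13.1],
    'Hue': [1.04, 1.05, 0.70],
})

# drop returns a new dataframe; the original is left unchanged
toy_features = toy.drop(columns='Class')

print(toy_features.columns.tolist())
print(toy.shape, toy_features.shape)
```

Because `drop` returns a copy by default, `toy` still contains "Class" afterwards, which is useful if the labels are needed later for coloring plots.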

## Mean Centering the Data

Use the `data.mean()` command to compute the column means `mean` of `data`.

Define a new dataframe `B = data - mean`.
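To see why subtracting a Series from a dataframe centers every column, here is a small sketch on a made-up dataframe `toy` (hypothetical values). The names `toy` and `centered` are placeholders for `data` and `B`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `data` (hypothetical values)
toy = pd.DataFrame({'Alcohol': [14.0, 12.0, 13.0],
                    'Hue': [1.0, 1.2, 0.8]})

mean = toy.mean()       # Series of column means, indexed by column name
centered = toy - mean   # subtraction aligns on column labels, row by row

# After centering, every column mean is zero (up to rounding)
print(centered.mean())
```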

## Exploratory Data Analysis

Use the `B.corr()` command to find correlations between features.

Identify two features that appear strongly correlated.

Use the code `B.plot.scatter(x = 'Feature 1', y = 'Feature 2')` to plot those correlated features on a scatter plot.
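Scanning a 13 x 13 correlation table by eye is error-prone, so one possible way to locate the strongest pair programmatically is sketched below. It runs on synthetic data (the columns `f1`, `f2`, `f3` are invented for illustration, with `f2` deliberately built to track `f1`); the same steps should work on `B.corr()`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for B: f2 is constructed to follow f1 closely
rng = np.random.default_rng(0)
a = rng.normal(size=50)
toy = pd.DataFrame({
    'f1': a,
    'f2': 2 * a + 0.1 * rng.normal(size=50),
    'f3': rng.normal(size=50),
})

corr = toy.corr()

# Zero out the diagonal (each feature correlates perfectly with itself),
# then locate the largest absolute correlation
r = corr.abs().to_numpy().copy()
np.fill_diagonal(r, 0.0)
i, j = np.unravel_index(r.argmax(), r.shape)
pair = (corr.index[i], corr.columns[j])

print(pair, corr.loc[pair[0], pair[1]])
```

The resulting pair of names can then be passed as `x` and `y` to `plot.scatter`.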

## Find Eigenvalues and Eigenvectors

Find the eigenvalues and eigenvectors of the sample covariance matrix $\frac{1}{177}B^T B$, where $177 = n - 1$ for the $n = 178$ samples.
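One way to carry this out is with `np.linalg.eigh`, which is designed for symmetric matrices such as a covariance matrix. The sketch below uses a random stand-in array `X` (an assumption, not the wine data); with the real data, `B.to_numpy()` would take the place of the centered array.

```python
import numpy as np

# Random stand-in for the data matrix (20 samples, 3 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

B = X - X.mean(axis=0)      # mean-center each column
n = B.shape[0]
C = B.T @ B / (n - 1)       # sample covariance matrix

# eigh returns eigenvalues in ascending order, so reverse them
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]  # columns are the eigenvectors

print(eigvals)
```

Sorting in descending order puts the direction of greatest variance first, which matches the usual ordering of principal components.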

## Percent Variation of Data

Write a function that takes in an integer `n` and returns the percentage of the total variance in the data explained by the `n` largest eigenvalues.
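A possible shape for such a function is sketched below. As an assumption for the sake of a self-contained example, it takes the eigenvalues as an explicit argument rather than reading them from a global, and the values in `toy_eigvals` are made up.

```python
import numpy as np

def percent_variation(eigvals, n):
    """Percentage of total variance captured by the n largest eigenvalues."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    return 100.0 * vals[:n].sum() / vals.sum()

# Hypothetical eigenvalues, chosen so the arithmetic is easy to check
toy_eigvals = [6.0, 3.0, 1.0]
print(percent_variation(toy_eigvals, 1))
print(percent_variation(toy_eigvals, 2))
```

Calling the function for increasing `n` and watching where the percentage levels off is one way to reason about the question that follows.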

What do you believe is the "true" dimension of the data?

## Use Scikit-Learn's PCA and Plot

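Run PCA with the following code:
```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(data)
```

`PCA` centers the data internally, so passing `data` rather than `B` gives the same result.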

Inspect `pca.components_` (the principal directions, one per row) and `pca.explained_variance_ratio_` (the fraction of the total variance explained by each component). **Do these agree with the answers from above?**
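The comparison can be sketched on synthetic data as follows. The array `X` is a random stand-in (not the wine data), scaled so that its columns have clearly different variances. Note that the ratios are fractions rather than percentages, and that eigenvectors are only determined up to sign, so the check on directions compares absolute values.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in: 30 samples, 4 features with different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Manual route: eigendecomposition of the sample covariance matrix
B = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(B.T @ B / (len(B) - 1))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order
manual_ratio = eigvals / eigvals.sum()

# Scikit-learn route
pca = PCA(n_components=2).fit(X)

print(manual_ratio[:2])
print(pca.explained_variance_ratio_)

# Rows of components_ match the leading eigenvectors up to sign
same_directions = np.allclose(np.abs(pca.components_),
                              np.abs(eigvecs[:, :2].T))
print(same_directions)
```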

### Create New DataFrame from Transformed Data

Create a new dataframe from the transformed data with the following code:
```python
wine_pca = pca.transform(data)
wine_pca_df = pd.DataFrame(data = wine_pca, columns = ['PC1', 'PC2'])
```

Then create a scatter plot of the transformed data with `wine_pca_df.plot.scatter(x = 'PC1', y = 'PC2')`.

## 3D Plotting Bonus

Re-run the above code with `n_components=3`, adding `'PC3'` to the `columns` list, and use the code below to plot the 3D projection.
```python
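import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Coordinates of each sample along the first three principal components
x = wine_pca_df['PC1']
y = wine_pca_df['PC2']
z = wine_pca_df['PC3']

ax.scatter(x, y, z)
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_zlabel('PC3')
plt.show()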
```