# Principal Components

## Sample dataset: mtcars (Motor Trend car road tests)

In [None]:
head(mtcars)

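We will fit a principal components regression (PCR) model that predicts horsepower (**hp**). Instead of using the predictors directly, the model regresses **hp** on the principal components of these variables: **mpg**, **disp**, **drat**, **wt**, and **qsec**.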

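Note: **scale=TRUE** standardizes each predictor by dividing it by its standard deviation. This lets variables on very different scales, such as **disp** in cubic inches and **drat** (a ratio), contribute comparably to the components.
**validation="CV"** tells `pcr()` to estimate prediction error by cross-validation (10 randomly chosen segments by default).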
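To see what standardization does, the sketch below applies base R's `scale()` to the same five predictors. It centers each column and divides by its standard deviation, which mirrors what `scale=TRUE` (together with `pcr()`'s default centering) does before the components are computed. This is an illustration in base R, not a look inside the `pls` package.

```r
# The five predictors used in the PCR model
X <- mtcars[, c("mpg", "disp", "drat", "wt", "qsec")]

# Raw standard deviations differ by orders of magnitude (disp is in cubic inches)
raw_sd <- apply(X, 2, sd)
print(round(raw_sd, 2))

# Center each column to mean 0, then divide by its standard deviation
Z <- scale(X, center = TRUE, scale = TRUE)

# After standardization every column has mean ~0 and standard deviation 1
print(round(colMeans(Z), 10))
print(apply(Z, 2, sd))
```

Without this step, **disp** would dominate the first component simply because its numbers are larger.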

In [None]:
library(pls)
set.seed(1)  # CV segments are chosen at random; fixing the seed makes the results reproducible
model = pcr(hp~mpg+disp+drat+wt+qsec, data=mtcars, scale=TRUE, validation="CV")
summary(model)
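Conceptually, PCR is PCA on the standardized predictors followed by an ordinary linear regression on the first few component scores. The sketch below reproduces that idea with base R's `prcomp()` and `lm()`. The helper `pcr_fit()` is a hypothetical name introduced for illustration, not part of `pls`. It also checks a useful fact: with all five components, PCR spans the same space as the original predictors, so its fitted values match an ordinary least-squares fit.

```r
# Predictors and response from the PCR example
X <- mtcars[, c("mpg", "disp", "drat", "wt", "qsec")]
y <- mtcars$hp

# PCA on standardized predictors (prcomp centers by default; scale. = TRUE divides by the SD)
pca <- prcomp(X, scale. = TRUE)

# Hypothetical helper: regress hp on the first k principal component scores
pcr_fit <- function(k) {
  scores <- as.data.frame(pca$x[, 1:k, drop = FALSE])
  scores$hp <- y
  lm(hp ~ ., data = scores)
}

fit2 <- pcr_fit(2)  # reduced model using only PC1 and PC2
fit5 <- pcr_fit(5)  # all five components

# Ordinary least squares on the original predictors
ols <- lm(hp ~ mpg + disp + drat + wt + qsec, data = mtcars)

# With every component kept, the fitted values agree with OLS
print(all.equal(unname(fitted(fit5)), unname(fitted(ols))))
```

The benefit of PCR comes from keeping *fewer* components than predictors. Cross-validation helps choose how many to keep.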

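### RMSEP: root mean squared error of prediction

RMSEP is reported by `summary(model)` and plotted by default in `validationplot()`. It is the square root of the mean squared prediction error, so it measures how far predictions fall from the observed values in the units of the response (here, horsepower). Lower is better.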
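Written out, $\text{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $\hat{y}_i$ is the prediction for observation $i$ from a model that did not see observation $i$ during fitting. The sketch below computes a 10-fold cross-validated RMSEP for 1 to 5 components in base R. The PCA is refit inside each fold so the held-out cars never influence the components. The fold assignment is an assumption of this sketch, so the numbers will not match `summary(model)` exactly.

```r
# Predictors and response from the PCR example
X <- as.matrix(mtcars[, c("mpg", "disp", "drat", "wt", "qsec")])
y <- mtcars$hp
n <- nrow(X)
max_comp <- ncol(X)

set.seed(42)                                # fold assignment is random
folds <- sample(rep(1:10, length.out = n))  # 10 roughly equal folds

# cv_pred[i, k] holds the out-of-fold prediction for car i using k components
cv_pred <- matrix(NA_real_, nrow = n, ncol = max_comp)

for (f in 1:10) {
  test  <- folds == f
  train <- !test
  # Fit PCA on the training rows only, then project the held-out rows onto it
  pca <- prcomp(X[train, ], scale. = TRUE)
  train_scores <- pca$x
  test_scores  <- predict(pca, newdata = X[test, , drop = FALSE])
  for (k in 1:max_comp) {
    d_train <- data.frame(hp = y[train], train_scores[, 1:k, drop = FALSE])
    d_test  <- data.frame(test_scores[, 1:k, drop = FALSE])
    fit <- lm(hp ~ ., data = d_train)
    cv_pred[test, k] <- predict(fit, newdata = d_test)
  }
}

# RMSEP for 1..max_comp components (y is recycled down each column)
rmsep <- sqrt(colMeans((y - cv_pred)^2))
print(round(rmsep, 1))
```

The number of components with the smallest cross-validated RMSEP (or the smallest model whose RMSEP is close to it) is a reasonable choice.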

In [None]:
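validationplot(model)                   # default: cross-validated RMSEP vs. number of components
validationplot(model, val.type="MSEP")  # mean squared error of prediction
validationplot(model, val.type="R2")    # R-squared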
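The three plots show the same errors on different scales. Under the conventional definitions, MSEP is the mean squared error, RMSEP is its square root, and $R^2 = 1 - \text{MSEP} / \frac{1}{n}\sum_{i}(y_i - \bar{y})^2$. These definitions are offered to help read the plots and are not a description of `pls` internals. The sketch below computes all three for in-sample predictions from a 2-component fit. It is illustrative only because the predictions are not cross-validated.

```r
X <- mtcars[, c("mpg", "disp", "drat", "wt", "qsec")]
y <- mtcars$hp

# In-sample predictions from a 2-component PCR fit (illustrative, not cross-validated)
scores <- prcomp(X, scale. = TRUE)$x[, 1:2]
fit <- lm(y ~ scores)
y_hat <- fitted(fit)

msep  <- mean((y - y_hat)^2)               # mean squared error of prediction
rmsep <- sqrt(msep)                        # same quantity, in units of hp
r2    <- 1 - msep / mean((y - mean(y))^2)  # share of hp variance explained

print(c(MSEP = msep, RMSEP = rmsep, R2 = r2))
```

For an in-sample least-squares fit with an intercept, this $R^2$ equals the one reported by `summary(fit)`.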

In [None]:
pc = read.csv("example_PCA.csv")
head(pc)

In [None]:
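options(repr.plot.width=20, repr.plot.height=15)  # enlarge plots in the Jupyter (IRkernel) output
library(ggplot2)
# Relabel "not reported" as "unknown"; convert to character first in case race was read as a factor
pc$race = as.character(pc$race)
pc$race[pc$race %in% "not reported"] = "unknown"
ggplot(pc, aes(x=eig1maf5, y=eig2maf5, color=race)) + geom_point(size=3)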

## In-Class Exercise: PCA on the Iris Dataset

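The iris dataset has four numeric measurements (sepal length, sepal width, petal length, and petal width) for 150 flowers from three species. Use PCA to reduce dimensionality and visualize how well the species separate.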

### Part 1: Basic PCA
1. Load the iris dataset and select only the numeric columns (columns 1-4)
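2. Run PCA using `prcomp(iris[,1:4], scale. = TRUE)`. Scaling matters when variables have different units or very different spreads. All four iris measurements are in centimeters, but petal length varies far more than sepal width.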
3. Use `summary()` to see how much variance each principal component explains
4. How many components do you need to explain 95% of the variance?
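The last row of `summary()` gives the cumulative proportion of variance. You can also compute it directly from `pca$sdev`. The sketch below does this on the five mtcars predictors (a swapped-in example, so the iris answer is left to you) and finds the smallest number of components reaching 95%.

```r
# Demonstrated on mtcars so the iris question stays open
X <- mtcars[, c("mpg", "disp", "drat", "wt", "qsec")]
pca <- prcomp(X, scale. = TRUE)

# Variance of each component is its squared standard deviation
var_pc   <- pca$sdev^2
prop_var <- var_pc / sum(var_pc)  # proportion of total variance per component
cum_var  <- cumsum(prop_var)      # cumulative proportion, as in summary(pca)

# Smallest number of components that reaches at least 95% of the variance
k95 <- which(cum_var >= 0.95)[1]
print(round(cum_var, 3))
print(k95)
```

With standardized variables, the component variances sum to the number of variables, because each variable contributes a variance of 1.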

### Part 2: Visualization
1. Extract the first two principal components from the PCA result (`pca$x[,1:2]`)
2. Create a scatter plot of PC1 vs PC2, colored by Species
3. Can you visually separate the three species using just these two components?
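The plotting step follows the same pattern as the `eig1maf5`/`eig2maf5` plot above: put the first two score columns on the axes and color by group. The sketch below uses base R graphics on mtcars, colored by number of cylinders (`cyl`). This is a swapped-in example so it does not give away the iris plot. The `ggplot2` approach used earlier works equally well.

```r
X <- mtcars[, c("mpg", "disp", "drat", "wt", "qsec")]
pca <- prcomp(X, scale. = TRUE)

# First two principal component scores, one row per car
scores <- pca$x[, 1:2]
groups <- factor(mtcars$cyl)  # color points by number of cylinders

plot(scores[, 1], scores[, 2],
     col = as.integer(groups), pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = "mtcars: PC1 vs PC2")
legend("topright", legend = levels(groups),
       col = seq_along(levels(groups)), pch = 19, title = "cyl")
```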

### Part 3: Interpretation
1. Look at the loadings (`pca$rotation`). Which original variables contribute most to PC1?
2. Create a biplot using `biplot(pca)` to visualize both samples and variable loadings
3. What does this tell you about which measurements best distinguish the species?
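Each column of `pca$rotation` is a unit-length loading vector, and each row is an original variable. Ranking variables by the absolute value of their PC1 loading shows which ones drive that component. The sign of a component is arbitrary, so compare magnitudes. The sketch below shows this on mtcars (a swapped-in example) before drawing the biplot.

```r
X <- mtcars[, c("mpg", "disp", "drat", "wt", "qsec")]
pca <- prcomp(X, scale. = TRUE)

# Loadings of each original variable on PC1
pc1 <- pca$rotation[, "PC1"]

# Rank variables by the magnitude of their PC1 loading
pc1_rank <- sort(abs(pc1), decreasing = TRUE)
print(round(pc1_rank, 3))

# Biplot: points are cars (scores), arrows are variables (loadings)
biplot(pca, cex = 0.6)
```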

**Hint:** Use `prcomp()` (from base R's `stats` package) for basic PCA. It is simpler than the `pls` package for exploration.

In [None]:
# Part 1: Run PCA on iris
data(iris)
pca <- prcomp(iris[,1:4], scale. = TRUE)  # center and scale each measurement before PCA
summary(pca)

# Part 2: Visualization
# Your code here - create a scatter plot of PC1 vs PC2


# Part 3: Interpretation
# Your code here - examine loadings and create biplot