# Day 3: Linear Regression

We have learned how to manipulate a data set and how to make certain displays.
Today we will learn how to perform a linear regression.


First we need to load the helpful libraries of ```dplyr``` and ```ggplot2```.

```
library(dplyr)
library(ggplot2)
```



The data set we will look at contains lawyers' ratings of state judges in the US Superior Court. Load the data set and use the `?` command to find out more about it.


```
data(USJudgeRatings)
?USJudgeRatings
```



Get a quick look at the data set by using the `head` command:
```
head(USJudgeRatings)
```

We are going to try to see if there is an association between the ratings for Judicial Integrity and Demeanor. We can type

```
USJudgeRatings$INTG
```

each time we want to use the Judicial Integrity variable, or we could make a shortcut for ourselves. Try this:

```
JI <-USJudgeRatings$INTG 
```

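Now every time we type `JI`, R will use the values that were stored in `USJudgeRatings$INTG`. The shortcut is a copy made at this moment, so later changes to the data set will not show up in `JI`.

Make a shortcut for Demeanor: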
```
DM<-USJudgeRatings$DMNR
```
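
As a quick check, here is a small sketch confirming that the shortcuts hold the same values as the original columns. It only uses the names `JI` and `DM` defined above and reloads the data so it can be run on its own.

```r
data(USJudgeRatings)
JI <- USJudgeRatings$INTG
DM <- USJudgeRatings$DMNR

# Each shortcut is a plain numeric vector with one value per judge
length(JI)

# TRUE means the shortcut matches the original column exactly
identical(JI, USJudgeRatings$INTG)
identical(DM, USJudgeRatings$DMNR)
```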


###### Now create a scatterplot of Judicial Integrity vs Demeanor of the judges. Notice where we used our new shortcuts!

```
plot(DM,JI,col = "blue",main = "Ratings",
cex = 1.3,pch = 16,xlab = "DEM",ylab = "JI")
```

Notice that this way of making a scatterplot is slightly different from the one we made on Day 1. One advantage of this method is that it is more customizable. Notice that you can easily change the title and labels to whatever you want. Change the title to *Ratings of Judicial Integrity vs Demeanor* and the labels to *Judicial Integrity* and *Demeanor*.

You can also change the color of the points by typing in different colors. Try typing in a few colors to see what happens.

To find a list of all the colors available you can type
```
colors()
```

You can also change the size of the dots by changing the number after `cex` and the shape of the dots by changing the number after `pch`. Try adjusting them now.

Now we will create a linear model by performing a simple linear regression.
```
linearMod <- lm(JI ~ DM )
```

By typing
```
print(linearMod)
```
we can see the regression line, where `(Intercept)` is the y-intercept and the number under `DM` is the slope (the coefficient of the explanatory variable).
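
As a small sketch of how to reuse those two numbers, `coef()` pulls them out of the model as a named vector. The names `b0` and `b1` are introduced here only for illustration.

```r
data(USJudgeRatings)
JI <- USJudgeRatings$INTG
DM <- USJudgeRatings$DMNR
linearMod <- lm(JI ~ DM)

# coef() returns a named vector with entries "(Intercept)" and "DM"
b0 <- coef(linearMod)[["(Intercept)"]]  # y-intercept
b1 <- coef(linearMod)[["DM"]]           # slope

# Write the fitted line as an equation
cat("Predicted JI =", round(b0, 3), "+", round(b1, 3), "* DM\n")
```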

We can find the correlation between Judicial Integrity and Demeanor by using this command:
```
cor(JI,DM)
```

If we use the command
```
summary(linearMod)
```
we will get the coefficients of the model, R-squared, and other information that may come in handy later.
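
In a simple linear regression with one explanatory variable, R-squared is the square of the correlation. The sketch below checks that on this data; `r` and `r_squared` are names chosen just for this example.

```r
data(USJudgeRatings)
JI <- USJudgeRatings$INTG
DM <- USJudgeRatings$DMNR
linearMod <- lm(JI ~ DM)

r <- cor(JI, DM)                           # correlation
r_squared <- summary(linearMod)$r.squared  # R-squared stored in the summary

r^2
r_squared

# TRUE means the two values agree (up to rounding error)
all.equal(r^2, r_squared)
```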

Now let's add the regression line to our scatterplot.

```
plot(DM, JI, col = "blue", main = "Judicial Integrity vs Demeanor Regression",
     cex = 1.3, pch = 16,
     xlab = "Demeanor Rating", ylab = "Judicial Integrity Rating")
# Draw the fitted line on top of the existing scatterplot
abline(linearMod)
```
The only real difference is the *abline(linearMod)* call after the plot, which draws the line from our model on top of the scatterplot.
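
The same picture can also be drawn with `ggplot2`, which we loaded at the start. This is only a sketch of one way to do it: `geom_smooth(method = "lm")` fits and draws the least-squares line, and `se = FALSE` hides the shaded band around it. The name `p` and the color and size choices are illustrative.

```r
library(ggplot2)
data(USJudgeRatings)

# Scatterplot of the original columns with the least-squares line added
p <- ggplot(USJudgeRatings, aes(x = DMNR, y = INTG)) +
  geom_point(color = "blue", size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  labs(title = "Judicial Integrity vs Demeanor Regression",
       x = "Demeanor Rating", y = "Judicial Integrity Rating")

print(p)
```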


Much like our calculators keep a list of all the residuals after running a regression, R does as well:
```
resid(linearMod)
```
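
Each residual is the observed value minus the value the line predicts. The sketch below rebuilds the residuals by hand to show that; `res` and `manual` are names used only in this example.

```r
data(USJudgeRatings)
JI <- USJudgeRatings$INTG
DM <- USJudgeRatings$DMNR
linearMod <- lm(JI ~ DM)

res <- resid(linearMod)            # residuals stored by R
manual <- JI - fitted(linearMod)   # observed minus predicted

# TRUE means the two lists agree
all.equal(unname(res), unname(manual))

# For a least-squares line with an intercept, the residuals sum to
# (essentially) zero
sum(res)
```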

We can add this list to our data set through this command:
```
USJudgeRatings <- USJudgeRatings %>%
  mutate(Residuals = resid(linearMod))
```


Now we can make a graph of the residuals:
```
plot(DM,USJudgeRatings$Residuals,col = "blue",main = "Residual Plot",
cex = 1.3,pch = 16,xlab = "Demeanor",ylab = "Residuals")
```

Do you see a pattern in the residuals?

Now what if you had three new judges whose Demeanor ratings were 1, 7.3, and 10? What would you predict their Judicial Integrity ratings to be?
```
predict(linearMod,data.frame(DM = c(1,7.3,10)))
```
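
`predict()` is plugging each new Demeanor rating into the regression equation. The sketch below does the same calculation by hand and then compares the new ratings with the range of ratings in the data, since predictions outside that range are extrapolations and deserve extra caution. The names `new_DM`, `by_hand`, and `from_predict` are made up for this example.

```r
data(USJudgeRatings)
JI <- USJudgeRatings$INTG
DM <- USJudgeRatings$DMNR
linearMod <- lm(JI ~ DM)

new_DM <- c(1, 7.3, 10)

# Prediction by hand: intercept + slope * new value
b <- coef(linearMod)
by_hand <- b[["(Intercept)"]] + b[["DM"]] * new_DM

# Prediction using predict()
from_predict <- predict(linearMod, data.frame(DM = new_DM))

# TRUE means both methods give the same answers
all.equal(by_hand, unname(from_predict))

# Which new ratings fall outside the observed Demeanor ratings?
range(DM)
new_DM < min(DM) | new_DM > max(DM)
```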

## Trees

Is there a relationship between the girth and height of a tree?
```
data(trees)
```

Find out a little bit about your data set. How many trees are there? What are the variables? 

Make shortcuts for the girth and height variables.

Create a scatterplot of height vs girth, where girth is the explanatory variable. Make sure that it has a good title, good labels, very small points, unusually shaped points, and a unique color.

Find the equation of the line of best fit and explain in context what the slope and y-intercept tell us.

What is the correlation between girth and height? What is R-squared?


Make a scatterplot that includes the line of best fit. Make sure that the title and labels are correct.

Make a residual plot and comment on the appropriateness of the model.

Predict the height of trees with girths 5, 6, 7, 8, 9, 10, 11, 12, 13, and 15.7.