# 6. Linear Model
To evaluate gunners' performance after the ball is caught, I add the *squeezeDis*, *tackle*, and *missedTackle* variables to a linear model. The variables discussed in previous sections remain in the model because they also affect return yards. The model attempts to predict how many return yards a gunner gives up when the ball is returned.

## 6.1 Imports

In [1]:
library(tidyverse)
library(car)
library(MASS)
library(here)

source(here("R", "00_source.R"))

specialistData <- read.csv(here("data", "specialist_data.csv"),
                           na.strings = c("NA", "", " "))

# Keep only plays that ended in a return
returns <- specialistData[which(specialistData$specialTeamsResult == "Return"), ]

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.8
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select

here() starts at C:/Users/Hunter

## 6.2 Model Creation
To select the model, I used forward stepwise selection by AIC. This starts from an intercept-only model and, at each step, adds the variable that lowers the AIC the most, stopping when no remaining variable lowers it further. The selected variables were *timeToBeatVise*, *squeezeDis*, *disFromReturner*, *speedDev*, *disFromLOS*, and *missedTackle*.

In [2]:
fullModel <- lm(returnYds ~ timeToBeatVise + disFromLOS + disFromReturner + topSpeed + squeezeDis + missedTackle + speedDev + release + correctRelease,
                data = na.omit(returns))

nullModel <- lm(returnYds ~ 1, data = na.omit(returns))

stepAIC(nullModel, scope = list(lower = nullModel, upper = fullModel), k = 2, direction = "forward")

Start:  AIC=8153.49
returnYds ~ 1

                  Df Sum of Sq    RSS    AIC
+ timeToBeatVise   1   11187.1 190730 8058.1
+ disFromReturner  1    6510.5 195406 8099.5
+ squeezeDis       1    5211.4 196706 8110.8
+ speedDev         1    1516.5 200400 8142.6
+ disFromLOS       1     484.1 201433 8151.4
+ correctRelease   1     427.5 201489 8151.9
+ topSpeed         1     376.3 201541 8152.3
<none>                         201917 8153.5
+ missedTackle     1      74.2 201843 8154.9
+ release          1      22.7 201894 8155.3

Step:  AIC=8058.13
returnYds ~ timeToBeatVise

                  Df Sum of Sq    RSS    AIC
+ squeezeDis       1    3792.3 186938 8025.8
+ disFromReturner  1    3327.7 187402 8030.1
+ speedDev         1    3099.6 187630 8032.1
+ topSpeed         1    1648.4 189081 8045.3
+ disFromLOS       1     878.1 189852 8052.3
+ correctRelease   1     324.9 190405 8057.2
<none>                         190730 8058.1
+ missedTackle     1     189.7 190540 8058.4
+ release        


Call:
lm(formula = returnYds ~ timeToBeatVise + squeezeDis + disFromReturner + 
    speedDev + disFromLOS + missedTackle, data = na.omit(returns))

Coefficients:
    (Intercept)   timeToBeatVise       squeezeDis  disFromReturner  
       -18.8580           0.7298          -0.1848           0.3467  
       speedDev       disFromLOS     missedTackle  
         3.4478           0.1945           2.0188  
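
The AIC values printed at each step can be reproduced by hand. The following is a minimal sketch on simulated data, assuming the formula that `extractAIC()` applies to `lm` fits, $n \log(\mathrm{RSS}/n) + k \cdot \mathrm{edf}$. The names `simData`, `x1`, `x2`, and `manualAIC` are placeholders for illustration and are not part of the analysis above.

```r
# Simulated data only; x1, x2, and y are placeholders, not columns of specialistData
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                   # has no true effect on y
y  <- 2 + 1.5 * x1 + rnorm(n)
simData <- data.frame(y, x1, x2)

nullFit <- lm(y ~ 1, data = simData)
x1Fit   <- lm(y ~ x1, data = simData)

# AIC as used by stepwise selection for lm fits: n * log(RSS / n) + k * edf
manualAIC <- function(fit, k = 2) {
  rss  <- sum(residuals(fit)^2)
  nObs <- length(residuals(fit))
  edf  <- nObs - df.residual(fit)   # number of estimated coefficients
  nObs * log(rss / nObs) + k * edf
}

# extractAIC() returns c(edf, AIC); keep the second element
nullAIC <- extractAIC(nullFit, k = 2)[2]
x1AIC   <- extractAIC(x1Fit, k = 2)[2]

print(c(manual = manualAIC(nullFit), extracted = nullAIC))
print(x1AIC < nullAIC)   # adding a real predictor should lower the AIC

# Same formula applied to the intercept-only row printed above:
# RSS = 201917, n = 1701 residual df + 7 coefficients = 1708, edf = 1
reportedNullAIC <- 1708 * log(201917 / 1708) + 2 * 1
print(reportedNullAIC)   # should land close to the printed start AIC of 8153.49
```

Because the printed RSS is rounded, the last value should match the start AIC only approximately.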


In [3]:
fSelectedAIC <- lm(returnYds ~ timeToBeatVise + disFromReturner + speedDev + 
                     squeezeDis + disFromLOS + missedTackle, 
                   data = na.omit(returns))

## 6.3 Model Evaluation

### 6.3.1 Checking For Collinearity
The model selected both *disFromReturner* and *disFromLOS*. Because these two measurements are similar in nature, they may be collinear. We can check this by computing each variable's Variance Inflation Factor (VIF). Since each variable's VIF is close to one, there is no evidence of collinearity problems among the variables in the model.

In [4]:
car::vif(fSelectedAIC)
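
For intuition about what the VIF measures, here is a minimal base-R sketch on simulated data. It assumes the usual definition for numeric predictors, $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. The helper `manualVIF` and the variables `x1`, `x2`, `x3` are hypothetical and only illustrate the calculation.

```r
# Simulated data only; none of these variables come from specialistData
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # deliberately correlated with x1
x3 <- rnorm(n)                         # independent of x1 and x2
simData <- data.frame(x1, x2, x3)

# VIF for each predictor: regress it on the remaining predictors,
# then compute 1 / (1 - R^2)
manualVIF <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manualVIF(simData, c("x1", "x2", "x3"))
print(round(vifs, 2))
```

In this sketch the correlated pair `x1` and `x2` should show clearly inflated values, while the independent `x3` should stay near one, which is the pattern the check above looks for.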

### 6.3.2 Summarizing Model
Judging by their p-values, every variable in the model is statistically significant: five are significant at the 0.001 level, and *missedTackle* is significant at the 0.05 level (p = 0.0391). However, the model's adjusted $R^{2}$ is very low at 0.1235, meaning the model accounts for only about 12% of the variability in return yards. The low adjusted $R^{2}$ combined with the significant variables tells me that the selected variables have predictive power, but that more variables would be needed to improve accuracy.

In [5]:
summary(fSelectedAIC)


Call:
lm(formula = returnYds ~ timeToBeatVise + disFromReturner + speedDev + 
    squeezeDis + disFromLOS + missedTackle, data = na.omit(returns))

Residuals:
    Min      1Q  Median      3Q     Max 
-25.501  -5.266  -1.324   3.188  83.181 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -18.85802    2.48533  -7.588 5.33e-14 ***
timeToBeatVise    0.72978    0.08684   8.403  < 2e-16 ***
disFromReturner   0.34669    0.04056   8.549  < 2e-16 ***
speedDev          3.44783    0.68928   5.002 6.26e-07 ***
squeezeDis       -0.18482    0.03315  -5.576 2.86e-08 ***
disFromLOS        0.19452    0.04285   4.540 6.03e-06 ***
missedTackle      2.01879    0.97751   2.065   0.0391 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.18 on 1701 degrees of freedom
Multiple R-squared:  0.1265,	Adjusted R-squared:  0.1235 
F-statistic: 41.07 on 6 and 1701 DF,  p-value: < 2.2e-16
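
As a sanity check on the summary, the adjusted $R^{2}$ and the F-statistic can be recomputed from the reported multiple $R^{2}$ and degrees of freedom. This is a small sketch using the standard formulas; the inputs are copied from the printed output, so the results should agree only up to rounding. The variable names are my own.

```r
# Values copied from the summary output above
r2          <- 0.1265   # multiple R-squared (rounded in the printout)
residualDf  <- 1701     # residual degrees of freedom
nPredictors <- 6        # predictors, excluding the intercept

# Number of observations used in the fit
n <- residualDf + nPredictors + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adjR2 <- 1 - (1 - r2) * (n - 1) / residualDf

# Overall F-statistic for the regression
fStat <- (r2 / nPredictors) / ((1 - r2) / residualDf)

print(c(n = n, adjR2 = adjR2, fStat = fStat))
```

With 1708 observations and only six predictors the penalty is tiny, which is why the adjusted and multiple $R^{2}$ are nearly identical here.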


## 6.4 Conclusion
I do not believe the model performs well enough to be implemented by front offices. However, all of the selected variables appear to be significant. These variables could be used on their own to gauge gunner performance and guide decisions.