---
title: "Homework 7"
output:
  pdf_document: default
  html_document:
    css: ../lab.css
    highlight: pygments
    theme: cerulean
author: MA 276, Skidmore College
---

## Overview
In this lab, we'll practice implementing logistic regression to estimate the probability of a successful NBA shot. We'll also link shot-level probabilities to expected points. Before we do anything, we have to load and clean the data, as in Lab 6.

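```{r, eval = FALSE}
library(RCurl)   # getURL()
library(dplyr)   # filter(), mutate(), %>%
library(mosaic)  # tally()
url <- getURL("")
nba.shot <- read.csv(text = url)
nba.shot <- na.omit(nba.shot)
# Keep shots worth fewer than 4 points; a three-pointer must be from at least 22 feet
nba.shot <- filter(nba.shot, PTS < 4, SHOT_DIST >= 22 | PTS_TYPE == 2)
```
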
## Expected Points
All else being equal, what's the most efficient shot in the NBA?
In our lab, we characterized shot success by points type using the following code:

```{r, eval = FALSE}
tally(SHOT_RESULT ~ PTS_TYPE, data = nba.shot, format = "proportion")
```

Of course, not all two-point shots are created equal. Using the `cut` command, we split shots by distance into groups labeled `D1` to `D7`, ordered from shortest to longest within each shot type: `D1` to `D4` for two-pointers and `D5` to `D7` for three-pointers. The two data sets, `nba.two` and `nba.three`, contain the two- and three-pointers, respectively.

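As a quick check of how `cut` assigns shots to bins, here is a minimal sketch using the two-pointer breaks from this lab. The distances in `dist` are hypothetical values chosen to sit on and around the break points; they are not taken from the NBA data. By default `cut` builds right-closed intervals, so a shot at exactly 3 feet lands in `D1` rather than `D2`.

```r
# Hypothetical shot distances in feet (illustration only, not NBA data)
dist <- c(0.5, 3, 3.1, 6, 11.9, 12, 18)

# Same breaks and labels as the two-pointer bins; intervals are right-closed by default
bins <- cut(dist, breaks = c(-100, 3, 6, 12, 100),
            labels = c("D1", "D2", "D3", "D4"))

# Show each distance next to its assigned bin, then count shots per bin
data.frame(dist, bins)
table(bins)
```

Passing `right = FALSE` to `cut` would make the intervals left-closed instead, which moves the boundary shots into the next bin up.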
```{r, eval = FALSE}
nba.two <- nba.shot %>%
  filter(PTS_TYPE == 2) %>%
  mutate( = cut(SHOT_DIST, breaks = c(-100, 3, 6, 12, 100),
                labels = c("D1", "D2", "D3", "D4")))
nba.three <- nba.shot %>%
  filter(PTS_TYPE == 3) %>%
  mutate( = cut(SHOT_DIST, breaks = c(0, 23, 25, 100),
                labels = c("D5", "D6", "D7")))
tally(SHOT_RESULT ~, data = nba.two, format = "proportion")
tally(SHOT_RESULT ~, data = nba.three, format = "proportion")
```

### Question 1
Rank the categories `D1` to `D7` from best (highest expected points) to worst (lowest).

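One way to turn make proportions into expected points is to multiply each proportion by the shot's point value. The sketch below uses three hypothetical categories (`A`, `B`, `C`) with made-up proportions, so it shows the arithmetic only and is not an answer based on the NBA data.

```r
# Hypothetical make proportions (illustration only); substitute the values from tally()
p.made <- c(A = 0.60, B = 0.40, C = 0.36)

# Point value of each hypothetical category: A and B are two-pointers, C is a three-pointer
pts <- c(A = 2, B = 2, C = 3)

# Expected points per shot = probability of a make times the shot's point value
exp.pts <- p.made * pts

# Rank categories from highest to lowest expected points
sort(exp.pts, decreasing = TRUE)
```

In this made-up example, category `C` outranks `B` despite a lower make proportion, because the extra point more than offsets the lower success rate.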
### Question 2
Using code from our last lab, identify whether expected points are higher on two- or three-point shots taken by Rajon Rondo.
### Question 3
Here are two models of shot success (note that we re-bind all of the shots together).

```{r, eval = FALSE}
nba.shot2 <- rbind(nba.two, nba.three)
fit.1 <- glm(SHOT_RESULT == "made" ~ SHOT_DIST + TOUCH_TIME +
             data = nba.shot2, family = "binomial")
fit.2 <- glm(SHOT_RESULT == "made" ~ + TOUCH_TIME +
             data = nba.shot2, family = "binomial")
```

Using the AIC criterion, which is the preferred fit of shot success? Is it close?

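For reference, a minimal sketch of an AIC comparison between two nested logistic regressions is shown below. It runs on simulated data, and the names `sim`, `dist`, `def`, `m.small`, and `m.big` are hypothetical stand-ins rather than the lab's variables or fits. Lower AIC indicates the preferred model; a common rule of thumb treats differences of less than about 2 as too close to call.

```r
set.seed(276)
n <- 500

# Simulated stand-ins for shot distance and defender distance (not NBA data)
sim <- data.frame(dist = runif(n, 0, 30), def = runif(n, 0, 10))

# Simulate makes from a model that depends on both predictors
sim$made <- rbinom(n, 1, plogis(0.5 - 0.06 * sim$dist + 0.15 * sim$def))

# Two candidate fits: one predictor versus two
m.small <- glm(made ~ dist, data = sim, family = "binomial")
m.big <- glm(made ~ dist + def, data = sim, family = "binomial")

# AIC() accepts several fits and returns one row per model
aic.tab <- AIC(m.small, m.big)
aic.tab

# Size of the gap between the two fits
aic.tab$AIC[1] - aic.tab$AIC[2]
```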
### Question 4
Using `fit.2`, estimate the increased odds of a made shot given a one-unit increase in closest defender distance. Then, estimate the increased odds of a made shot given a ten-unit increase in closest defender distance.
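
In a logistic regression, a coefficient is a change in log-odds, so the odds multiplier for a one-unit increase is `exp(beta)` and for a ten-unit increase is `exp(10 * beta)`. The sketch below uses a hypothetical coefficient `beta` to show the arithmetic; it is not an estimate from `fit.2`.

```r
# Hypothetical log-odds coefficient (illustration only, NOT taken from fit.2)
beta <- 0.10

# Odds multiplier for a one-unit increase in the predictor
or.1 <- exp(beta)

# Odds multiplier for a ten-unit increase: exp(10 * beta), which equals or.1^10, not 10 * or.1
or.10 <- exp(10 * beta)

c(one.unit = or.1, ten.units = or.10)
```

With a fitted model, the same calculation starts from `coef(fit)` indexed by the name of the term of interest.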
### Question 5
Add game location (`LOCATION`) to `fit.2`. Does this improve the fit? Is the coefficient for this term statistically and/or practically significant? What does that suggest?
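
One possible way to add a term to an existing fit is `update()`, after which the new term's row in the coefficient table gives its estimate, standard error, and p-value. The sketch below uses simulated data; `sim`, `x`, `home`, `base`, and `plus` are hypothetical names, and `home` is generated with no true effect on the outcome.

```r
set.seed(7)
n <- 400

# Simulated data (not NBA data); `home` is a two-level factor unrelated to the outcome
sim <- data.frame(x = rnorm(n),
                  home = factor(sample(c("A", "H"), n, replace = TRUE)))
sim$y <- rbinom(n, 1, plogis(0.2 + 0.8 * sim$x))

base <- glm(y ~ x, data = sim, family = "binomial")

# update() refits the model with one extra term on the right-hand side
plus <- update(base, . ~ . + home)

# Coefficient-table row for the added term: estimate, standard error, z value, p-value
summary(plus)$coefficients["homeH", ]

# Compare fit with and without the added term
AIC(base, plus)
```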
### Question 6
Does it make sense to add whether the shooter's team won (variable `W`) or the margin of victory (`FINAL_MARGIN`) to the model? Why or why not? You do not need to run any code to answer this.
### Question 7
Using Seth's article [here]( and referencing the charts shown, explain Goodhart's law as it applies to statistics in the NBA.