You are the analytics director for Pitt football. Pitt is on defense, facing a 2nd and 2. The defensive coordinator wants to call a coverage for the offense's most probable play (pass vs. run) and asks for your help. You have collected data on the opponent's tendencies this season. The data include 3 columns:

*   Pass: 1 if the play was a pass and 0 otherwise
*   YardsToGo: how many yards the offense needs for a first down
*   Down: the down count of the play

Use these data to estimate the probability of a pass and of a run on the offense's current play (2nd down, 2 yards to go).

**Hint**: Treat the situation at hand (2nd and 2) as an instance of the more general *second and short* situation, where *short* means 4 or fewer yards to go. Pooling these plays gives a larger sample and therefore a more robust estimate. It also matches how coaches think: in terms of short, medium and long yardage rather than distinguishing 3 yards to go from 4.
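A minimal sketch of that grouping, assuming pandas and the cut points short = 1 to 4 yards (from this hint) and long = 7 or more yards (from the exercise at the end of the notebook). That leaves medium = 5 to 6 yards, which is inferred rather than stated. The helper name `yardage_bucket` is hypothetical:

```python
import pandas as pd

def yardage_bucket(yards_to_go):
    """Map yards to go onto a coarse short/medium/long category.

    Assumed cut points: (0, 4] -> short, (4, 6] -> medium, (6, inf) -> long.
    The medium range is inferred, not stated in the original text.
    """
    return pd.cut(
        yards_to_go,
        bins=[0, 4, 6, float("inf")],
        labels=["short", "medium", "long"],
    )

# Toy yardages for illustration, not taken from the opponent data
example = pd.Series([2, 4, 5, 6, 7, 10, 14])
print(yardage_bucket(example).tolist())
```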



In [1]:
import pandas as pd
df = pd.read_csv("Bayesian_defensive_coordinator.csv")
df

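| Unnamed: 0 | Pass | YardsToGo | Down |
|---|---|---|---|
| 0 | 1 | 10 | 1 |
| 1 | 1 | 10 | 1 |
| 2 | 1 | 10 | 1 |
| 3 | 1 | 10 | 2 |
| 4 | 0 | 10 | 3 |
| ... | ... | ... | ... |
| 1020 | 0 | 14 | 3 |
| 1021 | 1 | 10 | 1 |
| 1022 | 1 | 10 | 2 |
| 1023 | 1 | 4 | 3 |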
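The `Unnamed: 0` column looks like a row index that was written into the CSV file. Assuming that is the case, passing `index_col=0` to `pd.read_csv` uses it as the DataFrame index instead of carrying it as a data column. The sketch below demonstrates this on the first five rows shown above, read from an in-memory string rather than from the real file:

```python
import io
import pandas as pd

# First five rows from the display above; the leading empty header is what
# pandas reports as "Unnamed: 0" when index_col is not given
csv_text = """,Pass,YardsToGo,Down
0,1,10,1
1,1,10,1
2,1,10,1
3,1,10,2
4,0,10,3
"""

df_sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_sample.columns.tolist())  # ['Pass', 'YardsToGo', 'Down']
```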


In the following cell we start by estimating the prior probability that the offense passes, $\Pr[pass]$, as the fraction of all plays in the data that were passes.

In [2]:
# estimate from the data the prior belief of offense passing the ball Pr[pass]

p_pass_prior = df['Pass'].mean()
print(f"Pr[pass] = {p_pass_prior:.3f}")

Pr[pass] = 0.596
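Because `Pass` is a 0/1 indicator, its mean is exactly the fraction of plays that were passes (with $N$ the number of plays in the data), and the complementary prior for a run follows directly:

$$
\Pr[pass] = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Pass}_i = \frac{\#\{\text{pass plays}\}}{N} \approx 0.596,
\qquad
\Pr[run] = 1 - \Pr[pass] \approx 0.404
$$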


In the following cell we calculate the likelihood of the evidence: the conditional probability of a 2nd and short situation given that the play is a pass, $\Pr[2nd\&short|pass]$.

In [3]:
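# estimate the likelihood of the "evidence", i.e., Pr[2nd and short | pass]
# create a new column that indicates whether the data in the row fall under the 2nd and short situation
# (the whole condition is parenthesized so .astype(int) converts the combined mask, not only the yardage test)

df['2nd_and_short'] = ((df['Down'] == 2) & (df['YardsToGo'] <= 4)).astype(int)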
pr_evidence = df[df['Pass'] == 1]['2nd_and_short'].mean()
print(f"Pr[2nd and short | pass] = {pr_evidence:.3f}")

Pr[2nd and short | pass] = 0.056
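As a cross-check, `pd.crosstab` with `normalize="index"` gives the likelihood of the situation for both play types at once, i.e., $\Pr[2nd\&short|pass]$ and $\Pr[2nd\&short|run]$. This sketch runs on a small set of hypothetical plays (not the opponent data) and uses the fully parenthesized condition:

```python
import pandas as pd

# Hypothetical toy plays, NOT the opponent data
toy = pd.DataFrame({
    "Pass":      [1, 1, 0, 0, 1, 0, 1, 1],
    "YardsToGo": [10, 2, 3, 1, 8, 4, 10, 3],
    "Down":      [1, 2, 2, 2, 2, 3, 1, 2],
})
# Combine both conditions first, then cast the boolean mask to 0/1
toy["2nd_and_short"] = ((toy["Down"] == 2) & (toy["YardsToGo"] <= 4)).astype(int)

# Rows: play type (0 = run, 1 = pass); columns: situation flag; each row sums to 1
likelihoods = pd.crosstab(toy["Pass"], toy["2nd_and_short"], normalize="index")
print(likelihoods)

# Pr[2nd and short | pass] is the (Pass=1, 2nd_and_short=1) cell
print(likelihoods.loc[1, 1])
```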


In the following cell we calculate the total (marginal) probability of the evidence, $\Pr[2nd\&short]$, i.e., the fraction of all plays that were 2nd and short.

In [4]:
# estimate the total probability of the "evidence", i.e., Pr[2nd and short]

p_total_evidence = df['2nd_and_short'].mean()
print(f"Pr[2nd and short] = {p_total_evidence:.3f}")

Pr[2nd and short] = 0.075
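The denominator can also be checked with the law of total probability, $\Pr[2nd\&short] = \Pr[2nd\&short|pass]\Pr[pass] + \Pr[2nd\&short|run]\Pr[run]$. A minimal sketch on hypothetical toy plays, confirming that both routes give the same number:

```python
import pandas as pd

# Hypothetical toy plays, NOT the opponent data
toy = pd.DataFrame({
    "Pass":      [1, 1, 0, 0, 1, 0, 1, 1],
    "YardsToGo": [10, 2, 3, 1, 8, 4, 10, 3],
    "Down":      [1, 2, 2, 2, 2, 3, 1, 2],
})
toy["2nd_and_short"] = ((toy["Down"] == 2) & (toy["YardsToGo"] <= 4)).astype(int)

p_pass = toy["Pass"].mean()                                         # Pr[pass]
p_e_given_pass = toy.loc[toy["Pass"] == 1, "2nd_and_short"].mean()  # Pr[E | pass]
p_e_given_run = toy.loc[toy["Pass"] == 0, "2nd_and_short"].mean()   # Pr[E | run]

# Law of total probability vs. the direct share of 2nd and short plays
p_e_total = p_e_given_pass * p_pass + p_e_given_run * (1 - p_pass)
p_e_direct = toy["2nd_and_short"].mean()
print(f"{p_e_total:.3f} vs {p_e_direct:.3f}")  # 0.500 vs 0.500
```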


We now use Bayes' theorem to estimate the probability that the offense passes in a 2nd and short situation:

$$\Pr[pass|2nd\&short] = \dfrac{\Pr[2nd\&short|pass]\cdot \Pr[pass]}{\Pr[2nd\&short]}$$
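Plugging in the rounded values printed above gives a quick sanity check; the code cell reports 0.442 rather than 0.445 because it works with the unrounded intermediate values. Since the numerator equals the joint probability $\Pr[pass \cap 2nd\&short]$, the formula also reduces to the share of passes among 2nd and short plays:

$$
\Pr[pass|2nd\&short] \approx \frac{0.056 \times 0.596}{0.075} \approx 0.445,
\qquad
\Pr[pass|2nd\&short] = \frac{\Pr[pass \cap 2nd\&short]}{\Pr[2nd\&short]} = \frac{\#\{\text{passes on 2nd and short}\}}{\#\{\text{2nd and short plays}\}}
$$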

In [5]:
# estimate the conditional probability Pr[pass| 2nd and short] using Bayes theorem

p = (pr_evidence * p_pass_prior) / p_total_evidence
print(f"Pr[pass | 2nd and short] = {p:.3f}")

Pr[pass | 2nd and short] = 0.442
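Since every play is either a pass or a run, $\Pr[run|2nd\&short] = 1 - \Pr[pass|2nd\&short] \approx 1 - 0.442 = 0.558$, so on these data a run is the more probable play on 2nd and short. The sketch below, on hypothetical toy plays rather than the opponent data, computes the posterior both with Bayes' theorem and by directly counting passes among 2nd and short plays, then picks the likelier play. The function name `most_likely_play` is illustrative only:

```python
import pandas as pd

def most_likely_play(df, situation):
    """Return (Pr[pass | situation] via Bayes, via direct counting, likelier play).

    `situation` is a boolean Series aligned with `df`, True where the play
    matches the situation of interest (e.g., 2nd and short).
    """
    prior = df["Pass"].mean()                            # Pr[pass]
    likelihood = situation[df["Pass"] == 1].mean()       # Pr[situation | pass]
    evidence = situation.mean()                          # Pr[situation]
    posterior_bayes = likelihood * prior / evidence      # Bayes' theorem
    posterior_direct = df.loc[situation, "Pass"].mean()  # share of passes in the situation
    # A 50/50 tie defaults to "run" in this sketch
    call = "pass" if posterior_direct > 0.5 else "run"
    return posterior_bayes, posterior_direct, call

# Hypothetical toy plays, NOT the opponent data
toy = pd.DataFrame({
    "Pass":      [1, 1, 0, 0, 1, 0],
    "YardsToGo": [10, 2, 3, 1, 8, 4],
    "Down":      [1, 2, 2, 2, 2, 2],
})
is_2nd_short = (toy["Down"] == 2) & (toy["YardsToGo"] <= 4)
p_bayes, p_direct, call = most_likely_play(toy, is_2nd_short)
print(f"Bayes: {p_bayes:.3f}, direct: {p_direct:.3f}, call: {call}")
```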


You can practise with other situations yourself. For example, what is the probability that the offense runs the ball on 3rd and 10 (a 3rd and *long* situation, where *long* means 7 or more yards to go)? You can also compare these results with those obtained by using the exact yardage to go instead of grouping it into short, medium and long.
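One way to set up these exercises is a small helper that filters plays by down and by either the yardage bucket or the exact yardage, and reports the sample size alongside the probabilities. This is a sketch on hypothetical toy plays; the function names and the medium range of 5 to 6 yards are assumptions. It counts directly within the situation, which gives the same answer as the Bayes route above:

```python
import pandas as pd

def bucket(yards):
    # Assumed cut points: short <= 4, medium 5-6 (inferred), long >= 7
    return pd.cut(yards, bins=[0, 4, 6, float("inf")], labels=["short", "medium", "long"])

def situation_probs(df, down, yards, grouped=True):
    """Pr[pass] and Pr[run] on `down`, matching yardage by bucket or exactly."""
    on_down = df["Down"] == down
    if grouped:
        target = bucket(pd.Series([yards])).iloc[0]      # e.g. 10 -> "long"
        same_yards = bucket(df["YardsToGo"]) == target
    else:
        same_yards = df["YardsToGo"] == yards
    subset = df.loc[on_down & same_yards, "Pass"]
    p_pass = subset.mean()                               # NaN if no plays match
    return {"n_plays": len(subset), "pass": p_pass, "run": 1 - p_pass}

# Hypothetical toy plays, NOT the opponent data
toy = pd.DataFrame({
    "Pass":      [0, 0, 1, 0, 1, 1, 0],
    "YardsToGo": [10, 8, 10, 14, 7, 3, 10],
    "Down":      [3, 3, 3, 3, 3, 3, 2],
})
print(situation_probs(toy, down=3, yards=10))                  # 3rd and long
print(situation_probs(toy, down=3, yards=10, grouped=False))   # exactly 3rd and 10
```

With these toy plays, the grouped 3rd and 10 query draws on 5 plays while the exact query uses only 2, which illustrates in miniature the robustness argument from the hint.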