# Calculation of Basketball Scoring Probabilities

## Dataset Overview

The NBA Player Shot dataset contains information about shots taken during the 2022-2023 basketball season
by LeBron James, James Harden, and Steph Curry. This project analyzes only the shots taken by
Steph Curry.

## Dataset Source

https://www.kaggle.com/datasets/dhavalrupapara/nba-2023-player-shot-dataset

## Dataset Columns

top: The vertical position on the court where the shot was taken.

left: The horizontal position on the court where the shot was taken.

date: The date when the shot was taken. (e.g., Oct 18, 2022)

qtr: The quarter in which the shot was attempted, typically represented as "1st Qtr," "2nd Qtr," etc.

time_remaining: The time remaining in the quarter when the shot was attempted, typically displayed as minutes and seconds (e.g., 09:26).

result: Indicates whether the shot was successful, with "TRUE" for a made shot and "FALSE" for a missed shot.

shot_type: Describes the type of shot attempted, such as a "2" for a two-point shot or "3" for a three-point shot.

distance_ft: The distance in feet from the hoop to where the shot was taken.

lead: Indicates whether the team was leading when the shot was attempted, with "TRUE" for a lead and "FALSE" for no lead.

player_team_score: Steph Curry's team's score (in points) when the shot was taken.

opponent_team_score: The opposing team's score (in points) when the shot was taken.

opponent: The abbreviation for the opposing team (e.g., LAL for Los Angeles Lakers).

team: The abbreviation for Steph Curry's team (e.g., GSW for Golden State Warriors).

season: The season in which the shots were taken, indicated as the year (e.g., 2023).

color: Represents the color code associated with the shot, which may indicate shot outcomes or other characteristics (e.g., "red" or "green").



## Project Instructions

This project has four sections to complete.
The directions below cover each section.
Note: "made" or "make" means the shot was successful.

## Overall Shooting Statistics

On this slide, calculate and report the:

- Overall probability of a make (result = TRUE)
- Overall probability of a miss (result = FALSE)
- Overall proportion of three-pointers (shot_type = 3)
- Overall proportion of two-pointers (shot_type = 2)

## Probabilities

Given the results you found on the previous slide:

- What is the probability of Steph making 3 of the next 4 shots?
- What is the probability that 4 of the next 5 shots are three-pointers?
- Also, give the assumptions of the binomial distribution.

## Conditional Probabilities - Future

On this slide, use the previously calculated probabilities to calculate and report the following conditional probabilities, which will require Bayes' theorem:

- If the next shot Steph shoots is a three-pointer...
    - What is the probability he makes it?
    - What is the probability it was taken while his team had the lead (lead = True)?
- If the next shot Steph shoots is a two-pointer...
    - What is the probability he makes it?
    - What is the probability it was taken while his team had the lead (lead = True)?

## Conditional Probabilities - Past

On this slide, calculate the following conditional probabilities. If Steph just made a shot...

- What is the probability that it was a three-pointer?
- What is the probability that it was a two-pointer?

Note: For these retrospective (past-conditional) probabilities, one could obtain the answers by direct counting, but compute them using Bayes' theorem and show your steps.

Hint: First map each verbal description to notation. For example:

- "made three-pointer": P(Made,3)
- "three-pointer given a make": P(3∣Made)
- "make": P(Made)

Once labeled, it’s clear which terms to plug into Bayes' theorem.
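
In that notation, Bayes' theorem for the three-pointer case can be written as below. This is the standard statement of the theorem, with the joint probability and the law of total probability spelled out for reference; the two-pointer case follows by swapping 3 and 2.

```latex
P(\text{Made}, 3) = P(\text{Made} \mid 3)\, P(3)
\qquad
P(3 \mid \text{Made}) = \frac{P(\text{Made} \mid 3)\, P(3)}{P(\text{Made})}
\qquad
P(\text{Made}) = P(\text{Made} \mid 3)\, P(3) + P(\text{Made} \mid 2)\, P(2)
```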



In [1]:
# imports
import math
import pandas as pd

## Overview of the data

In [2]:
df = pd.read_csv('stephen_curry_shots_2023.csv')
df.head()

Unnamed: 0,top,left,date,qtr,time_remaining,result,shot_type,distance_ft,lead,player_team_score,opponent_team_score,opponent,team,season,color
0,63,300,"Oct 18, 2022",1st Qtr,7:27,False,2,6,True,9,6,LAL,GSW,2023,red
1,133,389,"Oct 18, 2022",1st Qtr,7:22,True,2,17,True,11,6,LAL,GSW,2023,green
2,326,247,"Oct 18, 2022",1st Qtr,7:11,False,3,27,True,11,6,LAL,GSW,2023,red
3,249,89,"Oct 18, 2022",1st Qtr,5:16,False,3,25,True,19,13,LAL,GSW,2023,red
4,282,158,"Oct 18, 2022",1st Qtr,3:52,False,3,24,True,22,17,LAL,GSW,2023,red


In [3]:
# count missing values in the dataframe
missing_values = df.isnull().sum()
print(missing_values) # no missing values

top                    0
left                   0
date                   0
qtr                    0
time_remaining         0
result                 0
shot_type              0
distance_ft            0
lead                   0
player_team_score      0
opponent_team_score    0
opponent               0
team                   0
season                 0
color                  0
dtype: int64


In [4]:
total = len(df) # number of rows in dataframe
print(total)
print()

1434



## Overall Shooting Statistics (Part 1):

- Overall probability of a make (result = TRUE)
- Overall probability of a miss (result = FALSE)

In [5]:
# relative frequency of each outcome, computed once and reused
result_freq = df['result'].value_counts(normalize=True)
makes = result_freq.get(True, 0)
misses = result_freq.get(False, 0)

In [6]:
print(f"Overall probability of a make: {makes:.4f}")
print(f"Overall probability of a miss: {misses:.4f}")
print()

Overall probability of a make: 0.4902
Overall probability of a miss: 0.5098
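
Because `result` is a True/False column, the probability of a make is simply the fraction of True values. A minimal sketch of that idea, using only the five `result` values visible in the `df.head()` output (so the numbers are not the season figures); the variable names are illustrative:

```python
from fractions import Fraction

# result values of the first five rows shown by df.head()
head_results = [False, True, False, False, False]

p_make = Fraction(sum(head_results), len(head_results))  # each True counts as 1
p_miss = 1 - p_make                                       # complement rule

print(p_make, p_miss)  # 1/5 4/5
```

The same complement relationship holds for the full-season values above: 0.4902 + 0.5098 = 1.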



## Overall Shooting Statistics (Part 2):

- Overall proportion of three-pointers (shot_type = 3)
- Overall proportion of two-pointers (shot_type = 2)

In [7]:
# relative frequency of each shot type, computed once and reused
shot_type_freq = df['shot_type'].value_counts(normalize=True)
threes = shot_type_freq.get(3, 0)
twos = shot_type_freq.get(2, 0)

In [8]:
print(f"Overall proportion of three-pointers: {threes:.4f}")
print(f"Overall proportion of two-pointers: {twos:.4f}")
print()

Overall proportion of three-pointers: 0.5481
Overall proportion of two-pointers: 0.4519



## Probabilities (Part 1):

- What is the probability of Steph making 3 of the next 4 shots?

In [9]:
# https://en.wikipedia.org/wiki/Combination
def combination(n, k):
    # number of ways to choose k items from n; integer division keeps the count exact
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

In [10]:
# Probability of making 3 of the next 4 shots
n_shots = 4
k_makes = 3
# Binomial formula: P(X = k) = n! / (k! * (n - k)!) * p**k * (1 - p)**(n - k)
prob_make_3_of_4 = combination(n_shots, k_makes) * (makes**k_makes) * ((1 - makes)**(n_shots - k_makes))

In [11]:
print(f"Probability of Steph making 3 of the next 4 shots: {prob_make_3_of_4:.4f}")
print()

Probability of Steph making 3 of the next 4 shots: 0.2402
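
As a cross-check, the same binomial probability can be recomputed with the standard library's `math.comb` (Python 3.8+). The sketch below plugs in the rounded make probability 0.4902 printed earlier rather than the full-precision value, so agreement is only expected to about four decimal places; `binomial_pmf` is a helper name introduced here for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

p_make = 0.4902  # rounded value from the earlier output

print(f"{binomial_pmf(3, 4, p_make):.4f}")  # 0.2402

# the probabilities of 0, 1, 2, 3, or 4 makes cover every case, so they sum to 1
total = sum(binomial_pmf(k, 4, p_make) for k in range(5))
```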



## Probabilities (Part 2):

- What is the probability that 4 of the next 5 shots are three-pointers?

In [12]:
# Probability that 4 of the next 5 shots are three-pointers
n_shots = 5
k_threes = 4
# Binomial formula: P(X = k) = n! / (k! * (n - k)!) * p**k * (1 - p)**(n - k)
prob_4_of_5_three_pointers = combination(n_shots, k_threes) * (threes**k_threes) * ((1 - threes)**(n_shots - k_threes))

In [13]:
print(f"Probability that 4 of the next 5 shots are three-pointers: {prob_4_of_5_three_pointers:.4f}")
print()

Probability that 4 of the next 5 shots are three-pointers: 0.2039



## The Probability Calculation Results Above Assume:

  - The number of shots (trials) is fixed in advance (4 and 5 here)
  - Each shot is independent of the others
  - Each shot has exactly two possible outcomes:
      - make or miss
      - 2-pointer or 3-pointer
  - The probability of each outcome is the same for every shot
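
To see where these assumptions enter, the binomial formula can be rebuilt by listing every make/miss sequence. Independence is what lets the per-shot probabilities be multiplied, and a constant probability is what makes every sequence with the same number of makes equally likely. This is a sketch using the rounded make probability 0.4902; the variable names are illustrative.

```python
from itertools import product
from math import comb, isclose, prod

p = 0.4902   # rounded make probability reported earlier
n, k = 4, 3  # 3 makes in 4 shots

# all make/miss sequences of length n that contain exactly k makes
sequences = [s for s in product([True, False], repeat=n) if sum(s) == k]

# independence: the probability of a sequence is the product of per-shot probabilities
by_enumeration = sum(prod(p if made else 1 - p for made in s) for s in sequences)

# closed-form binomial probability for comparison
by_formula = comb(n, k) * p**k * (1 - p)**(n - k)

print(len(sequences))                       # 4
print(isclose(by_enumeration, by_formula))  # True
```

If shots were not independent (for example, a "hot hand" effect), the per-sequence products would no longer be valid and the formula would not apply.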


## Conditional Probabilities - Future (Part 1):

- If the next shot Steph shoots is a three-pointer...
    - What is the probability he makes it?
    - What is the probability it was taken while his team had the lead (lead = True)?

In [14]:
three_pointer_df = df[df['shot_type'] == 3]

prob_make_three_pointer = three_pointer_df['result'].value_counts(normalize=True).get(True, 0)
prob_three_pointer_with_lead = three_pointer_df['lead'].value_counts(normalize=True).get(True, 0)

print(f"Probability of making a three-pointer: {prob_make_three_pointer:.4f}")
print(f"Probability of a three-pointer being taken while the team had the lead: {prob_three_pointer_with_lead:.4f}")
print()

Probability of making a three-pointer: 0.4186
Probability of a three-pointer being taken while the team had the lead: 0.4936



## Conditional Probabilities - Future (Part 2):

- If the next shot Steph shoots is a two-pointer...
    - What is the probability he makes it?
    - What is the probability it was taken while his team had the lead (lead = True)?

In [15]:
two_pointer_df = df[df['shot_type'] == 2]

prob_make_two_pointer = two_pointer_df['result'].value_counts(normalize=True).get(True, 0)
prob_two_pointer_with_lead = two_pointer_df['lead'].value_counts(normalize=True).get(True, 0)

print(f"Probability of making a two-pointer: {prob_make_two_pointer:.4f}")
print(f"Probability of a two-pointer being taken while the team had the lead: {prob_two_pointer_with_lead:.4f}")
print()

Probability of making a two-pointer: 0.5772
Probability of a two-pointer being taken while the team had the lead: 0.5154
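
The two conditional make probabilities can be checked against the overall make probability through the law of total probability, P(Made) = P(Made|3)·P(3) + P(Made|2)·P(2). The sketch below uses the rounded values printed in this notebook, so a small gap in the fourth decimal place is expected; the variable names are illustrative.

```python
# rounded values copied from the notebook outputs
p_three, p_two = 0.5481, 0.4519
p_make_given_three, p_make_given_two = 0.4186, 0.5772
p_make_reported = 0.4902

# law of total probability over the two shot types
p_make_total = p_make_given_three * p_three + p_make_given_two * p_two

print(f"{p_make_total:.4f}")  # 0.4903, versus 0.4902 reported (rounded inputs)
```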



## Conditional Probabilities - Past (Part 1)

Calculate the following conditional probability. If Steph just made a shot...

- What is the probability that it was a three-pointer?

In [16]:
made_shots_df = df[df['result'] == True]

prob_three_pointer_given_make = made_shots_df['shot_type'].value_counts(normalize=True).get(3, 0)

print(f"Conditional probability of a made shot being a three-pointer: {prob_three_pointer_given_make:.4f}")
print()

Conditional probability of a made shot being a three-pointer: 0.4680
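
The cell above gets this value by direct counting. The same number can be reached through Bayes' theorem, P(3|Made) = P(Made|3)·P(3) / P(Made), as the project instructions ask. A minimal sketch using the rounded probabilities printed earlier; `bayes` is a helper name introduced here, not part of the original notebook.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_make_given_three = 0.4186  # P(Made | 3)
p_three = 0.5481             # P(3)
p_make = 0.4902              # P(Made)

p_three_given_make = bayes(p_make_given_three, p_three, p_make)
print(f"{p_three_given_make:.4f}")  # 0.4680
```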



## Conditional Probabilities - Past (Part 2)

Calculate the following conditional probability. If Steph just made a shot...

- What is the probability that it was a two-pointer?

In [17]:
prob_two_pointer_given_make = made_shots_df['shot_type'].value_counts(normalize=True).get(2, 0)

print(f"Conditional probability of a made shot being a two-pointer: {prob_two_pointer_given_make:.4f}")
print()

Conditional probability of a made shot being a two-pointer: 0.5320
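
For comparison with the direct count, the Bayes' theorem steps for the two-pointer case are worked out below using the rounded probabilities reported earlier. The result matches 0.5320 up to rounding of the inputs.

```latex
P(2 \mid \text{Made})
  = \frac{P(\text{Made} \mid 2)\, P(2)}{P(\text{Made})}
  = \frac{0.5772 \times 0.4519}{0.4902}
  \approx \frac{0.2608}{0.4902}
  \approx 0.532
```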

