# Analyzing the Impact of Stephen Curry on 3-Point Shooting

## Introduction

The goal of this project is to analyze the impact of Stephen Curry on the NBA's adoption of the 3-point shot. We will explore how the league's 3-point attempt rate (3PAR), the share of field goal attempts taken from 3-point range, has changed over the years and how Curry's individual 3PAR and 3-point percentage compare to the league average.

Data Source:

*   Basketball-Reference.com: [https://www.basketball-reference.com/](https://www.basketball-reference.com/)

## Data Cleaning

### Cleaning League Averages

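This cell cleans the raw league-average data: it promotes the correct header row, keeps only the columns needed for the analysis, drops rows without a valid season, and converts the remaining values to numeric types. The 3PAR itself is calculated later, in the analysis section.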

In [12]:
import pandas as pd

# Loading the data
df = pd.read_csv('league_averages.csv', header=None)

# Setting the second row as the header
df.columns = df.iloc[1]

# Removing the first two rows (old header and now-duplicate column names)
df = df.iloc[2:]

# Selecting only the columns we need
df = df[['Season', '3P', '3PA', '3P%', 'FGA']]

# Converting 'Season' to numeric for easier filtering
def convert_season(season):
  try:
    return int(season.split('-')[0])  # Extract the first year of the season, e.g. '1979-80' -> 1979
  except (AttributeError, ValueError):
    return None  # For values that are not a 'YYYY-YY' season label (e.g. missing values)

df['Season'] = df['Season'].apply(convert_season)
df = df.dropna(subset=['Season'])  # Remove rows with invalid seasons
df['Season'] = df['Season'].astype(int)  # Convert to integer explicitly

# Keeping seasons from 1979-80 onward (the season the 3-point line was introduced)
df = df[df['Season'] >= 1979]

# Removing the '2024-25' season as it just started.
df = df[df['Season'] < 2024]  

# Converting columns to numeric
df['3P'] = pd.to_numeric(df['3P'])
df['3PA'] = pd.to_numeric(df['3PA'])
df['3P%'] = pd.to_numeric(df['3P%'])
df['FGA'] = pd.to_numeric(df['FGA'])

# Saving the cleaned DataFrame to a new CSV file
df.to_csv('cleaned_league_averages.csv', index=False)
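
The season conversion used in this cell can be checked in isolation. The snippet below is a minimal sketch using only the standard library; the sample labels are illustrative inputs, not rows taken from the raw file.

```python
def convert_season(season):
    """Return the starting year of a 'YYYY-YY' season label, or None if it cannot be parsed."""
    try:
        return int(season.split('-')[0])
    except (AttributeError, ValueError):
        # AttributeError: the value is not a string (e.g. None or NaN)
        # ValueError: the text before the first '-' is not a number
        return None

# Illustrative inputs: two valid labels, a stray header cell, and a missing value
examples = ['1979-80', '2023-24', 'Season', None]
for label in examples:
    print(label, '->', convert_season(label))
```

Catching only `AttributeError` and `ValueError` keeps unrelated errors visible instead of silently turning them into `None`.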

### Cleaning Steph Curry Averages

This cell cleans Curry's raw per-season averages in the same way: it keeps only the columns needed for the analysis, drops rows without a valid season, and converts the remaining values to numeric types. I removed the 2019 season (2019-20) because Curry played only 5 games that year due to an injury.

In [13]:
import pandas as pd

# Loading the data
df = pd.read_csv('steph_curry_averages.csv', header=0)  

# Selecting the columns we need
df = df[['Season', '3P', '3PA', '3P%', 'FGA']]

# Converting 'Season' to numeric for easier filtering
def convert_season(season):
  try:
    return int(season.split('-')[0])  # Extract the first year of the season
  except (AttributeError, ValueError):
    return None  # For values that are not a 'YYYY-YY' season label (e.g. missing values)

df['Season'] = df['Season'].apply(convert_season)
df.dropna(subset=['Season'], inplace=True)  # Remove rows with invalid seasons
df['Season'] = df['Season'].astype(int)  # Convert to integer explicitly
# Remove the '2024-25' season
df = df[df['Season'] < 2024]  # Keep seasons before 2024
df = df[df['Season'] != 2019] # Remove 2019 season as Curry played only 5 games due to injury.

# Converting columns to numeric
df['3P'] = pd.to_numeric(df['3P'])
df['3PA'] = pd.to_numeric(df['3PA'])
df['3P%'] = pd.to_numeric(df['3P%'])
df['FGA'] = pd.to_numeric(df['FGA'])

# Sorting by 'Season' in descending order (most recent season first)
df = df.sort_values(by='Season', ascending=False)

# Saving the cleaned DataFrame to a new CSV file
df.to_csv('cleaned_steph_curry_averages.csv', index=False)

## Data Analysis

In [5]:
import pandas as pd
import altair as alt
from scipy import stats

In [14]:
# Loading the cleaned datasets
league_avg_df = pd.read_csv('cleaned_league_averages.csv')
curry_df = pd.read_csv('cleaned_steph_curry_averages.csv')

# Merging the DataFrames on the 'Season' column
merged_df = pd.merge(league_avg_df, curry_df, on='Season', suffixes=('_league', '_curry'))

# Calculating 3PAR for the league and Curry
merged_df['3PAR_league'] = merged_df['3PA_league'] / merged_df['FGA_league']
merged_df['3PAR_curry'] = merged_df['3PA_curry'] / merged_df['FGA_curry']
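
As a quick sanity check on the 3PAR calculation, the formula can be applied by hand to a single season. The sketch below uses the 2023 per-game values that appear in the table printed later in this notebook; the helper name `three_point_attempt_rate` is introduced here only for illustration.

```python
def three_point_attempt_rate(three_pa, fga):
    """3PAR = 3-point attempts divided by total field goal attempts."""
    return three_pa / fga

# 2023 season values from the merged table (3PA and FGA per game)
league_3par = three_point_attempt_rate(35.1, 88.9)
curry_3par = three_point_attempt_rate(11.8, 19.5)

print(f"League 3PAR: {league_3par:.6f}")
print(f"Curry 3PAR:  {curry_3par:.6f}")
```

These should agree with the `3PAR_league` and `3PAR_curry` columns for the 2023 row.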

In [15]:
# 1. Line chart for 3PAR over seasons
line_chart = alt.Chart(merged_df).mark_line(point=True).encode(  # Add point markers
    x='Season:O',
    y=alt.Y('3PAR_league:Q', title='3-Point Attempt Rate (3PAR)'),
    color=alt.value('blue'),
    tooltip=['Season', '3PAR_league']
).properties(
    title='League 3PAR Over Seasons'
)

line_chart_curry = alt.Chart(merged_df).mark_line(point=True).encode(  # Add point markers
    x='Season:O',
    y=alt.Y('3PAR_curry:Q', title='3-Point Attempt Rate (3PAR)'),
    color=alt.value('orange'),
    tooltip=['Season', '3PAR_curry']
).properties(
    title='Curry 3PAR Over Seasons'
)

final_line_chart = line_chart + line_chart_curry

# Enabling pan and zoom (note: interactive() does not add a legend)
final_line_chart = final_line_chart.interactive()

final_line_chart.display()

In [16]:
# 2. Scatter plot for Curry's 3PAR vs. League's 3PAR
scatter_plot = alt.Chart(merged_df).mark_point().encode(
    x=alt.X('3PAR_curry:Q', title="Curry's 3PAR"),
    y=alt.Y('3PAR_league:Q', title="League's 3PAR"),
    tooltip=['Season', '3PAR_curry', '3PAR_league']
).properties(
    title="Curry's 3PAR vs. League's 3PAR"
).interactive()
scatter_plot

In [17]:
# Calculating the correlation matrix
correlation_matrix = merged_df.corr()

# Extracting the correlation coefficient for the `3PAR_league` and `3PAR_curry` columns
correlation_coefficient = correlation_matrix.loc['3PAR_league', '3PAR_curry']

# Performing linear regression analysis
slope, intercept, r_value, p_value, std_err = stats.linregress(merged_df['3PAR_curry'], merged_df['3PAR_league'])

# Print the results
print(f"Correlation coefficient: {correlation_coefficient:.3f}")
print("Linear Regression:")
print(f"  Slope: {slope:.3f}")
print(f"  Intercept: {intercept:.3f}")
print(f"  R-squared: {r_value**2:.3f}")
print(f"  P-value: {p_value:.3f}")
print(f"  Standard error: {std_err:.3f}")

Correlation coefficient: 0.912
Linear Regression:
  Slope: 0.634
  Intercept: -0.013
  R-squared: 0.832
  P-value: 0.000
  Standard error: 0.082


The correlation coefficient is 0.912, indicating a strong positive relationship between Curry's 3PAR and the league's 3PAR.

The linear regression analysis shows that Curry's 3PAR is a statistically significant predictor of the league's 3PAR, with an R-squared value of 0.832 and a p-value that rounds to 0.000.
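
For a regression with a single predictor, R-squared is simply the square of the Pearson correlation coefficient, so the two printed figures can be checked against each other. The following is a minimal sketch using only the standard library; `pearson_r` is a helper written for this illustration, and the toy data is hypothetical rather than taken from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data lying exactly on a line, so r should be 1 (up to rounding)
toy_x = [0.1, 0.2, 0.3, 0.4]
toy_y = [2 * x + 0.05 for x in toy_x]
r = pearson_r(toy_x, toy_y)
print(f"Toy correlation: {r:.3f}")

# Consistency check on the printed output: squaring the reported correlation
# should reproduce the reported R-squared
reported_r = 0.912
r_squared = round(reported_r ** 2, 3)
print(f"R-squared implied by r = {reported_r}: {r_squared}")
```

Because both series trend upward over the same seasons, a high R-squared here describes how closely they move together, not which one drives the other.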

Correlation alone cannot establish that Stephen Curry caused the entire NBA to shoot more threes, since both series rise over the same period. Still, it is reasonable to infer that he played a large role in showing how valuable the 3-point shot can be, speeding up the process of teams learning to emphasize it.


In [18]:
# Calculating the difference in 3-point percentage between Curry and the league average for each season
merged_df['3P%_diff'] = merged_df['3P%_curry'] - merged_df['3P%_league']

# Printing the first 5 rows of the dataframe
print(merged_df.head().to_markdown(index=False, numalign="left", stralign="left"))

| Season   | 3P_league   | 3PA_league   | 3P%_league   | FGA_league   | 3P_curry   | 3PA_curry   | 3P%_curry   | FGA_curry   | 3PAR_league   | 3PAR_curry   | 3P%_diff   |
|:---------|:------------|:-------------|:-------------|:-------------|:-----------|:------------|:------------|:------------|:--------------|:-------------|:-----------|
| 2023     | 12.8        | 35.1         | 0.366        | 88.9         | 4.8        | 11.8        | 0.408       | 19.5        | 0.394826      | 0.605128     | 0.042      |
| 2022     | 12.3        | 34.2         | 0.361        | 88.3         | 4.9        | 11.4        | 0.427       | 20.2        | 0.387316      | 0.564356     | 0.066      |
| 2021     | 12.4        | 35.2         | 0.354        | 88.1         | 4.5        | 11.7        | 0.38        | 19.1        | 0.399546      | 0.612565     | 0.026      |
| 2020     | 12.7        | 34.6         | 0.367        | 88.4         | 5.3        | 12.7        | 0.421       | 21.7        | 0.391403      | 0.

In [19]:
# Bar chart of the difference in 3-point percentage
chart = alt.Chart(merged_df).mark_bar().encode(
    x='Season:O',
    y=alt.Y('3P%_diff:Q', title='Difference in 3-Point Percentage'),
    color=alt.value('purple'),
    tooltip=['Season', '3P%_diff']
).properties(
    title='Difference in 3-Point Percentage (Curry - League)'
).interactive()

# Save the chart
chart.save('difference_in_3_point_percentage_bar_chart.json')

# Display the chart
chart.display()
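
The bar heights can be verified by hand from the merged table. The sketch below recomputes the difference for the three most recent seasons using the `3P%_curry` and `3P%_league` values printed above; the dictionary layout is an assumption made for this illustration.

```python
# (Curry 3P%, league 3P%) per season, copied from the printed table
three_point_pct = {
    2023: (0.408, 0.366),
    2022: (0.427, 0.361),
    2021: (0.380, 0.354),
}

# Difference in 3-point percentage (Curry minus league), rounded to 3 decimals
pct_diff = {
    season: round(curry - league, 3)
    for season, (curry, league) in three_point_pct.items()
}

for season, diff in pct_diff.items():
    print(f"{season}: {diff:+.3f}")
```

A positive value means Curry shot a higher percentage from three than the league average that season.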

Stephen Curry's exceptional 3-point shooting percentage is even more impressive considering the high volume and difficulty of his self-created shots.