# Effectiveness of the Mid-Range in the Playoffs: Exploratory Analysis
The NBA is an ever-changing landscape stylistically. The game today in 2023 might be unrecognizable to some as the same product it was even 10 years ago. We are now in a new era characterized by high-scoring, fast-paced, 3-pointer-heavy basketball, ushered in by the Golden State Warriors dynasty that began nearly 10 years ago. While not every single organization has adopted their exact style of play, the amount of three-pointers attempted across the league has been trending upwards for decades now.

With this new era has come endless debate about whether the league's gravitation towards the 3 point line and away from the mid-range is good for the game or not. However, the more interesting discussion is whether it's necessary: Can teams still win in the playoffs shooting a high amount of mid-range shots? Can they completely ditch the mid-range and get away with it? Or is there some in between range that is optimal? These are the main questions we aim to explore.

### What the Math Says
From a pure numbers perspective, the answer is obvious: on average, the mid-range shot is inefficient. NBA players have simply become too good at shooting threes, and do not shoot a high enough percentage from the mid-range  to make it an efficient shot compared to shots in the paint or from beyond the arc. Simply put: 3 is greater than 2. You only need to shoot 33.3% from deep to score 1 point per possession, whereas you need to shoot 50% to reach the same efficiency on mid-range shots.

For the purposes of illustrating this point, let's think of the NBA court into 5 zones where players attempt shots: the restricted area, paint, mid-range, corner 3, and the above the break 3. Of the 128 teams to appear in the postseason the last 8 years (16 teams each season), 125 of them shot at least 50% (1 point per possession) in the restricted area, ~98%. That's not surprising right? Shots closest to the rim are bound to be the easiest. In the paint (excluding the RA), 9 teams shot above 50%, just 7 percent. From the corner 3, 100 teams shot at least 33.3% (78% of all teams), and from the above the break 3, 76 teams shot at least 33.3% (59% of all teams). Now for the mid-range: Just 3 teams of 128 shot above 50% from the mid-range in the playoffs the last 8 years. It's by far been the least efficient of the 5 zones for NBA teams in recent memory.

One possible conclusion to draw here is that teams should start ignoring the mid-range. If a team has the personnel to generate a bunch of open or semi-open threes, they should take those shots, because on average they'll score more points per possession on threes (by a lot) than from the mid-range. There are actually several case studies for teams that actually did this, and we'll explore what went wrong for those teams later.

### The Counter-Argument

There are several problems with the statement 'shots from the mid-range are inefficient'. First off, a team's ability to score from different zones on the court will always depend on their personnel. Certain players are incredibly prolific from around 10-15 feet away from the basket, and just because on average those shots aren't great doesn't mean they're bad for those players.

Secondly, just because a zone is relatively inefficient to another zone doesn't mean it's necessarily a good idea to completely or even mostly eliminate that zone from your shot arsenal. When a team relies on threes a lot, it can make their offense predictable. Defenses can close out hard on the 3 point line because they know their opponent will be reluctant to take non-3-point jump shots.

Another consequence of being a 3-point heavy offense is that you open yourself up to 3-point variance, which we have seen affect playoff series time and again. Yes, 3-pointers are efficient, but they are harder to make and when you get cold from long range, it can be devastating in a short sample of games like a playoff series.

### Methodology

Data for every playoff team's shooting stats in each zone as well as their offensive rating going back to the 2015-16 season was pulled from nba.com/stats. I copy + pasted the html of the relevant tables into txt files, then opened them here in my python environment at which point I combined the data which was across 8 seasons into one data frame. This allows us to chart all 128 playoff teams we're interested in in a scatterplot, with the X axis being mid-range field goal attempts per game and the Y axis being offensive rating.


In [3]:
# There's two sets of tables imported in this cell, the zone stats for each team, and then their oRTG. These are done
# separately because the oRTGs are stored in a different section of the nba.com/stats website from the zone stats,
# so they weren't compatible with each other in the same iterable
from bs4 import BeautifulSoup

years = ['23', '22', '21', '20', '19', '18', '17', '16']

html_team_list = []
html_ortg_list = []

# Open team zone shooting data in the playoffs for all 8 years
for year in years:
    with open(f'/content/drive/MyDrive/yearly team midrange fga/{year}TeamMR.txt', 'r') as file:
        lines = file.readlines()
        html_team = ''.join(lines)
        html_team_list.append(html_team)
        soup = BeautifulSoup(html_team, 'html.parser')

# Open team offensive rating in the playoffs for all 8 years
for year in years:
    with open(f'/content/drive/MyDrive/yearly team ortg/{year}oRTG.txt', 'r') as file:
        lines = file.readlines()
        html_ortg = ''.join(lines)
        html_ortg_list.append(html_ortg)
        soup = BeautifulSoup(html_ortg, 'html.parser')


In [4]:
# Import zone stats in the playoffs data and putting it in 1 table

import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# Define a list of years and corresponding html objects
years = [2023, 2022, 2021, 2020, 2019, 2018, 2017, 2016]
htmls = html_team_list

df_list = []  # list to hold all modified dataframes

# Loop over each year and html
for year, html in zip(years, htmls):
    # Re-read the file to get the original DataFrame
    tables = pd.read_html(html)
    df_original = tables[0]

    # Simplify the column headers
    df_original.columns = ['_'.join(col).strip() for col in df_original.columns.values]


    # Select the columns we're interested in

    df_modified = df_original[['Unnamed: 0_level_0_Team', 'Mid-Range_FG%', 'Mid-Range_FGA', 'Corner 3_FG%','Corner 3_FGA','Above the Break 3._FG%', 'Above the Break 3._FGA', 'Restricted Area_FG%','Restricted Area_FGA', 'In The Paint (Non-RA)_FG%','In The Paint (Non-RA)_FGA']].copy()

    # Rename team name column
    df_modified.rename(columns={
        'Unnamed: 0_level_0_Team': 'Team_name',
    }, inplace=True)

    #Add year prefix to 'Team_name' column so we can the same team in different seasons apart
    df_modified['Team_name'] = df_modified['Team_name'].apply(lambda x: f'{year} ' + x)

    # Append the modified dataframe to the list
    df_list.append(df_modified)

 #Concatenate all dataframes in the list into one dataframe
df_final = pd.concat(df_list)

# Create a new column 'Total_FGA' that represents the sum of all 5 FGA columns
df_final['Total_FGA'] = df_final[['Mid-Range_FGA', 'Corner 3_FGA', 'Above the Break 3._FGA', 'Restricted Area_FGA', 'In The Paint (Non-RA)_FGA']].sum(axis=1)

# Create new columns to represent the percentage of total field goal attempts from each of the 5 zones
df_final['Percent of Total Midrange'] = df_final['Mid-Range_FGA'] / df_final['Total_FGA']
df_final['Percent of Total Corner 3'] = df_final['Corner 3_FGA'] / df_final['Total_FGA']
df_final['Percent of Total Above Break 3'] = df_final['Above the Break 3._FGA'] / df_final['Total_FGA']
df_final['Percent of Total Restricted Area'] = df_final['Restricted Area_FGA'] / df_final['Total_FGA']
df_final['Percent of Total In The Paint (Non-RA)'] = df_final['In The Paint (Non-RA)_FGA'] / df_final['Total_FGA']

# Display the final dataframe, sorted by mid-range FG% descending
df_sorted_mr = df_final.sort_values(by='Mid-Range_FG%', ascending=False)
df_sorted_mr.head()

Unnamed: 0,Team_name,Mid-Range_FG%,Mid-Range_FGA,Corner 3_FG%,Corner 3_FGA,Above the Break 3._FG%,Above the Break 3._FGA,Restricted Area_FG%,Restricted Area_FGA,In The Paint (Non-RA)_FG%,In The Paint (Non-RA)_FGA,Total_FGA,Percent of Total Midrange,Percent of Total Corner 3,Percent of Total Above Break 3,Percent of Total Restricted Area,Percent of Total In The Paint (Non-RA)
9,2021 Brooklyn Nets,51.5,13.6,36.1,5.1,38.8,30.3,61.4,20.5,45.5,16.5,86.0,0.15814,0.059302,0.352326,0.238372,0.19186
14,2021 Memphis Grizzlies,51.2,8.6,25.0,5.6,35.2,25.6,68.8,25.0,41.8,29.2,94.0,0.091489,0.059574,0.27234,0.265957,0.310638
8,2020 Dallas Mavericks,50.7,12.5,46.4,9.3,32.7,28.0,67.6,22.7,42.4,16.5,89.0,0.140449,0.104494,0.314607,0.255056,0.185393
11,2021 LA Clippers,49.6,11.8,36.1,10.1,38.8,26.2,64.8,21.1,45.5,12.7,81.9,0.144078,0.123321,0.319902,0.257631,0.155067
7,2020 Oklahoma City Thunder,49.5,13.6,32.8,8.3,33.3,27.9,59.1,23.4,34.9,11.9,85.1,0.159812,0.097532,0.32785,0.274971,0.139835


In [5]:
# This cell is for importing team oRTG
ortg_htmls = html_ortg_list

df_ortg_list = []  # list to hold all modified dataframes

for year, ortg_html in zip(years, ortg_htmls):
    tables = pd.read_html(ortg_html)  # use ortg_html instead of ortg_htmls
    df_ortg = tables[0]

    # Select only the desired columns
    df_ortg_modified = df_ortg[['TEAM', 'OffRtg']].copy()

    df_ortg_modified.rename(columns={
        'TEAM': 'Team_name',
    }, inplace=True)
    # Add year prefix to 'Team_name' column
    df_ortg_modified['Team_name'] = df_ortg_modified['Team_name'].apply(lambda x: f'{year} ' + x)

    # Append the modified dataframe to the list
    df_ortg_list.append(df_ortg_modified)

# Concatenate all dataframes in the list into one dataframe
df_ortg_final = pd.concat(df_ortg_list)

# Display the final dataframe
df_ortg_final.head()

Unnamed: 0,Team_name,OffRtg
0,2023 Denver Nuggets,118.2
1,2023 LA Clippers,116.3
2,2023 Phoenix Suns,116.1
3,2023 Boston Celtics,116.1
4,2023 Atlanta Hawks,114.3


In [6]:
# Here, we join the two data frames on the 'team_name' column, so all our data is in one place.
df_new = pd.merge(df_sorted_mr[['Team_name', 'Mid-Range_FGA']], df_ortg_final[['Team_name', 'OffRtg']], on='Team_name')

# Designate the teams in our data set that won the NBA championship and add an asterisk, so we can view them when we
# plot the data
championship_winners = ['2016 Cleveland Cavaliers', '2017 Golden State Warriors', '2018 Golden State Warriors',
                        '2019 Toronto Raptors', '2020 Los Angeles Lakers', '2021 Milwaukee Bucks',
                        '2022 Golden State Warriors', '2023 Denver Nuggets']

def add_asterisk(team_name):
    if team_name in championship_winners:
        return '*' + team_name
    else:
        return team_name

# Apply the function to the 'Team_name' column
df_new['Team_name'] = df_new['Team_name'].apply(add_asterisk)
df_new.head()

Unnamed: 0,Team_name,Mid-Range_FGA,OffRtg
0,2021 Brooklyn Nets,13.6,115.9
1,2021 Memphis Grizzlies,8.6,115.0
2,2020 Dallas Mavericks,12.5,112.1
3,2021 LA Clippers,11.8,119.2
4,2020 Oklahoma City Thunder,13.6,101.7


In [7]:
import plotly.graph_objects as go
import plotly.express as px

# Create a function to determine the color based on the 'Team_name' column
def get_color(team_name):
    return 'gold' if team_name.startswith('*') else 'blue'

# Apply the function to create a new 'color' column
df_new['color'] = df_new['Team_name'].apply(get_color)

# Create a scatter plot with a trendline
fig = px.scatter(df_new, x='Mid-Range_FGA', y='OffRtg', trendline="ols")

# Add the colored points separately
fig.add_trace(go.Scatter(x=df_new['Mid-Range_FGA'], y=df_new['OffRtg'], mode='markers',
                         marker=dict(color=df_new['color']),
                         hovertext=df_new['Team_name']))

fig.update_traces(marker=dict(size=12, line=dict(width=2)), selector=dict(mode='markers'))

fig.update_layout(
    autosize=False,
    width=800,
    height=600
)

fig.show()

# Takeaways From This Chart
It's important to understand that this scatterplot is not meant to claim a strong relationship between mid-range FGA and OffRtg. There are a great many variables that affect a team's offensive rating outside of how many mid-range shots they take, so mid-range_FGA should (and does) only explain a very small percentage of the change in OffRtg. The factors that influence a team's shooting zones are nuanced and highly personnel-based, and this relationship is obviously more complex than a direct one-to-one correlation.

However, we can observe by hovering over the OLS trendline that there is a slight negative correlation here: For each additional mid-range attempt per game, we can expect a team's offensive rating to decrease by 0.316. Even more interesting is that when we look at the data points of teams that have won championships, they are all in somewhat the same area. No team that has attempted less than 10 mid-range shots per game has won a championship in the last 8 years, and only one team has won shooting more than 20 mid-range shots: The 2018 Warriors at 20.4.

This suggests that there is a "sweet spot" as far as optimal number of mid-range shots to take; not too few but not too many.

An inherent shortcoming of this data: Not every team in this chart played the same amount of games. Some teams made it to the finals or conference finals and played 15+ games, others got swept in the 1st round and only played 4 games. Obviously, the bigger the sample size the better, so some of our data points hold more significance than others, which has the potential to skew our results a bit.

To somewhat address this issue, let's filter our 128 teams down to the 16 teams that played in the NBA finals. This way, we have a larger sample of teams than just 8, while also controlling for games played. Every one of these 16 teams played in exactly 4 playoff series, and although there is still some variability in the exact number of games played, filtering out the teams that lost in the first or second round should significantly reduce that variation.

In [8]:
# Designate the teams in our data set that were NBA championship runners-up, and add a symbol so we can view them when we
# plot the data
championship_second = [
    "2016 Golden State Warriors",
    "2017 Cleveland Cavaliers",
    "2018 Cleveland Cavaliers",
    "2019 Golden State Warriors",
    "2020 Miami Heat",
    "2021 Phoenix Suns",
    "2022 Boston Celtics",
    "2023 Miami Heat"
]

def add_asterisk2(team_name):
    if team_name in championship_second:
        return '^' + team_name
    else:
        return team_name

# Apply the function to the 'Team_name' column
df_new['Team_name'] = df_new['Team_name'].apply(add_asterisk2)
df_new.loc[df_new['Team_name'] == '^2023 Los Angeles Lakers', 'Team_name'] = '#2023 Los Angeles Lakers'

In [9]:
# Create a function to determine the color based on the 'Team_name' column
def get_colorconf(team_name):
    if team_name.startswith('*'):
        return 'gold'
    elif team_name.startswith('^'):
        return 'green'
    else:
        return 'blue'


# Apply the function to create a new 'color' column
df_new['color'] = df_new['Team_name'].apply(get_colorconf)


# Create a scatter plot with a trendline
fig = px.scatter(df_new, x='Mid-Range_FGA', y='OffRtg', trendline="ols")

# Add the colored points separately
fig.add_trace(go.Scatter(x=df_new['Mid-Range_FGA'], y=df_new['OffRtg'], mode='markers',
                         marker=dict(color=df_new['color']),
                         hovertext=df_new['Team_name']))

fig.update_traces(marker=dict(size=12, line=dict(width=2)), selector=dict(mode='markers'))

fig.update_layout(
    autosize=False,
    width=800,
    height=600
)

fig.show()

In this chart, teams that lost in the finals are shown in green, teams that won the finals are gold, and everyone else is blue. Interesting! Even when we add in the finals runners-up teams, our data points all still cluster in the 10-20 midrange FGA range. Let's remove all our blue data points to get a better look at our 16 teams.

In [10]:
# Filter the DataFrame to exclude blue data points
filtered_df = df_new[~df_new['color'].eq('blue')]
filtered_df.head()

# Create a scatter plot with a trendline
fig = px.scatter(filtered_df, x='Mid-Range_FGA', y='OffRtg')

# Add the colored points separately
fig.add_trace(go.Scatter(x=filtered_df['Mid-Range_FGA'], y=filtered_df['OffRtg'], mode='markers',
                         marker=dict(color=filtered_df['color']),
                         hovertext=filtered_df['Team_name']))

fig.update_traces(marker=dict(size=12, line=dict(width=2)), selector=dict(mode='markers'))

# Add a red rectangle
fig.add_shape(
    type="rect",
    x0=9,
    y0=107,
    x1=21,
    y1=121,
    fillcolor="red",
    opacity=0.3,
    layer="below"
)

fig.update_layout(
    autosize=False,
    width=800,
    height=600,
    xaxis=dict(range=[0, 30]),  # Set x-axis range
    yaxis=dict(range=[100, 125])  # Set y-axis range
)

fig.show()

All of the teams that made the finals since 2016 took at least 9.7 mid-range shots per game, and no team took more than 20.4. Does this mean that this range is the "sweet spot" that teams should be looking to live in? Not necessarily. It's possible and even likely that our 16 teams are in this range because their mid-range FGA stabilized there due to a larger sample of games. It's very easy for teams that got bounced in the 1st round (which comprise 64 of our 128 total data points) to be outliers, and shoot either a very large or very small amount of mid-range shots.

16 teams is still a pretty small sample size, so let's expand it one more time, by adding the teams that lost in the conference finals each year. This will give us 32 teams.

In [11]:
# Designate the teams in our data set that lost in the conference finals, and add a symbol so we can view them when we
# plot the data
confinals_losers = [
    "2016 Oklahoma City Thunder",
    "2016 Toronto Raptors",
    "2017 Boston Celtics",
    "2017 San Antonio Spurs",
    "2018 Boston Celtics",
    "2018 Houston Rockets",
    "2019 Portland Trail Blazers",
    "2019 Milwaukee Bucks",
    "2020 Boston Celtics",
    "2020 LA Clippers",
    "2021 Atlanta Hawks",
    "2021 LA Clippers",
    "2022 Dallas Mavericks",
    "2022 Miami Heat",
    "2023 LA Lakers",
    "2023 Boston Celtics"
]

def add_asterisk3(team_name):
    if team_name in confinals_losers:
        return '#' + team_name
    else:
        return team_name

# Apply the function to the 'Team_name' column
df_new['Team_name'] = df_new['Team_name'].apply(add_asterisk3)

def get_colorconf2(team_name):
    if team_name.startswith('*'):
        return 'gold'
    elif team_name.startswith('^'):
        return 'green'
    elif team_name.startswith('#'):
        return 'orange'
    else:
        return 'blue'

df_new['colorconf'] = df_new['Team_name'].apply(get_colorconf2)

conf_df = df_new[~df_new['colorconf'].eq('blue')]

# Create a scatter plot with a trendline
fig = px.scatter(conf_df, x='Mid-Range_FGA', y='OffRtg')

# Add the colored points separately
fig.add_trace(go.Scatter(x=conf_df['Mid-Range_FGA'], y=conf_df['OffRtg'], mode='markers',
                         marker=dict(color=conf_df['colorconf']),
                         hovertext=conf_df['Team_name']))

fig.update_traces(marker=dict(size=12, line=dict(width=2)), selector=dict(mode='markers'))

fig.update_layout(
    autosize=False,
    width=800,
    height=600,
    xaxis=dict(range=[0, 30]),  # Set x-axis range
    yaxis=dict(range=[100, 125])  # Set y-axis range
)

fig.show()

# The Outliers & What Went Wrong

Now we've finally got a few outliers, although 28 of 32 teams still fall within the rough range of 10-20 mid range attempts a game. Among our four outliers, only the 2016 Raptors (led by mid-range savant DeMar DeRozan) took significantly more than 20 attempts a game, and they had by far the worst offense in this group of teams, with a 101.7 offensive rating. On the other side of the extreme, we have 3 teams that abandoned the mid-range somewhat but still made the conference finals: The 2018 James Harden-led Rockets, last year's 2023 Celtics, and the Mavericks in 2022.

Most basketball fans remember the 2018 Rockets for their infamous 27 consecutive missed three pointers in game 7 against the Warriors, and their legacy is a cautionary tale about what can happen when you rely on the three point line excessively. In high-pressure, closely contested basketball games - especially towards the later stages of playoff series' when fatigue has set in, it becomes incredibly difficult to make threes on high volume. Everyone is banged up and tired, and players find it hard to maintain consistency in their process when raising up to shoot from distance. When a large percentage of your three point attempts come from one high-usage superstar (think James Harden, Jayson Tatum, Luka Doncic), you're even more likely to become victim to 3-pointer variance.

In 2018, the Rockets shot 31.4% from 3 against the Warriors in the conference finals series that they lost in 7 games. Harden shot 24.4% (19 for 78!). They shot 15.9% in game 7.

In 2023, the Celtics shot 30.3% from 3 against the Heat in the conference finals series that they lost in 7 games. Tatum shot 23.4% and Jaylen Brown shot 16.3%.

Would these teams have won those series if they had leaned slightly more into the mid-range, and had a more balanced shot profile? It's at the very least clear that despite what the math says about the mid-range, it's very hard to get away with completely cutting it off. 3-point-heavy superstars simply aren't able to shoot the same percentages in the playoffs (not you, Steph Curry), and teams with well-rounded shot profiles will be less susceptible to such catastrophic shooting performances.

# What's Next?

What does a 'well rounded' shot profile actually look like? This is the final question we're going to attempt to answer today. We are now not only interested in the optimal amount of mid-range attempts, but from other zones too. Luckily, we already have percentage of total field goal attempts from the other four zones in our data set. The goal here is to find a shot profile that corresponds to the highest possible offensive rating.

We'll iterate through all 32 teams to reach at least the conference finals, and for each one adjust our 'optimal' shot profile accordingly. To start off with, we'll consider the optimal shot profile just the average from each zone amongst all 32 teams.

In [12]:
df_merged = pd.merge(df_sorted_mr[['Team_name', 'Percent of Total Midrange','Percent of Total Corner 3','Percent of Total Above Break 3','Percent of Total Restricted Area','Percent of Total In The Paint (Non-RA)']], df_ortg_final[['Team_name', 'OffRtg']], on='Team_name')
df_merged['Team_name'] = df_merged['Team_name'].apply(add_asterisk2)
df_merged['Team_name'] = df_merged['Team_name'].apply(add_asterisk)
df_merged['Team_name'] = df_merged['Team_name'].apply(add_asterisk3)
filtered_df2 = df_merged[df_merged['Team_name'].str.contains('[\^*#]')]


# Calculate the mean of each 'percent of' column
mr_avg = filtered_df2['Percent of Total Midrange'].mean()
c3_avg = filtered_df2['Percent of Total Corner 3'].mean()
ab3_avg = filtered_df2['Percent of Total Above Break 3'].mean()
ra_avg = filtered_df2['Percent of Total Restricted Area'].mean()
paint_avg = filtered_df2['Percent of Total In The Paint (Non-RA)'].mean()

#These five variables are our averages from each zone

filtered_df2.head()

Unnamed: 0,Team_name,Percent of Total Midrange,Percent of Total Corner 3,Percent of Total Above Break 3,Percent of Total Restricted Area,Percent of Total In The Paint (Non-RA),OffRtg
3,#2021 LA Clippers,0.144078,0.123321,0.319902,0.257631,0.155067,119.2
11,*2018 Golden State Warriors,0.23804,0.074679,0.288215,0.273046,0.126021,112.7
12,*2022 Golden State Warriors,0.130233,0.088372,0.343023,0.188372,0.25,114.5
17,*2023 Denver Nuggets,0.131855,0.082847,0.263711,0.263711,0.257876,118.2
19,^2021 Phoenix Suns,0.209906,0.088443,0.259434,0.247642,0.194575,113.2


In [13]:
# Calculate the average of the "OffRtg" column
average_offrtg = filtered_df2["OffRtg"].mean()

filtered_df2["ScaledOffRtg"] = filtered_df2["OffRtg"] / average_offrtg

# Stretch out each value in the "OffRtg" column by a factor of 3
#stretch it out: ortg = 3*(scaled_ortg-1) + 1 OR 3*scaled_ortg - 2

filtered_df2["StretchedOffRtg"] = (filtered_df2["ScaledOffRtg"]) * 3 - 2

filtered_df2.head()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,Team_name,Percent of Total Midrange,Percent of Total Corner 3,Percent of Total Above Break 3,Percent of Total Restricted Area,Percent of Total In The Paint (Non-RA),OffRtg,ScaledOffRtg,StretchedOffRtg
3,#2021 LA Clippers,0.144078,0.123321,0.319902,0.257631,0.155067,119.2,1.060316,1.180947
11,*2018 Golden State Warriors,0.23804,0.074679,0.288215,0.273046,0.126021,112.7,1.002496,1.007489
12,*2022 Golden State Warriors,0.130233,0.088372,0.343023,0.188372,0.25,114.5,1.018508,1.055524
17,*2023 Denver Nuggets,0.131855,0.082847,0.263711,0.263711,0.257876,118.2,1.05142,1.154261
19,^2021 Phoenix Suns,0.209906,0.088443,0.259434,0.247642,0.194575,113.2,1.006944,1.020832


# Shot Profile Optimization

So far, we've scaled our offensive rating around 1.0 such that teams with an above average OffRtg will have a 'StretchedOffRtg' value greater than 1.0, and those with a below average OffRtg will have a value lower than 1.0. Logically, we want teams whose shot profiles performed well (StretchedOffRtg > 1.0) to positively impact our final percentages. For example: take the 2023 Denver Nuggets, who had a 118.2 OffRtg and a 1.15 StretchedOffRtg value. Here are their percentages from each zone, compared to our initial averages:


In [14]:
# Filter the row for the *2023 Denver Nuggets
nuggets_row = filtered_df2[filtered_df2['Team_name'] == '*2023 Denver Nuggets']

# Extract the 5 percentage columns
percentages = nuggets_row[['Percent of Total Midrange',
                          'Percent of Total Corner 3',
                          'Percent of Total Above Break 3',
                          'Percent of Total Restricted Area',
                          'Percent of Total In The Paint (Non-RA)']]

# Setting the index to the Team_name
percentages.index = ['2023 Denver Nuggets']

# Adding the average values
averages = pd.DataFrame({
    'Percent of Total Midrange': [mr_avg],
    'Percent of Total Corner 3': [c3_avg],
    'Percent of Total Above Break 3': [ab3_avg],
    'Percent of Total Restricted Area': [ra_avg],
    'Percent of Total In The Paint (Non-RA)': [paint_avg]
}, index=['Averages'])

# Concatenating the percentages and averages
final_table = pd.concat([percentages, averages])

# Displaying the final table
final_table

Unnamed: 0,Percent of Total Midrange,Percent of Total Corner 3,Percent of Total Above Break 3,Percent of Total Restricted Area,Percent of Total In The Paint (Non-RA)
2023 Denver Nuggets,0.131855,0.082847,0.263711,0.263711,0.257876
Averages,0.164355,0.099165,0.291172,0.283164,0.162144


As you can see, the Nuggets took less midrange, corner 3, above break 3, and restricted area attempts than the averages, but way more paint attempts. This makes sense, as Nikola Jokic loves that 5-10 foot range for his post game and floaters. Since their shot profile was successful (above average OffRtg), we want our optimal percentage for paint attempts to increase significantly, while the other four zones it decreases slightly. Then, we want to repeat this process for the other 31 teams in our data set. But what should we do for teams that were unsuccessful?

Let's take for example the Toronto Raptors of 2016 that were mentioned earlier:

In [15]:
# Filter the row for the #2016 Toronto Raptors
raptors_row = filtered_df2[filtered_df2['Team_name'] == '#2016 Toronto Raptors']

# Extract the 5 percentage columns for the Raptors
percentages_raptors = raptors_row[['Percent of Total Midrange',
                                   'Percent of Total Corner 3',
                                   'Percent of Total Above Break 3',
                                   'Percent of Total Restricted Area',
                                   'Percent of Total In The Paint (Non-RA)']]

# Setting the index to the Team_name
percentages_raptors.index = ['2016 Toronto Raptors']

# Adding the average values
averages = pd.DataFrame({
    'Percent of Total Midrange': [mr_avg],
    'Percent of Total Corner 3': [c3_avg],
    'Percent of Total Above Break 3': [ab3_avg],
    'Percent of Total Restricted Area': [ra_avg],
    'Percent of Total In The Paint (Non-RA)': [paint_avg]
}, index=['Averages'])

# Concatenating the Raptors percentages and averages
final_table = pd.concat([percentages_raptors, averages])

# Displaying the final table
final_table

Unnamed: 0,Percent of Total Midrange,Percent of Total Corner 3,Percent of Total Above Break 3,Percent of Total Restricted Area,Percent of Total In The Paint (Non-RA)
2016 Toronto Raptors,0.277641,0.079853,0.201474,0.283784,0.157248
Averages,0.164355,0.099165,0.291172,0.283164,0.162144


If you remember, the Raptors had a putrid 101.7 offensive rating, despite reaching the conference finals and eventually losing to Lebron. They took ~75% more mid-range attempts than the average team in this data set did, and way less above the break threes. Since this strategy was unsuccessful, our optimal percentages will move in the opposite direction. The optimal midrange percentage will decrease significantly, and optimal above the break 3 percentage will increase significantly. The other 3 zones are pretty close to the averages, so those won't change too much.

Now, how do we practically write an algorithm to update our values this way? It's not too complicated. Let's assume that midrange_avg is our initial midrange percentage, and example_midrange is the percentage for the specific team we're optimizing for (i.e. 27.8% for the 2016 Raptors)

If StretchedOffRtg >= 1.0, midrange_avg = midrange_avg + (team average - midrange_avg) * (StretchedOffRtg - weight)
If StretchedOffRtg < 1.0, midrange_avg = midrange_avg - (team average - midrange_avg) * (StretchedOffRtg - weight)

What we're doing here is updating our midrange average by the difference between a specific team's midrange average and the initial average. Then we multiply by the scaled offensive rating, so that teams with offensive ratings further from the mean have a bigger impact on our results. A specific weight is subtracted so that our final percentages aren't unnoticeabely different from the initial averages, but also don't change too drastically.

Obviously, this is just for the midrange. We need to repeat this process for the other four zones as well, and then do THAT for all 32 teams. However, this is the core of each operation in this algorithm. It's codified below:

In [16]:
#CHECK INSIDE INDIVIDUAL TEAM CODE BLOCK
#list with initial means
initial_percentages = [mr_avg, c3_avg, ab3_avg, ra_avg, paint_avg]
final_mr = 0
final_c3 = 0
final_ab3 = 0
final_ra = 0
final_paint = 0
#list with final optimal percentages, so initial ones don't get overwritten
final_percentages = [final_mr, final_c3, final_ab3, final_ra, final_paint]

# Simulate a specific team_name for demonstration purposes
specific_team_name = "*2023 Denver Nuggets"	  # Replace this with the actual team_name you're interested in

# Loop through each team in the DataFrame
for index, row in filtered_df2.iterrows():
    team_name = row["Team_name"]
    stretched_offrtg = row["StretchedOffRtg"]

    if team_name == specific_team_name:
        print(f"Processing team: {team_name}, StretchedOffRtg: {stretched_offrtg}")

        if stretched_offrtg >= 1.0:
            final_mr = mr_avg + (row["Percent of Total Midrange"] - mr_avg)*(stretched_offrtg-0.6)
            final_c3 = c3_avg + (row["Percent of Total Corner 3"] - c3_avg)*(stretched_offrtg-0.6)
            final_ab3 = ab3_avg + (row["Percent of Total Above Break 3"] - ab3_avg)*(stretched_offrtg-0.6)
            final_ra = ra_avg + (row["Percent of Total Restricted Area"] - ra_avg)*(stretched_offrtg-0.6)
            final_paint = paint_avg + (row["Percent of Total In The Paint (Non-RA)"] - paint_avg)*(stretched_offrtg-0.6)
        else:
            final_mr = mr_avg - (row["Percent of Total Midrange"] - mr_avg)*(stretched_offrtg-0.6)
            final_c3 = c3_avg - (row["Percent of Total Corner 3"] - c3_avg)*(stretched_offrtg-0.6)
            final_ab3 = ab3_avg - (row["Percent of Total Above Break 3"] - ab3_avg)*(stretched_offrtg-0.6)
            final_ra = ra_avg - (row["Percent of Total Restricted Area"] - ra_avg)*(stretched_offrtg-0.6)
            final_paint = paint_avg - (row["Percent of Total In The Paint (Non-RA)"] - paint_avg)*(stretched_offrtg-0.6)
        # Update the final_percentages list
        final_percentages = [final_mr, final_c3, final_ab3, final_ra, final_paint]

print("Initial Average Percentages:", initial_percentages)

# Filter the row for the *2023 Denver Nuggets
sample_row = filtered_df2[filtered_df2['Team_name'] == specific_team_name]

# Extract the 5 percentage columns for the Nuggets
sample_percentages = sample_row[['Percent of Total Midrange',
                                   'Percent of Total Corner 3',
                                   'Percent of Total Above Break 3',
                                   'Percent of Total Restricted Area',
                                   'Percent of Total In The Paint (Non-RA)']]

# Converting the DataFrame row to a list
sample_percentages_list = sample_percentages.values.tolist()[0]

# Displaying the list
print("Specific Team Percentages", sample_percentages_list)
print("Final Percentages:", final_percentages)

Processing team: *2023 Denver Nuggets, StretchedOffRtg: 1.154261119081779
Initial Average Percentages: [0.1643553980478369, 0.09916548911968831, 0.29117158393751486, 0.28316372778452137, 0.16214380111043863]
Specific Team Percentages [0.13185530921820304, 0.08284714119019836, 0.2637106184364061, 0.2637106184364061, 0.25787631271878647]
Final Percentages: [0.14634186244286682, 0.09012086333472337, 0.27595103846780417, 0.27238162562761475, 0.2152046101269909]


By altering the specific_team_name variable above, you can view the impact of any of the 32 teams on the final result. With our Nuggets example, the initial paint average was 16.2%. The Nuggets paint percentage was 25.8%, and the final paint percentage was 21.5%. This makes sense, since the Nuggets had a significantly better offensive rating than the average team in this data set (1.15 StretchedOffRtg value). Now, we just need to repeat this process for all 32 teams, and we'll have our final optimal shot profile:

In [17]:
# FINAL PRODUCT CODE BLOCK
# Loop through each team in the DataFrame
for index, row in filtered_df2.iterrows():
    stretched_offrtg = row["StretchedOffRtg"]

    if stretched_offrtg >= 1.0:
        final_mr = mr_avg + (row["Percent of Total Midrange"] - mr_avg)*(stretched_offrtg-0.6)
        final_c3 = c3_avg + (row["Percent of Total Corner 3"] - c3_avg)*(stretched_offrtg-0.6)
        final_ab3 = ab3_avg + (row["Percent of Total Above Break 3"] - ab3_avg)*(stretched_offrtg-0.6)
        final_ra = ra_avg + (row["Percent of Total Restricted Area"] - ra_avg)*(stretched_offrtg-0.6)
        final_paint = paint_avg + (row["Percent of Total In The Paint (Non-RA)"] - paint_avg)*(stretched_offrtg-0.6)
    else:
        final_mr = mr_avg - (row["Percent of Total Midrange"] - mr_avg)*(stretched_offrtg-0.6)
        final_c3 = c3_avg - (row["Percent of Total Corner 3"] - c3_avg)*(stretched_offrtg-0.6)
        final_ab3 = ab3_avg - (row["Percent of Total Above Break 3"] - ab3_avg)*(stretched_offrtg-0.6)
        final_ra = ra_avg - (row["Percent of Total Restricted Area"] - ra_avg)*(stretched_offrtg-0.6)
        final_paint = paint_avg - (row["Percent of Total In The Paint (Non-RA)"] - paint_avg)*(stretched_offrtg-0.6)

    # Update the final_percentages list
    final_percentages = [final_mr, final_c3, final_ab3, final_ra, final_paint]

# Calculating the differences
difference_percentages = [final - initial for initial, final in zip(initial_percentages, final_percentages)]

# Creating a DataFrame with the initial, final, and difference percentages
percentages_table = pd.DataFrame({
    'Percent of Total Midrange': [initial_percentages[0]*100, final_percentages[0]*100, difference_percentages[0]*100],
    'Percent of Total Corner 3': [initial_percentages[1]*100, final_percentages[1]*100, difference_percentages[1]*100],
    'Percent of Total Above Break 3': [initial_percentages[2]*100, final_percentages[2]*100, difference_percentages[2]*100],
    'Percent of Total Restricted Area': [initial_percentages[3]*100, final_percentages[3]*100, difference_percentages[3]*100],
    'Percent of Total In The Paint (Non-RA)': [initial_percentages[4]*100, final_percentages[4]*100, difference_percentages[4]*100]
}, index=['Initial Percentages', 'Final Percentages', 'Difference'])

# Formatting the values with a percentage sign
percentages_table = percentages_table.applymap(lambda x: f'{x:.2f}%')

# Displaying the final table
percentages_table

Unnamed: 0,Percent of Total Midrange,Percent of Total Corner 3,Percent of Total Above Break 3,Percent of Total Restricted Area,Percent of Total In The Paint (Non-RA)
Initial Percentages,16.44%,9.92%,29.12%,28.32%,16.21%
Final Percentages,16.59%,10.13%,28.29%,26.39%,18.60%
Difference,0.15%,0.21%,-0.83%,-1.93%,2.39%


# What Does It All Mean?

Before we interpret these results, it's important to really hammer home the point I made earlier: optimal shot profiles will always depend first and foremost on the personnel of each team. Nikola Jokic excels from 5-10 feet, so the Nuggets took a lot of shots from that area, and were successful. Would this work on teams that don't have Nikola Jokic on them? Probably not. Should all teams shift some of their restricted area attempts to be paint attempts instead? Not necessarily.

However, the fact that our final percentages favor paint attempts over restricted area attempts indicate to me that being efficient from that 5-10 foot range is an extremely valuable skillset in the playoffs. The teams that have players who excel in that range will take more attempts, and the fact that there's a 2.4% positive difference between our initial and final paint percentage shows that these teams generally have more successful offenses. However, there aren't many players with the skillset to thrive in this range, which is why the inital percentage of attempts teams shoot from that zone is a bit low.

Everyone wants to protect the restricted area in the playoffs. Easy layups and dunks need to be prevented at all costs, and perhaps that's why that short mid-range/paint area opens up a bit. Funnily enough, the midrange zone has the smallest difference of all 5 zones between initial and final percentages, indicating that teams are generally shooting the "correct" amount of mid-range shots. Regarding our initial question, teams really can't get away with completely cutting the mid-range out of their diet, despite what the math says. In reality, it simply has not worked yet. However, if the right team one day is built with the right personnel, that could change.