https://www.jphwang.com/posts/nba-shot-data-analytics-visualization-with-python-pandas-and-matplotlib-part-2-grouping-data-by-area/

Hexbin tutorial.


https://medium.com/@amitparikh41/college-basketball-shot-mapping-with-python-23f543528b5f

shot chart mapping

Step 1: Setting Up the Environment and Helper Functions

We'll start with helper functions to fetch and save the list of players, get team abbreviations, and categorize shots.

Understanding MAE in the Context of Basketball Shooting

MAE is a metric that measures the average magnitude of errors between predicted values and actual values, without considering their direction. In our context, MAE is used to measure the difference between the shooting efficiencies of two players across various areas on the court.
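
Written out, with $A$ the set of court areas being compared and $e_1(a)$, $e_2(a)$ the two players' shooting efficiencies in area $a$ (these symbols are introduced here for illustration and do not appear in the code):

$$\mathrm{MAE} = \frac{1}{|A|} \sum_{a \in A} \left| e_1(a) - e_2(a) \right|$$
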
Step 1: Define Common Areas on the Court

    The court is divided into different areas such as "Left Corner 3," "Top of Key," "Paint," "Mid-Range," etc.
    For each player, their shot efficiency is calculated in these predefined areas. Shot efficiency is simply the ratio of successful shots to total attempts from that area.

Step 2: Calculate Individual Player Efficiencies

    For each player, we fetch their shot data for a season and categorize their shots into these predefined areas.
    We then calculate the shooting efficiency for each player in each of these areas. For example, Player A might have a 45% efficiency from "Left Corner 3" and 55% from "Top of Key."
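
As a rough illustration of the efficiency calculation, here is a dependency-free sketch. The shot log and the helper name `efficiency_by_area` are hypothetical; the notebook's own `calculate_efficiency` does the equivalent with a pandas `groupby`.

```python
from collections import defaultdict

# Hypothetical shot log: (area, made) pairs, where made is 1 for a make and 0 for a miss.
shots = [
    ("Left Corner 3", 1), ("Left Corner 3", 0), ("Left Corner 3", 1), ("Left Corner 3", 0),
    ("Top of Key", 1), ("Top of Key", 1), ("Top of Key", 0),
]


def efficiency_by_area(shot_log):
    """Return {area: made / attempts} for every area with at least one attempt."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for area, flag in shot_log:
        attempts[area] += 1
        made[area] += flag
    return {area: made[area] / attempts[area] for area in attempts}


efficiency = efficiency_by_area(shots)
for area, value in efficiency.items():
    print(f"{area}: {value:.1%}")
```

Areas with no attempts simply do not appear in the result, which avoids dividing by zero.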

Step 3: Identify Common Areas Between Players

    When comparing two players, we first identify the common areas where both players have taken shots. This is crucial because MAE can only be calculated in areas where both players have data.
    For example, if Player A and Player B both have shot data from "Left Corner 3" and "Top of Key," these areas are considered common areas.

Step 4: Calculate the MAE

    For Each Common Area:
        Calculate the absolute difference between Player A's efficiency and Player B's efficiency.
        For example, if Player A has a 45% efficiency in "Left Corner 3" and Player B has 35%, the absolute difference is |45% - 35%| = 10%.
    Overall MAE:
        Sum up the absolute differences for all common areas and divide by the number of common areas to get the MAE. This gives an overall measure of how similar or different the players' shooting efficiencies are across the court.
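
The calculation in Steps 3 and 4 can be sketched in a few lines of plain Python. The function name `mae_over_common_areas` is hypothetical, and Player B's "Top of Key" value and Player A's "Paint" value are made up to complete the example; the 45%, 55%, and 35% figures come from the text above.

```python
def mae_over_common_areas(eff_a, eff_b):
    """Mean absolute difference in efficiency over areas present for both players."""
    common = sorted(set(eff_a) & set(eff_b))
    if not common:
        raise ValueError("players share no court areas")
    return sum(abs(eff_a[area] - eff_b[area]) for area in common) / len(common)


# Efficiencies as fractions; "Paint" has no counterpart for Player B, so it is ignored.
player_a = {"Left Corner 3": 0.45, "Top of Key": 0.55, "Paint": 0.60}
player_b = {"Left Corner 3": 0.35, "Top of Key": 0.50}

mae = mae_over_common_areas(player_a, player_b)
print(f"MAE: {mae:.3f}")
```

Here the two common areas differ by 10 and 5 percentage points, so the MAE is their average, 7.5 percentage points.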

Step 5: Interpret the MAE for Compatibility

    High MAE (More Compatibility):
        A higher MAE indicates that the players excel in different areas of the court. For example, if Player A is highly efficient from "Left Corner 3" but not from "Top of Key," while Player B is the opposite, the MAE between them will be high.
        This is interpreted as more compatibility because the players complement each other by being good at different areas on the court, meaning they won’t crowd each other or compete for the same shots during a game.
    Low MAE (Less Compatibility):
        A lower MAE means the players have similar efficiencies in the same areas, which suggests they are strongest from the same spots.
        This is interpreted as less compatibility because both players might end up taking similar shots from the same areas, leading to redundancy and reduced spacing when both are on the court together.

Step 6: Categorize Compatibility

    Based on the calculated MAE, we can categorize the player pairings:
        If the MAE is above a certain threshold (which could be the average MAE across all player pairs), the players are deemed "Compatible."
        If the MAE is below this threshold, they are deemed "Not Compatible."
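
A minimal sketch of this thresholding, assuming the threshold is the average MAE across all pairs as suggested above. The player names and MAE values are placeholders, and treating a pair that lands exactly on the threshold as "Not Compatible" is an arbitrary choice made for this example.

```python
from statistics import mean

# Hypothetical pairwise MAE values (as fractions).
pair_mae = {
    ("Player A", "Player B"): 0.08,
    ("Player A", "Player C"): 0.12,
    ("Player B", "Player C"): 0.03,
}

# Assumption: the threshold is the average MAE across all player pairs.
threshold = mean(pair_mae.values())

# Strictly above the threshold counts as compatible; everything else does not.
labels = {
    pair: "Compatible" if value > threshold else "Not Compatible"
    for pair, value in pair_mae.items()
}

for (first, second), label in labels.items():
    print(f"{first} + {second}: {label}")
```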

In [156]:
%%writefile ../src/shot_chart/nba_helpers.py

import requests
import pandas as pd
import os
from nba_api.stats.static import players, teams
from nba_api.stats.endpoints import shotchartdetail
from nba_api.stats.endpoints import commonallplayers
import numpy as np

def fetch_and_save_players_list():
    """Fetches the list of all players for specified seasons and saves it to a CSV file."""
    seasons = ["2023-24", "2022-23", "2021-22"]
    all_players = commonallplayers.CommonAllPlayers(is_only_current_season=0).get_data_frames()[0]

    players_data = []
    for season in seasons:
        for _, player in all_players.iterrows():
            players_data.append({
                'id': player['PERSON_ID'],
                'full_name': player['DISPLAY_FIRST_LAST'],
                'season': season
            })

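    df = pd.DataFrame(players_data).drop_duplicates()
    # Create the output directory if needed so to_csv does not fail on a fresh checkout
    os.makedirs('data/shot_chart_data', exist_ok=True)
    df.to_csv('data/shot_chart_data/players_list.csv', index=False)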

def load_players_list(season):
    """Loads the list of players for a specific season from a CSV file."""
    file_path = 'data/shot_chart_data/players_list.csv'
    if not os.path.exists(file_path):
        fetch_and_save_players_list()
    
    players_df = pd.read_csv(file_path)
    return players_df[players_df['season'] == season]

def get_team_abbreviation(team_name):
    """Gets the team abbreviation for a given team name."""
    team_dictionary = teams.get_teams()
    team_info = [team for team in team_dictionary if team['full_name'] == team_name]
    if not team_info:
        raise ValueError(f"No team found with name {team_name}")
    return team_info[0]['abbreviation']

def categorize_shot(row, debug=False):
    """Categorizes a shot based on its location with optional debugging.
    
    Args:
        row (pd.Series): A row of shot data containing 'LOC_X' and 'LOC_Y',
            in tenths of a foot with the hoop at (0, 0).
        debug (bool): If True, prints the coordinates of shots that don't fit into known categories.
    
    Returns:
        tuple: A tuple containing the area and distance category of the shot.
               Returns ('Unknown', 'Unknown') for shots that don't fit into known categories,
               regardless of the debug setting.
    """
    x, y = row['LOC_X'], row['LOC_Y']
    distance_from_hoop = np.sqrt(x**2 + y**2)

    if distance_from_hoop > 300:  # Over 30 ft
        return 'Backcourt', 'Beyond 30 ft'
    elif distance_from_hoop > 240:  # 24-30 ft
        if x < -80:
            return 'Deep 3 Left', '24-30 ft'
        elif x > 80:
            return 'Deep 3 Right', '24-30 ft'
        else:
            return 'Deep 3 Center', '24-30 ft'
    elif y > 237.5:
        if x < -80:
            return 'Left Corner 3', '24+ ft'
        elif x > 80:
            return 'Right Corner 3', '24+ ft'
        else:
            return ('Left Wing 3' if x < 0 else 'Right Wing 3'), '24+ ft'
    elif y > 142.5:
        if x < -80:
            return 'Left Wing 3', '24+ ft'
        elif x > 80:
            return 'Right Wing 3', '24+ ft'
        elif x < 0:
            return 'Left Top of Key 3', '20-24 ft'
        else:
            return 'Right Top of Key 3', '20-24 ft'
    elif y > 47.5:
        if x < -80:
            return 'Left Baseline Mid-range', '10-20 ft'
        elif x > 80:
            return 'Right Baseline Mid-range', '10-20 ft'
        elif x < -10:
            return 'Left Elbow Mid-range', '10-20 ft'
        elif x > 10:
            return 'Right Elbow Mid-range', '10-20 ft'
        else:
            return 'Center Mid-range', '10-20 ft'
    elif y >= 0:  # Near basket, including under the hoop
        # LOC_X/LOC_Y are in tenths of a foot, so 100 units = 10 ft
        if distance_from_hoop < 100:
            if x < -10:
                return 'Left of Near Basket', '0-10 ft'
            elif x > 10:
                return 'Right of Near Basket', '0-10 ft'
            else:
                return 'Center of Near Basket', '0-10 ft'
        elif distance_from_hoop < 200:  # 10-20 ft
            if x < -20:
                return 'Left of Near Basket', '10-20 ft'
            elif x > 20:
                return 'Right of Near Basket', '10-20 ft'
            else:
                return 'Center of Near Basket', '10-20 ft'
        elif distance_from_hoop < 300:  # 20-30 ft
            return 'Near Mid-range', '20-30 ft'
        else:
            # Fallback; shots beyond 24 ft are already handled by the distance checks above
            if x < -80:
                return 'Left Wing Mid-range', '20-30 ft'
            elif x > 80:
                return 'Right Wing Mid-range', '20-30 ft'
            else:
                return 'Center Mid-range', '20-30 ft'
    
    if debug:
        print(f"Debug: Unknown shot location (x, y)=({x}, {y}), distance from hoop={distance_from_hoop}")
    
    return 'Unknown', 'Unknown'  # Ensure that a tuple is always returned



def get_all_court_areas():
    """Returns a list of all possible court areas defined in categorize_shot."""
    return [
        'Backcourt', 'Deep 3 Left', 'Deep 3 Center', 'Deep 3 Right',
        'Left Corner 3', 'Right Corner 3', 'Left Wing 3', 'Right Wing 3',
        'Left Top of Key 3', 'Right Top of Key 3', 'Center Mid-range',
        'Left Baseline Mid-range', 'Right Baseline Mid-range',
        'Left Elbow Mid-range', 'Right Elbow Mid-range',
        'Center of Near Basket', 'Left of Near Basket', 'Right of Near Basket',
        'Near Mid-range', 'Left Wing Mid-range', 'Right Wing Mid-range'
    ]



Overwriting ../src/shot_chart/nba_helpers.py
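
The thresholds in `categorize_shot` (for example `distance_from_hoop > 300` for "Over 30 ft") imply that `LOC_X` and `LOC_Y` are measured in tenths of a foot with the hoop at the origin. Under that assumption, a small sketch for converting a shot location to a distance in feet could look like this; `shot_distance_feet` is a hypothetical helper, not part of the modules above.

```python
import math


def shot_distance_feet(loc_x, loc_y):
    """Distance from the hoop in feet.

    Assumes coordinates are in tenths of a foot with the hoop at (0, 0).
    """
    return math.hypot(loc_x, loc_y) / 10.0


# A few hypothetical shot locations.
free_throw_line = shot_distance_feet(0, 150)
corner = shot_distance_feet(-220, 0)
short_range = shot_distance_feet(30, 40)

print(free_throw_line, corner, short_range)
```

Working in feet makes it easier to sanity-check distance bands such as "0-10 ft" against the raw coordinates.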


Step 2: Fetching Shots Data

Next, we'll write functions to fetch shots data for both offensive and defensive sides.

In [157]:
%%writefile ../src/shot_chart/nba_shots.py

import pandas as pd
from nba_api.stats.endpoints import shotchartdetail
from nba_api.stats.static import players, teams
from sklearn.metrics import mean_absolute_error
import matplotlib.pyplot as plt

from shot_chart.nba_helpers import categorize_shot
from shot_chart.nba_plotting import plot_shot_chart_hexbin
from shot_chart.nba_helpers import get_team_abbreviation


def fetch_shots_for_multiple_players(player_names, season, court_areas=None, opponent_name=None, debug=False):
    """Fetch shots data for multiple players with an option to filter by court areas and an opponent team."""
    # Imported here rather than at module level to avoid a circular import with nba_efficiency
    from shot_chart.nba_efficiency import calculate_efficiency
    player_shots = {}
    for player_name in player_names:
        print(f"Fetching shots for {player_name}")
        shots = fetch_shots_data(player_name, is_team=False, season=season, opponent_team=opponent_name)
        
        # Apply the categorize_shot function to generate 'Area' and 'Distance' columns
        shots['Area'], shots['Distance'] = zip(*shots.apply(lambda row: categorize_shot(row, debug=debug), axis=1))
        
        # If court_areas is provided and not set to 'all', filter the shots data
        if court_areas and court_areas != 'all':
            shots = shots[shots['Area'].isin(court_areas)]
        
        efficiency = calculate_efficiency(shots)
        player_shots[player_name] = {
            'shots': shots,
            'efficiency': efficiency
        }
        
        # Plot shot chart for each player
        fig = plot_shot_chart_hexbin(shots, f'{player_name} Shot Chart', opponent=opponent_name if opponent_name else "the rest of the league")
        plt.show()
        
        # Print efficiency summary for each player
        print(f"Efficiency for {player_name}:")
        print(efficiency)
        
    return player_shots

def fetch_shots_data(name, is_team, season, opponent_team=None, opponent_player=None, game_date=None, debug=False):
    """Fetches shots data for a team or player for a given season with optional filters."""
    if is_team:
        team_dictionary = teams.get_teams()
        team_info = [team for team in team_dictionary if team['full_name'] == name]
        
        if not team_info:
            raise ValueError(f"No team found with name {name}")
        
        team_id = team_info[0]['id']
        print(f"Fetching data for Team ID: {team_id}")
        
        shotchart = shotchartdetail.ShotChartDetail(
            team_id=team_id,
            player_id=0,
            context_measure_simple='FGA',
            season_nullable=season,
            season_type_all_star=['Regular Season', 'Playoffs']
        )
    else:
        player_dictionary = players.get_players()
        player_info = [player for player in player_dictionary if player['full_name'] == name]
        
        if not player_info:
            raise ValueError(f"No player found with name {name}")
        
        player_id = player_info[0]['id']
        print(f"Fetching data for Player ID: {player_id}")
        
        shotchart = shotchartdetail.ShotChartDetail(
            team_id=0,
            player_id=player_id,
            context_measure_simple='FGA',
            season_nullable=season,
            season_type_all_star=['Regular Season', 'Playoffs']
        )
    
    data = shotchart.get_data_frames()[0]

    # Handle the case where opponent_team is "all"
    if opponent_team and opponent_team.lower() != "all":
        opponent_abbreviation = get_team_abbreviation(opponent_team)
        data = data[(data['HTM'] == opponent_abbreviation) | (data['VTM'] == opponent_abbreviation)]
    
    if opponent_player:
        opponent_dictionary = players.get_players()
        opponent_info = [player for player in opponent_dictionary if player['full_name'] == opponent_player]
        if opponent_info:
            opponent_player_id = opponent_info[0]['id']
            data = data[data['PLAYER_ID'] == opponent_player_id]

    if game_date:
        data = data[data['GAME_DATE'] == game_date.replace('-', '')]

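    # Categorize each shot. Copy first so a filtered slice does not trigger
    # SettingWithCopyWarning, and build the columns from lists so an empty
    # result (no shots after filtering) does not break tuple unpacking.
    data = data.copy()
    categories = [categorize_shot(row, debug=debug) for _, row in data.iterrows()]
    data['Area'] = [area for area, _ in categories]
    data['Distance'] = [distance for _, distance in categories]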

    return data


def fetch_defensive_shots_data(name, is_team, season, opponent_team=None, opponent_player=None, game_date=None, debug=False):
    """Fetches defensive shots data for a team or player for a given season with optional filters."""
    if is_team:
        team_abbr = get_team_abbreviation(name)
        shotchart = shotchartdetail.ShotChartDetail(
            team_id=0,
            player_id=0,
            context_measure_simple='FGA',
            season_nullable=season,
            season_type_all_star=['Regular Season', 'Playoffs']
        )
        
        data = shotchart.get_data_frames()[0]
        defensive_shots = data[(data['HTM'] == team_abbr) | (data['VTM'] == team_abbr)]
        defensive_shots = defensive_shots[defensive_shots['TEAM_NAME'] != name]

    else:
        player_dictionary = players.get_players()
        player_info = [player for player in player_dictionary if player['full_name'] == name]
        
        if not player_info:
            raise ValueError(f"No player found with name {name}")
        
        player_id = player_info[0]['id']
        print(f"Fetching data for Player ID: {player_id}")
        
        shotchart = shotchartdetail.ShotChartDetail(
            team_id=0,
            player_id=0,
            context_measure_simple='FGA',
            season_nullable=season,
            season_type_all_star=['Regular Season', 'Playoffs']
        )
        
        data = shotchart.get_data_frames()[0]
        defensive_shots = data[data['PLAYER_ID'] == player_id]

    # Treat "all" as no opponent filter, matching fetch_shots_data
    if opponent_team and opponent_team.lower() != "all":
        opponent_abbreviation = get_team_abbreviation(opponent_team)
        defensive_shots = defensive_shots[(defensive_shots['HTM'] == opponent_abbreviation) | (defensive_shots['VTM'] == opponent_abbreviation)]

    if opponent_player:
        opponent_dictionary = players.get_players()
        opponent_info = [player for player in opponent_dictionary if player['full_name'] == opponent_player]
        if opponent_info:
            opponent_player_id = opponent_info[0]['id']
            defensive_shots = defensive_shots[defensive_shots['PLAYER_ID'] == opponent_player_id]

    if game_date:
        defensive_shots = defensive_shots[defensive_shots['GAME_DATE'] == game_date.replace('-', '')]

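    # Categorize each shot. Copy first so the filtered slice does not trigger
    # SettingWithCopyWarning, and build the columns from lists so an empty
    # result does not break tuple unpacking.
    defensive_shots = defensive_shots.copy()
    categories = [categorize_shot(row, debug=debug) for _, row in defensive_shots.iterrows()]
    defensive_shots['Area'] = [area for area, _ in categories]
    defensive_shots['Distance'] = [distance for _, distance in categories]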

    return defensive_shots


Overwriting ../src/shot_chart/nba_shots.py
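
The opponent filter used in these functions keeps a shot when the opponent appears as either the home team (`HTM`) or the visiting team (`VTM`) of the game. A pandas-free sketch of that rule, using made-up rows and a hypothetical helper name `filter_by_opponent`, with `"all"` or `None` meaning no filter:

```python
def filter_by_opponent(shots, opponent_abbr=None):
    """Keep shots from games involving opponent_abbr; None or 'all' keeps everything."""
    if not opponent_abbr or opponent_abbr.lower() == "all":
        return list(shots)
    return [shot for shot in shots if opponent_abbr in (shot["HTM"], shot["VTM"])]


# Hypothetical shots by a Dallas player across three games.
shots = [
    {"HTM": "BOS", "VTM": "DAL", "SHOT_MADE_FLAG": 1},
    {"HTM": "DAL", "VTM": "PHX", "SHOT_MADE_FLAG": 0},
    {"HTM": "LAL", "VTM": "DAL", "SHOT_MADE_FLAG": 1},
]

vs_boston = filter_by_opponent(shots, "BOS")
vs_everyone = filter_by_opponent(shots, "all")

print(len(vs_boston), len(vs_everyone))
```

Note that this sketch takes the abbreviation directly, whereas the functions above accept a full team name and convert it with `get_team_abbreviation`.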


Step 3: Plotting Functions

We'll create functions to plot the court and shot charts.

In [158]:
%%writefile ../src/shot_chart/nba_plotting.py

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_court(ax=None):
    """Plots the basketball court on a given axis."""
    if ax is None:
        ax = plt.gca()
    
    ax.set_xlim(-250, 250)
    ax.set_ylim(-47.5, 422.5)
    ax.set_aspect('equal')
    
    court_elements = [
        plt.Circle((0, 0), radius=7.5, linewidth=2, color='black', fill=False),  # Hoop
        plt.Rectangle((-30, -10), 60, -1, linewidth=2, color='black'),  # Backboard
        plt.Rectangle((-80, -47.5), 160, 190, linewidth=2, color='black', fill=False),  # Paint
        patches.Arc((0, 142.5), 120, 120, theta1=0, theta2=180, linewidth=2, color='black'),  # Free throw top arc
        patches.Arc((0, 142.5), 120, 120, theta1=180, theta2=360, linewidth=2, color='black', linestyle='dashed'),  # Free throw bottom arc
        patches.Arc((0, 0), 475, 475, theta1=0, theta2=180, linewidth=2, color='black'),  # 3-point arc
        plt.Rectangle((-250, -47.5), 500, 470, linewidth=2, color='black', fill=False),  # Outer lines
    ]
    
    for element in court_elements:
        ax.add_patch(element)
    
    return ax

def plot_shot_chart_hexbin(shots, title, opponent, court_color='white'):
    """Plots a hexbin shot chart."""
    plt.figure(figsize=(12, 11))
    ax = plt.gca()
    ax.set_facecolor(court_color)
    plot_court(ax)
    
    hexbin = plt.hexbin(
        shots['LOC_X'], shots['LOC_Y'], C=shots['SHOT_MADE_FLAG'], 
        gridsize=40, extent=(-250, 250, -47.5, 422.5), cmap='Blues', edgecolors='grey'
    )
    
    cb = plt.colorbar(hexbin, ax=ax, orientation='vertical')
    cb.set_label('Shooting Percentage')
    
    total_attempts = len(shots)
    total_made = shots['SHOT_MADE_FLAG'].sum()
    overall_percentage = total_made / total_attempts if total_attempts > 0 else 0
    
    opponent_text = f" against {opponent}" if opponent else " against the rest of the league"
    
    plt.text(0, 450, f"Total Shots: {total_attempts}", fontsize=12, ha='center')
    plt.text(0, 430, f"Total Made: {total_made}", fontsize=12, ha='center')
    plt.text(0, 410, f"Overall Percentage: {overall_percentage:.2%}", fontsize=12, ha='center')
    
    plt.title(f"{title}{opponent_text}", pad=50)  # Adjusted title to include opponent
    plt.xlim(-250, 250)
    plt.ylim(-47.5, 422.5)
    
    plt.tight_layout()  # Ensures that the plot elements don't overlap
    return plt.gcf()



Overwriting ../src/shot_chart/nba_plotting.py


Step 4: Efficiency Calculation Functions

We'll write functions to calculate shot efficiency and team fit.

In [159]:
%%writefile ../src/shot_chart/nba_efficiency.py

import os

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from shot_chart.nba_helpers import categorize_shot
from shot_chart.nba_shots import fetch_shots_data, fetch_defensive_shots_data

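def calculate_efficiency(shots, debug=False):
    """Calculates the efficiency of shots and ensures unique areas."""
    # Work on a copy so the caller's DataFrame (possibly a filtered slice) is not mutated
    shots = shots.copy()
    categories = [categorize_shot(row, debug=debug) for _, row in shots.iterrows()]
    shots['Area'] = [area for area, _ in categories]
    shots['Distance'] = [distance for _, distance in categories]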
    
    # Group by Area and Distance to aggregate data
    summary = shots.groupby(['Area', 'Distance']).agg(
        Attempts=('SHOT_MADE_FLAG', 'size'),
        Made=('SHOT_MADE_FLAG', 'sum')
    ).reset_index()
    
    # Calculate Efficiency
    summary['Efficiency'] = summary['Made'] / summary['Attempts']
    
    return summary


def calculate_team_fit(home_efficiency, opponent_efficiency):
    """Calculates the team fit using MAE and MAPE, ensuring consistent data across areas."""
    # Aggregate data to ensure unique areas
    home_efficiency = home_efficiency.groupby('Area').agg({
        'Attempts': 'sum', 
        'Made': 'sum'
    }).reset_index()
    home_efficiency['Efficiency'] = home_efficiency['Made'] / home_efficiency['Attempts']

    opponent_efficiency = opponent_efficiency.groupby('Area').agg({
        'Attempts': 'sum', 
        'Made': 'sum'
    }).reset_index()
    opponent_efficiency['Efficiency'] = opponent_efficiency['Made'] / opponent_efficiency['Attempts']
    
    # Get all unique areas present in either player's data
    all_areas = set(home_efficiency['Area']).union(set(opponent_efficiency['Area']))
    
    # Create a complete DataFrame for home and opponent efficiency with all areas
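    # Reindex with a sorted list rather than a set so both frames share one explicit row order
    area_order = sorted(all_areas)
    home_efficiency_complete = home_efficiency.set_index('Area').reindex(area_order, fill_value=0).reset_index()
    opponent_efficiency_complete = opponent_efficiency.set_index('Area').reindex(area_order, fill_value=0).reset_index()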
    
    # Log areas with no data
    missing_home_areas = all_areas - set(home_efficiency['Area'])
    missing_opponent_areas = all_areas - set(opponent_efficiency['Area'])
    
    if missing_home_areas:
        print(f"Warning: The home player is missing data for areas: {', '.join(missing_home_areas)}. Filling with zero attempts and efficiency.")
    if missing_opponent_areas:
        print(f"Warning: The opponent player is missing data for areas: {', '.join(missing_opponent_areas)}. Filling with zero attempts and efficiency.")
    
    # Calculate MAE and MAPE
    mae = mean_absolute_error(home_efficiency_complete['Efficiency'], opponent_efficiency_complete['Efficiency'])
    mape = mean_absolute_percentage_error(home_efficiency_complete['Efficiency'], opponent_efficiency_complete['Efficiency'])
    
    return mae, mape



def create_mae_table(home_team, season, all_teams):
    """Creates a table of MAE values for the given home team against all opponents."""
    from shot_chart.nba_shots import fetch_shots_data, fetch_defensive_shots_data
    mae_list = []
    home_shots = fetch_shots_data(home_team, True, season)
    home_efficiency = calculate_efficiency(home_shots)

    for opponent in all_teams:
        if opponent == home_team:
            continue
        
        # Ensure the season is passed to fetch_defensive_shots_data
        opponent_shots = fetch_defensive_shots_data(opponent, True, season)
        opponent_efficiency = calculate_efficiency(opponent_shots)
        
        mae, mape = calculate_team_fit(home_efficiency, opponent_efficiency)
        
        mae_list.append({
            'Home Team': home_team,
            'Opponent Team': opponent,
            'MAE': mae,
            'MAPE': mape,
            'Season': season
        })
    
    mae_df = pd.DataFrame(mae_list)
    mae_df = mae_df.sort_values(by='MAE')
    return mae_df

def save_mae_table(mae_df, file_path):
    """Saves the MAE table to a CSV file."""
    if os.path.exists(file_path):
        existing_df = pd.read_csv(file_path)
        mae_df = pd.concat([existing_df, mae_df]).drop_duplicates()
    mae_df.to_csv(file_path, index=False)

def load_mae_table(file_path):
    """Loads the MAE table from a CSV file if it exists."""
    if os.path.exists(file_path):
        return pd.read_csv(file_path)
    return None

def get_seasons_range(mae_df):
    """Returns the minimum and maximum seasons in the MAE DataFrame."""
    min_season = mae_df['Season'].min()
    max_season = mae_df['Season'].max()
    return min_season, max_season

def calculate_compatibility_between_players(player_shots):
    """Calculate MAE between each pair of players and determine shooting area compatibility."""
    mae_list = []
    player_names = list(player_shots.keys())
    
    # Debugging: Ensure there are at least two players to compare
    if len(player_names) < 2:
        print("Error: Less than two players provided for comparison.")
        return pd.DataFrame()  # Return an empty DataFrame early to prevent errors

    for i, player1 in enumerate(player_names):
        for player2 in player_names[i+1:]:
            mae = calculate_team_fit(player_shots[player1]['efficiency'], player_shots[player2]['efficiency'])[0]
            print(f"MAE for {player1} vs {player2}: {mae}")
            
            # Aggregate to one efficiency per area (the summaries are keyed by Area and Distance),
            # then compare only the areas both players have attempts from
            totals1 = player_shots[player1]['efficiency'].groupby('Area')[['Made', 'Attempts']].sum()
            totals2 = player_shots[player2]['efficiency'].groupby('Area')[['Made', 'Attempts']].sum()
            eff1_by_area = totals1['Made'] / totals1['Attempts']
            eff2_by_area = totals2['Made'] / totals2['Attempts']

            compatibility = []
            for area in eff1_by_area.index.intersection(eff2_by_area.index):
                eff1, eff2 = eff1_by_area[area], eff2_by_area[area]
                
                if eff1 >= 0.5 and eff2 >= 0.5:
                    compatibility.append('same_area')
                elif (eff1 >= 0.5 and eff2 < 0.5) or (eff1 < 0.5 and eff2 >= 0.5):
                    compatibility.append('diff_area')
            
            # Determine overall compatibility based on majority
            if compatibility.count('same_area') > compatibility.count('diff_area'):
                shooting_area_compatibility = 'efficient_in_same_areas'
            else:
                shooting_area_compatibility = 'efficient_in_diff_areas'
                
            mae_list.append({
                'Player 1': player1,
                'Player 2': player2,
                'MAE': mae,
                'Shooting Area Compatibility': shooting_area_compatibility
            })
    
    # Convert to DataFrame and return
    mae_df = pd.DataFrame(mae_list)
    return mae_df






Overwriting ../src/shot_chart/nba_efficiency.py
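
One design point worth checking: `calculate_team_fit` reindexes both players over the union of their areas with `fill_value=0`, so an area only one player has shot from contributes that player's full efficiency to the MAE. The sketch below, using made-up efficiencies and a hypothetical helper `mae_over`, compares that behaviour with restricting the MAE to common areas.

```python
def mae_over(eff_a, eff_b, areas):
    """Mean absolute efficiency difference over the given areas; missing areas count as 0."""
    areas = sorted(areas)
    return sum(abs(eff_a.get(area, 0.0) - eff_b.get(area, 0.0)) for area in areas) / len(areas)


# Hypothetical per-area efficiencies; Player B has no attempts from "Paint".
player_a = {"Left Corner 3": 0.45, "Top of Key": 0.55, "Paint": 0.60}
player_b = {"Left Corner 3": 0.35, "Top of Key": 0.50}

common_areas = set(player_a) & set(player_b)
all_areas = set(player_a) | set(player_b)

mae_common = mae_over(player_a, player_b, common_areas)
mae_union = mae_over(player_a, player_b, all_areas)

print(f"common areas only: {mae_common:.3f}")
print(f"union, zero-filled: {mae_union:.3f}")
```

With these numbers the zero-filled MAE is more than three times the common-area MAE, driven entirely by the single area Player B never shot from, so which convention is used will noticeably affect any compatibility threshold.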


Step 5: Main Function

Finally, we'll create the main function that ties everything together and performs the analysis.

In [160]:
%%writefile ../src/shot_chart/shot_chart_main.py

import os
import pandas as pd
import streamlit as st
import matplotlib.pyplot as plt
from nba_api.stats.static import players, teams
from shot_chart.nba_helpers import get_team_abbreviation, categorize_shot
from shot_chart.nba_shots import fetch_shots_data, fetch_defensive_shots_data, fetch_shots_for_multiple_players
from shot_chart.nba_plotting import plot_shot_chart_hexbin
from shot_chart.nba_efficiency import calculate_efficiency, create_mae_table, save_mae_table, load_mae_table, get_seasons_range, calculate_compatibility_between_players

def preload_mae_tables(entity_name, season):
    """Preload MAE tables for all teams to speed up future calculations."""
    mae_df_all_path = f'data/shot_chart_data/{entity_name}_mae_table_all_{season}.csv'
    mae_df_all = load_mae_table(mae_df_all_path)
    if mae_df_all is None:
        mae_df_all = create_mae_table(entity_name, season, [team['full_name'] for team in teams.get_teams()])
        save_mae_table(mae_df_all, mae_df_all_path)
    return mae_df_all

def create_and_save_mae_table_specific(entity_name, season, opponent_name):
    """Create and save the MAE table for a specific opponent."""
    mae_df_specific_path = f'data/shot_chart_data/{entity_name}_mae_table_specific_{season}.csv'
    mae_df_specific = load_mae_table(mae_df_specific_path)
    
    print(f"Debug: Opponent name before calling create_mae_table: {opponent_name}")
    
    if mae_df_specific is None or opponent_name not in mae_df_specific['Opponent Team'].values:
        mae_df_specific = create_mae_table(entity_name, season, [opponent_name])
        
        print(f"Debug: MAE table head after creation for opponent {opponent_name}:\n{mae_df_specific.head()}")
        
        save_mae_table(mae_df_specific, mae_df_specific_path)
    
    return mae_df_specific

def create_and_save_mae_table_all(entity_name, season):
    """Load the preloaded MAE table for all teams."""
    mae_df_all_path = f'data/shot_chart_data/{entity_name}_mae_table_all_{season}.csv'
    mae_df_all = load_mae_table(mae_df_all_path)
    if mae_df_all is None:
        mae_df_all = preload_mae_tables(entity_name, season)
    return mae_df_all

def run_scenario(entity_name, entity_type, season, opponent_name=None, analysis_type="offensive", compare_players=False, player_names=None, court_areas=None):
    """Run a scenario for a given entity (team or player) against a specific opponent or all teams."""
    if compare_players and player_names:
        st.write("Comparing multiple players...")
        player_shots = fetch_shots_for_multiple_players(player_names, season, court_areas, opponent_name, debug=False)
        
        st.write(f"Player shots data: {list(player_shots.keys())}")
        for player, shots in player_shots.items():
            st.write(f"{player}: {len(shots['shots'])} shots recorded")
        
        compatibility_df = calculate_compatibility_between_players(player_shots)
        
        st.write("MAE DataFrame after calculation:")
        st.write(compatibility_df)
    else:
        opponent_text = "All Teams" if not opponent_name or (isinstance(opponent_name, str) and opponent_name.lower() == "all") else opponent_name
        st.write(f"Running scenario for {entity_name} ({'Team' if entity_type == 'team' else 'Player'}) vs {opponent_text}")
        
        # Ensure only teams are analyzed for defensive scenarios
        if entity_type == 'player' and analysis_type != 'offensive':
            st.error("Defensive analysis is only applicable for teams.")
            return
        
        # Load the appropriate MAE tables based on whether it's a team or player
        if entity_type == 'team':
            mae_df_all = preload_mae_tables(entity_name, season)
        else:
            mae_df_all = None  # MAE might not be applicable for individual players in this context
        
        # Fetch and display offensive data
        shots = fetch_shots_data(entity_name, entity_type == 'team', season, opponent_name)
        st.write(f"Shots DataFrame head:")
        st.write(shots.head())
        
        efficiency = calculate_efficiency(shots)
        st.write(f"Offensive Efficiency for {entity_name}:")
        st.write(efficiency)
        
        # Plot shot chart and display it
        fig = plot_shot_chart_hexbin(shots, f'{entity_name} Shot Chart', opponent=opponent_text)
        st.pyplot(fig)
        
        if opponent_name and isinstance(opponent_name, str) and opponent_name.lower() != "all" and entity_type == 'team':
            # MAE calculation and saving for specific team (only relevant if we're dealing with teams)
            mae_df_specific = create_and_save_mae_table_specific(entity_name, season, opponent_name)
            
            # Filter the table to include only the specific opponent
            mae_df_specific = mae_df_specific[mae_df_specific['Opponent Team'] == opponent_name]
            
            st.write(f"Defensive MAE Table for {entity_name} against {opponent_name}:")
            st.write(mae_df_specific)
        elif entity_type == 'team':
            # MAE calculation and loading for all teams (only relevant for teams)
            st.write(f"MAE Table for {entity_name} against all teams:")
            st.write(mae_df_all)
        
            min_season, max_season = get_seasons_range(mae_df_all)
            st.write(f"MAE Table available for seasons from {min_season} to {max_season}.")
        
        # Perform defensive analysis only for teams
        if analysis_type == 'both' and entity_type == 'team':
            # Fetch and display defensive data for the specified team
            defensive_shots = fetch_defensive_shots_data(entity_name, entity_type == 'team', season, opponent_name)
            defensive_efficiency = calculate_efficiency(defensive_shots)
            st.write(f"Defensive Efficiency for {entity_name}:")
            st.write(defensive_efficiency)
            
            # Plot defensive shot chart and display it
            fig = plot_shot_chart_hexbin(defensive_shots, f'{entity_name} Defensive Shot Chart', opponent=opponent_text)
            st.pyplot(fig)
            
            if opponent_name and isinstance(opponent_name, str) and opponent_name.lower() != "all" and entity_type == 'team':
                # MAE calculation for defensive analysis against the specific opponent (only for teams)
                mae_df_specific = create_and_save_mae_table_specific(entity_name, season, opponent_name)
                
                # Filter the table to include only the specific opponent
                mae_df_specific = mae_df_specific[mae_df_specific['Opponent Team'] == opponent_name]
                
                st.write(f"Defensive MAE Table for {entity_name} against {opponent_name}:")
                st.write(mae_df_specific)


def main():
    season = "2023-24"  # You can modify this to ask the user for input if needed

    # Court areas to compare: 'all', or a list of specific area names
    court_areas = 'all'

    # Compare three players across the chosen court areas against all teams
    player_names = ["Luka Doncic", "Stephen Curry", "Kevin Durant"]
    run_scenario(None, "player", season, opponent_name="all", compare_players=True, player_names=player_names, court_areas=court_areas)

    # Team scenarios (specific opponent, then all opponents) and a single-player scenario
    run_scenario("Boston Celtics", "team", season, "Dallas Mavericks", analysis_type="both")
    run_scenario("Boston Celtics", "team", season, None, analysis_type="both")
    run_scenario("Luka Doncic", "player", season, "Boston Celtics", analysis_type="offensive")

if __name__ == "__main__":
    main()




Overwriting ../src/shot_chart/shot_chart_main.py
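
The team branch of run_scenario above builds a MAE table and then keeps only the rows whose 'Opponent Team' matches the selected opponent. The internals of create_and_save_mae_table_specific are not shown in this section, so what follows is only a minimal sketch of that idea under stated assumptions: efficiencies are plain dicts keyed by court area, the helper names mae_between and build_mae_rows are hypothetical, and the numbers are made up for illustration.

```python
def mae_between(offense, defense):
    """Mean absolute difference over court areas present in both dicts."""
    common = sorted(set(offense) & set(defense))
    if not common:
        return None  # no shared areas, so MAE is undefined
    return sum(abs(offense[a] - defense[a]) for a in common) / len(common)


def build_mae_rows(offense, opponents_defense):
    """One row per opponent, mirroring an 'Opponent Team' / 'MAE' table."""
    return [
        {"Opponent Team": team, "MAE": mae_between(offense, defense)}
        for team, defense in opponents_defense.items()
    ]


# Illustrative numbers only (not real NBA data).
team_offense = {"Left Corner 3": 0.40, "Paint": 0.60, "Mid-Range": 0.45}
opponents_defense = {
    "Dallas Mavericks": {"Left Corner 3": 0.35, "Paint": 0.65, "Mid-Range": 0.45},
    "Miami Heat": {"Left Corner 3": 0.30, "Paint": 0.50},
}

mae_rows = build_mae_rows(team_offense, opponents_defense)

# Keep only the selected opponent; with a pandas DataFrame this is the
# boolean-mask filter df[df['Opponent Team'] == opponent_name].
opponent_name = "Dallas Mavericks"
specific_rows = [row for row in mae_rows if row["Opponent Team"] == opponent_name]
print(specific_rows)
```

Returning None when two teams share no court areas keeps an undefined comparison from silently showing up as an MAE of zero.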


In [161]:
%%writefile ../src/shot_chart_streamlit_app.py

import streamlit as st
from shot_chart.nba_helpers import get_all_court_areas
from shot_chart.nba_shots import fetch_shots_for_multiple_players
from shot_chart.nba_plotting import plot_shot_chart_hexbin
from shot_chart.nba_efficiency import calculate_compatibility_between_players
from shot_chart.shot_chart_main import run_scenario
from nba_api.stats.static import players, teams

@st.cache_data
def get_teams_list():
    """Get the list of NBA teams."""
    return [team['full_name'] for team in teams.get_teams()]

@st.cache_data
def get_players_list():
    """Get the list of NBA players."""
    return [player['full_name'] for player in players.get_players()]

def main():
    st.title("NBA Shot Analysis")
    
    # Add guidelines and purpose explanation at the top
    st.markdown("""
    ### Welcome to the NBA Shot Analysis App!
    
    This app lets you analyze the offensive and defensive shooting efficiency of NBA teams and players.
    You can compare players or teams to identify the most efficient spots on the court
    and assess player compatibility based on efficiency by court area.
    
    **Options and Guidelines:**
    - **Analysis Type**: Choose between offensive, defensive, or both types of analysis.
    - **Team or Player**: Analyze a team or an individual player.
    - **Court Areas**: Select specific court areas or analyze all areas.
    - **Comparison**: Compare multiple players to see how their offensive efficiencies align or differ.
    
    ### How to Find the Most Efficient Spots:
    - The app allows you to explore shot efficiency across different court areas.
    - You can see how players perform against other teams and how well they play together.
    - The MAE (Mean Absolute Error) metric helps identify the compatibility between players based on their shooting efficiency in various areas.
    """)
    
    analysis_type = st.selectbox("Select analysis type", options=["offensive", "defensive", "both"])
    
    entity_type = st.selectbox("Analyze a Team or Player?", options=["team", "player"])
    
    # Initialize both selections so later branches never reference an undefined name
    entity_name = None
    player_names = []
    if entity_type == "team":
        st.markdown("_**The Team option can analyze both offense and defense; defense is derived from the shots other teams took against the selected team.**_")
        entity_name = st.selectbox("Select a Team", options=get_teams_list())
    else:
        st.markdown("_**The Player option can analyze offense only.**_")
        player_names = st.multiselect("Select Players to Analyze", options=get_players_list())
    
    season = st.selectbox("Select the season", options=["2023-24", "2022-23", "2021-22", "2020-21"])
    
    opponent_type = st.selectbox("Compare against all teams or a specific team?", options=["all", "specific"])
    
    opponent_name = None
    if opponent_type == "specific":
        opponent_name = st.selectbox("Select an Opponent Team", options=get_teams_list())
    
    court_areas = st.selectbox("Select court areas to analyze", options=["all", "specific"], index=0)
    
    if court_areas == "specific":
        court_areas = st.multiselect("Select specific court areas", options=get_all_court_areas())
    else:
        court_areas = "all"
    
    debug_mode = st.checkbox("Enable Debug Mode", value=False)
    
    if st.button("Run Analysis"):
        if entity_type == "player" and not player_names:
            st.error("Please select at least one player.")
        else:
            if entity_type == "player":
                if len(player_names) == 1:
                    # Single player analysis
                    run_scenario(
                        entity_name=player_names[0],
                        entity_type=entity_type,
                        season=season,
                        opponent_name=opponent_name,
                        analysis_type=analysis_type,
                        compare_players=False,
                        player_names=None,
                        court_areas=court_areas
                    )
                else:
                    # Multiple players comparison
                    player_shots = fetch_shots_for_multiple_players(player_names, season, court_areas, opponent_name, debug=debug_mode)
                    
                    for player, shots in player_shots.items():
                        st.pyplot(plot_shot_chart_hexbin(shots['shots'], f'{player} Shot Chart', opponent=opponent_name if opponent_name else "all teams"))
                        st.write(f"Efficiency for {player}:")
                        st.write(shots['efficiency'])
                    
                    compatibility_df = calculate_compatibility_between_players(player_shots)
                    st.write("Player Shooting Area Compatibility:")
                    st.write(compatibility_df)
            else:
                # Team analysis
                run_scenario(
                    entity_name=entity_name,
                    entity_type=entity_type,
                    season=season,
                    opponent_name=opponent_name,
                    analysis_type=analysis_type,
                    compare_players=False,
                    court_areas=court_areas
                )

    # Add explanation for shot chart MAE analysis
    with st.expander("Understanding MAE in Player Shooting Analysis"):
        st.markdown("""
        **MAE** is a metric that measures the average magnitude of errors between predicted values and actual values, without considering their direction.
        
        In our context, MAE is used to measure the difference between the shooting efficiencies of two players across various areas on the court.
        
        **Steps to Analyze MAE:**
        1. **Define Common Areas**: The court is divided into areas like "Left Corner 3", "Top of Key", "Paint", etc.
        2. **Calculate Individual Efficiencies**: Fetch shot data for each player and calculate their shooting efficiency in these areas.
        3. **Identify Common Areas**: When comparing players, identify the areas where both players have taken shots.
        4. **Calculate MAE**: Compute the absolute difference between efficiencies in each common area and average them.
        5. **Interpret Compatibility**:
            - **High MAE**: Indicates players excel in different areas (more compatible).
            - **Low MAE**: Indicates similar efficiencies in the same areas (less compatible).
        
        **Use this metric to assess player compatibility based on where they excel on the court!**
        """)
        
    with st.expander("Understanding MAE in Team Analysis (Offense vs. Opponents' Defense)"):
        st.markdown("""
        **MAE** is a metric that measures the average magnitude of errors between predicted values and actual values, without considering their direction.
        
        In the context of team analysis, MAE is used to measure the difference between the shooting efficiencies of one team's offense and the defensive efficiencies of other teams.
        
        **Steps to Analyze MAE for Team Comparison:**
        1. **Calculate Offensive Efficiency**: Fetch shot data for the team of interest and calculate their shooting efficiency across various areas on the court.
        2. **Calculate Defensive Efficiency of Opponents**: For each opponent team, calculate their defensive efficiency by analyzing how well they defend these same areas on the court.
        3. **Calculate MAE**: Compute the MAE between the offensive efficiency of the team of interest and the defensive efficiencies of each opponent team across the defined court areas.
        4. **Interpret the Results**:
            - **Low MAE**: Indicates that the opponent team is effective at defending the areas where the team of interest typically excels. This suggests that the opponent is a "bad fit" for the team of interest, as they defend well against their strengths.
            - **High MAE**: Indicates that the opponent team struggles to defend the areas where the team of interest typically excels. This suggests that the opponent is a "good fit" for the team of interest, as their defense is less effective against the team's offensive strengths.
        
        **Use this analysis to identify which teams are tough matchups (bad fits) versus easier matchups (good fits) based on how well they can defend your team's key offensive areas!**
        """)

if __name__ == "__main__":
    main()



Overwriting ../src/shot_chart_streamlit_app.py
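
The player MAE steps described in the app's first expander (per-area efficiency, common areas, mean of absolute differences) can be sketched in plain Python. This is not the body of calculate_compatibility_between_players, which is not shown in this section; the function names efficiency_by_area and pairwise_mae, the (area, made) shot format, and the shot logs are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def efficiency_by_area(shots):
    """Return {area: makes / attempts} from (area, made) pairs."""
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for area, made in shots:
        attempts[area] += 1
        makes[area] += int(made)
    return {area: makes[area] / attempts[area] for area in attempts}


def pairwise_mae(efficiencies):
    """Return {(player_a, player_b): MAE over areas both players shot from}."""
    result = {}
    for a, b in combinations(sorted(efficiencies), 2):
        common = set(efficiencies[a]) & set(efficiencies[b])
        if common:
            total = sum(abs(efficiencies[a][area] - efficiencies[b][area]) for area in common)
            result[(a, b)] = total / len(common)
        else:
            result[(a, b)] = None  # no shared areas, so MAE is undefined
    return result


# Made-up shot logs: (court area, shot made?)
shot_logs = {
    "Player A": [("Left Corner 3", True), ("Left Corner 3", False),
                 ("Paint", True), ("Paint", True)],
    "Player B": [("Left Corner 3", False), ("Left Corner 3", False),
                 ("Paint", True), ("Paint", False), ("Top of Key", True)],
}

efficiencies = {player: efficiency_by_area(shots) for player, shots in shot_logs.items()}
compat = pairwise_mae(efficiencies)
print(compat)
```

"Top of Key" is ignored here because only Player B shot from it, which matches the rule that MAE is computed over common areas only.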
