# LeBron James Shots Analysis

In this notebook, we prepare the data needed to analyze LeBron James' offensive evolution throughout his career.

## Importing Libraries
To begin our analysis, we need to import the required libraries and load the dataset that contains LeBron James' career shots. You can access the dataset [here](https://www.kaggle.com/datasets/eduvadillo/lebron-james-career-shots).

In [17]:
import pandas as pd
import numpy as np

df = pd.read_csv('lebron_shot_data.csv', encoding='utf8')

## LeBron Shots Data

### Data Cleaning
Let's begin by cleaning the dataframe to focus only on the necessary information for our analysis.

- Step 1: Remove Unnecessary Columns

We will start by removing columns that are not essential for our analysis.

- Step 2: Rename Columns

Next, we will rename the columns to make the data more understandable and easier to work with.

- Step 3: Invert X-axis Coordinates

Since the original data is formatted for a mirrored basketball court, we will invert the x-axis coordinates to match the orientation of the court we will plot.
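
The mirroring in Step 3 can be illustrated on a toy dataframe. This is only a minimal sketch with invented coordinates (the `toy` frame is hypothetical, not part of the dataset): negating `X` flips left and right while `Y` stays untouched.

```python
import pandas as pd

# Hypothetical toy shots; the coordinates are invented for illustration only.
toy = pd.DataFrame({"X": [-120, 0, 85], "Y": [40, 10, 200]})

# Mirror the court about the y-axis: negate X, keep Y as it is.
mirrored = toy.assign(X=-toy["X"])

print(mirrored)
```

A shot recorded on the left side (negative `X`) ends up on the right side after the flip, and vice versa.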


In [18]:
df = pd.read_csv('lebron_shot_data.csv', encoding='utf8')

df = df.drop(['GAME_ID', 'GAME_EVENT_ID', 'PLAYER_ID', 'PLAYER_NAME', 'TEAM_ID', 'EVENT_TYPE', 'GAME_DATE', 'HTM', 'VTM', 'SHOT_ZONE_RANGE', 'SHOT_TYPE', 'ACTION_TYPE', ], axis=1)
df = df.rename(columns={"TEAM_NAME": "Team", "PERIOD": "Period", "MINUTES_REMAINING": "Minutes_Remaining", "SHOT_MADE_FLAG": "FG", "SHOT_ATTEMPTED_FLAG": "FGA", "SHOT_ZONE_BASIC": "Zone", "SHOT_ZONE_AREA": "Area", "LOC_X": "X", "LOC_Y": "Y", "SHOT_DISTANCE": "Distance", "SEASON": "Season"})
# Swap left and right to get the correct representation:
# the data is mirrored about the y-axis relative to the court we plot
df['X'] = -df['X']

# Merge the left-center and right-center areas into left and right

df['Area'] = df['Area'].replace(['Left Side Center(LC)', 'Right Side Center(RC)'], ['Left Side(L)', 'Right Side(R)'])


# Remove data that is not significant
# Remove shots taken from more than 30 feet
df = df[df['Distance'] <= 30]
# Remove shots from the last 3 minutes of the fourth period and of overtime periods
df = df[(df['Period'] < 4) | (df['Minutes_Remaining'] >= 3)]

df = df.drop(['Period', 'Minutes_Remaining', 'SECONDS_REMAINING'], axis=1)



df = df.set_index(['Team', 'Season'])
df.to_pickle("../creazione_plot/LeBronShots.pickle")
#display(df)
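
A quick way to sanity-check the cleaned dataframe is to summarize it through its `(Team, Season)` index. The following is a minimal sketch on a hypothetical toy frame (all values are invented); it applies the same 30-foot distance filter as the cell above and then counts the remaining shots and the share of made shots per group.

```python
import pandas as pd

# Hypothetical shots shaped like the cleaned dataframe (values are invented).
toy = pd.DataFrame({
    "Team": ["CLE", "CLE", "MIA", "MIA"],
    "Season": ["2003-04", "2003-04", "2010-11", "2010-11"],
    "FG": [1, 0, 1, 0],
    "Distance": [2, 35, 24, 12],
})

# Same distance filter as in the cleaning cell: keep shots within 30 feet.
toy = toy[toy["Distance"] <= 30].set_index(["Team", "Season"])

# Number of remaining shots and field-goal percentage per (Team, Season).
summary = toy.groupby(level=["Team", "Season"])["FG"].agg(shots="size", fg_pct="mean")

print(summary)
```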

## Advanced Data

This dataset aggregates statistics from two sources: [basketball-reference.com](https://www.basketball-reference.com/players/j/jamesle01.html#advanced) and [statmuse.com](https://www.statmuse.com/nba/ask/nba). The Basketball Reference table can be downloaded directly as a CSV file.

### Basketball Reference
As before, we clean the dataset to keep only the relevant columns. We also set `Team` and `Season` as the index, which makes it easy to merge this table with the other statistics.

In [19]:
df_advanced_Aux = pd.read_csv('advanced.csv', encoding='utf8')

#drop unnamed column
df_advanced_Aux = df_advanced_Aux.drop(df_advanced_Aux.columns[df_advanced_Aux.columns.str.contains('unnamed',case = False)],axis = 1)

# Drop the columns that are not needed for this analysis

df_advanced_Aux = df_advanced_Aux.drop(['Age', 'Lg', 'Pos', 'MP', 'TS%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%', 'BLK%', 'TOV%', 'DWS', 'WS', 'WS/48', 'OBPM', 'DBPM', 'BPM', 'VORP', 'PER'], axis=1)

# Rename columns, then set Team and Season as the index

df_advanced_Aux = df_advanced_Aux.rename(columns={'Tm' : 'Team', 'G' : 'GP', 'USG%' : 'USG'})

df_advanced_Aux = df_advanced_Aux.set_index(['Team', 'Season'])

#display(df_advanced_Aux)


### StatMuse Data Integration
To incorporate statistics on Offensive Rating (ORtg) from StatMuse.com, we'll construct a dataframe to compile these values. Here's how we'll proceed:

- Step 1: Building the ORtg Dataframe

We'll create a dictionary to store the ORtg statistics and then convert this dictionary into a dataframe.

- Step 2: Combining Dataframes

Finally, we'll merge the two dataframes based on their indices to consolidate all the necessary data.
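
The index-based combination in Step 2 can be sketched on toy data. This is a minimal sketch, not the notebook's actual tables: the frames `left` and `right` are hypothetical and the `GP` values are illustrative. It shows that `pd.concat(..., axis=1)` matches rows by their `(Team, Season)` labels rather than by position, and that a label missing from one side produces `NaN`.

```python
import pandas as pd

# Hypothetical frames; the GP values are illustrative only.
left = pd.DataFrame(
    {"Team": ["CLE", "MIA"], "Season": ["2003-04", "2010-11"], "GP": [79, 79]}
).set_index(["Team", "Season"])
right = pd.DataFrame(
    {
        "Team": ["MIA", "CLE", "LAL"],
        "Season": ["2010-11", "2003-04", "2018-19"],
        "OFR_L": [112.8, 101.8, 110.1],
    }
).set_index(["Team", "Season"])

# Column-wise concat needs unique row labels when the two indexes differ.
assert left.index.is_unique and right.index.is_unique

# Rows are matched on the (Team, Season) labels, not on their order.
combined = pd.concat([left, right], axis=1)

print(combined)
```

Because the default is an outer join, the `('LAL', '2018-19')` row is kept with a missing `GP`. Checking for unexpected `NaN` values after the merge is a cheap way to spot mismatched team or season labels.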

In [20]:
# Provided data
data = {
    'Season': [
        '2003-04', '2004-05', '2005-06', '2006-07', '2007-08', '2008-09',
        '2009-10', '2010-11', '2011-12', '2012-13', '2013-14', '2014-15',
        '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21',
        '2021-22', '2022-23', '2023-24'
    ],
    'Team': [
        'CLE', 'CLE', 'CLE', 'CLE', 'CLE', 'CLE', 'CLE',
        'MIA', 'MIA', 'MIA', 'MIA',
        'CLE', 'CLE', 'CLE', 'CLE',
        'LAL', 'LAL', 'LAL', 'LAL', 'LAL', 'LAL'
    ],

    'OFR_L': [
        101.8, 107.4, 108.6, 105.7, 107.7, 113.2, 112.8,
        112.8, 108.9, 113.6, 112.4,
        114.2, 112.1, 116.0, 113.5,
        110.1, 112.9, 112.2, 112.0, 116.1, 116.8],
    'OFR_T': [
        108.6, 112.1, 104.5, 116.1, 95.7, 118.7, 103.5,
        109.4, 86.8, 110.4, 106.8,
        99.7, 105.0, 97.3, np.nan,
        105.0, 109.8, 107.7, 109.0, 113.1, 116.3],

    'LEAGUE_OFR': [
        102.9, 106.1, 106.2, 106.5, 107.5, 108.3, 107.6,
        107.3, 104.6, 105.9, 106.7,
        105.6, 106.4, 108.8, 108.6,
        110.4, 110.6, 112.3, 112.0, 114.8, 115.3]

}

# Create the DataFrame
df_advanced_Aux2 = pd.DataFrame(data)

# Set the index
df_advanced_Aux2 = df_advanced_Aux2.set_index(['Team', 'Season'])

# Combine the DataFrames
df_advanced = pd.concat([df_advanced_Aux, df_advanced_Aux2], axis=1)

#display(df_advanced)

df_advanced.to_pickle("../creazione_plot/LeBronAdvanced.pickle")
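
The pickle files written above can be read back with `pd.read_pickle`. The following is a minimal sketch, assuming a downstream notebook reloads them; the temporary path and the `frame` stand-in are hypothetical. Unlike a CSV export, the pickle keeps the `(Team, Season)` index, so no `set_index` call is needed after loading.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for df_advanced; the two OFR_L values come from the table above.
frame = pd.DataFrame(
    {"Team": ["CLE", "LAL"], "Season": ["2003-04", "2023-24"], "OFR_L": [101.8, 116.8]}
).set_index(["Team", "Season"])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.pickle"  # illustrative path, not the notebook's
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

# The MultiIndex and the values survive the round trip unchanged.
print(list(restored.index.names))
print(restored.equals(frame))
```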