#Lebron James Shots Analysis
In this notebook, we will create a dataframe of Lebron James' shots from his career which will be used in the following notebooks to analyze his shots.

## Importing Libraries
The first thing to do is to import the necessary libraries and the data on which we will work.
The dataset can be found here: https://www.kaggle.com/datasets/eduvadillo/lebron-james-career-shots

In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

df = pd.read_csv('lebron_shot_data.csv', encoding='utf8')

Let's start by cleaning the dataframe leaving only the necessary information.

## Data Cleaning
Firstly we will remove the columns that are not necessary for the analysis.
Then we rename the columns to make it easier to understand the data.
Latelly we invert asciss because the original data is formatted for a mirrored basketball court in relation with the one i will plot.

In [3]:
df = pd.read_csv('lebron_shot_data.csv', encoding='utf8')

df = df.drop(['GAME_ID', 'GAME_EVENT_ID', 'PLAYER_ID', 'PLAYER_NAME', 'TEAM_ID', 'EVENT_TYPE', 'GAME_DATE', 'HTM', 'VTM', 'SHOT_ZONE_RANGE', 'SHOT_TYPE', 'ACTION_TYPE'], axis=1)
df = df.rename(columns={"TEAM_NAME": "Team", "PERIOD": "Period", "MINUTES_REMAINING": "Minutes_Remaining", "SHOT_MADE_FLAG": "FG", "SHOT_ATTEMPTED_FLAG": "FGA", "SHOT_ZONE_BASIC": "Zone", "SHOT_ZONE_AREA": "Area", "LOC_X": "X", "LOC_Y": "Y", "SHOT_DISTANCE": "Distance", "SEASON": "Season"})
#inverto la dx con la sx per avere la rappresentazione corretta
#dati sono specchiati perchè la rappresentazione è specchiata rispetto all'asse y
df['X'] = -df['Y']

#trasformo i left center e rigth center in left e right

df['Area'] = df['Area'].replace(['Left Side Center(LC)', 'Right Side Center(RC)'], ['Left Side(L)', 'Right Side(R)'])


#remove not significant data
#remove shots from more than 30 feet
df = df[df['Distance'] <= 30]
#remove shots from the last 3 minutes of the game
df = df[(df['Period'] < 4) | (df['Minutes_Remaining'] <= 3)]

df = df.drop(['Period', 'Minutes_Remaining', 'SECONDS_REMAINING'], axis=1)



df  = df.set_index(['Team'])
df.to_pickle("../creazione_plot/LeBronShots.pickle")
display(df)

Unnamed: 0_level_0,Zone,Area,Distance,X,Y,FGA,FG,Season
Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Cleveland Cavaliers,Mid-Range,Right Side(R),15,0,0,1,1,2003-04
Cleveland Cavaliers,Mid-Range,Left Side(L),13,-2,2,1,1,2003-04
Cleveland Cavaliers,Mid-Range,Right Side(R),16,-5,5,1,1,2003-04
Cleveland Cavaliers,Mid-Range,Left Side(L),14,-92,92,1,0,2003-04
Cleveland Cavaliers,In The Paint (Non-RA),Center(C),5,-22,22,1,1,2003-04
...,...,...,...,...,...,...,...,...
Los Angeles Lakers,In The Paint (Non-RA),Center(C),9,-91,91,1,1,2023-24
Los Angeles Lakers,Restricted Area,Center(C),1,-2,2,1,1,2023-24
Los Angeles Lakers,Restricted Area,Center(C),3,-1,1,1,1,2023-24
Los Angeles Lakers,Above the Break 3,Center(C),26,-269,269,1,1,2023-24


Get the data of Lebron James' advanced stats during his career (regular season).

In [4]:
url = 'https://www.espn.com/nba/player/advancedstats/_/id/1966/lebron-james'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"}
page = requests.get(url, headers=headers)
pageSoup = BeautifulSoup(page.content, 'html.parser')

In [5]:
# Trova la tabella delle advanced stats (se presente)
tables = pageSoup.find_all('table', class_='Table Table--align-right')
if len(tables) >= 1:  # Verifica se ci sono almeno tre tabelle nella pagina
    table = tables[0]  # Prendi la prima tabella
else:
    print("Non sono presenti abbastanza tabelle nella pagina per estrarre i postseason totals.")
    exit()

# Estrai i dati dalla tabella e crea una lista di dizionari
data = []
if table:
    rows = table.find_all('tr')
    for row in rows[1:]:  # Ignora l'intestazione
        cols = row.find_all(['th', 'td'])
        cols = [col.text.strip() for col in cols]
        data.append({
            'TS': cols[3],
            'TO': cols[5],
            'USG': cols[6],
            'RPM': cols[8],
        })

#converto tutti i tipi in float
for i in range(len(data)):
    data[i]['TS'] = float(data[i]['TS'])
    data[i]['TO'] = float(data[i]['TO'])
    data[i]['USG'] = float(data[i]['USG'])
    data[i]['RPM'] = float(data[i]['RPM'])

# Crea un DataFrame Pandas
dfAUX1 = pd.DataFrame(data)

# Stampa il DataFrame
display(dfAUX1)

Unnamed: 0,TS,TO,USG,RPM
0,48.8,11.2,26.9,-0.1
1,55.4,9.3,28.8,4.95
2,56.8,8.8,31.4,6.02
3,55.2,9.4,29.6,6.95
4,56.8,9.2,32.7,5.8
5,59.1,8.7,32.2,10.05
6,60.4,9.4,32.2,9.92
7,59.4,10.9,29.7,5.74
8,60.5,10.7,29.8,6.88
9,64.0,9.6,28.2,7.58


In [6]:
teams = ['CLE', 'MIA', 'LAL']
season = ['2003-04', '2004-05', '2005-06', '2006-07', '2007-08', '2008-09', '2009-10', '2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21', '2021-22', '2022-23', '2023-24']

data = []

# Aggiunta delle tuple per CLE
for year in season[:7]:
    data.append(('CLE', year))
    
# Aggiunta delle tuple per MIA
for year in season[7:11]:
    data.append(('MIA', year))

# Aggiunta delle tuple per CLE
for year in season[11:15]:
    data.append(('CLE', year))

# Aggiunta delle tuple per LAL
for year in season[15:]:
    data.append(('LAL', year))

# Creazione del DataFrame
dfAUX2 = pd.DataFrame(data, columns=['Team', 'Season'])

# Visualizzazione del DataFrame
display(dfAUX2)

Unnamed: 0,Team,Season
0,CLE,2003-04
1,CLE,2004-05
2,CLE,2005-06
3,CLE,2006-07
4,CLE,2007-08
5,CLE,2008-09
6,CLE,2009-10
7,MIA,2010-11
8,MIA,2011-12
9,MIA,2012-13


In [7]:
#append the two dataframes

dfUsage = dfAUX2.join(dfAUX1)
dfUsage.set_index(['Team'], inplace=True)

#load the data

dfUsage.to_pickle("../creazione_plot/LeBron_EFG_USG.pickle")
display(dfUsage)

Unnamed: 0_level_0,Season,TS,TO,USG,RPM
Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
CLE,2003-04,48.8,11.2,26.9,-0.1
CLE,2004-05,55.4,9.3,28.8,4.95
CLE,2005-06,56.8,8.8,31.4,6.02
CLE,2006-07,55.2,9.4,29.6,6.95
CLE,2007-08,56.8,9.2,32.7,5.8
CLE,2008-09,59.1,8.7,32.2,10.05
CLE,2009-10,60.4,9.4,32.2,9.92
MIA,2010-11,59.4,10.9,29.7,5.74
MIA,2011-12,60.5,10.7,29.8,6.88
MIA,2012-13,64.0,9.6,28.2,7.58


In [8]:
#create a dataframe whith % win of the team in the season

data = [
    ('CLE', '2000-01', 0.366)
    ('CLE', '2001-02', 0.354),
    ('CLE', '2002-03', 0.207),
    ('CLE', '2003-04', 0.427),
    ('CLE', '2004-05', 0.512),
    ('CLE', '2005-06', 0.610),
    ('CLE', '2006-07', 0.610),
    ('CLE', '2007-08', 0.549),
    ('CLE', '2008-09', 0.805),
    ('CLE', '2009-10', 0.744),
    ('CLE', '2010-11', 0.232),
    ('CLE', '2011-12', 0.318),
    ('CLE', '2012-13', 0.293),
    
    ('MIA', '2007-08', 0.183),
    ('MIA', '2008-09', 0.524),
    ('MIA', '2009-10', 0.573),
    ('MIA', '2010-11', 0.707),
    ('MIA', '2011-12', 0.697),
    ('MIA', '2012-13', 0.805),
    ('MIA', '2013-14', 0.659),
    ('MIA', '2014-15', 0.451),
    ('MIA', '2015-16', 0.585),
    ('MIA', '2016-17', 0.500),
    
    ('CLE', '2011-12', 0.318),
    ('CLE', '2012-13', 0.293),
    ('CLE', '2013-14', 0.402),
    ('CLE', '2014-15', 0.646),
    ('CLE', '2015-16', 0.695),
    ('CLE', '2016-17', 0.622),
    ('CLE', '2017-18', 0.610),
    ('CLE', '2018-19', 0.232),
    ('CLE', '2019-20', 0.292),
    ('CLE', '2020-21', 0.306),


    ('LAL', '2015-16', 0.207),
    ('LAL', '2016-17', 0.317),
    ('LAL', '2017-18', 0.427),
    ('LAL', '2018-19', 0.451),
    ('LAL', '2019-20', 0.732),
    ('LAL', '2020-21', 0.680),
    ('LAL', '2021-22', 0.402),
    ('LAL', '2022-23', 0.524),
    ('LAL', '2023-24', 0.573)
]

dfAUX3 = pd.DataFrame(data, columns=['Team', 'Season', 'W_PCT'])

dfAUX3.set_index(['Team'], inplace=True)

dfAUX3.to_pickle("../creazione_plot/LeBron_Win%.pickle")
display(dfAUX3)

Unnamed: 0_level_0,Season,W_PCT
Team,Unnamed: 1_level_1,Unnamed: 2_level_1
CLE,2003-04,0.427
CLE,2004-05,0.512
CLE,2005-06,0.61
CLE,2006-07,0.61
CLE,2007-08,0.549
CLE,2008-09,0.805
CLE,2009-10,0.744
MIA,2010-11,0.707
MIA,2011-12,0.697
MIA,2012-13,0.805
