#Lebron James Shots Analysis
In this notebook, we will create a dataframe of Lebron James' shots from his career which will be used in the following notebooks to analyze his shots.

## Importing Libraries
The first thing to do is to import the necessary libraries and the data on which we will work.
The dataset can be found here: https://www.kaggle.com/datasets/eduvadillo/lebron-james-career-shots

In [6]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

df = pd.read_csv('lebron_shot_data.csv', encoding='utf8')

Let's start by cleaning the dataframe leaving only the necessary information.

## Data Cleaning
Firstly we will remove the columns that are not necessary for the analysis.
Then we rename the columns to make it easier to understand the data.
Latelly we invert asciss because the original data is formatted for a mirrored basketball court in relation with the one i will plot.

In [7]:
df = pd.read_csv('lebron_shot_data.csv', encoding='utf8')

df = df.drop(['GAME_ID', 'GAME_EVENT_ID', 'PLAYER_ID', 'PLAYER_NAME', 'TEAM_ID', 'EVENT_TYPE', 'GAME_DATE', 'HTM', 'VTM', 'SHOT_ZONE_RANGE', 'SHOT_TYPE', 'ACTION_TYPE'], axis=1)
df = df.rename(columns={"TEAM_NAME": "Team", "PERIOD": "Period", "MINUTES_REMAINING": "Minutes_Remaining", "SHOT_MADE_FLAG": "FG", "SHOT_ATTEMPTED_FLAG": "FGA", "SHOT_ZONE_BASIC": "Zone", "SHOT_ZONE_AREA": "Area", "LOC_X": "X", "LOC_Y": "Y", "SHOT_DISTANCE": "Distance", "SEASON": "Season"})
#inverto la dx con la sx per avere la rappresentazione corretta
#dati sono specchiati perchè la rappresentazione è specchiata rispetto all'asse y
df['X'] = -df['Y']

#trasformo i left center e rigth center in left e right

df['Area'] = df['Area'].replace(['Left Side Center(LC)', 'Right Side Center(RC)'], ['Left Side(L)', 'Right Side(R)'])


#remove not significant data
#remove shots from more than 30 feet
df = df[df['Distance'] <= 30]
#remove shots from the last 3 minutes of the game
df = df[(df['Period'] < 4) | (df['Minutes_Remaining'] <= 3)]

df = df.drop(['Period', 'Minutes_Remaining', 'SECONDS_REMAINING'], axis=1)



df  = df.set_index(['Team'])
df.to_pickle("../creazione_plot/LeBronShots.pickle")
display(df)

Unnamed: 0_level_0,Zone,Area,Distance,X,Y,FGA,FG,Season
Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Cleveland Cavaliers,Mid-Range,Right Side(R),15,0,0,1,1,2003-04
Cleveland Cavaliers,Mid-Range,Left Side(L),13,-2,2,1,1,2003-04
Cleveland Cavaliers,Mid-Range,Right Side(R),16,-5,5,1,1,2003-04
Cleveland Cavaliers,Mid-Range,Left Side(L),14,-92,92,1,0,2003-04
Cleveland Cavaliers,In The Paint (Non-RA),Center(C),5,-22,22,1,1,2003-04
...,...,...,...,...,...,...,...,...
Los Angeles Lakers,In The Paint (Non-RA),Center(C),9,-91,91,1,1,2023-24
Los Angeles Lakers,Restricted Area,Center(C),1,-2,2,1,1,2023-24
Los Angeles Lakers,Restricted Area,Center(C),3,-1,1,1,1,2023-24
Los Angeles Lakers,Above the Break 3,Center(C),26,-269,269,1,1,2023-24


Get the data of Lebron James' advanced stats during his career (regular season).

In [25]:
df_advanced = pd.read_csv('advanced.csv', encoding='utf8')

#drop unnamed column
df_advanced = df_advanced.drop(df_advanced.columns[df_advanced.columns.str.contains('unnamed',case = False)],axis = 1)

#drop age, Lg, Pos , MP, TS%, 3PAr, FTr, ORB%, DRB%, TRB%, AST%, STL%, BLK%, TOV%, DWS, WS, WS/48, OBPM, DBPM, BPM, VORP

df_advanced = df_advanced.drop(['Age', 'Lg', 'Pos', 'MP', 'TS%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%', 'BLK%', 'TOV%', 'DWS', 'WS', 'WS/48', 'OBPM', 'DBPM', 'BPM', 'VORP'], axis=1)

#rename columns, set team, season and as index

df_advanced = df_advanced.rename(columns={'Tm' : 'Team', 'G' : 'GP', 'USG%' : 'USG'})

df_advanced = df_advanced.set_index(['Team', 'Season'])

display(df_advanced)


Unnamed: 0_level_0,Unnamed: 1_level_0,GP,PER,USG,OWS
Team,Season,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
CLE,2003-04,79,18.3,28.2,2.4
CLE,2004-05,80,25.7,29.7,9.7
CLE,2005-06,79,28.1,33.6,12.0
CLE,2006-07,78,24.5,31.0,8.0
CLE,2007-08,75,29.1,33.5,10.7
CLE,2008-09,81,31.7,33.8,13.7
CLE,2009-10,76,31.1,33.5,13.3
MIA,2010-11,79,27.3,31.5,10.3
MIA,2011-12,62,30.7,32.0,10.0
MIA,2012-13,76,31.6,30.2,14.6


In [22]:
# Dati forniti
data = {
    'Season': [
        '2003-04', '2004-05', '2005-06', '2006-07', '2007-08', '2008-09',
        '2009-10', '2010-11', '2011-12', '2012-13', '2013-14', '2014-15',
        '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21',
        '2021-22', '2022-23', '2023-24'
    ],
    'Team': [
        'CLE', 'CLE', 'CLE', 'CLE', 'CLE', 'CLE', 'CLE', 
             'MIA', 'MIA', 'MIA', 'MIA', 
             'CLE', 'CLE', 'CLE', 'CLE', 
             'LAL', 'LAL', 'LAL', 'LAL', 'LAL', 'LAL'],
             
    'OFR_L': [
        101.8, 107.4, 108.6, 105.7, 107.7, 113.2, 112.8,
        112.8, 108.9, 113.6, 112.4,
        114.2, 112.1, 116.0, 113.5,
        110.1, 112.9, 112.2, 112.0, 116.1, 116.8],
    'OFR_T': [
        108.6, 112.1, 104.5, 116.1, 95.7, 118.7, 103.5,
        109.4, 86.8, 110.4, 106.8,
        99.7, 105.0, 97.3, np.nan,
        105.0, 109.8, 107.7, 109.0, 113.1, 116.3]
}

# Creazione del DataFrame
df_Aux2 = pd.DataFrame(data)

# Reorganizzazione delle colonne
df_Aux2 = df_Aux2[['Season', 'GP', 'Team', 'OFR_L', 'OFR_T']]



df_Teams_Analysis = df_Aux2.join(df_advanced.set_index('Season'), on='Season'

df_Teams_Analysis = df_Teams_Analysis.set_index(['Team'])

# Visualizzazione del DataFrame
display(df_Teams_Analysis)

df_Teams_Analysis.to_pickle("../creazione_plot/LeBronStats.pickle")

ValueError: You are trying to merge on object and int64 columns for key 'Season'. If you wish to proceed you should use pd.concat