<a href="https://colab.research.google.com/github/soniagormar/EDA_NBA_Shots_23-24/blob/main/EDA_NBA_shots.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import plotly.express as px

**'NBA Shots' data analysis**

The analysis was carried out on a dataset downloaded from [kaggle.com](https://www.kaggle.com/datasets/mexwell/nba-shots?resource=download).

The dataset contains the shots taken by players in all official NBA games from the 2003/04 season through the 2023/24 season.
This analysis covers the 2023/24 season only.

\\
**Data Dictionary**


*   SEASON_1, SEASON_2 - season identifiers
*   TEAM_ID, TEAM_NAME - team ID and team name
*   PLAYER_ID, PLAYER_NAME - player ID and the player's full name
*   POSITION_GROUP, POSITION - the player's position group and position
*   GAME_DATE, GAME_ID - game date and game ID
*   HOME_TEAM - code of the home team
*   AWAY_TEAM - code of the away team
*   EVENT_TYPE - shot outcome as text (Missed Shot/Made Shot)
*   SHOT_MADE - whether the shot was made (TRUE/FALSE)
*   ACTION_TYPE - type of shot taken (layup, dunk, jump shot, etc.)
*   SHOT_TYPE - number of points the shot could have earned (2PT or 3PT)
*   BASIC_ZONE - name of the zone the shot was taken from (Restricted Area, In The Paint (Non-RA), Mid-Range, Left Corner 3, Right Corner 3, Above the Break 3, Backcourt)
*   ZONE_NAME - side of the court the shot was taken from (left, left side center, center, right side center, right)
*   ZONE_ABB - abbreviation of the ZONE_NAME value (L, LC, C, RC, R)
*   ZONE_RANGE - distance range the shot was taken from (Less Than 8 ft., 8-16 ft., 16-24 ft., 24+ ft.)
*   LOC_X, LOC_Y - exact court coordinates of the point the shot was taken from
*   SHOT_DISTANCE - distance of the shot, measured from the basket, in feet
*   QUARTER - quarter in which the shot was taken
*   MINS_LEFT, SECS_LEFT - time remaining in the quarter when the shot was taken
*   SHOT_VALUE - added column holding the point value of the shot (2 or 3)
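
How the outcome and value columns relate to each other can be sketched on a few made-up rows. The rows below are hypothetical and only mimic the column layout described above; the `2PT Field Goal` label is an assumption made by analogy with the `3PT Field Goal` label that the notebook checks for.

```python
import pandas as pd

# Hypothetical rows, not taken from the dataset.
toy = pd.DataFrame({
    "EVENT_TYPE": ["Made Shot", "Missed Shot", "Made Shot"],
    "SHOT_MADE": [True, False, True],
    "SHOT_TYPE": ["2PT Field Goal", "3PT Field Goal", "3PT Field Goal"],
})

# SHOT_VALUE is derived from SHOT_TYPE: 3 for three-pointers, otherwise 2.
toy["SHOT_VALUE"] = toy["SHOT_TYPE"].str.startswith("3PT").map({True: 3, False: 2})

# EVENT_TYPE and SHOT_MADE should carry the same information in two encodings.
consistent = (toy["EVENT_TYPE"].eq("Made Shot") == toy["SHOT_MADE"]).all()

print(toy)
print("EVENT_TYPE agrees with SHOT_MADE:", consistent)
```

Running the same consistency check on the full DataFrame would be a cheap sanity test before aggregating by either column.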





In [None]:
# Load the data and check column types and missing values
df = pd.read_csv('NBA_2024_Shots.csv', parse_dates = ['GAME_DATE'])
df['SHOT_VALUE'] = df['SHOT_TYPE'].apply(lambda x: 3 if '3PT Field Goal' in x else 2)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 218701 entries, 0 to 218700
Data columns (total 27 columns):
 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   SEASON_1        218701 non-null  int64         
 1   SEASON_2        218701 non-null  object        
 2   TEAM_ID         218701 non-null  int64         
 3   TEAM_NAME       218701 non-null  object        
 4   PLAYER_ID       218701 non-null  int64         
 5   PLAYER_NAME     218701 non-null  object        
 6   POSITION_GROUP  217437 non-null  object        
 7   POSITION        217437 non-null  object        
 8   GAME_DATE       218701 non-null  datetime64[ns]
 9   GAME_ID         218701 non-null  int64         
 10  HOME_TEAM       218701 non-null  object        
 11  AWAY_TEAM       218701 non-null  object        
 12  EVENT_TYPE      218701 non-null  object        
 13  SHOT_MADE       218701 non-null  bool          
 14  ACTION_TYPE     218701 non-null  obj

In [None]:
df.columns

In [None]:
SEASON_1 = "SEASON_1"

In [None]:
df.head()

Unnamed: 0,SEASON_1,SEASON_2,TEAM_ID,TEAM_NAME,PLAYER_ID,PLAYER_NAME,POSITION_GROUP,POSITION,GAME_DATE,GAME_ID,...,BASIC_ZONE,ZONE_NAME,ZONE_ABB,ZONE_RANGE,LOC_X,LOC_Y,SHOT_DISTANCE,QUARTER,MINS_LEFT,SECS_LEFT
0,2024,2023-24,1610612764,Washington Wizards,1629673,Jordan Poole,G,SG,2023-11-03,22300003,...,In The Paint (Non-RA),Center,C,8-16 ft.,-0.4,17.45,12,1,11,1
1,2024,2023-24,1610612764,Washington Wizards,1630166,Deni Avdija,F,SF,2023-11-03,22300003,...,Above the Break 3,Center,C,24+ ft.,1.5,30.55,25,1,10,26
2,2024,2023-24,1610612764,Washington Wizards,1626145,Tyus Jones,G,PG,2023-11-03,22300003,...,Restricted Area,Center,C,Less Than 8 ft.,-3.3,6.55,3,1,9,46
3,2024,2023-24,1610612764,Washington Wizards,1629673,Jordan Poole,G,SG,2023-11-03,22300003,...,Restricted Area,Center,C,Less Than 8 ft.,-1.0,5.85,1,1,8,30
4,2024,2023-24,1610612764,Washington Wizards,1626145,Tyus Jones,G,PG,2023-11-03,22300003,...,Restricted Area,Center,C,Less Than 8 ft.,-0.0,6.25,1,1,8,8


In [None]:
players_shots = df.groupby(['PLAYER_NAME', 'ACTION_TYPE']).size().unstack(fill_value=0)
players_shots['FAVOURED_SHOT_TECHNIQUE'] = players_shots.idxmax(axis=1)
players_shots.reset_index()

ACTION_TYPE,PLAYER_NAME,Alley Oop Dunk Shot,Alley Oop Layup shot,Cutting Dunk Shot,Cutting Finger Roll Layup Shot,Cutting Layup Shot,Driving Bank Hook Shot,Driving Dunk Shot,Driving Finger Roll Layup Shot,Driving Floating Bank Jump Shot,...,Step Back Jump shot,Tip Dunk Shot,Tip Layup Shot,Turnaround Bank Hook Shot,Turnaround Bank shot,Turnaround Fadeaway Bank Jump Shot,Turnaround Fadeaway shot,Turnaround Hook Shot,Turnaround Jump Shot,FAVOURED_SHOT_TECHNIQUE
0,A.J. Lawson,0,2,1,0,6,0,2,5,1,...,0,2,4,0,0,0,0,0,0,Jump Shot
1,AJ Green,0,0,0,1,4,0,0,2,0,...,9,0,0,0,0,0,0,0,2,Jump Shot
2,AJ Griffin,0,0,1,0,0,0,0,2,1,...,8,0,0,0,0,0,0,0,0,Jump Shot
3,Aaron Gordon,43,10,35,1,13,0,23,13,4,...,20,5,67,0,1,4,15,4,3,Jump Shot
4,Aaron Holiday,0,0,0,0,0,1,1,17,3,...,15,0,1,0,0,0,8,2,2,Jump Shot
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
563,Zach LaVine,1,1,1,0,4,3,5,11,6,...,58,0,0,0,0,0,5,0,1,Jump Shot
564,Zavier Simpson,0,0,0,0,0,4,0,12,0,...,2,0,0,0,0,0,1,0,0,Driving Finger Roll Layup Shot
565,Zeke Nnaji,1,1,5,1,22,0,2,2,0,...,2,4,18,0,0,0,1,0,1,Jump Shot
566,Ziaire Williams,10,1,10,1,4,1,1,13,6,...,20,2,5,0,0,0,2,0,6,Jump Shot


In [None]:
# NOTE: the data does not account for player transfers

players_accuracy = df.groupby(['PLAYER_NAME', 'EVENT_TYPE']).size().unstack(fill_value=0)
players_accuracy['SHOTS_TAKEN'] = players_accuracy['Made Shot'] + players_accuracy['Missed Shot']
players_accuracy['ACCURACY %'] = ((players_accuracy['Made Shot'] / (players_accuracy['SHOTS_TAKEN']))*100).round(2)

#players_accuracy = players_accuracy.sort_values(by='SHOTS_TAKEN', ascending=False)
players_accuracy = players_accuracy.sort_values(by='PLAYER_NAME')
#players_accuracy = players_accuracy.sort_values(by='ACCURACY %', ascending=False)

players_accuracy = players_accuracy.merge(players_shots[['FAVOURED_SHOT_TECHNIQUE']], on='PLAYER_NAME', how='left')

players_accuracy.head(10)


PLAYER_NAME,Made Shot,Missed Shot,SHOTS_TAKEN,ACCURACY %,FAVOURED_SHOT_TECHNIQUE
A.J. Lawson,54,67,121,44.63,Jump Shot
AJ Green,83,113,196,42.35,Jump Shot
AJ Griffin,18,44,62,29.03,Jump Shot
Aaron Gordon,398,318,716,55.59,Jump Shot
Aaron Holiday,186,231,417,44.6,Jump Shot
Aaron Nesmith,315,320,635,49.61,Jump Shot
Aaron Wiggins,212,165,377,56.23,Jump Shot
Adam Flagler,1,6,7,14.29,Pullup Jump shot
Adama Sanogo,14,13,27,51.85,Layup Shot
Admiral Schofield,10,16,26,38.46,Jump Shot


In [None]:
fig = px.scatter(players_accuracy,
                 x = 'SHOTS_TAKEN',
                 y = 'ACCURACY %',
                 title = 'Player accuracy versus number of shots taken',
                 labels = {'ACCURACY %' : 'Accuracy [%]', 'SHOTS_TAKEN' : 'Shots taken'},
                 )

# Unweighted mean of per-player accuracy (every player counts equally)
mean_accuracy = players_accuracy['ACCURACY %'].mean()

fig.add_hline(
    y= mean_accuracy,
    line_dash="dash",
    line_color="red",
    annotation_text= f"Mean accuracy {mean_accuracy:.2f}",
    annotation_position='bottom right'
    )

fig.show()

In [None]:
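# Points from field goals only: SHOT_VALUE is 2 or 3, so free throws are not included
made_shots = df[df['SHOT_MADE']]
players_points = made_shots.groupby('PLAYER_NAME')['SHOT_VALUE'].sum().reset_index()
players_points = players_points.rename(columns={'SHOT_VALUE': 'TOTAL_POINTS'})
players_accuracy = players_accuracy.merge(players_points[['PLAYER_NAME', 'TOTAL_POINTS']], on='PLAYER_NAME', how='left')
# Players without a made shot have no row in players_points; count them as 0 points
players_accuracy['TOTAL_POINTS'] = players_accuracy['TOTAL_POINTS'].fillna(0).astype(int)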


In [None]:
fig = px.scatter(players_accuracy,
                 y='TOTAL_POINTS',
                 x='ACCURACY %',
                 title='Total points scored versus player accuracy',
                 labels={'ACCURACY %' : 'Accuracy [%]', 'TOTAL_POINTS' : 'Total points scored in the season'},
                 )

fig.add_vline(
    x= mean_accuracy,
    line_dash="dash",
    line_color="red",
    annotation_text=f"Mean accuracy {mean_accuracy:.2f}",
    annotation_position='bottom right'
    )

fig.show()


In [None]:
fig = px.scatter(players_accuracy,
                 y = 'TOTAL_POINTS',
                 x = 'SHOTS_TAKEN',
                 title = 'Total points scored versus number of shots taken by the player',
                 labels = {'SHOTS_TAKEN' : 'Shots taken', 'TOTAL_POINTS' : 'Total points scored in the season', 'ACCURACY %': "Accuracy [%]"},
                 color = 'ACCURACY %'
                 )

fig.show()


In [None]:
shots_data = df.groupby(['ACTION_TYPE', 'EVENT_TYPE']).size().unstack(fill_value=0)

shots_data['COUNT'] = shots_data['Missed Shot'] + shots_data['Made Shot']

shots_data['ACCURACY'] = ((shots_data['Made Shot']/shots_data['COUNT'])*100).round(2)

shots_data = shots_data.sort_values(by= 'COUNT', ascending = False)

shots_data = shots_data.reset_index()

In [None]:
fig = px.bar(
    shots_data[shots_data['COUNT']>=1000],
    x='ACTION_TYPE',
    y='COUNT',
    title='Number of shots by technique, coloured by shot accuracy',
    labels={'ACTION_TYPE': 'Shot type', 'COUNT': 'Number of shots', 'ACCURACY' : 'Accuracy [%]'},
    color = 'ACCURACY'
)

fig.show()

In [None]:
shots_data = shots_data.sort_values(by= 'COUNT', ascending = False)

fig = px.bar(shots_data,
                 y = 'ACCURACY',
                 x = 'ACTION_TYPE',
                 title = 'Accuracy by shot technique',
                 labels = {'ACCURACY' : 'Accuracy [%]', 'ACTION_TYPE' : 'Shot type'},
                 )

fig.add_hline(
    y= mean_accuracy,
    line_dash="dash",
    line_color="red",
    annotation_text= f"Mean accuracy {mean_accuracy:.2f}",
    annotation_position='top left'
    )

fig.show()

In [155]:
fig = px.scatter(df,
                 x = 'LOC_X',
                 y = 'LOC_Y',
                 color = 'EVENT_TYPE',
                 labels ={'LOC_X' : "X coordinate", 'LOC_Y' : "Y coordinate"},
                 opacity = 0.8,
                 size_max = 10
                 )

fig.update_layout(
    title = 'Locations the shots were taken from, <br> coloured by shot outcome',
    title_xanchor='left',
    title_yanchor='top'
)


import base64

image_path = "NBA_court.jpg"
with open(image_path, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode()



fig.update_layout(
    width=911,
    height= 1450,
    legend_title_text='Event type'
)


# Add the court image as the plot background
fig.update_layout(
    images=[dict(
        source=f"data:image/jpeg;base64,{encoded_image}",
        xref="x",
        yref="y",
        x=-25,
        y= 94,
        sizex = 50,
        sizey = 94,
        sizing="stretch",
        opacity=0.5,
        layer="below")
    ]
)


fig.update_traces(marker=dict(size=3))


fig.show()

In [None]:
points = df.groupby(['SHOT_VALUE','SHOT_MADE']).size().unstack(fill_value=0).reset_index()
# The unstacked columns are labelled with the booleans False (missed) and True (made)
points['ACCURACY'] = ((points[True] / (points[False] + points[True]))*100).round(2)
points['POINTS_MADE'] = points['SHOT_VALUE'] * points[True]
points

SHOT_MADE,SHOT_VALUE,False,True,ACCURACY,POINTS_MADE
0,2,60186,72160,54.52,144320
1,3,54776,31579,36.57,94737


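According to the table above, three-point attempts are "worth it" despite their much lower accuracy (36.57% versus 54.52%): a three-point attempt yields about 1.10 points on average, while a two-point attempt yields about 1.09.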
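
The comparison can be checked with a short calculation. The sketch below recomputes accuracy and expected points per attempt from the counts in the table above; the column names `missed`, `made`, `attempts`, and `points_per_attempt` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Counts copied from the table above (2023/24 season).
points = pd.DataFrame({
    "SHOT_VALUE": [2, 3],
    "missed": [60186, 54776],
    "made": [72160, 31579],
})

points["attempts"] = points["missed"] + points["made"]
points["accuracy"] = points["made"] / points["attempts"]

# Expected points per attempt = shot value * probability of scoring.
points["points_per_attempt"] = points["SHOT_VALUE"] * points["accuracy"]

print(points[["SHOT_VALUE", "accuracy", "points_per_attempt"]].round(4))
```

The two values are close, so the conclusion holds only narrowly: three-pointers break even once their accuracy reaches two thirds of the two-point accuracy.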

In [157]:
# Effect of shot location (zone, distance, zone + distance (% as colour))
loc_shot_data = df[df['SHOT_MADE'] == 1].groupby('BASIC_ZONE').size().reset_index(name = 'SHOT')
loc_shot_data_total = df.groupby('BASIC_ZONE').size().reset_index(name = 'TOTAL')
loc_shot_data = loc_shot_data.merge(loc_shot_data_total, on = 'BASIC_ZONE')
loc_shot_data['ACCURACY'] = ((loc_shot_data['SHOT']/loc_shot_data['TOTAL'])*100).round(2)
loc_shot_data

Unnamed: 0,BASIC_ZONE,SHOT,TOTAL,ACCURACY
0,Above the Break 3,22939,63770,35.97
1,Backcourt,8,433,1.85
2,In The Paint (Non-RA),18982,43089,44.05
3,Left Corner 3,4448,11523,38.6
4,Mid-Range,10293,24589,41.86
5,Restricted Area,42884,64669,66.31
6,Right Corner 3,4185,10628,39.38


In [158]:
loc_shot_data = loc_shot_data.sort_values(by= 'ACCURACY', ascending = False)

fig = px.bar(loc_shot_data,
             x = 'BASIC_ZONE',
             y = 'TOTAL',
             color = 'ACCURACY',
             labels = {'BASIC_ZONE' : "Shot zone", 'TOTAL': 'Number of shots', 'ACCURACY' : 'Accuracy [%]'},
             title = 'Number of shots per zone, coloured by shot accuracy'
             )

fig.show()

In [None]:
shot_distance = df.groupby(['SHOT_DISTANCE', 'SHOT_MADE']).size().unstack(fill_value=0)
shot_distance = shot_distance.reset_index()
shot_distance.columns = ['SHOT_DISTANCE','Missed', 'Made']
shot_distance['TOTAL'] = shot_distance['Made'] + shot_distance['Missed']
shot_distance['ACCURACY'] = ((shot_distance['Made'] / shot_distance['TOTAL']) * 100).round(2)

fig = px.scatter(shot_distance,
                 x= 'SHOT_DISTANCE',
                 y= 'TOTAL',
                 labels ={'TOTAL' : 'Number of shots', 'SHOT_DISTANCE' : 'Distance [ft]', 'ACCURACY' : 'Accuracy [%]'},
                 title = 'Number of shots by distance, coloured by accuracy',
                 color = 'ACCURACY')


fig.add_vrect(x0= 21.98, x1= 23.75,
              annotation_text="Three-point line", annotation_position="top left",
              fillcolor="green", opacity=0.20, line_width=0)

fig.show()

In [166]:
teams_accuracy = df.groupby(['TEAM_NAME', 'EVENT_TYPE']).size().unstack(fill_value=0)
# Share of made shots per team, in percent
teams_accuracy['ACCURACY'] = (
    teams_accuracy['Made Shot'] / (teams_accuracy['Made Shot'] + teams_accuracy['Missed Shot']) * 100
).round(2)
teams_accuracy = teams_accuracy[['ACCURACY']].reset_index()
teams_accuracy = teams_accuracy.sort_values(by= 'ACCURACY')

fig = px.bar(teams_accuracy,
             x='TEAM_NAME',
             y='ACCURACY',
             title='Team accuracy in the 2023/2024 season',
             labels={'ACCURACY': 'Accuracy [%]', 'TEAM_NAME': 'Team name'}
             )

fig.show()



As shown above, shooting accuracy differs only slightly between teams.

Therefore, no further analysis is carried out in this direction.
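
One way to put a number on "no big differences" is to summarise the spread of team accuracies. The helper `accuracy_spread` below is a hypothetical name introduced for illustration, and the three teams and their values are placeholders, not results from the dataset.

```python
import pandas as pd

def accuracy_spread(accuracy: pd.Series) -> pd.Series:
    """Summarise how far apart the accuracies are, in percentage points."""
    return pd.Series({
        "min": accuracy.min(),
        "max": accuracy.max(),
        "range": accuracy.max() - accuracy.min(),
        "std": accuracy.std(),  # sample standard deviation
    })

# Hypothetical placeholder values, not taken from the dataset.
toy = pd.DataFrame({
    "TEAM_NAME": ["Team A", "Team B", "Team C"],
    "ACCURACY": [46.0, 47.5, 49.0],
})

spread = accuracy_spread(toy["ACCURACY"])
print(spread)
```

In the notebook the same helper could be called as `accuracy_spread(teams_accuracy['ACCURACY'])`; a range of only a few percentage points would support the conclusion above.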