# Estatisticas das partidas
Neste notebook serão coletadas todas as estatisticas de todas as partidas da temporada regular da NBA das temporadas de 2001-02 até 2023-24.

Para isso será utilizado a `nba_api`: https://github.com/swar/nba_api

## Instalando o pacote via pip

In [1]:
!pip install nba_api



## Importando as bibliotecas

In [2]:
import pandas as pd
pd.set_option('display.max_columns', None)

from nba_api.stats.endpoints import leaguegamelog

## Coleta dos Dados

O código busca dados de jogos da NBA para várias temporadas e armazena os resultados em DataFrames.

In [3]:
# Selecionar as temporadas de 2001-02 até 2023-24
seasons = []
for year in range(2001, 2024):
    season = f"{year}-{str(year+1)[-2:]}"
    seasons.append(season)
print(seasons)

['2001-02', '2002-03', '2003-04', '2004-05', '2005-06', '2006-07', '2007-08', '2008-09', '2009-10', '2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21', '2021-22', '2022-23', '2023-24']


In [4]:
# Lista para armazenar os DataFrames de cada temporada
dfs = []

for season in seasons:
    gamelog = leaguegamelog.LeagueGameLog(season=season, season_type_all_star='Regular Season').get_data_frames()[0]
    dfs.append(gamelog)

# Combina todos os DataFrames em um único DataFrame
nba_leaguegamelog = pd.concat(dfs, ignore_index=True)
nba_leaguegamelog

Unnamed: 0,SEASON_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PTS,PLUS_MINUS,VIDEO_AVAILABLE
0,22001,1610612739,CLE,Cleveland Cavaliers,0020100001,2001-10-30,CLE vs. BOS,L,240,39,90,0.433,3,17,0.176,8,10,0.800,7,29,36,30,3,8,14,21,89,-19,0
1,22001,1610612762,UTA,Utah Jazz,0020100009,2001-10-30,UTA vs. MIL,L,265,43,81,0.531,5,16,0.313,21,25,0.840,11,30,41,36,8,4,21,24,112,-7,0
2,22001,1610612757,POR,Portland Trail Blazers,0020100012,2001-10-30,POR @ LAL,L,240,31,73,0.425,4,12,0.333,21,27,0.778,5,37,42,17,10,7,15,28,87,-11,0
3,22001,1610612745,HOU,Houston Rockets,0020100007,2001-10-30,HOU vs. ATL,W,265,33,84,0.393,4,18,0.222,19,28,0.679,14,38,52,17,10,5,18,21,89,5,0
4,22001,1610612737,ATL,Atlanta Hawks,0020100007,2001-10-30,ATL @ HOU,L,265,30,83,0.361,3,14,0.214,21,27,0.778,10,37,47,15,10,8,17,26,84,-5,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55207,22023,1610612749,MIL,Milwaukee Bucks,0022301191,2024-04-14,MIL @ ORL,L,240,30,75,0.400,7,27,0.259,21,23,0.913,7,27,34,16,10,4,17,18,88,-25,1
55208,22023,1610612748,MIA,Miami Heat,0022301189,2024-04-14,MIA vs. TOR,W,240,46,87,0.529,9,36,0.250,17,19,0.895,13,33,46,29,13,7,19,16,118,15,1
55209,22023,1610612761,TOR,Toronto Raptors,0022301189,2024-04-14,TOR @ MIA,L,240,38,90,0.422,9,34,0.265,18,21,0.857,13,28,41,23,6,3,19,21,103,-15,1
55210,22023,1610612737,ATL,Atlanta Hawks,0022301188,2024-04-14,ATL @ IND,L,240,39,89,0.438,12,36,0.333,25,27,0.926,9,23,32,25,6,5,17,12,115,-42,1


### Renomeação e Limpeza de Colunas:
O código renomeia colunas, remove colunas redundantes e faz algumas substituições de nomes de times.

In [5]:
nba_leaguegamelog  = nba_leaguegamelog.drop(columns=['VIDEO_AVAILABLE'])

In [6]:
nba_leaguegamelog.replace('Charlotte Bobcats', 'Charlotte Hornets', inplace=True)
nba_leaguegamelog.replace('LA Clippers', 'Los Angeles Clippers', inplace=True)
nba_leaguegamelog.replace('New Jersey Nets', 'Brooklyn Nets', inplace=True)
nba_leaguegamelog.replace('NJN', 'BKN', inplace=True)
nba_leaguegamelog['MATCHUP'] = nba_leaguegamelog['MATCHUP'].str.replace('NJN', 'BKN')
nba_leaguegamelog.replace('New Orleans Hornets', 'New Orleans Pelicans', inplace=True)
nba_leaguegamelog.replace('NOH', 'NOP', inplace=True)
nba_leaguegamelog['MATCHUP'] = nba_leaguegamelog['MATCHUP'].str.replace('NOH', 'NOP')
nba_leaguegamelog.replace('Seattle SuperSonics', 'Oklahoma City Thunder', inplace=True)
nba_leaguegamelog.replace('SEA', 'OKC', inplace=True)
nba_leaguegamelog['MATCHUP'] = nba_leaguegamelog['MATCHUP'].str.replace('SEA', 'OKC')
nba_leaguegamelog.replace('New Orleans/Oklahoma City Hornets', 'New Orleans Pelicans', inplace=True)
nba_leaguegamelog.replace('NOK', 'NOP', inplace=True)
nba_leaguegamelog['MATCHUP'] = nba_leaguegamelog['MATCHUP'].str.replace('NOK', 'NOP')
nba_leaguegamelog.replace('Charlotte Hornets', 'Charlotte Hornets', inplace=True)
nba_leaguegamelog.replace('CHH', 'CHA', inplace=True)
nba_leaguegamelog['MATCHUP'] = nba_leaguegamelog['MATCHUP'].str.replace('CHH', 'CHA')

# Alterando o nome da coluna SEASON_ID para SEASON
nba_leaguegamelog.rename(columns={'SEASON_ID': 'SEASON'}, inplace=True)

# Removendo o primeiro caractere '2' da coluna SEASON
nba_leaguegamelog['SEASON'] = nba_leaguegamelog['SEASON'].astype(str).str[1:]

In [7]:
# Convertendo o array NumPy para DataFrame do Pandas
nba_leaguegamelog_df = pd.DataFrame(nba_leaguegamelog)

### Valores ausentes:

Verificar se há valores faltantes e como lidar com eles.

Explicação dos dados ausentes

*   [NBA cancela partida que aconteceria em Boston nesta terça-feira](https://ge.globo.com/basquete/noticia/2013/04/nba-cancela-partida-que-aconteceria-em-boston-nesta-terca-feira.html)
*   [Celtics é primeiro time da NBA a não cobrar lances livres em uma partida](https://ge.globo.com/basquete/nba/noticia/2024/04/11/celtics-e-primeiro-time-da-nba-a-nao-cobrar-lances-livres-em-uma-partida.ghtml)



In [8]:
nba_leaguegamelog[nba_leaguegamelog.isnull().any(axis=1)]

Unnamed: 0,SEASON,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PTS,PLUS_MINUS
28760,2012,1610612754,IND,Indiana Pacers,21201214,2013-04-16,IND @ BOS,,0,0,0,,0,0,,0,0,,0,0,0,0,0,0,0,0,0,0
28761,2012,1610612738,BOS,Boston Celtics,21201214,2013-04-16,BOS vs. IND,,0,0,0,,0,0,,0,0,,0,0,0,0,0,0,0,0,0,0
55118,2023,1610612738,BOS,Boston Celtics,22301148,2024-04-09,BOS @ MIL,L,240,37,93,0.398,17,52,0.327,0,0,,12,26,38,27,11,4,12,8,91,-13


In [9]:
# Subtituir 'NaN' por '0' na linha 55125
nba_leaguegamelog.at[55125, 'FT_PCT'] = 0

# Remover a linha com valores ausentes
nba_leaguegamelog = nba_leaguegamelog.dropna()

### Tipos de Dados
Explicação de cada uma das colunas:

* `SEASON`: Temporada em que o jogo ocorreu.
* `HOME_TEAM_ID`: O ID da equipe da casa.
* `HOME_TEAM_ABBREVIATION`: Abreviação do nome da equipe da casa.
* `HOME_TEAM_NAME`: Nome completo da equipe da casa.
* `GAME_ID`: O ID único do jogo.
* `GAME_DATE`: Data em que o jogo ocorreu.
* `HOME_MATCHUP`: Confronto da equipe da casa contra a equipe adversária.
* `HOME_WL`: Resultado do jogo para a equipe da casa (1 para vitória e 0 para derrota).
* `MIN`: Minutos jogados (geralmente 240 minutos em um jogo regulamentar da NBA).
* `FGM`: Arremessos de campo convertidos.
* `FGA`: Arremessos de campo tentados.
* `FG_PCT`: Porcentagem de arremessos de campo convertidos.
* `FG3M`: Três pontos convertidos.
* `FG3A`: Três pontos tentados.
* `FG3_PCT`: Porcentagem de três pontos convertidos.
* `FTM`: Lances livres convertidos.
* `FTA`: Lances livres tentados.
* `FT_PCT`: Porcentagem de lances livres convertidos.
* `OREB`: Rebotes ofensivos.
* `DREB`: Rebotes defensivos.
* `REB`: Total de rebotes.
* `AST`: Assistências.
* `STL`: Roubos de bola.
* `BLK`: Toques de bloqueio.
* `TOV`: Perdas de bola.
* `PF`: Faltas pessoais.
* `PTS`: Pontos.
* `PLUS_MINUS`: Diferença de pontos (positivo indica que a equipe da casa marcou mais pontos do que o adversário, negativo o contrário).

In [10]:
# Verificar os tipos de dados das colunas
nba_leaguegamelog.info()

<class 'pandas.core.frame.DataFrame'>
Index: 55209 entries, 0 to 55211
Data columns (total 28 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   SEASON             55209 non-null  object 
 1   TEAM_ID            55209 non-null  int64  
 2   TEAM_ABBREVIATION  55209 non-null  object 
 3   TEAM_NAME          55209 non-null  object 
 4   GAME_ID            55209 non-null  object 
 5   GAME_DATE          55209 non-null  object 
 6   MATCHUP            55209 non-null  object 
 7   WL                 55209 non-null  object 
 8   MIN                55209 non-null  int64  
 9   FGM                55209 non-null  int64  
 10  FGA                55209 non-null  int64  
 11  FG_PCT             55209 non-null  float64
 12  FG3M               55209 non-null  int64  
 13  FG3A               55209 non-null  int64  
 14  FG3_PCT            55209 non-null  float64
 15  FTM                55209 non-null  int64  
 16  FTA                55209 no

In [11]:
# Criar uma cópia profunda do DataFrame para evitar o warning e modificar a cópia
nba_games = nba_leaguegamelog.copy()

# Converter a coluna 'GAME_DATE' para o formato de data na cópia
nba_games['GAME_DATE'] = pd.to_datetime(nba_games['GAME_DATE'])
nba_games['SEASON'] = nba_games['SEASON'].astype(int)
nba_games['WL'] = nba_games['WL'].map({'W': 1, 'L': 0}).astype('int64')

In [12]:
nba_games.info()

<class 'pandas.core.frame.DataFrame'>
Index: 55209 entries, 0 to 55211
Data columns (total 28 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   SEASON             55209 non-null  int64         
 1   TEAM_ID            55209 non-null  int64         
 2   TEAM_ABBREVIATION  55209 non-null  object        
 3   TEAM_NAME          55209 non-null  object        
 4   GAME_ID            55209 non-null  object        
 5   GAME_DATE          55209 non-null  datetime64[ns]
 6   MATCHUP            55209 non-null  object        
 7   WL                 55209 non-null  int64         
 8   MIN                55209 non-null  int64         
 9   FGM                55209 non-null  int64         
 10  FGA                55209 non-null  int64         
 11  FG_PCT             55209 non-null  float64       
 12  FG3M               55209 non-null  int64         
 13  FG3A               55209 non-null  int64         
 14  FG3_PCT    

## Análise Exploratória

In [13]:
nba_games.head()

Unnamed: 0,SEASON,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PTS,PLUS_MINUS
0,2001,1610612739,CLE,Cleveland Cavaliers,20100001,2001-10-30,CLE vs. BOS,0,240,39,90,0.433,3,17,0.176,8,10,0.8,7,29,36,30,3,8,14,21,89,-19
1,2001,1610612762,UTA,Utah Jazz,20100009,2001-10-30,UTA vs. MIL,0,265,43,81,0.531,5,16,0.313,21,25,0.84,11,30,41,36,8,4,21,24,112,-7
2,2001,1610612757,POR,Portland Trail Blazers,20100012,2001-10-30,POR @ LAL,0,240,31,73,0.425,4,12,0.333,21,27,0.778,5,37,42,17,10,7,15,28,87,-11
3,2001,1610612745,HOU,Houston Rockets,20100007,2001-10-30,HOU vs. ATL,1,265,33,84,0.393,4,18,0.222,19,28,0.679,14,38,52,17,10,5,18,21,89,5
4,2001,1610612737,ATL,Atlanta Hawks,20100007,2001-10-30,ATL @ HOU,0,265,30,83,0.361,3,14,0.214,21,27,0.778,10,37,47,15,10,8,17,26,84,-5


In [14]:
nba_games.to_csv("nba_games.csv", index=False)