# Basketball Playoffs

Basketball tournaments are usually split into two parts. First, all teams play each other, aiming to win as many games as possible. Then, at the end of this first part of the season, a predetermined number of teams with the most wins qualify for the playoffs, where they play a series of knockout matches for the trophy.

Over 10 years, data on players, teams, coaches, games and several other metrics were gathered and arranged in this dataset. The goal is to use this data to predict which teams will qualify for the playoffs in the next season.

In [48]:
# Imports
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import seaborn as sb
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, classification_report, confusion_matrix

## Data Preprocessing

## Exploratory Data Analysis

In [49]:
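# Load the team-level season statistics
df_teams = pd.read_csv('dataset/teams.csv')
# Show every row when displaying a DataFrame or Series (no truncation)
pd.set_option('display.max_rows', None)
df_teams.head()

# Count missing values per column
df_teams.isna().sum()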

year            0
lgID            0
tmID            0
franchID        0
confID          0
divID         142
rank            0
playoff         0
seeded          0
firstRound     62
semis         104
finals        122
name            0
o_fgm           0
o_fga           0
o_ftm           0
o_fta           0
o_3pm           0
o_3pa           0
o_oreb          0
o_dreb          0
o_reb           0
o_asts          0
o_pf            0
o_stl           0
o_to            0
o_blk           0
o_pts           0
d_fgm           0
d_fga           0
d_ftm           0
d_fta           0
d_3pm           0
d_3pa           0
d_oreb          0
d_dreb          0
d_reb           0
d_asts          0
d_pf            0
d_stl           0
d_to            0
d_blk           0
d_pts           0
tmORB           0
tmDRB           0
tmTRB           0
opptmORB        0
opptmDRB        0
opptmTRB        0
won             0
lost            0
GP              0
homeW           0
homeL           0
awayW           0
awayL     
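
The per-column listing above is long, and most entries are zero. A shorter view keeps only the columns that actually contain missing values, together with their share of rows. This is a minimal sketch: the helper name `missing_summary` and the small frame `toy` are assumptions for illustration, standing in for `df_teams`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and share of missing values for columns that have any."""
    counts = df.isna().sum()
    counts = counts[counts > 0]  # keep only columns with at least one NaN
    return pd.DataFrame({
        "missing": counts,
        "share": counts / len(df),
    }).sort_values("missing", ascending=False)


# Hypothetical toy data standing in for df_teams.
toy = pd.DataFrame({
    "tmID": ["ATL", "CHA", "CHA", "NYL"],
    "divID": [np.nan, np.nan, np.nan, np.nan],
    "firstRound": [np.nan, "W", "L", np.nan],
})
summary = missing_summary(toy)
print(summary)
```

On the real data, `missing_summary(df_teams)` would be the equivalent call. A share of 1.0 marks a column that is entirely empty.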

In [50]:
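# Treat empty strings as missing values, then drop columns that are entirely empty
# (divID no longer appears in the output below)
df_teams.replace("", np.nan, inplace=True)
df_teams.dropna(axis=1, how='all', inplace=True)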

df_teams.head()

Unnamed: 0,year,lgID,tmID,franchID,confID,rank,playoff,seeded,firstRound,semis,...,GP,homeW,homeL,awayW,awayL,confW,confL,min,attend,arena
0,9,WNBA,ATL,ATL,EA,7,N,0,,,...,34,1,16,3,14,2,18,6825,141379,Philips Arena
1,10,WNBA,ATL,ATL,EA,2,Y,0,L,,...,34,12,5,6,11,10,12,6950,120737,Philips Arena
2,1,WNBA,CHA,CHA,EA,8,N,0,,,...,32,5,11,3,13,5,16,6475,90963,Charlotte Coliseum
3,2,WNBA,CHA,CHA,EA,4,Y,0,W,W,...,32,11,5,7,9,15,6,6500,105525,Charlotte Coliseum
4,3,WNBA,CHA,CHA,EA,2,Y,0,L,,...,32,11,5,7,9,12,9,6450,106670,Charlotte Coliseum
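
Since the goal is to predict qualification for the *next* season, one possible way to build a target is to pair each team-season with that team's `playoff` value one year later. The sketch below is an assumption about how this could be done, not the notebook's method. The frame `teams` is a toy stand-in for `df_teams`, and the column names `playoff_bin` and `playoff_next` are hypothetical.

```python
import pandas as pd

# Toy rows shaped like df_teams (only the columns used here).
teams = pd.DataFrame({
    "year": [1, 2, 3, 9, 10],
    "tmID": ["CHA", "CHA", "CHA", "ATL", "ATL"],
    "playoff": ["N", "Y", "Y", "N", "Y"],
})

# Encode the label as 1 (qualified) / 0 (did not qualify).
teams["playoff_bin"] = (teams["playoff"] == "Y").astype(int)

# Self-merge on (tmID, year + 1): shifting the lookup table's year back by one
# aligns each season with the following one. A missing next season yields NaN
# instead of silently pairing non-adjacent years.
nxt = teams[["tmID", "year", "playoff_bin"]].copy()
nxt["year"] = nxt["year"] - 1
nxt = nxt.rename(columns={"playoff_bin": "playoff_next"})
teams = teams.merge(nxt, on=["tmID", "year"], how="left")

print(teams)
```

Rows where `playoff_next` is NaN (a team's last recorded season) have no label and would be excluded from training.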


In [51]:
df_players = pd.read_csv('dataset/players.csv')
df_players.head()

Unnamed: 0,bioID,pos,firstseason,lastseason,height,weight,college,collegeOther,birthDate,deathDate
0,abrahta01w,C,0,0,74.0,190,George Washington,,1975-09-27,0000-00-00
1,abrossv01w,F,0,0,74.0,169,Connecticut,,1980-07-09,0000-00-00
2,adairje01w,C,0,0,76.0,197,George Washington,,1986-12-19,0000-00-00
3,adamsda01w,F-C,0,0,73.0,239,Texas A&M,Jefferson College (JC),1989-02-19,0000-00-00
4,adamsjo01w,C,0,0,75.0,180,New Mexico,,1981-05-24,0000-00-00


In [52]:
df_coaches = pd.read_csv('dataset/coaches.csv')
df_coaches.head()

Unnamed: 0,coachID,year,tmID,lgID,stint,won,lost,post_wins,post_losses
0,adamsmi01w,5,WAS,WNBA,0,17,17,1,2
1,adubari99w,1,NYL,WNBA,0,20,12,4,3
2,adubari99w,2,NYL,WNBA,0,21,11,3,3
3,adubari99w,3,NYL,WNBA,0,18,14,4,4
4,adubari99w,4,NYL,WNBA,0,16,18,0,0


In [53]:
df_players_teams = pd.read_csv('dataset/players_teams.csv')
df_awards_players = pd.read_csv('dataset/awards_players.csv')
df_series_post = pd.read_csv('dataset/series_post.csv')
df_teams_post = pd.read_csv('dataset/teams_post.csv')
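
The tables loaded above share the keys `tmID` and `year`, so they can be combined per team-season. As a rough sketch under that assumption, the example below attaches coach records to team rows. The frames `coaches` and `teams` are toy stand-ins for `df_coaches` and `df_teams`, and the `playoff` values for NYL are hypothetical.

```python
import pandas as pd

# Toy stand-in for df_coaches.
coaches = pd.DataFrame({
    "coachID": ["adamsmi01w", "adubari99w", "adubari99w"],
    "year": [5, 1, 2],
    "tmID": ["WAS", "NYL", "NYL"],
    "won": [17, 20, 21],
    "lost": [17, 12, 11],
})

# Toy stand-in for df_teams; the playoff labels here are hypothetical.
teams = pd.DataFrame({
    "year": [1, 2, 5],
    "tmID": ["NYL", "NYL", "WAS"],
    "playoff": ["Y", "Y", "N"],
})

# Left join keeps every team-season; validate checks that (tmID, year) is
# unique on the team side, while a team may have several coaches in one year.
merged = teams.merge(
    coaches,
    on=["tmID", "year"],
    how="left",
    validate="one_to_many",
)
print(merged)
```

If a team changed coaches mid-season, the join produces one row per stint, so the coach rows may need aggregating per team-season before modelling.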