# NBA Production Predictor - AI Model
This program is designed to take in 5, 7, or 9 players on each team and compare them based on historical stats and our analysis of player production. Essentially, the model determines which team is stronger based on past production. We hope the model can take into account how each player performed against certain teams and against other players at their respective positions, which may also become a future goal of our model. Thank you for viewing, and I hope you enjoy using our NBA Game Predictor Model.

## Import project dependencies

In [75]:
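# Import pandas to read and manipulate data
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Pipeline helpers (combine_dataframes, encode_categorical_columns, etc.) from utils.py
from utils import *
import os
from dotenv import load_dotenv

# Load the stat weights from the .env file into environment variables
load_dotenv()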

In [76]:
# Using pandas read in each csv project file
regular_season_data = pd.read_csv('data/Regular_Season.csv')
playoff_data = pd.read_csv('data/Playoffs.csv')

In [77]:
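# Preview the raw players file; it is semicolon-delimited, so a default
# comma-separated read packs every field into a single column
pd.read_csv('data/players.csv', encoding='ISO-8859-1').head()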

Unnamed: 0,Rk;Player;Pos;Age;Tm;G;GS;MP;FG;FGA;FG%;3P;3PA;3P%;2P;2PA;2P%;eFG%;FT;FTA;FT%;ORB;DRB;TRB;AST;STL;BLK;TOV;PF;PTS;Year
0,1;Alaa Abdelnaby;PF;22;POR;43;0;6.7;1.3;2.7;0....
1,2;Mahmoud Abdul-Rauf;PG;21;DEN;67;19;22.5;6.2;...
2,3;Mark Acres;C;28;ORL;68;0;19.3;1.6;3.1;0.509;...
3,4;Michael Adams;PG;28;DEN;66;66;35.5;8.5;21.5;...
4,5;Mark Aguirre;SF;31;DET;78;13;25.7;5.4;11.7;0...


In [85]:
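# Read the semicolon-delimited file, keeping only column 1 (Player) and column 2 (Pos)
player_position = pd.read_csv('data/players.csv', encoding='ISO-8859-1', sep=';', usecols=[1, 2])

# Rename the columns to 'PLAYER' (matching the stats files) and 'Pos'
player_position.columns = ['PLAYER', 'Pos']

# player_position is already a DataFrame; keep it under the name used in the merge
df = player_position
# Display the resulting DataFrame
df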



Unnamed: 0,PLAYER,Pos
0,Alaa Abdelnaby,PF
1,Mahmoud Abdul-Rauf,PG
2,Mark Acres,C
3,Michael Adams,PG
4,Mark Aguirre,SF
...,...,...
18039,Delon Wright,PG
18040,Thaddeus Young,PF
18041,Trae Young,PG
18042,Cody Zeller,C


In [None]:
# Preview the first five rows of the regular season data and the last five rows of the playoff data to ensure they were read in correctly
display(regular_season_data.head())
display(playoff_data.tail())

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,year,Season_type,PLAYER_ID,RANK,PLAYER,TEAM_ID,TEAM,GP,...,DREB,REB,AST,STL,BLK,TOV,PF,PTS,AST_TOV,STL_TOV
0,0,0,2012-13,Regular_Season,201142,1,Kevin Durant,1610612760,OKC,81,...,594,640,374,116,105,280,143,2280,1.34,0.41
1,1,1,2012-13,Regular_Season,977,2,Kobe Bryant,1610612747,LAL,78,...,367,433,469,106,25,287,173,2133,1.63,0.37
2,2,2,2012-13,Regular_Season,2544,3,LeBron James,1610612748,MIA,76,...,513,610,551,129,67,226,110,2036,2.44,0.57
3,3,3,2012-13,Regular_Season,201935,4,James Harden,1610612745,HOU,78,...,317,379,455,142,38,295,178,2023,1.54,0.48
4,4,4,2012-13,Regular_Season,2546,5,Carmelo Anthony,1610612752,NYK,67,...,326,460,171,52,32,175,205,1920,0.98,0.3


Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,year,Season_type,PLAYER_ID,RANK,PLAYER,TEAM_ID,TEAM,GP,...,DREB,REB,AST,STL,BLK,TOV,PF,PTS,AST_TOV,STL_TOV
2571,2571,8830,2023-24,Playoffs,1641765,198,Olivier-Maxence Prosper,1610612742,DAL,3,...,3,3,1,0,0,0,0,0,0.0,0.0
2572,2572,8831,2023-24,Playoffs,1631115,198,Orlando Robinson,1610612748,MIA,1,...,1,1,1,0,0,0,0,0,0.0,0.0
2573,2573,8832,2023-24,Playoffs,203933,198,T.J. Warren,1610612750,MIN,3,...,1,3,1,0,0,0,0,0,0.0,0.0
2574,2574,8833,2023-24,Playoffs,201152,198,Thaddeus Young,1610612756,PHX,1,...,0,0,0,0,0,0,0,0,0.0,0.0
2575,2575,8834,2023-24,Playoffs,203648,198,Thanasis Antetokounmpo,1610612749,MIL,2,...,0,0,0,1,1,0,1,0,0.0,0.0


# Preprocessing Pipeline (utils.py)
1. combine_dataframes
    - Applies reformat_columns to each DataFrame, converting the 'year' column (e.g. '2012-13' to 2012) and the 'Season_type' column (e.g. 'Regular_Season' to 'Regular Season'), then stacks the results
2. encode_categorical_columns
3. convert_to_single_game
4. calculate_shooting_production
5. calculate_offensive_anx
6. calculate_defensive_anx
7. calculate_total_player_production

### 1. Combine the two dataframes 'regular_season_data' and 'playoff_data'

In [None]:
"""
Function Name: combine_dataframes

Parameters: Takes in a [list of dataframes]

Output: A combined dataframe

Description: This function takes a list of DataFrames, reformats each, and 
combines them into one DataFrame along axis 0.

"""
# Put dataframes into a list
list_of_dataframes = [regular_season_data, playoff_data]

# Call 'combine_dataframes' function to combine the list of dataframes
combined_stats = combine_dataframes(list_of_dataframes)
combined_stats.head()

Unnamed: 0,year,Season_type,PLAYER_ID,RANK,PLAYER,TEAM_ID,TEAM,GP,MIN,FGM,...,DREB,REB,AST,STL,BLK,TOV,PF,PTS,AST_TOV,STL_TOV
0,2012,Regular Season,201142,1,Kevin Durant,1610612760,OKC,81,3119,731,...,594,640,374,116,105,280,143,2280,1.34,0.41
1,2012,Regular Season,977,2,Kobe Bryant,1610612747,LAL,78,3013,738,...,367,433,469,106,25,287,173,2133,1.63,0.37
2,2012,Regular Season,2544,3,LeBron James,1610612748,MIA,76,2877,765,...,513,610,551,129,67,226,110,2036,2.44,0.57
3,2012,Regular Season,201935,4,James Harden,1610612745,HOU,78,2985,585,...,317,379,455,142,38,295,178,2023,1.54,0.48
4,2012,Regular Season,2546,5,Carmelo Anthony,1610612752,NYK,67,2482,669,...,326,460,171,52,32,175,205,1920,0.98,0.3
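For readers without utils.py at hand, here is a minimal sketch of what combine_dataframes appears to do, inferred only from the tables above: the 'Unnamed: 0'-style index columns disappear, '2012-13' in 'year' becomes 2012, 'Regular_Season' becomes 'Regular Season', and the frames are stacked along axis 0. The names `reformat_columns_sketch` and `combine_dataframes_sketch` are hypothetical, and the real helpers may handle more cases.

```python
import pandas as pd


def reformat_columns_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for utils.reformat_columns."""
    # Drop leftover index columns such as 'Unnamed: 0' and 'Unnamed: 0.1'
    df = df.loc[:, ~df.columns.str.startswith('Unnamed')].copy()
    # '2012-13' -> 2012: keep the season's starting year as an integer
    df['year'] = df['year'].str[:4].astype(int)
    # 'Regular_Season' -> 'Regular Season'
    df['Season_type'] = df['Season_type'].str.replace('_', ' ', regex=False)
    return df


def combine_dataframes_sketch(dataframes):
    """Reformat each DataFrame, then stack them row-wise (axis=0)."""
    reformatted = [reformat_columns_sketch(df) for df in dataframes]
    return pd.concat(reformatted, axis=0, ignore_index=True)


# Toy input using the raw column formats seen in the preview above
regular = pd.DataFrame({'Unnamed: 0': [0], 'year': ['2012-13'],
                        'Season_type': ['Regular_Season'],
                        'PLAYER': ['Kevin Durant']})
playoffs = pd.DataFrame({'Unnamed: 0': [2574], 'year': ['2023-24'],
                         'Season_type': ['Playoffs'],
                         'PLAYER': ['Thaddeus Young']})
combined = combine_dataframes_sketch([regular, playoffs])
```

`ignore_index=True` gives the stacked frame a fresh 0..n-1 index instead of repeating each source frame's own row labels.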


In [88]:
# Attach each player's position to their stats by matching on the player's name
merged_df = pd.merge(df, combined_stats, on='PLAYER')
merged_df

Unnamed: 0,PLAYER,Pos,year,Season_type,PLAYER_ID,RANK,TEAM_ID,TEAM,GP,MIN,...,STL,BLK,TOV,PF,PTS,AST_TOV,STL_TOV,season_type_encoded,player_encoded,team_encoded
0,Gerald Henderson,PG,2012,Regular Season,201945,63,1610612766,CHA,68,2133,...,68,34,108,149,1055,1.64,0.63,0,509,3
1,Gerald Henderson,PG,2013,Regular Season,201945,64,1610612766,CHA,77,2461,...,51,32,113,159,1081,1.76,0.45,0,509,3
2,Gerald Henderson,PG,2014,Regular Season,201945,71,1610612766,CHA,80,2315,...,51,22,110,135,969,1.87,0.46,0,509,3
3,Gerald Henderson,PG,2015,Regular Season,201945,164,1610612757,POR,72,1431,...,39,19,68,87,624,1.10,0.57,0,509,25
4,Gerald Henderson,PG,2016,Regular Season,201945,151,1610612755,PHI,72,1667,...,40,15,62,129,662,1.81,0.65,0,509,23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67797,Ivica Zubac,C,2018,Playoffs,1627826,146,1610612746,LAC,4,39,...,2,2,4,5,20,0.25,0.50,1,585,12
67798,Ivica Zubac,C,2019,Playoffs,1627826,51,1610612746,LAC,13,320,...,2,10,12,34,118,0.67,0.17,1,585,12
67799,Ivica Zubac,C,2020,Playoffs,1627826,56,1610612746,LAC,17,301,...,2,12,16,33,107,0.44,0.13,1,585,12
67800,Ivica Zubac,C,2022,Playoffs,1627826,107,1610612746,LAC,5,130,...,3,1,11,8,46,0.27,0.27,1,585,12
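players.csv has one row per player per season (its header includes a Year column), so the same name can appear many times in both `df` and `combined_stats`. `pd.merge` emits one row for every matching pair of rows, which is a likely reason `merged_df` grows to 67,801 rows. The toy sketch below (illustrative rows, not real data) shows the effect and one possible guard: drop duplicate names before merging and pass `validate='many_to_one'`, so pandas raises `pandas.errors.MergeError` if the right-hand key is not unique. Keeping the first listed position is an assumption; a player whose position changed across seasons might instead be matched on both name and year.

```python
import pandas as pd

# Toy position table: one row per player *per season*, like players.csv
positions = pd.DataFrame({
    'PLAYER': ['Ivica Zubac', 'Ivica Zubac', 'Cody Zeller'],
    'Pos': ['C', 'C', 'C'],
})
# Toy stats table: one row per player-season
stats = pd.DataFrame({
    'PLAYER': ['Ivica Zubac', 'Ivica Zubac', 'Cody Zeller'],
    'year': [2019, 2020, 2019],
})

# Naive merge: each of Zubac's 2 stat rows matches both of his position rows,
# giving 2 x 2 = 4 rows for him plus 1 for Zeller
naive = pd.merge(positions, stats, on='PLAYER')

# Keep one position per player (the first listed), then merge with a check
# that the right-hand 'PLAYER' key is unique
unique_positions = positions.drop_duplicates(subset='PLAYER', keep='first')
safe = pd.merge(stats, unique_positions, on='PLAYER', validate='many_to_one')
```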


In [None]:
combined_stats['PLAYER'].unique()

array(['Kevin Durant', 'Kobe Bryant', 'LeBron James', ..., 'Dwayne Jones',
       'Tracy McGrady', 'Luca Vildoza'], dtype=object)

### 2. Encode categorical columns

In [None]:
"""
Function Name: encode_categorical_columns

Parameters: Takes in a dataframe

Output: Three values: (converted dataframe, player mapping, team mapping).
* Converted dataframe: the categorical columns are dropped and replaced by
encoded columns (season_type_encoded, player_encoded, team_encoded).
* Player and team mappings: returned so that the encoded player and team
values can be decoded later.

Description: Encodes categorical columns, like Season_type, PLAYER, and TEAM, using 
LabelEncoder, and returns the player and team mappings for decoding.

"""

# Call the 'encode_categorical_columns' function to encode the combined dataframe
encoded_df, player_mapping, team_mapping = encode_categorical_columns(combined_stats)
encoded_df.head()

Unnamed: 0,GP,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,...,STL,BLK,TOV,PF,PTS,AST_TOV,STL_TOV,season_type_encoded,player_encoded,team_encoded
0,81,3119,731,1433,0.51,139,334,0.416,679,750,...,116,105,280,143,2280,1.34,0.41,0,895,21
1,78,3013,738,1595,0.463,132,407,0.324,525,626,...,106,25,287,173,2133,1.63,0.37,0,922,13
2,76,2877,765,1354,0.565,103,254,0.406,403,535,...,129,67,226,110,2036,2.44,0.57,0,963,15
3,78,2985,585,1337,0.438,179,486,0.368,674,792,...,142,38,295,178,2023,1.54,0.48,0,653,10
4,67,2482,669,1489,0.449,157,414,0.379,425,512,...,52,32,175,205,1920,0.98,0.3,0,198,20
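A minimal sketch of the encode-and-map idea with scikit-learn's `LabelEncoder`, assuming a mapping of the form {code: name} (the actual type returned by encode_categorical_columns is not shown). `LabelEncoder` assigns codes in sorted order of the labels, and `classes_[i]` is the label that was encoded as `i`, which is what makes decoding possible. `encode_column` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_column(df: pd.DataFrame, column: str, new_column: str) -> dict:
    """Label-encode df[column] into df[new_column]; return {code: original label}."""
    encoder = LabelEncoder()
    df[new_column] = encoder.fit_transform(df[column])
    # classes_ is sorted; position i holds the label that was encoded as i
    return dict(enumerate(encoder.classes_))


stats = pd.DataFrame({
    'PLAYER': ['Kevin Durant', 'Kobe Bryant', 'Kevin Durant'],
    'TEAM': ['OKC', 'LAL', 'OKC'],
})
player_mapping = encode_column(stats, 'PLAYER', 'player_encoded')
team_mapping = encode_column(stats, 'TEAM', 'team_encoded')
# Drop the original text columns once they are encoded
stats = stats.drop(columns=['PLAYER', 'TEAM'])

# Decoding later: map the integer codes back to names
decoded_players = stats['player_encoded'].map(player_mapping)
```

The codes are arbitrary labels, not quantities, so they should be carried through later steps untouched rather than treated as numeric stats.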


### 3. Convert dataframe to single game stats

In [None]:
"""
Function Name: convert_to_single_game

Parameters: A dataframe

Output: Converted dataframe

Description: This function converts per-season stats to per-game stats by dividing each column by the games played (GP) column, while excluding certain 
columns (like percentage columns) from the division.

"""

single_game_stats = convert_to_single_game(encoded_df)
single_game_stats.head()

Unnamed: 0,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,...,STL,BLK,TOV,PF,PTS,AST_TOV,STL_TOV,season_type_encoded,player_encoded,team_encoded
0,38.5,9.0,17.7,0.51,1.7,4.1,0.416,8.4,9.3,0.905,...,1.4,1.3,3.5,1.8,28.1,0.0,0.0,0.0,11.0,0.3
1,38.6,9.5,20.4,0.463,1.7,5.2,0.324,6.7,8.0,0.839,...,1.4,0.3,3.7,2.2,27.3,0.0,0.0,0.0,11.8,0.2
2,37.9,10.1,17.8,0.565,1.4,3.3,0.406,5.3,7.0,0.753,...,1.7,0.9,3.0,1.4,26.8,0.0,0.0,0.0,12.7,0.2
3,38.3,7.5,17.1,0.438,2.3,6.2,0.368,8.6,10.2,0.851,...,1.8,0.5,3.8,2.3,25.9,0.0,0.0,0.0,8.4,0.1
4,37.0,10.0,22.2,0.449,2.3,6.2,0.379,6.3,7.6,0.83,...,0.8,0.5,2.6,3.1,28.7,0.0,0.0,0.0,3.0,0.3
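In the output above, AST_TOV and STL_TOV have become 0.0, and player_encoded and team_encoded have also been divided by GP (Kevin Durant's player code 895 becomes 11.0, roughly 895 / 81). Ratios and ID codes are not season totals, so dividing them by games played loses information. Below is a minimal sketch that divides only the counting stats, assuming percentage columns end in `_PCT` and that the listed ratio and encoded columns are the only other exclusions; `convert_to_single_game_sketch` and `NON_COUNTING_COLUMNS` are hypothetical names, and rounding to one decimal mirrors the printed output.

```python
import pandas as pd

# Columns that are already ratios or are ID codes; dividing them by games
# played would be meaningless. Adjust to match the real schema.
NON_COUNTING_COLUMNS = {'AST_TOV', 'STL_TOV',
                        'season_type_encoded', 'player_encoded', 'team_encoded'}


def convert_to_single_game_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Divide counting stats by GP; leave percentages, ratios and codes alone."""
    out = df.copy()
    counting = [c for c in out.columns
                if c != 'GP'
                and not c.endswith('_PCT')
                and c not in NON_COUNTING_COLUMNS]
    for column in counting:
        out[column] = (out[column] / out['GP']).round(1)
    # GP is no longer needed once everything is per game
    return out.drop(columns='GP')


# Kevin Durant's 2012-13 season totals, taken from the tables above
row = pd.DataFrame({'GP': [81], 'MIN': [3119], 'FGM': [731], 'FG_PCT': [0.51],
                    'AST_TOV': [1.34], 'player_encoded': [895]})
result = convert_to_single_game_sketch(row)
```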


### 4. Calculate Shooting Production

In [None]:
"""
Function Name: calculate_shooting_production

Parameters: A dataframe - stats

Output: Converted dataframe

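Description: This function computes missed shots (FGMI, FG3MI, FTMI) as attempts
minus makes, then calculates two_production, three_production, and ft_production by
crediting made shots (weighted by values defined in the .env file) and subtracting
missed shots. total_shooting_production is the sum of the three.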

"""

# Call the function on the single stats dataframe
shooting_production_calculated = calculate_shooting_production(single_game_stats)
shooting_production_calculated.head()

Unnamed: 0,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,...,season_type_encoded,player_encoded,team_encoded,FGMI,FG3MI,FTMI,two_production,three_production,ft_production,total_shooting_production
0,38.5,9.0,17.7,0.51,1.7,4.1,0.416,8.4,9.3,0.905,...,0.0,11.0,0.3,8.7,2.4,0.9,0.3,-0.7,7.5,7.1
1,38.6,9.5,20.4,0.463,1.7,5.2,0.324,6.7,8.0,0.839,...,0.0,11.8,0.2,10.9,3.5,1.3,-1.4,-1.8,5.4,2.2
2,37.9,10.1,17.8,0.565,1.4,3.3,0.406,5.3,7.0,0.753,...,0.0,12.7,0.2,7.7,1.9,1.7,2.4,-0.5,3.6,5.5
3,38.3,7.5,17.1,0.438,2.3,6.2,0.368,8.6,10.2,0.851,...,0.0,8.4,0.1,9.6,3.9,1.6,-2.1,-1.6,7.0,3.3
4,37.0,10.0,22.2,0.449,2.3,6.2,0.379,6.3,7.6,0.83,...,0.0,3.0,0.3,12.2,3.9,1.3,-2.2,-1.6,5.0,1.2
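The printed rows are consistent with a simple rule: missed shots are attempts minus makes (FGMI = FGA - FGM, and likewise for threes and free throws), and each production term is makes minus misses. For Kevin Durant, two_production = 9.0 - 8.7 = 0.3 and total_shooting_production = 0.3 + (-0.7) + 7.5 = 7.1. The sketch below reproduces that rule with weights read from hypothetical environment variables (TWO_PT_WEIGHT, TWO_PT_MISS_WEIGHT, and so on) that default to 1.0, which matches the output; the real variable names and values in .env are not shown. Note that two_production is computed from FGM and FGMI, which include three-pointers, so threes count toward both two_production and three_production.

```python
import os

import pandas as pd


def _weight(name: str, default: float = 1.0) -> float:
    """Read a weight from a hypothetical environment variable."""
    return float(os.getenv(name, default))


def calculate_shooting_production_sketch(stats: pd.DataFrame) -> pd.DataFrame:
    df = stats.copy()
    # Missed attempts = attempts - makes
    df['FGMI'] = (df['FGA'] - df['FGM']).round(1)
    df['FG3MI'] = (df['FG3A'] - df['FG3M']).round(1)
    df['FTMI'] = (df['FTA'] - df['FTM']).round(1)
    # Weighted makes minus weighted misses for each shot type.
    # FGM/FGMI include three-pointers, as in the printed output.
    df['two_production'] = (_weight('TWO_PT_WEIGHT') * df['FGM']
                            - _weight('TWO_PT_MISS_WEIGHT') * df['FGMI']).round(1)
    df['three_production'] = (_weight('THREE_PT_WEIGHT') * df['FG3M']
                              - _weight('THREE_PT_MISS_WEIGHT') * df['FG3MI']).round(1)
    df['ft_production'] = (_weight('FT_WEIGHT') * df['FTM']
                           - _weight('FT_MISS_WEIGHT') * df['FTMI']).round(1)
    df['total_shooting_production'] = (df['two_production']
                                       + df['three_production']
                                       + df['ft_production']).round(1)
    return df


# Per-game rows for Kevin Durant and Kobe Bryant from the table above
sample = pd.DataFrame({
    'FGM': [9.0, 9.5], 'FGA': [17.7, 20.4],
    'FG3M': [1.7, 1.7], 'FG3A': [4.1, 5.2],
    'FTM': [8.4, 6.7], 'FTA': [9.3, 8.0],
})
result = calculate_shooting_production_sketch(sample)
```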


### 5. Calculate Offensive Anx

In [None]:
"""
Function Name: calculate_offensive_anx

Parameters: A dataframe - stats

Output: Converted dataframe

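Description: This function calculates offensive anxiety by applying weights from
environment variables to offensive rebounds, assists, and turnovers (weighted_or,
weighted_ast, weighted_to) and combining them into total_offensive_anx.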

"""

# Call the function to calculate offensive anx
offensive_anx_calculated = calculate_offensive_anx(shooting_production_calculated)
offensive_anx_calculated.head()

Unnamed: 0,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,...,FG3MI,FTMI,two_production,three_production,ft_production,total_shooting_production,weighted_or,weighted_ast,weighted_to,total_offensive_anx
0,38.5,9.0,17.7,0.51,1.7,4.1,0.416,8.4,9.3,0.905,...,2.4,0.9,0.3,-0.7,7.5,7.1,0.0,0.0,0.0,0.0
1,38.6,9.5,20.4,0.463,1.7,5.2,0.324,6.7,8.0,0.839,...,3.5,1.3,-1.4,-1.8,5.4,2.2,0.0,0.0,0.0,0.0
2,37.9,10.1,17.8,0.565,1.4,3.3,0.406,5.3,7.0,0.753,...,1.9,1.7,2.4,-0.5,3.6,5.5,0.0,0.0,0.0,0.0
3,38.3,7.5,17.1,0.438,2.3,6.2,0.368,8.6,10.2,0.851,...,3.9,1.6,-2.1,-1.6,7.0,3.3,0.0,0.0,0.0,0.0
4,37.0,10.0,22.2,0.449,2.3,6.2,0.379,6.3,7.6,0.83,...,3.9,1.3,-2.2,-1.6,5.0,1.2,0.0,0.0,0.0,0.0
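A minimal sketch of the weighting step, assuming each weighted column is a single stat times one weight read from the environment and that total_offensive_anx is their sum. The stat-to-column pairing (OREB, AST, TOV), the variable names OR_WEIGHT, AST_WEIGHT, and TO_WEIGHT, and the demo weight values are all hypothetical. Every weighted value printed above is 0.0; one way that can happen silently is a weight lookup that falls back to 0 when a variable is missing, so this sketch raises a KeyError instead.

```python
import os

import pandas as pd


def require_weight(name: str) -> float:
    """Read a numeric weight from the environment, failing loudly if absent."""
    value = os.getenv(name)
    if value is None:
        raise KeyError(f"Environment variable {name!r} is not set")
    return float(value)


# Hypothetical mapping: output column -> (source stat, env var holding its weight)
OFFENSIVE_WEIGHTS = {
    'weighted_or': ('OREB', 'OR_WEIGHT'),
    'weighted_ast': ('AST', 'AST_WEIGHT'),
    'weighted_to': ('TOV', 'TO_WEIGHT'),  # a negative weight penalizes turnovers
}


def calculate_offensive_anx_sketch(stats: pd.DataFrame) -> pd.DataFrame:
    df = stats.copy()
    for out_col, (stat, env_name) in OFFENSIVE_WEIGHTS.items():
        df[out_col] = (df[stat] * require_weight(env_name)).round(1)
    df['total_offensive_anx'] = df[list(OFFENSIVE_WEIGHTS)].sum(axis=1).round(1)
    return df


# Demo only: illustrative weights, not the project's .env values
os.environ.update({'OR_WEIGHT': '1.0', 'AST_WEIGHT': '1.5', 'TO_WEIGHT': '-1.0'})
row = pd.DataFrame({'OREB': [0.6], 'AST': [4.6], 'TOV': [3.5]})
result = calculate_offensive_anx_sketch(row)
```

The same helper pattern applies to the defensive step with its own column-to-stat mapping.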


### 6. Calculate Defensive Anx  

In [None]:
"""
Function Name: calculate_defensive_anx 

Parameters: Takes in a dataframe

Output: Converted dataframe

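Description: This function calculates defensive anxiety by applying weights from 
environment variables to defensive rebounds, steals, blocks, and personal fouls
(weighted_dr, weighted_stl, weighted_blk, weighted_pf) and combining them into
total_defensive_anx.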

"""

# Call the 'calculate_defensive_anx' function to add the defensive columns
defensive_anx_calculated = calculate_defensive_anx(offensive_anx_calculated)
defensive_anx_calculated.head()

Unnamed: 0,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,...,total_shooting_production,weighted_or,weighted_ast,weighted_to,total_offensive_anx,weighted_dr,weighted_stl,weighted_blk,weighted_pf,total_defensive_anx
0,38.5,9.0,17.7,0.51,1.7,4.1,0.416,8.4,9.3,0.905,...,7.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,38.6,9.5,20.4,0.463,1.7,5.2,0.324,6.7,8.0,0.839,...,2.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,37.9,10.1,17.8,0.565,1.4,3.3,0.406,5.3,7.0,0.753,...,5.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,38.3,7.5,17.1,0.438,2.3,6.2,0.368,8.6,10.2,0.851,...,3.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,37.0,10.0,22.2,0.449,2.3,6.2,0.379,6.3,7.6,0.83,...,1.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
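Because both the offensive and the defensive weighted columns are 0.0 in every row printed here, a cheap sanity check after each step can flag columns that never move off zero. `all_zero_columns` is a hypothetical helper, and the frame below is toy data; run the check on the full DataFrame rather than on `.head()`, since five rows can be all zero by chance.

```python
import pandas as pd


def all_zero_columns(df: pd.DataFrame, prefix: str = 'weighted_') -> list:
    """Return the columns starting with `prefix` whose values are all exactly 0."""
    candidates = [c for c in df.columns if c.startswith(prefix)]
    return [c for c in candidates if (df[c] == 0).all()]


# Toy frame: two weighted columns never move off zero
sample = pd.DataFrame({
    'weighted_dr': [0.0, 0.0],
    'weighted_stl': [0.0, 0.0],
    'weighted_blk': [0.3, 0.0],
    'total_defensive_anx': [0.3, 0.0],
})
suspicious = all_zero_columns(sample)
```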


### 7. Calculate Total Player Production

In [None]:
"""
Function Name: calculate_total_player_production

Parameters: Takes in a dataframe

Output: Converted dataframe with a total_production column

Description: This function calculates total player production by summing the 
shooting production, offensive anxiety, and defensive anxiety.

"""

# Call the 'calculate_total_player_production' function to add the total_production column
total_production_calculated = calculate_total_player_production(defensive_anx_calculated)
total_production_calculated.head()

Unnamed: 0,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,...,weighted_or,weighted_ast,weighted_to,total_offensive_anx,weighted_dr,weighted_stl,weighted_blk,weighted_pf,total_defensive_anx,total_production
0,38.5,9.0,17.7,0.51,1.7,4.1,0.416,8.4,9.3,0.905,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.1
1,38.6,9.5,20.4,0.463,1.7,5.2,0.324,6.7,8.0,0.839,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.2
2,37.9,10.1,17.8,0.565,1.4,3.3,0.406,5.3,7.0,0.753,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.5
3,38.3,7.5,17.1,0.438,2.3,6.2,0.368,8.6,10.2,0.851,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.3
4,37.0,10.0,22.2,0.449,2.3,6.2,0.379,6.3,7.6,0.83,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.2
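The printed total_production equals total_shooting_production + total_offensive_anx + total_defensive_anx (7.1 + 0.0 + 0.0 = 7.1 for the first row). The introduction describes comparing rosters of 5, 7, or 9 players; a minimal sketch of one way to do that is to sum each roster's total_production and pick the larger sum. Summing is an assumption, not necessarily the project's method, and `team_strength`, `compare_teams`, and the P1 to P10 player IDs are hypothetical. The sketch also assumes one production value per player, so a real version would first have to choose which season (or an average) represents each player.

```python
import pandas as pd

ALLOWED_ROSTER_SIZES = {5, 7, 9}


def add_total_production(df: pd.DataFrame) -> pd.DataFrame:
    """total_production = shooting + offensive anx + defensive anx."""
    out = df.copy()
    out['total_production'] = (out['total_shooting_production']
                               + out['total_offensive_anx']
                               + out['total_defensive_anx']).round(1)
    return out


def team_strength(production: pd.Series, roster: list) -> float:
    """Sum total_production over a roster of 5, 7 or 9 players."""
    if len(roster) not in ALLOWED_ROSTER_SIZES:
        raise ValueError(f"Roster must have 5, 7 or 9 players, got {len(roster)}")
    return float(production.loc[roster].sum())


def compare_teams(production: pd.Series, team_a: list, team_b: list) -> str:
    """Return 'team_a', 'team_b' or 'tie' by comparing summed production."""
    a = team_strength(production, team_a)
    b = team_strength(production, team_b)
    if a == b:
        return 'tie'
    return 'team_a' if a > b else 'team_b'


# First printed row: 7.1 + 0.0 + 0.0
first_row = add_total_production(pd.DataFrame({
    'total_shooting_production': [7.1],
    'total_offensive_anx': [0.0],
    'total_defensive_anx': [0.0],
}))

# Toy per-player production values indexed by hypothetical player IDs
production = pd.Series([7.1, 2.2, 5.5, 3.3, 1.2, 4.0, 4.0, 4.0, 4.0, 4.0],
                       index=[f'P{i}' for i in range(1, 11)])
team_a = ['P1', 'P2', 'P3', 'P4', 'P5']
team_b = ['P6', 'P7', 'P8', 'P9', 'P10']
winner = compare_teams(production, team_a, team_b)
```

Summing rewards depth as well as star power; an average per player would instead favor the stronger individuals when roster sizes differ.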

