In [2]:
import pandas as pd
import os

## Data Import

In [14]:
# import the nba data
nba_df = pd.read_csv(os.path.join("data", "nba_data_processed.csv"))
nba_df.shape

(649, 29)

In [9]:
nba_df.head(5)

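| Unnamed: 0 | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Precious Achiuwa | C | 23.0 | TOR | 38.0 | 11.0 | 23.3 | 3.9 | 8.1 | 0.482 | ... | 0.689 | 2.0 | 4.6 | 6.6 | 1.0 | 0.6 | 0.7 | 1.2 | 2.1 | 10.2 |
| 1 | Steven Adams | C | 29.0 | MEM | 42.0 | 42.0 | 27.0 | 3.7 | 6.3 | 0.597 | ... | 0.364 | 5.1 | 6.5 | 11.5 | 2.3 | 0.9 | 1.1 | 1.9 | 2.3 | 8.6 |
| 2 | Bam Adebayo | C | 25.0 | MIA | 57.0 | 57.0 | 35.0 | 8.4 | 15.7 | 0.536 | ... | 0.8 | 2.6 | 7.2 | 9.8 | 3.2 | 1.2 | 0.8 | 2.5 | 2.8 | 21.2 |
| 3 | Ochai Agbaji | SG | 22.0 | UTA | 39.0 | 2.0 | 15.6 | 1.8 | 3.8 | 0.483 | ... | 0.682 | 0.7 | 1.1 | 1.8 | 0.6 | 0.2 | 0.1 | 0.3 | 1.4 | 5.0 |
| 4 | Santi Aldama | PF | 22.0 | MEM | 56.0 | 18.0 | 22.0 | 3.3 | 7.0 | 0.474 | ... | 0.729 | 1.0 | 3.6 | 4.6 | 1.2 | 0.7 | 0.7 | 0.7 | 1.9 | 9.4 |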


## Column Descriptions

- Player: string - name of the player
- Pos (Position): string - position played by the player
- Age: integer - age of the player as of February 1, 2023
- Tm (Team): string - team the player belongs to
- G (Games Played): integer - number of games played by the player
- GS (Games Started): integer - number of games started by the player
- MP (Minutes Played): float - average minutes played per game
- FG (Field Goals): float - average field goals made per game
- FGA (Field Goal Attempts): float - average field goal attempts per game
- FG% (Field Goal Percentage): float - share of field goal attempts made, stored as a fraction between 0 and 1 (e.g., 0.482)
- 3P (3-Point Field Goals): float - average 3-point field goals made per game
- 3PA (3-Point Field Goal Attempts): float - average 3-point field goal attempts per game
- 3P% (3-Point Field Goal Percentage): float - share of 3-point field goal attempts made, stored as a fraction between 0 and 1
- 2P (2-Point Field Goals): float - average 2-point field goals made per game
- 2PA (2-Point Field Goal Attempts): float - average 2-point field goal attempts per game
- 2P% (2-Point Field Goal Percentage): float - share of 2-point field goal attempts made, stored as a fraction between 0 and 1
- eFG% (Effective Field Goal Percentage): float - field goal percentage adjusted for 3-pointers being worth more than 2-pointers, (FG + 0.5 × 3P) / FGA
- FT (Free Throws): float - average free throws made per game
- FTA (Free Throw Attempts): float - average free throw attempts per game
- FT% (Free Throw Percentage): float - share of free throw attempts made, stored as a fraction between 0 and 1
- ORB (Offensive Rebounds): float - average offensive rebounds per game
- DRB (Defensive Rebounds): float - average defensive rebounds per game
- TRB (Total Rebounds): float - average total rebounds per game (ORB + DRB)
- AST (Assists): float - average assists per game
- STL (Steals): float - average steals per game
- BLK (Blocks): float - average blocks per game
- TOV (Turnovers): float - average turnovers committed per game
- PF (Personal Fouls): float - average personal fouls committed per game
- PTS (Points): float - average points scored per game
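
As a quick sanity check on the column list above, the sketch below recomputes two derived columns from the five rows shown in the `head` output: FG% should be close to FG / FGA, and TRB should be close to ORB + DRB. The `sample` frame and the tolerances are illustrative assumptions rather than part of the original notebook; the published values are rounded, so the recomputed figures can only match approximately.

```python
import pandas as pd

# Five rows copied from the nba_df.head(5) output (only the columns needed here).
sample = pd.DataFrame(
    {
        "Player": ["Precious Achiuwa", "Steven Adams", "Bam Adebayo",
                   "Ochai Agbaji", "Santi Aldama"],
        "FG": [3.9, 3.7, 8.4, 1.8, 3.3],
        "FGA": [8.1, 6.3, 15.7, 3.8, 7.0],
        "FG%": [0.482, 0.597, 0.536, 0.483, 0.474],
        "ORB": [2.0, 5.1, 2.6, 0.7, 1.0],
        "DRB": [4.6, 6.5, 7.2, 1.1, 3.6],
        "TRB": [6.6, 11.5, 9.8, 1.8, 4.6],
    }
)

# Absolute gap between each recomputed value and the stored column.
fg_pct_gap = (sample["FG"] / sample["FGA"] - sample["FG%"]).abs()
trb_gap = (sample["ORB"] + sample["DRB"] - sample["TRB"]).abs()

# Assumed tolerances: loose enough to absorb rounding in the stored values.
fg_pct_consistent = bool((fg_pct_gap < 0.02).all())
trb_consistent = bool((trb_gap < 0.15).all())
print(fg_pct_consistent, trb_consistent)
```

The same two comparisons could be run on the full `nba_df`, where rows with zero attempts would need separate handling because FG / FGA is undefined for them.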

## Proposal

Our group will analyze an NBA Player Performance Stats dataset from [Kaggle](https://www.kaggle.com/datasets/iabdulw/nba-player-performance-stats). The dataset contains 649 rows, one per player, and 29 columns, most of which are key performance indicators (KPIs) for each player. These KPIs include FG (field goals), 3P (3-point field goals), and 2P (2-point field goals). The goal of this project is to use the data to predict player performance or to identify likely award winners in the near future.

To start, our group will use linear regression to quantify which of the selected columns are most strongly associated with better performance; this is practical because most of the columns are numeric. Then, we will apply logistic regression to estimate the probability of a player receiving an award or performing better in the 2023-24 NBA season. Before the analysis, we will convert the categorical Tm column into binary indicator columns using one-hot encoding so that it can be used in the models.
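
The one-hot encoding step can be sketched as follows. This is a minimal illustration that uses `pd.get_dummies` on the team codes visible in the `head` output; the `sample` frame is an assumption made for the example, and the project may use a different encoder (for instance scikit-learn's `OneHotEncoder`).

```python
import pandas as pd

# Team codes and points copied from the nba_df.head(5) output.
sample = pd.DataFrame(
    {
        "Tm": ["TOR", "MEM", "MIA", "UTA", "MEM"],
        "PTS": [10.2, 8.6, 21.2, 5.0, 9.4],
    }
)

# Replace Tm with one 0/1 indicator column per team (Tm_MEM, Tm_MIA, ...).
encoded = pd.get_dummies(sample, columns=["Tm"], dtype=int)

# Collect the newly created indicator columns for later use as model features.
team_columns = [col for col in encoded.columns if col.startswith("Tm_")]
print(encoded)
```

Each row has exactly one indicator set to 1. When the indicators feed a regression with an intercept, passing `drop_first=True` to `pd.get_dummies` is a common way to avoid perfectly collinear columns.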

Our group has not yet decided which columns to use for prediction, but we have identified a few variables that are not useful for our purposes:

- Player: a unique variable indicating the name of a player.