# Week 4: Data Manipulation with Pandas (Part 2)
## NBA Players Data Analysis with Pandas
Today, we will work with a **real-world dataset** containing information about NBA players, including their height, weight, birthplaces, teams, draft details, and performance stats.

<br>
**Dataset Source:** [NBA Players Data on Kaggle](https://www.kaggle.com/datasets/justinas/nba-players-data)

## What you'll learn:
- Load data from a **CSV file** into a Pandas DataFrame.
- Explore and clean the dataset.
- Apply **filtering, grouping, and sorting** to analyze player stats.
- Write **functions** to make our analysis more reusable.
- Visualize data.


### Loading CSV files into a Pandas DataFrame
---
To load a CSV file, use the function `pd.read_csv()`.
<br>
[Documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)
<br><br>
**Note**: If the file is not in the same folder as your notebook, you need to provide its full path.
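
As a minimal sketch of the note above, the example below builds a full path with `pathlib` and passes it to `pd.read_csv()`. The folder `data`, the file `example_players.csv`, and its contents are made up for illustration; they are created in a temporary directory so the example runs on its own.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical folder layout, created in a temporary directory for this demo
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    csv_path = data_dir / "example_players.csv"
    csv_path.write_text("player_name,pts\nPlayer A,10.5\nPlayer B,7.2\n")

    # pd.read_csv accepts the full path as a Path object or as a string
    df_example = pd.read_csv(csv_path)

print(df_example.shape)
```

With your own data, replace `csv_path` with the location of your file, for example `Path('data') / 'nba_players_info.csv'` if the CSV sits in a `data` subfolder next to the notebook.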

In [20]:
# Import the Pandas library 
import pandas as pd

# Load CSV file
file_name = 'nba_players_info.csv'
df_nba = pd.read_csv(file_name)

# display the first 5 rows of the dataframe
df_nba.head()

| | player_name | team_abbreviation | age | player_height | player_weight | college | country | draft_year | draft_round | draft_number | ... | pts | reb | ast | net_rating | oreb_pct | dreb_pct | usg_pct | ts_pct | ast_pct | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Randy Livingston | HOU | 22 | 193.04 | 94.800728 | Louisiana State | USA | 1996 | 2 | 42 | ... | 3.9 | 1.5 | 2.4 | 0.3 | 0.042 | 0.071 | 0.169 | 0.487 | 0.248 | 1996-97 |
| 1 | Gaylon Nickerson | WAS | 28 | 190.5 | 86.18248 | Northwestern Oklahoma | USA | 1994 | 2 | 34 | ... | 3.8 | 1.3 | 0.3 | 8.9 | 0.03 | 0.111 | 0.174 | 0.497 | 0.043 | 1996-97 |
| 2 | George Lynch | VAN | 26 | 203.2 | 103.418976 | North Carolina | USA | 1993 | 1 | 12 | ... | 8.3 | 6.4 | 1.9 | -8.2 | 0.106 | 0.185 | 0.175 | 0.512 | 0.125 | 1996-97 |
| 3 | George McCloud | LAL | 30 | 203.2 | 102.0582 | Florida State | USA | 1989 | 1 | 7 | ... | 10.2 | 2.8 | 1.7 | -2.7 | 0.027 | 0.111 | 0.206 | 0.527 | 0.125 | 1996-97 |
| 4 | George Zidek | DEN | 23 | 213.36 | 119.748288 | UCLA | USA | 1995 | 1 | 22 | ... | 2.8 | 1.7 | 0.3 | -14.1 | 0.102 | 0.169 | 0.195 | 0.5 | 0.064 | 1996-97 |


In [21]:
# display the information of the dataframe
df_nba.info()

# check for missing values
df_nba.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12844 entries, 0 to 12843
Data columns (total 21 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   player_name        12844 non-null  object 
 1   team_abbreviation  12844 non-null  object 
 2   age                12844 non-null  int64  
 3   player_height      12844 non-null  float64
 4   player_weight      12844 non-null  float64
 5   college            10990 non-null  object 
 6   country            12844 non-null  object 
 7   draft_year         12844 non-null  object 
 8   draft_round        12844 non-null  object 
 9   draft_number       12844 non-null  object 
 10  gp                 12844 non-null  int64  
 11  pts                12844 non-null  float64
 12  reb                12844 non-null  float64
 13  ast                12844 non-null  float64
 14  net_rating         12844 non-null  float64
 15  oreb_pct           12844 non-null  float64
 16  dreb_pct           128

player_name             0
team_abbreviation       0
age                     0
player_height           0
player_weight           0
college              1854
country                 0
draft_year              0
draft_round             0
draft_number            0
gp                      0
pts                     0
reb                     0
ast                     0
net_rating              0
oreb_pct                0
dreb_pct                0
usg_pct                 0
ts_pct                  0
ast_pct                 0
season                  0
dtype: int64
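
The `info()` output lists `draft_year`, `draft_round`, and `draft_number` as `object` rather than a numeric type. A likely reason is that these columns mix numbers with text such as `Undrafted`. The sketch below shows one possible way to get a numeric version of such a column with `pd.to_numeric`; the toy DataFrame `df_toy` and the new column name `draft_year_num` are made up for illustration.

```python
import pandas as pd

# Toy data that mimics the draft_year column (values are made up)
df_toy = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player C"],
    "draft_year": ["1996", "Undrafted", "2003"],
})

# errors="coerce" turns every value that cannot be parsed as a number into NaN
df_toy["draft_year_num"] = pd.to_numeric(df_toy["draft_year"], errors="coerce")

print(df_toy)
```

Keeping the original text column next to the numeric one preserves the information about which players went undrafted.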

### Cleaning DataFrame
---
We can remove every row that has at least one missing value with the `df.dropna()` method.

In [24]:
# Drop rows with missing values from the dataframe
# inplace=True modifies df_nba directly; with inplace=False (the default), dropna() returns a new dataframe
df_nba.dropna(inplace=True)
df_nba.isnull().sum()

player_name          0
team_abbreviation    0
age                  0
player_height        0
player_weight        0
college              0
country              0
draft_year           0
draft_round          0
draft_number         0
gp                   0
pts                  0
reb                  0
ast                  0
net_rating           0
oreb_pct             0
dreb_pct             0
usg_pct              0
ts_pct               0
ast_pct              0
season               0
dtype: int64
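
In this dataset only `college` has missing values (1854 rows), so `dropna()` removes every player without a recorded college. If those players matter for your analysis, two alternatives are sketched below on a toy DataFrame: dropping rows only when a column you need is missing, or filling the gap with a label. The data and the label `Unknown` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with one missing college and one missing pts value (made up)
df_toy = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player C"],
    "college": ["Duke", np.nan, "UCLA"],
    "pts": [12.1, 8.4, np.nan],
})

# Option 1: drop a row only when a column we actually need is missing
df_pts = df_toy.dropna(subset=["pts"])

# Option 2: keep every row and label the missing college explicitly
df_filled = df_toy.fillna({"college": "Unknown"})

print(len(df_toy), len(df_pts), len(df_filled))
```

Both calls return new DataFrames, so `df_toy` itself stays unchanged.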

In [None]:
# Let's print summary statistics using the .describe() method
df_nba.describe()

# Create it as a new dataframe
summary_stats = df_nba.describe()

# Get max age
summary_stats.loc['max', 'age']

44.0

1. Find the tallest and shortest player
<br><br>
**Hint**: Use `df['column'].idxmax()` and `df['column'].idxmin()` to get the row index of the maximum and minimum value.

In [33]:
# Let's find the index for the row with the maximum and minimum height
index_max_height = df_nba['player_height'].idxmax()
index_min_height = df_nba['player_height'].idxmin()

# Get the player name with the maximum and minimum height
player_max_height = df_nba.loc[index_max_height, 'player_name']
player_min_height = df_nba.loc[index_min_height, 'player_name']

# Print results
print('The player with the maximum height is:', player_max_height)
print('The player with the minimum height is:', player_min_height)

The player with the maximum height is: Shawn Bradley
The player with the minimum height is: Muggsy Bogues
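
Note that `idxmax()` and `idxmin()` return only the first row that holds the extreme value. Because a player can appear once per season, and several players can share the same height, it can be useful to list everyone tied at the maximum. A minimal sketch on made-up toy data:

```python
import pandas as pd

# Toy data: Player A appears in two seasons and ties with Player B (made up)
df_toy = pd.DataFrame({
    "player_name": ["Player A", "Player A", "Player B", "Player C"],
    "season": ["1996-97", "1997-98", "1996-97", "1996-97"],
    "player_height": [210.0, 210.0, 210.0, 185.0],
})

# idxmax() reports only the first matching row
first_index = df_toy["player_height"].idxmax()

# A boolean mask keeps every row at the maximum; unique() removes repeated names
max_height = df_toy["player_height"].max()
tallest = df_toy.loc[df_toy["player_height"] == max_height, "player_name"].unique()

print(first_index, list(tallest))
```

The same pattern with `.min()` lists every player tied for the shortest height.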


2. Who had the highest average points in each season?

In [None]:
# Get the seasons in our Dataframe using the unique() method
df_nba['season'].unique() # get an array with the unique values
list(df_nba['season'].unique()) # convert it to a list 
seasons = list(df_nba['season'].unique()) # create a variable with the list

# Use the values in the list to create a new dataframe for each season
season1 = df_nba[df_nba['season'] == seasons[0]]
# Find the player with the maximum pts in season1
index_max_pts_season1 = season1['pts'].idxmax()
player_max_pts_season1 = season1.loc[index_max_pts_season1, 'player_name']
print('The player with the maximum points in', seasons[0], 'is', player_max_pts_season1)
# Do this for all seasons


# Or use a for loop


The player with the maximum points in 1996-97 is Michael Jordan
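
One possible way to cover all seasons is sketched below, first with a `for` loop and then with `groupby()` wrapped in a reusable function. The function name `top_scorer_per_season` and the toy data are made up for illustration; the column names match the ones used in this notebook.

```python
import pandas as pd

def top_scorer_per_season(df):
    """Return one row per season: the player with the highest pts."""
    # idxmax() within each season gives the row label of the top scorer
    top_rows = df.groupby("season")["pts"].idxmax()
    return df.loc[top_rows, ["season", "player_name", "pts"]].reset_index(drop=True)

# Toy data with two seasons (values are made up)
df_toy = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player C", "Player D"],
    "season": ["1996-97", "1996-97", "1997-98", "1997-98"],
    "pts": [29.6, 21.4, 18.0, 28.7],
})

# Loop version: filter one season at a time, then look up the top scorer
for season in df_toy["season"].unique():
    season_df = df_toy[df_toy["season"] == season]
    best = season_df.loc[season_df["pts"].idxmax(), "player_name"]
    print("The player with the maximum points in", season, "is", best)

# groupby version: the same result as a small DataFrame
top_scorers = top_scorer_per_season(df_toy)
print(top_scorers)
```

The loop mirrors the manual steps in the cell above, while the function can be reused on `df_nba` or on any DataFrame with `season`, `player_name`, and `pts` columns.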


In [38]:
df_nba.groupby('team_abbreviation').sum()

| team_abbreviation | player_name | age | player_height | player_weight | college | country | draft_year | draft_round | draft_number | gp | pts | reb | ast | net_rating | oreb_pct | dreb_pct | usg_pct | ts_pct | ast_pct | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATL | Eldridge RecasnerHenry JamesJon BarryIvano New... | 9877 | 74122.28 | 36896.987648 | WashingtonSt. Mary's (TX)Georgia TechGeorgia T... | USAUSAUSAUSAUSACongoUSAUSAUSAUSAUSAUSAUSAUSAUS... | 1992Undrafted199219941992199119951994199419901... | UndraftedUndrafted1Undrafted112221211111212211... | UndraftedUndrafted21Undrafted34423833935519281... | 17964 | 2825.6 | 1200.5 | 613.3 | -1276.6 | 19.707 | 50.07 | 66.472 | 187.35 | 47.121 | 1996-971996-971996-971996-971996-971996-971996... |
| BKN | Tyshawn TaylorReggie EvansBrook LopezC.J. Wats... | 4752 | 34884.36 | 17334.01828 | KansasIowaStanfordTennesseeIllinoisTexasNorth ... | USAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAM... | 2012Undrafted2008Undrafted20052010199520042001... | 2Undrafted1Undrafted11111112211Undrafted111122... | 41Undrafted10Undrafted3243610252543511410Undra... | 8214 | 1559.6 | 633.0 | 357.9 | -563.2 | 7.791 | 24.783 | 32.212 | 91.976 | 24.625 | 2012-132012-132012-132012-132012-132012-132012... |
| BOS | Greg MinorEric WilliamsFrank BrickowskiDana Ba... | 9918 | 74269.6 | 37301.13812 | LouisvilleProvidencePenn StateBoston CollegeBa... | USAUSAUSAUSAUSAUSAUSACanadaUSAUSAUSAUSAUSAUSAU... | 1994199519811989Undrafted1990Undrafted19911996... | 1131Undrafted1Undrafted1211UndraftedUndraftedU... | 25145716Undrafted19Undrafted243881UndraftedUnd... | 19597 | 3121.3 | 1278.7 | 723.2 | -191.9 | 19.586 | 50.727 | 67.07 | 195.872 | 50.93 | 1996-971996-971996-971996-971996-971996-971996... |
| CHA | Melvin ElyMatt CarrollKeith BogansKareem RushM... | 6778 | 51941.9 | 25647.68216 | Fresno StateNotre DameKentuckyMissouriVillanov... | USAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAU... | 2002Undrafted20032002UndraftedUndrafted2002199... | 1Undrafted21UndraftedUndrafted211222221112211U... | 12Undrafted4320UndraftedUndrafted5329164546434... | 13013 | 2103.5 | 861.8 | 467.4 | -1314.0 | 12.229 | 35.37 | 48.385 | 132.951 | 35.911 | 2004-052004-052004-052004-052004-052004-052004... |
| CHH | Glen RiceDonald RoyalDell CurryRicky PierceTom... | 2527 | 17449.8 | 8845.497592 | MichiganNotre DameVirginia TechRiceUtahKentuck... | USAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAU... | 1989198719861982198119961990199619861992198719... | 13111122221322UndraftedUndrafted23UndraftedUnd... | 45215188165144394212534255UndraftedUndrafted47... | 4449 | 678.6 | 295.5 | 159.6 | -367.1 | 5.425 | 12.183 | 16.458 | 41.551 | 12.225 | 1996-971996-971996-971996-971996-971996-971996... |
| CHI | Jason CaffeyDickey SimpkinsJud BuechlerDennis ... | 9873 | 72687.18 | 35854.17964 | AlabamaProvidenceArizonaSoutheastern Oklahoma ... | USAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAU... | 1995199419901986198719761986198819911991199219... | 11221112122111121111Undrafted121221112112Undra... | 20213827588507315231610331728520Undrafted85016... | 19105 | 3035.2 | 1292.0 | 725.5 | -626.6 | 18.742 | 48.681 | 67.519 | 182.47 | 52.037 | 1996-971996-971996-971996-971996-971996-971996... |
| CLE | Chris MillsCarl ThomasDonny MarshallDanny Ferr... | 9791 | 72349.36 | 36165.797344 | ArizonaEastern MichiganConnecticutDukeSt. John... | USAUSAUSAUSAUSAUkraineUSAUSAUSAUSAUSAUSAUSAUSA... | 1993Undrafted199519891994199619901991199619831... | 1Undrafted21211122212Undrafted111Undrafted2112... | 22Undrafted392431211115630291745Undrafted19231... | 17668 | 2684.7 | 1180.2 | 615.1 | -1033.0 | 18.555 | 49.064 | 65.379 | 181.757 | 47.013 | 1996-971996-971996-971996-971996-971996-971996... |
| DAL | Greg DreilingFred RobertsErick StricklandJamie... | 10099 | 71059.04 | 35063.115192 | KansasBrigham YoungNebraskaSouth CarolinaTexas... | USAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAU... | 19861982Undrafted1994199619951983199319891996U... | 22Undrafted2211111UndraftedUndrafted1111111Und... | 2627Undrafted4741911269UndraftedUndrafted12212... | 17894 | 2675.5 | 1137.0 | 629.9 | -592.1 | 18.418 | 47.234 | 64.878 | 181.368 | 45.956 | 1996-971996-971996-971996-971996-971996-971996... |
| DEN | George ZidekElmer BennettEric MurdockErvin Joh... | 9836 | 72534.78 | 35796.573456 | UCLANotre DameProvidenceNew OrleansMichiganPen... | USAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAU... | 1995199219911993199519951996198319941992198719... | 121122211121111211Undrafted1211211Undrafted212... | 223821233549379271339965252285Undrafted5521029... | 18850 | 2996.1 | 1280.9 | 683.9 | -1018.0 | 20.242 | 49.146 | 66.979 | 189.199 | 51.068 | 1996-971996-971996-971996-971996-971996-971996... |
| DET | Grant HillGrant LongJerome WilliamsJoe DumarsD... | 10057 | 73423.78 | 36594.441784 | DukeEastern MichiganGeorgetownMcNeese StateGeo... | USAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAUSAU... | 1994198819961985199519801990199519931992198419... | 121122111211Undrafted1211122111122112111122211... | 3332618583516181039919Undrafted173514101944311... | 19399 | 3035.8 | 1296.4 | 671.3 | -607.2 | 18.707 | 50.265 | 68.767 | 186.885 | 49.71 | 1996-971996-971996-971996-971996-971996-971996... |
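
In the output above, `sum()` concatenates the text columns (names, colleges, seasons) because adding strings joins them, and summing columns such as `age` or `player_height` has little meaning. A minimal sketch of two ways to limit the aggregation, using a made-up toy DataFrame: pass `numeric_only=True`, or select the column you want before aggregating.

```python
import pandas as pd

# Toy data with one text column and two numeric columns (values are made up)
df_toy = pd.DataFrame({
    "team_abbreviation": ["ATL", "ATL", "BOS"],
    "player_name": ["Player A", "Player B", "Player C"],
    "gp": [70, 55, 82],
    "pts": [10.0, 6.5, 20.1],
})

# numeric_only=True skips text columns instead of concatenating them
team_totals = df_toy.groupby("team_abbreviation").sum(numeric_only=True)

# Selecting a column first keeps the result focused; mean() suits per-game stats
team_avg_pts = df_toy.groupby("team_abbreviation")["pts"].mean()

print(team_totals)
print(team_avg_pts)
```

Which aggregation makes sense depends on the column: totals fit counts such as `gp`, while averages are usually a better fit for per-game values such as `pts`.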
