| Column name | Description |
| -- | -- |
| ID | Unique identifier for each player in the dataset |
| points | Total points scored by a player in a season |
| possessions | Total possessions by a player in a season |
| team_pace | Average number of possessions a team uses per game |
| Unnamed: 4 | Unknown data |
| Unnamed: 5 | Unknown data |
| GP | Games played by a player in a season |
| MPG | Average minutes played by a player per game |
| TS% | True shooting percentage, the player's shooting percentage, taking into account free throws and three-pointers |
| AST | Assist ratio, the percentage of a player's possessions that end in an assist |
| TO | Turnover ratio, the percentage of a player's possessions that end in a turnover |
| USG | Usage rate, the number of possessions a player uses per 40 minutes |
| ORR | Offensive rebound rate |
| DRR | Defensive rebound rate |
| REBR | Rebound rate, the percentage of missed shots that a player rebounds |
| PER | Player efficiency rating, the measure of a player's per-minute productivity on the court |

In [1]:
import pandas as pd

In [3]:
# Import data from the CSV file to a pandas DataFrame.
player_df = pd.read_csv('player_data.csv')

In [5]:
# Preview the first five rows of the DataFrame.
player_df.head()

,ID,points,possessions,team_pace,Unnamed: 4,Unnamed: 5,GP,MPG,TS%,AST,TO,USG,ORR,DRR,REBR,PER
0,1,1893.0,1251.8,97.8,,,63.0,33.9,0.569,17.2,11.5,26.1,4.7,23.3,7.8,10.9
1,2,1386.0,1282.5,110.5,,,58.0,32.5,0.511,24.8,9.7,26.9,6.1,0.9,10.7,27.3
2,3,1405.0,1252.3,105.8,,,55.0,36.3,0.605,25.7,13.9,28.1,4.5,4.9,1.8,
3,4,1282.0,1235.9,100.7,,,54.0,37.6,0.636,29.5,11.0,22.3,4.8,4.6,5.6,22.35
4,5,1721.0,1254.0,105.7,,,59.0,30.5,0.589,22.8,9.9,24.6,1.2,8.4,12.1,28.38


In [13]:
# Count the NaN values in each column.
player_df.isna().sum()

ID              0
points          3
possessions     3
team_pace       3
Unnamed: 4     46
Unnamed: 5     46
GP              7
MPG             6
TS%             1
AST             1
TO              1
USG             1
ORR             1
DRR             1
REBR            1
PER            10
dtype: int64
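
The counts above are easier to compare against the 46 rows when expressed as fractions. The following is a minimal sketch on a made-up three-column frame (`toy_df` and its values are hypothetical, not taken from `player_data.csv`); it shows how the mean of the `isna()` mask gives the missing share per column and how to list columns that are entirely empty.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for player_df; the values are invented.
toy_df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "points": [1893.0, np.nan, 1405.0, 1282.0],
    "empty_col": [np.nan, np.nan, np.nan, np.nan],
})

# isna() marks each missing cell as True; the mean of that boolean
# mask is the fraction of missing values in each column.
missing_fraction = toy_df.isna().mean()

# Columns in which every value is missing (fraction equal to 1.0).
all_missing_cols = missing_fraction[missing_fraction == 1.0].index.tolist()

print(missing_fraction)
print(all_missing_cols)
```

Applied to `player_df`, the same two lines would presumably single out `Unnamed: 4` and `Unnamed: 5`, since both report 46 missing values out of 46 rows.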

In [14]:
# Summarize the DataFrame: column names, non-null counts, and dtypes.
player_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 16 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   ID           46 non-null     int64  
 1   points       43 non-null     float64
 2   possessions  43 non-null     float64
 3   team_pace    43 non-null     float64
 4   Unnamed: 4   0 non-null      float64
 5   Unnamed: 5   0 non-null      float64
 6   GP           39 non-null     float64
 7   MPG          40 non-null     float64
 8   TS%          45 non-null     float64
 9   AST          45 non-null     float64
 10  TO           45 non-null     float64
 11  USG          45 non-null     float64
 12  ORR          45 non-null     float64
 13  DRR          45 non-null     float64
 14  REBR         45 non-null     float64
 15  PER          36 non-null     float64
dtypes: float64(15), int64(1)
memory usage: 5.9 KB


In [16]:
# Drop columns in which every value is NaN.
player_df.dropna(axis='columns', how='all', inplace=True)
player_df.isna().sum()

ID              0
points          3
possessions     3
team_pace       3
GP              7
MPG             6
TS%             1
AST             1
TO              1
USG             1
ORR             1
DRR             1
REBR            1
PER            10
dtype: int64
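
The choice of `how='all'` matters here: it removes only the columns that are completely empty and keeps columns that are merely incomplete. As a small illustration on a hypothetical frame (`toy_df` and its values are invented for this sketch), compare it with `how='any'`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are invented for illustration only.
toy_df = pd.DataFrame({
    "ID": [1, 2, 3],
    "points": [10.0, np.nan, 30.0],
    "empty_col": [np.nan, np.nan, np.nan],
})

# how='all' drops only the columns where every value is NaN.
drop_all = toy_df.dropna(axis="columns", how="all")

# how='any' drops every column that contains at least one NaN.
drop_any = toy_df.dropna(axis="columns", how="any")

print(list(drop_all.columns))
print(list(drop_any.columns))
```

Without `inplace=True`, `dropna` returns a new DataFrame and leaves the original untouched, which makes it easy to compare the two results side by side.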

In [17]:
player_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   ID           46 non-null     int64  
 1   points       43 non-null     float64
 2   possessions  43 non-null     float64
 3   team_pace    43 non-null     float64
 4   GP           39 non-null     float64
 5   MPG          40 non-null     float64
 6   TS%          45 non-null     float64
 7   AST          45 non-null     float64
 8   TO           45 non-null     float64
 9   USG          45 non-null     float64
 10  ORR          45 non-null     float64
 11  DRR          45 non-null     float64
 12  REBR         45 non-null     float64
 13  PER          36 non-null     float64
dtypes: float64(13), int64(1)
memory usage: 5.2 KB


In [18]:
# Drop rows in which every value is NaN.
player_df.dropna(how='all', inplace=True)
player_df.isna().sum()

ID              0
points          3
possessions     3
team_pace       3
GP              7
MPG             6
TS%             1
AST             1
TO              1
USG             1
ORR             1
DRR             1
REBR            1
PER            10
dtype: int64

In [None]:
# The NaN counts are unchanged, so no row consisted entirely of NaN values.
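
One reason the row-wise drop changes nothing is that `ID` is never missing, so no row can be entirely NaN. A possible follow-up check, which is not part of the original notebook, is to ignore the identifier and look for rows whose statistics are all missing by using the `subset` parameter of `dropna`. The frame and the `stat_cols` name below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the player with ID 2 has no statistics at all.
toy_df = pd.DataFrame({
    "ID": [1, 2, 3],
    "points": [10.0, np.nan, 30.0],
    "GP": [63.0, np.nan, np.nan],
})

# Every column except the identifier (an assumption about which
# columns count as "statistics").
stat_cols = [col for col in toy_df.columns if col != "ID"]

# how='all' across all columns keeps every row, because ID is always present.
kept_all = toy_df.dropna(how="all")

# Restricting the check to the statistic columns drops the row for ID 2.
kept_subset = toy_df.dropna(how="all", subset=stat_cols)

print(len(kept_all))
print(kept_subset["ID"].tolist())
```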