# Pandas
Excel ♥ SQL

[Excel ♥ SQL]: # (Invisible comment)

### Dataframes & Series

Series #1

| player_id | pts_per_game  |
|-----------|---------------|
| 201939    |       27.3    |
| 201940    |       26.0    |
| 201941    |       16.3    |

Series #2

| player_id | reb_per_game  |
|-----------|---------------|
| 201939    |       6.7     |
| 201940    |       5.4     |
| 201941    |       4.8     |

Dataframe (DF)

| player_id | pts_per_game  | reb_per_game |
|-----------|---------------|--------------|
| 201939    |       27.3    |    6.7       |
| 201940    |       26.0    |    5.4       |
| 201941    |       16.3    |    4.8       |
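The two Series above can be combined into the Dataframe table. A minimal sketch using the values from the tables (the column names are written as `pts_per_game` and `reb_per_game`); `pd.concat` with `axis=1` lines the Series up side by side, matching rows by their shared `player_id` index:

```python
import pandas as pd

# Both Series share the same index (player_id), like the two tables above
player_ids = pd.Index([201939, 201940, 201941], name="player_id")
pts = pd.Series([27.3, 26.0, 16.3], index=player_ids, name="pts_per_game")
reb = pd.Series([6.7, 5.4, 4.8], index=player_ids, name="reb_per_game")

# axis=1 places the Series next to each other, aligned on the index;
# each Series name becomes a column name
df = pd.concat([pts, reb], axis=1)
print(df)
```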


### Dataframes store a collection of Series in memory

Dataframes can be created from various inputs, such as:

* Lists
* Dictionaries
* Series
* Numpy arrays
* Another dataframe
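A minimal sketch of each input type listed above, using small made-up values; every variant builds the same 2x2 table:

```python
import numpy as np
import pandas as pd

# From a list of lists (one inner list per row)
from_lists = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# From a dictionary (keys become column names, values become columns)
from_dict = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

# From Series (a dict of Series is aligned on the Series index)
from_series = pd.DataFrame({"a": pd.Series([1, 3]), "b": pd.Series([2, 4])})

# From a NumPy array (dtype fixed so the result matches on every platform)
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]], dtype="int64"), columns=["a", "b"])

# From another DataFrame: a new DataFrame object is built
# (its data may be shared rather than copied unless copy=True)
from_frame = pd.DataFrame(from_dict)

for frame in (from_lists, from_dict, from_series, from_array, from_frame):
    print(frame.equals(from_dict))  # True for each one
```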

Dataframes can also be created by reading in data from sources such as:

---
+ CSV
+ Excel
+ SQL
---
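A self-contained sketch of two of these readers, assuming an in-memory text buffer in place of a CSV file and an in-memory SQLite database in place of a real SQL server (the table name `players` and the data are made up for illustration):

```python
import io
import sqlite3

import pandas as pd

# Made-up sample data, used only for illustration
csv_text = "player_id,ppg\n1,10.5\n2,8.0\n"

# CSV: read_csv accepts a file path or any file-like object (StringIO stands in for a file)
df_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: read_sql_query runs a query on a database connection (here in-memory SQLite)
conn = sqlite3.connect(":memory:")
df_csv.to_sql("players", conn, index=False)  # write the sample rows into a table
df_sql = pd.read_sql_query("SELECT player_id, ppg FROM players", conn)
conn.close()

# Excel follows the same pattern with pd.read_excel("file.xlsx"), but it needs
# an engine such as openpyxl installed, so it is not run here
print(df_sql.equals(df_csv))  # True
```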


# Creating a Dataframe

In [3]:
import pandas as pd

In [4]:
# This line creates an empty DataFrame
df = pd.DataFrame()

In [5]:
print(df)

Empty DataFrame
Columns: []
Index: []


In [6]:
celtics_dict = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', 'Jrue Holiday', 'Neemias Queta'],
    'ppg': [ 26.8, 30.3, 12.4, 14.1, 8.3 ],
    'rpg': [ 5.3, 8.2, 4.5, 4.7, 7.5 ],
    'apg': [ 4.4, 5.1, 6.3, 5.9, 0.6 ]
}

In [7]:
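# df(celtics_dict) --> This won't work: df is a DataFrame object, not a function, so calling it raises
# TypeError: 'DataFrame' object is not callable. Instead, create a new DataFrame from the dictionary: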

In [8]:
df_celtics = pd.DataFrame(celtics_dict)

In [9]:
print(df_celtics)

     player_name   ppg  rpg  apg
0   Jaylen Brown  26.8  5.3  4.4
1   Jayson Tatum  30.3  8.2  5.1
2  Derrick White  12.4  4.5  6.3
3   Jrue Holiday  14.1  4.7  5.9
4  Neemias Queta   8.3  7.5  0.6


In [10]:
# Passing an existing DataFrame together with index=[...] keeps only the rows with those labels
df_filtered = pd.DataFrame(df_celtics, index=[2,4])

In [11]:
print(df_filtered)

     player_name   ppg  rpg  apg
2  Derrick White  12.4  4.5  6.3
4  Neemias Queta   8.3  7.5  0.6
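As far as I can tell, passing an existing DataFrame with `index=[...]` makes the constructor reindex it. A short sketch comparing that with `.loc`, the more common way to pick rows by label (only two columns of `celtics_dict` are rebuilt here to keep it short):

```python
import pandas as pd

df_celtics = pd.DataFrame({
    "player_name": ["Jaylen Brown", "Jayson Tatum", "Derrick White", "Jrue Holiday", "Neemias Queta"],
    "ppg": [26.8, 30.3, 12.4, 14.1, 8.3],
})

# Constructor with index=[...]: keeps the rows whose labels match
via_constructor = pd.DataFrame(df_celtics, index=[2, 4])

# .loc with a list of labels selects the same rows
via_loc = df_celtics.loc[[2, 4]]
print(via_constructor.equals(via_loc))  # True

# Difference: reindex fills labels that don't exist with NaN ...
print(df_celtics.reindex([2, 9]))

# ... while .loc raises a KeyError for a missing label
try:
    df_celtics.loc[[2, 9]]
except KeyError:
    print("label 9 not found")
```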


In [12]:
label = ['sf', 'pf', 'pg', 'sg', 'c']

In [13]:
df_label = pd.DataFrame(celtics_dict, index=label)

In [14]:
print(df_label)

      player_name   ppg  rpg  apg
sf   Jaylen Brown  26.8  5.3  4.4
pf   Jayson Tatum  30.3  8.2  5.1
pg  Derrick White  12.4  4.5  6.3
sg   Jrue Holiday  14.1  4.7  5.9
c   Neemias Queta   8.3  7.5  0.6


In [15]:
# Let's create another DataFrame, this time from a list of lists (one inner list per row)
stats = [['Jaylen Brown', 4, 6], ['Jayson Tatum', 2, 5], ['Jrue Holiday', 4, 4]]

In [16]:
stats_df = pd.DataFrame(stats, columns=['player', 'oreb', 'dreb'])

In [17]:
print(stats_df)

         player  oreb  dreb
0  Jaylen Brown     4     6
1  Jayson Tatum     2     5
2  Jrue Holiday     4     4


In [18]:
rebs = [6,9,11,7,3]

In [19]:
# Despite its name, reb_series is a one-column DataFrame, not a Series
reb_series = pd.DataFrame(rebs, columns=['jaylen brown_reb'])

In [20]:
print(reb_series)

   jaylen brown_reb
0                 6
1                 9
2                11
3                 7
4                 3
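To get an actual Series from the same list, use `pd.Series`. A short sketch contrasting the two objects (the variable names are illustrative):

```python
import pandas as pd

rebs = [6, 9, 11, 7, 3]

# pd.DataFrame with a single column: a 2-D table with one column
reb_df = pd.DataFrame(rebs, columns=["jaylen brown_reb"])

# pd.Series: a 1-D labeled array; name= plays the role of the column name
reb_series = pd.Series(rebs, name="jaylen brown_reb")

print(type(reb_df).__name__, reb_df.shape)          # DataFrame (5, 1)
print(type(reb_series).__name__, reb_series.shape)  # Series (5,)

# Selecting the single column of the DataFrame gives back that Series
print(reb_df["jaylen brown_reb"].equals(reb_series))  # True
```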


# Reading CSV Files into Dataframes

In [21]:
import pandas as pd

In [22]:
df = pd.read_csv('../nba-stats-csv/player_info.csv')

In [23]:
print(df)

       player_id     player_name season_id
0            920      A.C. Green   1996-97
1            243     Aaron McKie   1996-97
2           1425  Aaron Williams   1996-97
3            768       Acie Earl   1996-97
4            228      Adam Keefe   1996-97
...          ...             ...       ...
29223    1628380    Zach Collins   2017-18
29224     203897     Zach LaVine   2017-18
29225       2216   Zach Randolph   2017-18
29226       2585   Zaza Pachulia   2017-18
29227    1627753         Zhou Qi   2017-18

[29228 rows x 3 columns]


In [24]:
df.head(10)

,player_id,player_name,season_id
0,920,A.C. Green,1996-97
1,243,Aaron McKie,1996-97
2,1425,Aaron Williams,1996-97
3,768,Acie Earl,1996-97
4,228,Adam Keefe,1996-97
5,154,Adrian Caldwell,1996-97
6,673,Alan Henderson,1996-97
7,1059,Aleksandar Djordjevic,1996-97
8,275,Allan Houston,1996-97
9,947,Allen Iverson,1996-97


In [25]:
# header=None: the file has no header row, so pandas numbers the columns 0, 1, 2
df_noheader = pd.read_csv('../nba-stats-csv/player_info_no_header.csv', header=None)

In [26]:
print(df_noheader)

             0               1        2
0          920      A.C. Green  1996-97
1          243     Aaron McKie  1996-97
2         1425  Aaron Williams  1996-97
3          768       Acie Earl  1996-97
4          228      Adam Keefe  1996-97
...        ...             ...      ...
29223  1628380    Zach Collins  2017-18
29224   203897     Zach LaVine  2017-18
29225     2216   Zach Randolph  2017-18
29226     2585   Zaza Pachulia  2017-18
29227  1627753         Zhou Qi  2017-18

[29228 rows x 3 columns]


In [27]:
df_noheader.head(5)

,0,1,2
0,920,A.C. Green,1996-97
1,243,Aaron McKie,1996-97
2,1425,Aaron Williams,1996-97
3,768,Acie Earl,1996-97
4,228,Adam Keefe,1996-97


In [28]:
df_index = pd.read_csv('../nba-stats-csv/player_info.csv', index_col='player_id')

In [29]:
print(df_index)

              player_name season_id
player_id                          
920            A.C. Green   1996-97
243           Aaron McKie   1996-97
1425       Aaron Williams   1996-97
768             Acie Earl   1996-97
228            Adam Keefe   1996-97
...                   ...       ...
1628380      Zach Collins   2017-18
203897        Zach LaVine   2017-18
2216        Zach Randolph   2017-18
2585        Zaza Pachulia   2017-18
1627753           Zhou Qi   2017-18

[29228 rows x 2 columns]


In [30]:
df_usecols = pd.read_csv('../nba-stats-csv/player_general_traditional_per_game_data.csv', usecols=['player_id', 'season_id'])

In [31]:
df_usecols.head(5)

,player_id,season_id
0,471,1996-97
1,920,1996-97
2,243,1996-97
3,1425,1996-97
4,768,1996-97
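The `read_csv` options used above can be tried without the NBA files. A sketch on a few made-up rows in the same layout as `player_info.csv`; it also shows `names=`, which lets you supply column names for a file without a header row:

```python
import io

import pandas as pd

# Made-up rows in the same layout as player_info.csv (illustration only)
csv_text = (
    "player_id,player_name,season_id\n"
    "1,Player A,1996-97\n"
    "2,Player B,1996-97\n"
)

# index_col: use a column as the row index instead of 0, 1, 2, ...
by_id = pd.read_csv(io.StringIO(csv_text), index_col="player_id")

# usecols: load only the listed columns
ids_only = pd.read_csv(io.StringIO(csv_text), usecols=["player_id", "season_id"])

# header=None + names=: the file has no header row, so name the columns yourself
no_header_text = "1,Player A,1996-97\n2,Player B,1996-97\n"
named = pd.read_csv(io.StringIO(no_header_text), header=None,
                    names=["player_id", "player_name", "season_id"])

print(by_id.loc[2, "player_name"])  # Player B
print(list(ids_only.columns))       # ['player_id', 'season_id']
print(named.equals(pd.read_csv(io.StringIO(csv_text))))  # True
```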


# Attributes & Methods

In [32]:
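# Methods are called with parentheses, optionally with arguments: df.method()
# Attributes are accessed without parentheses: df.attribute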

In [40]:
df_players_data = pd.read_csv('../nba-stats-csv/player_general_traditional_per_game_data.csv')

In [41]:
df_players_data.shape # (rows, columns)

(10633, 21)

In [42]:
df_players_data.dtypes

player_id      int64
season_id     object
gp           float64
age          float64
min          float64
fgm          float64
fga          float64
fg_pct       float64
fg3m         float64
fg3a         float64
fg3_pct      float64
ftm          float64
fta          float64
ft_pct       float64
oreb         float64
dreb         float64
ast          float64
tov          float64
stl          float64
blk          float64
pts          float64
dtype: object
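A short sketch on a tiny made-up DataFrame, contrasting a few common attributes (no parentheses) with methods (called with parentheses):

```python
import pandas as pd

df = pd.DataFrame({"player_id": [1, 2, 3], "pts": [10.0, 8.5, 12.1]})

# Attributes: no parentheses, they describe the DataFrame
print(df.shape)          # (3, 2) -> (rows, columns)
print(list(df.columns))  # ['player_id', 'pts']
print(df.dtypes["pts"])  # float64
print(df.size)           # 6 -> rows * columns
print(df.ndim)           # 2

# Methods: called with parentheses (and optional arguments), they compute something
print(df.head(2).shape)  # (2, 2)
print(df["pts"].max())   # 12.1
```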

### Head, Tail & Sample Methods

In [45]:
df_5_row = df_players_data.head()

In [47]:
print(df_5_row) # head() returns a new DataFrame with the first 5 rows of the original

   player_id season_id    gp   age   min  fgm  fga  fg_pct  fg3m  fg3a  ...  \
0        471   1996-97  41.0   NaN  13.3  1.1  3.3   0.331   0.2   0.7  ...   
1        920   1996-97  83.0  33.0  30.8  2.8  5.8   0.483   0.0   0.2  ...   
2        243   1996-97  83.0  24.0  20.4  1.8  4.4   0.411   0.5   1.2  ...   
3       1425   1996-97  33.0  25.0  17.8  2.6  4.5   0.574   0.0   0.0  ...   
4        768   1996-97  47.0  27.0  11.1  1.4  3.8   0.374   0.0   0.1  ...   

   ftm  fta  ft_pct  oreb  dreb  ast  tov  stl  blk  pts  
0  1.0  1.2   0.875   0.7   1.8  1.4  0.8  0.2  0.3  3.4  
1  1.5  2.4   0.650   2.7   5.2  0.8  0.9  0.8  0.2  7.2  
2  1.1  1.3   0.836   0.5   2.2  1.9  1.1  0.9  0.3  5.2  
3  1.0  1.5   0.673   1.9   2.5  0.5  1.0  0.5  0.9  6.2  
4  1.1  1.8   0.643   0.7   1.3  0.4  0.7  0.3  0.6  4.0  

[5 rows x 21 columns]


In [57]:
df_players_data.tail()

,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
10628,203897,2018-19,63.0,24.0,34.5,8.4,18.0,0.467,1.9,5.1,...,5.0,6.0,0.832,0.6,4.0,4.5,3.4,1.0,0.4,23.7
10629,1629155,2018-19,1.0,26.0,3.8,0.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
10630,2585,2018-19,68.0,35.0,12.9,1.3,2.8,0.44,0.0,0.1,...,1.4,1.8,0.782,1.5,2.4,1.3,0.8,0.5,0.3,3.9
10631,1629015,2018-19,6.0,19.0,18.4,2.3,5.7,0.412,1.0,2.7,...,1.0,1.3,0.75,0.5,1.7,1.7,1.0,0.3,0.3,6.7
10632,1627753,2018-19,1.0,23.0,1.0,1.0,1.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0


In [58]:
df_players_data.head(10)

,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
0,471,1996-97,41.0,,13.3,1.1,3.3,0.331,0.2,0.7,...,1.0,1.2,0.875,0.7,1.8,1.4,0.8,0.2,0.3,3.4
1,920,1996-97,83.0,33.0,30.8,2.8,5.8,0.483,0.0,0.2,...,1.5,2.4,0.65,2.7,5.2,0.8,0.9,0.8,0.2,7.2
2,243,1996-97,83.0,24.0,20.4,1.8,4.4,0.411,0.5,1.2,...,1.1,1.3,0.836,0.5,2.2,1.9,1.1,0.9,0.3,5.2
3,1425,1996-97,33.0,25.0,17.8,2.6,4.5,0.574,0.0,0.0,...,1.0,1.5,0.673,1.9,2.5,0.5,1.0,0.5,0.9,6.2
4,768,1996-97,47.0,27.0,11.1,1.4,3.8,0.374,0.0,0.1,...,1.1,1.8,0.643,0.7,1.3,0.4,0.7,0.3,0.6,4.0
5,228,1996-97,62.0,27.0,15.4,1.3,2.6,0.513,0.0,0.0,...,1.1,1.7,0.689,1.2,2.3,0.5,0.7,0.5,0.2,3.8
6,154,1996-97,45.0,30.0,13.1,0.9,2.0,0.435,0.0,0.0,...,0.5,1.1,0.42,1.3,2.4,0.3,0.6,0.4,0.2,2.2
7,673,1996-97,30.0,24.0,17.2,2.6,5.4,0.475,0.0,0.0,...,1.5,2.5,0.6,1.6,2.3,0.8,1.0,0.7,0.2,6.6
8,1059,1996-97,8.0,29.0,7.9,1.0,2.0,0.5,0.6,0.9,...,0.5,0.6,0.8,0.1,0.5,0.6,0.6,0.0,0.0,3.1
9,275,1996-97,81.0,26.0,34.3,5.4,12.7,0.423,1.8,4.7,...,2.2,2.7,0.803,0.5,2.4,2.2,2.1,0.5,0.2,14.8


In [59]:
df_players_data.tail(7)

,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
10626,1629139,2018-19,15.0,24.0,11.6,1.0,3.4,0.294,0.1,1.1,...,0.5,0.7,0.7,0.3,1.8,0.5,0.4,0.3,0.1,2.6
10627,1628380,2018-19,77.0,21.0,17.6,2.5,5.2,0.473,0.5,1.6,...,1.2,1.6,0.746,1.4,2.8,0.9,1.0,0.3,0.9,6.6
10628,203897,2018-19,63.0,24.0,34.5,8.4,18.0,0.467,1.9,5.1,...,5.0,6.0,0.832,0.6,4.0,4.5,3.4,1.0,0.4,23.7
10629,1629155,2018-19,1.0,26.0,3.8,0.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
10630,2585,2018-19,68.0,35.0,12.9,1.3,2.8,0.44,0.0,0.1,...,1.4,1.8,0.782,1.5,2.4,1.3,0.8,0.5,0.3,3.9
10631,1629015,2018-19,6.0,19.0,18.4,2.3,5.7,0.412,1.0,2.7,...,1.0,1.3,0.75,0.5,1.7,1.7,1.0,0.3,0.3,6.7
10632,1627753,2018-19,1.0,23.0,1.0,1.0,1.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0


In [60]:
df_players_data.sample(12) # returns 12 randomly chosen rows; the result changes on every run

,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
3455,2581,2003-04,75.0,24.0,18.5,2.1,5.4,0.386,1.0,2.7,...,0.7,0.9,0.821,0.2,1.3,2.8,1.7,0.8,0.1,5.9
10030,203500,2017-18,76.0,24.0,32.7,5.9,9.4,0.629,0.0,0.0,...,2.1,3.8,0.559,5.1,4.0,1.2,1.7,1.2,1.0,13.9
2101,296,2000-01,82.0,31.0,28.0,3.5,7.9,0.444,1.4,3.7,...,1.2,1.5,0.779,1.0,3.0,3.2,1.7,0.9,0.4,9.6
8371,203200,2014-15,59.0,26.0,11.1,1.5,4.0,0.387,0.6,1.8,...,0.6,0.8,0.822,0.2,1.0,0.8,0.5,0.7,0.2,4.3
5512,201592,2008-09,6.0,24.0,1.3,0.3,0.5,0.667,0.0,0.0,...,0.0,0.0,0.0,0.3,0.2,0.0,0.2,0.2,0.0,0.7
5063,2137,2007-08,73.0,30.0,18.0,2.2,5.1,0.424,0.8,2.3,...,0.6,0.8,0.759,0.5,2.6,0.9,0.8,0.8,0.2,5.8
10178,203469,2018-19,49.0,26.0,25.4,3.9,7.0,0.551,0.1,0.4,...,2.3,2.9,0.787,2.2,4.6,2.1,1.3,0.8,0.8,10.1
139,190,1996-97,46.0,29.0,13.5,1.0,2.0,0.511,0.0,0.1,...,0.4,0.8,0.541,1.2,1.5,0.4,0.7,0.3,0.8,2.6
3718,976,2004-05,76.0,30.0,34.9,5.0,12.0,0.412,1.2,3.5,...,1.7,2.1,0.813,0.4,1.6,5.1,1.5,0.7,0.0,12.8
3018,959,2002-03,82.0,29.0,33.1,6.3,13.6,0.465,1.4,3.3,...,3.8,4.2,0.909,0.8,2.1,7.3,2.3,1.0,0.1,17.8


In [61]:
df_players_data.info() # prints a summary: index range, column names, non-null counts and dtypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10633 entries, 0 to 10632
Data columns (total 21 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   player_id  10633 non-null  int64  
 1   season_id  10633 non-null  object 
 2   gp         10633 non-null  float64
 3   age        10631 non-null  float64
 4   min        10633 non-null  float64
 5   fgm        10633 non-null  float64
 6   fga        10633 non-null  float64
 7   fg_pct     10633 non-null  float64
 8   fg3m       10633 non-null  float64
 9   fg3a       10633 non-null  float64
 10  fg3_pct    10633 non-null  float64
 11  ftm        10633 non-null  float64
 12  fta        10633 non-null  float64
 13  ft_pct     10633 non-null  float64
 14  oreb       10633 non-null  float64
 15  dreb       10633 non-null  float64
 16  ast        10633 non-null  float64
 17  tov        10633 non-null  float64
 18  stl        10633 non-null  float64
 19  blk        10633 non-null  float64
 20  pts        10633 non-null  float64

In [62]:
df_players_data.describe()

,player_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,fg3_pct,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
count,10633.0,10633.0,10631.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0,10633.0
mean,168422.2,52.633782,27.235255,20.398853,3.031468,6.798354,0.433469,0.545359,1.562682,0.235817,1.495429,2.00348,0.698231,0.970159,2.589119,1.796398,1.186043,0.646929,0.412602,8.101129
std,388278.0,25.115265,4.344703,10.099563,2.15945,4.594357,0.097001,0.661287,1.743706,0.179157,1.397361,1.769024,0.195102,0.831041,1.789594,1.789208,0.79109,0.44905,0.481362,5.903179
min,2.0,1.0,18.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1495.0,33.0,24.0,12.1,1.3,3.2,0.398,0.0,0.1,0.0,0.5,0.8,0.643,0.4,1.3,0.6,0.6,0.3,0.1,3.5
50%,2501.0,59.0,27.0,19.8,2.5,5.7,0.437,0.3,1.0,0.298,1.1,1.5,0.745,0.7,2.2,1.2,1.0,0.6,0.3,6.6
75%,201880.0,75.0,30.0,28.7,4.3,9.5,0.479,0.9,2.6,0.364,2.0,2.7,0.814,1.4,3.4,2.4,1.6,0.9,0.5,11.5
max,1629541.0,85.0,44.0,44.4,12.2,27.8,1.0,5.1,13.2,1.0,9.7,13.1,1.0,6.8,11.5,11.7,5.7,2.9,3.9,36.1


In [64]:
df_players_data.sum() # sums each column; text columns like season_id get their strings concatenated (numeric_only=True skips them)

player_id                                           1790833752
season_id    1996-971996-971996-971996-971996-971996-971996...
gp                                                    559655.0
age                                                   289538.0
min                                                   216901.0
fgm                                                    32233.6
fga                                                    72286.9
fg_pct                                                4609.073
fg3m                                                    5798.8
fg3a                                                   16616.0
fg3_pct                                               2507.447
ftm                                                    15900.9
fta                                                    21303.0
ft_pct                                                7424.287
oreb                                                   10315.7
dreb                                                   

In [66]:
print(df_players_data.dtypes)

player_id      int64
season_id     object
gp           float64
age          float64
min          float64
fgm          float64
fga          float64
fg_pct       float64
fg3m         float64
fg3a         float64
fg3_pct      float64
ftm          float64
fta          float64
ft_pct       float64
oreb         float64
dreb         float64
ast          float64
tov          float64
stl          float64
blk          float64
pts          float64
dtype: object


In [67]:
df_players_data.mean(numeric_only=True) # returns the mean of each column; numeric_only=True skips non-numeric columns such as season_id

player_id    168422.246967
gp               52.633782
age              27.235255
min              20.398853
fgm               3.031468
fga               6.798354
fg_pct            0.433469
fg3m              0.545359
fg3a              1.562682
fg3_pct           0.235817
ftm               1.495429
fta               2.003480
ft_pct            0.698231
oreb              0.970159
dreb              2.589119
ast               1.796398
tov               1.186043
stl               0.646929
blk               0.412602
pts               8.101129
dtype: float64

In [75]:
df_players_data.corr(numeric_only=True) # returns a DF in which each cell (i, j) holds the correlation coefficient between column i and column j (Pearson by default)

,player_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,fg3_pct,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
player_id,1.0,-0.117966,-0.327878,-0.138399,-0.088626,-0.091591,-0.013139,0.031839,0.0472,0.053466,-0.117921,-0.121414,-0.070711,-0.104069,-0.076786,-0.073226,-0.128919,-0.094299,-0.058878,-0.089121
gp,-0.117966,1.0,0.031932,0.661571,0.558798,0.537544,0.323238,0.33897,0.32638,0.252203,0.43621,0.432054,0.401575,0.350474,0.499179,0.391819,0.47328,0.482834,0.293747,0.550499
age,-0.327878,0.031932,1.0,0.06816,-0.010243,-0.005897,-0.011013,0.060671,0.048133,0.017836,-0.026634,-0.045631,0.080744,-0.034389,0.048336,0.079816,-0.013969,0.023266,-0.036354,-0.006939
min,-0.138399,0.661571,0.06816,1.0,0.898696,0.902499,0.265402,0.512537,0.521027,0.286004,0.763601,0.759884,0.39117,0.468688,0.741098,0.672098,0.82154,0.766862,0.381113,0.896016
fgm,-0.088626,0.558798,-0.010243,0.898696,1.0,0.981887,0.306843,0.494028,0.500119,0.269023,0.856325,0.848208,0.355666,0.440062,0.712868,0.627075,0.83531,0.681252,0.370746,0.98983
fga,-0.091591,0.537544,-0.005897,0.902499,0.981887,1.0,0.188154,0.569256,0.588161,0.317116,0.846021,0.825272,0.378002,0.353804,0.653903,0.665869,0.844743,0.707331,0.291224,0.982662
fg_pct,-0.013139,0.323238,-0.011013,0.265402,0.306843,0.188154,1.0,-0.050848,-0.097441,-0.027911,0.222654,0.258393,0.158601,0.401689,0.362039,0.052194,0.188453,0.131418,0.34654,0.271633
fg3m,0.031839,0.33897,0.060671,0.512537,0.494028,0.569256,-0.050848,1.0,0.984159,0.58952,0.34147,0.268837,0.352018,-0.234495,0.132096,0.480875,0.386329,0.45696,-0.156865,0.554222
fg3a,0.0472,0.32638,0.048133,0.521027,0.500119,0.588161,-0.097441,0.984159,1.0,0.567381,0.356412,0.284632,0.352993,-0.244986,0.131054,0.507521,0.410626,0.48219,-0.167452,0.560556
fg3_pct,0.053466,0.252203,0.017836,0.286004,0.269023,0.317116,-0.027911,0.58952,0.567381,1.0,0.160462,0.099531,0.329966,-0.274225,-0.007294,0.334241,0.210239,0.295806,-0.21288,0.300943


In [79]:
# df.corr() also supports rank-based methods: 'kendall' (Kendall's tau, ordinal association) and 'spearman' (rank correlation)
df_players_data.corr(method='kendall', numeric_only=True)

,player_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,fg3_pct,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
player_id,1.0,-0.049629,-0.367445,-0.091298,-0.037343,-0.043592,0.018327,0.084187,0.08852,0.048623,-0.078411,-0.078734,-0.020916,-0.085018,-0.040499,-0.058631,-0.094775,-0.072241,-0.021445,-0.037121
gp,-0.049629,1.0,0.025086,0.483078,0.457123,0.43232,0.222985,0.23341,0.213247,0.192389,0.398778,0.385246,0.186352,0.27539,0.413673,0.35434,0.383604,0.395249,0.269561,0.454796
age,-0.367445,0.025086,1.0,0.065926,0.018737,0.021939,-0.004999,0.038587,0.029461,0.037397,-0.002739,-0.018958,0.08703,-0.017574,0.049607,0.081839,0.016366,0.040269,-0.022152,0.02142
min,-0.091298,0.483078,0.065926,1.0,0.781914,0.779366,0.201871,0.351276,0.34642,0.220949,0.6467,0.628992,0.242477,0.34878,0.61843,0.581987,0.677397,0.631727,0.320444,0.784151
fgm,-0.037343,0.457123,0.018737,0.781914,1.0,0.894457,0.273434,0.356273,0.348955,0.233893,0.689373,0.663987,0.263192,0.343502,0.590564,0.540624,0.67412,0.565072,0.316351,0.936079
fga,-0.043592,0.43232,0.021939,0.779366,0.894457,1.0,0.157274,0.413936,0.416572,0.26436,0.667004,0.631318,0.290184,0.284559,0.542927,0.575909,0.676747,0.580583,0.261648,0.896753
fg_pct,0.018327,0.222985,-0.004999,0.201871,0.273434,0.157274,1.0,-0.150148,-0.185242,-0.052174,0.233467,0.26208,-0.051373,0.390849,0.333369,0.021933,0.162015,0.099594,0.356955,0.240671
fg3m,0.084187,0.23341,0.038587,0.351276,0.356273,0.413936,-0.150148,1.0,0.909042,0.650727,0.224482,0.167717,0.354643,-0.181761,0.119112,0.426025,0.267616,0.355069,-0.109751,0.393796
fg3a,0.08852,0.213247,0.029461,0.34642,0.348955,0.416572,-0.185242,0.909042,1.0,0.581729,0.225048,0.16906,0.343972,-0.181135,0.114461,0.434557,0.270954,0.362794,-0.116293,0.384332
fg3_pct,0.048623,0.192389,0.037397,0.220949,0.233893,0.26436,-0.052174,0.650727,0.581729,1.0,0.133207,0.082143,0.307455,-0.181975,0.039936,0.307474,0.160469,0.236315,-0.127662,0.260348


In [80]:
df_players_data.corr(method='spearman', numeric_only=True)

,player_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,fg3_pct,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
player_id,1.0,-0.073136,-0.518455,-0.135667,-0.054827,-0.064527,0.026698,0.11972,0.12963,0.070805,-0.115019,-0.115775,-0.031432,-0.124129,-0.059914,-0.086782,-0.137975,-0.104539,-0.029974,-0.05485
gp,-0.073136,1.0,0.037448,0.665861,0.632432,0.60614,0.312643,0.317638,0.300426,0.268493,0.555282,0.540295,0.263519,0.387633,0.57814,0.500269,0.533843,0.53881,0.370234,0.632718
age,-0.518455,0.037448,1.0,0.098127,0.028613,0.033211,-0.006317,0.052553,0.041192,0.051766,-0.00384,-0.026925,0.125349,-0.024619,0.072287,0.117791,0.024266,0.05769,-0.029978,0.032419
min,-0.135667,0.665861,0.098127,1.0,0.935055,0.934589,0.287379,0.470466,0.476145,0.315466,0.830362,0.817215,0.343924,0.491677,0.812719,0.772182,0.853757,0.805728,0.444801,0.938279
fgm,-0.054827,0.632432,0.028613,0.935055,1.0,0.981957,0.380555,0.469641,0.473428,0.333427,0.860957,0.842267,0.370083,0.483941,0.780337,0.726791,0.846736,0.739359,0.435937,0.99306
fga,-0.064527,0.60614,0.033211,0.934589,0.981957,1.0,0.220359,0.545842,0.561527,0.376517,0.845644,0.816882,0.408392,0.407024,0.735593,0.765854,0.851594,0.757664,0.365546,0.983846
fg_pct,0.026698,0.312643,-0.006317,0.287379,0.380555,0.220359,1.0,-0.229513,-0.281096,-0.103372,0.331263,0.372952,-0.066371,0.536507,0.460126,0.0332,0.23134,0.141388,0.480814,0.33917
fg3m,0.11972,0.317638,0.052553,0.470466,0.469641,0.545842,-0.229513,1.0,0.970327,0.794675,0.305945,0.22986,0.486999,-0.26634,0.146695,0.577157,0.364427,0.474927,-0.159113,0.518514
fg3a,0.12963,0.300426,0.041192,0.476145,0.473428,0.561527,-0.281096,0.970327,1.0,0.739364,0.315228,0.237878,0.486603,-0.277315,0.141777,0.600614,0.378848,0.497124,-0.175582,0.520817
fg3_pct,0.070805,0.268493,0.051766,0.315466,0.333427,0.376517,-0.103372,0.794675,0.739364,1.0,0.190752,0.118963,0.42364,-0.257111,0.056853,0.436321,0.22987,0.331129,-0.175179,0.371578
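The difference between Pearson and a rank-based method shows up on a relationship that is monotonic but not linear. A sketch on made-up data (`y = x**3`); it also checks that Spearman equals the Pearson correlation of the ranks. Kendall is left out here because pandas relies on SciPy for it:

```python
import pandas as pd

# Made-up data: y always grows with x, but not along a straight line
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 8, 27, 64, 125]})

pearson = df.corr()                    # default method="pearson" (linear association)
spearman = df.corr(method="spearman")  # rank-based (monotonic association)

print(round(pearson.loc["x", "y"], 3))   # 0.943 -> not a perfect straight line
print(round(spearman.loc["x", "y"], 6))  # 1.0 -> perfectly monotonic

# Spearman is the Pearson correlation computed on the ranks
ranked = df.rank().corr()
print(abs(ranked.loc["x", "y"] - spearman.loc["x", "y"]) < 1e-12)  # True
```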


In [81]:
df_players_data.count() # returns the number of non-null values in each column (age has 10631: two values are missing)

player_id    10633
season_id    10633
gp           10633
age          10631
min          10633
fgm          10633
fga          10633
fg_pct       10633
fg3m         10633
fg3a         10633
fg3_pct      10633
ftm          10633
fta          10633
ft_pct       10633
oreb         10633
dreb         10633
ast          10633
tov          10633
stl          10633
blk          10633
pts          10633
dtype: int64
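Since `count()` skips missing values, comparing it with the number of rows reveals gaps; `isna().sum()` counts them directly. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data: two ages are missing (NaN), similar to the age column above
df = pd.DataFrame({"gp": [82.0, 41.0, 60.0], "age": [25.0, np.nan, np.nan]})

print(df.count()["age"])       # 1 -> non-null values only
print(len(df))                 # 3 -> all rows, missing or not
print(df.isna().sum()["age"])  # 2 -> missing values per column
```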

In [84]:
df_players_data.max(numeric_only=True)

player_id    1629541.0
gp                85.0
age               44.0
min               44.4
fgm               12.2
fga               27.8
fg_pct             1.0
fg3m               5.1
fg3a              13.2
fg3_pct            1.0
ftm                9.7
fta               13.1
ft_pct             1.0
oreb               6.8
dreb              11.5
ast               11.7
tov                5.7
stl                2.9
blk                3.9
pts               36.1
dtype: float64

In [85]:
df_players_data.min(numeric_only=True)

player_id     2.0
gp            1.0
age          18.0
min           0.1
fgm           0.0
fga           0.0
fg_pct        0.0
fg3m          0.0
fg3a          0.0
fg3_pct       0.0
ftm           0.0
fta           0.0
ft_pct        0.0
oreb          0.0
dreb          0.0
ast           0.0
tov           0.0
stl           0.0
blk           0.0
pts           0.0
dtype: float64

In [88]:
df_players_data.median(numeric_only=True) # The median is the middle value of each column, so, unlike the mean, it is not pulled much by extreme values (outliers).

player_id    2501.000
gp             59.000
age            27.000
min            19.800
fgm             2.500
fga             5.700
fg_pct          0.437
fg3m            0.300
fg3a            1.000
fg3_pct         0.298
ftm             1.100
fta             1.500
ft_pct          0.745
oreb            0.700
dreb            2.200
ast             1.200
tov             1.000
stl             0.600
blk             0.300
pts             6.600
dtype: float64
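A tiny made-up example of why the median resists outliers: one extreme value drags the mean up, while the median barely moves:

```python
import pandas as pd

# Made-up points totals: one extreme value (60) among typical ones
pts = pd.Series([8, 10, 12, 9, 11, 60])

print(pts.mean())    # 18.333... -> pulled up by the outlier
print(pts.median())  # 10.5 -> average of the two middle values (10 and 11)
```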

In [92]:
df_players_data.std(numeric_only=True) # The standard deviation measures how much the values of each column deviate from their mean (pandas computes the sample standard deviation, ddof=1, by default)

player_id    388278.044987
gp               25.115265
age               4.344703
min              10.099563
fgm               2.159450
fga               4.594357
fg_pct            0.097001
fg3m              0.661287
fg3a              1.743706
fg3_pct           0.179157
ftm               1.397361
fta               1.769024
ft_pct            0.195102
oreb              0.831041
dreb              1.789594
ast               1.789208
tov               0.791090
stl               0.449050
blk               0.481362
pts               5.903179
dtype: float64
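A sketch of what `.std()` computes, on made-up values: by default pandas divides by `n - 1` (sample standard deviation, `ddof=1`), while `ddof=0` divides by `n` (population standard deviation, NumPy's default):

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean = 5.0

# pandas default: sample standard deviation (ddof=1 -> divide by n - 1)
sample_std = values.std()
manual = np.sqrt(((values - values.mean()) ** 2).sum() / (len(values) - 1))

# ddof=0: population standard deviation (divide by n)
population_std = values.std(ddof=0)

print(round(sample_std, 4), round(manual, 4))  # 2.1381 2.1381
print(population_std)                           # 2.0
```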

# Dataframe Columns

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('../nba-stats-csv/player_stats_total.csv')

In [4]:
df_slim = df.sample(20)

In [6]:
list(df_slim) # retrieves all the column names

['player_name',
 'player_id',
 'season_id',
 'gp',
 'age',
 'min',
 'fgm',
 'fga',
 'fg3m',
 'ftm',
 'fta',
 'oreb',
 'dreb',
 'ast',
 'tov',
 'stl',
 'blk',
 'pts']

In [7]:
df_slim.player_name

1332          George Hill
25            Paul Pierce
153           Joe Johnson
2581     Jonathon Simmons
1940        Iman Shumpert
721             David Lee
1066          Jason Smith
335            David West
2421        Dwight Powell
490            Beno Udrih
608      Shaun Livingston
244         Manu Ginobili
1405    Marreese Speights
1046          Ian Mahinmi
1445    Russell Westbrook
2610        Mario Hezonja
1154          Mike Conley
1672      Wayne Ellington
33            Paul Pierce
2522          Tarik Black
Name: player_name, dtype: object

In [9]:
df_slim['pts']

1332     894.0
25      1769.0
153     1245.0
2581     331.0
1940     567.0
721      876.0
1066     420.0
335      554.0
2421      90.0
490      340.0
608      318.0
244     1237.0
1405     711.0
1046     354.0
1445    1558.0
2610     317.0
1154     788.0
1672     648.0
33      1430.0
2522     383.0
Name: pts, dtype: float64
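Dot access (`df_slim.player_name`) and bracket access (`df_slim['pts']`) are not always interchangeable. This dataset has a column called `min`, which clashes with the `DataFrame.min` method. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"player_name": ["A", "B"], "min": [30.5, 12.0]})

# Dot access works for column names that are valid Python identifiers...
print(df.player_name.tolist())  # ['A', 'B']

# ...but a column named like a DataFrame method is hidden by that method
print(callable(df.min))         # True -> this is the DataFrame.min method, not the column

# Bracket access always returns the column
print(df["min"].tolist())       # [30.5, 12.0]
```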

In [11]:
type(df_slim['pts'])

pandas.core.series.Series

In [12]:
df_slim[['player_name', 'pts']] # a list of column names (note the double brackets) returns a DataFrame

,player_name,pts
1332,George Hill,894.0
25,Paul Pierce,1769.0
153,Joe Johnson,1245.0
2581,Jonathon Simmons,331.0
1940,Iman Shumpert,567.0
721,David Lee,876.0
1066,Jason Smith,420.0
335,David West,554.0
2421,Dwight Powell,90.0
490,Beno Udrih,340.0


In [15]:
list_of_columns = ['player_name', 'ftm', 'fta'] # storing the list in a variable keeps the selection readable and reusable

In [14]:
df_slim[list_of_columns]

,player_name,ftm,fta
1332,George Hill,114.0,150.0
25,Paul Pierce,549.0,668.0
153,Joe Johnson,159.0,195.0
2581,Jonathon Simmons,69.0,92.0
1940,Iman Shumpert,71.0,90.0
721,David Lee,194.0,237.0
1066,Jason Smith,86.0,102.0
335,David West,63.0,80.0
2421,Dwight Powell,25.0,33.0
490,Beno Udrih,53.0,60.0


In [17]:
list(df_slim)

['player_name',
 'player_id',
 'season_id',
 'gp',
 'age',
 'min',
 'fgm',
 'fga',
 'fg3m',
 'ftm',
 'fta',
 'oreb',
 'dreb',
 'ast',
 'tov',
 'stl',
 'blk',
 'pts']

In [21]:
df_slim.iloc[:, 0] # all rows (:) of the column at position 0 --> 'player_name'

1332          George Hill
25            Paul Pierce
153           Joe Johnson
2581     Jonathon Simmons
1940        Iman Shumpert
721             David Lee
1066          Jason Smith
335            David West
2421        Dwight Powell
490            Beno Udrih
608      Shaun Livingston
244         Manu Ginobili
1405    Marreese Speights
1046          Ian Mahinmi
1445    Russell Westbrook
2610        Mario Hezonja
1154          Mike Conley
1672      Wayne Ellington
33            Paul Pierce
2522          Tarik Black
Name: player_name, dtype: object

In [22]:
df_slim.iloc[:, 1]

1332     201588
25         1718
153        2207
2581     203613
1940     202697
721      101135
1066     201160
335        2561
2421     203939
490        2757
608        2733
244        1938
1405     201578
1046     101133
1445     201566
2610    1626209
1154     201144
1672     201961
33         1718
2522     204028
Name: player_id, dtype: int64

In [24]:
df_slim.iloc[:, -1] # negative positions count from the end: this returns the last column ('pts'), not its name

1332     894.0
25      1769.0
153     1245.0
2581     331.0
1940     567.0
721      876.0
1066     420.0
335      554.0
2421      90.0
490      340.0
608      318.0
244     1237.0
1405     711.0
1046     354.0
1445    1558.0
2610     317.0
1154     788.0
1672     648.0
33      1430.0
2522     383.0
Name: pts, dtype: float64

In [26]:
df_slim.iloc[:, 0:3] # columns at positions 0, 1 and 2 (the stop position 3 is excluded)

,player_name,player_id,season_id
1332,George Hill,201588,2015-16
25,Paul Pierce,1718,2004-05
153,Joe Johnson,2207,2013-14
2581,Jonathon Simmons,203613,2015-16
1940,Iman Shumpert,202697,2016-17
721,David Lee,101135,2007-08
1066,Jason Smith,201160,2012-13
335,David West,2561,2015-16
2421,Dwight Powell,203939,2014-15
490,Beno Udrih,2757,2006-07


In [28]:
df_slim.iloc[:, [1,4,6]] # a list of positions selects specific columns

,player_id,age,fgm
1332,201588,30.0,326.0
25,1718,27.0,556.0
153,2207,33.0,462.0
2581,203613,26.0,122.0
1940,202697,26.0,201.0
721,101135,25.0,341.0
1066,201160,27.0,167.0
335,2561,35.0,244.0
2421,203939,23.0,31.0
490,2757,24.0,127.0


In [31]:
selected_columns = [0, 11, 7, 16] # arbitrary positions, in any order
df_slim.iloc[:, selected_columns]

,player_name,oreb,fga,blk
1332,George Hill,58.0,739.0,17.0
25,Paul Pierce,78.0,1223.0,39.0
153,Joe Johnson,48.0,1018.0,10.0
2581,Jonathon Simmons,16.0,242.0,5.0
1940,Iman Shumpert,39.0,489.0,27.0
721,David Lee,242.0,618.0,29.0
1066,Jason Smith,64.0,341.0,45.0
335,David West,72.0,448.0,55.0
2421,Dwight Powell,18.0,67.0,6.0
490,Beno Udrih,11.0,344.0,1.0
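`iloc` slices columns by position and excludes the stop position; `loc` slices by column label and includes the stop label. A sketch on made-up data with the first columns of `player_stats_total.csv`:

```python
import pandas as pd

df = pd.DataFrame({
    "player_name": ["A", "B"],
    "player_id": [1, 2],
    "season_id": ["2015-16", "2016-17"],
    "gp": [82.0, 70.0],
})

# iloc: positions 0, 1, 2 (stop position 3 excluded)
by_position = df.iloc[:, 0:3]

# loc: from 'player_name' up to and including 'season_id'
by_label = df.loc[:, "player_name":"season_id"]

print(by_position.equals(by_label))  # True
print(list(by_label.columns))        # ['player_name', 'player_id', 'season_id']
```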


### Adding & Deleting Columns

In [32]:
import pandas as pd

In [33]:
df = pd.read_csv('../nba-stats-csv/player_stats_total.csv')

In [34]:
df_slim = df.sample(20)

In [35]:
df_slim

,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
2573,Jarell Eddie,204067,2015-16,26.0,24.0,146.767,20.0,65.0,15.0,8.0,8.0,2.0,21.0,5.0,1.0,5.0,1.0,63.0
2375,Tim Hardaway Jr.,203501,2016-17,79.0,25.0,2153.69,415.0,912.0,149.0,164.0,214.0,35.0,189.0,182.0,106.0,55.0,15.0,1143.0
2690,Jake Layman,1627774,2016-17,35.0,23.0,248.777,26.0,89.0,13.0,13.0,17.0,6.0,18.0,11.0,11.0,9.0,3.0,78.0
2371,Steven Adams,203500,2016-17,80.0,23.0,2389.4,374.0,655.0,0.0,157.0,257.0,281.0,332.0,86.0,146.0,89.0,78.0,905.0
1938,Iman Shumpert,202697,2014-15,62.0,25.0,1544.6,193.0,471.0,67.0,43.0,64.0,55.0,169.0,135.0,91.0,81.0,16.0,496.0
110,Mike Miller,2034,2001-02,63.0,22.0,2127.89,351.0,802.0,116.0,138.0,181.0,49.0,224.0,198.0,108.0,47.0,23.0,956.0
2306,Jeff Withey,203481,2014-15,37.0,25.0,258.585,32.0,64.0,0.0,34.0,50.0,23.0,41.0,11.0,12.0,4.0,18.0,98.0
1559,Jodie Meeks,201975,2013-14,77.0,26.0,2556.04,413.0,892.0,162.0,221.0,258.0,30.0,164.0,138.0,111.0,111.0,4.0,1209.0
399,Leandro Barbosa,2571,2009-10,44.0,27.0,785.603,155.0,365.0,44.0,64.0,73.0,11.0,58.0,64.0,46.0,23.0,12.0,418.0
995,Al Horford,201143,2008-09,67.0,23.0,2242.5,312.0,594.0,0.0,149.0,205.0,145.0,479.0,163.0,103.0,53.0,95.0,773.0


In [38]:
df_slim['ppg'] = 0 # add a new column 'ppg' with the value 0 in every row

In [37]:
df_slim

,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts,ppg
2573,Jarell Eddie,204067,2015-16,26.0,24.0,146.767,20.0,65.0,15.0,8.0,8.0,2.0,21.0,5.0,1.0,5.0,1.0,63.0,0
2375,Tim Hardaway Jr.,203501,2016-17,79.0,25.0,2153.69,415.0,912.0,149.0,164.0,214.0,35.0,189.0,182.0,106.0,55.0,15.0,1143.0,0
2690,Jake Layman,1627774,2016-17,35.0,23.0,248.777,26.0,89.0,13.0,13.0,17.0,6.0,18.0,11.0,11.0,9.0,3.0,78.0,0
2371,Steven Adams,203500,2016-17,80.0,23.0,2389.4,374.0,655.0,0.0,157.0,257.0,281.0,332.0,86.0,146.0,89.0,78.0,905.0,0
1938,Iman Shumpert,202697,2014-15,62.0,25.0,1544.6,193.0,471.0,67.0,43.0,64.0,55.0,169.0,135.0,91.0,81.0,16.0,496.0,0
110,Mike Miller,2034,2001-02,63.0,22.0,2127.89,351.0,802.0,116.0,138.0,181.0,49.0,224.0,198.0,108.0,47.0,23.0,956.0,0
2306,Jeff Withey,203481,2014-15,37.0,25.0,258.585,32.0,64.0,0.0,34.0,50.0,23.0,41.0,11.0,12.0,4.0,18.0,98.0,0
1559,Jodie Meeks,201975,2013-14,77.0,26.0,2556.04,413.0,892.0,162.0,221.0,258.0,30.0,164.0,138.0,111.0,111.0,4.0,1209.0,0
399,Leandro Barbosa,2571,2009-10,44.0,27.0,785.603,155.0,365.0,44.0,64.0,73.0,11.0,58.0,64.0,46.0,23.0,12.0,418.0,0
995,Al Horford,201143,2008-09,67.0,23.0,2242.5,312.0,594.0,0.0,149.0,205.0,145.0,479.0,163.0,103.0,53.0,95.0,773.0,0


In [43]:
df_slim['ppg'] = (df_slim['pts'] / df_slim['gp']).round(1) # points per game, rounded to 1 decimal

In [44]:
df_slim

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts,ppg
2573,Jarell Eddie,204067,2015-16,26.0,24.0,146.767,20.0,65.0,15.0,8.0,8.0,2.0,21.0,5.0,1.0,5.0,1.0,63.0,2.4
2375,Tim Hardaway Jr.,203501,2016-17,79.0,25.0,2153.69,415.0,912.0,149.0,164.0,214.0,35.0,189.0,182.0,106.0,55.0,15.0,1143.0,14.5
2690,Jake Layman,1627774,2016-17,35.0,23.0,248.777,26.0,89.0,13.0,13.0,17.0,6.0,18.0,11.0,11.0,9.0,3.0,78.0,2.2
2371,Steven Adams,203500,2016-17,80.0,23.0,2389.4,374.0,655.0,0.0,157.0,257.0,281.0,332.0,86.0,146.0,89.0,78.0,905.0,11.3
1938,Iman Shumpert,202697,2014-15,62.0,25.0,1544.6,193.0,471.0,67.0,43.0,64.0,55.0,169.0,135.0,91.0,81.0,16.0,496.0,8.0
110,Mike Miller,2034,2001-02,63.0,22.0,2127.89,351.0,802.0,116.0,138.0,181.0,49.0,224.0,198.0,108.0,47.0,23.0,956.0,15.2
2306,Jeff Withey,203481,2014-15,37.0,25.0,258.585,32.0,64.0,0.0,34.0,50.0,23.0,41.0,11.0,12.0,4.0,18.0,98.0,2.6
1559,Jodie Meeks,201975,2013-14,77.0,26.0,2556.04,413.0,892.0,162.0,221.0,258.0,30.0,164.0,138.0,111.0,111.0,4.0,1209.0,15.7
399,Leandro Barbosa,2571,2009-10,44.0,27.0,785.603,155.0,365.0,44.0,64.0,73.0,11.0,58.0,64.0,46.0,23.0,12.0,418.0,9.5
995,Al Horford,201143,2008-09,67.0,23.0,2242.5,312.0,594.0,0.0,149.0,205.0,145.0,479.0,163.0,103.0,53.0,95.0,773.0,11.5

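A vectorized alternative, as a minimal sketch: `Series.round(1)` rounds the whole column, and `DataFrame.assign` returns a new DataFrame with the extra column instead of modifying the original. The two-row frame below is a hand-built stand-in for `df_slim`, reusing values from the output above.

```python
import pandas as pd

# Hand-built stand-in for df_slim (values copied from the output above)
stats = pd.DataFrame({
    'player_name': ['Jarell Eddie', 'Tim Hardaway Jr.'],
    'gp': [26.0, 79.0],
    'pts': [63.0, 1143.0],
})

# assign() returns a new DataFrame; `stats` itself is left unchanged
with_ppg = stats.assign(ppg=(stats['pts'] / stats['gp']).round(1))

print(with_ppg)
print('ppg' in stats.columns)  # False: the original frame has no ppg column
```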

In [46]:
# insert() lets you choose the position: 'league' goes in at column index 3 (before 'gp')
df_slim.insert(3, column='league', value='NBA')

In [47]:
df_slim

Unnamed: 0,player_name,player_id,season_id,league,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts,ppg
2573,Jarell Eddie,204067,2015-16,NBA,26.0,24.0,146.767,20.0,65.0,15.0,8.0,8.0,2.0,21.0,5.0,1.0,5.0,1.0,63.0,2.4
2375,Tim Hardaway Jr.,203501,2016-17,NBA,79.0,25.0,2153.69,415.0,912.0,149.0,164.0,214.0,35.0,189.0,182.0,106.0,55.0,15.0,1143.0,14.5
2690,Jake Layman,1627774,2016-17,NBA,35.0,23.0,248.777,26.0,89.0,13.0,13.0,17.0,6.0,18.0,11.0,11.0,9.0,3.0,78.0,2.2
2371,Steven Adams,203500,2016-17,NBA,80.0,23.0,2389.4,374.0,655.0,0.0,157.0,257.0,281.0,332.0,86.0,146.0,89.0,78.0,905.0,11.3
1938,Iman Shumpert,202697,2014-15,NBA,62.0,25.0,1544.6,193.0,471.0,67.0,43.0,64.0,55.0,169.0,135.0,91.0,81.0,16.0,496.0,8.0
110,Mike Miller,2034,2001-02,NBA,63.0,22.0,2127.89,351.0,802.0,116.0,138.0,181.0,49.0,224.0,198.0,108.0,47.0,23.0,956.0,15.2
2306,Jeff Withey,203481,2014-15,NBA,37.0,25.0,258.585,32.0,64.0,0.0,34.0,50.0,23.0,41.0,11.0,12.0,4.0,18.0,98.0,2.6
1559,Jodie Meeks,201975,2013-14,NBA,77.0,26.0,2556.04,413.0,892.0,162.0,221.0,258.0,30.0,164.0,138.0,111.0,111.0,4.0,1209.0,15.7
399,Leandro Barbosa,2571,2009-10,NBA,44.0,27.0,785.603,155.0,365.0,44.0,64.0,73.0,11.0,58.0,64.0,46.0,23.0,12.0,418.0,9.5
995,Al Horford,201143,2008-09,NBA,67.0,23.0,2242.5,312.0,594.0,0.0,149.0,205.0,145.0,479.0,163.0,103.0,53.0,95.0,773.0,11.5

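A small sketch of `insert(loc, column, value)` on a hand-built frame (not the CSV data): it modifies the DataFrame in place and, by default, raises `ValueError` if the column name already exists.

```python
import pandas as pd

df = pd.DataFrame({'player_name': ['Steven Adams'], 'season_id': ['2016-17'], 'gp': [80.0]})

# insert(loc, column, value): place the new column at position 2 (before 'gp')
df.insert(2, 'league', 'NBA')
print(list(df.columns))  # ['player_name', 'season_id', 'league', 'gp']

# Inserting a name that already exists is rejected by default
duplicate_rejected = False
try:
    df.insert(0, 'league', 'WNBA')
except ValueError as err:
    duplicate_rejected = True
    print('ValueError:', err)
```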

In [48]:
del df_slim['ppg']

In [49]:
df_slim

Unnamed: 0,player_name,player_id,season_id,league,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
2573,Jarell Eddie,204067,2015-16,NBA,26.0,24.0,146.767,20.0,65.0,15.0,8.0,8.0,2.0,21.0,5.0,1.0,5.0,1.0,63.0
2375,Tim Hardaway Jr.,203501,2016-17,NBA,79.0,25.0,2153.69,415.0,912.0,149.0,164.0,214.0,35.0,189.0,182.0,106.0,55.0,15.0,1143.0
2690,Jake Layman,1627774,2016-17,NBA,35.0,23.0,248.777,26.0,89.0,13.0,13.0,17.0,6.0,18.0,11.0,11.0,9.0,3.0,78.0
2371,Steven Adams,203500,2016-17,NBA,80.0,23.0,2389.4,374.0,655.0,0.0,157.0,257.0,281.0,332.0,86.0,146.0,89.0,78.0,905.0
1938,Iman Shumpert,202697,2014-15,NBA,62.0,25.0,1544.6,193.0,471.0,67.0,43.0,64.0,55.0,169.0,135.0,91.0,81.0,16.0,496.0
110,Mike Miller,2034,2001-02,NBA,63.0,22.0,2127.89,351.0,802.0,116.0,138.0,181.0,49.0,224.0,198.0,108.0,47.0,23.0,956.0
2306,Jeff Withey,203481,2014-15,NBA,37.0,25.0,258.585,32.0,64.0,0.0,34.0,50.0,23.0,41.0,11.0,12.0,4.0,18.0,98.0
1559,Jodie Meeks,201975,2013-14,NBA,77.0,26.0,2556.04,413.0,892.0,162.0,221.0,258.0,30.0,164.0,138.0,111.0,111.0,4.0,1209.0
399,Leandro Barbosa,2571,2009-10,NBA,44.0,27.0,785.603,155.0,365.0,44.0,64.0,73.0,11.0,58.0,64.0,46.0,23.0,12.0,418.0
995,Al Horford,201143,2008-09,NBA,67.0,23.0,2242.5,312.0,594.0,0.0,149.0,205.0,145.0,479.0,163.0,103.0,53.0,95.0,773.0

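`pop` is an alternative to `del`: it removes the column in place and also returns it as a Series. A minimal sketch on a one-row stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'player_name': ['Al Horford'], 'pts': [773.0], 'ppg': [11.5]})

# pop() removes the column from df in place AND returns it,
# whereas `del df['ppg']` simply discards it
ppg = df.pop('ppg')

print(ppg.tolist())      # [11.5]
print(list(df.columns))  # ['player_name', 'pts']
```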

In [50]:
df_dropped = df_slim.drop('league', axis=1) # returns a new DF without 'league'; df_slim still has it

In [51]:
df_dropped

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
2573,Jarell Eddie,204067,2015-16,26.0,24.0,146.767,20.0,65.0,15.0,8.0,8.0,2.0,21.0,5.0,1.0,5.0,1.0,63.0
2375,Tim Hardaway Jr.,203501,2016-17,79.0,25.0,2153.69,415.0,912.0,149.0,164.0,214.0,35.0,189.0,182.0,106.0,55.0,15.0,1143.0
2690,Jake Layman,1627774,2016-17,35.0,23.0,248.777,26.0,89.0,13.0,13.0,17.0,6.0,18.0,11.0,11.0,9.0,3.0,78.0
2371,Steven Adams,203500,2016-17,80.0,23.0,2389.4,374.0,655.0,0.0,157.0,257.0,281.0,332.0,86.0,146.0,89.0,78.0,905.0
1938,Iman Shumpert,202697,2014-15,62.0,25.0,1544.6,193.0,471.0,67.0,43.0,64.0,55.0,169.0,135.0,91.0,81.0,16.0,496.0
110,Mike Miller,2034,2001-02,63.0,22.0,2127.89,351.0,802.0,116.0,138.0,181.0,49.0,224.0,198.0,108.0,47.0,23.0,956.0
2306,Jeff Withey,203481,2014-15,37.0,25.0,258.585,32.0,64.0,0.0,34.0,50.0,23.0,41.0,11.0,12.0,4.0,18.0,98.0
1559,Jodie Meeks,201975,2013-14,77.0,26.0,2556.04,413.0,892.0,162.0,221.0,258.0,30.0,164.0,138.0,111.0,111.0,4.0,1209.0
399,Leandro Barbosa,2571,2009-10,44.0,27.0,785.603,155.0,365.0,44.0,64.0,73.0,11.0,58.0,64.0,46.0,23.0,12.0,418.0
995,Al Horford,201143,2008-09,67.0,23.0,2242.5,312.0,594.0,0.0,149.0,205.0,145.0,479.0,163.0,103.0,53.0,95.0,773.0

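`drop` also accepts a `columns=` keyword, which reads more clearly than `axis=1`, and `errors='ignore'` skips labels that don't exist instead of raising `KeyError`. Sketch on a hypothetical one-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    'player_name': ['Jeff Withey'],
    'league': ['NBA'],
    'gp': [37.0],
    'pts': [98.0],
})

# columns= is equivalent to passing the labels with axis=1
slim = df.drop(columns=['league', 'gp'])

# errors='ignore' skips labels that do not exist instead of raising KeyError
same = df.drop(columns=['not_a_column'], errors='ignore')

print(list(slim.columns))  # ['player_name', 'pts']
print(list(df.columns))    # original untouched: ['player_name', 'league', 'gp', 'pts']
```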

In [52]:
# new_df = old_df[['col1', 'col2', 'col3']]
reb_df = df_slim[['player_name', 'oreb', 'dreb']]

In [53]:
reb_df

Unnamed: 0,player_name,oreb,dreb
2573,Jarell Eddie,2.0,21.0
2375,Tim Hardaway Jr.,35.0,189.0
2690,Jake Layman,6.0,18.0
2371,Steven Adams,281.0,332.0
1938,Iman Shumpert,55.0,169.0
110,Mike Miller,49.0,224.0
2306,Jeff Withey,23.0,41.0
1559,Jodie Meeks,30.0,164.0
399,Leandro Barbosa,11.0,58.0
995,Al Horford,145.0,479.0

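Single brackets return a Series, while a list of names returns a DataFrame. Calling `.copy()` on the subset is one way to get an independent frame, so new columns added to it don't touch the original. Sketch with two rows taken from the output above:

```python
import pandas as pd

df = pd.DataFrame({
    'player_name': ['Steven Adams', 'Al Horford'],
    'oreb': [281.0, 145.0],
    'dreb': [332.0, 479.0],
})

one_col = df['oreb']   # single brackets -> Series
sub_df = df[['oreb']]  # list inside brackets -> DataFrame (even with one column)

print(type(one_col).__name__)  # Series
print(type(sub_df).__name__)   # DataFrame

# .copy() gives an independent frame, so the new column is not added to df
reb_df = df[['player_name', 'oreb', 'dreb']].copy()
reb_df['reb'] = reb_df['oreb'] + reb_df['dreb']
```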

### Renaming Columns

In [3]:
import pandas as pd

In [4]:
df = pd.read_csv('../nba-stats-csv/player_stats_total.csv')

In [69]:
df = df[['player_name', 'gp', 'min']]

In [70]:
df_slim = df.sample(10)

In [73]:
df_slim.columns = ['player_name', 'games_played', 'minutes'] # Replace all column names at once (one name per column, in order)

In [74]:
df_slim

Unnamed: 0,player_name,games_played,minutes
667,Andrew Bogut,32.0,785.702
1676,Wesley Matthews,69.0,2402.62
423,Udonis Haslem,81.0,2490.45
577,Luol Deng,74.0,2393.94
1283,Danilo Gallinari,28.0,411.745
1075,Jeff Green,81.0,2251.41
1422,Nicolas Batum,70.0,2447.71
1925,E'Twaun Moore,79.0,1505.73
2486,Langston Galloway,74.0,1494.47
1175,Ramon Sessions,61.0,1651.56

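Assigning to `.columns` renames positionally, so the new list needs exactly one name per column; otherwise pandas raises a `ValueError` and the old names stay. A minimal sketch on a stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({'player_name': ['Luol Deng'], 'gp': [74.0], 'min': [2393.94]})

# Assigning to .columns replaces every name, matched by position
df.columns = ['player', 'games_played', 'minutes']

# Too few names: pandas refuses and keeps the current names
length_rejected = False
try:
    df.columns = ['player', 'games_played']
except ValueError as err:
    length_rejected = True
    print('ValueError:', err)
```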

In [75]:
new_column_names = ['player', 'games_p', 'minutos']

In [76]:
df_slim.columns = new_column_names

In [77]:
df_slim

Unnamed: 0,player,games_p,minutos
667,Andrew Bogut,32.0,785.702
1676,Wesley Matthews,69.0,2402.62
423,Udonis Haslem,81.0,2490.45
577,Luol Deng,74.0,2393.94
1283,Danilo Gallinari,28.0,411.745
1075,Jeff Green,81.0,2251.41
1422,Nicolas Batum,70.0,2447.71
1925,E'Twaun Moore,79.0,1505.73
2486,Langston Galloway,74.0,1494.47
1175,Ramon Sessions,61.0,1651.56


In [80]:
df_new = df_slim.rename(columns = {
    'games_p': 'games_played',
    'minutos': 'minutes'
})

In [81]:
df_new

Unnamed: 0,player,games_played,minutes
667,Andrew Bogut,32.0,785.702
1676,Wesley Matthews,69.0,2402.62
423,Udonis Haslem,81.0,2490.45
577,Luol Deng,74.0,2393.94
1283,Danilo Gallinari,28.0,411.745
1075,Jeff Green,81.0,2251.41
1422,Nicolas Batum,70.0,2447.71
1925,E'Twaun Moore,79.0,1505.73
2486,Langston Galloway,74.0,1494.47
1175,Ramon Sessions,61.0,1651.56

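`rename(columns=...)` also accepts a function, which is applied to every column name. A sketch that lower-cases names and replaces spaces with underscores (the column names here are made up for illustration):

```python
import pandas as pd

# Hypothetical column names with capitals and spaces
df = pd.DataFrame({'Player': ['Jeff Green'], 'Games Played': [81.0], 'Minutes': [2251.41]})

# The function receives each old name and returns the new one
clean = df.rename(columns=lambda name: name.lower().replace(' ', '_'))

print(list(clean.columns))  # ['player', 'games_played', 'minutes']
```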

### Selecting Rows

In [82]:
import pandas as pd

In [83]:
df = pd.read_csv('../nba-stats-csv/player_stats_total.csv')

In [85]:
df_slim = df.sample(20)

In [86]:
df_slim

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1510,DeMarre Carroll,201960,2011-12,24.0,25.0,348.018,43.0,105.0,7.0,14.0,16.0,28.0,25.0,18.0,11.0,12.0,1.0,107.0
531,JR Smith,2747,2008-09,81.0,23.0,2244.62,424.0,950.0,180.0,205.0,272.0,43.0,254.0,227.0,150.0,78.0,14.0,1233.0
2523,Tim Frazier,204025,2014-15,11.0,24.0,239.498,21.0,61.0,5.0,10.0,21.0,6.0,22.0,60.0,25.0,8.0,0.0,57.0
186,Richard Jefferson,2210,2014-15,74.0,35.0,1244.15,144.0,324.0,66.0,78.0,114.0,25.0,158.0,61.0,52.0,32.0,11.0,432.0
1992,Lance Thomas,202498,2011-12,42.0,24.0,629.495,57.0,126.0,0.0,52.0,62.0,49.0,77.0,12.0,24.0,10.0,7.0,166.0
124,Mike Miller,2034,2015-16,47.0,36.0,372.96,22.0,62.0,19.0,0.0,0.0,6.0,47.0,40.0,17.0,13.0,4.0,63.0
2234,Tyler Zeller,203092,2014-15,82.0,25.0,1730.72,340.0,619.0,0.0,153.0,186.0,146.0,319.0,113.0,76.0,18.0,52.0,833.0
1357,Jerryd Bayless,201573,2013-14,72.0,25.0,1685.72,248.0,617.0,76.0,94.0,118.0,22.0,123.0,194.0,82.0,60.0,9.0,666.0
424,Udonis Haslem,2617,2006-07,79.0,27.0,2482.82,353.0,717.0,0.0,138.0,203.0,188.0,466.0,97.0,110.0,49.0,26.0,844.0
2343,Reggie Bullock,203493,2016-17,31.0,26.0,466.793,54.0,128.0,28.0,5.0,7.0,13.0,51.0,29.0,10.0,18.0,3.0,141.0


In [89]:
df_slim.iloc[8] # returns the ninth row: positions start at 0, so iloc[0] is the first row

player_name    Udonis Haslem
player_id               2617
season_id            2006-07
gp                      79.0
age                     27.0
min                  2482.82
fgm                    353.0
fga                    717.0
fg3m                     0.0
ftm                    138.0
fta                    203.0
oreb                   188.0
dreb                   466.0
ast                     97.0
tov                    110.0
stl                     49.0
blk                     26.0
pts                    844.0
Name: 424, dtype: object

In [90]:
df_print = df_slim.iloc[1:11] # positions 1 to 10 (stop index 11 is excluded): 10 rows, skipping the first

In [91]:
df_print

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
531,JR Smith,2747,2008-09,81.0,23.0,2244.62,424.0,950.0,180.0,205.0,272.0,43.0,254.0,227.0,150.0,78.0,14.0,1233.0
2523,Tim Frazier,204025,2014-15,11.0,24.0,239.498,21.0,61.0,5.0,10.0,21.0,6.0,22.0,60.0,25.0,8.0,0.0,57.0
186,Richard Jefferson,2210,2014-15,74.0,35.0,1244.15,144.0,324.0,66.0,78.0,114.0,25.0,158.0,61.0,52.0,32.0,11.0,432.0
1992,Lance Thomas,202498,2011-12,42.0,24.0,629.495,57.0,126.0,0.0,52.0,62.0,49.0,77.0,12.0,24.0,10.0,7.0,166.0
124,Mike Miller,2034,2015-16,47.0,36.0,372.96,22.0,62.0,19.0,0.0,0.0,6.0,47.0,40.0,17.0,13.0,4.0,63.0
2234,Tyler Zeller,203092,2014-15,82.0,25.0,1730.72,340.0,619.0,0.0,153.0,186.0,146.0,319.0,113.0,76.0,18.0,52.0,833.0
1357,Jerryd Bayless,201573,2013-14,72.0,25.0,1685.72,248.0,617.0,76.0,94.0,118.0,22.0,123.0,194.0,82.0,60.0,9.0,666.0
424,Udonis Haslem,2617,2006-07,79.0,27.0,2482.82,353.0,717.0,0.0,138.0,203.0,188.0,466.0,97.0,110.0,49.0,26.0,844.0
2343,Reggie Bullock,203493,2016-17,31.0,26.0,466.793,54.0,128.0,28.0,5.0,7.0,13.0,51.0,29.0,10.0,18.0,3.0,141.0
1821,Larry Sanders,202336,2014-15,27.0,26.0,585.528,86.0,172.0,0.0,24.0,48.0,67.0,99.0,23.0,28.0,26.0,39.0,196.0

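When the index holds integers, as in the sampled DataFrame above, `.iloc` still counts positions while `.loc` looks up index labels. A minimal sketch using three of the sampled labels:

```python
import pandas as pd

# Index labels copied from the sample above; they are labels, not positions
df = pd.DataFrame(
    {'player_name': ['DeMarre Carroll', 'JR Smith', 'Tim Frazier']},
    index=[1510, 531, 2523],
)

by_position = df.iloc[1]['player_name']  # second row by position
by_label = df.loc[531]['player_name']    # row whose index label is 531
print(by_position, by_label)             # JR Smith JR Smith

# With an integer index, loc looks for a *label*, so 1 is not found here
missing_label_rejected = False
try:
    df.loc[1]
except KeyError:
    missing_label_rejected = True
    print('KeyError: no row is labelled 1')
```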

In [25]:
celtics_dict = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', 'Jrue Holiday', 'Neemias Queta'],
    'ppg': [ 26.8, 30.3, 12.4, 14.1, 8.3 ],
    'rpg': [ 5.3, 8.2, 4.5, 4.7, 7.5 ],
    'apg': [ 4.4, 5.1, 6.3, 5.9, 0.6 ]
}

In [26]:
pos = ['sf', 'pf', 'pg', 'sg', 'c']
df = pd.DataFrame(celtics_dict, index=pos)

In [27]:
df

Unnamed: 0,player_name,ppg,rpg,apg
sf,Jaylen Brown,26.8,5.3,4.4
pf,Jayson Tatum,30.3,8.2,5.1
pg,Derrick White,12.4,4.5,6.3
sg,Jrue Holiday,14.1,4.7,5.9
c,Neemias Queta,8.3,7.5,0.6


In [28]:
df.loc['pg'] # .loc -> selects by index label | .iloc -> selects by integer position

player_name    Derrick White
ppg                     12.4
rpg                      4.5
apg                      6.3
Name: pg, dtype: object

In [29]:
combo_pos = ['sf','pf']
df.loc[combo_pos]

Unnamed: 0,player_name,ppg,rpg,apg
sf,Jaylen Brown,26.8,5.3,4.4
pf,Jayson Tatum,30.3,8.2,5.1

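`.loc` also takes a column selector after the comma, and label slices include the end label (unlike `.iloc`, where the stop position is excluded). Sketch reusing `celtics_dict` and the position labels from above:

```python
import pandas as pd

celtics_dict = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', 'Jrue Holiday', 'Neemias Queta'],
    'ppg': [26.8, 30.3, 12.4, 14.1, 8.3],
    'rpg': [5.3, 8.2, 4.5, 4.7, 7.5],
    'apg': [4.4, 5.1, 6.3, 5.9, 0.6],
}
df = pd.DataFrame(celtics_dict, index=['sf', 'pf', 'pg', 'sg', 'c'])

# loc[rows, columns]: a single cell
print(df.loc['pg', 'ppg'])  # 12.4

# A label slice includes BOTH ends
print(df.loc['sf':'pg', ['player_name', 'ppg']])

# iloc[0:3] covers the same three rows by position (stop excluded)
print(df.iloc[0:3].index.tolist())  # ['sf', 'pf', 'pg']
```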

### Adding & Dropping Rows

In [30]:
import pandas as pd
import numpy as np

In [32]:
raw_data = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', np.nan, 'Neemias Queta'],
    'team': [ 'BOS', 'BOS', 'BOS', np.nan, 'BOS' ],
    'gp': [ 55, 82, 45, np.nan, np.nan ],
    'blocks': [ 44, 51, np.nan, np.nan, 60 ]
}

In [34]:
celtics_df = pd.DataFrame(raw_data, columns= ['player_name', 'team', 'gp', 'blocks'])

In [35]:
celtics_df

Unnamed: 0,player_name,team,gp,blocks
0,Jaylen Brown,BOS,55.0,44.0
1,Jayson Tatum,BOS,82.0,51.0
2,Derrick White,BOS,45.0,
3,,,,
4,Neemias Queta,BOS,,60.0


In [39]:
clean_df1 = celtics_df.dropna(axis=0, how='any') # 0: rows, 1: cols

In [41]:
clean_df1 # how='any': dropna removes every row that contains at least one NaN

Unnamed: 0,player_name,team,gp,blocks
0,Jaylen Brown,BOS,55.0,44.0
1,Jayson Tatum,BOS,82.0,51.0


In [42]:
clean_df2 = celtics_df.dropna(axis=0, how='all') # 0: rows, 1: cols

In [44]:
clean_df2 # how='all': only rows in which every cell is NaN are removed

Unnamed: 0,player_name,team,gp,blocks
0,Jaylen Brown,BOS,55.0,44.0
1,Jayson Tatum,BOS,82.0,51.0
2,Derrick White,BOS,45.0,
4,Neemias Queta,BOS,,60.0


In [45]:
clean_df3 = celtics_df.dropna(subset=['gp']) # only drop rows where 'gp' is NaN

In [46]:
clean_df3

Unnamed: 0,player_name,team,gp,blocks
0,Jaylen Brown,BOS,55.0,44.0
1,Jayson Tatum,BOS,82.0,51.0
2,Derrick White,BOS,45.0,

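Two more `dropna` options, sketched on the same `raw_data`: `thresh` keeps rows with at least that many non-NaN values, and `axis=1` drops columns instead of rows.

```python
import numpy as np
import pandas as pd

raw_data = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', np.nan, 'Neemias Queta'],
    'team': ['BOS', 'BOS', 'BOS', np.nan, 'BOS'],
    'gp': [55, 82, 45, np.nan, np.nan],
    'blocks': [44, 51, np.nan, np.nan, 60],
}
celtics_df = pd.DataFrame(raw_data)

# thresh=3 keeps rows that have at least 3 non-NaN values
at_least_3 = celtics_df.dropna(thresh=3)
print(at_least_3.index.tolist())  # [0, 1, 2, 4]

# axis=1 drops columns instead: every column here has a NaN somewhere
print(celtics_df.dropna(axis=1).shape)  # (5, 0)
```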

In [50]:
updated_df1 = celtics_df.fillna(0) # Replace NaN values with 0

In [51]:
updated_df1

Unnamed: 0,player_name,team,gp,blocks
0,Jaylen Brown,BOS,55.0,44.0
1,Jayson Tatum,BOS,82.0,51.0
2,Derrick White,BOS,45.0,0.0
3,0,0,0.0,0.0
4,Neemias Queta,BOS,0.0,60.0

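`fillna(0)` also writes `0` into text columns such as `player_name`. Passing a dict gives each column its own fill value; the replacement values below (`'Unknown'`, `'BOS'`) are only illustrative.

```python
import numpy as np
import pandas as pd

celtics_df = pd.DataFrame({
    'player_name': ['Jaylen Brown', 'Derrick White', np.nan],
    'team': ['BOS', 'BOS', np.nan],
    'gp': [55, 45, np.nan],
    'blocks': [44, np.nan, np.nan],
})

# Each key is a column name, each value is what replaces NaN in that column
filled = celtics_df.fillna({'player_name': 'Unknown', 'team': 'BOS', 'gp': 0, 'blocks': 0})
print(filled)
```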

In [52]:
jaylen_brown_stats = [['Jaylen Brown', '2018-19', 13.4], ['Jaylen Brown', '2019-2020', 19.4]] # one inner list per row

In [54]:
df_jaylen_brown = pd.DataFrame(jaylen_brown_stats, columns = ['player', 'season', 'ppg'])

In [55]:
df_jaylen_brown

Unnamed: 0,player,season,ppg
0,Jaylen Brown,2018-19,13.4
1,Jaylen Brown,2019-2020,19.4


In [57]:
new_season = pd.DataFrame([['Jaylen Brown', '2020-21', 20.8]], columns = ['player', 'season', 'ppg'])

In [58]:
new_season

Unnamed: 0,player,season,ppg
0,Jaylen Brown,2020-21,20.8


In [62]:
df_all_jaylen_brown_seasons = pd.concat([
    df_jaylen_brown, 
    new_season
], ignore_index=True)

In [63]:
df_all_jaylen_brown_seasons

Unnamed: 0,player,season,ppg
0,Jaylen Brown,2018-19,13.4
1,Jaylen Brown,2019-2020,19.4
2,Jaylen Brown,2020-21,20.8

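Two more row operations, as a sketch: with the default `RangeIndex`, `df.loc[len(df)] = [...]` appends one row in place, and `drop(index=...)` removes rows by label, returning a new DataFrame. The rows below are hand-typed for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['Jaylen Brown', '2018-19', 13.4], ['Jaylen Brown', '2019-20', 19.4]],
    columns=['player', 'season', 'ppg'],
)

# With the default RangeIndex, len(df) is the next unused label,
# so this appends one row in place
df.loc[len(df)] = ['Jaylen Brown', '2020-21', 20.8]

# drop(index=...) removes rows by label and returns a new DataFrame
without_first = df.drop(index=0)

print(len(df), without_first.index.tolist())  # 3 [1, 2]
```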

### Inplace Parameter

In [100]:
import pandas as pd

In [101]:
df = pd.read_csv('../nba-stats-csv/player_stats_total.csv')

In [102]:
df_slim = df.sample(20)

In [103]:
df_slim

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1442,Russell Westbrook,201566,2008-09,82.0,20.0,2668.03,436.0,1095.0,35.0,349.0,428.0,178.0,221.0,435.0,274.0,110.0,16.0,1256.0
74,Jason Terry,1891,2016-17,74.0,39.0,1364.9,105.0,243.0,73.0,24.0,29.0,15.0,91.0,98.0,36.0,46.0,20.0,307.0
1679,Wesley Matthews,202083,2015-16,78.0,29.0,2644.3,331.0,854.0,189.0,126.0,146.0,27.0,211.0,151.0,78.0,78.0,17.0,977.0
1140,Marco Belinelli,201158,2007-08,33.0,22.0,240.565,36.0,93.0,16.0,7.0,9.0,3.0,11.0,15.0,12.0,5.0,0.0,95.0
2671,Davis Bertans,202722,2016-17,67.0,24.0,807.608,103.0,234.0,69.0,28.0,34.0,22.0,76.0,46.0,32.0,20.0,28.0,303.0
610,Shaun Livingston,2733,2013-14,76.0,28.0,1974.18,235.0,487.0,1.0,158.0,191.0,67.0,179.0,245.0,105.0,93.0,31.0,629.0
2620,Norman Powell,1626181,2016-17,76.0,23.0,1367.65,227.0,506.0,56.0,126.0,159.0,26.0,143.0,82.0,70.0,52.0,14.0,636.0
1935,Iman Shumpert,202697,2011-12,59.0,22.0,1705.47,214.0,534.0,48.0,87.0,109.0,42.0,144.0,164.0,111.0,101.0,8.0,563.0
366,Kyle Korver,2594,2004-05,82.0,24.0,2666.5,317.0,759.0,226.0,82.0,96.0,40.0,339.0,182.0,106.0,103.0,33.0,942.0
1542,James Johnson,201949,2012-13,54.0,26.0,878.55,117.0,283.0,2.0,40.0,67.0,49.0,96.0,58.0,68.0,41.0,50.0,276.0


In [104]:
df_slim.sort_values('gp') # Returns a sorted copy that is only displayed: it is not saved to a variable, and df_slim is unchanged

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1412,Michael Beasley,201563,2014-15,24.0,26.0,504.805,92.0,212.0,8.0,20.0,26.0,12.0,77.0,32.0,37.0,15.0,13.0,212.0
2551,Chris McCullough,1626191,2015-16,24.0,21.0,362.12,44.0,109.0,13.0,11.0,23.0,25.0,43.0,9.0,15.0,28.0,12.0,112.0
1140,Marco Belinelli,201158,2007-08,33.0,22.0,240.565,36.0,93.0,16.0,7.0,9.0,3.0,11.0,15.0,12.0,5.0,0.0,95.0
1542,James Johnson,201949,2012-13,54.0,26.0,878.55,117.0,283.0,2.0,40.0,67.0,49.0,96.0,58.0,68.0,41.0,50.0,276.0
1935,Iman Shumpert,202697,2011-12,59.0,22.0,1705.47,214.0,534.0,48.0,87.0,109.0,42.0,144.0,164.0,111.0,101.0,8.0,563.0
2213,Quincy Acy,203112,2013-14,63.0,23.0,847.712,66.0,141.0,4.0,35.0,53.0,72.0,144.0,28.0,30.0,23.0,26.0,171.0
2671,Davis Bertans,202722,2016-17,67.0,24.0,807.608,103.0,234.0,69.0,28.0,34.0,22.0,76.0,46.0,32.0,20.0,28.0,303.0
2233,Tyler Zeller,203092,2013-14,70.0,24.0,1049.41,156.0,290.0,0.0,87.0,121.0,103.0,179.0,36.0,60.0,18.0,38.0,399.0
2222,Terrence Ross,203082,2012-13,73.0,22.0,1239.26,186.0,457.0,65.0,30.0,42.0,35.0,109.0,53.0,48.0,43.0,14.0,467.0
74,Jason Terry,1891,2016-17,74.0,39.0,1364.9,105.0,243.0,73.0,24.0,29.0,15.0,91.0,98.0,36.0,46.0,20.0,307.0


In [105]:
df_slim

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1442,Russell Westbrook,201566,2008-09,82.0,20.0,2668.03,436.0,1095.0,35.0,349.0,428.0,178.0,221.0,435.0,274.0,110.0,16.0,1256.0
74,Jason Terry,1891,2016-17,74.0,39.0,1364.9,105.0,243.0,73.0,24.0,29.0,15.0,91.0,98.0,36.0,46.0,20.0,307.0
1679,Wesley Matthews,202083,2015-16,78.0,29.0,2644.3,331.0,854.0,189.0,126.0,146.0,27.0,211.0,151.0,78.0,78.0,17.0,977.0
1140,Marco Belinelli,201158,2007-08,33.0,22.0,240.565,36.0,93.0,16.0,7.0,9.0,3.0,11.0,15.0,12.0,5.0,0.0,95.0
2671,Davis Bertans,202722,2016-17,67.0,24.0,807.608,103.0,234.0,69.0,28.0,34.0,22.0,76.0,46.0,32.0,20.0,28.0,303.0
610,Shaun Livingston,2733,2013-14,76.0,28.0,1974.18,235.0,487.0,1.0,158.0,191.0,67.0,179.0,245.0,105.0,93.0,31.0,629.0
2620,Norman Powell,1626181,2016-17,76.0,23.0,1367.65,227.0,506.0,56.0,126.0,159.0,26.0,143.0,82.0,70.0,52.0,14.0,636.0
1935,Iman Shumpert,202697,2011-12,59.0,22.0,1705.47,214.0,534.0,48.0,87.0,109.0,42.0,144.0,164.0,111.0,101.0,8.0,563.0
366,Kyle Korver,2594,2004-05,82.0,24.0,2666.5,317.0,759.0,226.0,82.0,96.0,40.0,339.0,182.0,106.0,103.0,33.0,942.0
1542,James Johnson,201949,2012-13,54.0,26.0,878.55,117.0,283.0,2.0,40.0,67.0,49.0,96.0,58.0,68.0,41.0,50.0,276.0


In [108]:
df_slim.sort_values('gp', inplace=True) # inplace=True sorts df_slim itself (and returns None), so there is nothing to store in a variable

In [107]:
df_slim

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1412,Michael Beasley,201563,2014-15,24.0,26.0,504.805,92.0,212.0,8.0,20.0,26.0,12.0,77.0,32.0,37.0,15.0,13.0,212.0
2551,Chris McCullough,1626191,2015-16,24.0,21.0,362.12,44.0,109.0,13.0,11.0,23.0,25.0,43.0,9.0,15.0,28.0,12.0,112.0
1140,Marco Belinelli,201158,2007-08,33.0,22.0,240.565,36.0,93.0,16.0,7.0,9.0,3.0,11.0,15.0,12.0,5.0,0.0,95.0
1542,James Johnson,201949,2012-13,54.0,26.0,878.55,117.0,283.0,2.0,40.0,67.0,49.0,96.0,58.0,68.0,41.0,50.0,276.0
1935,Iman Shumpert,202697,2011-12,59.0,22.0,1705.47,214.0,534.0,48.0,87.0,109.0,42.0,144.0,164.0,111.0,101.0,8.0,563.0
2213,Quincy Acy,203112,2013-14,63.0,23.0,847.712,66.0,141.0,4.0,35.0,53.0,72.0,144.0,28.0,30.0,23.0,26.0,171.0
2671,Davis Bertans,202722,2016-17,67.0,24.0,807.608,103.0,234.0,69.0,28.0,34.0,22.0,76.0,46.0,32.0,20.0,28.0,303.0
2233,Tyler Zeller,203092,2013-14,70.0,24.0,1049.41,156.0,290.0,0.0,87.0,121.0,103.0,179.0,36.0,60.0,18.0,38.0,399.0
2222,Terrence Ross,203082,2012-13,73.0,22.0,1239.26,186.0,457.0,65.0,30.0,42.0,35.0,109.0,53.0,48.0,43.0,14.0,467.0
74,Jason Terry,1891,2016-17,74.0,39.0,1364.9,105.0,243.0,73.0,24.0,29.0,15.0,91.0,98.0,36.0,46.0,20.0,307.0


In [111]:
df_slim.sort_values('pts', inplace=False) # inplace=False (the default) only returns a sorted copy; df_slim itself stays sorted by 'gp'

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1140,Marco Belinelli,201158,2007-08,33.0,22.0,240.565,36.0,93.0,16.0,7.0,9.0,3.0,11.0,15.0,12.0,5.0,0.0,95.0
2551,Chris McCullough,1626191,2015-16,24.0,21.0,362.12,44.0,109.0,13.0,11.0,23.0,25.0,43.0,9.0,15.0,28.0,12.0,112.0
2213,Quincy Acy,203112,2013-14,63.0,23.0,847.712,66.0,141.0,4.0,35.0,53.0,72.0,144.0,28.0,30.0,23.0,26.0,171.0
1412,Michael Beasley,201563,2014-15,24.0,26.0,504.805,92.0,212.0,8.0,20.0,26.0,12.0,77.0,32.0,37.0,15.0,13.0,212.0
1542,James Johnson,201949,2012-13,54.0,26.0,878.55,117.0,283.0,2.0,40.0,67.0,49.0,96.0,58.0,68.0,41.0,50.0,276.0
2671,Davis Bertans,202722,2016-17,67.0,24.0,807.608,103.0,234.0,69.0,28.0,34.0,22.0,76.0,46.0,32.0,20.0,28.0,303.0
74,Jason Terry,1891,2016-17,74.0,39.0,1364.9,105.0,243.0,73.0,24.0,29.0,15.0,91.0,98.0,36.0,46.0,20.0,307.0
2233,Tyler Zeller,203092,2013-14,70.0,24.0,1049.41,156.0,290.0,0.0,87.0,121.0,103.0,179.0,36.0,60.0,18.0,38.0,399.0
2222,Terrence Ross,203082,2012-13,73.0,22.0,1239.26,186.0,457.0,65.0,30.0,42.0,35.0,109.0,53.0,48.0,43.0,14.0,467.0
1894,Bismack Biyombo,202687,2016-17,81.0,24.0,1793.3,179.0,339.0,0.0,125.0,234.0,157.0,410.0,74.0,95.0,25.0,91.0,483.0


In [110]:
df_slim

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1412,Michael Beasley,201563,2014-15,24.0,26.0,504.805,92.0,212.0,8.0,20.0,26.0,12.0,77.0,32.0,37.0,15.0,13.0,212.0
2551,Chris McCullough,1626191,2015-16,24.0,21.0,362.12,44.0,109.0,13.0,11.0,23.0,25.0,43.0,9.0,15.0,28.0,12.0,112.0
1140,Marco Belinelli,201158,2007-08,33.0,22.0,240.565,36.0,93.0,16.0,7.0,9.0,3.0,11.0,15.0,12.0,5.0,0.0,95.0
1542,James Johnson,201949,2012-13,54.0,26.0,878.55,117.0,283.0,2.0,40.0,67.0,49.0,96.0,58.0,68.0,41.0,50.0,276.0
1935,Iman Shumpert,202697,2011-12,59.0,22.0,1705.47,214.0,534.0,48.0,87.0,109.0,42.0,144.0,164.0,111.0,101.0,8.0,563.0
2213,Quincy Acy,203112,2013-14,63.0,23.0,847.712,66.0,141.0,4.0,35.0,53.0,72.0,144.0,28.0,30.0,23.0,26.0,171.0
2671,Davis Bertans,202722,2016-17,67.0,24.0,807.608,103.0,234.0,69.0,28.0,34.0,22.0,76.0,46.0,32.0,20.0,28.0,303.0
2233,Tyler Zeller,203092,2013-14,70.0,24.0,1049.41,156.0,290.0,0.0,87.0,121.0,103.0,179.0,36.0,60.0,18.0,38.0,399.0
2222,Terrence Ross,203082,2012-13,73.0,22.0,1239.26,186.0,457.0,65.0,30.0,42.0,35.0,109.0,53.0,48.0,43.0,14.0,467.0
74,Jason Terry,1891,2016-17,74.0,39.0,1364.9,105.0,243.0,73.0,24.0,29.0,15.0,91.0,98.0,36.0,46.0,20.0,307.0


In [113]:
df_sorted_pts = df_slim.sort_values('pts', inplace=False) # now the sorted DF will be stored in a variable

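A common pitfall, shown on a tiny stand-in frame: with `inplace=True` the method returns `None`, so writing `df = df.sort_values(..., inplace=True)` would replace the DataFrame with `None`. Reassigning the result of the default `inplace=False` call is a common alternative.

```python
import pandas as pd

df = pd.DataFrame({'player_name': ['Jason Terry', 'Marco Belinelli'], 'gp': [74.0, 33.0]})

# With inplace=True the method returns None and sorts df itself
result = df.sort_values('gp', inplace=True)
print(result)             # None
print(df['gp'].tolist())  # [33.0, 74.0]

# Alternative: keep inplace=False (the default) and reassign the result
df2 = pd.DataFrame({'gp': [74.0, 33.0]})
df2 = df2.sort_values('gp')
```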
### Sorting Dataframes

In [117]:
import pandas as pd
import numpy as np

In [118]:
celtics_stats = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', 'Jrue Holiday', 'Neemias Queta'],
    'ppg': [ 26.8, 30.3, 12.4, 14.1, 8.3 ],
    'rpg': [ 5.3, 8.2, 4.5, 4.7, 7.5 ],
    'apg': [ 4.4, 5.1, 6.3, 5.9, 0.6 ]
}

In [120]:
unsorted_df = pd.DataFrame(celtics_stats, index=[1,4,2,3,0])

In [121]:
unsorted_df

Unnamed: 0,player_name,ppg,rpg,apg
1,Jaylen Brown,26.8,5.3,4.4
4,Jayson Tatum,30.3,8.2,5.1
2,Derrick White,12.4,4.5,6.3
3,Jrue Holiday,14.1,4.7,5.9
0,Neemias Queta,8.3,7.5,0.6


In [122]:
sorted_df = unsorted_df.sort_index()

In [123]:
sorted_df

Unnamed: 0,player_name,ppg,rpg,apg
0,Neemias Queta,8.3,7.5,0.6
1,Jaylen Brown,26.8,5.3,4.4
2,Derrick White,12.4,4.5,6.3
3,Jrue Holiday,14.1,4.7,5.9
4,Jayson Tatum,30.3,8.2,5.1


In [127]:
sorted_points = sorted_df.sort_values('ppg', ascending=False)

In [128]:
sorted_points

Unnamed: 0,player_name,ppg,rpg,apg
4,Jayson Tatum,30.3,8.2,5.1
1,Jaylen Brown,26.8,5.3,4.4
3,Jrue Holiday,14.1,4.7,5.9
2,Derrick White,12.4,4.5,6.3
0,Neemias Queta,8.3,7.5,0.6


In [129]:
sorted_points = sorted_df.sort_values('ppg')

In [130]:
sorted_points

Unnamed: 0,player_name,ppg,rpg,apg
0,Neemias Queta,8.3,7.5,0.6
2,Derrick White,12.4,4.5,6.3
3,Jrue Holiday,14.1,4.7,5.9
1,Jaylen Brown,26.8,5.3,4.4
4,Jayson Tatum,30.3,8.2,5.1


In [131]:
celtics_stats = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', 'Jrue Holiday', 'Neemias Queta'],
    'ppg': [ 26.8, 30.3, np.nan, 14.1, np.nan ],
    'rpg': [ 5.3, 8.2, 4.5, 4.7, 7.5 ],
    'apg': [ 4.4, 5.1, 6.3, 5.9, 0.6 ]
}

In [132]:
df_missing = pd.DataFrame(celtics_stats)

In [136]:
# We can pass a single column ('ppg') or a list of columns, e.g. ['ppg', 'apg']
sorted_by_points = df_missing.sort_values('ppg', ascending= False, na_position='first')

In [135]:
sorted_by_points

Unnamed: 0,player_name,ppg,rpg,apg
2,Derrick White,,4.5,6.3
4,Neemias Queta,,7.5,0.6
1,Jayson Tatum,30.3,8.2,5.1
0,Jaylen Brown,26.8,5.3,4.4
3,Jrue Holiday,14.1,4.7,5.9

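Sorting by several columns takes a list, optionally with one `ascending` flag per column. In this sketch the `team` values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', 'Jrue Holiday'],
    'team': ['BOS', 'BOS', 'BOS', 'MIL'],  # hypothetical team values for illustration
    'ppg': [26.8, 30.3, 12.4, 14.1],
})

# Sort by team A->Z, then by ppg high->low within each team
ranked = df.sort_values(['team', 'ppg'], ascending=[True, False])
print(ranked['player_name'].tolist())
```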

### Filtering Dataframes

In [138]:
import pandas as pd
import numpy as np

In [139]:
df = pd.read_csv('../nba-stats-csv/nba_season_stats.csv')

In [140]:
df.sample(5)

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
516,Nik Stauskas,24,PHI,6,0,45,1,4,0.25,0,...,1.0,0,1,1,1,4,0,4,3,4
342,Kyle Lowry,31,TOR,78,78,2510,403,944,0.427,238,...,0.854,66,368,434,537,85,19,183,192,1267
27,Dwayne Bacon,22,CHO,53,6,713,72,192,0.375,11,...,0.8,4,120,124,38,16,2,23,46,175
541,Evan Turner,29,POR,79,40,2034,258,577,0.447,42,...,0.85,30,214,244,173,47,29,99,163,649
73,Dillon Brooks,22,MEM,82,74,2350,340,772,0.44,94,...,0.747,49,208,257,135,73,17,124,233,898


In [141]:
df['Tm'] == 'OKC'

0       True
1      False
2       True
3      False
4      False
       ...  
600    False
601    False
602    False
603    False
604    False
Name: Tm, Length: 605, dtype: bool

In [144]:
okc_df = df[df['Tm'] == 'OKC']

In [145]:
okc_df

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,Alex Abrines,24,OKC,75,8,1134,115,291,0.395,84,...,0.848,26,88,114,28,38,8,25,124,353
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
70,Corey Brewer,31,OKC,18,16,514,64,144,0.444,23,...,0.795,20,41,61,24,38,6,11,56,182
116,Nick Collison,37,OKC,15,0,75,13,19,0.684,0,...,0.385,7,13,20,4,0,0,7,7,31
150,PJ Dozier,21,OKC,2,0,3,1,2,0.5,0,...,,0,1,1,0,0,0,1,1,2
175,Raymond Felton,33,OKC,82,2,1365,224,552,0.406,81,...,0.818,25,131,156,203,49,16,76,91,565
176,Terrance Ferguson,19,OKC,61,12,763,70,169,0.414,40,...,0.9,19,28,47,19,24,10,11,83,189
191,Paul George,27,OKC,79,79,2891,576,1340,0.43,244,...,0.822,72,375,447,263,161,39,212,233,1734
201,Jerami Grant,23,OKC,81,1,1647,244,456,0.535,32,...,0.675,86,233,319,57,31,77,54,155,682


In [149]:
okc = (df['Tm'] == 'OKC') # Store the mask in a named variable: same result as above, easier to read

In [150]:
df[okc]

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,Alex Abrines,24,OKC,75,8,1134,115,291,0.395,84,...,0.848,26,88,114,28,38,8,25,124,353
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
70,Corey Brewer,31,OKC,18,16,514,64,144,0.444,23,...,0.795,20,41,61,24,38,6,11,56,182
116,Nick Collison,37,OKC,15,0,75,13,19,0.684,0,...,0.385,7,13,20,4,0,0,7,7,31
150,PJ Dozier,21,OKC,2,0,3,1,2,0.5,0,...,,0,1,1,0,0,0,1,1,2
175,Raymond Felton,33,OKC,82,2,1365,224,552,0.406,81,...,0.818,25,131,156,203,49,16,76,91,565
176,Terrance Ferguson,19,OKC,61,12,763,70,169,0.414,40,...,0.9,19,28,47,19,24,10,11,83,189
191,Paul George,27,OKC,79,79,2891,576,1340,0.43,244,...,0.822,72,375,447,263,161,39,212,233,1734
201,Jerami Grant,23,OKC,81,1,1647,244,456,0.535,32,...,0.675,86,233,319,57,31,77,54,155,682


In [151]:
dual_df = df[(df['Tm'] == 'OKC') | (df['Tm'] == 'LAL')]

In [152]:
dual_df

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,Alex Abrines,24,OKC,75,8,1134,115,291,0.395,84,...,0.848,26,88,114,28,38,8,25,124,353
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
30,Lonzo Ball,20,LAL,52,50,1780,203,564,0.36,90,...,0.451,69,291,360,376,88,43,136,117,528
56,Vander Blue,25,LAL,5,0,45,1,5,0.2,0,...,0.5,0,1,1,3,1,0,3,2,3
59,Andrew Bogut,33,LAL,23,5,216,17,25,0.68,0,...,1.0,26,52,78,15,4,13,19,40,36
69,Corey Brewer,31,LAL,54,2,694,78,172,0.453,8,...,0.667,23,70,93,41,41,8,38,83,198
70,Corey Brewer,31,OKC,18,16,514,64,144,0.444,23,...,0.795,20,41,61,24,38,6,11,56,182
82,Thomas Bryant,20,LAL,15,0,72,8,21,0.381,1,...,0.556,3,14,17,6,1,2,2,6,22
91,Kentavious Caldwell-Pope,24,LAL,74,74,2458,340,798,0.426,159,...,0.789,60,327,387,162,106,16,97,145,992


In [153]:
# Same filter as above, with each condition stored in a named mask
okc = (df['Tm'] == 'OKC') 
lal = (df['Tm'] == 'LAL')

In [154]:
df[okc | lal]

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,Alex Abrines,24,OKC,75,8,1134,115,291,0.395,84,...,0.848,26,88,114,28,38,8,25,124,353
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
30,Lonzo Ball,20,LAL,52,50,1780,203,564,0.36,90,...,0.451,69,291,360,376,88,43,136,117,528
56,Vander Blue,25,LAL,5,0,45,1,5,0.2,0,...,0.5,0,1,1,3,1,0,3,2,3
59,Andrew Bogut,33,LAL,23,5,216,17,25,0.68,0,...,1.0,26,52,78,15,4,13,19,40,36
69,Corey Brewer,31,LAL,54,2,694,78,172,0.453,8,...,0.667,23,70,93,41,41,8,38,83,198
70,Corey Brewer,31,OKC,18,16,514,64,144,0.444,23,...,0.795,20,41,61,24,38,6,11,56,182
82,Thomas Bryant,20,LAL,15,0,72,8,21,0.381,1,...,0.556,3,14,17,6,1,2,2,6,22
91,Kentavious Caldwell-Pope,24,LAL,74,74,2458,340,798,0.426,159,...,0.789,60,327,387,162,106,16,97,145,992


In [157]:
df[(df['Tm'] == 'OKC') & (df['GS'] > 10)] # OKC players with more than 10 games started (GS)

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
70,Corey Brewer,31,OKC,18,16,514,64,144,0.444,23,...,0.795,20,41,61,24,38,6,11,56,182
176,Terrance Ferguson,19,OKC,61,12,763,70,169,0.414,40,...,0.9,19,28,47,19,24,10,11,83,189
191,Paul George,27,OKC,79,79,2891,576,1340,0.43,244,...,0.822,72,375,447,263,161,39,212,233,1734
481,Andre Roberson,26,OKC,39,39,1037,87,162,0.537,8,...,0.316,75,110,185,46,45,35,30,89,194
569,Russell Westbrook,29,OKC,80,80,2914,757,1687,0.449,97,...,0.737,152,652,804,820,147,20,381,200,2028


In [158]:
okc = (df['Tm'] == 'OKC') 
game_started = (df['GS'] > 10)

In [160]:
and_df = df[okc & game_started] 

In [161]:
and_df

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
70,Corey Brewer,31,OKC,18,16,514,64,144,0.444,23,...,0.795,20,41,61,24,38,6,11,56,182
176,Terrance Ferguson,19,OKC,61,12,763,70,169,0.414,40,...,0.9,19,28,47,19,24,10,11,83,189
191,Paul George,27,OKC,79,79,2891,576,1340,0.43,244,...,0.822,72,375,447,263,161,39,212,233,1734
481,Andre Roberson,26,OKC,39,39,1037,87,162,0.537,8,...,0.316,75,110,185,46,45,35,30,89,194
569,Russell Westbrook,29,OKC,80,80,2914,757,1687,0.449,97,...,0.737,152,652,804,820,147,20,381,200,2028

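`DataFrame.query` expresses the same filter as a string, where `and`/`or` are allowed. Sketch on a hand-built subset of the season data; in recent pandas versions, column names such as `FG%` that aren't valid Python identifiers need backticks inside the query string.

```python
import pandas as pd

# Hand-built subset of the season stats
df = pd.DataFrame({
    'Player': ['Alex Abrines', 'Steven Adams', 'Lonzo Ball'],
    'Tm': ['OKC', 'OKC', 'LAL'],
    'GS': [8, 76, 50],
})

okc = df['Tm'] == 'OKC'
game_started = df['GS'] > 10
mask_result = df[okc & game_started]

# Same filter written as a query string
query_result = df.query("Tm == 'OKC' and GS > 10")

print(query_result['Player'].tolist())  # ['Steven Adams']
```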

In [162]:
fg_50 = (df['FG%'] > .50)
three_point_40 = (df['3P%'] > .40)
ft_90 = (df['FT%'] > .90)

In [163]:
club_50_40_90 = df[fg_50 & three_point_40 & ft_90]

In [164]:
club_50_40_90

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
111,Antonius Cleveland,23,ATL,4,0,42,4,7,0.571,3,...,1.0,1,3,4,0,1,1,5,12,13
519,David Stockton,26,UTA,3,0,9,2,3,0.667,2,...,1.0,0,0,0,0,0,0,1,3,10


In [166]:
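lal = (df['Tm'] == 'LAL')  # mask for Lakers players, built like okc
dual_combined = df[(okc | lal) & game_started]  # (OKC or LAL) and more than 10 games started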

In [167]:
dual_combined

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
30,Lonzo Ball,20,LAL,52,50,1780,203,564,0.36,90,...,0.451,69,291,360,376,88,43,136,117,528
70,Corey Brewer,31,OKC,18,16,514,64,144,0.444,23,...,0.795,20,41,61,24,38,6,11,56,182
91,Kentavious Caldwell-Pope,24,LAL,74,74,2458,340,798,0.426,159,...,0.789,60,327,387,162,106,16,97,145,992
165,Tyler Ennis,23,LAL,54,11,683,94,224,0.42,14,...,0.759,19,76,95,105,30,10,38,75,224
176,Terrance Ferguson,19,OKC,61,12,763,70,169,0.414,40,...,0.9,19,28,47,19,24,10,11,83,189
191,Paul George,27,OKC,79,79,2891,576,1340,0.43,244,...,0.822,72,375,447,263,161,39,212,233,1734
225,Josh Hart,22,LAL,63,23,1461,176,375,0.469,78,...,0.702,42,221,263,80,47,16,47,107,496
265,Brandon Ingram,20,LAL,59,59,1975,358,761,0.47,41,...,0.681,57,257,314,230,45,43,149,163,949


In [169]:
# Look for players who have played for any of these three teams.
have_played = df[df['Tm'].isin(['OKC', 'LAL', 'CHI'])]

In [170]:
have_played

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,Alex Abrines,24,OKC,75,8,1134,115,291,0.395,84,...,0.848,26,88,114,28,38,8,25,124,353
2,Steven Adams,24,OKC,76,76,2487,448,712,0.629,0,...,0.559,384,301,685,88,92,78,128,215,1056
16,Carmelo Anthony,33,OKC,78,78,2501,472,1168,0.404,169,...,0.767,67,386,453,103,47,49,99,197,1261
18,Ryan Arcidiacono,23,CHI,24,0,304,17,41,0.415,9,...,0.833,1,24,25,35,13,0,13,18,48
23,Omer Asik,31,CHI,4,0,61,2,6,0.333,0,...,0.000,2,8,10,1,1,2,4,6,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,Travis Wear,27,LAL,17,0,228,25,72,0.347,17,...,1.000,0,38,38,7,4,5,6,30,75
569,Russell Westbrook,29,OKC,80,80,2914,757,1687,0.449,97,...,0.737,152,652,804,820,147,20,381,200,2028
580,Derrick Williams,26,LAL,2,0,9,1,4,0.250,0,...,,0,1,1,0,0,0,0,0,2
602,Paul Zipser,23,CHI,54,12,824,81,234,0.346,37,...,0.760,13,118,131,46,20,15,43,86,218
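`isin()` builds the same mask you would get by OR-ing one equality test per team, and `~` inverts a mask to select everyone else. A small sketch with hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D', 'E'],
    'Tm': ['OKC', 'BOS', 'LAL', 'CHI', 'MIA'],
})

teams = ['OKC', 'LAL', 'CHI']

# isin() is equivalent to chaining == tests with |
with_isin = df[df['Tm'].isin(teams)]
with_or = df[(df['Tm'] == 'OKC') | (df['Tm'] == 'LAL') | (df['Tm'] == 'CHI')]

# ~ negates a boolean mask: players on any other team
others = df[~df['Tm'].isin(teams)]

print(with_isin['Player'].tolist())  # ['A', 'C', 'D']
print(others['Player'].tolist())     # ['B', 'E']
```

`isin()` scales better than chained `|` as the list of teams grows.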


# Groupby

In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv('../nba-stats-csv/player_general_traditional_per_game_data.csv')

In [4]:
df.groupby('season_id')  # returns a DataFrameGroupBy; nothing is aggregated until a method like sum() is called

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FEE43FB320>

In [5]:
type(df.groupby('season_id'))

pandas.core.groupby.generic.DataFrameGroupBy

In [6]:
type(df)

pandas.core.frame.DataFrame

In [7]:
df2 = pd.read_csv('../nba-stats-csv/nba_season_stats.csv')

In [8]:
grouped = df2.groupby('Tm')

In [9]:
grouped['Age'].mean().sort_values(ascending = False)

Tm
CLE    29.000000
NOP    28.086957
SAS    28.058824
GSW    28.000000
HOU    28.000000
WAS    27.133333
MIN    27.000000
OKC    26.705882
DAL    26.652174
LAC    26.571429
MIA    26.421053
DET    26.318182
UTA    26.238095
MIL    26.000000
NYK    25.857143
IND    25.842105
DEN    25.833333
SAC    25.777778
LAL    25.500000
ATL    25.500000
CHO    25.470588
MEM    25.458333
BOS    25.400000
PHI    25.304348
ORL    25.263158
BRK    25.227273
TOR    24.833333
CHI    24.619048
PHO    24.363636
POR    24.125000
Name: Age, dtype: float64

In [10]:
boston = grouped.get_group('BOS')

In [11]:
boston

Unnamed: 0,Player,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,...,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
8,Kadeem Allen,25,BOS,18,1,107,6,22,0.273,0,...,0.778,4,7,11,12,3,2,9,15,19
36,Aron Baynes,31,BOS,81,67,1485,210,446,0.471,3,...,0.756,130,304,434,93,22,51,80,200,482
49,Jabari Bird,23,BOS,13,1,115,15,26,0.577,3,...,0.462,6,13,19,8,3,1,8,7,39
77,Jaylen Brown,21,BOS,70,70,2152,373,803,0.465,121,...,0.644,66,280,346,114,70,26,124,181,1017
158,Jarell Eddie,26,BOS,2,0,6,0,1,0.0,0,...,,0,1,1,0,1,0,0,0,0
193,Jonathan Gibson,30,BOS,4,0,40,14,23,0.609,6,...,,0,3,3,4,0,0,3,5,34
230,Gordon Hayward,27,BOS,1,1,5,1,2,0.5,0,...,,0,1,1,0,0,0,0,1,2
253,Al Horford,31,BOS,72,72,2277,368,753,0.489,97,...,0.783,103,427,530,339,43,78,132,138,927
266,Kyrie Irving,25,BOS,60,60,1931,534,1087,0.491,166,...,0.889,33,194,227,306,65,17,140,122,1466
320,Shane Larkin,25,BOS,54,2,775,84,219,0.384,31,...,0.865,15,77,92,98,29,2,33,58,231
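A GroupBy object can also be iterated: each step yields the group key and the matching sub-DataFrame. The sketch below uses hypothetical toy data and checks that `get_group` returns the same rows as an ordinary boolean filter:

```python
import pandas as pd

# Hypothetical toy data for illustration
df2 = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Tm': ['BOS', 'CLE', 'BOS', 'CLE'],
    'Age': [21, 33, 31, 29],
})

grouped = df2.groupby('Tm')

# Iterating yields (key, sub-DataFrame) pairs, keys sorted by default
for team, players in grouped:
    print(team, players['Player'].tolist())

# get_group() matches filtering with a boolean mask
boston = grouped.get_group('BOS')
print(boston.equals(df2[df2['Tm'] == 'BOS']))  # True

# size() counts the rows in each group
print(grouped.size())
```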


In [12]:
seasons = df.groupby('season_id')

In [13]:
seasons['pts'].sum()

season_id
1996-97    3539.5
1997-98    3489.0
1998-99    3231.7
1999-00    3500.2
2000-01    3444.8
2001-02    3512.4
2002-03    3359.5
2003-04    3423.5
2004-05    3752.9
2005-06    3656.0
2006-07    3759.6
2007-08    3728.5
2008-09    3784.2
2009-10    3788.7
2010-11    3709.2
2011-12    3790.0
2012-13    3732.3
2013-14    3899.5
2014-15    3996.3
2015-16    3974.3
2016-17    4095.4
2017-18    4406.6
2018-19    4565.2
Name: pts, dtype: float64

In [16]:
multiple_filters = ['pts', 'ast']  # select several columns, then sum each one per season
seasons[multiple_filters].sum()

season_id,pts,ast
1996-97,3539.5,803.1
1997-98,3489.0,798.9
1998-99,3231.7,726.3
1999-00,3500.2,811.8
2000-01,3444.8,804.2
2001-02,3512.4,810.1
2002-03,3359.5,763.1
2003-04,3423.5,777.3
2004-05,3752.9,823.2
2005-06,3656.0,773.6


In [17]:
# split: group the rows by season
seasons = df.groupby('season_id')

In [19]:
# apply: compute summary statistics of 3-pointers made (fg3m) within each season
seasons['fg3m'].describe()

season_id,count,mean,std,min,25%,50%,75%,max
1996-97,441.0,0.492517,0.641563,0.0,0.0,0.2,0.8,2.8
1997-98,439.0,0.358314,0.511527,0.0,0.0,0.1,0.6,2.3
1998-99,440.0,0.353182,0.525859,0.0,0.0,0.1,0.525,2.8
1999-00,439.0,0.389977,0.522891,0.0,0.0,0.1,0.7,2.2
2000-01,441.0,0.391383,0.53908,0.0,0.0,0.1,0.7,2.7
2001-02,440.0,0.4225,0.592334,0.0,0.0,0.1,0.8,3.3
2002-03,428.0,0.413785,0.574413,0.0,0.0,0.1,0.7,2.6
2003-04,442.0,0.420588,0.584809,0.0,0.0,0.1,0.7,3.0
2004-05,464.0,0.458405,0.62259,0.0,0.0,0.2,0.7,2.9
2005-06,458.0,0.46048,0.621417,0.0,0.0,0.1,0.8,3.4


In [21]:
# combine: the per-season totals come back as one Series indexed by season_id
df_3p_by_season = seasons['fg3m'].sum()
df_3p_by_season.sort_values()

season_id
1998-99    155.4
1997-98    157.3
1999-00    171.2
2000-01    172.6
2002-03    177.1
2001-02    185.9
2003-04    185.9
2005-06    210.9
2004-05    212.7
1996-97    217.2
2006-07    229.4
2010-11    239.5
2007-08    242.7
2009-10    243.2
2008-09    247.7
2011-12    255.5
2012-13    272.1
2013-14    300.7
2014-15    318.2
2015-16    327.5
2016-17    373.3
2017-18    438.2
2018-19    464.6
Name: fg3m, dtype: float64
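To see what split-apply-combine means, the three steps can be done by hand and compared with the one-line `groupby` version. A sketch on hypothetical toy numbers (not the real season totals):

```python
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    'season_id': ['2017-18', '2018-19', '2017-18', '2018-19'],
    'fg3m': [2.5, 3.0, 1.5, 0.5],
})

# split: one sub-DataFrame per season
pieces = {season: group for season, group in df.groupby('season_id')}

# apply: reduce each piece to a single number
totals = {season: group['fg3m'].sum() for season, group in pieces.items()}

# combine: gather the per-group results into one Series
manual = pd.Series(totals, name='fg3m')
manual.index.name = 'season_id'

# groupby performs all three steps in a single expression
builtin = df.groupby('season_id')['fg3m'].sum()
print(manual.equals(builtin))  # True
```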

In [22]:
seasons.agg({
    'pts': 'sum',
    'ast': 'mean'
})

season_id,pts,ast
1996-97,3539.5,1.821088
1997-98,3489.0,1.819818
1998-99,3231.7,1.650682
1999-00,3500.2,1.849203
2000-01,3444.8,1.823583
2001-02,3512.4,1.841136
2002-03,3359.5,1.782944
2003-04,3423.5,1.758597
2004-05,3752.9,1.774138
2005-06,3656.0,1.689083


# Concatenate & Append

In [35]:
# concatenating
import pandas as pd
import numpy as np

In [36]:
df_A = df.sample(10)  # 10 random rows; the sample changes on every run unless random_state is set
df_B = df.sample(10)

In [37]:
df_A

Unnamed: 0,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
5145,2198,2007-08,38.0,26.0,18.8,1.9,3.7,0.507,0.0,0.0,...,1.1,2.6,0.408,1.3,3.6,1.2,1.3,0.6,0.6,4.8
3957,124,2004-05,15.0,37.0,8.7,0.9,2.1,0.419,0.0,0.0,...,0.5,0.8,0.667,1.2,0.9,1.3,0.9,0.3,0.1,2.3
10261,202339,2018-19,78.0,29.0,29.1,6.0,12.4,0.484,1.6,4.8,...,2.3,3.0,0.75,1.1,3.6,5.5,2.1,1.5,0.4,15.9
5649,2399,2008-09,18.0,28.0,27.5,5.2,12.9,0.401,1.8,5.0,...,2.9,3.6,0.815,0.8,3.0,2.4,2.1,0.7,0.5,15.1
5875,2788,2009-10,49.0,29.0,7.2,1.0,2.1,0.466,0.0,0.0,...,0.2,0.4,0.474,0.7,1.1,0.2,0.3,0.1,0.6,2.1
4109,2764,2005-06,23.0,22.0,5.5,0.7,1.2,0.556,0.1,0.2,...,0.3,0.5,0.5,0.1,0.5,0.4,0.2,0.3,0.0,1.7
782,328,1997-98,59.0,39.0,12.0,1.0,2.2,0.457,0.0,0.0,...,0.4,0.6,0.676,1.1,2.2,0.3,0.6,0.2,0.1,2.4
1066,906,1998-99,29.0,33.0,28.1,2.5,5.9,0.429,0.6,1.6,...,1.3,1.6,0.787,0.5,3.0,5.9,2.2,1.1,0.0,6.9
5500,255,2008-09,82.0,36.0,29.8,4.8,9.1,0.523,0.3,0.9,...,2.2,2.7,0.808,0.8,4.2,2.3,1.5,1.1,0.7,12.0
7288,202700,2012-13,44.0,22.0,12.2,2.2,4.8,0.455,0.5,1.9,...,0.7,1.2,0.627,0.8,1.3,0.7,0.8,0.2,0.2,5.7


In [38]:
df_B

Unnamed: 0,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
947,1745,1998-99,9.0,23.0,1.5,0.1,0.9,0.125,0.0,0.0,...,0.0,0.0,0.0,0.3,0.0,0.0,0.2,0.0,0.0,0.2
9481,201583,2016-17,72.0,29.0,29.4,4.5,10.7,0.418,2.8,7.0,...,1.8,2.1,0.86,1.6,3.0,0.9,0.8,0.4,0.2,13.6
185,1068,1996-97,40.0,33.0,10.1,0.9,1.9,0.459,0.0,0.0,...,0.3,0.7,0.407,0.5,1.4,0.3,0.2,0.2,0.2,2.0
5814,201627,2009-10,69.0,24.0,29.2,4.8,10.2,0.468,2.0,4.4,...,1.5,1.7,0.886,0.9,2.9,1.5,1.2,0.9,0.2,13.0
8474,203136,2014-15,16.0,25.0,4.5,0.5,0.8,0.667,0.0,0.0,...,0.3,0.3,1.0,0.4,0.6,0.4,0.3,0.1,0.1,1.3
2654,1889,2002-03,80.0,27.0,36.4,4.7,11.6,0.406,0.3,1.4,...,3.9,4.9,0.795,1.1,2.9,6.7,2.6,1.2,0.1,13.6
72,201,1996-97,81.0,32.0,23.5,1.6,3.6,0.43,0.0,0.0,...,0.8,1.7,0.474,2.5,4.8,0.5,1.0,0.5,1.2,3.9
7157,2585,2011-12,58.0,28.0,28.3,2.9,5.8,0.499,0.0,0.0,...,2.0,2.7,0.741,2.7,5.2,1.4,1.4,0.9,0.5,7.8
3634,2734,2004-05,76.0,22.0,15.5,2.1,4.8,0.429,0.6,1.7,...,1.0,1.4,0.757,0.4,1.0,2.2,1.1,1.0,0.3,5.7
8716,201565,2015-16,66.0,27.0,31.8,6.8,15.9,0.427,0.7,2.3,...,2.2,2.7,0.793,0.7,2.7,4.7,2.7,0.7,0.2,16.4


In [39]:
frames = [df_A, df_B]
pd.concat(frames)  # stacks df_B's rows below df_A's rows (20 rows in total)

Unnamed: 0,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
5145,2198,2007-08,38.0,26.0,18.8,1.9,3.7,0.507,0.0,0.0,...,1.1,2.6,0.408,1.3,3.6,1.2,1.3,0.6,0.6,4.8
3957,124,2004-05,15.0,37.0,8.7,0.9,2.1,0.419,0.0,0.0,...,0.5,0.8,0.667,1.2,0.9,1.3,0.9,0.3,0.1,2.3
10261,202339,2018-19,78.0,29.0,29.1,6.0,12.4,0.484,1.6,4.8,...,2.3,3.0,0.75,1.1,3.6,5.5,2.1,1.5,0.4,15.9
5649,2399,2008-09,18.0,28.0,27.5,5.2,12.9,0.401,1.8,5.0,...,2.9,3.6,0.815,0.8,3.0,2.4,2.1,0.7,0.5,15.1
5875,2788,2009-10,49.0,29.0,7.2,1.0,2.1,0.466,0.0,0.0,...,0.2,0.4,0.474,0.7,1.1,0.2,0.3,0.1,0.6,2.1
4109,2764,2005-06,23.0,22.0,5.5,0.7,1.2,0.556,0.1,0.2,...,0.3,0.5,0.5,0.1,0.5,0.4,0.2,0.3,0.0,1.7
782,328,1997-98,59.0,39.0,12.0,1.0,2.2,0.457,0.0,0.0,...,0.4,0.6,0.676,1.1,2.2,0.3,0.6,0.2,0.1,2.4
1066,906,1998-99,29.0,33.0,28.1,2.5,5.9,0.429,0.6,1.6,...,1.3,1.6,0.787,0.5,3.0,5.9,2.2,1.1,0.0,6.9
5500,255,2008-09,82.0,36.0,29.8,4.8,9.1,0.523,0.3,0.9,...,2.2,2.7,0.808,0.8,4.2,2.3,1.5,1.1,0.7,12.0
7288,202700,2012-13,44.0,22.0,12.2,2.2,4.8,0.455,0.5,1.9,...,0.7,1.2,0.627,0.8,1.3,0.7,0.8,0.2,0.2,5.7


In [53]:
raw_data = {
    'player_id': ['1','2','3','4','5'],
    'ppg': [24.3, 28.1, 21.2, 17.2, 11.4],
    'fg%': [.57, .43, .38, .39, .54]}

In [54]:
cols = ['player_id', 'ppg', 'fg%']
df_1 = pd.DataFrame(raw_data, columns=cols)  # columns= fixes the column order

In [55]:
raw_data = {
    'player_id': ['8','6','7','9','10'],
    'ppg': [24.3, 28.1, 21.2, 17.2, 11.4],
    'fg%': [.57, .43, .38, .39, .54]}

In [56]:
df_2 = pd.DataFrame(raw_data, columns=cols)

In [59]:
df_concatenated = pd.concat([df_1, df_2])

In [60]:
df_concatenated

Unnamed: 0,player_id,ppg,fg%
0,1,24.3,0.57
1,2,28.1,0.43
2,3,21.2,0.38
3,4,17.2,0.39
4,5,11.4,0.54
0,8,24.3,0.57
1,6,28.1,0.43
2,7,21.2,0.38
3,9,17.2,0.39
4,10,11.4,0.54


In [64]:
df_concatenated.reset_index(drop=True)  # replace the repeated 0-4 labels with a fresh 0-9 index
# Same result as:
# pd.concat([df_1, df_2], ignore_index=True)

Unnamed: 0,player_id,ppg,fg%
0,1,24.3,0.57
1,2,28.1,0.43
2,3,21.2,0.38
3,4,17.2,0.39
4,5,11.4,0.54
5,8,24.3,0.57
6,6,28.1,0.43
7,7,21.2,0.38
8,9,17.2,0.39
9,10,11.4,0.54
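The two ways of renumbering give identical frames, and `keys=` is a third option that labels where each block came from instead of discarding the old index. A sketch using the first two rows of `df_1` and `df_2`; the labels `'first'` and `'second'` are arbitrary:

```python
import pandas as pd

df_1 = pd.DataFrame({'player_id': ['1', '2'], 'ppg': [24.3, 28.1]})
df_2 = pd.DataFrame({'player_id': ['8', '6'], 'ppg': [24.3, 28.1]})

# Option 1: concatenate, then renumber the rows
a = pd.concat([df_1, df_2]).reset_index(drop=True)

# Option 2: let concat ignore the original indexes
b = pd.concat([df_1, df_2], ignore_index=True)
print(a.equals(b))  # True

# keys= adds an outer index level naming each source frame
c = pd.concat([df_1, df_2], keys=['first', 'second'])
print(c.loc['second'])  # the rows that came from df_2
```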


In [65]:
# DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.
# pd.concat([df_1, df_2]) and pd.concat([df_2, df_1]) hold the same rows, but in a different order.
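Since `append()` no longer exists in pandas 2.x, both whole-frame appends and single-row appends go through `pd.concat`. A sketch with a few rows of the toy frames; the extra row for player `'3'` is hypothetical:

```python
import pandas as pd

df_1 = pd.DataFrame({'player_id': ['1', '2'], 'ppg': [24.3, 28.1]})
df_2 = pd.DataFrame({'player_id': ['8', '6'], 'ppg': [24.3, 28.1]})

# Replacements for df_1.append(df_2) and df_2.append(df_1)
one_then_two = pd.concat([df_1, df_2], ignore_index=True)
two_then_one = pd.concat([df_2, df_1], ignore_index=True)

# Same rows, different order
print(one_then_two['player_id'].tolist())  # ['1', '2', '8', '6']
print(two_then_one['player_id'].tolist())  # ['8', '6', '1', '2']

# Appending one row: wrap it in a one-row DataFrame first
new_row = pd.DataFrame([{'player_id': '3', 'ppg': 21.2}])
df_1 = pd.concat([df_1, new_row], ignore_index=True)
print(len(df_1))  # 3
```

Concatenating inside a loop copies the data each time, so when many rows are added it is usually cheaper to collect them in a list and call `pd.concat` once.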

# Merging & Joining

In [1]:
import pandas as pd
import numpy as np

In [2]:
df1 = pd.read_csv('../nba-stats-csv/player_general_traditional_per_game_data.csv')
df2 = pd.read_csv('../nba-stats-csv/player_id_player_name.csv')

In [5]:
df1.head(5)

Unnamed: 0,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,ftm,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts
0,471,1996-97,41.0,,13.3,1.1,3.3,0.331,0.2,0.7,...,1.0,1.2,0.875,0.7,1.8,1.4,0.8,0.2,0.3,3.4
1,920,1996-97,83.0,33.0,30.8,2.8,5.8,0.483,0.0,0.2,...,1.5,2.4,0.65,2.7,5.2,0.8,0.9,0.8,0.2,7.2
2,243,1996-97,83.0,24.0,20.4,1.8,4.4,0.411,0.5,1.2,...,1.1,1.3,0.836,0.5,2.2,1.9,1.1,0.9,0.3,5.2
3,1425,1996-97,33.0,25.0,17.8,2.6,4.5,0.574,0.0,0.0,...,1.0,1.5,0.673,1.9,2.5,0.5,1.0,0.5,0.9,6.2
4,768,1996-97,47.0,27.0,11.1,1.4,3.8,0.374,0.0,0.1,...,1.1,1.8,0.643,0.7,1.3,0.4,0.7,0.3,0.6,4.0


In [6]:
df2.tail(5)

Unnamed: 0,player_id,player_name
481,201163,Wilson Chandler
482,1627812,Yogi Ferrell
483,203897,Zach LaVine
484,2216,Zach Randolph
485,2585,Zaza Pachulia


In [7]:
merge_df = pd.merge(df1, df2, on = 'player_id') # on: the key column present in both DataFrames; how defaults to 'inner'

In [8]:
merge_df.sample(10)

Unnamed: 0,player_id,season_id,gp,age,min,fgm,fga,fg_pct,fg3m,fg3a,...,fta,ft_pct,oreb,dreb,ast,tov,stl,blk,pts,player_name
63,2399,2003-04,75.0,23.0,31.2,4.3,9.6,0.449,1.3,3.4,...,2.5,0.741,1.2,4.7,2.9,1.9,0.9,0.2,11.7,Mike Dunleavy
1344,101123,2013-14,82.0,28.0,28.4,5.5,12.3,0.445,2.5,6.2,...,2.8,0.848,0.6,2.8,1.5,1.8,0.9,0.5,15.8,Gerald Green
1135,201177,2012-13,67.0,26.0,22.2,2.3,5.1,0.455,0.4,1.3,...,1.3,0.765,1.4,3.5,2.1,1.0,0.5,0.4,6.0,Josh McRoberts
645,201572,2010-11,82.0,23.0,35.2,7.9,16.0,0.492,0.0,0.0,...,6.0,0.787,2.4,3.5,1.6,2.1,0.6,1.5,20.4,Brook Lopez
3331,2544,2018-19,55.0,34.0,35.2,10.1,19.9,0.51,2.0,5.9,...,7.6,0.665,1.0,7.4,8.3,3.6,1.3,0.6,27.4,LeBron James
2150,203999,2015-16,80.0,21.0,21.7,3.8,7.5,0.512,0.4,1.1,...,2.4,0.811,2.3,4.7,2.4,1.3,1.0,0.6,10.0,Nikola Jokic
2592,203101,2016-17,45.0,28.0,10.8,1.0,2.0,0.478,0.0,0.0,...,0.9,0.641,0.8,1.3,0.5,0.7,0.4,0.3,2.5,Miles Plumlee
1039,202688,2012-13,75.0,21.0,31.5,4.8,11.7,0.407,1.6,4.4,...,3.0,0.733,0.7,2.6,4.0,2.7,0.8,0.1,13.3,Brandon Knight
171,2756,2005-06,82.0,22.0,17.7,1.3,3.8,0.346,0.7,2.1,...,0.6,0.885,0.4,1.5,1.7,0.6,0.6,0.0,3.9,Sasha Vujacic
341,201152,2007-08,74.0,20.0,21.0,3.6,6.6,0.539,0.1,0.3,...,1.4,0.738,1.6,2.6,0.8,0.9,1.0,0.1,8.2,Thaddeus Young


In [9]:
# keep only the columns of df1 we need before merging
columns_to_merge = ['player_id', 'pts', 'ast', 'season_id']
merge_filtered = pd.merge(df2, df1[columns_to_merge], on = 'player_id')

In [10]:
merge_filtered.sample(10)

Unnamed: 0,player_id,player_name,pts,ast,season_id
2915,2756,Sasha Vujacic,3.9,1.7,2005-06
1373,201935,James Harden,36.1,7.5,2018-19
89,1627816,Alex Poythress,5.1,0.8,2018-19
736,201599,DeAndre Jordan,4.3,0.2,2008-09
3442,2216,Zach Randolph,16.1,2.2,2014-15
115,101161,Amir Johnson,6.2,0.6,2009-10
1372,201935,James Harden,30.4,8.8,2017-18
707,101135,David Lee,20.2,3.6,2009-10
3272,204020,Tyler Johnson,10.9,2.9,2018-19
1841,203145,Kent Bazemore,11.6,2.3,2015-16


In [11]:
merged_df3 = pd.merge(df2, df1[columns_to_merge], on = 'player_id', how='inner')  # 'inner' is the default, so this equals the previous merge
merged_df3.sample(10)

Unnamed: 0,player_id,player_name,pts,ast,season_id
580,202709,Cory Joseph,8.5,3.1,2015-16
596,1626245,Cristiano Felicio,4.8,0.6,2016-17
3027,200779,Steve Novak,8.8,0.2,2011-12
601,1626156,D'Angelo Russell,15.5,5.2,2017-18
730,2561,David West,7.1,1.8,2015-16
835,202682,Derrick Williams,8.8,0.6,2011-12
317,1626171,Bobby Portis,14.2,1.4,2018-19
1409,201162,Jared Dudley,10.6,1.3,2010-11
1137,202330,Gordon Hayward,5.4,1.1,2010-11
3120,201168,Tiago Splitter,5.6,0.8,2015-16


In [12]:
raw_data = {
    'player_id': ['1','2','3'],
    'first_name': ["Rajon", "Kobe", "Lamar"],
    'last_name': ["Rondo", "Bryant", "Odom"]}

In [13]:
df_A = pd.DataFrame(raw_data, columns = ['player_id', 'first_name', 'last_name'])

In [14]:
raw_data2 = {
    'player_id': ['1','2','3'],
    'career_ppg': [11.7, 28.4, 13.2]
    }

In [15]:
df_B = pd.DataFrame(raw_data2, columns = ['player_id', 'career_ppg'])

In [16]:
left_merge = pd.merge(df_A, df_B, on = 'player_id', how = 'left')

In [17]:
left_merge

Unnamed: 0,player_id,first_name,last_name,career_ppg
0,1,Rajon,Rondo,11.7
1,2,Kobe,Bryant,28.4
2,3,Lamar,Odom,13.2


In [18]:
right_merge = pd.merge(df_A, df_B, on = 'player_id', how = 'right')

In [19]:
right_merge

Unnamed: 0,player_id,first_name,last_name,career_ppg
0,1,Rajon,Rondo,11.7
1,2,Kobe,Bryant,28.4
2,3,Lamar,Odom,13.2


In [20]:
outer_merge = pd.merge(df_A, df_B, on = 'player_id', how = 'outer')

In [21]:
outer_merge

Unnamed: 0,player_id,first_name,last_name,career_ppg
0,1,Rajon,Rondo,11.7
1,2,Kobe,Bryant,28.4
2,3,Lamar,Odom,13.2


In [22]:
# Chain merge() with drop(): drop() removes the column named in its first argument (axis=1 means columns)
merge_dropped = pd.merge(df_A, df_B, on = 'player_id', how = 'outer').drop('player_id', axis=1)

In [23]:
merge_dropped

Unnamed: 0,first_name,last_name,career_ppg
0,Rajon,Rondo,11.7
1,Kobe,Bryant,28.4
2,Lamar,Odom,13.2
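The "joining" half of this section can be done with `DataFrame.join()`, which matches rows on the index rather than on a column. A sketch with the same toy frames, moving `player_id` into the index first:

```python
import pandas as pd

df_A = pd.DataFrame({'player_id': ['1', '2', '3'],
                     'first_name': ['Rajon', 'Kobe', 'Lamar'],
                     'last_name': ['Rondo', 'Bryant', 'Odom']})
df_B = pd.DataFrame({'player_id': ['1', '2', '3'],
                     'career_ppg': [11.7, 28.4, 13.2]})

# join() aligns on the index, so both frames use player_id as their index
joined = df_A.set_index('player_id').join(df_B.set_index('player_id'), how='left')
print(joined)
```

Because the key is in the index, there is no duplicate `player_id` column to drop afterwards.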


# Iterating Over Dataframes

In [24]:
import pandas as pd
import numpy as np

In [25]:
nba_stats = {
    'first_name': ['Jaylen','Jayson','Jrue','Derrick','Al'],
    'last_name': ['Brown','Tatum','Holiday','White','Horford'],
    'ppg': [28.2, 30.3, 14.1, 12.5, 7.8],
    'apg': [5.2, 6.1, 4.4, 4.5, 1.5],
    'rpg': [6.0, 8.4, 4.9, 3.7, 6.1]
}
celtics_df = pd.DataFrame(nba_stats, columns = ['first_name','last_name','ppg','apg','rpg'])

In [27]:
for row in celtics_df.iterrows():  # each item is an (index, row Series) tuple
    print(row)

(0, first_name    Jaylen
last_name      Brown
ppg             28.2
apg              5.2
rpg              6.0
Name: 0, dtype: object)
(1, first_name    Jayson
last_name      Tatum
ppg             30.3
apg              6.1
rpg              8.4
Name: 1, dtype: object)
(2, first_name       Jrue
last_name     Holiday
ppg              14.1
apg               4.4
rpg               4.9
Name: 2, dtype: object)
(3, first_name    Derrick
last_name       White
ppg              12.5
apg               4.5
rpg               3.7
Name: 3, dtype: object)
(4, first_name         Al
last_name     Horford
ppg               7.8
apg               1.5
rpg               6.1
Name: 4, dtype: object)


In [29]:
for index, row in celtics_df.iterrows():
    print(row['first_name'], row['last_name'], row['ppg'])

Jaylen Brown 28.2
Jayson Tatum 30.3
Jrue Holiday 14.1
Derrick White 12.5
Al Horford 7.8


In [31]:
for row in celtics_df.itertuples():  # each row is a namedtuple, so columns are attributes
    print(row.first_name, row.last_name, row.ppg)

Jaylen Brown 28.2
Jayson Tatum 30.3
Jrue Holiday 14.1
Derrick White 12.5
Al Horford 7.8


In [32]:
ppg_data = []

In [33]:
for row in celtics_df.itertuples():
    ppg_data.append(row.ppg)

In [34]:
ppg_data

[28.2, 30.3, 14.1, 12.5, 7.8]
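Collecting one column does not need a loop at all: `Series.tolist()` returns the same list. When a value really has to be built per row, a list comprehension over `itertuples()` keeps it short. A sketch with a reduced copy of `celtics_df`:

```python
import pandas as pd

celtics_df = pd.DataFrame({
    'first_name': ['Jaylen', 'Jayson', 'Jrue', 'Derrick', 'Al'],
    'ppg': [28.2, 30.3, 14.1, 12.5, 7.8],
})

# Same list as the itertuples() loop, without iterating row by row
ppg_data = celtics_df['ppg'].tolist()

# Per-row values: a comprehension over itertuples()
labels = [f"{row.first_name}: {row.ppg}" for row in celtics_df.itertuples()]

print(ppg_data)   # [28.2, 30.3, 14.1, 12.5, 7.8]
print(labels[0])  # Jaylen: 28.2
```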

# Applying Functions to Dataframes

In [35]:
df = pd.read_csv('../nba-stats-csv/player_stats_total.csv')

In [36]:
df.sample(5)

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts
1621,Serge Ibaka,201586,2013-14,81.0,24.0,2666.02,524.0,978.0,23.0,156.0,199.0,224.0,485.0,85.0,123.0,39.0,219.0,1227.0
496,Beno Udrih,2757,2012-13,66.0,30.0,1456.71,210.0,476.0,34.0,84.0,103.0,34.0,107.0,302.0,108.0,41.0,3.0,538.0
224,Zach Randolph,2216,2004-05,46.0,23.0,1607.25,332.0,741.0,0.0,207.0,254.0,142.0,300.0,86.0,112.0,34.0,17.0,871.0
2036,Ricky Rubio,201937,2013-14,82.0,23.0,2637.98,255.0,670.0,44.0,227.0,283.0,61.0,281.0,704.0,221.0,191.0,11.0,781.0
2304,Isaiah Canaan,203477,2016-17,39.0,25.0,592.107,63.0,173.0,25.0,30.0,33.0,6.0,44.0,37.0,21.0,22.0,1.0,181.0


In [37]:
def add_year(current_age):
    new_age = current_age + 1
    return new_age

In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2741 entries, 0 to 2740
Data columns (total 18 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   player_name  2741 non-null   object 
 1   player_id    2741 non-null   int64  
 2   season_id    2741 non-null   object 
 3   gp           2741 non-null   float64
 4   age          2741 non-null   float64
 5   min          2741 non-null   float64
 6   fgm          2741 non-null   float64
 7   fga          2741 non-null   float64
 8   fg3m         2741 non-null   float64
 9   ftm          2741 non-null   float64
 10  fta          2741 non-null   float64
 11  oreb         2741 non-null   float64
 12  dreb         2741 non-null   float64
 13  ast          2741 non-null   float64
 14  tov          2741 non-null   float64
 15  stl          2741 non-null   float64
 16  blk          2741 non-null   float64
 17  pts          2741 non-null   float64
dtypes: float64(15), int64(1), object(2)
memory usage

In [39]:
df['age'].apply(add_year)

0       22.0
1       23.0
2       24.0
3       25.0
4       26.0
        ... 
2736    22.0
2737    22.0
2738    23.0
2739    23.0
2740    24.0
Name: age, Length: 2741, dtype: float64
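`apply()` calls the Python function once per element. For simple arithmetic like this, the vectorized expression `df['age'] + 1` gives the same Series in a single operation. A sketch on a few toy ages:

```python
import pandas as pd

def add_year(current_age):
    return current_age + 1

ages = pd.Series([21.0, 22.0, 23.0], name='age')  # toy ages

via_apply = ages.apply(add_year)  # one Python call per element
vectorized = ages + 1             # one element-wise operation on the column

print(via_apply.equals(vectorized))  # True
```

`apply()` remains useful when the logic cannot be written as column arithmetic.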

In [40]:
player_stats = {
    'player_name': ['Derrick Rose', 'Dirk Nowitzki', 'Dwayne Wade', 'Paul Pierce', 'Chris Paul'],
    'season_id': ['2010-11', '2010-11', '2014-15', '2011-12', '2022-23'],
    'total_points': [1840, 2020, 1105, 1638, 744],
    'games_played': [73, 80, 61, 71, 55]
}
df = pd.DataFrame(player_stats)

In [41]:
df.head()

Unnamed: 0,player_name,season_id,total_points,games_played
0,Derrick Rose,2010-11,1840,73
1,Dirk Nowitzki,2010-11,2020,80
2,Dwayne Wade,2014-15,1105,61
3,Paul Pierce,2011-12,1638,71
4,Chris Paul,2022-23,744,55


In [45]:
def ppg_calculator(row):
    gp = row.iloc[3]  # iloc selects by integer position: position 3 is games_played
    pts = row.iloc[2]  # position 2 is total_points
    ppg = pts / gp
    return ppg

In [46]:
df.apply(ppg_calculator, axis=1)  # axis=1 passes each row to the function

0    25.205479
1    25.250000
2    18.114754
3    23.070423
4    13.527273
dtype: float64

In [47]:
df['ppg'] = df.apply(ppg_calculator, axis = 1)

In [48]:
df.head()

Unnamed: 0,player_name,season_id,total_points,games_played,ppg
0,Derrick Rose,2010-11,1840,73,25.205479
1,Dirk Nowitzki,2010-11,2020,80,25.25
2,Dwayne Wade,2014-15,1105,61,18.114754
3,Paul Pierce,2011-12,1638,71,23.070423
4,Chris Paul,2022-23,744,55,13.527273
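Positional access with `iloc` breaks silently if the column order changes. Accessing by label is sturdier, and for a plain ratio the two columns can simply be divided. A sketch with two rows of the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'player_name': ['Derrick Rose', 'Dirk Nowitzki'],
    'total_points': [1840, 2020],
    'games_played': [73, 80],
})

# Label-based version of ppg_calculator
def ppg_calculator(row):
    return row['total_points'] / row['games_played']

df['ppg_apply'] = df.apply(ppg_calculator, axis=1)

# Vectorized: divide the whole columns at once
df['ppg'] = df['total_points'] / df['games_played']

print(df[['player_name', 'ppg']])
```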


In [54]:
df = pd.read_csv('../nba-stats-csv/player_stats_total.csv')

In [73]:
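def ft_calculator(row):
    if row['fta'] == 0:
        ft_perc = 0  # no free-throw attempts: assign 0 instead of dividing by zero
    else:
        # Divide only when fta is not 0. The tiny offset nudges values such as 0.845,
        # which floats store slightly below 0.845, so that round() rounds them up
        ft_perc = row['ftm'] / row['fta'] + .00001
    return round(ft_perc, 2)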


In [74]:
df['ft_perc'] = df.apply(ft_calculator, axis=1)

In [75]:
df.sample(5)

Unnamed: 0,player_name,player_id,season_id,gp,age,min,fgm,fga,fg3m,ftm,fta,oreb,dreb,ast,tov,stl,blk,pts,ft_perc
2159,Kent Bazemore,203145,2014-15,75.0,25.0,1323.28,141.0,331.0,48.0,60.0,100.0,21.0,201.0,78.0,73.0,52.0,33.0,390.0,0.6
886,Kyle Lowry,200768,2006-07,10.0,21.0,175.422,14.0,38.0,3.0,25.0,28.0,12.0,19.0,32.0,12.0,14.0,1.0,56.0,0.89
802,Marvin Williams,101107,2010-11,65.0,25.0,1864.53,246.0,537.0,37.0,147.0,174.0,68.0,245.0,88.0,62.0,34.0,23.0,676.0,0.84
2538,Alan Williams,1626210,2016-17,47.0,24.0,707.697,138.0,267.0,0.0,70.0,112.0,94.0,198.0,23.0,37.0,27.0,32.0,346.0,0.63
1941,Isaiah Thomas,202738,2011-12,65.0,23.0,1655.43,256.0,571.0,83.0,154.0,185.0,48.0,120.0,266.0,105.0,53.0,8.0,749.0,0.83
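The same free-throw percentage can be computed for the whole column without `apply()`. This sketch uses toy totals and `Series.where` to replace the 0/0 case with 0, mirroring `ft_calculator`; it does not reproduce the `.00001` offset, so a value sitting exactly on a rounding boundary could round differently:

```python
import pandas as pd

# Toy totals; the last player never attempted a free throw
df = pd.DataFrame({'ftm': [60.0, 25.0, 0.0], 'fta': [100.0, 28.0, 0.0]})

# Column division; 0/0 gives NaN instead of raising an error
ratio = df['ftm'] / df['fta']

# Keep the ratio where fta > 0, otherwise use 0, then round to 2 decimals
df['ft_perc'] = ratio.where(df['fta'] > 0, 0).round(2)
print(df)
```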


In [76]:
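# Career totals per player: sum makes and attempts, then compute FT% once (players with no attempts get 0.0)
final = df.groupby('player_name')[['ftm', 'fta']].sum().apply(ft_calculator, axis=1)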

In [77]:
final.sort_values(ascending=False)

player_name
Nicolas Laprovittola    1.0
Michael Gbinije         1.0
Diamond Stone           1.0
Georges Niang           1.0
Chinanu Onuaku          1.0
                       ... 
Ben Bentil              0.0
Patricio Garino         0.0
Daniel Ochefu           0.0
Danuel House            0.0
Tim Quarterman          0.0
Length: 486, dtype: float64
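Summing makes and attempts before dividing matters: averaging season percentages gives a short season the same weight as a full one. A sketch with one hypothetical player whose two seasons have very different attempt volumes:

```python
import pandas as pd

# Hypothetical player: 90/100 in one season, 1/2 in another
df = pd.DataFrame({
    'player_name': ['X', 'X'],
    'ftm': [90.0, 1.0],
    'fta': [100.0, 2.0],
})

# Career rate: total makes / total attempts = 91 / 102
totals = df.groupby('player_name')[['ftm', 'fta']].sum()
career = totals['ftm'] / totals['fta']

# Mean of per-season rates: (0.9 + 0.5) / 2, ignores attempt volume
mean_of_rates = (df['ftm'] / df['fta']).groupby(df['player_name']).mean()

print(round(career['X'], 3))         # 0.892
print(round(mean_of_rates['X'], 3))  # 0.7
```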

# Arrays

In [79]:
import numpy as np

In [81]:
kawhi_steals = [4,5,2,1,3]

In [82]:
kawhi_steals

[4, 5, 2, 1, 3]

In [84]:
total_steals = np.array(kawhi_steals)

In [85]:
total_steals

array([4, 5, 2, 1, 3])

In [87]:
total_steals.mean() # average of the elements in the array

np.float64(3.0)

In [91]:
new_total_steals = total_steals + 5 # adds 5 to every element, so the mean also rises by 5

In [92]:
new_total_steals.mean()

np.float64(8.0)
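Adding a scalar to an array is broadcast to every element, which is why the mean moves from 3.0 to 8.0. A plain Python list behaves differently: `+` between lists concatenates them. A short sketch:

```python
import numpy as np

total_steals = np.array([4, 5, 2, 1, 3])

# Scalar arithmetic applies to each element
plus_five = total_steals + 5
print(plus_five.tolist())  # [9, 10, 7, 6, 8]

# Every element shifted by 5, so the mean shifts by 5 as well
print(plus_five.mean() - total_steals.mean())  # 5.0

# Lists do not broadcast: + joins them end to end
print([4, 5] + [5])  # [4, 5, 5]
```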

In [97]:
player_stats = np.array([
    [1,2,3,4],
    [5,6,7,8],
    [9,10,11,12]
])

In [98]:
player_stats.shape # shape returns a tuple with the array's dimensions: (rows, columns)

(3, 4)

In [101]:
some_other_player_stats = np.array([
    [11,12,13,14],
    [15,16,17,18],
    [19,20,21,22]
])

In [100]:
some_other_player_stats.shape

(3, 4)

In [102]:
total = player_stats + some_other_player_stats

In [103]:
total

array([[12, 14, 16, 18],
       [20, 22, 24, 26],
       [28, 30, 32, 34]])
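Both arrays above have shape (3, 4), so they add element by element. NumPy also broadcasts a 1-D array of length 4 across every row of a (3, 4) array; a length-3 array would raise a `ValueError` because trailing dimensions 4 and 3 do not match. Reductions can run along either axis. A sketch with a hypothetical per-column `bonus`:

```python
import numpy as np

player_stats = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

# Hypothetical per-column bonus, broadcast across the 3 rows
bonus = np.array([10, 20, 30, 40])
boosted = player_stats + bonus
print(boosted.tolist())
# [[11, 22, 33, 44], [15, 26, 37, 48], [19, 30, 41, 52]]

# axis=0 collapses the rows (column totals), axis=1 collapses the columns (row totals)
print(player_stats.sum(axis=0).tolist())  # [15, 18, 21, 24]
print(player_stats.sum(axis=1).tolist())  # [10, 26, 42]
```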