# Pandas
Excel ♥ SQL

[Excel ♥ SQL]: # (Invisible comment)

### DataFrames & Series

Series #1

| player_id | pts_per_game  |
|-----------|---------------|
| 201939    |       27.3    |
| 201940    |       26.0    |
| 201941    |       16.3    |

Series #2

| player_id | reb_per_game  |
|-----------|---------------|
| 201939    |       6.7     |
| 201940    |       5.4     |
| 201941    |       4.8     |

DataFrame (DF)

| player_id | pts_per_game  | reb_per_game |
|-----------|---------------|--------------|
| 201939    |       27.3    |    6.7       |  
| 201940    |       26.0    |    5.4       |
| 201941    |       16.3    |    4.8       |
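
As a minimal sketch of how the two Series above combine into the DataFrame, the snippet below rebuilds the tables by hand. The variable names `pts`, `reb`, and `df` are illustrative, and `pd.concat` is only one of several ways to do this.

```python
import pandas as pd

# Two Series sharing the same index (player_id), using the values from the tables above
player_ids = [201939, 201940, 201941]
pts = pd.Series([27.3, 26.0, 16.3], index=player_ids, name="pts_per_game")
reb = pd.Series([6.7, 5.4, 4.8], index=player_ids, name="reb_per_game")

# Concatenating along columns aligns the Series on their index
# and uses each Series' name as the column label
df = pd.concat([pts, reb], axis=1)
df.index.name = "player_id"

print(df)
```

Each column of the result is still a Series: `df["pts_per_game"]` returns the points Series back.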


### A DataFrame stores a collection of Series in memory

A DataFrame can be created from various inputs, such as:

* Lists
* Dictionaries
* Series
* NumPy arrays
* Another DataFrame
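
A short sketch of each input type in the list above may help. All variable names are illustrative, and the values are borrowed from the tables earlier in this notebook.

```python
import numpy as np
import pandas as pd

# From a list of rows (column names are supplied separately)
df_from_list = pd.DataFrame(
    [[201939, 27.3], [201940, 26.0]],
    columns=["player_id", "pts_per_game"],
)

# From a dictionary: keys become column names, values become column data
df_from_dict = pd.DataFrame(
    {"player_id": [201939, 201940], "pts_per_game": [27.3, 26.0]}
)

# From a Series: the Series name becomes the column label
s = pd.Series([27.3, 26.0], name="pts_per_game")
df_from_series = s.to_frame()

# From a 2-D NumPy array
arr = np.array([[27.3, 6.7], [26.0, 5.4]])
df_from_array = pd.DataFrame(arr, columns=["pts_per_game", "reb_per_game"])

# From another DataFrame (copy=True makes the new frame independent of the original)
df_from_df = pd.DataFrame(df_from_dict, copy=True)

for frame in (df_from_list, df_from_dict, df_from_series, df_from_array, df_from_df):
    print(frame.shape)
```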

A DataFrame can also be created by reading data from sources such as:

* CSV
* Excel
* SQL
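
The sketch below shows two of these readers without touching any files: the CSV text is held in memory with `io.StringIO`, and the SQL example uses a throwaway in-memory SQLite database. The table name `player_info` and the two sample rows are assumptions made for illustration. `pd.read_excel` follows the same pattern but needs an Excel engine such as `openpyxl` installed, so it is left as a comment.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a path or any file-like object
csv_text = (
    "player_id,player_name,season_id\n"
    "920,A.C. Green,1996-97\n"
    "243,Aaron McKie,1996-97\n"
)
df_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: write the frame to an in-memory SQLite table, then query it back
conn = sqlite3.connect(":memory:")
df_csv.to_sql("player_info", conn, index=False)
df_sql = pd.read_sql_query("SELECT player_id, player_name FROM player_info", conn)
conn.close()

# Excel (not run here): df_xlsx = pd.read_excel("some_file.xlsx")

print(df_csv)
print(df_sql)
```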


# Creating a DataFrame

In [91]:
import pandas as pd

In [92]:
# This line creates an empty DataFrame
df = pd.DataFrame()

In [93]:
print(df)

Empty DataFrame
Columns: []
Index: []


In [94]:
celtics_dict = {
    'player_name': ['Jaylen Brown', 'Jayson Tatum', 'Derrick White', 'Jrue Holiday', 'Neemias Queta'],
    'ppg': [ 26.8, 30.3, 12.4, 14.1, 8.3 ],
    'rpg': [ 5.3, 8.2, 4.5, 4.7, 7.5 ],
    'apg': [ 4.4, 5.1, 6.3, 5.9, 0.6 ]
}

In [95]:
# df(celtics_dict) --> This won't work: df is a DataFrame object, not a function, so calling it raises a TypeError. Build a new frame with pd.DataFrame(celtics_dict) instead.
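
To see what actually happens when an existing DataFrame is called like a function, here is a small sketch with a shortened version of the dictionary. It also shows that the name `df` can be rebound to a new DataFrame at any time.

```python
import pandas as pd

df = pd.DataFrame()
celtics_dict = {
    "player_name": ["Jaylen Brown", "Jayson Tatum"],
    "ppg": [26.8, 30.3],
}

try:
    df(celtics_dict)  # a DataFrame instance is not callable
except TypeError as err:
    print("TypeError:", err)

# Rebinding the variable to a freshly constructed DataFrame works fine
df = pd.DataFrame(celtics_dict)
print(df.shape)  # (2, 2)
```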

In [96]:
df_celtics = pd.DataFrame(celtics_dict)

In [97]:
print(df_celtics)

     player_name   ppg  rpg  apg
0   Jaylen Brown  26.8  5.3  4.4
1   Jayson Tatum  30.3  8.2  5.1
2  Derrick White  12.4  4.5  6.3
3   Jrue Holiday  14.1  4.7  5.9
4  Neemias Queta   8.3  7.5  0.6


In [98]:
df_filtered = pd.DataFrame(df_celtics, index=[2,4])

In [99]:
print(df_filtered)

     player_name   ppg  rpg  apg
2  Derrick White  12.4  4.5  6.3
4  Neemias Queta   8.3  7.5  0.6
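
Passing `index=[2, 4]` to the constructor reindexes the existing frame rather than filtering it. As a hedged comparison, the sketch below shows the more common `.loc` selection, a boolean filter, and what the constructor does when a requested label is missing. The frame is a two-column subset of `df_celtics`, and the label `9` is chosen only because it does not exist.

```python
import pandas as pd

df_celtics = pd.DataFrame({
    "player_name": ["Jaylen Brown", "Jayson Tatum", "Derrick White",
                    "Jrue Holiday", "Neemias Queta"],
    "ppg": [26.8, 30.3, 12.4, 14.1, 8.3],
})

# Label-based selection: same rows as pd.DataFrame(df_celtics, index=[2, 4])
by_label = df_celtics.loc[[2, 4]]

# Boolean filter: keep only the rows that satisfy a condition
high_scorers = df_celtics[df_celtics["ppg"] > 20]

# The constructor reindexes, so a label that is missing from the original
# produces a row of NaN instead of an error
reindexed = pd.DataFrame(df_celtics, index=[2, 9])

print(by_label)
print(high_scorers)
print(reindexed)
```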


In [100]:
label = ['sf', 'pf', 'pg', 'sg', 'c']

In [101]:
df_label = pd.DataFrame(celtics_dict, index=label)

In [102]:
print(df_label)

      player_name   ppg  rpg  apg
sf   Jaylen Brown  26.8  5.3  4.4
pf   Jayson Tatum  30.3  8.2  5.1
pg  Derrick White  12.4  4.5  6.3
sg   Jrue Holiday  14.1  4.7  5.9
c   Neemias Queta   8.3  7.5  0.6
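
Once rows carry custom labels, they can be looked up by label with `.loc` or by position with `.iloc`. A minimal sketch, rebuilding a two-column version of `df_label`:

```python
import pandas as pd

celtics_dict = {
    "player_name": ["Jaylen Brown", "Jayson Tatum", "Derrick White",
                    "Jrue Holiday", "Neemias Queta"],
    "rpg": [5.3, 8.2, 4.5, 4.7, 7.5],
}
df_label = pd.DataFrame(celtics_dict, index=["sf", "pf", "pg", "sg", "c"])

row = df_label.loc["pg"]         # one row, returned as a Series
val = df_label.loc["c", "rpg"]   # a single cell: 7.5
first = df_label.iloc[0]         # positional access still works

print(row["player_name"])    # Derrick White
print(val)                   # 7.5
print(first["player_name"])  # Jaylen Brown
```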


In [103]:
# Let's create another DataFrame
stats = [['Jaylen Brown',4,6], ['Jayson Tatum',2,5],['Jrue Holiday',4,4]]

In [104]:
stats_df = pd.DataFrame(stats, columns=['player', 'oreb', 'dreb'])

In [105]:
print(stats_df)

         player  oreb  dreb
0  Jaylen Brown     4     6
1  Jayson Tatum     2     5
2  Jrue Holiday     4     4


In [107]:
rebs = [6,9,11,7,3]

In [111]:
# Note: despite the variable name, this creates a one-column DataFrame, not a Series
reb_series = pd.DataFrame(rebs, columns=['jaylen brown_reb'])

In [112]:
print(reb_series)

   jaylen brown_reb
0                 6
1                 9
2                11
3                 7
4                 3
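
The object printed above is a DataFrame with a single column. For comparison, the sketch below builds an actual Series from the same list. The names `reb_df` and `reb_s` are illustrative.

```python
import pandas as pd

rebs = [6, 9, 11, 7, 3]

reb_df = pd.DataFrame(rebs, columns=["jaylen brown_reb"])  # 2-D: rows x columns
reb_s = pd.Series(rebs, name="jaylen brown_reb")           # 1-D: values only

print(type(reb_df).__name__, reb_df.shape)  # DataFrame (5, 1)
print(type(reb_s).__name__, reb_s.shape)    # Series (5,)

# Selecting the only column of the DataFrame gives back a Series
col = reb_df["jaylen brown_reb"]
print(type(col).__name__)  # Series
```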


# Reading CSV Files into DataFrames

In [113]:
import pandas as pd

In [115]:
df = pd.read_csv('../nba-stats-csv/player_info.csv')

In [116]:
print(df)

       player_id     player_name season_id
0            920      A.C. Green   1996-97
1            243     Aaron McKie   1996-97
2           1425  Aaron Williams   1996-97
3            768       Acie Earl   1996-97
4            228      Adam Keefe   1996-97
...          ...             ...       ...
29223    1628380    Zach Collins   2017-18
29224     203897     Zach LaVine   2017-18
29225       2216   Zach Randolph   2017-18
29226       2585   Zaza Pachulia   2017-18
29227    1627753         Zhou Qi   2017-18

[29228 rows x 3 columns]


In [117]:
df.head(10)

|   | player_id | player_name           | season_id |
|---|-----------|-----------------------|-----------|
| 0 | 920       | A.C. Green            | 1996-97   |
| 1 | 243       | Aaron McKie           | 1996-97   |
| 2 | 1425      | Aaron Williams        | 1996-97   |
| 3 | 768       | Acie Earl             | 1996-97   |
| 4 | 228       | Adam Keefe            | 1996-97   |
| 5 | 154       | Adrian Caldwell       | 1996-97   |
| 6 | 673       | Alan Henderson        | 1996-97   |
| 7 | 1059      | Aleksandar Djordjevic | 1996-97   |
| 8 | 275       | Allan Houston         | 1996-97   |
| 9 | 947       | Allen Iverson         | 1996-97   |
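
`head(n)` returns the first `n` rows, and `n` defaults to 5 when omitted. Its counterpart `tail(n)` returns the last `n` rows. A small self-contained sketch, using the first six rows shown above as in-memory CSV text instead of the real file:

```python
import io

import pandas as pd

csv_text = """player_id,player_name,season_id
920,A.C. Green,1996-97
243,Aaron McKie,1996-97
1425,Aaron Williams,1996-97
768,Acie Earl,1996-97
228,Adam Keefe,1996-97
154,Adrian Caldwell,1996-97
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)      # (6, 3)
print(df.head(2))    # first two rows
print(df.tail(2))    # last two rows
print(len(df.head()))  # 5, the default
```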


In [121]:
df_noheader = pd.read_csv('../nba-stats-csv/player_info_no_header.csv', header=None)

In [122]:
print(df_noheader)

             0               1        2
0          920      A.C. Green  1996-97
1          243     Aaron McKie  1996-97
2         1425  Aaron Williams  1996-97
3          768       Acie Earl  1996-97
4          228      Adam Keefe  1996-97
...        ...             ...      ...
29223  1628380    Zach Collins  2017-18
29224   203897     Zach LaVine  2017-18
29225     2216   Zach Randolph  2017-18
29226     2585   Zaza Pachulia  2017-18
29227  1627753         Zhou Qi  2017-18

[29228 rows x 3 columns]


In [124]:
df_noheader.head(5)

|   | 0    | 1              | 2       |
|---|------|----------------|---------|
| 0 | 920  | A.C. Green     | 1996-97 |
| 1 | 243  | Aaron McKie    | 1996-97 |
| 2 | 1425 | Aaron Williams | 1996-97 |
| 3 | 768  | Acie Earl      | 1996-97 |
| 4 | 228  | Adam Keefe     | 1996-97 |
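
With `header=None`, pandas numbers the columns 0, 1, 2. If the column names are known, they can be supplied through the `names` parameter. The sketch below uses two rows of in-memory CSV text, and also shows what goes wrong when `header=None` is forgotten on a file without a header row.

```python
import io

import pandas as pd

raw = "920,A.C. Green,1996-97\n243,Aaron McKie,1996-97\n"

# Default integer column labels
no_names = pd.read_csv(io.StringIO(raw), header=None)

# Explicit column labels
named = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["player_id", "player_name", "season_id"],
)

# Without header=None, the first data row is consumed as the header
wrong = pd.read_csv(io.StringIO(raw))

print(list(no_names.columns))  # [0, 1, 2]
print(list(named.columns))
print(list(wrong.columns), len(wrong))
```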


In [125]:
df_index = pd.read_csv('../nba-stats-csv/player_info.csv', index_col='player_id')

In [126]:
print(df_index)

              player_name season_id
player_id                          
920            A.C. Green   1996-97
243           Aaron McKie   1996-97
1425       Aaron Williams   1996-97
768             Acie Earl   1996-97
228            Adam Keefe   1996-97
...                   ...       ...
1628380      Zach Collins   2017-18
203897        Zach LaVine   2017-18
2216        Zach Randolph   2017-18
2585        Zaza Pachulia   2017-18
1627753           Zhou Qi   2017-18

[29228 rows x 2 columns]
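
With `player_id` as the index, rows can be looked up directly by ID using `.loc`. The sketch below uses a tiny in-memory sample; the repeated `920` row with a second season is an illustrative assumption, included to show that `.loc` returns every row matching a label when the index is not unique.

```python
import io

import pandas as pd

# Illustrative sample: the second row for player 920 is hypothetical
csv_text = """player_id,player_name,season_id
920,A.C. Green,1996-97
243,Aaron McKie,1996-97
920,A.C. Green,1997-98
"""
df_index = pd.read_csv(io.StringIO(csv_text), index_col="player_id")

green = df_index.loc[920]     # two matching rows, returned as a DataFrame
mckie = df_index.loc[[243]]   # a list of labels always returns a DataFrame

print(df_index.index.is_unique)  # False
print(green)
print(mckie)

# reset_index() turns the index back into a regular column
back = df_index.reset_index()
print(list(back.columns))
```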


In [127]:
df_usecols = pd.read_csv('../nba-stats-csv/player_general_traditional_per_game_data.csv', usecols=['player_id', 'season_id'])

In [128]:
df_usecols.head(5)

|   | player_id | season_id |
|---|-----------|-----------|
| 0 | 471       | 1996-97   |
| 1 | 920       | 1996-97   |
| 2 | 243       | 1996-97   |
| 3 | 1425      | 1996-97   |
| 4 | 768       | 1996-97   |
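
`usecols` accepts column names, integer positions, or a function that is called with each column name. The sketch below shows all three on a made-up wide CSV: the `gp` and `pts` columns and their values are placeholders, not data from the real file.

```python
import io

import pandas as pd

# Hypothetical wide file; gp and pts hold dummy values
wide_csv = """player_id,season_id,gp,pts
471,1996-97,10,1.0
920,1996-97,10,1.0
"""

by_name = pd.read_csv(io.StringIO(wide_csv), usecols=["player_id", "season_id"])
by_pos = pd.read_csv(io.StringIO(wide_csv), usecols=[0, 1])
by_func = pd.read_csv(io.StringIO(wide_csv), usecols=lambda c: c.endswith("_id"))

print(list(by_name.columns))
print(list(by_pos.columns))
print(list(by_func.columns))
```

Loading only the needed columns keeps memory use down, which matters more as files get wider.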
