## DataFrames

A DataFrame is a two-dimensional, labeled table. It can be thought of as multiple Series sharing the same index, one Series per column. DataFrames are the main pandas structure for working with large tabular datasets.

Let's jump into the examples to see why!

In [1]:
import numpy as np
import pandas as pd

from pandas import Series,DataFrame

In [2]:
list1 = [[0,1,2,3],    # each inner list becomes one row of the DataFrame
         [4,5,6,7]]

In [3]:
df = DataFrame(data=list1,index=['row1','row2'],
              columns=['col1','col2','col3','col4'])
df

|  | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| row1 | 0 | 1 | 2 | 3 |
| row2 | 4 | 5 | 6 | 7 |


In [4]:
array = np.arange(8)
array

array([0, 1, 2, 3, 4, 5, 6, 7])

In [5]:
array.reshape(2,4)

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [6]:
df = DataFrame(np.arange(8).reshape(2,4),index=['row1','row2'],
              columns=['col1','col2','col3','col4'])
df

|  | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| row1 | 0 | 1 | 2 | 3 |
| row2 | 4 | 5 | 6 | 7 |
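
The opening description, a DataFrame as several Series sharing one index, can be illustrated with a minimal sketch. The names `col_a`, `col_b`, and `frame` are made up for this illustration and are not used elsewhere in the notebook.

```python
import pandas as pd

# Two Series built on the same index labels (hypothetical example data)
col_a = pd.Series([0, 4], index=['row1', 'row2'])
col_b = pd.Series([1, 5], index=['row1', 'row2'])

# A dict of Series becomes a DataFrame: the keys are the column labels
# and the shared index supplies the row labels
frame = pd.DataFrame({'col1': col_a, 'col2': col_b})
print(frame)

# A column pulled back out is a Series carrying the same index
print(frame['col1'].index.equals(frame.index))
```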


### read_csv

Let's import a CSV file as a DataFrame using read_csv and keep working with this data:

In [7]:
nba = pd.read_csv('data_nba.csv')
nba

|  | Rank | Team | Won | Lost | Pct. | First NBA season | Total games | Division |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific |
| 2 | 3 | Boston Celtics | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic |
| 3 | 4 | Oklahoma City Thunder | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest |
| 4 | 5 | Utah Jazz | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest |
| 5 | 6 | Portland Trail Blazers | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest |
| 6 | 7 | Phoenix Suns | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific |
| 7 | 8 | Houston Rockets | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest |
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central |
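
Since `data_nba.csv` itself is not included here, the following sketch reads a few rows from an in-memory string instead. The layout of `csv_text` is an assumption (only four of the columns, three of the rows), and the names `sample` and `by_rank` are hypothetical.

```python
import io
import pandas as pd

# A few rows in CSV form, standing in for the real file (assumed layout)
csv_text = """Rank,Team,Won,Lost
1,San Antonio Spurs,2162,1316
2,Los Angeles Lakers,3333,2282
3,Boston Celtics,3378,2346
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)

# index_col uses one of the columns as the row labels instead of 0, 1, 2, ...
by_rank = pd.read_csv(io.StringIO(csv_text), index_col='Rank')
print(by_rank.loc[2, 'Team'])
```

Without `index_col`, pandas generates a default integer index, which is why the `nba` frame above is labeled 0 through 9.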


We can use the type() function to check that the CSV file was imported as a DataFrame object:

In [8]:
type(nba)

pandas.core.frame.DataFrame

Some useful built-in methods and attributes of a DataFrame:

In [9]:
nba.head()        # shows the first 5 rows by default

|  | Rank | Team | Won | Lost | Pct. | First NBA season | Total games | Division |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific |
| 2 | 3 | Boston Celtics | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic |
| 3 | 4 | Oklahoma City Thunder | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest |
| 4 | 5 | Utah Jazz | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest |


In [10]:
nba.head(2)       # the number of rows to show can be passed in

|  | Rank | Team | Won | Lost | Pct. | First NBA season | Total games | Division |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific |


In [11]:
nba.tail()        # shows the last 5 rows by default

|  | Rank | Team | Won | Lost | Pct. | First NBA season | Total games | Division |
|---|---|---|---|---|---|---|---|---|
| 5 | 6 | Portland Trail Blazers | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest |
| 6 | 7 | Phoenix Suns | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific |
| 7 | 8 | Houston Rockets | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest |
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central |


In [12]:
nba.tail(2)       # the number of rows to show can be passed in

|  | Rank | Team | Won | Lost | Pct. | First NBA season | Total games | Division |
|---|---|---|---|---|---|---|---|---|
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central |


In [13]:
nba.columns        # column labels

Index(['Rank', 'Team', 'Won', 'Lost', 'Pct.', 'First NBA season',
       'Total games', 'Division'],
      dtype='object')

In [14]:
nba.index            # index info

RangeIndex(start=0, stop=10, step=1)
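
As a rough check on how these calls relate to each other, the sketch below uses a small hypothetical frame named `small` (not part of the NBA data) and compares `head` and `tail` with positional slicing.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x2 frame used only for this illustration
small = pd.DataFrame(np.arange(12).reshape(6, 2), columns=['a', 'b'])

# head(n) and tail(n) return the first and last n rows
first_two = small.head(2)
last_two = small.tail(2)
print(first_two)
print(last_two)

# Both give the same rows as positional slicing with iloc
print(first_two.equals(small.iloc[:2]))
print(last_two.equals(small.iloc[-2:]))

# columns and index are attributes, so they are read without parentheses
print(list(small.columns))
print(list(small.index))
```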

In [15]:
nba['Team']          # selecting a column

0         San Antonio Spurs
1        Los Angeles Lakers
2            Boston Celtics
3     Oklahoma City Thunder
4                 Utah Jazz
5    Portland Trail Blazers
6              Phoenix Suns
7           Houston Rockets
8                Miami Heat
9           Milwaukee Bucks
Name: Team, dtype: object

In [16]:
nba[['Team','Lost','Won']]    # selecting multiple columns

|  | Team | Lost | Won |
|---|---|---|---|
| 0 | San Antonio Spurs | 1316 | 2162 |
| 1 | Los Angeles Lakers | 2282 | 3333 |
| 2 | Boston Celtics | 2346 | 3378 |
| 3 | Oklahoma City Thunder | 1933 | 2283 |
| 4 | Utah Jazz | 1678 | 1964 |
| 5 | Portland Trail Blazers | 1836 | 2134 |
| 6 | Phoenix Suns | 1948 | 2186 |
| 7 | Houston Rockets | 1991 | 2225 |
| 8 | Miami Heat | 1200 | 1294 |
| 9 | Milwaukee Bucks | 2005 | 2129 |


In [17]:
type(nba['Team'])    # Every column of the DataFrame is actually of Series type
                     # which means Series methods can be called for columns of a DataFrame!

pandas.core.series.Series
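
Because each column is a Series, Series methods apply to it directly. The sketch below rebuilds three rows of the table by hand in a hypothetical frame called `teams`, with the numbers typed in as integers; the real `nba` columns may have been parsed with different dtypes.

```python
import pandas as pd

# Three rows copied from the table above into a hypothetical frame
teams = pd.DataFrame({
    'Team': ['San Antonio Spurs', 'Los Angeles Lakers', 'Boston Celtics'],
    'Won': [2162, 3333, 3378],
    'Lost': [1316, 2282, 2346],
})

most_wins = teams['Won'].max()        # largest value in the column
best_row = teams['Won'].idxmax()      # index label of that value
upper = teams['Team'].str.upper()     # string methods live under .str

# Arithmetic between two columns is element-wise and returns a new Series
total = teams['Won'] + teams['Lost']
print(most_wins, best_row)
print(upper)
print(total)
```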

In [18]:
nba

|  | Rank | Team | Won | Lost | Pct. | First NBA season | Total games | Division |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific |
| 2 | 3 | Boston Celtics | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic |
| 3 | 4 | Oklahoma City Thunder | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest |
| 4 | 5 | Utah Jazz | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest |
| 5 | 6 | Portland Trail Blazers | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest |
| 6 | 7 | Phoenix Suns | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific |
| 7 | 8 | Houston Rockets | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest |
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central |


In [19]:
nba.loc[5]           # selecting a row by its index label
                     # (row selection is covered in more detail later)

Rank                                     6
Team                Portland Trail Blazers
Won                                  2,134
Lost                                 1,836
Pct.                                 0.538
First NBA season                   1970–71
Total games                          3,970
Division                         Northwest
Name: 5, dtype: object
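
With the default 0 to 9 index, labels and positions coincide, so it is easy to miss that `loc` looks up labels. The sketch below uses a hypothetical frame `labelled` with a non-consecutive index to show the difference from position-based `iloc`.

```python
import pandas as pd

# Hypothetical frame whose index labels do not match row positions
labelled = pd.DataFrame(
    {'Team': ['Utah Jazz', 'Miami Heat', 'Phoenix Suns'], 'Rank': [5, 9, 7]},
    index=[4, 8, 6],
)

by_label = labelled.loc[8]       # the row whose index label is 8
by_position = labelled.iloc[0]   # the first row, whatever its label

print(by_label)
print(by_position)

# A single row comes back as a Series indexed by the column labels
print(list(by_label.index))
```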

### Renaming the columns

In [20]:
new_column = ['Rank_','Team_','Won_','Lost_','Pct_',
             'First Season','Number of games','Region']

In [21]:
new_column_dict = dict(zip(nba.columns,new_column))   # zip pairs old and new names; dict() turns the pairs into a mapping
new_column_dict

{'Rank': 'Rank_',
 'Team': 'Team_',
 'Won': 'Won_',
 'Lost': 'Lost_',
 'Pct.': 'Pct_',
 'First NBA season': 'First Season',
 'Total games': 'Number of games',
 'Division': 'Region'}

In [22]:
nba.rename(columns=new_column_dict,inplace=True)
nba

|  | Rank_ | Team_ | Won_ | Lost_ | Pct_ | First Season | Number of games | Region |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific |
| 2 | 3 | Boston Celtics | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic |
| 3 | 4 | Oklahoma City Thunder | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest |
| 4 | 5 | Utah Jazz | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest |
| 5 | 6 | Portland Trail Blazers | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest |
| 6 | 7 | Phoenix Suns | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific |
| 7 | 8 | Houston Rockets | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest |
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central |
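
The mapping passed to `rename` does not have to cover every column. The following sketch, on a hypothetical one-row frame `stats`, renames two columns and leaves the original untouched by omitting `inplace=True`; it also shows assigning a full list to `.columns` as an alternative.

```python
import pandas as pd

# Hypothetical one-row frame with three of the original column names
stats = pd.DataFrame({'Team': ['Utah Jazz'], 'Pct.': [0.539], 'Division': ['Northwest']})

# Only the keys present in the mapping are renamed; a new frame is returned
renamed = stats.rename(columns={'Pct.': 'Pct_', 'Division': 'Region'})
print(list(renamed.columns))
print(list(stats.columns))    # the original labels are still in place

# Alternative: assign a complete list of labels, one per column, in order
relabelled = stats.copy()
relabelled.columns = ['Team_', 'Pct_', 'Region']
print(list(relabelled.columns))
```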


### Adding a new column

In [23]:
nba['Coach'] = np.arange(10)    # assigning to a label that does not exist yet creates the column
nba

|  | Rank_ | Team_ | Won_ | Lost_ | Pct_ | First Season | Number of games | Region | Coach |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest | 0 |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific | 1 |
| 2 | 3 | Boston Celtics | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic | 2 |
| 3 | 4 | Oklahoma City Thunder | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest | 3 |
| 4 | 5 | Utah Jazz | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest | 4 |
| 5 | 6 | Portland Trail Blazers | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest | 5 |
| 6 | 7 | Phoenix Suns | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific | 6 |
| 7 | 8 | Houston Rockets | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest | 7 |
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast | 8 |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central | 9 |


In [24]:
two_coaches = Series(['Tarik','Berk'],index=[4,8])
two_coaches

4    Tarik
8     Berk
dtype: object

In [25]:
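nba['Coach'] = two_coaches    # values are aligned on the index; rows with no match are left missing (NaN)
nba

|  | Rank_ | Team_ | Won_ | Lost_ | Pct_ | First Season | Number of games | Region | Coach |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest | NaN |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific | NaN |
| 2 | 3 | Boston Celtics | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic | NaN |
| 3 | 4 | Oklahoma City Thunder | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest | NaN |
| 4 | 5 | Utah Jazz | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest | Tarik |
| 5 | 6 | Portland Trail Blazers | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest | NaN |
| 6 | 7 | Phoenix Suns | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific | NaN |
| 7 | 8 | Houston Rockets | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest | NaN |
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast | Berk |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central | NaN |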
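
The two assignments above behave differently: an array is placed by position, while a Series is matched on index labels. A minimal sketch on a hypothetical three-row frame (the `'Extra'` entry and the `Order` column are invented for the illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with index labels 4, 6 and 8
frame = pd.DataFrame({'Team': ['Utah Jazz', 'Phoenix Suns', 'Miami Heat']}, index=[4, 6, 8])

# A Series is matched on index labels, not on position:
# label 5 does not exist in frame, so 'Extra' is dropped;
# label 6 has no entry in coaches, so that row gets NaN
coaches = pd.Series(['Tarik', 'Berk', 'Extra'], index=[4, 8, 5])
frame['Coach'] = coaches

# An array is assigned by position and must match the number of rows
frame['Order'] = np.arange(len(frame))
print(frame)
```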


### Deleting a column

In [26]:
del nba['Coach']
nba

|  | Rank_ | Team_ | Won_ | Lost_ | Pct_ | First Season | Number of games | Region |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | San Antonio Spurs | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest |
| 1 | 2 | Los Angeles Lakers | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific |
| 2 | 3 | Boston Celtics | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic |
| 3 | 4 | Oklahoma City Thunder | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest |
| 4 | 5 | Utah Jazz | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest |
| 5 | 6 | Portland Trail Blazers | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest |
| 6 | 7 | Phoenix Suns | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific |
| 7 | 8 | Houston Rockets | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest |
| 8 | 9 | Miami Heat | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast |
| 9 | 10 | Milwaukee Bucks | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central |


In [27]:
nba.drop('Team_',axis=1)    # returns a new DataFrame without 'Team_'; nba itself is not modified

|  | Rank_ | Won_ | Lost_ | Pct_ | First Season | Number of games | Region |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 2162 | 1316 | 0.622 | 1976–77 | 3478 | Southwest |
| 1 | 2 | 3333 | 2282 | 0.594 | 1948–49 | 5615 | Pacific |
| 2 | 3 | 3378 | 2346 | 0.59 | 1946–47 | 5724 | Atlantic |
| 3 | 4 | 2283 | 1933 | 0.542 | 1967–68 | 4216 | Northwest |
| 4 | 5 | 1964 | 1678 | 0.539 | 1974–75 | 3642 | Northwest |
| 5 | 6 | 2134 | 1836 | 0.538 | 1970–71 | 3970 | Northwest |
| 6 | 7 | 2186 | 1948 | 0.529 | 1968–69 | 4134 | Pacific |
| 7 | 8 | 2225 | 1991 | 0.528 | 1967–68 | 4216 | Southwest |
| 8 | 9 | 1294 | 1200 | 0.519 | 1988–89 | 2494 | Southeast |
| 9 | 10 | 2129 | 2005 | 0.515 | 1968–69 | 4134 | Central |
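
Unlike `del`, which removes the column from the frame in place, `drop` hands back a new frame. The sketch below shows this on a hypothetical one-row frame; `columns=` is an equivalent spelling of `axis=1`.

```python
import pandas as pd

# Hypothetical one-row frame using the renamed column labels
frame = pd.DataFrame({'Team_': ['Utah Jazz'], 'Won_': [1964], 'Lost_': [1678]})

# drop returns a new frame; these two calls are equivalent
without_team = frame.drop('Team_', axis=1)
same_thing = frame.drop(columns='Team_')

# The original still has all three columns
columns_after_drop = list(frame.columns)
print(columns_after_drop)

# To keep the result, assign it back (several labels can be passed as a list)
frame = frame.drop(columns=['Won_', 'Lost_'])
print(list(frame.columns))
```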
