## DataFrames
DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a collection of Series objects that share the same index. Let's use pandas to explore this topic!
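
To make the "Series sharing an index" idea concrete, here is a minimal sketch that builds a small DataFrame from two Series. The names `w`, `x`, and `frame` and their values are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative values)
w = pd.Series([1.0, 2.0, 3.0], index=['A', 'B', 'C'])
x = pd.Series([10.0, 20.0, 30.0], index=['A', 'B', 'C'])

# Each dict key becomes a column name; the shared index becomes the row labels
frame = pd.DataFrame({'W': w, 'X': x})

print(frame)
print(isinstance(frame['W'], pd.Series))  # True: pulling a column back out gives a Series
```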

In [2]:
import pandas as pd
import numpy as np

In [3]:
from numpy.random import randn
np.random.seed(101)

In [4]:
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())

In [5]:
df

| | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |


### Selection and Indexing
Let's learn the various ways to grab data from a DataFrame.

In [6]:
df['W']

A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

In [11]:
# Pass a list of column names
df[['W','Z']]

| | W | Z |
|---|---|---|
| A | -0.925874 | 0.610478 |
| B | 0.38603 | 0.230336 |
| C | 0.681209 | 1.939932 |
| D | -1.005187 | -0.732845 |
| E | -1.38292 | -2.141212 |


In [12]:
# Attribute (dot) syntax (NOT RECOMMENDED!)
df.W

A   -0.925874
B    0.386030
C    0.681209
D   -1.005187
E   -1.382920
Name: W, dtype: float64
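
A short sketch of why dot access is discouraged: it breaks when a column name collides with an existing DataFrame attribute or is not a valid Python identifier, and attribute assignment does not create a column. The frame `demo` and its column names are hypothetical, chosen only to trigger these cases.

```python
import warnings
import pandas as pd

# Hypothetical frame: 'count' collides with a DataFrame method, 'my col' has a space
demo = pd.DataFrame({'W': [1, 2], 'count': [10, 20], 'my col': [5, 6]})

# Dot access finds the DataFrame.count method, not the column
print(callable(demo.count))      # True
# Bracket access always returns the column
print(demo['count'].tolist())    # [10, 20]
print(demo['my col'].tolist())   # [5, 6]

# Assigning to a new attribute does NOT add a column (pandas warns about this)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    demo.total = demo['W'] + demo['count']
print('total' in demo.columns)   # False
```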

DataFrame columns are just Series:

In [13]:
type(df['W'])

pandas.core.series.Series

#### Creating a new column:

In [15]:
df['new'] = df['W'] + df['Y']

In [16]:
df

| | W | X | Y | Z | new |
|---|---|---|---|---|---|
| A | -0.925874 | 1.862864 | -1.133817 | 0.610478 | -2.059691 |
| B | 0.38603 | 2.084019 | -0.376519 | 0.230336 | 0.009512 |
| C | 0.681209 | 1.035125 | -0.03116 | 1.939932 | 0.650049 |
| D | -1.005187 | -0.74179 | 0.187125 | -0.732845 | -0.818062 |
| E | -1.38292 | 1.482495 | 0.961458 | -2.141212 | -0.421462 |
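
Bracket assignment modifies the DataFrame in place. As a possible alternative, `DataFrame.assign` returns a new frame with the extra column and leaves the original untouched. The sketch below uses a small made-up frame `demo` rather than the notebook's `df`.

```python
import pandas as pd

# Illustrative data, not the notebook's df
demo = pd.DataFrame({'W': [1.0, 2.0], 'Y': [0.5, -0.5]}, index=['A', 'B'])

# assign returns a copy with the new column; demo itself is unchanged
with_new = demo.assign(new=demo['W'] + demo['Y'])

print(list(with_new.columns))  # ['W', 'Y', 'new']
print(list(demo.columns))      # ['W', 'Y']
```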


#### Removing Columns

In [17]:
df.drop('new',axis=1)

| | W | X | Y | Z |
|---|---|---|---|---|
| A | -0.925874 | 1.862864 | -1.133817 | 0.610478 |
| B | 0.38603 | 2.084019 | -0.376519 | 0.230336 |
| C | 0.681209 | 1.035125 | -0.03116 | 1.939932 |
| D | -1.005187 | -0.74179 | 0.187125 | -0.732845 |
| E | -1.38292 | 1.482495 | 0.961458 | -2.141212 |


In [18]:
# Not inplace unless specified!
df

| | W | X | Y | Z | new |
|---|---|---|---|---|---|
| A | -0.925874 | 1.862864 | -1.133817 | 0.610478 | -2.059691 |
| B | 0.38603 | 2.084019 | -0.376519 | 0.230336 | 0.009512 |
| C | 0.681209 | 1.035125 | -0.03116 | 1.939932 | 0.650049 |
| D | -1.005187 | -0.74179 | 0.187125 | -0.732845 | -0.818062 |
| E | -1.38292 | 1.482495 | 0.961458 | -2.141212 | -0.421462 |


In [19]:
df.drop('new',axis=1,inplace=True)

In [20]:
df

| | W | X | Y | Z |
|---|---|---|---|---|
| A | -0.925874 | 1.862864 | -1.133817 | 0.610478 |
| B | 0.38603 | 2.084019 | -0.376519 | 0.230336 |
| C | 0.681209 | 1.035125 | -0.03116 | 1.939932 |
| D | -1.005187 | -0.74179 | 0.187125 | -0.732845 |
| E | -1.38292 | 1.482495 | 0.961458 | -2.141212 |
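
Instead of `inplace=True`, a common pattern is to bind the returned frame to a name. The sketch below also uses the `columns=` and `index=` keywords of `drop`, which express the same thing as `axis=1` and `axis=0`. The frame `demo` is a made-up example.

```python
import pandas as pd

# Illustrative data, not the notebook's df
demo = pd.DataFrame({'W': [1, 2], 'X': [3, 4], 'new': [5, 6]}, index=['A', 'B'])

# drop returns a new DataFrame; the original keeps its column
trimmed = demo.drop(columns='new')   # same effect as demo.drop('new', axis=1)
print(list(trimmed.columns))         # ['W', 'X']
print(list(demo.columns))            # ['W', 'X', 'new']

# index= is the keyword counterpart of axis=0 for rows
without_b = demo.drop(index='B')
print(list(without_b.index))         # ['A']
```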


In [21]:
# to drop rows
df.drop('E',axis=0)

| | W | X | Y | Z |
|---|---|---|---|---|
| A | -0.925874 | 1.862864 | -1.133817 | 0.610478 |
| B | 0.38603 | 2.084019 | -0.376519 | 0.230336 |
| C | 0.681209 | 1.035125 | -0.03116 | 1.939932 |
| D | -1.005187 | -0.74179 | 0.187125 | -0.732845 |


In [22]:
df

| | W | X | Y | Z |
|---|---|---|---|---|
| A | -0.925874 | 1.862864 | -1.133817 | 0.610478 |
| B | 0.38603 | 2.084019 | -0.376519 | 0.230336 |
| C | 0.681209 | 1.035125 | -0.03116 | 1.939932 |
| D | -1.005187 | -0.74179 | 0.187125 | -0.732845 |
| E | -1.38292 | 1.482495 | 0.961458 | -2.141212 |


#### Selecting Rows

In [23]:
df.loc['A']

W   -0.925874
X    1.862864
Y   -1.133817
Z    0.610478
Name: A, dtype: float64

In [24]:
# Or select by position instead of label

df.iloc[2]

W    0.681209
X    1.035125
Y   -0.031160
Z    1.939932
Name: C, dtype: float64
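
One difference between `loc` and `iloc` that is easy to miss shows up when slicing: label slices include both endpoints, while position slices exclude the stop, as in plain Python. A minimal sketch with a made-up frame `demo`:

```python
import pandas as pd

# Illustrative data, not the notebook's df
demo = pd.DataFrame({'W': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])

# Label-based slice: both 'A' and 'C' are included
by_label = demo.loc['A':'C']
print(list(by_label.index))      # ['A', 'B', 'C']

# Position-based slice: position 2 is excluded
by_position = demo.iloc[0:2]
print(list(by_position.index))   # ['A', 'B']
```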

#### Selecting a subset of rows and columns

In [25]:
df.loc['B','Y']

-0.37651867524923904

In [26]:
df.loc[['A','B'],['W','Y']]

| | W | Y |
|---|---|---|
| A | -0.925874 | -1.133817 |
| B | 0.38603 | -0.376519 |
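
The same row and column subsets can also be taken by position with `iloc`, and a single cell can be read with the scalar accessor `.at`. The following sketch uses a made-up frame `demo` to show that the label and position forms agree.

```python
import pandas as pd

# Illustrative data, not the notebook's df
demo = pd.DataFrame(
    {'W': [1.0, 2.0, 3.0], 'X': [4.0, 5.0, 6.0], 'Y': [7.0, 8.0, 9.0]},
    index=['A', 'B', 'C'],
)

by_label = demo.loc[['A', 'B'], ['W', 'Y']]
by_position = demo.iloc[[0, 1], [0, 2]]   # rows 0-1, columns 0 and 2
print(by_label.equals(by_position))       # True

# .at reads one cell by row label and column label
print(demo.at['B', 'Y'])                  # 8.0
```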


### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, which works much like boolean indexing in NumPy:


In [7]:
df

| | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |


In [8]:
df>1

| | W | X | Y | Z |
|---|---|---|---|---|
| A | True | False | False | False |
| B | False | False | False | False |
| C | False | False | False | False |
| D | False | False | False | False |
| E | False | True | True | False |


In [9]:
df[df>0]

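| | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | NaN | NaN | 0.605965 |
| C | NaN | 0.740122 | 0.528813 | NaN |
| D | 0.188695 | NaN | NaN | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |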
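
Indexing a DataFrame with a boolean DataFrame keeps the shape and puts `NaN` wherever the condition is false. The sketch below, on a made-up frame `demo`, shows that `DataFrame.where` gives the same result and can substitute a different fill value.

```python
import pandas as pd

# Illustrative data, not the notebook's df
demo = pd.DataFrame({'W': [1.5, -2.0], 'X': [-0.5, 3.0]}, index=['A', 'B'])

masked = demo[demo > 0]          # NaN where the condition is False
same = demo.where(demo > 0)      # explicit spelling of the same operation
print(masked.equals(same))       # True

# where accepts a replacement value instead of NaN
filled = demo.where(demo > 0, 0.0)
print(filled)
```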


In [10]:
df[df['W']>0]

| | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |


In [11]:
df[df['W']>0]['Y']

A    0.907969
B   -0.848077
D   -0.933237
E    2.605967
Name: Y, dtype: float64

In [12]:
df[df['W']>0][['Y','X']]

| | Y | X |
|---|---|---|
| A | 0.907969 | 0.628133 |
| B | -0.848077 | -0.319318 |
| D | -0.933237 | -0.758872 |
| E | 2.605967 | 1.978757 |
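
Stacking two sets of brackets works for reading, but the row filter and column selection can also be combined in a single `.loc` call. That form is the safer choice when assigning, because chained indexing may write to a temporary copy. A sketch with a made-up frame `demo`:

```python
import pandas as pd

# Illustrative data, not the notebook's df
demo = pd.DataFrame(
    {'W': [1.0, -1.0, 2.0], 'X': [10.0, 20.0, 30.0], 'Y': [0.1, 0.2, 0.3]},
    index=['A', 'B', 'C'],
)

chained = demo[demo['W'] > 0][['Y', 'X']]        # two indexing steps
single = demo.loc[demo['W'] > 0, ['Y', 'X']]     # one step: rows, then columns
print(chained.equals(single))                    # True

# Assignment through one .loc call updates demo itself
demo.loc[demo['W'] > 0, 'Y'] = 0.0
print(demo['Y'].tolist())                        # [0.0, 0.2, 0.0]
```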


In [13]:
# For two conditions, use | and & and wrap each condition in parentheses:
df[(df['W']>0) & (df['Y'] > 1)]

| | W | X | Y | Z |
|---|---|---|---|---|
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
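
The element-wise operators `&`, `|`, and `~` are needed because Python's `and`, `or`, and `not` try to reduce a whole Series to a single truth value, which pandas refuses to do. The sketch below uses a made-up frame `demo` to show each operator and the error raised by `and`.

```python
import pandas as pd

# Illustrative data, not the notebook's df
demo = pd.DataFrame({'W': [2.0, -1.0, 0.5], 'Y': [0.5, 3.0, 2.0]}, index=['A', 'B', 'C'])

both = demo[(demo['W'] > 0) & (demo['Y'] > 1)]     # rows where both hold
either = demo[(demo['W'] > 0) | (demo['Y'] > 1)]   # rows where at least one holds
negated = demo[~(demo['W'] > 0)]                   # rows where the condition fails
print(list(both.index))      # ['C']
print(list(either.index))    # ['A', 'B', 'C']
print(list(negated.index))   # ['B']

# Plain `and` raises because a Series has no single truth value
try:
    demo[(demo['W'] > 0) and (demo['Y'] > 1)]
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```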
