# DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. You can think of a DataFrame as a collection of Series objects that share the same index. Let's use pandas to explore this topic!
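The cell that built the DataFrame is not shown here, so the following is only a minimal sketch of one way to construct it. The values are copied from the table in this section; the original notebook presumably generated them randomly, which is an assumption.

```python
import pandas as pd

# Values copied from the table in this section (assumed to have been
# random draws originally).
data = [
    [1.624345, -0.611756, -0.528172, -1.072969],
    [0.865408, -2.301539, 1.744812, -0.761207],
    [0.319039, -0.249370, 1.462108, -2.060141],
    [-0.322417, -0.384054, 1.133769, -1.099891],
    [-0.172428, -0.877858, 0.042214, 0.582815],
]
df = pd.DataFrame(data, index=list("ABCDE"), columns=list("WXYZ"))
print(df)

# Every column is a Series that shares the DataFrame's row index.
shares_index = all(df[col].index.equals(df.index) for col in df.columns)
print(shares_index)
```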

Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


## Selection and Indexing

Let's learn the various ways to grab data from a DataFrame, starting with a single column:
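As a small sketch, bracket notation with one column name returns that column as a Series. The DataFrame is rebuilt from the values shown earlier so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# A single column name in brackets selects one column as a Series.
w = df["W"]
print(w)
```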

A    1.624345
B    0.865408
C    0.319039
D   -0.322417
E   -0.172428
Name: W, dtype: float64

Pass a list of column names to select several columns at once:
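A short sketch of selecting several columns: a list of names inside the brackets returns a DataFrame rather than a Series. The data is again copied from the tables in this section.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# Note the double brackets: the inner pair is a Python list of names.
wz = df[["W", "Z"]]
print(wz)
```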


Unnamed: 0,W,Z
A,1.624345,-1.072969
B,0.865408,-0.761207
C,0.319039,-2.060141
D,-0.322417,-1.099891
E,-0.172428,0.582815


DataFrame columns are just Series.

**Creating a new column:**
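Assigning to a column name that does not exist yet creates the column. The sketch below reproduces the `new` and `Q` columns from the table that follows; how the original cell produced the values 1 to 5 is not shown, so the assignments here are an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# Assumed assignments: a plain list, then a copy of an existing column.
df["new"] = [1, 2, 3, 4, 5]
df["Q"] = df["new"]
print(df)
```

A new column can also be computed from existing ones, for example as the sum of two columns.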

Unnamed: 0,W,X,Y,Z,new,Q
A,1.624345,-0.611756,-0.528172,-1.072969,1,1
B,0.865408,-2.301539,1.744812,-0.761207,2,2
C,0.319039,-0.24937,1.462108,-2.060141,3,3
D,-0.322417,-0.384054,1.133769,-1.099891,4,4
E,-0.172428,-0.877858,0.042214,0.582815,5,5


**Removing Columns**

Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


Not in place unless specified! `df` still has both extra columns until `inplace=True` is passed to `drop`:
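The difference between the two behaviours can be sketched as follows. The exact calls from the original cells are not shown, so the column names passed to `drop` are inferred from the tables.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))
df["new"] = [1, 2, 3, 4, 5]
df["Q"] = df["new"]

# axis=1 means "columns". Without inplace=True a new DataFrame is returned.
trimmed = df.drop(["new", "Q"], axis=1)
columns_after_plain_drop = list(df.columns)  # df itself is unchanged
print(columns_after_plain_drop)

# With inplace=True, df is modified and nothing is returned.
df.drop(["new", "Q"], axis=1, inplace=True)
print(list(df.columns))
```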


Unnamed: 0,W,X,Y,Z,new,Q
A,1.624345,-0.611756,-0.528172,-1.072969,1,1
B,0.865408,-2.301539,1.744812,-0.761207,2,2
C,0.319039,-0.24937,1.462108,-2.060141,3,3
D,-0.322417,-0.384054,1.133769,-1.099891,4,4
E,-0.172428,-0.877858,0.042214,0.582815,5,5


Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


You can also drop rows this way:
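A minimal sketch of dropping a row by its index label. Row `E` is chosen because it is the one missing from the first table below; as with columns, the original `df` is left untouched.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# axis=0 means "rows"; it is also the default for drop.
without_e = df.drop("E", axis=0)
print(without_e)
print(df)  # still contains row E
```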

Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891


Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


**Selecting Rows**

W    1.624345
X   -0.611756
Y   -0.528172
Z   -1.072969
Name: A, dtype: float64

Or select by position instead of label:
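Both row lookups can be sketched with `loc` (label-based) and `iloc` (position-based). Row `A` sits at position 0, so the two calls return the same Series.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

row_by_label = df.loc["A"]     # look up by index label
row_by_position = df.iloc[0]   # look up by integer position
print(row_by_label)
print(row_by_position)
```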

W    1.624345
X   -0.611756
Y   -0.528172
Z   -1.072969
Name: A, dtype: float64

Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


**Selecting a subset of rows and columns**
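`loc` accepts a row selector and a column selector together. The sketch below assumes the labels `B`/`Y` for the single value and `A`, `B` with `W`, `Y` for the block, which is what the outputs suggest; the numbers come from the first table in this section.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# One row label and one column label give a single scalar value.
value = df.loc["B", "Y"]
print(value)

# Lists of labels give a smaller DataFrame.
subset = df.loc[["A", "B"], ["W", "Y"]]
print(subset)
```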

1.74481176421648

Unnamed: 0,W,Y
A,2.70685,0.907969
B,0.651118,-0.848077


### Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to NumPy:
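A sketch of the main patterns, using the values from the first table in this section. The specific conditions (`> 0` on the whole frame and on column `W`) are assumptions inferred from the outputs.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# Comparing the whole frame gives a DataFrame of booleans.
mask = df > 0

# Indexing with that mask keeps the shape and puts NaN where it is False.
positives = df[mask]

# A boolean Series built from one column filters whole rows instead.
w_positive = df[df["W"] > 0]

# The result is a DataFrame, so further selection can be chained.
y_of_w_positive = df[df["W"] > 0]["Y"]
yx_of_w_positive = df[df["W"] > 0][["Y", "X"]]
print(w_positive)
```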

For reference, the full `df`:

Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


Unnamed: 0,W,X,Y,Z
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


Unnamed: 0,W,X,Y,Z
A,True,True,True,True
B,True,False,False,True
C,False,True,True,False
D,True,False,False,True
E,True,True,True,True


Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,,,0.605965
C,,0.740122,0.528813,
D,0.188695,,,0.955057
E,0.190794,1.978757,2.605967,0.683509


Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141


A   -0.528172
B    1.744812
C    1.462108
Name: Y, dtype: float64

Unnamed: 0,Y,X
A,-0.528172,-0.611756
B,1.744812,-2.301539
C,1.462108,-0.24937


For reference, the full `df`:

Unnamed: 0,W,X,Y,Z
A,1.624345,-0.611756,-0.528172,-1.072969
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141
D,-0.322417,-0.384054,1.133769,-1.099891
E,-0.172428,-0.877858,0.042214,0.582815


For two conditions, use `|` and `&` with parentheses:
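A sketch of combining conditions. The thresholds (`W > 0` and `Y > 1`) are assumptions chosen because they reproduce the rows shown below. Python's `and` and `or` do not work here because they ask for the truth value of a whole Series, which raises a `ValueError`; the parentheses are needed because `&` and `|` bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# Element-wise AND: rows where both conditions hold.
both = df[(df["W"] > 0) & (df["Y"] > 1)]
print(both)

# Element-wise OR: rows where at least one condition holds.
either = df[(df["W"] > 0) | (df["Y"] > 1)]
print(either)
```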

Unnamed: 0,W,X,Y,Z
B,0.865408,-2.301539,1.744812,-0.761207
C,0.319039,-0.24937,1.462108,-2.060141


## More Index Details

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!
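The steps in this section can be sketched as follows. The state abbreviations are taken from the tables below; the exact calls in the original cells are not shown and are assumed here. The values come from the first table of the section, so they may differ from some outputs in this part.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.624345, -0.611756, -0.528172, -1.072969],
     [0.865408, -2.301539, 1.744812, -0.761207],
     [0.319039, -0.249370, 1.462108, -2.060141],
     [-0.322417, -0.384054, 1.133769, -1.099891],
     [-0.172428, -0.877858, 0.042214, 0.582815]],
    index=list("ABCDE"), columns=list("WXYZ"))

# reset_index returns a new frame with a 0..n-1 index; the old labels
# move into a column named "index".
df_reset = df.reset_index()

# Add a column that will serve as the new index.
df["States"] = "CA NY WY OR CO".split()

# set_index also returns a new frame and leaves df alone.
by_state = df.set_index("States")
index_before_inplace = list(df.index)

# Pass inplace=True (or reassign the result) to change df itself.
df.set_index("States", inplace=True)
print(df)
```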

For reference, the full `df`:

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


Unnamed: 0,W,X,Y,Z,States
A,1.624345,-0.611756,-0.528172,-1.072969,CA
B,0.865408,-2.301539,1.744812,-0.761207,NY
C,0.319039,-0.24937,1.462108,-2.060141,WY
D,-0.322417,-0.384054,1.133769,-1.099891,OR
E,-0.172428,-0.877858,0.042214,0.582815,CO


States,W,X,Y,Z
CA,2.70685,0.628133,0.907969,0.503826
NY,0.651118,-0.319318,-0.848077,0.605965
WY,-2.018168,0.740122,0.528813,-0.589001
OR,0.188695,-0.758872,-0.933237,0.955057
CO,0.190794,1.978757,2.605967,0.683509


Displaying `df` again:

Unnamed: 0,W,X,Y,Z,States
A,2.70685,0.628133,0.907969,0.503826,CA
B,0.651118,-0.319318,-0.848077,0.605965,NY
C,-2.018168,0.740122,0.528813,-0.589001,WY
D,0.188695,-0.758872,-0.933237,0.955057,OR
E,0.190794,1.978757,2.605967,0.683509,CO


But the States index has not been applied to `df` itself: `set_index` returns a new DataFrame unless you pass `inplace=True`.


The current `df`, now indexed by States:

States,W,X,Y,Z
CA,2.70685,0.628133,0.907969,0.503826
NY,0.651118,-0.319318,-0.848077,0.605965
WY,-2.018168,0.740122,0.528813,-0.589001
OR,0.188695,-0.758872,-0.933237,0.955057
CO,0.190794,1.978757,2.605967,0.683509


# Great Job!