## Python Pandas - Dataframes 2

In this tutorial we take a closer look at the pandas DataFrame, the true workhorse of pandas, and at some of its more advanced features:

- Conditional selection
- Index handling

```python
import numpy as np
import pandas as pd
from numpy.random import randn

# Seed NumPy's global random generator so randn() returns the same numbers on every run
np.random.seed(101)
```
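
To see what the seed buys us, the sketch below re-seeds the global generator with the same value and checks that `randn` produces identical draws. It also shows NumPy's newer `np.random.default_rng` Generator API, which takes the seed directly. A Generator uses a different algorithm, so its numbers will not match the legacy `np.random.seed`/`randn` values used in this tutorial.

```python
import numpy as np
from numpy.random import randn

# Re-seeding with the same value restarts the legacy global generator
np.random.seed(101)
first = randn(5, 4)
np.random.seed(101)
second = randn(5, 4)
print(np.array_equal(first, second))  # True: same seed, same draws

# Newer Generator API: the seed is passed to the generator object itself
rng = np.random.default_rng(101)
third = rng.standard_normal((5, 4))
print(third.shape)  # (5, 4)
```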

### Conditional selection

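Comparison operators such as `>` and `<` work element-wise on a DataFrame and return a boolean DataFrame (or a boolean Series, for a single column). We can then use that result to select values or rows.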

```python
# 5 x 4 frame of standard-normal values, rows labelled A-E, columns W-Z
df = pd.DataFrame(randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])
df
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |


```python
# Element-wise comparison: a boolean DataFrame with the same shape as df
df > 0
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | True | True | True | True |
| B | True | False | False | True |
| C | False | True | True | False |
| D | True | False | False | True |
| E | True | True | True | True |


```python
# Indexing with the boolean DataFrame keeps values where it is True; the rest become NaN
df[df > 0]
```

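|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | NaN | NaN | 0.605965 |
| C | NaN | 0.740122 | 0.528813 | NaN |
| D | 0.188695 | NaN | NaN | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |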
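
Indexing a DataFrame with a boolean DataFrame behaves like `DataFrame.where`: cells where the mask is `True` keep their value and the rest become `NaN`. A minimal check, with `df` rebuilt from the same seed so the snippet runs on its own. The `where(mask, 0)` call shows that a replacement value other than `NaN` can be supplied.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index=list('ABCDE'), columns=list('WXYZ'))

mask = df > 0                          # boolean DataFrame, same shape as df
masked = df[mask]                      # failing cells become NaN
print(masked.equals(df.where(mask)))   # True

# where() can substitute something other than NaN, here 0
print(df.where(mask, 0))
```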


```python
# Filter rows: keep only those where column W is positive
df[df['W'] > 0]
# Row C (W = -2.018168) is dropped
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
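
Because a row filter returns a DataFrame, a column selection can be chained onto it. The sketch below shows two equivalent ways to get columns `Y` and `X` for the rows where `W > 0`; `.loc` does it in a single indexing step with a row mask and a column list. `df` is rebuilt from the same seed so the snippet is self-contained.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index=list('ABCDE'), columns=list('WXYZ'))

# Chained: filter rows first, then pick columns from the result
chained = df[df['W'] > 0][['Y', 'X']]

# Single step: row mask and column list passed to .loc together
with_loc = df.loc[df['W'] > 0, ['Y', 'X']]

print(with_loc)
print(chained.equals(with_loc))  # True
```

The `.loc` form is usually preferable when you later want to assign into the selection, since it indexes the original frame in one operation instead of working on an intermediate copy.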


```python
# Select only the rows where Z is less than zero
df[df['Z'] < 0]
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |


```python
# Combine conditions with | (or) and & (and); each condition needs its own parentheses
df[(df['Z'] < 0) | (df['X'] < 0)]
```

|   | W | X | Y | Z |
|---|---|---|---|---|
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
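
Python's `and`/`or` keywords try to reduce a whole Series to a single `True`/`False`, which pandas refuses with a `ValueError`. Element-wise combinations need `&`, `|` and `~` (not). Because these operators bind more tightly than comparisons such as `<`, each condition needs its own parentheses. A short sketch, with `df` rebuilt from the same seed:

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index=list('ABCDE'), columns=list('WXYZ'))

both = df[(df['W'] > 0) & (df['Y'] > 1)]   # W positive AND Y above 1
not_neg_z = df[~(df['Z'] < 0)]             # negate a condition with ~

# The keyword form fails: a Series has no single truth value
raised = False
try:
    df[(df['W'] > 0) and (df['Y'] > 1)]
except ValueError as err:
    raised = True
    print('ValueError:', err)

print(both)
print(not_neg_z)
```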


### Index handling

`reset_index()` replaces the index with the integers 0 to n-1 and moves the old labels into a regular column named `index`. We can also promote an existing column to be the new index with `set_index()`.

```python
# Resetting the index: the old labels move into a column named 'index'
# and the new index is 0, 1, 2, ...
df.reset_index()
```

|   | index | W | X | Y | Z |
|---|---|---|---|---|---|
| 0 | A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| 1 | B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| 2 | C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| 3 | D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| 4 | E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
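
`reset_index()` returns a new DataFrame and leaves `df` untouched unless the result is reassigned (or `inplace=True` is passed). Passing `drop=True` discards the old labels instead of keeping them as an `index` column. A sketch, with `df` rebuilt from the same seed:

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index=list('ABCDE'), columns=list('WXYZ'))

moved = df.reset_index()             # old labels kept in a column named 'index'
dropped = df.reset_index(drop=True)  # old labels discarded

print(list(moved.columns))    # ['index', 'W', 'X', 'Y', 'Z']
print(list(dropped.columns))  # ['W', 'X', 'Y', 'Z']
print(list(df.index))         # ['A', 'B', 'C', 'D', 'E'], df itself is unchanged
```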


```python
# Let's set a new index: first build a list of five labels
new_index = 'KO AR JO PA SI'.split()
print(new_index)
```

```
['KO', 'AR', 'JO', 'PA', 'SI']
```


```python
# Add the labels as a new column, one label per row
df['COWORKERS'] = new_index
```
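
Assigning a plain list as a column works here because `new_index` has exactly one entry per row; a list of the wrong length raises a `ValueError` and no column is added. A minimal sketch; the shorter list `too_short` and the column name `OTHER` are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index=list('ABCDE'), columns=list('WXYZ'))

new_index = 'KO AR JO PA SI'.split()
df['COWORKERS'] = new_index        # 5 labels for 5 rows: accepted

too_short = ['KO', 'AR', 'JO']     # hypothetical: only 3 labels for 5 rows
raised = False
try:
    df['OTHER'] = too_short
except ValueError as err:
    raised = True
    print('ValueError:', err)
```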

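```python
# set_index promotes the COWORKERS column to be the index; the old A-E index is discarded.
# It returns a new DataFrame, so df itself is unchanged unless the result is reassigned.
df.set_index('COWORKERS')
```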

| COWORKERS | W | X | Y | Z |
|---|---|---|---|---|
| KO | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| AR | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| JO | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| PA | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| SI | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
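
Like `reset_index()`, `set_index()` returns a new DataFrame, so `df` keeps its A-E index until the result is reassigned. If the old labels should survive, one option (a sketch, not the only way) is to move them into a column with `reset_index()` before calling `set_index()`. `df` is rebuilt from the same seed so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), index=list('ABCDE'), columns=list('WXYZ'))
df['COWORKERS'] = 'KO AR JO PA SI'.split()

by_name = df.set_index('COWORKERS')   # A-E labels discarded in the result
print(list(df.index))                 # ['A', 'B', 'C', 'D', 'E'], df is unchanged

# Keep the old labels as an ordinary column named 'index'
kept = df.reset_index().set_index('COWORKERS')
print(kept)

# Reassign to make the new index permanent
df = df.set_index('COWORKERS')
print(df.index.name)                  # COWORKERS
```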
