# Pandas Advanced Cookbook

This is a list of advanced topics you can tackle with pandas, each with a short description. Have fun!

In [22]:
import pandas as pd
import numpy as np

## Create your own DataFrame for whatever you need

In [23]:
df2 = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Timestamp('20130102'),
                     'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                     'D' : np.array([3] * 4,dtype='int32'),
                     'E' : pd.Categorical(["test","train","test","train"]),
                     'F' : 'foo' })
df2

Unnamed: 0,A,B,C,D,E,F
0,1,2013-01-02,1,3,test,foo
1,1,2013-01-02,1,3,train,foo
2,1,2013-01-02,1,3,test,foo
3,1,2013-01-02,1,3,train,foo
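The rendered table above hides the column types. As a small sketch (rebuilding the same frame so the snippet runs on its own), inspecting dtypes shows that scalars such as 1. and 'foo' are broadcast to the length of the longest column, while the explicit dtype arguments are kept. The exact dtypes reported for the timestamp and string columns can differ between pandas versions, so they are not listed here.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

# One dtype per column; 'C' and 'D' keep the dtypes requested above.
print(df2.dtypes)

# Scalars were repeated for every row, so the frame has 4 rows and 6 columns.
print(df2.shape)
```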


## Create your own DataFrame and append values
This is useful, for example, in loops where you want to add rows based on your findings.
Note that the DataFrame below is created with an explicit index, so its first row is labeled initial.

If the Series has a name, that name is automatically used as the row label.
If the Series has no name, you need to pass ignore_index=True to append, otherwise it raises an error.

In [15]:
pdf = pd.DataFrame({0:0,
                    1:1,
                    2:2,
                    3:3,
                    4:4}, index=['initial'])
ps = pd.Series([1,4,3,2,1], name='add1') 
pdf = pdf.append(ps)
pdf

Unnamed: 0,0,1,2,3,4
initial,0,1,2,3,4
add1,1,4,3,2,1
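DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell above only runs on older versions. A possible equivalent on current versions, sketched here with pd.concat, turns the named Series into a one-row frame first. The variable names row and result are only for illustration.

```python
import pandas as pd

pdf = pd.DataFrame({0: 0, 1: 1, 2: 2, 3: 3, 4: 4}, index=['initial'])
ps = pd.Series([1, 4, 3, 2, 1], name='add1')

# to_frame() gives a single column named 'add1'; transposing turns it
# into a single row whose label is the Series name.
row = ps.to_frame().T

# Stack the new row under the existing frame.
result = pd.concat([pdf, row])
print(result)
```

Passing ignore_index=True to pd.concat discards the row labels and renumbers the rows from 0.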


If you don't need row labels, create the DataFrame without an explicit index and append with ignore_index=True, so the rows get a default integer index.

In [16]:
pdf = pd.DataFrame({0:[0],
                    1:[1],
                    2:[2],
                    3:[3],
                    4:[4]})
ps = pd.Series([1,4,3,2,1])
pdf = pdf.append(ps, ignore_index=True)
pdf

Unnamed: 0,0,1,2,3,4
0,0,1,2,3,4
1,1,4,3,2,1
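For the loop use case mentioned earlier, growing a DataFrame one row at a time copies the data on every iteration. A common alternative, shown here as a minimal sketch, is to collect the rows in a plain list and build the frame once at the end. The findings values are made up for illustration.

```python
import pandas as pd

# Start with the initial row, then collect further rows in a plain list.
rows = [[0, 1, 2, 3, 4]]

# Hypothetical findings produced inside a loop.
findings = [[1, 4, 3, 2, 1], [5, 5, 5, 5, 5]]
for finding in findings:
    rows.append(finding)  # list.append, not DataFrame.append

# Build the DataFrame once; rows get a default integer index.
result = pd.DataFrame(rows)
print(result)
```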


## Finding wildcard text in a DataFrame

In [17]:
data = {'gg' : {'first' : 'Gernot', 
                'last'  : 'Greimler'},
        'eh' : {'first' : 'Erich',  
                'last'  : 'Heil'}
       }
df = pd.DataFrame(data)
df

Unnamed: 0,eh,gg
first,Erich,Gernot
last,Heil,Greimler


OK, we don't like that the columns are our initials and that first/last name are the row labels, so let's transpose.

In [18]:
df = df.T
df

Unnamed: 0,first,last
eh,Erich,Heil
gg,Gernot,Greimler
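If the nested dict is available up front, the transpose step can be skipped. As a sketch, pd.DataFrame.from_dict with orient='index' treats the outer keys as row labels directly. On recent pandas versions the row order follows the dict's insertion order, which may differ from the order in the output above.

```python
import pandas as pd

data = {'gg': {'first': 'Gernot', 'last': 'Greimler'},
        'eh': {'first': 'Erich', 'last': 'Heil'}}

# orient='index' makes the outer keys ('gg', 'eh') the rows and the
# inner keys ('first', 'last') the columns.
df = pd.DataFrame.from_dict(data, orient='index')
print(df)
```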


Now, suppose I only want to find Mr. Heil based on the letters "He".

In [20]:
# na=False treats NaN as no match; in a regex '*' is a quantifier, not a wildcard ('He*' matches any 'H'), so use '^He'
who = df[df['last'].str.contains('^He', regex=True, na=False)]
who

Unnamed: 0,first,last
eh,Erich,Heil
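A pitfall worth spelling out: in a regular expression, 'He*' means "an H followed by zero or more e", not "He followed by anything". The sketch below uses a small made-up Series (the name Huber is not from the data above) to show the difference, and also uses str.startswith as a regex-free alternative for prefix matches.

```python
import pandas as pd

# Made-up names for illustration, including a missing value.
names = pd.Series(['Heil', 'Huber', 'Greimler', None])

# 'He*' matches any string containing 'H', so 'Huber' matches as well.
loose = names.str.contains('He*', regex=True, na=False)

# '^He' matches only strings that start with 'He'.
strict = names.str.contains('^He', regex=True, na=False)

# For a plain prefix, no regex is needed at all.
prefix = names.str.startswith('He', na=False)

print(loose.tolist())   # [True, True, False, False]
print(strict.tolist())  # [True, False, False, False]
print(prefix.tolist())  # [True, False, False, False]
```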
