# Pandas Tutorial

Brandon Rhodes, PyCon 2015

#! git clone https://github.com/brandon-rhodes/pycon-pandas-tutorial.git


[Data](pages.stern.nyu.edu/~dbackus/csv/)


In [1]:
%matplotlib inline

import numpy as np
import pandas as pd

To change how pandas output is displayed in the notebook:

In [4]:
from IPython.display import HTML

# Read the table stylesheet; style-notebook.css from the same directory can be appended too.
with open('pycon-pandas-tutorial/style-table.css') as f:
    css = f.read()
HTML('<style>{}</style>'.format(css))

In [8]:
# DataFrame.from_csv was removed in pandas 1.0; read_csv is the replacement.
titles = pd.read_csv(
    'pycon-pandas-tutorial/titles.csv',
    index_col=None,
    encoding='utf-8')
titles.head()

Unnamed: 0,title,year
0,Orlando Vargas,2005
1,Niu-Peng,1989
2,Anandabhadram,2005
3,Mahendra Varma,1993
4,Beomdiga shidae,1970


In [11]:
# read_csv replaces DataFrame.from_csv, which was removed in pandas 1.0.
cast = pd.read_csv(
    'pycon-pandas-tutorial/cast.csv',
    index_col=None,
    encoding='utf-8')
cast.head(10)

Unnamed: 0,title,year,name,type,character,n
0,Suuri illusioni,1985,Homo $,actor,Guests,22.0
1,Gangsta Rap: The Glockumentary,2007,Too $hort,actor,Himself,
2,Menace II Society,1993,Too $hort,actor,Lew-Loc,27.0
3,Porndogs: The Adventures of Sadie,2009,Too $hort,actor,Bosco,3.0
4,Stop Pepper Palmer,2014,Too $hort,actor,Himself,
5,Townbiz,2010,Too $hort,actor,Himself,
6,For Thy Love 2,2009,Bee Moe $lim,actor,Thug 1,
7,Desire (III),2014,Syaiful 'Ariffin,actor,Actor Playing Eteocles from 'Antigone',
8,When the Man Went South,2014,Taipaleti 'Atu'ake,actor,Two Palms - Ua'i Paame,8.0
9,Little Angel (Angelita),2015,Michael 'babeepower' Viera,actor,Chico,9.0


In [14]:
len(titles), list(titles)

(214591, ['title', 'year'])

You may see the page jump whenever you re-run a cell that already has output. When a cell is executed (Shift+Enter or Ctrl+Enter), the IPython notebook first discards the existing output. The browser then removes the now-empty space at the bottom of the page and scrolls up. When IPython produces the new output, it lands below the visible part of the page.

### `head` and `tail` slice the data and create a new DataFrame


In [26]:
h = titles.head(10)
h

Unnamed: 0,title,year
0,Orlando Vargas,2005
1,Niu-Peng,1989
2,Anandabhadram,2005
3,Mahendra Varma,1993
4,Beomdiga shidae,1970
5,100Volta,2009
6,Atakku no. 1,1970
7,Uroki v kontse vesny,1991
8,Un sac de billes,1975
9,The Trouble with Men and Women,2005
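
As a small illustration of the heading above, the sketch below uses a hand-built five-row frame as a stand-in for `titles.csv` (the names `first_two` and `last_two` are made up for this example). It suggests that `head` and `tail` each hand back a separate DataFrame that keeps the original index labels, while the source frame keeps all its rows.

```python
import pandas as pd

# Stand-in for titles.csv: a tiny frame with the same two columns.
titles = pd.DataFrame({
    'title': ['Orlando Vargas', 'Niu-Peng', 'Anandabhadram',
              'Mahendra Varma', 'Beomdiga shidae'],
    'year': [2005, 1989, 2005, 1993, 1970],
})

first_two = titles.head(2)  # rows labelled 0 and 1
last_two = titles.tail(2)   # rows labelled 3 and 4

# Both results are DataFrames and keep the original index labels.
print(type(first_two).__name__)
print(list(last_two.index))

# The original frame still has all five rows.
print(len(titles))
```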


## Series

In [27]:
# Two ways to get the same column as a Series; only the last expression is displayed.
h['title']
h.title

0                    Orlando Vargas
1                          Niu-Peng
2                     Anandabhadram
3                    Mahendra Varma
4                   Beomdiga shidae
5                          100Volta
6                      Atakku no. 1
7              Uroki v kontse vesny
8                  Un sac de billes
9    The Trouble with Men and Women
Name: title, dtype: object
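
The two access styles are interchangeable only when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. The sketch below uses a made-up frame with a hypothetical `count` column to show one such clash; bracket access is the safer default.

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['Niu-Peng', 'Un sac de billes'],
    'count': [1, 2],  # hypothetical column whose name clashes with DataFrame.count
})

# For an ordinary column name, both styles return the same Series.
same = df['title'].equals(df.title)

# For a clashing name, brackets still return the column...
bracket_is_series = isinstance(df['count'], pd.Series)
# ...but attribute access finds the DataFrame.count method instead.
attr_is_method = callable(df.count)

print(same, bracket_is_series, attr_is_method)
```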

Mathematical operations can be performed on a **Series**; they are applied to every element at once.

In [28]:
h['year'] // 10 + 1000

0    1200
1    1198
2    1200
3    1199
4    1197
5    1200
6    1197
7    1199
8    1197
9    1200
Name: year, dtype: int64
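
A common use of this element-wise arithmetic is rounding each year down to its decade. The following is a minimal sketch on a made-up three-element Series, not part of the tutorial data; `decade` is a name chosen for the example.

```python
import pandas as pd

year = pd.Series([2005, 1989, 1970], name='year')

# Integer-divide by 10 to drop the last digit, then multiply back.
decade = year // 10 * 10

print(list(decade))
# The arithmetic built a new Series; the original values are unchanged.
print(list(year))
```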

In [29]:
# > is a comparison operator. On a Series it returns one boolean (True or False) per element.

h['year'] > 1985

0     True
1     True
2     True
3     True
4    False
5     True
6    False
7     True
8    False
9     True
Name: year, dtype: bool

In [38]:
h[h['year'] < 1985]

# The boolean Series is used as a mask on the DataFrame.
# Only the rows where the mask is True are returned.

Unnamed: 0,title,year
4,Beomdiga shidae,1970
6,Atakku no. 1,1970
8,Un sac de billes,1975
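
To make the two steps explicit, the sketch below builds the mask first and applies it second. The four-row frame is a hand-built stand-in, and `mask` and `older` are names chosen for this example.

```python
import pandas as pd

h = pd.DataFrame({
    'title': ['Orlando Vargas', 'Niu-Peng', 'Beomdiga shidae', 'Un sac de billes'],
    'year': [2005, 1989, 1970, 1975],
})

# Step 1: the comparison yields a boolean Series aligned with h's index.
mask = h['year'] < 1985

# Step 2: indexing with that Series keeps only the rows where it is True.
older = h[mask]

print(list(mask))
print(list(older.index))
```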


### Gotcha 1

<code> h[h['year'] < 1985 and h['year'] >= 1990] </code> 
raises `ValueError: The truth value of a Series is ambiguous.`

**and** is a Python keyword that needs a single True or False from its left operand `(h['year'] < 1985)` before it decides whether to look at the right operand `(h['year'] >= 1990)`. It works only on single values, and a Series holding many booleans has no single truth value.


**`&`** is the bitwise `and` operator. Applied to two boolean Series, it combines them element by element.

The only problem is operator precedence (PEMDAS :)). For an input 

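<pre><code>h[h['year'] < 1985 & h['year'] >= 1990]</code></pre>

**`&`** binds more tightly than `<` and `>=`, so Python evaluates `1985 & h['year']` first and only then applies the comparisons. That is not what was intended, and the resulting chained comparison fails with the same `ValueError`. Wrap each comparison in `()` to force the intended order.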
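
The sketch below runs the failing forms and the parenthesized forms side by side on a made-up five-row frame. The `errors` dictionary and the names `either` and `both` exist only for this illustration.

```python
import pandas as pd

h = pd.DataFrame({'year': [2005, 1989, 1970, 1975, 1993]})

errors = {}

# `and` asks each Series for a single truth value, which pandas refuses to give.
try:
    h[h['year'] < 1985 and h['year'] >= 1990]
except ValueError as err:
    errors['and'] = str(err)

# Without parentheses, `1985 & h['year']` is evaluated first; the chained
# comparison that remains hits the same ambiguity.
try:
    h[h['year'] < 1985 & h['year'] >= 1990]
except ValueError as err:
    errors['unparenthesized &'] = str(err)

# Parenthesized comparisons combine element by element.
either = h[(h['year'] < 1985) | (h['year'] >= 1990)]  # rows matching at least one test
both = h[(h['year'] >= 1985) & (h['year'] < 1990)]    # rows matching both tests

print(sorted(errors))
print(list(either['year']), list(both['year']))
```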

In [42]:
# Only the last expression in a cell is displayed, so the output is the `|` result.
h[(h['year'] < 1990) & (h['year'] <= 1971)]
h[(h['year'] < 1990) | (h['year'] <= 1971)]

Unnamed: 0,title,year
1,Niu-Peng,1989
4,Beomdiga shidae,1970
6,Atakku no. 1,1970
8,Un sac de billes,1975


### Gotcha 2

**evaluating != assigning**
<pre><code>
x = 10; h = titles.head(10)
x + 10; h[h.year > 1980]
x     ; h  
</code></pre>

`x` is still 10, not 20; `h` is still `titles.head(10)`.
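
A runnable version of the same point, as a sketch on a made-up three-row frame: the filtered result has to be bound to a name (here the hypothetical `recent`) if it is to be kept.

```python
import pandas as pd

x = 10
h = pd.DataFrame({'year': [2005, 1989, 1970]})

# Evaluating: both results are computed and then thrown away.
x + 10
h[h['year'] > 1980]

x_after_eval = x          # still 10
rows_after_eval = len(h)  # still 3

# Assigning: bind the results to names to keep them.
x = x + 10
recent = h[h['year'] > 1980]

print(x_after_eval, rows_after_eval, x, len(recent))
```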