# Pandas Introduction

## Series
A Series can be used much like a one-dimensional NumPy array, with an index that labels each value.

## Learning objectives
1. Making Series objects from Python lists and dicts
2. Extracting indexes and values
3. Indexing Series objects implicitly and explicitly

### Creating series from a list

In [49]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns

You can easily create a Series from a list, and give it a name.

In [50]:
s = pd.Series([0,1,4,9,16,25],name='squares')
s

0     0
1     1
2     4
3     9
4    16
5    25
Name: squares, dtype: int64
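The learning objectives also mention dicts. As a minimal sketch (the cubes data and its labels are made up for illustration), passing a dict to pd.Series uses the keys as index labels and the values as data:

```python
import pandas as pd

# Hypothetical data: dict keys become index labels, dict values become the data.
cubes = pd.Series({'one': 1, 'two': 8, 'three': 27}, name='cubes')

print(cubes.index.tolist())  # the dict keys, in insertion order
print(cubes['two'])          # look up a value by its label
```

Unlike the list version, the index here is made of strings rather than a RangeIndex.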

To get the underlying values as a NumPy array, use the values attribute.


In [51]:
s.values

array([ 0,  1,  4,  9, 16, 25])

You can also show the index labels by using the index attribute.

In [52]:
s.index

RangeIndex(start=0, stop=6, step=1)

In [53]:
s[0]

0

In [54]:
s[2]

4

You can also use standard slicing.

In [55]:
s[2:4]

2    4
3    9
Name: squares, dtype: int64
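With the default RangeIndex, labels and positions coincide, so s[2] is unambiguous. When they differ, loc (explicit, label-based) and iloc (implicit, position-based) make the intent clear. A small sketch, assuming a hypothetical Series s2 whose integer labels do not start at 0:

```python
import pandas as pd

# Hypothetical Series whose labels (10, 20, ...) differ from positions (0, 1, ...).
s2 = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

by_label = s2.loc[20]        # explicit: the row whose label is 20
by_position = s2.iloc[1]     # implicit: the second row, whatever its label
label_slice = s2.loc[20:30]  # label slices include both endpoints
pos_slice = s2.iloc[1:3]     # position slices exclude the stop, like Python lists

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```

Both slices select the same two rows here, but for different reasons: loc stops at the label 30 inclusively, while iloc stops before position 3.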

# DataFrames

DataFrames are a bit like two-dimensional arrays: they store tabular data in labeled rows and columns.

1. Making DataFrames from Series objects, Python dicts, NumPy arrays
2. Setting indexes
3. Selecting, combining, and creating columns
4. Performing relational joins on DataFrames

In [56]:
color2015 = pd.Series(
    ['red','blue','green','brown','pink','yellow'],
    index=[0,1,2,3,4,5]
)

In [57]:
color2016 = pd.Series(
    ['orange','purple','black','turquoise','magenta','gold'],
    index=[0,1,2,3,4,5]
)

In [58]:
twoyrs = pd.DataFrame({'2015':color2015,'2016':color2016})

In [59]:
twoyrs

     2015       2016
0     red     orange
1    blue     purple
2   green      black
3   brown  turquoise
4    pink    magenta
5  yellow       gold
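DataFrames can also be built from NumPy arrays. A minimal sketch, where the array contents and the column names a and b are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array: each row of the array becomes a row of the DataFrame.
arr = np.arange(6).reshape(3, 2)

# Column names are not part of the array, so they are supplied separately.
df = pd.DataFrame(arr, columns=['a', 'b'])

print(df.shape)
print(df['b'].tolist())
```

Without the columns argument, pandas would label the columns 0 and 1.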


In [60]:
presidents = pd.DataFrame([{'name':'Barack Obama','inauguration':2009,'birthyear':1961},
              {'name':'George W. Bush','inauguration':2001,'birthyear':1946},
              {'name':'Bill Clinton','birthyear':1946,'inauguration':1993},
              {'name':'George H. W. Bush','inauguration':1989,'birthyear':1924}])

In [61]:
presidents

   birthyear  inauguration               name
0       1961          2009       Barack Obama
1       1946          2001     George W. Bush
2       1946          1993       Bill Clinton
3       1924          1989  George H. W. Bush
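Selecting and creating columns can be sketched with the same presidents data. The column name year_gap is hypothetical; it holds the difference in calendar years, which is only roughly the age at inauguration:

```python
import pandas as pd

presidents = pd.DataFrame([
    {'name': 'Barack Obama', 'inauguration': 2009, 'birthyear': 1961},
    {'name': 'George W. Bush', 'inauguration': 2001, 'birthyear': 1946},
    {'name': 'Bill Clinton', 'inauguration': 1993, 'birthyear': 1946},
    {'name': 'George H. W. Bush', 'inauguration': 1989, 'birthyear': 1924},
])

# A single label selects one column as a Series.
names = presidents['name']

# A list of labels selects several columns as a new DataFrame.
years = presidents[['birthyear', 'inauguration']]

# Arithmetic on columns is element-wise; assigning the result creates a column.
# 'year_gap' is a made-up name for this example.
presidents['year_gap'] = presidents['inauguration'] - presidents['birthyear']

print(presidents[['name', 'year_gap']])
```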


The set_index method returns a new DataFrame that uses the given column as its index; the original DataFrame is left unchanged. I think of the index a bit like a primary key.

In [62]:
presidents_indexes = presidents.set_index('name')

In [63]:
presidents_indexes

                   birthyear  inauguration
name
Barack Obama            1961          2009
George W. Bush          1946          2001
Bill Clinton            1946          1993
George H. W. Bush       1924          1989


In [64]:
presidents_indexes.loc['Bill Clinton']

birthyear       1946
inauguration    1993
Name: Bill Clinton, dtype: int64

In [65]:
presidents_indexes.loc['Bill Clinton', 'inauguration']

1993
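The primary-key analogy only goes so far, because pandas does not require index labels to be unique. A minimal sketch with a hypothetical people DataFrame shows how to check uniqueness, look up a single cell, and undo set_index:

```python
import pandas as pd

# Hypothetical data for illustration.
people = pd.DataFrame({'name': ['Ann', 'Bob'], 'age': [31, 45]})

# set_index returns a new DataFrame; 'people' keeps 'name' as a column.
indexed = people.set_index('name')

# Unlike a database primary key, uniqueness is not enforced, so check it.
print(indexed.index.is_unique)

# loc accepts a row label and a column label together.
age = indexed.loc['Bob', 'age']
print(age)

# reset_index moves the index back into an ordinary column.
restored = indexed.reset_index()
print(restored.columns.tolist())
```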

In [66]:
presidents_fathers = pd.DataFrame([
    {'son':'Barack Obama','father':'Barack Obama, Sr.'},
    {'son':'George W. Bush','father':'George H. W. Bush'},
    {'son':'George H. W. Bush','father':'Prescott Bush'}
])

In [67]:
presidents_fathers

              father                son
0  Barack Obama, Sr.       Barack Obama
1  George H. W. Bush     George W. Bush
2      Prescott Bush  George H. W. Bush


In [68]:
pd.merge(presidents,presidents_fathers,left_on='name',right_on='son')

   birthyear  inauguration               name             father                son
0       1961          2009       Barack Obama  Barack Obama, Sr.       Barack Obama
1       1946          2001     George W. Bush  George H. W. Bush     George W. Bush
2       1924          1989  George H. W. Bush      Prescott Bush  George H. W. Bush


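The merge function above creates a 1:1 join. By default it is an inner join, so Bill Clinton, who has no row in presidents_fathers, is dropped. The son column duplicates name, so it can be dropped from the result.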

In [69]:
pd.merge(presidents,presidents_fathers,left_on='name',right_on='son').drop('son',axis=1)

   birthyear  inauguration               name             father
0       1961          2009       Barack Obama  Barack Obama, Sr.
1       1946          2001     George W. Bush  George H. W. Bush
2       1924          1989  George H. W. Bush      Prescott Bush


To create a left join, add the how='left' argument. Every row of presidents is kept, and father is left missing (NaN) where there is no match.

In [70]:
pd.merge(presidents,presidents_fathers,left_on='name',right_on='son',how='left').drop('son',axis=1)

   birthyear  inauguration               name             father
0       1961          2009       Barack Obama  Barack Obama, Sr.
1       1946          2001     George W. Bush  George H. W. Bush
2       1946          1993       Bill Clinton                NaN
3       1924          1989  George H. W. Bush      Prescott Bush
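When checking a left join, it can help to see which rows found no match. One option is the indicator argument of pd.merge, which adds a _merge column. The sketch below uses a cut-down, two-row version of the data, so the frames left and right are assumptions for illustration:

```python
import pandas as pd

# Cut-down, hypothetical versions of the two DataFrames.
left = pd.DataFrame({'name': ['Barack Obama', 'Bill Clinton']})
right = pd.DataFrame({'son': ['Barack Obama'],
                      'father': ['Barack Obama, Sr.']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
merged = pd.merge(left, right, left_on='name', right_on='son',
                  how='left', indicator=True)

# Rows that exist only in the left DataFrame had no match on the right.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched['name'].tolist())
```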


# Aggregation

In [71]:
prez = pd.merge(presidents,presidents_fathers,left_on='name',right_on='son',how='left').drop('son',axis=1)

Pandas is very good at reading different types of files, such as:
- JSON
- CSV
- text files
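As a rough sketch of the readers involved, pd.read_csv and pd.read_json both accept a path or a file-like object. To keep the example self-contained, the made-up contents below are wrapped in io.StringIO instead of being read from real files:

```python
import io
import pandas as pd

# Hypothetical CSV content; with a real file you would pass its path instead.
csv_text = "name,birthyear\nBarack Obama,1961\nBill Clinton,1946\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON content: a list of records, one per row.
json_text = ('[{"name": "Barack Obama", "birthyear": 1961}, '
             '{"name": "Bill Clinton", "birthyear": 1946}]')
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

Delimited text files that do not use commas can usually be read with pd.read_csv as well, by passing the sep argument.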


In [72]:
prez

   birthyear  inauguration               name             father
0       1961          2009       Barack Obama  Barack Obama, Sr.
1       1946          2001     George W. Bush  George H. W. Bush
2       1946          1993       Bill Clinton                NaN
3       1924          1989  George H. W. Bush      Prescott Bush


In [73]:
prez.mean(numeric_only=True)

birthyear       1944.25
inauguration    1998.00
dtype: float64

In [74]:
prez.describe()

         birthyear  inauguration
count     4.000000      4.000000
mean   1944.250000   1998.000000
std      15.239751      8.869423
min    1924.000000   1989.000000
25%    1940.500000   1992.000000
50%    1946.000000   1997.000000
75%    1949.750000   2003.000000
max    1961.000000   2009.000000


In [75]:
prez.head()

   birthyear  inauguration               name             father
0       1961          2009       Barack Obama  Barack Obama, Sr.
1       1946          2001     George W. Bush  George H. W. Bush
2       1946          1993       Bill Clinton                NaN
3       1924          1989  George H. W. Bush      Prescott Bush
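Beyond whole-table summaries such as mean and describe, aggregation is often done per group. A minimal sketch, assuming a rebuilt copy of the prez data without the father column, uses groupby and agg:

```python
import pandas as pd

# Rebuilt copy of the numeric part of prez, so the example runs on its own.
prez = pd.DataFrame({
    'name': ['Barack Obama', 'George W. Bush',
             'Bill Clinton', 'George H. W. Bush'],
    'birthyear': [1961, 1946, 1946, 1924],
    'inauguration': [2009, 2001, 1993, 1989],
})

# Group rows that share a birth year, then take the earliest inauguration
# within each group.
earliest = prez.groupby('birthyear')['inauguration'].min()
print(earliest)

# agg applies several aggregations at once to one column.
summary = prez['inauguration'].agg(['min', 'max', 'mean'])
print(summary)
```

Two presidents share the birth year 1946, so that group collapses two rows into one.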
