# pandas
Good resources:
- Books
    - Python for Data Analysis, Wes McKinney
    - Learning the Pandas Library, Matt Harrison
- Online resources
    - Stack Overflow
    - planetpython.org
- Podcasts
    - Python Bytes podcast
    - Data Skeptic podcast



In [2]:
import pandas as pd

## series data structure

In [6]:
# a Series is like a two-column table, with an index column and a value column
# you can make a Series out of a list
cars = ['focus', 'pilot', 'sierra']
car_series = pd.Series(cars)
# a Series is an object; for text strings, the values are stored with the dtype 'object'
print(car_series)

# you can store other data types in a Series, and pandas will attempt to store them as one homogeneous dtype
evens = [2, 4, 6]
evens_series = pd.Series(evens)
print(evens_series)

# as with numpy arrays, a pandas Series will 'up-cast' types until it can accommodate all members as one dtype
# when strings and numbers are mixed, that common dtype is 'object'
mixed = ['mouse', 2, 3.14]
mixed_series = pd.Series(mixed)
print(mixed_series)

0     focus
1     pilot
2    sierra
dtype: object
0    2
1    4
2    6
dtype: int64
0    mouse
1        2
2     3.14
dtype: object
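
The up-casting described above can be checked directly through the `dtype` attribute. This is a minimal sketch; the variable names `int_float` and `element_types` are only illustrative and do not come from the original notebook.

```python
import pandas as pd

# integers mixed with a float are up-cast to a single numeric dtype
int_float = pd.Series([1, 2, 3.5])
print(int_float.dtype)      # float64
print(int_float.tolist())   # [1.0, 2.0, 3.5]

# strings mixed with numbers have no common numeric dtype,
# so pandas falls back to the generic 'object' dtype
mixed = pd.Series(['mouse', 2, 3.14])
print(mixed.dtype)          # object

# inside an object Series each element keeps its own Python type
element_types = [type(value).__name__ for value in mixed]
print(element_types)        # ['str', 'int', 'float']
```

An object Series is flexible but gives up the speed of vectorized numeric operations, so it is usually worth keeping numeric data in a numeric dtype.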


## pandas handling of None values from python

In [7]:
# in a Series of strings with dtype object, pandas keeps None as the python object None (it is not converted to the string 'None')
# in a numeric Series, pandas converts None to NaN (Not a Number), which is represented internally as a float

text_none = pd.Series(['first', 'second', None])
print(text_none)
numeric_none = pd.Series([1, 2, None])
print(numeric_none)

0     first
1    second
2      None
dtype: object
0    1.0
1    2.0
2    NaN
dtype: float64
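
A short sketch of how the stored missing values can be inspected. Passing `dtype=object` explicitly is an assumption made here so that the result does not depend on which string dtype a given pandas version infers by default; the nullable `"Int64"` dtype is an optional extra that the cell above does not use.

```python
import pandas as pd

# dtype=object is set explicitly (an assumption for this sketch)
text_none = pd.Series(['first', 'second', None], dtype=object)
print(text_none.iloc[2] is None)    # the stored value is still the None object
print(text_none.isna().tolist())    # [False, False, True]

# in a numeric Series, None becomes NaN and the integers become floats
numeric_none = pd.Series([1, 2, None])
print(numeric_none.dtype)           # float64

# the nullable "Int64" extension dtype keeps the integers
# and marks the missing entry with pd.NA instead of NaN
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)               # Int64
print(nullable.iloc[2] is pd.NA)
```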


In [10]:
import numpy as np
# pandas is built on top of numpy, where
# nan and None do not compare equal with the == operator
print(np.nan == None)
# in fact, nan does not even compare equal to itself
print(np.nan == np.nan)

# to test whether a value is nan, use the numpy function isnan
print(np.isnan(np.nan))

False
False
True
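
As a complement to `np.isnan`, pandas provides `pd.isna`, which also accepts `None` and non-numeric values. The following is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# pd.isna treats both nan and None as missing
nan_is_missing = pd.isna(np.nan)
none_is_missing = pd.isna(None)
text_is_missing = pd.isna('mouse')
print(nan_is_missing, none_is_missing, text_is_missing)

# np.isnan only works on numeric input and raises TypeError for None
try:
    np.isnan(None)
    isnan_accepts_none = True
except TypeError:
    isnan_accepts_none = False
print(isnan_accepts_none)

# on a Series, isna() returns a boolean mask that can be summed to count gaps
values = pd.Series([1.0, np.nan, 3.0])
missing_count = int(values.isna().sum())
print(missing_count)
```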


## creating series from real data

In [23]:
# a Series with an index of named data elements can be created directly from a python dictionary
sample_dict = {'snf': 15, 'home': 24, 'rehab': 13}
dict_series = pd.Series(sample_dict)
print(dict_series)
print("----")

# a Series and its index can also be created using the Series 'index' parameter
index_series = pd.Series([3, 9, 64], index=['Knee', 'Manual', 'Mako'])
print(index_series)
print("----")

# you can store more complex data types in a Series, for example tuples
# (note that tuple_series is a plain list until it is passed to pd.Series)
tuple_series = [('Manual', 15), ('Mako', 30)]
print(tuple_series)
print(pd.Series(tuple_series))
print("----")

# when a dictionary is combined with an explicit index, pandas drops the keys that are
# not in the index ('ray') and returns NaN for index labels that have no key ('dan')
data_set = {'tom': 'developer', 'asif': 'manager', 'ray': 'tech lead'}
roles = pd.Series(data_set, index=['tom', 'asif', 'dan'])
print(roles)

snf      15
home     24
rehab    13
dtype: int64
----
Knee       3
Manual     9
Mako      64
dtype: int64
----
[('Manual', 15), ('Mako', 30)]
0    (Manual, 15)
1      (Mako, 30)
dtype: object
----
tom     developer
asif      manager
dan           NaN
dtype: object
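
The alignment between dictionary keys and an explicit index can be verified with a few checks. This is a sketch that reuses `data_set` and `roles` from the cell above; `all_roles` and the call to `reindex` are additions for illustration.

```python
import pandas as pd

data_set = {'tom': 'developer', 'asif': 'manager', 'ray': 'tech lead'}
roles = pd.Series(data_set, index=['tom', 'asif', 'dan'])

# 'ray' was a dictionary key but is not in the index, so it was dropped
print('ray' in roles.index)             # False

# 'dan' is in the index but has no dictionary key, so its value is missing
print(roles.isna().tolist())            # [False, False, True]

# dropna() removes the labels that have no value
print(roles.dropna().index.tolist())    # ['tom', 'asif']

# to keep every dictionary key and still add new labels,
# build the Series first and then reindex it
all_roles = pd.Series(data_set).reindex(['tom', 'asif', 'ray', 'dan'])
print(all_roles.loc['ray'])             # tech lead
```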
