## Introduction to Pandas

In [4]:
import numpy as np
import pandas as pd

In [6]:
pd.__version__

'0.23.4'

### Series, DataFrames, and Index

#### Series - a one-dimensional array of indexed data

In [9]:
dat_series = pd.Series([1,2,3])
dat_series

0    1
1    2
2    3
dtype: int64

In [10]:
dat_series.values

array([1, 2, 3])

In [11]:
dat_series.index

RangeIndex(start=0, stop=3, step=1)

You can set the index labels however you wish; here they are strings instead of the default integers.

In [13]:
dat_series = pd.Series([1.25, 2.5, 3.75, 5], index = ['a','b','c','d'])
dat_series

a    1.25
b    2.50
c    3.75
d    5.00
dtype: float64

In [16]:
dat_series['a']

1.25
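A short sketch of what label-based access looks like beyond a single lookup. The variable names (`prices`, `subset`, `span`) are illustrative and not from the original notebook.

```python
import pandas as pd

# Same values and labels as the Series above; `prices` is a made-up name.
prices = pd.Series([1.25, 2.5, 3.75, 5], index=['a', 'b', 'c', 'd'])

first = prices['a']            # single label -> scalar
subset = prices[['a', 'c']]    # list of labels -> smaller Series
span = prices['b':'d']         # label slice keeps BOTH endpoints
by_position = prices.iloc[0]   # positional access, ignores the labels

print(first, by_position)
print(span)
```

Note that slicing by label includes the end label, unlike ordinary positional slicing in Python.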

You can also create a Series from a dict: the keys become the index and the values become the data.

In [21]:
dat_series = pd.Series({'a':1, 'b':2, 'c':3})
dat_series

a    1
b    2
c    3
dtype: int64
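A minimal sketch of dict construction combined with an explicit `index=` argument, which selects and orders the keys. The names `counts`, `from_dict`, and `picked`, and the label `'z'`, are assumptions made for illustration.

```python
import pandas as pd

counts = {'a': 1, 'b': 2, 'c': 3}

# Keys become the index, values become the data.
from_dict = pd.Series(counts)

# index= picks and orders the keys; a label that is not in the dict
# ('z' here) is kept and gets NaN as its value.
picked = pd.Series(counts, index=['c', 'a', 'z'])

print(from_dict)
print(picked)
```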

#### DataFrame - a two-dimensional array with flexible row and column names

In [23]:
dat_frame = pd.DataFrame(dat_series)
dat_frame

   0
a  1
b  2
c  3


If a key is missing from one of the dicts, pandas fills that cell with NaN.

In [24]:
pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0
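A small sketch of how the gaps can be inspected and replaced, assuming the same two records as above. The names `records`, `frame`, `missing`, and `filled` are illustrative, and filling with `0` is only one possible choice of default.

```python
import pandas as pd

records = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
frame = pd.DataFrame(records)

# Columns that contain a gap are stored as float64, because NaN is a
# floating-point value; the complete column 'b' stays int64.
print(frame.dtypes)

missing = frame.isnull()   # boolean mask marking the NaN cells
filled = frame.fillna(0)   # copy with NaN replaced by a chosen default

print(missing)
print(filled)
```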


You can also create DataFrames by hand, from an array plus explicit column and index labels.

In [None]:
pd.DataFrame(np.random.rand(3, 2),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])
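The cell above has no recorded output, so here is a hedged sketch of the same construction that can be rerun with stable numbers. The seed value and the names `frame`, `row_b`, and `col_foo` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumed seed, only so reruns produce the same numbers

frame = pd.DataFrame(np.random.rand(3, 2),
                     columns=['foo', 'bar'],
                     index=['a', 'b', 'c'])

print(frame.shape)          # rows x columns
print(list(frame.columns))  # column labels

row_b = frame.loc['b']      # one row, selected by index label
col_foo = frame['foo']      # one column, selected by column name
```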

### Index Object - like arrays, but immutable

In [28]:
dat_ind = pd.Index([1,2,3,6,7,8])
dat_ind

Int64Index([1, 2, 3, 6, 7, 8], dtype='int64')

In [29]:
dat_ind[0] = 7 # won't work: an Index is immutable

TypeError: Index does not support mutable operations
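A minimal sketch of the immutability check, assuming the index is named `dat_ind` as in the cell that creates it. The `mutated` flag and the `try`/`except` wrapper are additions so the example runs to completion.

```python
import pandas as pd

dat_ind = pd.Index([1, 2, 3, 6, 7, 8])

try:
    dat_ind[0] = 7      # in-place assignment is rejected
    mutated = True
except TypeError as err:
    mutated = False
    print(err)

# Reading works the same way as with an array.
first = dat_ind[0]
every_other = dat_ind[::2]
print(first, list(every_other))
```

Because an Index cannot be changed in place, it can be shared between several Series and DataFrames without one of them altering it by accident.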

Index objects also support set operations such as unions, intersections, and symmetric differences.

In [31]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [34]:
indA & indB

Int64Index([3, 5, 7], dtype='int64')

In [35]:
indA | indB

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [36]:
indA ^ indB # symmetric difference

Int64Index([1, 2, 9, 11], dtype='int64')
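The operator forms behave as set operations in the pandas 0.23.4 release used here, but later pandas releases deprecated that behaviour, so the result may differ on a newer install. A sketch using the named `Index` methods, which express the same set operations explicitly (the result names `common`, `either`, and `only_one` are illustrative):

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

common = indA.intersection(indB)            # labels in both
either = indA.union(indB)                   # labels in at least one
only_one = indA.symmetric_difference(indB)  # labels in exactly one

print(list(common), list(either), list(only_one))
```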

#### Column-wise operation

In [62]:
df_a.sub(df_a[0], axis=0)

   0   1   2   3
0  0 -10 -24   4
1  0  13  31  11
2  0 -12 -12   1


#### Row-wise operation (the default)

In [63]:
df_a - df_a[0]

      0     1     2   3
0   0.0  21.0  -8.0 NaN
1 -31.0  13.0  16.0 NaN
2 -16.0   3.0 -12.0 NaN
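The two results differ because of how the Series is aligned with the DataFrame. The sketch below reproduces both behaviours on a made-up frame: `df` and its numbers are hypothetical and are not the notebook's `df_a`, whose definition is not shown in this section.

```python
import pandas as pd

# Hypothetical 3x4 frame with default integer row and column labels.
df = pd.DataFrame([[10, 1, 2, 3],
                   [20, 4, 5, 6],
                   [30, 7, 8, 9]])

first_col = df[0]   # Series with index 0, 1, 2 and values 10, 20, 30

# axis=0 matches the Series index to the ROW labels, so each column
# has first_col subtracted from it element by element.
by_rows = df.sub(first_col, axis=0)

# The bare operator matches the Series index to the COLUMN labels:
# column 0 loses 10, column 1 loses 20, column 2 loses 30, and
# column 3 has no matching label, so it comes out as NaN.
by_cols = df - first_col

print(by_rows)
print(by_cols)
```

The all-NaN last column in the second result is the sign that alignment happened on column labels, not row labels.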
