# Applied Data Science: Pandas Basics (by Atul Bhardwaj)

## Series
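- Series and DataFrame are the two main pandas data structures and cover most applications
- A Series is a one-dimensional array-like object holding an array of data (of any NumPy data type) together with an index of labels; the simplest Series is built from just an array of data, in which case a default integer index is created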

In [1]:
from pandas import Series, DataFrame
import pandas as pd

obj = Series([4,7,-5,3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

**Array Representation**

In [2]:
obj.values

array([ 4,  7, -5,  3], dtype=int64)

**Index Object**: The basic object storing axis labels for all pandas objects

In [3]:
obj.index # attribute (not a method) that returns the Index object

Int64Index([0, 1, 2, 3], dtype='int64')

**Series with Index identifying each data point**

In [4]:
obj2 = Series([4,7,-5,3], index=['d','b','a','c'])
obj2

d    4
b    7
a   -5
c    3
dtype: int64

**Series View**

In [5]:
obj2[['c','a','d']] # select a subset of values by index label

c    3
a   -5
d    4
dtype: int64

NumPy array operations, such as scalar multiplication or applying math functions, preserve the link between each index label and its value

In [6]:
import numpy as np

np.exp(obj2)

d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64
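
The same label-preserving behaviour holds for scalar multiplication and boolean filtering. A minimal sketch, rebuilding `obj2` so the block runs on its own (the names `doubled` and `positive` are illustrative and not part of the original notebook):

```python
import pandas as pd

# Rebuild the labelled Series from the earlier cell.
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Scalar multiplication is applied element-wise; labels stay attached.
doubled = obj2 * 2

# Boolean filtering keeps only the matching entries, again with their labels.
positive = obj2[obj2 > 0]

print(doubled)
print(positive)
```

In both results each value keeps the label it had in `obj2`, so later lookups by label still work.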

A Series can also be thought of as a fixed-length, ordered dict, since it maps index values to data values the same way a dict maps keys to values. As a result, many functions that expect a dict will also accept a Series

In [7]:
'b' in obj2

True
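
A few more dict-style operations can be sketched the same way. The label `'e'` and the variable names below are assumptions made for illustration; `obj2` is rebuilt so the block is self-contained:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Membership tests look at the index labels, like dict keys.
has_b = 'b' in obj2
has_e = 'e' in obj2

# .get returns a default instead of raising KeyError for a missing label.
missing = obj2.get('e', 0)

# A Series converts back to a plain dict of label -> value.
as_dict = obj2.to_dict()

print(has_b, has_e, missing)
print(as_dict)
```

Note that `in` checks labels, not values: `4 in obj2` would look for a label `4`, not the stored value.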

**Dict to Series**
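- The Series index is taken from the dict keys; the output below shows them in sorted order, which is how older pandas releases behaved (newer releases keep the dict's insertion order)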

In [8]:
sdata = {'Ohio': 3500, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000} # not named `dict`, which would shadow the built-in
obj3 = Series(sdata)
print(type(obj3) is Series)
obj3

True


Ohio       3500
Oregon    16000
Texas     71000
Utah       5000
dtype: int64
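
Because the resulting order can depend on the pandas version, it may help to pass an explicit `index` alongside the dict. A minimal sketch, assuming a hypothetical extra label `'California'` that has no entry in the dict:

```python
import pandas as pd

sdata = {'Ohio': 3500, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

# 'California' is a hypothetical label that is absent from the dict.
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)

# Labels missing from the dict get NaN; dict keys not listed ('Utah') are dropped.
print(obj4)
print(obj4.isnull())
```

The index order now follows `states` exactly, and the NaN entry makes the Series float-valued rather than integer-valued.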

## DataFrame
- A DataFrame has both a row index and a column index; internally the data is stored as one or more 2D blocks rather than as a collection of 1D arrays
- A common way to create a DataFrame is from a dict of equal-length lists, or from NumPy arrays
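
The array route can be sketched as follows. The values and the labels `r1`, `r2`, `a`, `b`, `c` are made up for illustration and do not come from the original notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 array of values, used purely for illustration.
values = np.arange(6).reshape(2, 3)

# Row and column labels are supplied separately from the data.
frame_from_array = pd.DataFrame(values, index=['r1', 'r2'], columns=['a', 'b', 'c'])

print(frame_from_array)
print(frame_from_array.shape)
```

If `index` and `columns` are omitted, pandas falls back to default integer labels on both axes.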

**Dict to DataFrame**

In [9]:
data = {'state':['Ohio','Ohio','Ohio','Nevada','Nevada'],
        'year':[2000,2001,2002,2001,2002],
        'pop':[1.5,1.7,3.6,2.4,2.9]}
frame = DataFrame(data)
frame

| | pop | state | year |
|---|---|---|---|
| 0 | 1.5 | Ohio | 2000 |
| 1 | 1.7 | Ohio | 2001 |
| 2 | 3.6 | Ohio | 2002 |
| 3 | 2.4 | Nevada | 2001 |
| 4 | 2.9 | Nevada | 2002 |

**Selecting Columns**

In [10]:
frame['pop']

0    1.5
1    1.7
2    3.6
3    2.4
4    2.9
Name: pop, dtype: float64

In [11]:
#Multiple columns
frame[['pop', 'state']] # pass a list of column names

| | pop | state |
|---|---|---|
| 0 | 1.5 | Ohio |
| 1 | 1.7 | Ohio |
| 2 | 3.6 | Ohio |
| 3 | 2.4 | Nevada |
| 4 | 2.9 | Nevada |
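
The first DataFrame output above lists its columns alphabetically, which is what older pandas releases did for dict input; newer releases keep the dict's insertion order. A minimal sketch of pinning the order explicitly with the `columns` argument (the names `frame2` and `subset` are illustrative):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

# Passing columns= fixes the column order regardless of pandas version.
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop'])

# Selecting a list of columns returns a new DataFrame in the requested order.
subset = frame2[['pop', 'state']]

print(frame2.columns.tolist())
print(subset.columns.tolist())
```

Selecting a single name such as `frame2['pop']` returns a Series, while a list of names returns a DataFrame even when the list has one element.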


**Accessing specific values**

In [12]:
frame.loc[2, 'year'] # .loc replaces .ix, which was removed in pandas 1.0

2002

In [13]:
frame.loc[2, ['pop', 'state']] # multiple column values

pop       3.6
state    Ohio
Name: 2, dtype: object
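
In current pandas, label-based lookup is done with `.loc` and position-based lookup with `.iloc`. A minimal sketch of both on the same data (the variable names `by_label`, `by_position` and `row_values` are illustrative):

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data)

# Label-based lookup: row label 2, column label 'year'.
by_label = frame.loc[2, 'year']

# Position-based lookup: third row, and the integer position of the 'year' column.
by_position = frame.iloc[2, frame.columns.get_loc('year')]

# Several columns from one row come back as a Series named after the row label.
row_values = frame.loc[2, ['pop', 'state']]

print(by_label, by_position)
print(row_values)
```

Here the row labels happen to equal the row positions because the frame uses the default integer index; with a custom index the two accessors would no longer be interchangeable.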

## Learn More

http://pandas.pydata.org/pandas-docs/stable/