# 5.1 Introduction to pandas Data Structures

pandas has two main data structures: Series and DataFrame.

## Series 
A Series is a one-dimensional, array-like object containing a sequence of values (of types similar to NumPy types) and an associated array of data labels, called its index.

In [41]:
import pandas as pd
import numpy as np
obj = pd.Series([4, 7, -5, 3])
obj

0    4
1    7
2   -5
3    3
dtype: int64

The left column shows the index. Because no index was specified, pandas created a default one made of the integers 0 through N - 1. You can also specify your own index.

In [42]:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2

d    4
b    7
a   -5
c    3
dtype: int64

In [43]:
obj2.index  # returns the Series' index object

Index(['d', 'b', 'a', 'c'], dtype='object')

Unlike with NumPy arrays, you can use the labels in the index to select a single value or a set of values.

In [44]:
obj2[['c', 'a', 'd']]

c    3
a   -5
d    4
dtype: int64
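The cell above selects a set of labels. As a minimal sketch of the single-value case, the snippet below reads one value by label and assigns to another label in place. It rebuilds obj2 so it runs on its own; the replacement value 6 is an arbitrary choice for illustration.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

single = obj2['a']               # select one value by its label
obj2['d'] = 6                    # assign a new value to the label 'd' in place
subset = obj2[['c', 'a', 'd']]   # select several labels, in the order given

print(single)
print(subset)
```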

In [45]:
obj2[obj2 > 0]

d    4
b    7
c    3
dtype: int64

In [46]:
np.exp(obj2)

d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64

Another way to think about a Series is as a fixed-length, ordered dict, since it maps index values to data values. It can be used in many contexts where you might use a dict.

In [47]:
'b' in obj2

True
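A short sketch of a few more dict-like operations may help here. It assumes the same obj2 as above and uses the Series methods get and to_dict; the label 'z' and the default 0 are made up for the example.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

has_b = 'b' in obj2            # membership tests the index labels
has_7 = 7 in obj2              # 7 is a value, not a label, so this is False
fallback = obj2.get('z', 0)    # like dict.get: default for a missing label
as_dict = obj2.to_dict()       # convert the Series back to a plain dict

print(has_b, has_7, fallback)
print(as_dict)
```

Note that the in operator looks at labels only; to test for a value, check against obj2.values instead.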

If you have data in a dict, you can create a Series from it by passing the dict directly:

In [48]:
sdata = {'Ohio':35000, 'Texas':7100, 'Oregon':16000, 'Utah':5000}
obj4 = pd.Series(sdata)
obj4

Ohio      35000
Texas      7100
Oregon    16000
Utah       5000
dtype: int64

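To control the order, pass a list of index labels. Labels in the list that are not keys in the dict (here California) get NaN, and dict keys that are not in the list (here Utah) are dropped.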

In [49]:
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)
obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas          7100.0
dtype: float64

The pd.isnull and pd.notnull functions detect missing data, which pandas marks as NaN.

In [50]:
pd.isnull(obj4)

California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool

In [51]:
pd.notnull(obj4)

California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
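The same checks are also available as Series methods. The sketch below uses isna and notna (the method counterparts of the functions above) to count the missing entries and keep only the populated ones. It rebuilds obj4 so it is self-contained.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 7100, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)

missing = obj4.isna()              # boolean Series, True where the value is NaN
n_missing = int(missing.sum())     # True counts as 1, so this counts the NaNs
present = obj4[obj4.notna()]       # boolean mask keeps only non-missing rows

print(n_missing)
print(present)
```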

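A useful Series feature is that it automatically aligns by index label in arithmetic operations. Here both operands share the same index, so every label matches itself, and the missing California value stays NaN.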

In [52]:
obj4 + obj4

California        NaN
Ohio          70000.0
Oregon        32000.0
Texas         14200.0
dtype: float64
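Alignment is easier to see when the two operands have different indexes. In the sketch below, obj3 is a name introduced only for this example: it holds the dict-ordered Series, while obj4 is the reindexed one. Labels present in only one operand come out as NaN in the sum.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 7100, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

obj3 = pd.Series(sdata)                 # labels: Ohio, Texas, Oregon, Utah
obj4 = pd.Series(sdata, index=states)   # labels: California, Ohio, Oregon, Texas

# Values are matched by label, not by position. The result's index is the
# union of both indexes; California and Utah have no valid pair, so they are NaN.
total = obj3 + obj4
print(total)
```

This behaves much like an outer join on the index labels.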

In [53]:
obj4.name = 'population'    # gives the Series itself a name
obj4.index.name = 'state'   # gives the index a name
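The cell above produces no output, so here is a small sketch that sets both names and reads them back. It also shows that a Series index can be replaced in place by assignment; the person names used for the new labels are placeholders chosen for illustration.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 7100, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

obj4.name = 'population'     # name of the Series
obj4.index.name = 'state'    # name of the index
print(obj4)                  # both names appear in the printed output

obj = pd.Series([4, 7, -5, 3])
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']   # new labels, same length as the data
print(obj)
```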

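## DataFrame

A DataFrame is a rectangular table of data with an ordered collection of columns, each of which can hold a different value type. It has both a row index and a column index. One common way to build one is from a dict of equal-length lists.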

In [54]:
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'], 'year':[2000,2001,2002,2001,2002,2003], 'pop':[1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)
frame.head()  # shows only the first 5 rows

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9


If you specify a sequence of columns, the DataFrame's columns will be arranged in that order.

In [55]:
pd.DataFrame(data, columns=['year', 'state', 'pop'])

   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
5  2003  Nevada  3.2


### Selecting data from the DataFrame

In [56]:
frame2 = pd.DataFrame(data, columns= ['year', 'state', 'pop', 'debt'], index=['one', 'two', 'three', 'four', 'five', 'six'])
frame2.head()

       year   state  pop debt
one    2000    Ohio  1.5  NaN
two    2001    Ohio  1.7  NaN
three  2002    Ohio  3.6  NaN
four   2001  Nevada  2.4  NaN
five   2002  Nevada  2.9  NaN


In [57]:
frame['state']

0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object

In [58]:
frame.year

0    2000
1    2001
2    2002
3    2001
4    2002
5    2003
Name: year, dtype: int64

In [59]:
frame.loc[1]  # gets the row whose index label is 1

state    Ohio
year     2001
pop       1.7
Name: 1, dtype: object
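With the default integer index, labels and positions coincide, which hides the difference between label-based and position-based lookup. As a rough sketch, the snippet below uses a string-labeled frame (rebuilt here without the debt column) to contrast loc, which selects by label, with iloc, which selects by integer position.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

row_by_label = frame2.loc['three']   # row whose index label is 'three'
row_by_pos = frame2.iloc[2]          # third row by position (0-based)

print(row_by_label)
```

Both lookups return the same row here, as a Series whose index is the DataFrame's column names.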

To assign data to an empty column, set it to a scalar or to an array of values whose length matches the DataFrame:

In [63]:
frame2['debt'] = np.arange(6)
frame2.head()

       year   state  pop  debt
one    2000    Ohio  1.5     0
two    2001    Ohio  1.7     1
three  2002    Ohio  3.6     2
four   2001  Nevada  2.4     3
five   2002  Nevada  2.9     4
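A list or array assigned to a column must match the DataFrame's length, but a Series does not have to. The sketch below assigns a shorter Series; its labels are aligned to the DataFrame's index, and rows without a matching label get NaN. The name val and the debt values are made up for illustration.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

# Only three of the six row labels are present in val.
val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val   # aligned by label; unmatched rows become NaN

print(frame2)
```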


In [71]:
del frame2['debt']  # deletes the column in place
frame2.head()

       year   state  pop
one    2000    Ohio  1.5
two    2001    Ohio  1.7
three  2002    Ohio  3.6
four   2001  Nevada  2.4
five   2002  Nevada  2.9
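The del statement changes the DataFrame in place. As a possible alternative, the sketch below adds a derived boolean column (the name eastern is invented for the example) and then removes it with the drop method, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                      index=['one', 'two', 'three', 'four', 'five', 'six'])

# Assigning to a column name that does not exist creates the column.
frame2['eastern'] = frame2['state'] == 'Ohio'

# drop returns a copy without the column; frame2 itself still has it.
trimmed = frame2.drop(columns=['eastern'])

print(frame2.columns.tolist())
print(trimmed.columns.tolist())
```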
