## Agenda: Indexing and dtypes

1. Indexes
   - Setting
   - Resetting
2. inplace=True
3. dtypes

In [8]:
import numpy as np
import pandas as pd  # import pandas under its conventional alias, "pd"
from pandas import Series, DataFrame  # make Series and DataFrame available by name

In [9]:
s = Series([10,20,30,40,50])
s

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [10]:
s.index = list('abcde')
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [11]:
# what happens if I don't want this index any more?
# what happens if I want the default index?

s.reset_index()

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


In [15]:
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

### set_index

Many methods in Pandas don't modify the series/data frame. Rather, they return a new series/data frame, one that reflects the change that we have made.

If we want to "capture" this change in a variable, or even in the original variable we used, then we have to assign the result.
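
A minimal sketch of this behavior, using a small series with made-up values. The names `temps` and `temps_df` are illustrative only and do not appear elsewhere in this notebook:

```python
import pandas as pd

temps = pd.Series([10, 20, 30], index=list('abc'))

# Calling reset_index without assigning the result leaves temps untouched
temps.reset_index()
print(list(temps.index))        # temps still has its original index

# Assigning the result captures the new data frame in a variable
temps_df = temps.reset_index()
print(list(temps_df.columns))   # the old index is now a column named 'index'
```

The first call builds a data frame and immediately discards it; only the assignment keeps the result around.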

In [16]:
df = s.reset_index()
df

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


In [17]:
#if I look at df, what do I see?

df

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


In [18]:
# set_index returns a new data frame, one based on df, but it doesn't modify df.
# to update df itself, assign the result back to df

df = df.set_index('index')
df

        0
index
a      10
b      20
c      30
d      40
e      50


In [19]:
#if I want to get a series back, with just our current index and column 0, I can retrieve that column with [0].

df[0]

index
a    10
b    20
c    30
d    40
e    50
Name: 0, dtype: int64

In [20]:
df = df.reset_index()

In [21]:
df

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


In [22]:
# get all values from column 0
# where index is 'c'
df.loc[df['index'] == 'c', 0]

2    30
Name: 0, dtype: int64
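
For comparison, here is a sketch of the same lookup done two ways: with a boolean mask on the `index` column, and with `set_index` followed by a label-based `.loc`. The data frame is rebuilt here so the example stands alone; the names `by_mask` and `by_label` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'index': list('abcde'), 0: [10, 20, 30, 40, 50]})

# Boolean mask on the 'index' column: the result is a one-element series
by_mask = df.loc[df['index'] == 'c', 0]
print(by_mask)

# After set_index, 'c' is a row label, so .loc returns the single value
by_label = df.set_index('index').loc['c', 0]
print(by_label)
```

The mask version keeps the original integer row label, while the label-based version returns the value itself because the row and column labels together identify exactly one cell.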

## Exercise: Weather and indexes

1. Create a series in which the index contains the dates in MMDD format ('0520', '0521'), all as strings. The values should be the expected high temperatures for the next 10 days.
2. Use `reset_index`. What do you see?
3. Use a mask index to retrieve the projected high temps for May 22 and May 25.
4. Use `set_index` and `.loc` to achieve the same goal.

In [23]:
s = Series([77,71,74,78,81,76,78,83,85,93], index='0529 0530 0531 0601 0602 0603 0604 0605 0607 0608'.split())

In [24]:
s

0529    77
0530    71
0531    74
0601    78
0602    81
0603    76
0604    78
0605    83
0607    85
0608    93
dtype: int64

In [25]:
s.reset_index()

  index   0
0  0529  77
1  0530  71
2  0531  74
3  0601  78
4  0602  81
5  0603  76
6  0604  78
7  0605  83
8  0607  85
9  0608  93


In [26]:
s

0529    77
0530    71
0531    74
0601    78
0602    81
0603    76
0604    78
0605    83
0607    85
0608    93
dtype: int64

In [27]:
df = s.reset_index()

In [28]:
df

  index   0
0  0529  77
1  0530  71
2  0531  74
3  0601  78
4  0602  81
5  0603  76
6  0604  78
7  0605  83
8  0607  85
9  0608  93


In [29]:
(df['index'] == '0529') | (df['index'] == '0602')

0     True
1    False
2    False
3    False
4     True
5    False
6    False
7    False
8    False
9    False
Name: index, dtype: bool
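
A boolean series like this one can be passed to `.loc` to keep only the rows marked `True`. The sketch below uses a shortened copy of the data so it stands alone; the names `mask`, `selected`, and `same` are illustrative only, and `isin` is shown as one possible alternative to chaining `==` comparisons with `|`:

```python
import pandas as pd

df = pd.DataFrame({'index': ['0529', '0530', '0531', '0601', '0602'],
                   0: [77, 71, 74, 78, 81]})

# Build the mask from two comparisons, then use it to select rows of column 0
mask = (df['index'] == '0529') | (df['index'] == '0602')
selected = df.loc[mask, 0]
print(selected)

# isin produces an equivalent mask from a list of wanted values
same = df.loc[df['index'].isin(['0529', '0602']), 0]
print(same)
```

The index values are compared as strings, so the leading zero in `'0529'` must be included.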

In [None]:
#I can apply a boolean series to any series who