# Agenda: Indexing and dtypes

1. Indexes
    - Setting
    - Resetting
2. `inplace=True`
3. dtypes

In [3]:
import numpy as np
import pandas as pd       # import Pandas under its conventional alias "pd"
from pandas import Series, DataFrame  # also import Series and DataFrame directly, so we can skip the "pd." prefix

In [4]:
s = Series([10, 20, 30, 40, 50])
s

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [5]:
s.index = list('abcde')
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [6]:
# what if I don't want this index any more?
# what if I want the default index back?

s.reset_index()

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


In [7]:
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

# `reset_index`

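Many, *many* methods in Pandas don't modify the series or data frame they're called on. Rather, they return a new series or data frame that reflects the change we asked for. That's why `s` still had its `a` through `e` index after we called `s.reset_index()`.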

If we want to "capture" this change in a variable, or even in the original variable we used, then we have to assign.
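Here's a small sketch of that idea, using the same series as above. One assumption on my part: I pass `drop=True`, which isn't used elsewhere in these notes. It tells `reset_index` to throw away the old index rather than keep it as a column, so we get a series back instead of a data frame.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))

# reset_index returns a new object; s itself keeps its a-e index
result = s.reset_index(drop=True)   # drop=True discards the old index instead of keeping it as a column

print(list(s.index))        # the original labels are still in place
print(list(result.index))   # the new object has the default positional index

# assigning the result back to s is what makes the change stick
s = s.reset_index(drop=True)
print(list(s.index))
```

Without the final assignment, the reset version would exist only in `result`, and `s` would be unchanged.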

In [8]:
df = s.reset_index()
df

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


# `set_index`

If I have a data frame, and I want to use one of its columns as the index, then I can call `set_index`, indicating which column should be used. I then get back a new data frame with the specified column as an index.

In [10]:
df.set_index('index')

        0
index
a      10
b      20
c      30
d      40
e      50


In [11]:
# if I look at df, what do I see?

df

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


In [12]:
# set_index returns a new data frame, one based on df, but it doesn't modify df.
# to change df itself, you need to assign the result back to df

df = df.set_index('index')
df

        0
index
a      10
b      20
c      30
d      40
e      50


In [13]:
# if I want to get a series back, with just our current index and column 0, I can
# retrieve that column with [0].

df[0]

index
a    10
b    20
c    30
d    40
e    50
Name: 0, dtype: int64

In [15]:
df = df.reset_index()

In [16]:
df

  index   0
0     a  10
1     b  20
2     c  30
3     d  40
4     e  50


In [18]:
# get all values from column 0
# where index is 'c'
df.loc[df['index'] == 'c', 0]

2    30
Name: 0, dtype: int64

In [19]:
# it's far easier and more intuitive to set "index" to be the index,
# and then just use .loc to pull out the value(s) we want at that index

(
    df
    .set_index('index')
    .loc['c']  # here, I'm running .loc not on df, but on the result of running df.set_index
)

0    30
Name: c, dtype: int64
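The two approaches above also work when we want more than one row. Here's a minimal sketch that rebuilds the same data frame by hand and retrieves the values for `'b'` and `'d'` both ways. The variable names (`via_mask`, `via_labels`, `single`) are just ones I made up for illustration, and `isin` is one way, among others, to build a mask for several values.

```python
import pandas as pd

# the same data frame we got from s.reset_index()
df = pd.DataFrame({'index': list('abcde'), 0: [10, 20, 30, 40, 50]})

# approach 1: build a boolean mask from the 'index' column, then select column 0
via_mask = df.loc[df['index'].isin(['b', 'd']), 0]

# approach 2: make 'index' the index, then hand .loc a list of labels
via_labels = df.set_index('index').loc[['b', 'd'], 0]

# a single label plus a column name returns just the value
single = df.set_index('index').loc['c', 0]

print(via_mask)
print(via_labels)
print(single)
```

Both selections contain the same values. The difference is in the index of the result: the mask version keeps the positional labels from `df`, whereas the `set_index` version is labeled with `'b'` and `'d'`.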

# Exercise: Weather and indexes

1. Create a series in which the index contains the dates in MMDD format ('0520', '0521'), all as strings. The values should be the expected high temperatures for the next 10 days.
2. Use `reset_index`. What do you see?
3. Use a mask index to retrieve the projected high temps for May 22 and May 25.
4. Use `set_index` and `.loc` to achieve the same goal.