### Converting Types in Pandas

Here we will use a machine learning dataset called Health Insurance and Hours Worked By Wives, which was collected in 1993. Many of its fields are stored as strings, and we want to convert them to numeric values. Let's see how to do so!


[Field Explanations](https://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/HI.html)

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('../data/HI.csv')

In [None]:
df.head()

In [None]:
df = df.set_index('Unnamed: 0')

In [None]:
df.head()

In [None]:
df.dtypes

### "Yes" and "no" are boolean values, so we can convert them to numbers using applymap. Applymap visits every cell in the DataFrame and applies a function (here, a lambda) to it. Let's investigate.

In [None]:
df.applymap(lambda val: 1 if val == "yes" else 0)

### We could use map on each column...

In [None]:
df['hhi'].map(lambda val: 1 if val == 'yes' else 0)
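
To make the difference between the two cells above concrete, here is a minimal sketch on a made-up three-row frame (the values are invented; only the column names are borrowed from the dataset). It assumes nothing about the real contents of HI.csv. Newer pandas releases (2.1 and later) offer `DataFrame.map` as the replacement for `applymap`, so the sketch picks whichever one is available.

```python
import pandas as pd

# Hypothetical toy frame: invented values, column names borrowed from HI.csv.
toy = pd.DataFrame({
    "hhi": ["yes", "no", "yes"],
    "region": ["south", "west", "other"],
})


def to_flag(val):
    """Return 1 for "yes" and 0 for anything else."""
    return 1 if val == "yes" else 0


# Element-wise over the whole frame: DataFrame.map on pandas >= 2.1,
# DataFrame.applymap on older versions.
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
whole_frame = elementwise(to_flag)

# Column-wise: only the hhi column is converted.
hhi_only = toy["hhi"].map(to_flag)

print(whole_frame)  # region is overwritten too, because no region value is "yes"
print(hhi_only)
```

Applying the function to the whole frame overwrites every column, while `Series.map` only touches the column it is called on.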

### So, we need the other values to stay as they are if they aren't "yes" or "no"... how about we try a dictionary?

In [None]:
yes_or_no = {'yes': 1, 'no': 0}

In [None]:
# dict.get(key, default) returns the value unchanged when it is not "yes" or "no"
df.applymap(lambda val: yes_or_no.get(val, val))

In [None]:
# same conversion as above, this time saved back to df
df = df.applymap(lambda val: yes_or_no.get(val, val))

In [None]:
df.dtypes
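
The dictionary lookup used above can be tried in isolation. This is a small sketch on invented data, not the real dataset: `toy` and its values are hypothetical. It relies on `dict.get(key, default)`, which returns the default when the key is missing, so any value that is not "yes" or "no" passes through unchanged.

```python
import pandas as pd

# Hypothetical toy frame: invented values, column names borrowed from HI.csv.
toy = pd.DataFrame({
    "hhi": ["yes", "no", "yes"],
    "region": ["south", "west", "other"],
    "husby": [12.0, 3.5, 0.0],
})

yes_or_no = {"yes": 1, "no": 0}

# DataFrame.map on pandas >= 2.1, DataFrame.applymap on older versions.
elementwise = toy.map if hasattr(toy, "map") else toy.applymap

# Look each cell up in the dictionary; fall back to the cell itself.
converted = elementwise(lambda val: yes_or_no.get(val, val))

print(converted)
print(converted.dtypes)
```

Only the yes/no column changes; the string and numeric columns keep their original values.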

### Let's investigate some of the other string columns...

In [None]:
df.region.value_counts()

### We could keep mapping values with dictionaries as we did above, but that gets tedious quickly! Luckily, Pandas has a tool just for this type of problem: get_dummies.

In [None]:
pd.get_dummies(df.region, prefix='region')

In [None]:
dummy_regions = pd.get_dummies(df.region, prefix='region')
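
To see what `get_dummies` produces without loading the full dataset, here is a minimal sketch on a made-up Series (the region values are invented for illustration). Each distinct value becomes its own indicator column, named with the prefix followed by the value. Depending on the pandas version, the indicators may be stored as integers or as booleans.

```python
import pandas as pd

# Made-up values standing in for the region column of HI.csv.
region = pd.Series(["south", "west", "other", "south"], name="region")

dummies = pd.get_dummies(region, prefix="region")

# One column per distinct value, named <prefix>_<value>.
print(list(dummies.columns))

# Every row has exactly one indicator switched on.
print(dummies.sum(axis=1).tolist())
```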

### We already know how to join and merge tables, so let's attach the dummy columns to the original DataFrame!

In [None]:
df.join(dummy_regions)

In [None]:
df = df.join(dummy_regions)

In [None]:
df.columns

In [None]:
df = df.drop('region', axis=1)
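
The join-then-drop steps above can also be collapsed into one call by passing the whole frame to `get_dummies` with its `columns` parameter. The sketch below compares the two routes on an invented frame (`toy` and its values are hypothetical) and is meant as an illustration, not as the notebook's intended solution.

```python
import pandas as pd

# Hypothetical toy frame: invented values, column names borrowed from HI.csv.
toy = pd.DataFrame({
    "husby": [12.0, 3.5, 0.0],
    "region": ["south", "west", "other"],
})

# Route 1: build the dummies, join them on the index, drop the original column.
dummy_regions = pd.get_dummies(toy["region"], prefix="region")
joined = toy.join(dummy_regions).drop("region", axis=1)

# Route 2: let get_dummies encode the listed columns and drop them itself.
one_step = pd.get_dummies(toy, columns=["region"])

print(joined)
print(joined.equals(one_step))
```

The explicit route makes each step visible, which is useful while learning; the one-call route is shorter once the idea is familiar.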

In [None]:
df.head()

### Exercise: can you do the same for the race column? Bonus points for doing the education column too!

In [None]:
# your code here!

In [None]:
%load ../solutions/04_dummies.py