# Changing categorical data to numeric

First we need to import pandas.

In [55]:
import pandas as pd

Now I'm going to make up some data and put it into a dataframe.

In [56]:
# Here is a Python dictionary of some data
my_data = { "shadow": ["yes", "no", "unknown", "yes", "yes", "no", "no data"], 
            "year": [1, 2, 3, 4, 5, 6, 7]}

# Load the data from the dictionary into a pandas dataframe
df1 = pd.DataFrame.from_dict(my_data)

# Now let's look at the first five rows of the dataframe
df1.head()

Unnamed: 0,shadow,year
0,yes,1
1,no,2
2,unknown,3
3,yes,4
4,yes,5


## Method 1

We can add a new column with numerical values. I have chosen to map **yes** -> 0 and **no** -> 1, and to send everything else to -1.

In [57]:
# Series.replace([current values], [new (numerical) values], inplace=...)
# inplace=True would overwrite the existing column; inplace=False (the default)
# returns a new Series, which we assign to a new column
df1["shadow_numeric"] = df1["shadow"].replace(['yes', 'no', 'unknown', 'no data'], [0, 1, -1, -1], inplace=False)

In [58]:
# Now let's look at the dataframe again
df1.head()

Unnamed: 0,shadow,year,shadow_numeric
0,yes,1,0
1,no,2,1
2,unknown,3,-1
3,yes,4,0
4,yes,5,0
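
The `replace` call above has to list every value that should become -1. As a minimal sketch of an alternative (not part of the original notebook), `Series.map` with a dictionary can handle "everything else" without naming each value. The names `df` and `mapping` are just illustrative choices here.

```python
import pandas as pd

# Same made-up data as above
df = pd.DataFrame({"shadow": ["yes", "no", "unknown", "yes", "yes", "no", "no data"],
                   "year": [1, 2, 3, 4, 5, 6, 7]})

# Only the values we care about go in the mapping (illustrative name)
mapping = {"yes": 0, "no": 1}

# map() turns any value missing from the mapping into NaN,
# fillna(-1) sends those to -1, and astype(int) restores integer values
df["shadow_numeric"] = df["shadow"].map(mapping).fillna(-1).astype(int)

print(df)
```

This version keeps working if a new unexpected value (say, "maybe") shows up in the column, since anything not in the dictionary ends up as -1.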


## Method 2

We can create a "dummies" dataframe, with one 0/1 column per category, and then merge it with our existing dataframe.

In [59]:
df2 = pd.DataFrame.from_dict(my_data)

df2.head()

Unnamed: 0,shadow,year
0,yes,1
1,no,2
2,unknown,3
3,yes,4
4,yes,5


In [60]:
# pd.get_dummies(column of dataframe)
# dtype=int gives 0/1 columns (newer pandas versions return True/False by default)
dummies = pd.get_dummies(df2['shadow'], dtype=int)

dummies.head()

Unnamed: 0,no,no data,unknown,yes
0,0,0,0,1
1,1,0,0,0
2,0,0,1,0
3,0,0,0,1
4,0,0,0,1


Since we only care about yes or no values, we can delete or **drop** the other columns.

In [61]:
# df.drop([labels to drop], axis=...)
# axis='columns' drops columns; axis='index' (or 'rows') drops rows
dummies = dummies.drop(['no data', 'unknown'], axis='columns')

dummies.head()

Unnamed: 0,no,yes
0,0,1
1,1,0
2,0,0
3,0,1
4,0,1
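
Dropping columns by name assumes we know every unwanted category ahead of time. A hedged sketch of the opposite approach is to name the columns to keep instead. The variable names `shadow` and `wanted` are assumptions for illustration, not from the notebook above.

```python
import pandas as pd

# Same "shadow" values as in the made-up data
shadow = pd.Series(["yes", "no", "unknown", "yes", "yes", "no", "no data"], name="shadow")

# Categories we want to keep (illustrative name)
wanted = ["no", "yes"]

# reindex keeps only the wanted columns; if one of them never appeared
# in the data, it is created and filled with 0 instead of raising an error
dummies = pd.get_dummies(shadow, dtype=int).reindex(columns=wanted, fill_value=0)

print(dummies.head())
```

Rows such as "unknown" or "no data" end up with 0 in both columns, just as they do after `drop`.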


Finally, we can merge or **concatenate** our original dataframe with our dummies dataframe.

In [62]:
# pd.concat([dataframes to combine], axis=...)
# axis='columns' places them side by side; axis='index' stacks them as rows
df2 = pd.concat([df2, dummies], axis='columns')

In [63]:
df2.head()

Unnamed: 0,shadow,year,no,yes
0,yes,1,0,1
1,no,2,1,0
2,unknown,3,0,0
3,yes,4,0,1
4,yes,5,0,1
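
Because the dummies dataframe shares its row index with the original, `DataFrame.join` should give the same result as `pd.concat(..., axis='columns')` here. The following is a small self-contained sketch of Method 2 from start to finish, with `df` and `combined` as illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"shadow": ["yes", "no", "unknown", "yes", "yes", "no", "no data"],
                   "year": [1, 2, 3, 4, 5, 6, 7]})

# Build the 0/1 dummy columns and keep only "no" and "yes"
dummies = pd.get_dummies(df["shadow"], dtype=int)[["no", "yes"]]

# join aligns rows on the index, which both dataframes share
combined = df.join(dummies)

print(combined.head())
```

Note that `join` raises an error if the two dataframes have a column name in common, so this sketch relies on the data having no existing column called "no" or "yes".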
