# pandas.get_dummies

[reference](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html)

Convert categorical variable into dummy/indicator variables.

With structured data, you need to preprocess features before training. For example, a feature may take categorical values such as 'Mon, Tue, ..., Sun'. Those values cannot be fed to most models directly because they are not numerical. The pandas `get_dummies` function converts a categorical variable into 0 / 1 indicator variables, one per category. Let's learn it with an example!

In [1]:
import numpy as np
import pandas as pd

s = pd.Series(list('abca'))

Let's print the Series. It's simple!

In [2]:
print(s)

0    a
1    b
2    c
3    a
dtype: object


How about `pd.get_dummies`? This function converts the categorical variable ('a', 'b', 'c') into 0 / 1 columns, one per category.

In [3]:
print(pd.get_dummies(s))

   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0
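
As a further illustration, here is a minimal sketch that applies the same call to the weekday feature mentioned in the introduction. The Series `days` and its values are made up for this example. Note that in pandas 2.0 and later the default output is boolean (`True` / `False`), so `dtype=int` is passed here to get the 0 / 1 values shown above.

```python
import pandas as pd

# Hypothetical weekday feature, echoing the 'Mon, Tue, ..., Sun' example.
days = pd.Series(['Mon', 'Tue', 'Mon', 'Sun'], name='day')

# dtype=int requests 0 / 1 integers instead of the boolean default
# used by recent pandas versions.
day_dummies = pd.get_dummies(days, dtype=int)

# One column per distinct weekday; each row has exactly one 1.
print(day_dummies)
```

Each row contains a single 1 in the column of its weekday, which is why this encoding is often called one-hot encoding.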


It's a simple concept, and the result is easy to feed into a machine learning model. Next, let's learn about the options. <br>
The example below is about handling `np.nan`. Without any option, **np.nan** gets no column of its own, so its row is all zeros.

In [4]:
s1 = ['a', 'b', np.nan]
pd.get_dummies(s1)

   a  b
0  1  0
1  0  1
2  0  0


If you want a separate indicator column for missing values, pass the `dummy_na=True` option.

In [5]:
pd.get_dummies(s1, dummy_na=True)

   a  b  NaN
0  1  0    0
1  0  1    0
2  0  0    1
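
The difference between the two calls can be checked with row sums. The sketch below is only an illustration (the variable names are chosen for this example): without `dummy_na`, the missing row sums to 0, and with `dummy_na=True` every row sums to 1.

```python
import numpy as np
import pandas as pd

values = ['a', 'b', np.nan]

# dtype=int keeps the 0 / 1 representation regardless of pandas version.
default = pd.get_dummies(values, dtype=int)
with_na = pd.get_dummies(values, dummy_na=True, dtype=int)

# Sum across columns: a row that matches no indicator column sums to 0.
default_row_sums = default.sum(axis=1).tolist()
with_na_row_sums = with_na.sum(axis=1).tolist()

print(default_row_sums)
print(with_na_row_sums)
```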


How about the example below? Feature "A" takes the values 'a' and 'b', feature "B" takes 'a', 'b', and 'c', and "C" is numeric.

In [6]:
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
df

   A  B  C
0  a  b  1
1  b  a  2
2  a  c  3


In this case, pandas automatically builds each new feature name from the existing column name and the category value, joined by an underscore. The numeric column "C" is left unchanged.

In [7]:
pd.get_dummies(df)

   C  A_a  A_b  B_a  B_b  B_c
0  1    1    0    0    1    0
1  2    0    1    1    0    0
2  3    1    0    0    0    1
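
If only some of the categorical features should be converted, the `columns` option of `get_dummies` restricts the encoding to the listed columns. This option is not covered in the text above; the sketch below simply rebuilds the same DataFrame and encodes "A" alone.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# Encode only column "A"; "B" and "C" are passed through unchanged.
# dtype=int requests 0 / 1 integers instead of booleans.
only_a = pd.get_dummies(df, columns=['A'], dtype=int)

print(only_a)
```

The untouched columns come first, followed by the new `A_a` and `A_b` indicator columns.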


You can also designate the prefix of the new features with the `prefix` option, passing one prefix per encoded column.

In [8]:
pd.get_dummies(df, prefix=['col1', 'col2'])

   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1       1       0       0       1       0
1  2       0       1       1       0       0
2  3       1       0       0       0       1
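
The `prefix` option also accepts a dictionary that maps each original column name to its prefix, and `prefix_sep` changes the separator. The sketch below is only an illustration: the prefixes reuse `col1` and `col2` from the example above, and the `'.'` separator is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# Map each encoded column to its prefix explicitly, and join the prefix
# and category with '.' instead of the default '_'.
renamed = pd.get_dummies(
    df,
    prefix={'A': 'col1', 'B': 'col2'},
    prefix_sep='.',
    dtype=int,  # 0 / 1 integers instead of booleans
)

print(list(renamed.columns))
```

A dictionary avoids relying on the column order, which a plain list of prefixes depends on.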
