# Creating dummy variables in Pandas

- toc: true
- badges: false
- comments: true
- categories: [python, pandas]

A quick post to remind my future self of how to create dummy variables.

In [2]:
import pandas as pd

In [3]:
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'quality': ['good', 'excellent', 'very good', 'excellent', 'good']
})
df.head()

Unnamed: 0,id,quality
0,1,good
1,2,excellent
2,3,very good
3,4,excellent
4,5,good


Pandas makes creating dummies easy:

In [6]:
pd.get_dummies(df.quality)

Unnamed: 0,excellent,good,very good
0,0,1,0
1,1,0,0
2,0,0,1
3,1,0,0
4,0,1,0
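One caveat for my future self: from pandas 2.0 onwards, `get_dummies` returns boolean columns (`True`/`False`) by default, so the output may not show the 0/1 values above. A minimal sketch that asks for integers explicitly through the `dtype` argument (the name `int_dummies` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'quality': ['good', 'excellent', 'very good', 'excellent', 'good']
})

# Request integer dummies explicitly so the result is 0/1
# rather than True/False on newer pandas versions
int_dummies = pd.get_dummies(df.quality, dtype=int)
print(int_dummies)
```

The same `dtype=int` argument can be added to any of the `get_dummies` calls in this post.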


To make clear which column each dummy came from, use the `prefix` argument:

In [7]:
pd.get_dummies(df.quality, prefix='quality')

Unnamed: 0,quality_excellent,quality_good,quality_very good
0,0,1,0
1,1,0,0
2,0,0,1
3,1,0,0
4,0,1,0


Often when we work with dummies from a variable with $n$ distinct values, we create $n-1$ dummies and treat the remaining group as the reference group. The `drop_first` argument does this by dropping the first level, which here is `excellent` because string values are sorted alphabetically:

In [8]:
pd.get_dummies(df.quality, prefix='quality', drop_first=True)

Unnamed: 0,quality_good,quality_very good
0,1,0
1,0,0
2,0,1
3,0,0
4,1,0
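If a different reference group is wanted, one option is to convert the column to a categorical with the desired reference listed first, since `drop_first` removes the first category. The sketch below assumes we want `good` as the reference; the category order and the variable names are my own choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'quality': ['good', 'excellent', 'very good', 'excellent', 'good']
})

# List the desired reference group first; get_dummies follows category order
quality_dtype = pd.CategoricalDtype(categories=['good', 'very good', 'excellent'])
quality_cat = df.quality.astype(quality_dtype)

# drop_first now removes 'good' instead of 'excellent'
dummies_ref_good = pd.get_dummies(
    quality_cat, prefix='quality', drop_first=True, dtype=int
)
print(dummies_ref_good)
```

Rows where both remaining dummies are 0 then correspond to `good`.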


Usually, we'll want to use the dummies with the rest of the data, so it's convenient to have them in the original dataframe. One way to do this is to use `pd.concat` like so:

In [9]:
dummies = pd.get_dummies(df.quality, prefix='quality', drop_first=True)
df_with_dummies = pd.concat([df, dummies], axis=1)
df_with_dummies.head()

Unnamed: 0,id,quality,quality_good,quality_very good
0,1,good,1,0
1,2,excellent,0,0
2,3,very good,0,1
3,4,excellent,0,0
4,5,good,1,0


This works, but Pandas provides a much easier way, which is to pass the whole dataframe and list the columns to encode in the `columns` argument:

In [10]:
df_with_dummies1 = pd.get_dummies(df, columns=['quality'], drop_first=True)
df_with_dummies1

Unnamed: 0,id,quality_good,quality_very good
0,1,1,0
1,2,0,0
2,3,0,1
3,4,0,0
4,5,1,0


That's it. In one line we get a new dataframe that includes the dummies and excludes the original quality column.
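As a quick sanity check, the one-liner should match the `concat` result once the original `quality` column is dropped from the latter. A small sketch that rebuilds both frames and compares them (the names `two_step` and `one_step` are only for this example):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'quality': ['good', 'excellent', 'very good', 'excellent', 'good']
})

# Two-step approach: build dummies, concatenate, then drop the source column
dummies = pd.get_dummies(df.quality, prefix='quality', drop_first=True)
two_step = pd.concat([df, dummies], axis=1).drop(columns='quality')

# One-step approach: let get_dummies handle the column itself
one_step = pd.get_dummies(df, columns=['quality'], drop_first=True)

# Compare the two results; the original dataframe is not modified by either
print(two_step.equals(one_step))
print(list(df.columns))
```

Note that `get_dummies` returns a new dataframe, so `df` still has its `quality` column afterwards.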

# Sources

- [Data School](https://www.youtube.com/watch?v=0s_1IsROgDc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=24)