# Create dummy variables based on one column

In [1]:
# Package: pandas
# Function: get_dummies()

In [2]:
import pandas as pd
df = pd.DataFrame(['A','B','C','A','D'],columns=['User'])
df

Unnamed: 0,User
0,A
1,B
2,C
3,A
4,D


In [3]:
df_dum = pd.get_dummies(df['User'],prefix='User')
df_dum.head()

Unnamed: 0,User_A,User_B,User_C,User_D
0,1,0,0,0
1,0,1,0,0
2,0,0,1,0
3,1,0,0,0
4,0,0,0,1


## Drop First Column

Dropping the first dummy column is a common requirement. Any one of the dummy columns can be derived from the others, so dropping one keeps the same information in fewer columns. Keeping all of them also introduces perfect multicollinearity in many models (for example, linear regression with an intercept), so it is worth dropping one here.

In [4]:
df_dropped = pd.get_dummies(df['User'], prefix='User', drop_first=True)
df_dropped

Unnamed: 0,User_B,User_C,User_D
0,0,0,0
1,1,0,0
2,0,1,0
3,0,0,0
4,0,0,1
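
To illustrate that no information is lost, here is a minimal sketch that rebuilds the dropped `User_A` column from the remaining ones. The names `full`, `dropped` and `rebuilt_user_a` are introduced only for this example, and `dtype=int` is passed so the arithmetic does not depend on the default dtype of the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame(['A', 'B', 'C', 'A', 'D'], columns=['User'])

# dtype=int keeps the arithmetic below independent of the pandas default dtype
full = pd.get_dummies(df['User'], prefix='User', dtype=int)
dropped = pd.get_dummies(df['User'], prefix='User', drop_first=True, dtype=int)

# Each row has exactly one 1 across the full set of dummies, so the dropped
# column equals 1 minus the row sum of the remaining columns.
rebuilt_user_a = 1 - dropped.sum(axis=1)

print(rebuilt_user_a.tolist())                    # [1, 0, 0, 1, 0]
print((rebuilt_user_a == full['User_A']).all())   # True
```

A row of all zeros in the dropped frame therefore means "User A", which is why the first category is often called the reference level.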


## Change data type to float

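Notice that the data type of the created dummy values is `uint8` (the `dtype: object` line in the output below describes the Series returned by `.dtypes`, not the columns themselves). Newer pandas versions (2.0 and later) return `bool` columns by default instead. Sometimes we need to change the type to float.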

In [5]:
print(df_dropped.dtypes,'\n')

User_B    uint8
User_C    uint8
User_D    uint8
dtype: object 



In [6]:
-df_dropped
# note: uint8 is unsigned, so negating a 1 wraps around to 255

Unnamed: 0,User_B,User_C,User_D
0,0,0,0
1,255,0,0
2,0,255,0
3,0,0,0
4,0,0,255
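
The 255s above come from unsigned 8-bit wrap-around rather than from anything specific to dummy variables. A small NumPy sketch, using a made-up array `ones` in place of a dummy column, shows the same effect and how a signed or float type avoids it:

```python
import numpy as np

# Stand-in for one uint8 dummy column (hypothetical values for illustration)
ones = np.array([0, 1, 0], dtype=np.uint8)

# uint8 can only hold 0..255, so negation wraps modulo 256: -1 becomes 255
wrapped = -ones
print(wrapped.tolist())   # [0, 255, 0]

# A signed integer or floating-point type can represent -1 directly
signed = -ones.astype(np.int8)
floats = -ones.astype(np.float64)
print(signed.tolist())    # [0, -1, 0]
print(floats.tolist())    # [-0.0, -1.0, -0.0]
```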


In [7]:
# change the data type
df_dropped = df_dropped.astype('float')
df_dropped

Unnamed: 0,User_B,User_C,User_D
0,0.0,0.0,0.0
1,1.0,0.0,0.0
2,0.0,1.0,0.0
3,0.0,0.0,0.0
4,0.0,0.0,1.0


In [8]:
-df_dropped
# now they become -1.0s

Unnamed: 0,User_B,User_C,User_D
0,-0.0,-0.0,-0.0
1,-1.0,-0.0,-0.0
2,-0.0,-1.0,-0.0
3,-0.0,-0.0,-0.0
4,-0.0,-0.0,-1.0
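
As an alternative to converting afterwards, `get_dummies()` accepts a `dtype` argument, so the float columns can be requested at creation time. A minimal sketch on the same data (the names `df_float` and `negated` are introduced only for this example):

```python
import pandas as pd

df = pd.DataFrame(['A', 'B', 'C', 'A', 'D'], columns=['User'])

# Ask for float dummies directly instead of calling astype('float') later
df_float = pd.get_dummies(df['User'], prefix='User', drop_first=True, dtype=float)
print(df_float.dtypes)

# Negation now gives -1.0 instead of wrapping to 255
negated = -df_float
print(negated)
```

This avoids the intermediate integer frame and makes the intended type explicit where the dummies are created.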
