## Dummy Variables

Dummy variables represent categorical data (such as "Married" or "Single") numerically for machine learning. Assigning one integer code to each category would imply an order that does not exist, so you create one binary column per category instead: `1` marks the rows that belong to the category and `0` marks those that do not. scikit-learn's `OneHotEncoder` performs this transformation.
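As a minimal sketch of the idea, the snippet below encodes a single toy column. The `marital_status` values are made up for illustration and are not taken from `census.csv`.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy column (one feature, four rows); values are illustrative only
marital_status = np.array([["Married"], ["Single"], ["Married"], ["Divorced"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default, so convert it to a dense array
dummies = encoder.fit_transform(marital_status).toarray()

# The encoder sorts the categories, so the columns are Divorced, Married, Single
categories = list(encoder.categories_[0])
print(categories)
print(dummies)
```

Each row contains exactly one `1.0`, in the column of the category that row belongs to.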

In [1]:
#Import Libs
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
import pandas as pd
import numpy as np

In [2]:
#Suppress scikit-learn's DataConversionWarning messages
import warnings
from sklearn.exceptions import DataConversionWarning
warnings.filterwarnings(action='ignore', category=DataConversionWarning)

In [3]:
#Read the Database
df = pd.read_csv('/content/census.csv')
print(df)

       age          workclass  final-weight    education  education-num  \
0       39          State-gov         77516    Bachelors             13   
1       50   Self-emp-not-inc         83311    Bachelors             13   
2       38            Private        215646      HS-grad              9   
3       53            Private        234721         11th              7   
4       28            Private        338409    Bachelors             13   
...    ...                ...           ...          ...            ...   
32556   27            Private        257302   Assoc-acdm             12   
32557   40            Private        154374      HS-grad              9   
32558   58            Private        151910      HS-grad              9   
32559   22            Private        201490      HS-grad              9   
32560   52       Self-emp-inc        287927      HS-grad              9   

            marital-status          occupation    relationship    race  \
0            Never-marrie

In [4]:
#Split the DataFrame into features (X) and class labels (Y)
X = df.iloc[:,0:14].values
Y = df.iloc[:, 14].values

In [5]:
#Use OneHotEncoder to transform the categorical attributes into numeric dummy columns
onehotencorder = ColumnTransformer(transformers=[("OneHot", OneHotEncoder(), [1,3,5,6,7,8,9,13])],remainder='passthrough')
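The transformer above lists the categorical column positions by hand. As an alternative sketch, `make_column_selector` from `sklearn.compose` can pick the non-numeric columns by dtype. This assumes the data is passed as a DataFrame rather than as `.values`. The `toy` frame below is hypothetical and only mimics a few census columns.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy frame standing in for the census data
toy = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "Private"],
    "education": ["Bachelors", "Bachelors", "HS-grad"],
})

# Select every column that is not numeric
selector = make_column_selector(dtype_exclude="number")
categorical_columns = selector(toy)

transformer = ColumnTransformer(
    transformers=[("OneHot", OneHotEncoder(), selector)],
    remainder="passthrough",
)
encoded = transformer.fit_transform(toy)
# The result may be sparse or dense depending on its density; handle both
if hasattr(encoded, "toarray"):
    encoded = encoded.toarray()

print(categorical_columns)
print(encoded.shape)
```

Selecting by dtype avoids keeping an index list in sync with the file layout, but it only works if every categorical column was actually read as a non-numeric type.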

In [6]:
#Apply OneHotEncoder
X = onehotencorder.fit_transform(X).toarray()

In [7]:
#After creating the dummy columns, convert the array to a DataFrame
X = pd.DataFrame(data=X)

In [8]:
#Show the Results
print(X)

       0    1    2    3    4    5    6    7    8    9    ...  98   99   100  \
0      0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  ...  0.0  1.0  0.0   
1      0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   
2      0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   
3      0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   
4      0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
...    ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
32556  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   
32557  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   
32558  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   
32559  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   
32560  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  ...  0.0  1.0  0.0   

       101   102       103   104      105  106   10