### NOMINAL OR ONE HOT ENCODING


One-hot encoding, sometimes called nominal encoding because it is applied to nominal (unordered) categorical variables, is a technique for representing categorical data as numerical data that machine learning algorithms can consume. Each category is represented as a binary vector with one position per unique category: the position for that category is 1 and every other position is 0. For example, a categorical variable "color" with three possible values (red, green, blue) can be one-hot encoded as follows:

1. Red: [1, 0, 0]
2. Green: [0, 1, 0]
3. Blue: [0, 0, 1]
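
The mapping above can be reproduced by hand before reaching for a library. The following is a minimal sketch in plain Python; the helper name `one_hot` and the fixed category order are assumptions made for illustration, not part of scikit-learn.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 everywhere else."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# The column order follows the list above: red, green, blue.
categories = ["red", "green", "blue"]
vectors = {color: one_hot(color, categories) for color in categories}
print(vectors)
```

Note that scikit-learn's `OneHotEncoder` sorts categories alphabetically by default, so its columns come out as blue, green, red rather than in the order used here.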

In [21]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

In [22]:
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'green', 'red', 'blue']
})

In [23]:
df

Unnamed: 0,color
0,red
1,blue
2,green
3,green
4,red
5,blue


In [24]:
encoder = OneHotEncoder()
# fit_transform returns a sparse matrix; toarray() converts it to a dense NumPy array
encoded = encoder.fit_transform(df[['color']]).toarray()

In [25]:
# get_feature_names_out() builds each column name from the feature name and the category value, e.g. color_red
encoded_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out())

In [26]:
encoded_df

Unnamed: 0,color_blue,color_green,color_red
0,0.0,0.0,1.0
1,1.0,0.0,0.0
2,0.0,1.0,0.0
3,0.0,1.0,0.0
4,0.0,0.0,1.0
5,1.0,0.0,0.0


In [27]:
pd.concat([df,encoded_df],axis=1)

Unnamed: 0,color,color_blue,color_green,color_red
0,red,0.0,0.0,1.0
1,blue,1.0,0.0,0.0
2,green,0.0,1.0,0.0
3,green,0.0,1.0,0.0
4,red,0.0,0.0,1.0
5,blue,1.0,0.0,0.0


In [28]:
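# Encode new data: pass a DataFrame with the same column name used during fit,
# which avoids scikit-learn's "X does not have valid feature names" warning
encoder.transform(pd.DataFrame({'color': ['red']})).toarray()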



array([[0., 0., 1.]])

In [None]:
################################################################################################################################

In [29]:
import seaborn as sns

In [31]:
df = sns.load_dataset('tips')
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [32]:
from sklearn.preprocessing import OneHotEncoder

In [35]:
encoder = OneHotEncoder()
# Encode several categorical columns at once; each column gets its own group of output columns
encoded = encoder.fit_transform(df[['sex','smoker','time']]).toarray()

In [36]:
df_encoded = pd.DataFrame(encoded,columns=encoder.get_feature_names_out())

In [37]:
df_encoded

Unnamed: 0,sex_Female,sex_Male,smoker_No,smoker_Yes,time_Dinner,time_Lunch
0,1.0,0.0,1.0,0.0,1.0,0.0
1,0.0,1.0,1.0,0.0,1.0,0.0
2,0.0,1.0,1.0,0.0,1.0,0.0
3,0.0,1.0,1.0,0.0,1.0,0.0
4,1.0,0.0,1.0,0.0,1.0,0.0
...,...,...,...,...,...,...
239,0.0,1.0,1.0,0.0,1.0,0.0
240,1.0,0.0,0.0,1.0,1.0,0.0
241,0.0,1.0,0.0,1.0,1.0,0.0
242,0.0,1.0,1.0,0.0,1.0,0.0


In [43]:
df_new = pd.concat([df[['sex','smoker','time']],df_encoded],axis =1)
df_new.head()

Unnamed: 0,sex,smoker,time,sex_Female,sex_Male,smoker_No,smoker_Yes,time_Dinner,time_Lunch
0,Female,No,Dinner,1.0,0.0,1.0,0.0,1.0,0.0
1,Male,No,Dinner,0.0,1.0,1.0,0.0,1.0,0.0
2,Male,No,Dinner,0.0,1.0,1.0,0.0,1.0,0.0
3,Male,No,Dinner,0.0,1.0,1.0,0.0,1.0,0.0
4,Female,No,Dinner,1.0,0.0,1.0,0.0,1.0,0.0


In [44]:
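# Keep the first 9 rows and reorder the columns so that each original column
# sits next to its one-hot encoded columns
df_3 = df_new.iloc[0:9, [0,3,4,1,5,6,2,7,8]]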

In [45]:
df_3.head()

Unnamed: 0,sex,sex_Female,sex_Male,smoker,smoker_No,smoker_Yes,time,time_Dinner,time_Lunch
0,Female,1.0,0.0,No,1.0,0.0,Dinner,1.0,0.0
1,Male,0.0,1.0,No,1.0,0.0,Dinner,1.0,0.0
2,Male,0.0,1.0,No,1.0,0.0,Dinner,1.0,0.0
3,Male,0.0,1.0,No,1.0,0.0,Dinner,1.0,0.0
4,Female,1.0,0.0,No,1.0,0.0,Dinner,1.0,0.0
