# One Hot Encoding 

#### One-hot encoding is a technique used to convert categorical data into a binary matrix (or a series of binary vectors), where each category is represented as a separate column, and a 1 or 0 indicates the presence or absence of that category in a given observation.

#### This encoding is especially useful for nominal (categorical) data where no ordinal relationship exists between the categories. One-Hot Encoding eliminates the risk of introducing unintended ordinal relationships (as can happen with label encoding) because each category gets its own unique binary column.

#### Key characteristics of One-Hot encoding:
* binary representation
* no assumed order 
* used for nominal data
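
The "no assumed order" point is easiest to see next to label encoding. The sketch below is only an illustration: it uses `pd.factorize` as a stand-in for a label encoder (an assumption, since the notebook does not show one) on the same colour list used in the example that follows.

```python
import pandas as pd

colors = pd.Series(['Red', 'Blue', 'Green', 'Blue', 'Red'], name='Color')

# Label encoding: every category becomes one integer code.
# pd.factorize numbers the categories in order of first appearance.
codes, uniques = pd.factorize(colors)
# codes   -> [0, 1, 2, 1, 0]
# uniques -> ['Red', 'Blue', 'Green']
# A model may read 0 < 1 < 2 as Red < Blue < Green, an order that does not exist.

# One-hot encoding: one 0/1 column per category, so no order is implied.
one_hot = pd.get_dummies(colors, dtype=int)
print(one_hot)
```

Each row of `one_hot` contains exactly one 1, so no colour is numerically "larger" than another.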

#### Example of One-Hot encoding
###### ['Red', 'Blue', 'Green', 'Blue', 'Red']
| Color | Red | Blue | Green |
|-------|-----|------|-------|
| Red   |  1  |   0  |   0   |
| Blue  |  0  |   1  |   0   |
| Green |  0  |   0  |   1   |
| Blue  |  0  |   1  |   0   |
| Red   |  1  |   0  |   0   |
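
The table above can be reproduced by hand, which shows what the encoding does before any library is involved. This is a minimal sketch using NumPy; the variable names are illustrative, and the columns are ordered by first appearance to match the table.

```python
import numpy as np

colors = ['Red', 'Blue', 'Green', 'Blue', 'Red']

# Unique categories in order of first appearance: ['Red', 'Blue', 'Green']
categories = list(dict.fromkeys(colors))
column_of = {category: i for i, category in enumerate(categories)}

# Start with all zeros, then set a single 1 per row in that row's category column
one_hot = np.zeros((len(colors), len(categories)), dtype=int)
for row, color in enumerate(colors):
    one_hot[row, column_of[color]] = 1

print(categories)
print(one_hot)
```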


In [1]:
import pandas as pd
import numpy as np

# Load the sample dataset (this path is specific to the author's machine)
df = pd.read_csv(r'C:\Users\Lenovo\Downloads\nominal_data.csv')
df.head()

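| Unnamed: 0 | Customer_ID | Favorite_Color | Preferred_Store | Subscription_Type | Feedback |
|------------|-------------|----------------|-----------------|-------------------|----------|
| 0 | 101 | Red | Store A | Basic | Good |
| 1 | 102 | Blue | Store B | Premium | Bad |
| 2 | 103 | Green | Store C | Premium | Neutral |
| 3 | 104 | Yellow | Store A | Basic | Good |
| 4 | 105 | Blue | Store B | Basic | Bad |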


## One Hot Encoding using Pandas

In [3]:
# Encode the two nominal columns; drop_first=True drops the first category of each
pd.get_dummies(df, columns=['Favorite_Color', 'Preferred_Store'], drop_first=True)


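| Unnamed: 0 | Customer_ID | Subscription_Type | Feedback | Favorite_Color_Green | Favorite_Color_Red | Favorite_Color_Yellow | Preferred_Store_Store B | Preferred_Store_Store C |
|------------|-------------|-------------------|----------|----------------------|--------------------|-----------------------|-------------------------|-------------------------|
| 0 | 101 | Basic | Good | False | True | False | False | False |
| 1 | 102 | Premium | Bad | False | False | False | True | False |
| 2 | 103 | Premium | Neutral | True | False | False | False | True |
| 3 | 104 | Basic | Good | False | False | True | False | False |
| 4 | 105 | Basic | Bad | False | False | False | True | False |
| 5 | 106 | Premium | Neutral | False | True | False | False | True |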
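
In the output above there is no `Favorite_Color_Blue` or `Preferred_Store_Store A` column. That is the effect of `drop_first=True`: the first category of each column (in sorted order) is dropped and is represented by a row of all zeros (`False`) in the remaining columns. A small sketch on a made-up three-row frame, not the notebook's CSV, makes the difference visible:

```python
import pandas as pd

# Hypothetical mini-frame with one row per store
stores = pd.DataFrame({'Preferred_Store': ['Store A', 'Store B', 'Store C']})

# Full encoding: one column per category
full = pd.get_dummies(stores, columns=['Preferred_Store'], dtype=int)

# Reduced encoding: the 'Store A' column is dropped,
# so 'Store A' is the row where every remaining column is 0
reduced = pd.get_dummies(stores, columns=['Preferred_Store'],
                         drop_first=True, dtype=int)

print(full)
print(reduced)
```

Dropping one column per feature loses no information, because the dropped category can always be inferred from the others being zero.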


In [7]:
from sklearn.model_selection import train_test_split

# Features: Favorite_Color and Preferred_Store; target: Feedback (last column)
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, 1:3], df.iloc[:, -1:], test_size=0.2, random_state=23
)

## One Hot Encoding using Sklearn

In [8]:
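from sklearn.preprocessing import OneHotEncoder

# drop='first' removes the first category of each feature;
# sparse_output=False returns a dense NumPy array instead of a sparse matrix
ohe = OneHotEncoder(drop='first', sparse_output=False)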

In [9]:
# Learn the categories from the training data and encode it in one step
X_train_new = ohe.fit_transform(X_train[['Favorite_Color', 'Preferred_Store']])

In [10]:
X_train_new

array([[1., 0., 0., 0., 1.],
       [0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.]])