## Data Encoding

1. Nominal/OHE Encoding
2. Label and Ordinal Encoding
3. Target Guided Ordinal Encoding 

### Nominal/OHE Encoding
One-hot encoding (OHE), also known as nominal encoding, represents categorical data numerically so that machine learning algorithms can use it. Each category becomes a binary vector with one position per unique category: the position for the observed category is 1 and every other position is 0. For example, a categorical variable "color" with three possible values (red, green, blue) can be one-hot encoded as follows:

1. Red: [1, 0, 0]
2. Green: [0, 1, 0]
3. Blue: [0, 0, 1]
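The mapping above can be sketched by hand in plain Python. This is only an illustration of the idea, not how scikit-learn implements it; the helper name `one_hot` and the fixed category order (red, green, blue) are assumptions made for this example.

```python
def one_hot(values, categories):
    """Return one binary vector per value (hypothetical helper for illustration)."""
    # Map each category to its position in the vector.
    index = {category: position for position, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)  # start with all zeros
        vector[index[value]] = 1        # set the slot for this category
        vectors.append(vector)
    return vectors


# Assumed category order, chosen to match the list above.
categories = ["red", "green", "blue"]
vectors = one_hot(["red", "green", "blue"], categories)
print(vectors)
```

Note that the vector layout depends entirely on the category order chosen; a library encoder may order the categories differently (for example, alphabetically).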

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

In [3]:
## Create a simple dataframe
df = pd.DataFrame({
    'color': ['red','blue','green','green', 'red','blue']
})

In [5]:
df.head()

|   | color |
|---|-------|
| 0 | red   |
| 1 | blue  |
| 2 | green |
| 3 | green |
| 4 | red   |


In [7]:
## Create an instance of OneHotEncoder
encoder = OneHotEncoder()

In [11]:
## Fit the encoder and transform the column.
## fit_transform returns a sparse matrix by default; .toarray() makes it dense.
encoded = encoder.fit_transform(df[['color']]).toarray()

In [13]:
## Wrap the encoded array in a dataframe with readable column names
encoder_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out())

In [17]:
encoder_df

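|   | color_blue | color_green | color_red |
|---|------------|-------------|-----------|
| 0 | 0.0        | 0.0         | 1.0       |
| 1 | 1.0        | 0.0         | 0.0       |
| 2 | 0.0        | 1.0         | 0.0       |
| 3 | 0.0        | 1.0         | 0.0       |
| 4 | 0.0        | 0.0         | 1.0       |
| 5 | 1.0        | 0.0         | 0.0       |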


In [23]:
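## Encode new data with the already-fitted encoder.
## Passing a dataframe with the same column name avoids a feature-names warning.
encoder.transform(pd.DataFrame({'color': ['blue']})).toarray()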



array([[1., 0., 0.]])

In [25]:
## Join the encoded columns back onto the original dataframe
pd.concat([df, encoder_df], axis=1)

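|   | color | color_blue | color_green | color_red |
|---|-------|------------|-------------|-----------|
| 0 | red   | 0.0        | 0.0         | 1.0       |
| 1 | blue  | 1.0        | 0.0         | 0.0       |
| 2 | green | 0.0        | 1.0         | 0.0       |
| 3 | green | 0.0        | 1.0         | 0.0       |
| 4 | red   | 0.0        | 0.0         | 1.0       |
| 5 | blue  | 1.0        | 0.0         | 0.0       |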


In [43]:
## Load the tips dataset that ships with seaborn
import seaborn as sns
df = sns.load_dataset('tips')

In [29]:
encoder = OneHotEncoder()

In [45]:
encoded = encoder.fit_transform(df[['day']]).toarray()

In [47]:
## Wrap the encoded 'day' column in a dataframe with readable column names
encoder_df = pd.DataFrame(encoded, columns=encoder.get_feature_names_out())

In [49]:
encoder_df

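|     | day_Fri | day_Sat | day_Sun | day_Thur |
|-----|---------|---------|---------|----------|
| 0   | 0.0     | 0.0     | 1.0     | 0.0      |
| 1   | 0.0     | 0.0     | 1.0     | 0.0      |
| 2   | 0.0     | 0.0     | 1.0     | 0.0      |
| 3   | 0.0     | 0.0     | 1.0     | 0.0      |
| 4   | 0.0     | 0.0     | 1.0     | 0.0      |
| ... | ...     | ...     | ...     | ...      |
| 239 | 0.0     | 1.0     | 0.0     | 0.0      |
| 240 | 0.0     | 1.0     | 0.0     | 0.0      |
| 241 | 0.0     | 1.0     | 0.0     | 0.0      |
| 242 | 0.0     | 1.0     | 0.0     | 0.0      |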
