## Target Guided Ordinal Encoding


Target Guided Ordinal Encoding is a technique for encoding a categorical variable based on its relationship with the target variable. It is useful when a categorical variable has a large number of unique categories and we want to use it as a feature in a machine learning model.

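In Target Guided Ordinal Encoding, we replace each category with a number derived from the target variable, typically the mean or median of the target over the rows in that category. Because a category with a higher target statistic receives a higher encoded value, the encoded feature rises with the typical target value of each category, which can improve the predictive power of our model. In the strict ordinal form, the categories are ranked by this statistic and replaced by their rank; the examples below use the statistic itself as the encoded value.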

In [3]:
import pandas as pd 
df = pd.DataFrame({'city':['kathmandu', 'pokhara', 'butwal', 'butwal', 'kathmandu', 'biratnagar'],
                   'price':[200, 400, 600, 500, 250, 300]})
df

Unnamed: 0,city,price
0,kathmandu,200
1,pokhara,400
2,butwal,600
3,butwal,500
4,kathmandu,250
5,biratnagar,300


In [8]:
# Mean price per city; this Series acts as the category -> value lookup.
mean_price = df.groupby('city')['price'].mean()
mean_price

city
biratnagar    300.0
butwal        550.0
kathmandu     225.0
pokhara       400.0
Name: price, dtype: float64

In [11]:
# Replace each city with the mean price of its group.
df['city_encoded'] = df['city'].map(mean_price)
df

Unnamed: 0,city,price,city_encoded
0,kathmandu,200,225.0
1,pokhara,400,400.0
2,butwal,600,550.0
3,butwal,500,550.0
4,kathmandu,250,225.0
5,biratnagar,300,300.0
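
The encoding above stores the group mean itself. If integer ranks are wanted instead (the "ordinal" part of the name), one possible sketch is to sort the categories by their mean target and number them in that order. The names `rank_map` and `city_rank` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Same toy data as in the cells above.
df = pd.DataFrame({
    'city': ['kathmandu', 'pokhara', 'butwal', 'butwal', 'kathmandu', 'biratnagar'],
    'price': [200, 400, 600, 500, 250, 300],
})

# Mean target per category, sorted from lowest to highest.
mean_price = df.groupby('city')['price'].mean().sort_values()

# Number the categories 1..k in that order (hypothetical names).
rank_map = {city: rank for rank, city in enumerate(mean_price.index, start=1)}
df['city_rank'] = df['city'].map(rank_map)

print(rank_map)
print(df)
```

With ranks, only the order of the categories is kept; the size of the gaps between group means is discarded.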


In [13]:
df[['price','city_encoded']]


Unnamed: 0,price,city_encoded
0,200,225.0
1,400,400.0
2,600,550.0
3,500,550.0
4,250,225.0
5,300,300.0
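
The description at the top mentions the median as an alternative to the mean. A minimal sketch of a reusable helper with a switchable statistic might look like the following. The function name `target_guided_encode` and the extra outlier row (`butwal`, 5000) are assumptions added purely for illustration.

```python
import pandas as pd

def target_guided_encode(frame, column, target, agg='mean'):
    """Return a Series where each category is replaced by an aggregate of the target."""
    stats = frame.groupby(column)[target].agg(agg)
    return frame[column].map(stats)

# Toy data from above plus one made-up outlier row to separate mean from median.
df = pd.DataFrame({
    'city': ['kathmandu', 'pokhara', 'butwal', 'butwal', 'kathmandu', 'biratnagar', 'butwal'],
    'price': [200, 400, 600, 500, 250, 300, 5000],
})

df['city_mean'] = target_guided_encode(df, 'city', 'price', agg='mean')
df['city_median'] = target_guided_encode(df, 'city', 'price', agg='median')
print(df)
```

The median version is less sensitive to the outlier: `butwal` is encoded as 600 by the median, while its mean is pulled above 2000.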


In [14]:
# Example on a real dataset: the seaborn "tips" data
import seaborn as sns
df = sns.load_dataset('tips')
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [15]:
df['time'].value_counts()

time
Dinner    176
Lunch      68
Name: count, dtype: int64

In [33]:
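# `time` is a categorical column; observed=True groups only the categories present in the data.
time_encoded = df.groupby('time', observed=True)['total_bill'].mean()
time_encoded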

time
Lunch     17.168676
Dinner    20.797159
Name: total_bill, dtype: float64

In [34]:
# Replace each meal time with the mean total bill of its group.
df['time_encoded'] = df['time'].map(time_encoded)
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,time_encoded
0,16.99,1.01,Female,No,Sun,Dinner,2,20.797159
1,10.34,1.66,Male,No,Sun,Dinner,3,20.797159
2,21.01,3.5,Male,No,Sun,Dinner,3,20.797159
3,23.68,3.31,Male,No,Sun,Dinner,2,20.797159
4,24.59,3.61,Female,No,Sun,Dinner,4,20.797159
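
One detail worth checking when the source column has a categorical dtype, as `time` does here: depending on the pandas version, the mapped column may itself come back with a `category` dtype rather than as plain floats. The sketch below uses a small made-up stand-in for the tips data (the name `tips_like` and its values are hypothetical) and casts the result explicitly so the encoded feature is numeric.

```python
import pandas as pd

# Small stand-in for the tips data; the values are made up for illustration.
tips_like = pd.DataFrame({
    'time': pd.Categorical(['Dinner', 'Dinner', 'Lunch', 'Lunch'],
                           categories=['Lunch', 'Dinner']),
    'total_bill': [20.0, 30.0, 10.0, 14.0],
})

# Mean target per category; observed=True skips categories with no rows.
time_means = tips_like.groupby('time', observed=True)['total_bill'].mean()

# Map, then cast so the encoded column is float64 regardless of what map returns.
tips_like['time_encoded'] = tips_like['time'].map(time_means).astype(float)

print(tips_like.dtypes)
print(tips_like)
```

The explicit `astype(float)` is harmless when the column is already numeric, so it can be kept as a defensive step.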


In [35]:
df[['time_encoded','total_bill']]

Unnamed: 0,time_encoded,total_bill
0,20.797159,16.99
1,20.797159,10.34
2,20.797159,21.01
3,20.797159,23.68
4,20.797159,24.59
...,...,...
239,20.797159,29.03
240,20.797159,27.18
241,20.797159,22.67
242,20.797159,17.82
