### Target Guided Ordinal Encoding 

Target guided ordinal encoding is a technique for encoding a categorical variable based on its relationship with the target variable. It is useful when a categorical variable has a large number of unique categories and we want to use it as a feature in a machine learning model, because it produces a single numeric column instead of one column per category (as one-hot encoding would).

In target guided ordinal encoding, we replace each category with a numerical value computed from the target variable for that category, typically its mean or median. Categories with a higher average target therefore receive a higher encoded value, so the encoded feature is monotonically related to the per-category target average, which can improve the predictive power of the model.
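The description above allows either the mean or the median of the target. The sketch below computes both on a small toy frame, plus a rank-based variant (sort categories by their mean target and number them 0, 1, 2, ...), which is a common reading of the "ordinal" in the name. It is an illustration under these assumptions, not the only way to implement the idea; the column names `city_mean_enc`, `city_median_enc` and `city_rank_enc` are made up for this example.

```python
import pandas as pd

# Toy data with the same values as the sample DataFrame used in this section
df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# Per-category statistics of the target
city_mean = df.groupby('city')['price'].mean()
city_median = df.groupby('city')['price'].median()

# Rank-based variant: order categories by mean target, then assign 0..k-1
ordered = city_mean.sort_values().index
city_rank = {city: rank for rank, city in enumerate(ordered)}

df['city_mean_enc'] = df['city'].map(city_mean)
df['city_median_enc'] = df['city'].map(city_median)
df['city_rank_enc'] = df['city'].map(city_rank)
print(df)
```

On this toy data the median and the mean coincide, because each city has at most two rows. The rank variant keeps only the ordering of the categories and discards the distances between their means.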

```python
import pandas as pd

# Create a sample dataframe with a categorical variable (city) and a target variable (price)
df = pd.DataFrame({
    'city' : ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price' : [200, 150, 300, 250, 180, 320]
})
```

```python
df
```

|   | city     | price |
|---|----------|-------|
| 0 | New York | 200   |
| 1 | London   | 150   |
| 2 | Paris    | 300   |
| 3 | Tokyo    | 250   |
| 4 | New York | 180   |
| 5 | Paris    | 320   |


```python
# Mean price per city, returned as a {city: mean_price} dictionary
mean_price = df.groupby('city')['price'].mean().to_dict()
```

```python
mean_price
```

```
{'London': 150.0, 'New York': 190.0, 'Paris': 310.0, 'Tokyo': 250.0}
```
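Each entry of this dictionary is simply the average target within its category. Writing $x_i$ for the city in row $i$, $y_i$ for its price and $n_c$ for the number of rows with city $c$ (notation introduced here for illustration), the values above follow from:

$$
\text{enc}(c) = \frac{1}{n_c} \sum_{i \,:\, x_i = c} y_i,
\qquad
\text{enc}(\text{New York}) = \frac{200 + 180}{2} = 190,
\qquad
\text{enc}(\text{Paris}) = \frac{300 + 320}{2} = 310.
$$

Cities with a single row (London, Tokyo) keep that row's price as their encoded value.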

```python
# Replace each city with the mean price of that city
df['city_encoded'] = df['city'].map(mean_price)
```

```python
df
```

|   | city     | price | city_encoded |
|---|----------|-------|--------------|
| 0 | New York | 200   | 190.0        |
| 1 | London   | 150   | 150.0        |
| 2 | Paris    | 300   | 310.0        |
| 3 | Tokyo    | 250   | 250.0        |
| 4 | New York | 180   | 190.0        |
| 5 | Paris    | 320   | 310.0        |


```python
df[['price', 'city_encoded']]
```

|   | price | city_encoded |
|---|-------|--------------|
| 0 | 200   | 190.0        |
| 1 | 150   | 150.0        |
| 2 | 300   | 310.0        |
| 3 | 250   | 250.0        |
| 4 | 180   | 190.0        |
| 5 | 320   | 310.0        |
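`Series.map` returns NaN for any category that is missing from the mapping, which happens when new data contains a city that was not present when the means were computed. A minimal sketch of one way to handle this, assuming the overall target mean is an acceptable fallback (a common choice, not the only one); "Berlin" is a hypothetical unseen city.

```python
import pandas as pd

# Same sample data as above; the per-city means are learned from these rows
train = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})
mean_price = train.groupby('city')['price'].mean().to_dict()
global_mean = train['price'].mean()  # assumed fallback for unseen categories

# 'Berlin' is a hypothetical city that never appears in the training rows
new = pd.DataFrame({'city': ['Paris', 'Berlin']})
raw = new['city'].map(mean_price)  # Paris -> 310.0, Berlin -> NaN
new['city_encoded'] = raw.fillna(global_mean)
print(new)
```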


#### Task: Encode `time` with respect to `total_bill` using target guided encoding on the tips dataset

```python
import seaborn as sns
df = sns.load_dataset('tips')
```

```python
df.head()
```

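|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |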


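```python
# 'time' is a categorical column in the tips dataset; passing observed=True
# avoids the FutureWarning recent pandas versions emit for categorical groupers
mean_time = df.groupby('time', observed=True)['total_bill'].mean()
mean_time
```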

```
time
Lunch     17.168676
Dinner    20.797159
Name: total_bill, dtype: float64
```

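```python
# Map each row's time to its mean total_bill; astype(float) because map() on a
# categorical column returns a Categorical when the mapping is one-to-one
df['time_target_encoded'] = df['time'].map(mean_time).astype(float)
df[['time', 'total_bill', 'time_target_encoded']]
```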

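|     | time   | total_bill | time_target_encoded |
|-----|--------|------------|---------------------|
| 0   | Dinner | 16.99      | 20.797159           |
| 1   | Dinner | 10.34      | 20.797159           |
| 2   | Dinner | 21.01      | 20.797159           |
| 3   | Dinner | 23.68      | 20.797159           |
| 4   | Dinner | 24.59      | 20.797159           |
| ... | ...    | ...        | ...                 |
| 239 | Dinner | 29.03      | 20.797159           |
| 240 | Dinner | 27.18      | 20.797159           |
| 241 | Dinner | 22.67      | 20.797159           |
| 242 | Dinner | 17.82      | 20.797159           |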
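When the encoded column is used as a model feature, computing the means over the full dataset (as in the cells above) lets each row's own `total_bill` contribute to its own feature value. A common safeguard is to learn the means on the training split only and reuse them for the other rows. The sketch below assumes that setup; `fit_target_encoder` and `apply_target_encoder` are hypothetical helper names, and a small hand-made frame stands in for the tips data so the example runs without downloading anything.

```python
import pandas as pd


def fit_target_encoder(train: pd.DataFrame, col: str, target: str) -> tuple[dict, float]:
    """Learn per-category target means, plus a global fallback, from training rows only."""
    # observed=True only matters for categorical columns (e.g. tips['time']);
    # it keeps just the categories present in `train`.
    means = train.groupby(col, observed=True)[target].mean().to_dict()
    return means, float(train[target].mean())


def apply_target_encoder(data: pd.DataFrame, col: str, means: dict, fallback: float) -> pd.Series:
    """Replace each category with its learned mean; unseen categories get `fallback`."""
    # astype(object) turns a categorical column into plain labels, so map()
    # yields a float column rather than a Categorical.
    return data[col].astype(object).map(means).astype(float).fillna(fallback)


# Small hand-made stand-in for the tips data (illustrative values, not from the dataset)
toy = pd.DataFrame({
    'time': pd.Categorical(['Dinner', 'Lunch', 'Dinner', 'Lunch', 'Dinner', 'Lunch']),
    'total_bill': [20.0, 10.0, 30.0, 14.0, 40.0, 8.0],
})
train_df, test_df = toy.iloc[:4], toy.iloc[4:]  # fixed split keeps the example deterministic

means, fallback = fit_target_encoder(train_df, 'time', 'total_bill')
test_encoded = apply_target_encoder(test_df, 'time', means, fallback)
print(means)                  # learned from the first four rows only
print(test_encoded.tolist())  # the test rows' own bills (40.0, 8.0) are not used
```

With the real tips data, the same two calls would take `'time'` and `'total_bill'` on whatever train/test split the model uses.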
