## Target Guided Ordinal Encoding

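Target Guided Ordinal Encoding replaces each category with a number derived from the target. That number is either the **mean (or median) of the target** for that category, or the category's rank when the categories are sorted by that statistic.  
It is useful when a categorical feature has **many unique categories** (high cardinality). It produces a single numeric column, whereas one-hot encoding would create one column per category.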
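
The mean and the median can give quite different encodings when a category contains outliers. A minimal sketch on a small hypothetical dataset compares them. The `city` values `A`/`B` and the prices are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: city 'A' has one outlier price (1000)
df = pd.DataFrame({
    'city': ['A', 'A', 'A', 'B', 'B'],
    'price': [100, 110, 1000, 200, 220],
})

# Per-category statistic of the target; the median is less sensitive to outliers
mean_map = df.groupby('city')['price'].mean()
median_map = df.groupby('city')['price'].median()

# Replace each category with its statistic
df['city_mean_enc'] = df['city'].map(mean_map)
df['city_median_enc'] = df['city'].map(median_map)
print(df)
```

Because of the outlier, city `A` encodes to about 403.3 by mean but to 110 by median. City `B` gets 210 either way.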

Each category's encoded value follows its average target, so the encoded feature has an ordered (monotonic) relationship with the target. This can make the feature easier for a model to use than arbitrary label codes.

**Note:** This can cause **target leakage** if it is not applied carefully. Compute the category statistics on the training data only, then apply that same mapping to validation and test data.
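
One way to follow this note is sketched below on a hypothetical train/test split. The mapping is fit on the training rows and then reused for the test rows. Filling categories that never appear in training with the global training mean is an assumption here. It is one common fallback, not a fixed rule.

```python
import pandas as pd

# Hypothetical train/test split of the same kind of data
train = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'New York', 'Paris'],
    'price': [200, 150, 300, 180, 320],
})
test = pd.DataFrame({'city': ['Paris', 'Tokyo', 'London']})

# Fit: per-category mean computed on the training rows only
city_means = train.groupby('city')['price'].mean()
# Assumed fallback for categories unseen in training: the global training mean
global_mean = train['price'].mean()

# Apply the same mapping to both splits. map() returns NaN for unseen
# categories (here 'Tokyo'), which are then filled with the fallback.
train['city_encoded'] = train['city'].map(city_means)
test['city_encoded'] = test['city'].map(city_means).fillna(global_mean)
print(test)
```

`Tokyo` does not appear in `train`, so it receives the fallback value 230.0, which is the mean of the five training prices.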


```python
import pandas as pd

# Sample DataFrame with a categorical feature (city) and a numeric target (price)
df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

df
```

|   | city | price |
|---|---|---|
| 0 | New York | 200 |
| 1 | London | 150 |
| 2 | Paris | 300 |
| 3 | Tokyo | 250 |
| 4 | New York | 180 |
| 5 | Paris | 320 |


```python
# Mean target value (price) for each city, as a {city: mean} dict
mean_price = df.groupby('city')['price'].mean().to_dict()

mean_price
```

{'London': 150.0, 'New York': 190.0, 'Paris': 310.0, 'Tokyo': 250.0}

```python
# Replace each city with its mean price (cities missing from the dict would become NaN)
df['city_encoded'] = df['city'].map(mean_price)
```
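
The `.to_dict()` call is optional, because `Series.map` also accepts the groupby result (a Series indexed by category) directly. For a quick look, `groupby(...).transform('mean')` returns the same per-row values in one step. A minimal sketch compares the two on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# Option 1: map() with the groupby Series itself (no .to_dict() needed)
mean_price = df.groupby('city')['price'].mean()
via_map = df['city'].map(mean_price)

# Option 2: transform('mean') broadcasts each group's mean back to its rows
via_transform = df.groupby('city')['price'].transform('mean')

print(via_map.tolist())
print(via_transform.tolist())
```

The `map` version keeps an explicit category-to-value mapping that can be reused on new data. `transform` only covers the rows already in `df`, so it is less suited to applying a training-set mapping to a test set.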

```python
df[['city', 'city_encoded']]
```

|   | city | city_encoded |
|---|---|---|
| 0 | New York | 190.0 |
| 1 | London | 150.0 |
| 2 | Paris | 310.0 |
| 3 | Tokyo | 250.0 |
| 4 | New York | 190.0 |
| 5 | Paris | 310.0 |
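
The table above encodes each city with its mean price itself. A common ordinal variant instead sorts the categories by their mean target and assigns consecutive integer labels, so the labels preserve the target-based ordering. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# Sort categories by their mean target, lowest mean first
ordered_cities = df.groupby('city')['price'].mean().sort_values().index

# Consecutive integer labels in that order: 0 = lowest mean price
ordinal_map = {city: rank for rank, city in enumerate(ordered_cities)}

df['city_ordinal'] = df['city'].map(ordinal_map)
print(ordinal_map)
print(df)
```

The mean prices are 150 (London), 190 (New York), 250 (Tokyo) and 310 (Paris), so this gives `{'London': 0, 'New York': 1, 'Tokyo': 2, 'Paris': 3}`. Categories with equal means would need a tie-breaking rule, which this sketch does not define.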


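```python
import seaborn as sns

# Load seaborn's 'tips' example dataset
tips = sns.load_dataset('tips')

# Mean total_bill for each meal time (Lunch / Dinner).
# 'time' is a categorical column, so pass observed=True to group only the
# categories that occur (this also avoids pandas' FutureWarning about the default).
target_map = tips.groupby('time', observed=True)['total_bill'].mean()

# Mapping a categorical column returns a categorical result; cast to float
# so the encoded feature is numeric.
tips['time_encoded'] = tips['time'].map(target_map).astype(float)

tips.head(10)
```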


|   | total_bill | tip | sex | smoker | day | time | size | time_encoded |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 20.797159 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 20.797159 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 20.797159 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 20.797159 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 20.797159 |
| 5 | 25.29 | 4.71 | Male | No | Sun | Dinner | 4 | 20.797159 |
| 6 | 8.77 | 2.0 | Male | No | Sun | Dinner | 2 | 20.797159 |
| 7 | 26.88 | 3.12 | Male | No | Sun | Dinner | 4 | 20.797159 |
| 8 | 15.04 | 1.96 | Male | No | Sun | Dinner | 2 | 20.797159 |
| 9 | 14.78 | 3.23 | Male | No | Sun | Dinner | 2 | 20.797159 |
