Target-guided ordinal encoding converts a categorical feature into numbers by ordering its categories according to a statistic of the target variable, typically the mean.

### How It Works:
1. **Categorical Data**: Start with a categorical feature (e.g., "Low," "Medium," "High").
2. **Target Variable**: Identify a target variable (e.g., "Sales").
3. **Order Based on Target**: Compute the average of the target variable for each category, sort the categories by that average, and assign integers in that order (lowest average → 1).

### Example:
- **Categories**: "Low," "Medium," "High"
- **Average Sales**: 
  - Low: $100
  - Medium: $200
  - High: $300

### Encoding:
- "Low" → 1
- "Medium" → 2
- "High" → 3

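The model can now use these numbers in place of the category labels. Because the numbers follow the order of the target averages, the encoded feature may improve performance, although the gain is not guaranteed. The cells below use a closely related variant: they map each category directly to its mean target value instead of converting the means to ranks.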
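The ranking step described above can be sketched in a few lines of pandas. This is a minimal illustration, not a definitive implementation: the `sales` frame and its individual values are made up so that the category means match the example (Low: 100, Medium: 200, High: 300), and the names `level`, `mean_sales`, and `rank_map` are hypothetical.

```python
import pandas as pd

# Made-up sales figures chosen so the category means match the example
# above (Low: 100, Medium: 200, High: 300).
sales = pd.DataFrame({
    "level": ["Low", "Medium", "High", "Low", "Medium", "High"],
    "sales": [90, 180, 280, 110, 220, 320],
})

# Mean of the target per category, sorted from lowest to highest.
mean_sales = sales.groupby("level")["sales"].mean().sort_values()

# Rank the categories by that mean: lowest mean -> 1, next -> 2, ...
rank_map = {category: rank for rank, category in enumerate(mean_sales.index, start=1)}

# Replace each label with its rank.
sales["level_encoded"] = sales["level"].map(rank_map)
print(rank_map)  # {'Low': 1, 'Medium': 2, 'High': 3}
```

Sorting the means before numbering is what makes the encoding ordinal: the integers carry only the order of the categories, not the size of the gaps between their means.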

In [2]:
import pandas as pd

# Create a small DataFrame with a categorical feature (city) and a target (price)
df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

In [3]:
df.head()

,city,price
0,New York,200
1,London,150
2,Paris,300
3,Tokyo,250
4,New York,180


In [6]:
# Mean price per city, stored as a {city: mean price} lookup
mean_price = df.groupby('city')['price'].mean().to_dict()

In [7]:
# Replace each city with the mean price observed for that city
df['city_encoded'] = df['city'].map(mean_price)

In [10]:
df[['price','city_encoded']]

,price,city_encoded
0,200,190.0
1,150,150.0
2,300,310.0
3,250,250.0
4,180,190.0
5,320,310.0
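One practical detail the cells above leave out is how to encode rows that were not used to compute the means. The sketch below assumes the mapping is learned from the training rows only and that a city never seen in training falls back to the overall mean price. Both the `new_rows` frame (including "Berlin") and the fallback rule are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Training rows: the same city/price data used in the cells above.
train = pd.DataFrame({
    "city": ["New York", "London", "Paris", "Tokyo", "New York", "Paris"],
    "price": [200, 150, 300, 250, 180, 320],
})

# Hypothetical new rows; "Berlin" never appears in the training data.
new_rows = pd.DataFrame({"city": ["Paris", "Berlin"]})

# Learn the mapping from the training rows only.
city_means = train.groupby("city")["price"].mean()

# Assumed default for unseen cities: the overall mean training price.
fallback = train["price"].mean()

# Unseen categories map to NaN, so fill them with the fallback value.
new_rows["city_encoded"] = new_rows["city"].map(city_means).fillna(fallback)
print(new_rows)
```

Computing the means on the training rows alone keeps target values from the new rows out of the encoding.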


In [11]:
import seaborn as sns

In [13]:
tips = sns.load_dataset('tips')

In [14]:
tips

,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.50,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.00,Female,Yes,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2


In [15]:
tips[['total_bill','time']]

,total_bill,time
0,16.99,Dinner
1,10.34,Dinner
2,21.01,Dinner
3,23.68,Dinner
4,24.59,Dinner
...,...,...
239,29.03,Dinner
240,27.18,Dinner
241,22.67,Dinner
242,17.82,Dinner


In [18]:
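# Mean total bill per meal time; `time` is categorical, so pass `observed` explicitly
mean = tips.groupby('time', observed=True)['total_bill'].mean().to_dict()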


In [19]:
mean

{'Lunch': 17.168676470588235, 'Dinner': 20.79715909090909}

In [23]:
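# Map each meal time to its mean total bill; cast to float in case the categorical dtype carries over
tips['bill_encoded'] = tips['time'].map(mean).astype(float)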

In [26]:
tips[tips['time']=='Lunch']

,total_bill,tip,sex,smoker,day,time,size,bill_encoded
77,27.20,4.00,Male,No,Thur,Lunch,4,17.168676
78,22.76,3.00,Male,No,Thur,Lunch,2,17.168676
79,17.29,2.71,Male,No,Thur,Lunch,2,17.168676
80,19.44,3.00,Male,Yes,Thur,Lunch,2,17.168676
81,16.66,3.40,Male,No,Thur,Lunch,2,17.168676
...,...,...,...,...,...,...,...,...
222,8.58,1.92,Male,Yes,Fri,Lunch,1,17.168676
223,15.98,3.00,Female,No,Fri,Lunch,3,17.168676
224,13.42,1.58,Male,Yes,Fri,Lunch,2,17.168676
225,16.27,2.50,Female,Yes,Fri,Lunch,2,17.168676


In [27]:
# Repeat the encoding with `tip` as the target: mean tip per meal time

In [31]:
# Mean tip per meal time; `time` is categorical, so pass `observed` explicitly
tips_mean = tips.groupby('time', observed=True)['tip'].mean().to_dict()


In [32]:
tips_mean

{'Lunch': 2.7280882352941176, 'Dinner': 3.102670454545455}

In [35]:
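# Map each meal time to its mean tip; cast to float in case the categorical dtype carries over
tips['tips_mean'] = tips['time'].map(tips_mean).astype(float)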

In [36]:
tips

,total_bill,tip,sex,smoker,day,time,size,bill_encoded,tips_mean
0,16.99,1.01,Female,No,Sun,Dinner,2,20.797159,3.10267
1,10.34,1.66,Male,No,Sun,Dinner,3,20.797159,3.10267
2,21.01,3.50,Male,No,Sun,Dinner,3,20.797159,3.10267
3,23.68,3.31,Male,No,Sun,Dinner,2,20.797159,3.10267
4,24.59,3.61,Female,No,Sun,Dinner,4,20.797159,3.10267
...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,20.797159,3.10267
240,27.18,2.00,Female,Yes,Sat,Dinner,2,20.797159,3.10267
241,22.67,2.00,Male,Yes,Sat,Dinner,2,20.797159,3.10267
242,17.82,1.75,Male,No,Sat,Dinner,2,20.797159,3.10267


In [37]:
tips[tips['time']=='Lunch']

,total_bill,tip,sex,smoker,day,time,size,bill_encoded,tips_mean
77,27.20,4.00,Male,No,Thur,Lunch,4,17.168676,2.728088
78,22.76,3.00,Male,No,Thur,Lunch,2,17.168676,2.728088
79,17.29,2.71,Male,No,Thur,Lunch,2,17.168676,2.728088
80,19.44,3.00,Male,Yes,Thur,Lunch,2,17.168676,2.728088
81,16.66,3.40,Male,No,Thur,Lunch,2,17.168676,2.728088
...,...,...,...,...,...,...,...,...,...
222,8.58,1.92,Male,Yes,Fri,Lunch,1,17.168676,2.728088
223,15.98,3.00,Female,No,Fri,Lunch,3,17.168676,2.728088
224,13.42,1.58,Male,Yes,Fri,Lunch,2,17.168676,2.728088
225,16.27,2.50,Female,Yes,Fri,Lunch,2,17.168676,2.728088
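The two encodings above follow the same group, average, map pattern, so it can be wrapped in a small helper. The sketch below is one possible way to do that. The function name `encode_by_target_mean` and the four-row `meals` frame are hypothetical stand-ins, used so the example runs without downloading the `tips` dataset.

```python
import pandas as pd

def encode_by_target_mean(df, column, target):
    """Return `column` with each category replaced by the mean of `target`."""
    # observed=True restricts the groups to categories present in the data.
    mapping = df.groupby(column, observed=True)[target].mean().to_dict()
    # Cast to float so a categorical input column yields a numeric result.
    return df[column].map(mapping).astype(float)

# Small made-up stand-in for the tips data.
meals = pd.DataFrame({
    "time": pd.Categorical(["Lunch", "Dinner", "Lunch", "Dinner"]),
    "total_bill": [10.0, 20.0, 14.0, 30.0],
    "tip": [1.0, 3.0, 2.0, 5.0],
})

meals["bill_encoded"] = encode_by_target_mean(meals, "time", "total_bill")
meals["tip_encoded"] = encode_by_target_mean(meals, "time", "tip")
print(meals)
```

Every row with the same `time` value receives the same encoded number, which matches the behaviour seen in the `bill_encoded` and `tips_mean` columns.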


In [44]:
# Count parties with more than 6 people
tips[tips['size'] > 6].shape[0]

0