## Target Guided Ordinal Encoding 
Target guided ordinal encoding is a technique for encoding a categorical variable according to its relationship with the target variable. It is especially useful when a categorical feature has many unique categories: one-hot encoding would add a column for every category, whereas this encoding keeps the feature as a single numeric column that we can use in a machine learning model.

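In target guided ordinal encoding, each category is replaced by a number derived from the target: the mean or median of the target over the rows in that category. Categories with a higher target statistic therefore get a higher code, so the encoded feature increases monotonically with that statistic. This can make the feature easier for a model, particularly a linear one, to exploit than arbitrary category codes. Because the codes are computed from the target, the mapping should be learned on the training data only; otherwise information about the target leaks into the feature.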
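The cells below replace each category with its mean target value directly, which is often called mean (or target) encoding. The "ordinal" in the name usually refers to a closely related variant: sort the categories by their mean or median target and assign integer ranks in that order. A minimal sketch of that variant, assuming a hypothetical helper `target_guided_ordinal_map` (it is not part of pandas) and the same city data used below:

```python
import pandas as pd


def target_guided_ordinal_map(df, cat_col, target_col, agg="mean"):
    """Hypothetical helper: rank categories by an aggregate of the target.

    Returns {category: rank}, where rank 0 is the category with the lowest
    aggregated target value. `agg` is an aggregation name pandas accepts,
    such as "mean" or "median".
    """
    # Aggregate the target per category, then order categories by that value
    ordered = df.groupby(cat_col)[target_col].agg(agg).sort_values().index
    # Assign consecutive integer ranks 0, 1, 2, ... in that order
    return {category: rank for rank, category in enumerate(ordered)}


df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

ordinal_map = target_guided_ordinal_map(df, 'city', 'price')
df['city_ordinal'] = df['city'].map(ordinal_map)
print(ordinal_map)  # {'London': 0, 'New York': 1, 'Tokyo': 2, 'Paris': 3}
```

With `agg="median"` the ranking is driven by each category's median instead; on this small data set both statistics give the same order.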

```python
import pandas as pd

# create a sample dataframe with a categorical variable and a target variable
df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})
```

```python
df
```

|   | city | price |
|---|---|---|
| 0 | New York | 200 |
| 1 | London | 150 |
| 2 | Paris | 300 |
| 3 | Tokyo | 250 |
| 4 | New York | 180 |
| 5 | Paris | 320 |


```python
df.head()
```

|   | city | price |
|---|---|---|
| 0 | New York | 200 |
| 1 | London | 150 |
| 2 | Paris | 300 |
| 3 | Tokyo | 250 |
| 4 | New York | 180 |


```python
# group the prices by city; grouping alone does not compute any statistic yet
df.groupby('city')['price']
```

```
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7fb352b0ce20>
```
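The `SeriesGroupBy` object holds the grouping but no results; numbers appear only once an aggregation is applied. Since the encoding can use the mean or the median, it can help to compute several candidate statistics at once with `agg`. A short sketch on the same city data:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

grouped = df.groupby('city')['price']  # lazy: nothing is computed yet

# Several statistics per city in one call; 'count' shows how many rows
# each statistic is based on
summary = grouped.agg(['mean', 'median', 'count'])
print(summary)
```

Categories backed by a single row (London and Tokyo here) get a code equal to that one row's target value, which is worth keeping in mind when interpreting the encoding.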

```python
# calculate the mean price for each city
df.groupby('city')['price'].mean()
```

```
city
London      150.0
New York    190.0
Paris       310.0
Tokyo       250.0
Name: price, dtype: float64
```

To replace each city with its mean price, we can use `Series.map`. One way is to first convert the group means to a plain dictionary with `.to_dict()`; `map` then looks up each city as a key and returns its mean.

```python
mean_price = df.groupby('city')['price'].mean().to_dict()
mean_price
```

```
{'London': 150.0, 'New York': 190.0, 'Paris': 310.0, 'Tokyo': 250.0}
```

```python
df['city_encoded'] = df['city'].map(mean_price)
```
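The dictionary step is optional: `Series.map` also accepts a Series and uses its index as the lookup key, and `groupby(...).transform('mean')` returns each row's group mean in a single call. A sketch checking that all three routes produce the same encoded values on the city data:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

city_means = df.groupby('city')['price'].mean()  # Series indexed by city

# 1) map() with a plain dictionary
via_dict = df['city'].map(city_means.to_dict())

# 2) map() with the Series itself: its index serves as the lookup key
via_series = df['city'].map(city_means)

# 3) transform('mean') broadcasts each group's mean back onto the group's rows
via_transform = df.groupby('city')['price'].transform('mean')

print(via_dict.tolist() == via_series.tolist() == via_transform.tolist())
```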

```python
df
```

|   | city | price | city_encoded |
|---|---|---|---|
| 0 | New York | 200 | 190.0 |
| 1 | London | 150 | 150.0 |
| 2 | Paris | 300 | 310.0 |
| 3 | Tokyo | 250 | 250.0 |
| 4 | New York | 180 | 190.0 |
| 5 | Paris | 320 | 310.0 |
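Here the means are computed from every row, including the row being encoded. When the encoded column feeds a model, a common practice is to learn the mapping on the training rows only and reuse it on new data, with a fallback for categories the training data never saw. A sketch, assuming a hypothetical split of the city rows into `train` and `test` and using the overall training mean as the fallback value (one simple choice among several):

```python
import pandas as pd

# Hypothetical train/test split of the city rows
train = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'New York'],
    'price': [200, 150, 300, 180],
})
test = pd.DataFrame({
    'city': ['Paris', 'Tokyo'],
    'price': [320, 250],
})

# Learn the category -> mean mapping from the training rows only
mean_price = train.groupby('city')['price'].mean()
global_mean = train['price'].mean()

# Apply the same mapping to both splits; a category never seen in training
# (Tokyo here) maps to NaN, which is replaced by the overall training mean
train['city_encoded'] = train['city'].map(mean_price)
test['city_encoded'] = test['city'].map(mean_price).fillna(global_mean)
print(test)
```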


```python
import seaborn as sns
df2 = sns.load_dataset('tips')
# can we encode sex, day, or time using total_bill as the target?
```

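```python
df2.head()
```

|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |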


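```python
# mean total_bill for each sex; sex is a Categorical column in the tips data,
# and observed=True groups only by the categories that actually occur
mean_bill = df2.groupby('sex', observed=True)['total_bill'].mean().to_dict()
mean_bill
```

```
{'Male': 20.74407643312102, 'Female': 18.05689655172414}
```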

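```python
# map() on a Categorical column can return a categorical result;
# astype(float) keeps the encoded feature numeric
df2['bill_encoded'] = df2['sex'].map(mean_bill).astype(float)
```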

```python
df2
```

|   | total_bill | tip | sex | smoker | day | time | size | bill_encoded |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 18.056897 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 20.744076 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 20.744076 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 20.744076 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 18.056897 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 | 20.744076 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 | 18.056897 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 | 20.744076 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 | 20.744076 |
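The question in the comment at the start of this example also mentions `day` and `time`, and the same idea extends to several columns at once. A sketch using a hypothetical helper `mean_encode` and a small made-up frame with the tips column names (the values are invented for illustration and do not come from the tips data):

```python
import pandas as pd


def mean_encode(df, cat_cols, target_col):
    """Hypothetical helper: return a copy of df with one '<col>_encoded'
    column per categorical column, holding the mean of target_col
    within that column's category."""
    out = df.copy()
    for col in cat_cols:
        # observed=True only affects Categorical keys (such as the tips columns);
        # it restricts the grouping to categories that actually occur
        out[f"{col}_encoded"] = (
            out.groupby(col, observed=True)[target_col].transform("mean")
        )
    return out


# Small made-up frame with the same column names as the tips data
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "sex": ["Male", "Female", "Male", "Female"],
    "day": ["Sun", "Sun", "Sat", "Sat"],
    "time": ["Dinner", "Lunch", "Dinner", "Dinner"],
})
encoded = mean_encode(toy, ["sex", "day", "time"], "total_bill")
print(encoded)
```

On the real data loaded above, `mean_encode(df2, ['sex', 'day', 'time'], 'total_bill')` would add `sex_encoded`, `day_encoded` and `time_encoded` columns in the same way.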


Ques. You are analyzing a dataset with two continuous variables, "Temperature" and "Humidity", and two categorical variables, "Weather Condition" (Sunny/Cloudy/Rainy) and "Wind Direction" (North/South/East/West). Calculate the covariance between each pair of variables and interpret the results.

```python
import pandas as pd

# Sample dataset
data = {
    "Temperature": [78, 82, 75, 70, 77],
    "Humidity": [55, 60, 62, 58, 54],
    "Weather Condition": ["Sunny", "Cloudy", "Rainy", "Sunny", "Rainy"],
    "Wind Direction": ["North", "South", "East", "West", "North"]
}

df = pd.DataFrame(data)

# Covariance needs numeric inputs, so first encode each categorical variable
# with the mean of each continuous variable within that category
mean_temperature_by_weather = df.groupby("Weather Condition")["Temperature"].mean()
mean_temperature_by_wind = df.groupby("Wind Direction")["Temperature"].mean()

mean_humidity_by_weather = df.groupby("Weather Condition")["Humidity"].mean()
mean_humidity_by_wind = df.groupby("Wind Direction")["Humidity"].mean()
```

```python
# Series.map also accepts a Series: each category is looked up in its index
df['Mean Temp by Weather Condition'] = df['Weather Condition'].map(mean_temperature_by_weather)
df['Mean Temp by Wind Direction'] = df['Wind Direction'].map(mean_temperature_by_wind)
df['Mean Humidity by Weather Condition'] = df['Weather Condition'].map(mean_humidity_by_weather)
df['Mean Humidity by Wind Direction'] = df['Wind Direction'].map(mean_humidity_by_wind)
```

```python
df
```

|   | Temperature | Humidity | Weather Condition | Wind Direction | Mean Temp by Weather Condition | Mean Temp by Wind Direction | Mean Humidity by Weather Condition | Mean Humidity by Wind Direction |
|---|---|---|---|---|---|---|---|---|
| 0 | 78 | 55 | Sunny | North | 74.0 | 77.5 | 56.5 | 54.5 |
| 1 | 82 | 60 | Cloudy | South | 82.0 | 82.0 | 60.0 | 60.0 |
| 2 | 75 | 62 | Rainy | East | 76.0 | 75.0 | 58.0 | 62.0 |
| 3 | 70 | 58 | Sunny | West | 74.0 | 70.0 | 56.5 | 58.0 |
| 4 | 77 | 54 | Rainy | North | 76.0 | 77.5 | 58.0 | 54.5 |
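The question asks for covariances, which the cells above stop short of computing. Covariance is defined only for numeric columns, so the encoded columns are what bring the categorical variables into the calculation. A sketch that rebuilds the same encodings (with `groupby(...).transform('mean')` and slightly different column names) and then takes the pairwise sample covariance with `DataFrame.cov`:

```python
import pandas as pd

data = {
    "Temperature": [78, 82, 75, 70, 77],
    "Humidity": [55, 60, 62, 58, 54],
    "Weather Condition": ["Sunny", "Cloudy", "Rainy", "Sunny", "Rainy"],
    "Wind Direction": ["North", "South", "East", "West", "North"],
}
df = pd.DataFrame(data)

# Encode each categorical column with the group mean of each continuous column
for cat in ["Weather Condition", "Wind Direction"]:
    for num in ["Temperature", "Humidity"]:
        df[f"Mean {num} by {cat}"] = df.groupby(cat)[num].transform("mean")

# Pairwise sample covariance (normalized by N - 1) over the numeric columns only
cov_matrix = df.select_dtypes("number").cov()
print(cov_matrix.round(3))
```

On these five rows, Cov(Temperature, Humidity) is -0.4, small next to the variances of the two columns (19.3 and 11.2), so there is little linear co-movement between them. Each encoded column's covariance with its own source variable equals the encoded column's variance (for example, 10.8 for temperature encoded by weather condition), because a group-mean column is the within-group average of its source. The wind-direction encodings rest mostly on single rows, so they nearly copy the raw values: Cov(Humidity, Mean Humidity by Wind Direction) is 11.075 against Var(Humidity) = 11.2. That closeness reflects the tiny groups rather than a strong underlying relationship.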
