# Mean Target Encoding

<img src = 'https://imgur.com/Hrm6Yz0.png'>

## We have already worked with label encoding, one hot encoding, frequency encoding, binary encoding and hash encoding. You might be wondering why we need one more encoding technique.

### All the techniques we have learnt so far are well-known feature engineering methods for handling categorical features in a dataset, usually with the aim of improving prediction accuracy.

- Each of these techniques has its pros and cons. We use **one hot encoding** when a feature has a **limited** number of categories. However, it becomes **less useful** as the **number of categories** in the feature grows, because every category adds a new column and so increases the dimension of the dataset.


- **Label encoding** has its limitations too, since it assigns an **arbitrary order** to the categories. Furthermore, the encoded values are **not derived from the target variable**, so they carry no information about it.

## What is Mean Target Encoding and how does it work ?

- In mean encoding, each label is replaced by the mean of the target variable over the rows that carry that label, so the encoding takes both the label counts and the target into account.

- Let's look at a dummy dataset to understand it properly.

In [11]:
import pandas as pd
import numpy as np

data = pd.DataFrame({'Feature' : ['Mumbai','Delhi','Mumbai','Mumbai','Mumbai','Delhi','Mumbai','Mumbai','Mumbai','Delhi'],
                     'Target' : [0,0,1,1,1,0,1,1,0,1]})
data

| | Feature | Target |
|----|----|----|
| 0 | Mumbai | 0 |
| 1 | Delhi | 0 |
| 2 | Mumbai | 1 |
| 3 | Mumbai | 1 |
| 4 | Mumbai | 1 |
| 5 | Delhi | 0 |
| 6 | Mumbai | 1 |
| 7 | Mumbai | 1 |
| 8 | Mumbai | 0 |
| 9 | Delhi | 1 |


Out of a total of 10 instances:

|Label|Number of times it appears in Feature column|No. of 1's in Target|No. of 0's in Target|
|----|----|----|----|
|  Mumbai  |    7| 5| 2|
|  Delhi   |    3| 1| 2|
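The counts in the table above can be reproduced directly from the data. The snippet below is a minimal sketch using `pd.crosstab`; the `counts` variable and the `Total` column are names introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Same dummy dataset as in the cell above.
data = pd.DataFrame({
    'Feature': ['Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai',
                'Delhi', 'Mumbai', 'Mumbai', 'Mumbai', 'Delhi'],
    'Target': [0, 0, 1, 1, 1, 0, 1, 1, 0, 1],
})

# Rows are the categories, columns are the target values (0 and 1).
counts = pd.crosstab(data['Feature'], data['Target'])

# Total number of rows per category (illustrative extra column).
counts['Total'] = counts.sum(axis=1)
print(counts)
```

Dividing the column for target `1` by `Total` gives the per-category mean that mean encoding uses.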

### Here we have 2 categories in the "Feature" column, namely "Mumbai" and "Delhi", and the Target takes the values 0 and 1.

- If we do **label encoding** here, we will simply assign an integer to each category, for example **"Mumbai" as 0 and "Delhi" as 1.**


- But in **mean encoding**:

   "Mumbai" = [Number of true targets under the label Mumbai / Total number of targets under the label Mumbai], which comes out to be 5/7 ≈ 0.71

   "Delhi" = [Number of true targets under the label Delhi / Total number of targets under the label Delhi] = 1/3 ≈ 0.33

### After target encoding, "Mumbai" will be replaced by 0.71 and "Delhi" by 0.33

In [12]:
# check the target mean for both the categories
data.groupby('Feature')['Target'].mean().round(2)

Feature
Delhi     0.33
Mumbai    0.71
Name: Target, dtype: float64

## Now we can map and replace Delhi with 0.33 and Mumbai with 0.71

In [13]:
# replace each category with its (rounded) target mean
data['Feature'] = data['Feature'].map({'Delhi' : 0.33, 'Mumbai': 0.71})

In [14]:
data

| | Feature | Target |
|----|----|----|
| 0 | 0.71 | 0 |
| 1 | 0.33 | 0 |
| 2 | 0.71 | 1 |
| 3 | 0.71 | 1 |
| 4 | 0.71 | 1 |
| 5 | 0.33 | 0 |
| 6 | 0.71 | 1 |
| 7 | 0.71 | 1 |
| 8 | 0.71 | 0 |
| 9 | 0.33 | 1 |
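Typing the mapping by hand is fine for two categories, but it does not scale and it bakes in the rounded values. As a minimal sketch, the mapping can instead be built from the group means themselves. The block below recreates the original dataset so that it runs on its own; `category_means` and `Feature_encoded` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Recreate the dummy dataset with the original string labels.
data = pd.DataFrame({
    'Feature': ['Mumbai', 'Delhi', 'Mumbai', 'Mumbai', 'Mumbai',
                'Delhi', 'Mumbai', 'Mumbai', 'Mumbai', 'Delhi'],
    'Target': [0, 0, 1, 1, 1, 0, 1, 1, 0, 1],
})

# Per-category mean of the target, kept at full precision.
category_means = data.groupby('Feature')['Target'].mean()

# Write the encoding to a new column so the original labels stay available.
data['Feature_encoded'] = data['Feature'].map(category_means)
print(data)
```

Keeping the encoding in a separate column is a design choice made here so the original labels remain available for inspection. `data.groupby('Feature')['Target'].transform('mean')` would produce the same column in one step.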


## There must be many questions in your mind 

### When should we use Mean target Encoding?

- We use **mean target encoding** when the **cardinality is high**, that is, when a categorical feature has a **large number of distinct categories**.


- If we apply **one hot encoding, binary encoding or hashing** to such a feature, it will **increase the number of features**.


- And if we apply **label encoding or frequency encoding**, the encoded values are computed **without using the target**, so they need not relate to the target variable.
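To make the point about dimensionality concrete, here is a rough sketch that compares the number of columns produced by one hot encoding and by mean target encoding on a synthetic high-cardinality feature. The data is randomly generated purely for illustration, and the names `df`, `City`, `one_hot` and `mean_encoded` are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: 1000 rows, with labels drawn from 200 possible city codes.
cities = [f'city_{i}' for i in rng.integers(0, 200, size=1000)]
df = pd.DataFrame({'City': cities,
                   'Target': rng.integers(0, 2, size=1000)})

# One hot encoding: one new column per distinct category.
one_hot = pd.get_dummies(df['City'])

# Mean target encoding: a single numeric column, whatever the cardinality.
mean_encoded = df.groupby('City')['Target'].transform('mean')

print('distinct categories:', df['City'].nunique())
print('one hot columns    :', one_hot.shape[1])
print('mean encoded shape :', mean_encoded.shape)
```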

### Can we use Mean target encoding for regression problems as well ?
- Yes. **Target encoding** works for **regression** as well as **classification** problems. In regression, each category is replaced by the mean of the continuous target over its rows, computed in the same way as for classification.
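As a small sketch of the regression case, the example below uses a made-up continuous target. The `houses` DataFrame, its `City` and `Price` columns and the price values are all hypothetical and chosen only to keep the arithmetic easy to check.

```python
import pandas as pd

# Hypothetical regression data: the target is a continuous price.
houses = pd.DataFrame({
    'City': ['Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Price': [120.0, 80.0, 100.0, 90.0, 110.0],
})

# Replace each city with the mean price observed for that city.
houses['City_encoded'] = houses.groupby('City')['Price'].transform('mean')
print(houses)
```

Here "Mumbai" is encoded as (120 + 100 + 110) / 3 = 110 and "Delhi" as (80 + 90) / 2 = 85.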

### Is there any drawback of using Mean target encoding ?

- Well yes, there's a big drawback: **mean target encoding** is prone to **overfitting**, because each row's encoding is computed from target values that include its own. This should be taken care of using regularization and cross-validation techniques.
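The sketch below only illustrates the problem, not a fix. It assumes a small made-up training set in which a hypothetical category "Pune" appears exactly once; the `train` and `stats` names are also introduced here for illustration.

```python
import pandas as pd

# Made-up training data; 'Pune' is a rare category with a single row.
train = pd.DataFrame({
    'Feature': ['Mumbai', 'Mumbai', 'Mumbai', 'Mumbai',
                'Delhi', 'Delhi', 'Pune'],
    'Target': [1, 0, 1, 1, 0, 1, 1],
})

# Mean and row count per category.
stats = train.groupby('Feature')['Target'].agg(['mean', 'count'])
print(stats)

# Naive mean encoding computed on the same rows it is applied to.
train['Feature_encoded'] = train['Feature'].map(stats['mean'])

# For the single 'Pune' row the encoded value is exactly its own target,
# so the feature leaks the label for that row.
print(train[train['Feature'] == 'Pune'])
```

With only one row behind it, the mean for "Pune" tells the model nothing that will generalise, yet on the training data it predicts the target perfectly.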

### I want you to use Google and Kaggle to find out how we deal with this drawback of overfitting, and to explain it in the discussion channel.

       Mean target encoding is a useful feature engineering method, but there is no guarantee that it is always the best method to improve accuracy.
       We need to try diverse feature engineering techniques to see which one gives us better performance.

                                                         Happy Learning !