# Mean/Target Encoding
---

#### Import the necessary libraries

In [1]:
import pandas as pd
from category_encoders import TargetEncoder

#### Create a sample dataset

In [2]:
data = {'color': ['red', 'green', 'blue', 'red', 'green'],
        'size': ['small', 'medium', 'large', 'medium', 'large'],
        'price': [10, 12, 15, 14, 16]}
df = pd.DataFrame(data)
df

   color    size  price
0    red   small     10
1  green  medium     12
2   blue   large     15
3    red  medium     14
4  green   large     16


In [3]:
# Mean price of medium sizes; the parentheses matter, since 12+14 / 2 evaluates to 19.0
(12 + 14) / 2

13.0

For this example, treat `color` and `size` as the **features** and `price` as the **target**.

Now compute the mean price of the red items:

In [4]:
# Mean price of red colors
df.loc[df['color'] == 'red', 'price'].mean()

12.0

To double-check, compute the mean price of the red items by hand from the data:

In [5]:
(10 + 14) / 2

12.0
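
The same check can be run for every category at once. The following is a minimal sketch using `groupby` on the sample data; the variable name `color_means` is only illustrative and does not appear elsewhere in the notebook.

```python
import pandas as pd

# Same sample data as in the notebook
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'red', 'green'],
    'size': ['small', 'medium', 'large', 'medium', 'large'],
    'price': [10, 12, 15, 14, 16],
})

# Mean price per color (illustrative variable name)
color_means = df.groupby('color')['price'].mean()
print(color_means)
```

The value for `red` should agree with the 12.0 computed by hand above.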

---

# Performing the encoding
#### Initialize the TargetEncoder

In [6]:
encoder = TargetEncoder(cols=['color', 'size'], return_df=True)

#### Fit the encoder on the dataset

In [7]:
encoder.fit(df, df['price'])

#### Transform the dataset

In [8]:
encoded_df = encoder.transform(df)

#### Display the encoded dataset

In [9]:
encoded_df

       color       size  price
0  13.201409  12.957631     10
1  13.485111  13.343260     12
2  13.608174  13.697887     15
3  13.201409  13.343260     14
4  13.485111  13.697887     16


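Note that the encoded values are not the raw per-category means: red is encoded as about 13.20, not 12.0. The target here (`price`) is continuous, and the `category_encoders` documentation describes that case as follows:
>For the case of continuous target: features are replaced with a blend of the expected value of the target given particular categorical value and the expected value of the target over all the training data.

This is not what I expected, because each category mean is blended with the overall mean price (13.4). With only one or two rows per category, the overall mean dominates.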
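
To see where the encoded numbers come from, here is a sketch of the blending step. It assumes a sigmoid weight with `min_samples_leaf=20` and `smoothing=10`, which I believe are the defaults in recent `category_encoders` releases. These settings are an assumption, but they do reproduce the values in the table above. The function name `smoothed_mean` is made up for this illustration.

```python
import math

def smoothed_mean(category_prices, global_mean, min_samples_leaf=20, smoothing=10):
    """Blend a category mean with the global mean (assumed sigmoid weighting)."""
    n = len(category_prices)
    category_mean = sum(category_prices) / n
    # Weight grows toward 1 as the category has more rows
    weight = 1 / (1 + math.exp(-(n - min_samples_leaf) / smoothing))
    return weight * category_mean + (1 - weight) * global_mean

prices = [10, 12, 15, 14, 16]
global_mean = sum(prices) / len(prices)  # 13.4

red = smoothed_mean([10, 14], global_mean)    # two red rows
green = smoothed_mean([12, 16], global_mean)  # two green rows
blue = smoothed_mean([15], global_mean)       # one blue row

print(round(red, 6), round(green, 6), round(blue, 6))
# 13.201409 13.485111 13.608174
```

With two rows in a category the weight on the category mean is only about 0.14, so every encoded value stays close to 13.4.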

---

### Mean Encoder as a Custom Transformer
This version uses the plain per-category mean of the target, with no blending.

In [10]:
from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd

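class MeanEncoder(BaseEstimator, TransformerMixin):
    """Replace each category with the mean of the target for that category."""

    def __init__(self, cols, target_col):
        self.cols = cols
        self.target_col = target_col

    def fit(self, X, y=None):
        # Learn one category -> mean(target) mapping per encoded column
        self.means_ = {}
        for col in self.cols:
            self.means_[col] = X.groupby(col)[self.target_col].mean()
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified in place
        X = X.copy()
        for col in self.cols:
            X[col] = X[col].map(self.means_[col])
        return X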

# Create a sample dataset
data = {'color': ['red', 'green', 'blue', 'red', 'green'],
        'size': ['small', 'medium', 'large', 'medium', 'large'],
        'price': [10, 12, 15, 14, 16]}
df = pd.DataFrame(data)

# Initialize the encoder
encoder = MeanEncoder(cols=['color', 'size'], target_col='price')

# Fit the encoder on the dataset
encoder.fit(df)

# Transform the dataset
encoded_df = encoder.transform(df)

# Display the encoded dataset
print(encoded_df)

   color  size  price
0   12.0  10.0     10
1   14.0  13.0     12
2   15.0  15.5     15
3   12.0  13.0     14
4   14.0  15.5     16
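
For a one-off encoding of a single DataFrame, the same plain means can also be obtained without a custom class. This is a sketch using `groupby(...).transform('mean')`; the name `encoded` is illustrative. Unlike the transformer, it cannot be fitted once and reused on new data.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'red', 'green'],
    'size': ['small', 'medium', 'large', 'medium', 'large'],
    'price': [10, 12, 15, 14, 16],
})

# Copy first so the original categorical columns are kept in df
encoded = df.copy()
for col in ['color', 'size']:
    # transform('mean') returns the group mean aligned to each original row
    encoded[col] = df.groupby(col)['price'].transform('mean')

print(encoded)
```

The printed values should match the output of the `MeanEncoder` above.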
