##### Backward Difference Encoding is a categorical encoding technique used in machine learning to represent categorical variables as numerical features. It is a contrast coding method for ordered categories: when the encoded columns are used in a linear model, each coefficient compares the mean of the target for one category with the mean for the category immediately before it.

##### Here's how Backward Difference Encoding works:

##### 1. Sort the categories of the categorical variable in a specific order. The order can be based on some logical or meaningful sequence.

##### 2. For each pair of adjacent categories in that order, define a contrast that compares the later category with the one immediately before it. With k categories this gives k - 1 contrasts.

##### 3. Create one new numerical feature per contrast. The features are not binary: for contrast j (j = 1, ..., k - 1), the first j categories in the sorted order receive the value -(k - j)/k and the remaining categories receive j/k. One fewer feature is created than the total number of categories.
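
##### A minimal sketch of the contrast matrix that backward difference coding produces for k ordered categories, written with NumPy. The function name `backward_difference_matrix` is hypothetical and is not part of any library. Column j assigns -(k - j)/k to the first j categories and j/k to the rest, so every column sums to zero.

```python
import numpy as np


def backward_difference_matrix(k):
    """Return the k x (k - 1) backward difference contrast matrix.

    Row i corresponds to the i-th category in the chosen order.
    Column j - 1 holds contrast j, which compares category j + 1
    with category j (1-based).
    """
    matrix = np.empty((k, k - 1))
    for j in range(1, k):
        matrix[:j, j - 1] = -(k - j) / k  # categories up to and including j
        matrix[j:, j - 1] = j / k         # categories after j
    return matrix


# Four ordered categories give a 4 x 3 matrix
contrasts = backward_difference_matrix(4)
print(contrasts)
```

##### For k = 4 the first row is -0.75, -0.5, -0.25 and the last row is 0.25, 0.5, 0.75.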

##### For example, consider a categorical variable "Color" with the categories: Red, Blue, Green, and Yellow. The sorted order could be Red, Blue, Green, Yellow.

##### To apply Backward Difference Encoding:

##### 1. Define the comparisons between adjacent categories:

##### - Comparison 1: Blue vs. Red
##### - Comparison 2: Green vs. Blue
##### - Comparison 3: Yellow vs. Green
##### 2. Create the numerical features (k = 4, so three features):

##### - Feature 1: -0.75 for Red; 0.25 for Blue, Green, and Yellow
##### - Feature 2: -0.5 for Red and Blue; 0.5 for Green and Yellow
##### - Feature 3: -0.25 for Red, Blue, and Green; 0.75 for Yellow


<img src="2023-07-15 11_43_00-BackDiff Encoding.png" alt="Backward difference encoding illustration" style="height: 100px; width:500px;"/>

```python
import pandas as pd
import category_encoders as ce

# Create a sample dataset
data = {'Color': ['Red', 'Blue', 'Green', 'Yellow']}
df = pd.DataFrame(data)

# Apply backward difference encoding
encoder = ce.BackwardDifferenceEncoder(cols=['Color'])
df_encoded = encoder.fit_transform(df)

# Print the encoded dataframe
print(df_encoded)
```

```
   intercept  Color_0  Color_1  Color_2
0          1    -0.75     -0.5    -0.25
1          1     0.25     -0.5    -0.25
2          1     0.25      0.5    -0.25
3          1     0.25      0.5     0.75
```
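
##### To see what the encoded columns mean in a linear model, the sketch below fits ordinary least squares on the matrix printed above. The target values in `category_means` are hypothetical numbers chosen for illustration; they are not from any dataset. The fit uses `numpy.linalg.lstsq` directly rather than a modelling library.

```python
import numpy as np

# Hypothetical target means per category, in the order Red, Blue, Green, Yellow
category_means = np.array([10.0, 12.0, 17.0, 15.0])

# Design matrix copied from the encoder output: intercept, Color_0, Color_1, Color_2
X = np.array([
    [1.0, -0.75, -0.5, -0.25],
    [1.0,  0.25, -0.5, -0.25],
    [1.0,  0.25,  0.5, -0.25],
    [1.0,  0.25,  0.5,  0.75],
])

# Solve the least-squares problem X @ coef = category_means
coef, *_ = np.linalg.lstsq(X, category_means, rcond=None)

# Differences between each category mean and the one before it
adjacent_differences = np.diff(category_means)

print(coef)
print(adjacent_differences)
```

##### With these numbers the intercept comes out as the average of the four category means (13.5), and the three remaining coefficients match the adjacent differences Blue - Red, Green - Blue, and Yellow - Green (2, 5, and -2).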


