# **Bias Mitigation with Grid Search Reduction**
This tutorial demonstrates how to implement the "Grid search reduction" inprocessing method to enhance fairness in regression models using the `holisticai` library.

- [Traditional implementation](#traditional-implementation)
- [Pipeline implementation](#pipeline-implementation)

First, install the `holisticai` package if you haven't already:
```bash
!pip install holisticai[all]
```
Then, import the necessary libraries.

In [11]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from holisticai.datasets import load_dataset
from holisticai.bias.mitigation import GridSearchReduction
from holisticai.bias.metrics import regression_bias_metrics

np.random.seed(0)
import warnings
warnings.filterwarnings("ignore")

Load the proprocessed "Communities and Crime" dataset.

In [12]:
dataset = load_dataset('us_crime', protected_attribute="race")
dataset = dataset.train_test_split(test_size=0.2, random_state=0)
train_data = dataset['train']
test_data = dataset['test']

dataset

In [13]:
model = LinearRegression()
model.fit(train_data['X'], train_data['y'])
model

## **Bias Mitigation**
### **Traditional Implementation**
We will implement the "Grid search reduction" method, an in-processing technique to mitigate bias in the regression model.

In [14]:
model = LinearRegression()
inprocessing_model = GridSearchReduction(constraints="BoundedGroupLoss", 
                                         loss='Square', min_val=-0.1, max_val=0.1, 
                                         grid_size=50).transform_estimator(model)

inprocessing_model.fit(train_data['X'], train_data['y'], train_data['group_a'], train_data['group_b'])
inprocessing_model

In [15]:
y_pred = inprocessing_model.predict(test_data['X'])

df = regression_bias_metrics(
    test_data['group_a'],
    test_data['group_b'],
    y_pred,
    test_data['y'],
    metric_type='both'
)
df

Unnamed: 0_level_0,Value,Reference
Metric,Unnamed: 1_level_1,Unnamed: 2_level_1
Disparate Impact Q90,0.025284,1
Disparate Impact Q80,0.111974,1
Disparate Impact Q50,0.412979,1
Statistical Parity Q50,-0.725221,0
No Disparate Impact Level,0.055754,-
Average Score Difference,-0.375943,0
Average Score Ratio,0.308226,1
Z Score Difference,-2.799448,0
Max Statistical Parity,0.784808,0
Statistical Parity AUC,0.454691,0


In [16]:
grid_search_rmse = mean_squared_error(test_data['y'], y_pred, squared=False)
print("RMS error: {}".format(grid_search_rmse))

RMS error: 0.14223548828832497


### **Pipeline Implementation**
Implement the method using the pipeline.

In [17]:
from holisticai.pipeline import Pipeline

inprocessing_model = GridSearchReduction(constraints="BoundedGroupLoss", 
                                         loss='Square', min_val=-0.1, max_val=1.3, 
                                         grid_size=20).transform_estimator(model)

pipeline = Pipeline(
    steps=[
        ("bm_inprocessing", inprocessing_model),
    ]
)

fit_params = {
    "bm__group_a": train_data['group_a'], 
    "bm__group_b": train_data['group_b']
}

pipeline.fit(train_data['X'], train_data['y'], **fit_params)

predict_params = {
    "bm__group_a": test_data['group_a'],
    "bm__group_b": test_data['group_b'],
}
y_pred_pipeline = pipeline.predict(test_data['X'], **predict_params)
df_pipeline = regression_bias_metrics(
    test_data['group_a'],
    test_data['group_b'],
    y_pred,
    test_data['y'],
    metric_type='both'
)
df_pipeline

Unnamed: 0_level_0,Value,Reference
Metric,Unnamed: 1_level_1,Unnamed: 2_level_1
Disparate Impact Q90,0.025284,1
Disparate Impact Q80,0.111974,1
Disparate Impact Q50,0.412979,1
Statistical Parity Q50,-0.725221,0
No Disparate Impact Level,0.055754,-
Average Score Difference,-0.375943,0
Average Score Ratio,0.308226,1
Z Score Difference,-2.799448,0
Max Statistical Parity,0.784808,0
Statistical Parity AUC,0.454691,0


In [18]:
pipeline_rmse = mean_squared_error(test_data['y'], y_pred_pipeline, squared=False)
print("Pipeline RMSE: {}".format(pipeline_rmse))

Pipeline RMSE: 0.14977135111289078
