# Bias Mitigation Using Reweighing

In this notebook, we apply AIF360's Reweighing algorithm, a pre-processing bias mitigation technique, to our cleaned Adult dataset. The goals are to:

- Load the cleaned dataset (`train_cleaned.csv`).
- Convert the data into an AIF360 `BinaryLabelDataset` format.
- Apply the reweighing algorithm to adjust instance weights to reduce bias.
- Convert the reweighted dataset back to a pandas DataFrame and save it as `weighted_train.csv`.

## 1. Set Up File Paths and Load the Cleaned Data

We begin by locating the project root and loading the cleaned data.


In [6]:
import os
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing

# Determine the current working directory.
current_dir = os.getcwd()
print("Current working directory:", current_dir)

# If the notebook is running from 'notebooks/', move up one level to reach the project root.
if os.path.basename(current_dir) == "notebooks":
    project_root = os.path.abspath(os.path.join(current_dir, ".."))
else:
    project_root = current_dir

print("Project root directory:", project_root)

# Construct the absolute path to the cleaned data file.
cleaned_csv_path = os.path.join(project_root, "data", "train_cleaned.csv")
print("Looking for cleaned data at:", cleaned_csv_path)

# Stop the cell with a clear error if the cleaned data file is missing.
if not os.path.exists(cleaned_csv_path):
    raise FileNotFoundError(
        f"Cleaned data file not found at {cleaned_csv_path}. "
        "Please run the data_cleaning script to generate train_cleaned.csv before proceeding."
    )

# Load the cleaned data.
df = pd.read_csv(cleaned_csv_path)
print("Loaded cleaned data with shape:", df.shape)


Current working directory: /Users/stay-c/Desktop/AI_Fairness_Project/notebooks
Project root directory: /Users/stay-c/Desktop/AI_Fairness_Project
Looking for cleaned data at: /Users/stay-c/Desktop/AI_Fairness_Project/data/train_cleaned.csv
Loaded cleaned data with shape: (32561, 15)


## 2. Convert DataFrame to AIF360 BinaryLabelDataset

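We now convert our pandas DataFrame into an AIF360 `BinaryLabelDataset`, the dataset class that AIF360's bias mitigation algorithms operate on.
In our dataset:
- The target column is `income_binary`.
- The protected attribute is `sex`.

`BinaryLabelDataset` requires every column to be numeric and free of missing values. Its defaults, `favorable_label=1` and `unfavorable_label=0`, assume that `income_binary` encodes the favorable outcome (income >50K) as 1.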
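
Before converting, it can help to confirm the DataFrame meets those input requirements. The helper below is a minimal sketch of such a check; `check_aif360_ready` is a hypothetical name (not part of AIF360), and the example runs it on a small made-up frame rather than on the real data. It is stricter than AIF360 itself, which only needs values it can cast to float.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_aif360_ready(df: pd.DataFrame, label: str, protected: str) -> list:
    """Return a list of problems that would keep `df` from being a valid
    BinaryLabelDataset input; an empty list means every check passed."""
    problems = []
    # All columns must be numeric: string categories need encoding first.
    non_numeric = [c for c in df.columns if not is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    # Missing values are rejected outright.
    na_cols = [c for c in df.columns if df[c].isna().any()]
    if na_cols:
        problems.append(f"columns with missing values: {na_cols}")
    # The default labels assume 1 = favorable and 0 = unfavorable.
    if not set(df[label].dropna().unique()) <= {0, 1}:
        problems.append(f"{label} is not coded as 0/1")
    # The group definitions used with Reweighing assume a 0/1 protected attribute.
    if not set(df[protected].dropna().unique()) <= {0, 1}:
        problems.append(f"{protected} is not coded as 0/1")
    return problems


# Made-up example: 'workclass' is still a string and 'age' has a gap.
toy = pd.DataFrame({
    "age": [25, 40, None],
    "workclass": ["Private", "State-gov", "Private"],
    "sex": [0, 1, 1],
    "income_binary": [0, 1, 1],
})
print(check_aif360_ready(toy, "income_binary", "sex"))
```

On the real data the call would be `check_aif360_ready(df, "income_binary", "sex")`, and an empty list means the conversion should not fail on these grounds.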


In [7]:
from aif360.datasets import BinaryLabelDataset

# Convert the DataFrame to an AIF360 BinaryLabelDataset.
dataset = BinaryLabelDataset(df=df, label_names=['income_binary'], protected_attribute_names=['sex'])
print("Data converted to BinaryLabelDataset format.")


Data converted to BinaryLabelDataset format.


## 3. Apply the Reweighing Algorithm

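Next, we apply the Reweighing bias mitigation technique.
- We designate `sex` = 0 as the unprivileged group and `sex` = 1 as the privileged group.
- Reweighing leaves features and labels unchanged and only assigns each instance a weight. Every instance with protected-attribute value $a$ and label $y$ receives $W(a, y) = \frac{P(A=a)\,P(Y=y)}{P(A=a,\,Y=y)}$, the probability that combination would have if the label were independent of the protected attribute, divided by its observed probability. Under-represented combinations (for example, unprivileged instances with the favorable label) are weighted up and over-represented ones down, so in the weighted data the favorable-outcome rate is the same in both groups.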
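
The weight formula can be reproduced in a few lines of pandas, which makes a useful cross-check on what `fit_transform` computes. This is a sketch of the Kamiran and Calders formula, not AIF360's own code: `reweighing_weights` is a hypothetical helper, it assumes a single 0/1 protected column with no missing values, and it is shown on a made-up ten-row frame. With the default all-ones input weights, its output on the full data should agree with `dataset_reweighted.instance_weights`.

```python
import pandas as pd


def reweighing_weights(df: pd.DataFrame, label: str, protected: str) -> pd.Series:
    """Per-row weights W(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y).

    Assumes one protected column and no missing values in either column.
    """
    n = len(df)
    # Marginal probabilities, looked up for each row.
    p_group = df[protected].map(df[protected].value_counts() / n)
    p_label = df[label].map(df[label].value_counts() / n)
    # Joint probability of each row's (group, label) cell.
    cell_counts = df.groupby([protected, label]).size()
    p_joint = pd.Series(
        [cell_counts[(a, y)] / n for a, y in zip(df[protected], df[label])],
        index=df.index,
    )
    return p_group * p_label / p_joint


# Made-up data: group 0 has a 25% favorable rate, group 1 about 67%.
toy = pd.DataFrame({
    "sex": [0] * 4 + [1] * 6,
    "income_binary": [0, 0, 0, 1, 0, 0, 1, 1, 1, 1],
})
toy["weight"] = reweighing_weights(toy, "income_binary", "sex")
print(toy.groupby(["sex", "income_binary"])["weight"].first())
```

For this frame the four cells get weights 2/3, 2, 1.5 and 0.75, the weights sum to the number of rows, and the weighted favorable rate becomes 0.5 in both groups, which is the overall favorable rate.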


In [8]:
from aif360.algorithms.preprocessing import Reweighing

# Apply the Reweighing algorithm.
rw = Reweighing(unprivileged_groups=[{'sex': 0}], privileged_groups=[{'sex': 1}])
dataset_reweighted = rw.fit_transform(dataset)
print("Reweighing applied to dataset.")


Reweighing applied to dataset.


## 4. Convert Reweighted Dataset and Save

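We now convert the reweighted AIF360 dataset back into a pandas DataFrame and save it as `weighted_train.csv` for subsequent model training and evaluation.

`convert_to_dataframe()` returns a `(DataFrame, attributes)` tuple, and the DataFrame holds only the features and labels; the per-instance weights produced by Reweighing live in `attributes["instance_weights"]`. We therefore add them as an explicit `instance_weights` column before saving, since otherwise the CSV would contain no weights at all.


In [9]:
# Convert the reweighted dataset back to a pandas DataFrame.
# The second return value is a metadata dict that includes the instance weights.
weighted_df, attributes = dataset_reweighted.convert_to_dataframe()

# Attach the reweighing weights as a column so they survive the CSV round trip.
weighted_df["instance_weights"] = attributes["instance_weights"]

# Define the path to save the weighted data.
weighted_csv_path = os.path.join(project_root, "data", "weighted_train.csv")

# Save the weighted data.
weighted_df.to_csv(weighted_csv_path, index=False)
print("Weighted data saved as:", weighted_csv_path)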


Weighted data saved as: /Users/stay-c/Desktop/AI_Fairness_Project/data/weighted_train.csv
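
A quick way to confirm the weights do what Reweighing promises is to compare the statistical parity difference (favorable-outcome rate of the unprivileged group minus that of the privileged group) with and without weights; after reweighing, the weighted value should be close to 0. The sketch below assumes the weights are stored in a column named `instance_weights` (`convert_to_dataframe()` does not add such a column by itself) and uses a made-up ten-row frame whose weights come from the reweighing formula; `statistical_parity_difference` here is a hypothetical helper. AIF360's `BinaryLabelDatasetMetric` offers a `statistical_parity_difference()` method for the same check on dataset objects.

```python
import pandas as pd


def statistical_parity_difference(df, label, protected, weight_col=None):
    """Favorable-label rate of group 0 minus that of group 1.

    Assumes `protected` == 0 is the unprivileged group and == 1 the privileged
    group, and a 0/1 label with 1 favorable. With `weight_col`, each row counts
    in proportion to its weight; otherwise every row counts once.
    """
    weights = df[weight_col] if weight_col else pd.Series(1.0, index=df.index)
    rates = {}
    for group in (0, 1):
        mask = df[protected] == group
        w = weights[mask]
        rates[group] = (w * df.loc[mask, label]).sum() / w.sum()
    return rates[0] - rates[1]


# Made-up frame; the weights are what the reweighing formula gives for it.
toy = pd.DataFrame({
    "sex": [0] * 4 + [1] * 6,
    "income_binary": [0, 0, 0, 1, 0, 0, 1, 1, 1, 1],
    "instance_weights": [2 / 3] * 3 + [2.0] + [1.5] * 2 + [0.75] * 4,
})
print("Unweighted SPD:", statistical_parity_difference(toy, "income_binary", "sex"))
print("Weighted SPD:", statistical_parity_difference(toy, "income_binary", "sex", "instance_weights"))
```

On this toy frame the unweighted difference is 0.25 - 4/6 (about -0.42), while the weighted difference is 0 up to floating-point error.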


## Conclusion

In this notebook, we:
- Set up our environment and loaded the cleaned dataset.
- Converted the data into an AIF360 `BinaryLabelDataset`.
- Applied the reweighing algorithm to mitigate bias.
- Saved the weighted data as `weighted_train.csv`.

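This reweighted dataset will be used in later steps (model training and evaluation) to assess the impact of bias mitigation. Because Reweighing changes only the instance weights, not the features or labels, those steps must pass the weights to the learner (for example, through the `sample_weight` argument that many scikit-learn estimators accept in `fit`); a model trained without them sees the original, unmitigated data.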
