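In this notebook, we mitigate bias in our cleaned dataset with the reweighing technique from AIF360. Reweighing leaves the features and labels unchanged; it assigns each instance a weight so that, in the weighted data, the label is statistically independent of the protected attribute.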

We'll:
- Define the path to our cleaned data (`train_cleaned.csv`).
- Load the cleaned data and convert it to an AIF360 `BinaryLabelDataset`.
- Apply the reweighing algorithm to mitigate bias.
- Convert the reweighted dataset back to a pandas DataFrame and save it as `weighted_train.csv`.
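
To make the weighting step concrete, here is a minimal sketch of the standard reweighing formula, w(group, label) = P(group) · P(label) / P(group, label). It uses only the Python standard library and is an illustration of the idea, not AIF360's own code. The function names and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per instance: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # Expected count under independence divided by the observed count.
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_favorable_rate(groups, labels, weights, group, favorable=1):
    """Weighted share of favorable labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    favorable_total = sum(
        w for g, y, w in zip(groups, labels, weights)
        if g == group and y == favorable
    )
    return favorable_total / total


# Hypothetical toy data: protected attribute (1 = privileged, 0 = unprivileged)
# and binary label (1 = favorable outcome).
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
print("Weights:", [round(w, 3) for w in weights])
print("Privileged rate:", weighted_favorable_rate(groups, labels, weights, group=1))
print("Unprivileged rate:", weighted_favorable_rate(groups, labels, weights, group=0))
```

In the unweighted toy data the favorable rate is 0.75 for the privileged group and 0.25 for the unprivileged group. After weighting, both groups have the same favorable rate, which is the property reweighing aims for.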

Let's start by setting up the environment and defining the necessary file paths.

In [5]:
import os
import sys
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing

# Build the absolute path to the cleaned data file.
# Note: __file__ is not defined in a notebook, so the path is resolved against os.getcwd(),
# which is typically the folder the notebook was launched from.
project_root = os.getcwd()
cleaned_csv_path = os.path.abspath(os.path.join(project_root, "data", "train_cleaned.csv"))
print("Looking for cleaned data at:", cleaned_csv_path)

# Check if the cleaned data file exists.
if not os.path.exists(cleaned_csv_path):
    sys.exit(f"Error: Cleaned data file not found at {cleaned_csv_path}.\n"
             "Please run 'python3 src/data_cleaning.py' to generate train_cleaned.csv before running this notebook.")

# Load the cleaned data.
df = pd.read_csv(cleaned_csv_path)
print("Loaded cleaned data with shape:", df.shape)


Looking for cleaned data at: /Users/stay-c/Desktop/AI_Fairness_Project/notebooks/data/train_cleaned.csv


SystemExit: Error: Cleaned data file not found at /Users/stay-c/Desktop/AI_Fairness_Project/notebooks/data/train_cleaned.csv.
Please run 'python3 src/data_cleaning.py' to generate train_cleaned.csv before running this notebook.
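
The output above shows the path being built under `notebooks/`, because `os.getcwd()` returned the notebook's own folder. If the `data` directory sits higher up, for example at the project root (an assumption; the output does not confirm where the file lives, and it may simply not have been generated yet), one option is to search the parent folders as well. The sketch below uses a hypothetical helper, `find_upwards`, which is not part of this project.

```python
from pathlib import Path


def find_upwards(relative_path, start=None):
    """Return the first existing <folder>/<relative_path>, checking `start`
    and then each of its parent folders; return None if nothing is found."""
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    for folder in (start, *start.parents):
        candidate = folder / relative_path
        if candidate.exists():
            return candidate
    return None


# Assumed layout: <project_root>/data/train_cleaned.csv, with the notebook
# running from a subfolder such as <project_root>/notebooks.
cleaned_csv = find_upwards(Path("data") / "train_cleaned.csv")
print("Resolved cleaned data path:", cleaned_csv)
```

If `cleaned_csv` is `None`, the file does not exist in the working directory or any folder above it, and the cleaning script still needs to be run.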