# AIF360 Fairness Evaluation for Spotify Baseline Recommender

This script analyzes fairness in a popularity-based music recommender system using the AIF360 toolkit. The dataset is derived from Spotify playlists with known artist gender labels.

## Step-by-Step Explanation

### 1. **Data Loading**

The CSV file `spotify_tracks_with_gender_filtered.csv` is loaded. It includes playlist track entries with `track_uri` and `artist_gender`.


```python
# AIF360 Baseline Fairness Analysis for Spotify Recommender

import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# 1. Load the dataset
tracks_df = pd.read_csv("spotify_tracks_with_gender_filtered.csv")
```



### 2. **Track Popularity Calculation**

Popularity is determined by counting how often each track appears across all playlists. The resulting `play_count` column is therefore a playlist-appearance count, not a stream count.

```python
popularity_df = tracks_df.groupby("track_uri").size().reset_index(name="play_count")
```

Then we join this with gender information, keeping one `artist_gender` value per `track_uri` (the first occurrence in the file):

```python
track_gender_df = tracks_df.drop_duplicates("track_uri")[["track_uri", "artist_gender"]]
popular_tracks = popularity_df.merge(track_gender_df, on="track_uri")
```
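To make the two steps above concrete, here is a minimal sketch on a hypothetical six-row table. The `toy_*` names and the data are invented for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data: each row is one playlist entry.
toy_tracks = pd.DataFrame({
    "track_uri": ["t1", "t2", "t1", "t3", "t1", "t2"],
    "artist_gender": ["male", "female", "male", "male", "male", "female"],
})

# Count how many playlist entries each track has.
toy_popularity = toy_tracks.groupby("track_uri").size().reset_index(name="play_count")

# Keep one gender label per track (first occurrence), then join on track_uri.
toy_gender = toy_tracks.drop_duplicates("track_uri")[["track_uri", "artist_gender"]]
toy_popular = toy_popularity.merge(toy_gender, on="track_uri")

print(toy_popular)
```

The result has one row per distinct track, so the later fairness metrics are computed over tracks, not over playlist entries.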


### 3. **Top-N Recommendations**

We sort by popularity and select the top 100 tracks as the recommender's output.

```python
# 2. Create popularity measure (count appearances in playlists)
popularity_df = tracks_df.groupby("track_uri").size().reset_index(name="play_count")
track_gender_df = tracks_df.drop_duplicates("track_uri")[["track_uri", "artist_gender"]]
popular_tracks = popularity_df.merge(track_gender_df, on="track_uri")

# 3. Recommend top-N tracks
N = 100
popular_tracks_sorted = popular_tracks.sort_values(by="play_count", ascending=False).reset_index(drop=True)
recommendations = popular_tracks_sorted.head(N).copy()
```
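One detail worth keeping in mind: when several tracks share the same `play_count` at the cutoff, which of them end up in the top N depends on how the sort orders ties. The sketch below shows one possible way to make the selection reproducible by adding `track_uri` as a secondary sort key. The toy data and the tie-break choice are assumptions for illustration, not something the original notebook does.

```python
import pandas as pd

# Hypothetical toy data with a three-way tie at play_count == 5.
toy_popular = pd.DataFrame({
    "track_uri": ["t3", "t1", "t2", "t4"],
    "play_count": [5, 5, 9, 5],
})

# Sort by popularity first, then by track_uri so ties resolve the same way on every run.
toy_sorted = toy_popular.sort_values(
    by=["play_count", "track_uri"], ascending=[False, True]
).reset_index(drop=True)

toy_top2 = toy_sorted.head(2)
print(toy_top2["track_uri"].tolist())
```

Any deterministic secondary key would do; alphabetical `track_uri` is used here only because it is already available in the table.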

### 4. **Labeling for Fairness Analysis**

A binary column `recommended` is added: 1 if the track is in the top 100, 0 otherwise.


```python
# 4. Label dataset for fairness (1 if in top N, else 0)
track_gender_df["recommended"] = track_gender_df["track_uri"].isin(recommendations["track_uri"]).astype(int)
```


### 5. **Filtering and Encoding Gender**

Only male/female artists are retained. Gender is encoded numerically:

* Female → 1 (unprivileged)
* Male → 0 (privileged)

Only numeric columns are kept to avoid AIF360 parsing issues:


```python
# 5. Keep only male/female for binary fairness analysis
# .copy() makes gender_filtered an independent DataFrame, so the column
# assignment below does not trigger pandas' SettingWithCopyWarning.
gender_filtered = track_gender_df[track_gender_df["artist_gender"].isin(["male", "female"])].copy()
gender_map = {"female": 1, "male": 0}
gender_filtered["gender_binary"] = gender_filtered["artist_gender"].map(gender_map)

# Remove non-numeric columns before passing to AIF360
aif_input_df = gender_filtered[["recommended", "gender_binary"]].copy()
```


### 6. **Conversion to AIF360 Dataset Format**

We convert to AIF360's `BinaryLabelDataset`, specifying:

* Label: `recommended`
* Protected attribute: `gender_binary`
* Favorable label: 1 (recommended)
* Unfavorable label: 0 (not recommended)

```python
# 6. Convert to BinaryLabelDataset
aif_data = BinaryLabelDataset(
    df=aif_input_df,
    label_names=["recommended"],
    protected_attribute_names=["gender_binary"],
    favorable_label=1,
    unfavorable_label=0
)
```


### 7. **Fairness Metrics Calculation**

We calculate fairness metrics:

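* **Statistical Parity Difference**: Recommendation rate of the unprivileged group (female) minus that of the privileged group (male). 0 means parity; negative values mean tracks by female artists are recommended less often.
* **Disparate Impact**: Ratio of the unprivileged group's recommendation rate to the privileged group's. 1 means parity.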
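The two definitions can be checked by hand without AIF360. The following is a minimal sketch in plain Python on hypothetical toy rows. The data, `toy_rows`, and `recommendation_rate` are invented for illustration; the notebook itself relies on `BinaryLabelDatasetMetric`.

```python
# Hypothetical toy rows: (gender_binary, recommended).
# gender_binary: 1 = female (unprivileged), 0 = male (privileged).
toy_rows = [(0, 1), (0, 1), (0, 0), (0, 0), (1, 1), (1, 0), (1, 0), (1, 0)]


def recommendation_rate(rows, group):
    """Fraction of tracks in `group` whose recommended label is 1."""
    labels = [rec for g, rec in rows if g == group]
    return sum(labels) / len(labels)


rate_priv = recommendation_rate(toy_rows, 0)    # 2 of 4 male tracks
rate_unpriv = recommendation_rate(toy_rows, 1)  # 1 of 4 female tracks

spd = rate_unpriv - rate_priv  # statistical parity difference
di = rate_unpriv / rate_priv   # disparate impact

print("SPD:", spd)
print("DI:", di)
```

In this toy case the unprivileged group is recommended at half the rate of the privileged group, so SPD is negative and DI is below 1.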


```python
# 7. Compute fairness metrics
metric_orig = BinaryLabelDatasetMetric(
    aif_data,
    privileged_groups=[{"gender_binary": 0}],
    unprivileged_groups=[{"gender_binary": 1}]
)

print("\n--- Fairness Metrics on Baseline Recommender ---")
print("Statistical parity difference:", metric_orig.statistical_parity_difference())
print("Disparate impact:", metric_orig.disparate_impact())
```

Output:

```text
--- Fairness Metrics on Baseline Recommender ---
Statistical parity difference: -0.0007922598070604304
Disparate impact: 0.25370867405252645
```
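
The two printed values look contradictory at first (a difference near 0 but a ratio far from 1), so a short worked check helps. The derivation below uses only the printed numbers; $r_u$ and $r_p$ are shorthand introduced here for the recommendation rates of the unprivileged (female) and privileged (male) groups.

$$
\begin{aligned}
\text{SPD} &= r_u - r_p \approx -0.000792, \qquad \text{DI} = \frac{r_u}{r_p} \approx 0.2537 \\
r_p &= \frac{\text{SPD}}{\text{DI} - 1} \approx \frac{-0.000792}{-0.7463} \approx 0.00106 \\
r_u &= \text{DI} \cdot r_p \approx 0.00027
\end{aligned}
$$

So roughly 0.11% of tracks by male artists and 0.03% of tracks by female artists land in the top 100. The absolute gap is tiny because both rates are tiny, while the ratio shows a gap of about a factor of four.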


### 8. **Bias Mitigation with Reweighing**
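Reweighing is a preprocessing technique that assigns a weight to each (group, label) combination so that, in the weighted dataset, the label is statistically independent of the protected attribute. It does not change which tracks are recommended; it only changes how much each row counts when metrics are computed or a model is trained on the data.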
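
As a rough illustration of the idea, the sketch below computes reweighing-style weights by hand in plain Python, using the standard formula weight = (count of group × count of label) / (total × count of group-and-label). The toy rows and helper names are hypothetical, and this is a sketch of the technique rather than AIF360's actual code.

```python
from collections import Counter

# Hypothetical toy rows: (gender_binary, recommended).
toy_rows = [(0, 1), (0, 1), (0, 0), (0, 0), (1, 1), (1, 0), (1, 0), (1, 0)]

n = len(toy_rows)
group_counts = Counter(g for g, _ in toy_rows)
label_counts = Counter(y for _, y in toy_rows)
joint_counts = Counter(toy_rows)

# Expected count under independence divided by observed count.
weights = {
    (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
    for (g, y) in joint_counts
}


def weighted_rate(rows, group):
    """Weighted fraction of rows in `group` with the favorable label (1)."""
    total = sum(weights[(g, y)] for g, y in rows if g == group)
    favorable = sum(weights[(g, y)] for g, y in rows if g == group and y == 1)
    return favorable / total


print(weights)
print(weighted_rate(toy_rows, 0), weighted_rate(toy_rows, 1))
```

After weighting, both groups have the same weighted recommendation rate, which is why the weighted statistical parity difference goes to 0 and the weighted disparate impact goes to 1 by construction.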

```python
# 8. (Optional) Apply reweighing
rw = Reweighing(
    privileged_groups=[{"gender_binary": 0}],
    unprivileged_groups=[{"gender_binary": 1}]
)
rw.fit(aif_data)
aif_data_rw = rw.transform(aif_data)

metric_rw = BinaryLabelDatasetMetric(
    aif_data_rw,
    privileged_groups=[{"gender_binary": 0}],
    unprivileged_groups=[{"gender_binary": 1}]
)

print("\n--- Fairness Metrics After Reweighing ---")
print("Statistical parity difference:", metric_rw.statistical_parity_difference())
print("Disparate impact:", metric_rw.disparate_impact())
```

Output:

```text
--- Fairness Metrics After Reweighing ---
Statistical parity difference: 1.0842021724855044e-19
Disparate impact: 1.0000000000000002
```


## Fairness Metrics Comparison Table

| Metric                        | Baseline Recommender | After Reweighing |
| ----------------------------- | -------------------- | ---------------- |
| Statistical Parity Difference | -0.00079             | ~0.00000         |
| Disparate Impact              | 0.254                | 1.000            |

### Notes:

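* **Statistical Parity Difference (SPD)** closer to 0 indicates more equal recommendation rates between groups. Because only 100 tracks are recommended, both group rates are very small, so the absolute difference looks negligible even when the relative gap is large.
* **Disparate Impact (DI)** ideally should be close to 1. Values < 0.8 or > 1.25 usually indicate potential bias. The baseline value of 0.254 means tracks by female artists are recommended at roughly a quarter of the rate of tracks by male artists.
* Reweighing brings both metrics to parity (SPD ≈ 0, DI ≈ 1) on the weighted dataset. It changes instance weights only; the top-100 list itself is unchanged, so these values describe the reweighted data rather than a fairer set of recommendations.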
