# Evaluating Bias in Movie Recommendations
## A Simple Diagnostic Offline Evaluation Example

This notebook demonstrates how to perform a diagnostic offline evaluation to identify potential bias in movie recommendation systems. We'll look at factors like:
- Blockbuster vs. indie representation across age groups
- Genre diversity in recommendations
- Diverse cast representation by subscription type

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style for our visualizations
plt.style.use('ggplot')
sns.set_palette("viridis")

## 1. Create Synthetic Data

First, we'll generate synthetic user and movie data to illustrate our example.

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Generate user data with demographics
num_users = 500
user_data = {
    'user_id': range(1, num_users + 1),
    'age_group': np.random.choice(['18-24', '25-34', '35-44', '45-54', '55+'], num_users),
    'subscription_type': np.random.choice(['free', 'premium'], num_users, p=[0.7, 0.3]),
    'activity_level': np.random.choice(['low', 'medium', 'high'], num_users, p=[0.3, 0.5, 0.2])
}
users_df = pd.DataFrame(user_data)

# Display the first few rows of user data
print(f"Generated {num_users} users with demographic information:")
users_df.head()

In [None]:
# Generate movie data
num_movies = 200
genres = ['Action', 'Comedy', 'Drama', 'SciFi', 'Romance', 'Horror', 'Documentary', 'Indie']
movie_data = {
    'movie_id': range(1, num_movies + 1),
    'title': [f'Movie {i}' for i in range(1, num_movies + 1)],
    'genre': np.random.choice(genres, num_movies),
    'release_year': np.random.randint(1990, 2024, num_movies),
    'is_blockbuster': np.random.choice([0, 1], num_movies, p=[0.7, 0.3]),
    'diverse_cast': np.random.choice([0, 1], num_movies, p=[0.6, 0.4])
}
movies_df = pd.DataFrame(movie_data)

# Display the first few rows of movie data
print(f"Generated {num_movies} movies with metadata:")
movies_df.head()

## 2. Simulate a Recommendation Model's Output

Now we'll create a function that simulates a movie recommendation algorithm with deliberately introduced biases:

In [None]:
def generate_recommendations(user_row, movies_df):
    """Simulate movie recommendations with built-in bias for demonstration"""
    age_group = user_row['age_group']
    subscription = user_row['subscription_type']
    
    # Introduce some bias in our "model"
    if age_group in ['18-24', '25-34']:
        # Younger users get more blockbusters
        blockbuster_prob = 0.7
        indie_prob = 0.1
    else:
        # Older users get more balanced recommendations
        blockbuster_prob = 0.4
        indie_prob = 0.3
    
    # Premium users get more diverse cast recommendations
    diverse_prob = 0.6 if subscription == 'premium' else 0.2
    
    # Sample movies based on these biases
    blockbuster_mask = movies_df['is_blockbuster'] == 1
    indie_mask = (movies_df['genre'] == 'Indie') & (movies_df['is_blockbuster'] == 0)
    diverse_mask = movies_df['diverse_cast'] == 1
    
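    # Build unnormalized selection weights. The assignments are applied in order,
    # so a movie matching several masks keeps the last weight assigned
    # (e.g. a diverse-cast blockbuster ends up with diverse_prob).
    probs = np.full(len(movies_df), 0.1)  # base weight
    probs[blockbuster_mask] = blockbuster_prob
    probs[indie_mask] = indie_prob
    probs[diverse_mask] = diverse_prob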
    
    # Normalize probabilities
    probs = probs / probs.sum()
    
    # Sample movies without replacement
    recommended_indices = np.random.choice(
        movies_df.index, 
        size=10, 
        replace=False, 
        p=probs
    )
    
    # recommended_indices holds index labels, so select with .loc rather than .iloc
    return movies_df.loc[recommended_indices, 'movie_id'].tolist()
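
Because the mask assignments in `generate_recommendations` run in sequence, a movie that matches more than one mask ends up with only the last weight assigned. The sketch below illustrates this on a hypothetical four-movie catalog (`toy_movies` is not part of the notebook's data), using the weights the function assigns to a young, free-tier user:

```python
import numpy as np
import pandas as pd

# Hypothetical four-movie catalog (illustration only, not the notebook's synthetic data)
toy_movies = pd.DataFrame({
    'movie_id': [1, 2, 3, 4],
    'genre': ['Action', 'Indie', 'Action', 'Drama'],
    'is_blockbuster': [1, 0, 1, 0],
    'diverse_cast': [0, 0, 1, 0],
})

# Weights used for a young, free-tier user in generate_recommendations
blockbuster_prob, indie_prob, diverse_prob = 0.7, 0.1, 0.2

blockbuster_mask = (toy_movies['is_blockbuster'] == 1).to_numpy()
indie_mask = ((toy_movies['genre'] == 'Indie') & (toy_movies['is_blockbuster'] == 0)).to_numpy()
diverse_mask = (toy_movies['diverse_cast'] == 1).to_numpy()

weights = np.full(len(toy_movies), 0.1)  # base weight
weights[blockbuster_mask] = blockbuster_prob
weights[indie_mask] = indie_prob
weights[diverse_mask] = diverse_prob  # overrides the blockbuster weight for movie 3

toy_movies['weight'] = weights
toy_movies['selection_prob'] = weights / weights.sum()
print(toy_movies[['movie_id', 'weight', 'selection_prob']])
```

Movie 3 is both a blockbuster and has a diverse cast, so it carries the diverse-cast weight rather than the blockbuster weight. This is worth keeping in mind when reading the blockbuster ratios later in the notebook.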

In [None]:
# Generate recommendations for all users
recommendations = []
for _, user in users_df.iterrows():
    user_recs = generate_recommendations(user, movies_df)
    for rank, movie_id in enumerate(user_recs, 1):
        recommendations.append({
            'user_id': user['user_id'],
            'movie_id': movie_id,
            'rank': rank
        })

recommendations_df = pd.DataFrame(recommendations)

# Display a sample of recommendations
print(f"Generated {len(recommendations)} recommendations:")
recommendations_df.head(10)

## 3. Diagnostic Evaluation: Analyzing Bias in Recommendations

Now we'll join the recommendation data with user and movie metadata to perform our diagnostic evaluation.

In [None]:
# Join with user and movie data to analyze recommendations
analysis_df = recommendations_df.merge(users_df, on='user_id')
analysis_df = analysis_df.merge(movies_df, on='movie_id')

# Display the combined data
analysis_df.head()
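
A join like the one above can silently duplicate or drop rows if a key is repeated or missing in the metadata tables. As a minimal sketch, assuming each `user_id` and `movie_id` should be unique in its own table, the `validate` argument of `DataFrame.merge` plus a row-count check can guard against this. The small frames `recs`, `users`, and `movies` are hypothetical stand-ins for the notebook's tables:

```python
import pandas as pd

# Hypothetical miniature stand-ins for recommendations_df, users_df, and movies_df
recs = pd.DataFrame({'user_id': [1, 1, 2], 'movie_id': [10, 11, 10], 'rank': [1, 2, 1]})
users = pd.DataFrame({'user_id': [1, 2], 'age_group': ['18-24', '55+']})
movies = pd.DataFrame({'movie_id': [10, 11], 'is_blockbuster': [1, 0]})

# validate='many_to_one' raises an error if the right-hand keys are not unique
joined = recs.merge(users, on='user_id', how='left', validate='many_to_one')
joined = joined.merge(movies, on='movie_id', how='left', validate='many_to_one')

# A left join should preserve the number of recommendation rows
assert len(joined) == len(recs)
# Missing metadata would indicate a recommendation with an unmatched key
assert not joined[['age_group', 'is_blockbuster']].isna().any().any()
print(joined)
```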

### 3.1 Analyzing Blockbuster vs. Indie Representation by Age Group

In [None]:
# Calculate the proportion of blockbuster movies recommended to each age group
blockbuster_by_age = analysis_df.groupby('age_group')['is_blockbuster'].mean().reset_index()
blockbuster_by_age.columns = ['age_group', 'blockbuster_ratio']

blockbuster_by_age
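
The cell above covers only the blockbuster side of the comparison. A minimal sketch of the indie side, assuming the same indie definition used in `generate_recommendations` (genre `Indie` and not a blockbuster), might look like the following. The frame `toy_analysis` and the column `is_indie` are hypothetical and stand in for `analysis_df`:

```python
import pandas as pd

# Hypothetical slice shaped like analysis_df (illustrative values only)
toy_analysis = pd.DataFrame({
    'age_group': ['18-24', '18-24', '18-24', '18-24', '55+', '55+', '55+', '55+'],
    'genre': ['Action', 'Indie', 'Comedy', 'Action', 'Indie', 'Indie', 'Drama', 'Action'],
    'is_blockbuster': [1, 0, 1, 1, 0, 0, 0, 1],
})

# Same indie definition as generate_recommendations: Indie genre and not a blockbuster
toy_analysis['is_indie'] = (
    (toy_analysis['genre'] == 'Indie') & (toy_analysis['is_blockbuster'] == 0)
).astype(int)

# Mean of a 0/1 flag per group is the share of recommendations with that flag
representation_by_age = (
    toy_analysis.groupby('age_group')[['is_blockbuster', 'is_indie']]
    .mean()
    .rename(columns={'is_blockbuster': 'blockbuster_ratio', 'is_indie': 'indie_ratio'})
    .reset_index()
)
print(representation_by_age)
```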

### 3.2 Analyzing Genre Diversity by Age Group

In [None]:
# Calculate the distribution of genres within each age group's recommendations
genre_diversity = analysis_df.groupby(['age_group', 'genre']).size().reset_index(name='count')
genre_pivot = genre_diversity.pivot(index='age_group', columns='genre', values='count').fillna(0)
genre_pivot = genre_pivot.div(genre_pivot.sum(axis=1), axis=0)  # Normalize by row to get proportions

genre_pivot

### 3.3 Analyzing Diverse Cast Representation by Subscription Type

In [None]:
# Calculate the proportion of diverse cast movies recommended to each subscription type
diverse_by_subscription = analysis_df.groupby('subscription_type')['diverse_cast'].mean().reset_index()
diverse_by_subscription.columns = ['subscription_type', 'diverse_cast_ratio']

diverse_by_subscription

## 4. Visualizing the Results

Now we'll create visualizations to better understand the patterns and biases in our recommendations.

In [None]:
# 4.1 Blockbuster Ratio by Age Group
plt.figure(figsize=(10, 6))
sns.barplot(x='age_group', y='blockbuster_ratio', data=blockbuster_by_age)
plt.title('Blockbuster Ratio in Recommendations by Age Group', fontsize=14)
plt.xlabel('Age Group', fontsize=12)
plt.ylabel('Proportion of Blockbuster Movies', fontsize=12)
plt.ylim(0, 1)
plt.grid(True, alpha=0.3)
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

In [None]:
# 4.2 Genre Distribution Heatmap by Age Group
plt.figure(figsize=(12, 8))
sns.heatmap(genre_pivot, annot=True, cmap='viridis', fmt='.2f', cbar_kws={'label': 'Proportion of Recommendations'})
plt.title('Genre Distribution in Recommendations by Age Group', fontsize=14)
plt.ylabel('Age Group', fontsize=12)
plt.xlabel('Genre', fontsize=12)
plt.tight_layout()
plt.show()

In [None]:
# 4.3 Diverse Cast Ratio by Subscription Type
plt.figure(figsize=(8, 5))
sns.barplot(x='subscription_type', y='diverse_cast_ratio', data=diverse_by_subscription)
plt.title('Diverse Cast Representation by Subscription Type', fontsize=14)
plt.xlabel('Subscription Type', fontsize=12)
plt.ylabel('Proportion of Movies with Diverse Cast', fontsize=12)
plt.ylim(0, 1)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 5. Calculate Fairness Metrics

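Next, we quantify the patterns observed above with two simple metrics: a demographic parity comparison for blockbuster recommendations and a normalized entropy score for genre diversity.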

In [None]:
# 5.1 Calculate Demographic Parity for blockbuster recommendations
# Demographic parity measures if different groups receive similar rates of positive outcomes
# Here "positive" means getting recommended a blockbuster movie

# Reference group (e.g., '18-24' age group)
reference_group = '18-24'
reference_rate = blockbuster_by_age[blockbuster_by_age['age_group'] == reference_group]['blockbuster_ratio'].values[0]

# Calculate disparity ratio for each group compared to reference
blockbuster_by_age['disparity_ratio'] = blockbuster_by_age['blockbuster_ratio'] / reference_rate
blockbuster_by_age['absolute_difference'] = abs(blockbuster_by_age['blockbuster_ratio'] - reference_rate)

print("\nDemographic Parity Analysis for Blockbuster Recommendations:")
print(blockbuster_by_age[['age_group', 'blockbuster_ratio', 'disparity_ratio', 'absolute_difference']])
print("\nInterpretation: A disparity ratio close to 1.0 and an absolute difference close to 0 indicate better demographic parity relative to the reference group.")
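
Both quantities above depend on which group is chosen as the reference. The sketch below reproduces the calculation on hypothetical rates (the numbers in `rates` are illustrative, not results from this notebook) and adds a reference-free summary, the lowest group rate divided by the highest. The name `min_max_ratio` is an assumption introduced here:

```python
import pandas as pd

# Hypothetical blockbuster rates per age group (illustrative numbers only)
rates = pd.Series(
    {'18-24': 0.50, '25-34': 0.45, '35-44': 0.30, '45-54': 0.25, '55+': 0.20},
    name='blockbuster_ratio',
)

reference_group = '18-24'
reference_rate = rates[reference_group]

parity = pd.DataFrame({
    'blockbuster_ratio': rates,
    'disparity_ratio': rates / reference_rate,
    'absolute_difference': (rates - reference_rate).abs(),
})

# Reference-free summary: lowest group rate divided by highest group rate
min_max_ratio = rates.min() / rates.max()

print(parity)
print(f"Min/max ratio: {min_max_ratio:.2f}")
```

A min/max ratio of 1.0 would mean every group receives blockbusters at the same rate, whichever group is used as the reference.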

In [None]:
# 5.2 Calculate Genre Diversity Score
# Higher values indicate more evenly distributed genres in recommendations
# The small constant 1e-10 avoids log2(0) for genres that never appear in a group
genre_entropy = -(genre_pivot * np.log2(genre_pivot + 1e-10)).sum(axis=1)
max_entropy = np.log2(len(genres))  # Maximum possible entropy (if all genres equally likely)
genre_diversity_score = genre_entropy / max_entropy  # Normalize to [0,1]

print("\nGenre Diversity Score by Age Group (higher is more diverse):")
for age, score in genre_diversity_score.items():
    print(f"{age}: {score:.4f}")
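
To build intuition for the score, here is a small worked example on hypothetical genre distributions. The helper `normalized_entropy` is introduced here for illustration. It computes the same quantity as the cell above, but skips zero proportions instead of adding a small constant:

```python
import numpy as np

def normalized_entropy(proportions):
    """Shannon entropy (base 2) divided by its maximum, log2(number of categories)."""
    p = np.asarray(proportions, dtype=float)
    nonzero = p[p > 0]  # categories with zero share contribute nothing
    entropy = (nonzero * np.log2(1.0 / nonzero)).sum()
    return entropy / np.log2(len(p))

uniform = [0.25, 0.25, 0.25, 0.25]  # every genre equally likely
skewed = [0.85, 0.05, 0.05, 0.05]   # one genre dominates
single = [1.0, 0.0, 0.0, 0.0]       # only one genre is ever recommended

for name, dist in [('uniform', uniform), ('skewed', skewed), ('single', single)]:
    print(f"{name}: {normalized_entropy(dist):.4f}")
```

The uniform distribution scores 1, the single-genre distribution scores 0, and the skewed distribution falls in between.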

## 6. Overall Bias Assessment Report

Let's summarize our findings and provide recommendations for addressing the biases we've identified.

In [None]:
print("\n===== Bias Assessment Summary =====")
print("\nBlockbuster Bias:")
# Note: the 0.2 cutoff is an illustrative threshold, not a test of statistical significance
if blockbuster_by_age['absolute_difference'].max() > 0.2:
    print("⚠️ Significant bias detected in blockbuster recommendations across age groups.")
    most_biased = blockbuster_by_age.loc[blockbuster_by_age['absolute_difference'].idxmax()]
    print(f"   Largest disparity: {most_biased['age_group']} group with {most_biased['absolute_difference']:.2f} absolute difference.")
else:
    print("✓ Blockbuster recommendations are relatively balanced across age groups.")

print("\nGenre Diversity:")
if genre_diversity_score.min() < 0.6:
    print(f"⚠️ Low genre diversity detected for {genre_diversity_score.idxmin()} age group.")
else:
    print("✓ Genre diversity is satisfactory across all age groups.")

print("\nDiverse Cast Representation:")
diverse_diff = diverse_by_subscription['diverse_cast_ratio'].max() - diverse_by_subscription['diverse_cast_ratio'].min()
if diverse_diff > 0.2:
    print("⚠️ Significant disparity in diverse cast recommendations between subscription types.")
    print(f"   The difference is {diverse_diff:.2f} between subscription types.")
else:
    print("✓ Diverse cast recommendations are balanced across subscription types.")

print("\n=================================")
print("Recommendation: For each warning flagged above, adjust the recommendation algorithm to reduce the corresponding disparity, then re-run this diagnostic evaluation to confirm the effect.")

## Conclusion

In this notebook, we've performed a diagnostic offline evaluation of a movie recommendation system to identify potential biases. We found:

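1. **Age-Based Bias**: Younger users (18-24 and 25-34) receive a larger share of blockbuster recommendations than older users, reflecting the bias built into the simulated model.
2. **Subscription-Based Bias**: Premium subscribers receive a larger share of recommendations featuring diverse casts than free users.
3. **Genre Distribution**: Genre proportions differ across age groups; the normalized entropy score quantifies how evenly genres are spread within each group.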

These findings highlight the importance of diagnostic evaluations in identifying biases before deploying recommendation systems to real users. By addressing these biases, we can create fairer and more equitable recommendation systems that better serve all user groups.