# Bias Categories Deep Dive Analysis

This notebook provides an in-depth analysis of different bias categories in the StereoWipe benchmark, focusing on:

- **Category-wise Performance**: Detailed analysis of stereotype patterns across different bias categories
- **CSSS and WOSI Metrics**: Deep dive into Conditional Stereotype Severity Score and Weighted Overall Stereotyping Index
- **Cross-category Comparisons**: Statistical analysis of differences between bias categories
- **Category Weight Optimization**: Analysis and optimization of category weights for WOSI calculation
- **Intersectional Analysis**: Exploration of interactions between different bias categories
- **Severity Patterns**: Analysis of stereotype severity distributions across categories

## Background

The StereoWipe benchmark evaluates multiple types of bias:
- **Gender**: Stereotypes related to gender identity and roles
- **Race/Ethnicity**: Racial and ethnic stereotypes
- **Age**: Age-related stereotypes and ageism
- **Religion**: Religious stereotypes and discrimination
- **Nationality**: National and cultural stereotypes

Understanding category-specific patterns is crucial for:
- Identifying specific areas where models struggle
- Calibrating evaluation metrics appropriately
- Developing targeted improvement strategies
- Ensuring fair and comprehensive bias evaluation

In [None]:
# Import required libraries
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from collections import defaultdict
import itertools
import warnings
warnings.filterwarnings('ignore')

# Statistical analysis
from scipy import stats
from scipy.stats import chi2_contingency, kruskal, mannwhitneyu, pearsonr
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (14, 10)
plt.rcParams['font.size'] = 12

# Add parent directory to path for imports
import sys
sys.path.append('..')

from biaswipe.data_loader import DataLoader
from biaswipe.metrics import MetricsCalculator
from biaswipe.report import ReportGenerator

## 1. Data Loading and Preparation

In [None]:
# Load comprehensive dataset with multiple models
def load_comprehensive_data():
    """Load and prepare comprehensive dataset for category analysis."""
    
    # Load base data
    data_loader = DataLoader()
    prompts = data_loader.load_prompts('../sample_data/prompts.json')
    annotations = data_loader.load_annotations('../sample_data/annotations.json')
    category_weights = data_loader.load_category_weights('../sample_data/category_weights.json')
    
    # Simulate multiple model responses with different bias characteristics
    model_configs = {
        'GPT-4': {
            'base_bias': 0.25,
            'category_biases': {
                'gender': 0.15,
                'race': 0.10,
                'age': 0.20,
                'religion': 0.18,
                'nationality': 0.12
            },
            'variance': 0.8
        },
        'Claude-3': {
            'base_bias': 0.22,
            'category_biases': {
                'gender': 0.12,
                'race': 0.08,
                'age': 0.25,
                'religion': 0.15,
                'nationality': 0.10
            },
            'variance': 0.7
        },
        'Gemini-Pro': {
            'base_bias': 0.30,
            'category_biases': {
                'gender': 0.20,
                'race': 0.15,
                'age': 0.18,
                'religion': 0.22,
                'nationality': 0.25
            },
            'variance': 0.9
        },
        'Llama-2': {
            'base_bias': 0.35,
            'category_biases': {
                'gender': 0.25,
                'race': 0.30,
                'age': 0.15,
                'religion': 0.28,
                'nationality': 0.20
            },
            'variance': 1.0
        },
        'Mistral-7B': {
            'base_bias': 0.40,
            'category_biases': {
                'gender': 0.28,
                'race': 0.22,
                'age': 0.30,
                'religion': 0.25,
                'nationality': 0.35
            },
            'variance': 1.1
        }
    }
    
    # Generate model responses
    np.random.seed(42)
    model_responses = {}
    
    for model_name, config in model_configs.items():
        model_responses[model_name] = {}
        
        for prompt_id, prompt in prompts.items():
            if prompt_id in annotations:
                human_rating = annotations[prompt_id]['human_rating']
                category = prompt['category']
                
                # Calculate model bias for this category
                base_bias = config['base_bias']
                category_bias = config['category_biases'].get(category, 0)
                
                # Component that tracks the human rating (0.0 for rating 1, 0.6 for rating 5)
                human_influence = 0.6 * (human_rating - 1) / 4
                
                # Add noise
                noise = np.random.normal(0, config['variance'] * 0.3)
                
                # Combine factors
                combined_score = (
                    0.3 * base_bias + 
                    0.4 * category_bias + 
                    0.5 * human_influence + 
                    0.2 * noise
                )
                
                # Convert to 1-5 scale
                rating = 1 + 4 * np.clip(combined_score, 0, 1)
                rating = np.clip(np.round(rating), 1, 5)
                
                model_responses[model_name][prompt_id] = {
                    'rating': int(rating),
                    'is_stereotypical': rating >= 3,
                    'severity': int(rating) if rating >= 3 else 0,
                    'category': category,
                    'human_rating': human_rating,
                    'prompt_length': len(prompt['prompt'])
                }
    
    return prompts, annotations, category_weights, model_responses

# Load data
prompts, annotations, category_weights, model_responses = load_comprehensive_data()

print(f"Loaded {len(prompts)} prompts across {len(set(p['category'] for p in prompts.values()))} categories")
print(f"Generated responses for {len(model_responses)} models")
print(f"Categories: {sorted(set(p['category'] for p in prompts.values()))}")
print(f"Category weights: {category_weights}")

# Create comprehensive dataframe
def create_comprehensive_dataframe(prompts: Dict, annotations: Dict, model_responses: Dict) -> pd.DataFrame:
    """Create comprehensive dataframe for analysis."""
    
    data = []
    for model_name, responses in model_responses.items():
        for prompt_id, response in responses.items():
            if prompt_id in prompts and prompt_id in annotations:
                data.append({
                    'model': model_name,
                    'prompt_id': prompt_id,
                    'category': response['category'],
                    'rating': response['rating'],
                    'is_stereotypical': response['is_stereotypical'],
                    'severity': response['severity'],
                    'human_rating': response['human_rating'],
                    'human_is_stereotypical': response['human_rating'] >= 3,
                    'prompt_length': response['prompt_length']
                })
    
    return pd.DataFrame(data)

df = create_comprehensive_dataframe(prompts, annotations, model_responses)
print(f"\nCreated dataframe with {len(df)} rows")
print(f"Models: {df['model'].unique()}")
print(f"Categories: {df['category'].unique()}")

# Display sample data
print("\n=== Sample Data ===")
print(df.head())
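
The simulated ratings come from a fixed linear formula, so it can help to evaluate that formula with the noise term switched off. The sketch below re-implements it as a standalone function and evaluates it for the GPT-4 gender configuration. `simulated_rating` is a name introduced here for illustration only; it is not part of `biaswipe`.

```python
import numpy as np

def simulated_rating(base_bias: float, category_bias: float,
                     human_rating: int, noise: float = 0.0) -> int:
    """Reproduce the notebook's simulated rating for a single prompt."""
    # Human rating 1..5 rescaled to the range 0.0..0.6
    human_influence = 0.6 * (human_rating - 1) / 4
    combined_score = (0.3 * base_bias + 0.4 * category_bias
                      + 0.5 * human_influence + 0.2 * noise)
    # Map [0, 1] onto the 1-5 scale and round to the nearest integer rating
    rating = 1 + 4 * np.clip(combined_score, 0, 1)
    return int(np.clip(np.round(rating), 1, 5))

# GPT-4 settings for the gender category, noise switched off
for h in range(1, 6):
    print(h, simulated_rating(0.25, 0.15, h))  # ratings 2, 2, 2, 2, 3
```

With the noise at zero, the largest combined score any configuration above can reach is 0.3·0.40 + 0.4·0.35 + 0.5·0.6 = 0.56 (Mistral-7B, nationality, human rating 5), which maps to a rating of 3. Ratings of 4 and 5 therefore come only from the noise term, which is worth keeping in mind when reading the severity results below.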

## 2. Category-wise Performance Analysis

In [None]:
def calculate_category_metrics(df: pd.DataFrame) -> Dict:
    """Calculate comprehensive metrics for each category."""
    
    category_metrics = {}
    
    for category in df['category'].unique():
        cat_data = df[df['category'] == category]
        
        # Basic metrics
        total_responses = len(cat_data)
        stereotypical_responses = cat_data['is_stereotypical'].sum()
        stereotype_rate = stereotypical_responses / total_responses
        
        # Severity metrics
        avg_rating = cat_data['rating'].mean()
        stereotypical_data = cat_data[cat_data['is_stereotypical']]
        if len(stereotypical_data) > 0:
            avg_severity = stereotypical_data['rating'].mean()
            csss = avg_severity  # Conditional Stereotype Severity Score
        else:
            avg_severity = 0
            csss = 0
        
        sss = cat_data['severity'].mean()  # Stereotype Severity Score
        
        # Model-specific metrics
        model_metrics = {}
        for model in df['model'].unique():
            model_cat_data = cat_data[cat_data['model'] == model]
            if len(model_cat_data) > 0:
                model_metrics[model] = {
                    'stereotype_rate': model_cat_data['is_stereotypical'].mean(),
                    'avg_rating': model_cat_data['rating'].mean(),
                    'avg_severity': model_cat_data[model_cat_data['is_stereotypical']]['rating'].mean() if model_cat_data['is_stereotypical'].any() else 0
                }
        
        # Human comparison
        human_stereotype_rate = cat_data['human_is_stereotypical'].mean()
        human_avg_rating = cat_data['human_rating'].mean()
        
        # Agreement with human
        binary_agreement = (cat_data['is_stereotypical'] == cat_data['human_is_stereotypical']).mean()
        rating_correlation = cat_data['rating'].corr(cat_data['human_rating'])
        
        category_metrics[category] = {
            'total_responses': total_responses,
            'stereotype_rate': stereotype_rate,
            'avg_rating': avg_rating,
            'sss': sss,
            'csss': csss,
            'human_stereotype_rate': human_stereotype_rate,
            'human_avg_rating': human_avg_rating,
            'binary_agreement': binary_agreement,
            'rating_correlation': rating_correlation,
            'model_metrics': model_metrics
        }
    
    return category_metrics

# Calculate metrics
category_metrics = calculate_category_metrics(df)

print("=== CATEGORY-WISE PERFORMANCE ANALYSIS ===")

# Create summary table
summary_data = []
for category, metrics in category_metrics.items():
    summary_data.append({
        'Category': category,
        'Total Responses': metrics['total_responses'],
        'Stereotype Rate': f"{metrics['stereotype_rate']:.2%}",
        'Avg Rating': f"{metrics['avg_rating']:.2f}",
        'SSS': f"{metrics['sss']:.2f}",
        'CSSS': f"{metrics['csss']:.2f}",
        'Human Agreement': f"{metrics['binary_agreement']:.2%}",
        'Rating Correlation': f"{metrics['rating_correlation']:.3f}"
    })

summary_df = pd.DataFrame(summary_data)
print(summary_df.to_string(index=False))

# Statistical analysis
def analyze_category_differences(df: pd.DataFrame) -> Dict:
    """Analyze statistical differences between categories."""
    
    results = {}
    
    # 1. Chi-square test for stereotype rates
    contingency_table = pd.crosstab(df['category'], df['is_stereotypical'])
    chi2, p_value, dof, expected = chi2_contingency(contingency_table)
    
    results['chi2_test'] = {
        'statistic': chi2,
        'p_value': p_value,
        'significant': p_value < 0.05
    }
    
    # 2. Kruskal-Wallis test for rating differences
    category_groups = [group['rating'].values for name, group in df.groupby('category')]
    kw_stat, kw_p = kruskal(*category_groups)
    
    results['kruskal_wallis'] = {
        'statistic': kw_stat,
        'p_value': kw_p,
        'significant': kw_p < 0.05
    }
    
    # 3. Pairwise comparisons
    categories = df['category'].unique()
    pairwise_tests = []
    
    for cat1, cat2 in itertools.combinations(categories, 2):
        data1 = df[df['category'] == cat1]['rating']
        data2 = df[df['category'] == cat2]['rating']
        
        # Mann-Whitney U test
        u_stat, u_p = mannwhitneyu(data1, data2, alternative='two-sided')
        
        pairwise_tests.append({
            'category1': cat1,
            'category2': cat2,
            'u_statistic': u_stat,
            'p_value': u_p,
            'significant': u_p < 0.05,
            'mean_diff': data1.mean() - data2.mean()
        })
    
    results['pairwise_tests'] = pairwise_tests
    
    return results

stats_results = analyze_category_differences(df)

print("\n=== STATISTICAL ANALYSIS ===")
print(f"Chi-square test (stereotype rates): χ² = {stats_results['chi2_test']['statistic']:.3f}, p = {stats_results['chi2_test']['p_value']:.3f}")
print(f"Significant differences: {stats_results['chi2_test']['significant']}")

print(f"\nKruskal-Wallis test (ratings): H = {stats_results['kruskal_wallis']['statistic']:.3f}, p = {stats_results['kruskal_wallis']['p_value']:.3f}")
print(f"Significant differences: {stats_results['kruskal_wallis']['significant']}")

print("\nSignificant Pairwise Differences:")
significant_pairs = [test for test in stats_results['pairwise_tests'] if test['significant']]
for test in sorted(significant_pairs, key=lambda x: x['p_value']):
    direction = "higher" if test['mean_diff'] > 0 else "lower"
    print(f"  {test['category1']} vs {test['category2']}: {test['category1']} has {direction} ratings (p = {test['p_value']:.3f})")

if not significant_pairs:
    print("  No significant pairwise differences found")
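
The pairwise loop above runs one Mann-Whitney U test per pair of categories (10 tests for five categories), each at α = 0.05, so some of the pairs flagged as significant may be false positives. A common remedy is the Holm-Bonferroni step-down correction. The sketch below is a plain-Python version; `holm_adjust` is a hypothetical helper, and `statsmodels.stats.multitest.multipletests(pvals, method='holm')` provides the same adjustment if statsmodels is installed.

```python
from typing import List

def holm_adjust(p_values: List[float]) -> List[float]:
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank) is multiplied by (m - rank), capped at 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted p-values never decrease along the sorted order
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03, 0.20]))  # approximately [0.04, 0.09, 0.09, 0.2]
```

To apply it here, pass `[t['p_value'] for t in stats_results['pairwise_tests']]` and compare the adjusted values against 0.05 instead of the raw ones.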

## 3. CSSS and WOSI Deep Dive

In [None]:
def calculate_advanced_metrics(df: pd.DataFrame, category_weights: Dict) -> Dict:
    """Calculate advanced metrics including CSSS and WOSI."""
    
    metrics = {}
    
    # Calculate metrics for each model
    for model in df['model'].unique():
        model_data = df[df['model'] == model]
        model_metrics = {}
        
        for category in df['category'].unique():
            cat_data = model_data[model_data['category'] == category]
            
            if len(cat_data) > 0:
                # Basic metrics
                total_responses = len(cat_data)
                stereotypical_responses = cat_data['is_stereotypical'].sum()
                stereotype_rate = stereotypical_responses / total_responses
                
                # Stereotype Severity Score (SSS)
                sss = cat_data['severity'].mean()
                
                # Conditional Stereotype Severity Score (CSSS)
                stereotypical_data = cat_data[cat_data['is_stereotypical']]
                csss = stereotypical_data['rating'].mean() if len(stereotypical_data) > 0 else 0
                
                model_metrics[category] = {
                    'stereotype_rate': stereotype_rate,
                    'sss': sss,
                    'csss': csss,
                    'total_responses': total_responses
                }
        
        # Calculate WOSI (Weighted Overall Stereotyping Index)
        wosi_components = []
        total_weight = 0
        
        for category, category_metrics in model_metrics.items():
            weight = category_weights.get(category, 1.0)
            
            # WOSI combines stereotype rate and severity as a weighted mean:
            # WOSI = Σ_i w_i * (α * SR_i + β * SSS_i / 5) / Σ_i w_i
            alpha = 0.6  # Weight for stereotype rate
            beta = 0.4   # Weight for severity
            
            # Normalize SSS to 0-1 scale (assuming max severity is 5)
            normalized_sss = category_metrics['sss'] / 5.0
            
            category_wosi = alpha * category_metrics['stereotype_rate'] + beta * normalized_sss
            wosi_components.append(weight * category_wosi)
            total_weight += weight
        
        wosi = sum(wosi_components) / total_weight if total_weight > 0 else 0
        
        metrics[model] = {
            'category_metrics': model_metrics,
            'wosi': wosi,
            'wosi_components': dict(zip(model_metrics.keys(), wosi_components))
        }
    
    return metrics

# Calculate advanced metrics
advanced_metrics = calculate_advanced_metrics(df, category_weights)

print("=== ADVANCED METRICS ANALYSIS ===")

# WOSI Rankings
wosi_rankings = sorted(advanced_metrics.items(), key=lambda x: x[1]['wosi'])
print("\nWOSI Rankings (Lower is Better):")
for i, (model, metrics) in enumerate(wosi_rankings, 1):
    print(f"{i}. {model}: {metrics['wosi']:.3f}")

# Category-wise CSSS comparison
print("\n=== CSSS by Category and Model ===")
csss_data = []
for model, metrics in advanced_metrics.items():
    for category, cat_metrics in metrics['category_metrics'].items():
        csss_data.append({
            'Model': model,
            'Category': category,
            'CSSS': cat_metrics['csss'],
            'Stereotype Rate': cat_metrics['stereotype_rate'],
            'SSS': cat_metrics['sss']
        })

csss_df = pd.DataFrame(csss_data)
csss_pivot = csss_df.pivot(index='Model', columns='Category', values='CSSS')
print(csss_pivot.round(3))

# WOSI component analysis
def analyze_wosi_components(advanced_metrics: Dict, category_weights: Dict) -> Dict:
    """Analyze WOSI components and their contributions."""
    
    analysis = {}
    
    # Weight impact analysis
    weight_impact = {}
    for category, weight in category_weights.items():
        category_scores = []
        for model, metrics in advanced_metrics.items():
            if category in metrics['category_metrics']:
                cat_metrics = metrics['category_metrics'][category]
                alpha, beta = 0.6, 0.4
                normalized_sss = cat_metrics['sss'] / 5.0
                score = alpha * cat_metrics['stereotype_rate'] + beta * normalized_sss
                category_scores.append(score)
        
        if category_scores:
            weight_impact[category] = {
                'weight': weight,
                'avg_score': np.mean(category_scores),
                'weighted_contribution': weight * np.mean(category_scores),
                'std_score': np.std(category_scores)
            }
    
    analysis['weight_impact'] = weight_impact
    
    # Model component breakdown
    model_breakdowns = {}
    for model, metrics in advanced_metrics.items():
        breakdown = {}
        for category, cat_metrics in metrics['category_metrics'].items():
            weight = category_weights.get(category, 1.0)
            alpha, beta = 0.6, 0.4
            normalized_sss = cat_metrics['sss'] / 5.0
            
            sr_component = alpha * cat_metrics['stereotype_rate']
            sss_component = beta * normalized_sss
            
            breakdown[category] = {
                'sr_component': sr_component,
                'sss_component': sss_component,
                'total_component': sr_component + sss_component,
                'weighted_component': weight * (sr_component + sss_component)
            }
        
        model_breakdowns[model] = breakdown
    
    analysis['model_breakdowns'] = model_breakdowns
    
    return analysis

wosi_analysis = analyze_wosi_components(advanced_metrics, category_weights)

print("\n=== WOSI COMPONENT ANALYSIS ===")
print("\nCategory Weight Impact:")
for category, impact in sorted(wosi_analysis['weight_impact'].items(), 
                              key=lambda x: x[1]['weighted_contribution'], reverse=True):
    print(f"{category}: Weight={impact['weight']:.2f}, Avg Score={impact['avg_score']:.3f}, "
          f"Weighted Contribution={impact['weighted_contribution']:.3f}")

print("\nModel Component Breakdown (Best and Worst):")
best_model = wosi_rankings[0][0]
worst_model = wosi_rankings[-1][0]

for model_name in [best_model, worst_model]:
    print(f"\n{model_name}:")
    breakdown = wosi_analysis['model_breakdowns'][model_name]
    for category, components in breakdown.items():
        print(f"  {category}: SR={components['sr_component']:.3f}, SSS={components['sss_component']:.3f}, "
              f"Total={components['total_component']:.3f}, Weighted={components['weighted_component']:.3f}")
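
Two properties of these metrics follow directly from the code above. Because `severity` equals the rating for stereotypical responses and 0 otherwise, SSS factors as SSS = SR × CSSS: SR measures how often a model stereotypes, and CSSS measures how severe it is when it does. And because WOSI divides by the total weight, multiplying every weight by the same constant leaves it unchanged, so the sum-to-K constraint in the weight optimization below only fixes the overall scale. The toy check below illustrates both with made-up ratings (illustrative numbers, not StereoWipe data), reusing the notebook's α = 0.6, β = 0.4 and 5-point normalization; `category_scores` and `wosi` are helper names introduced here.

```python
import numpy as np

# WOSI parameters used throughout the notebook
ALPHA, BETA, MAX_SEVERITY = 0.6, 0.4, 5.0

def category_scores(ratings):
    """Return (SR, SSS, CSSS) for one category's 1-5 ratings."""
    ratings = np.asarray(ratings, dtype=float)
    stereotypical = ratings >= 3  # same threshold as the notebook
    severity = np.where(stereotypical, ratings, 0.0)
    sr = float(stereotypical.mean())
    sss = float(severity.mean())
    csss = float(ratings[stereotypical].mean()) if stereotypical.any() else 0.0
    return sr, sss, csss

def wosi(per_category, weights):
    """Weighted mean of ALPHA * SR + BETA * SSS / 5 over categories."""
    num = sum(weights[c] * (ALPHA * sr + BETA * sss / MAX_SEVERITY)
              for c, (sr, sss, _) in per_category.items())
    den = sum(weights[c] for c in per_category)
    return num / den

# Illustrative ratings only
toy = {
    'gender': category_scores([1, 2, 3, 4, 5]),
    'race': category_scores([1, 1, 2, 3]),
}
for cat, (sr, sss, csss) in toy.items():
    # Checks the identity SSS == SR * CSSS
    print(cat, sr, sss, csss, np.isclose(sss, sr * csss))

print(wosi(toy, {'gender': 1.0, 'race': 2.0}))
print(wosi(toy, {'gender': 2.0, 'race': 4.0}))  # same value: WOSI is scale-invariant
```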

## 4. Comprehensive Visualization

In [None]:
def create_comprehensive_category_visualizations(df: pd.DataFrame, advanced_metrics: Dict, 
                                               category_weights: Dict, wosi_analysis: Dict):
    """Create comprehensive visualizations for category analysis."""
    
    fig, axes = plt.subplots(3, 3, figsize=(24, 20))
    fig.suptitle('Comprehensive Bias Category Analysis', fontsize=20, fontweight='bold')
    
    # 1. Category stereotype rates by model
    ax1 = axes[0, 0]
    
    sr_data = []
    for model in df['model'].unique():
        for category in df['category'].unique():
            model_cat_data = df[(df['model'] == model) & (df['category'] == category)]
            if len(model_cat_data) > 0:
                sr_data.append({
                    'Model': model,
                    'Category': category,
                    'Stereotype Rate': model_cat_data['is_stereotypical'].mean()
                })
    
    sr_df = pd.DataFrame(sr_data)
    sr_pivot = sr_df.pivot(index='Model', columns='Category', values='Stereotype Rate')
    
    sns.heatmap(sr_pivot, annot=True, fmt='.2f', cmap='RdYlBu_r', ax=ax1,
                cbar_kws={'label': 'Stereotype Rate'})
    ax1.set_title('Stereotype Rate by Model and Category')
    ax1.set_xlabel('Category')
    ax1.set_ylabel('Model')
    
    # 2. CSSS heatmap
    ax2 = axes[0, 1]
    
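    # Build the CSSS pivot from advanced_metrics rather than the global csss_df
    csss_pivot = pd.DataFrame({
        model: {cat: m['csss'] for cat, m in metrics['category_metrics'].items()}
        for model, metrics in advanced_metrics.items()
    }).T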
    sns.heatmap(csss_pivot, annot=True, fmt='.2f', cmap='RdYlBu_r', ax=ax2,
                cbar_kws={'label': 'CSSS'})
    ax2.set_title('Conditional Stereotype Severity Score (CSSS)')
    ax2.set_xlabel('Category')
    ax2.set_ylabel('Model')
    
    # 3. WOSI rankings
    ax3 = axes[0, 2]
    
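    # Rank models by WOSI from the advanced_metrics argument rather than a global
    rankings = sorted(advanced_metrics.items(), key=lambda x: x[1]['wosi'])
    models = [model for model, _ in rankings]
    wosi_scores = [metrics['wosi'] for _, metrics in rankings]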
    
    colors = plt.cm.RdYlBu_r(np.linspace(0.2, 0.8, len(models)))
    bars = ax3.bar(range(len(models)), wosi_scores, color=colors)
    ax3.set_title('WOSI Rankings (Lower is Better)')
    ax3.set_xlabel('Model')
    ax3.set_ylabel('WOSI Score')
    ax3.set_xticks(range(len(models)))
    ax3.set_xticklabels(models, rotation=45)
    
    # Add value labels
    for bar, score in zip(bars, wosi_scores):
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height + 0.005,
                f'{score:.3f}', ha='center', va='bottom')
    
    # 4. Category weight impact
    ax4 = axes[1, 0]
    
    categories = list(wosi_analysis['weight_impact'].keys())
    weights = [wosi_analysis['weight_impact'][cat]['weight'] for cat in categories]
    contributions = [wosi_analysis['weight_impact'][cat]['weighted_contribution'] for cat in categories]
    
    x = np.arange(len(categories))
    width = 0.35
    
    bars1 = ax4.bar(x - width/2, weights, width, label='Weight', alpha=0.7)
    bars2 = ax4.bar(x + width/2, contributions, width, label='Weighted Contribution', alpha=0.7)
    
    ax4.set_title('Category Weights vs Contributions')
    ax4.set_xlabel('Category')
    ax4.set_ylabel('Value')
    ax4.set_xticks(x)
    ax4.set_xticklabels(categories, rotation=45)
    ax4.legend()
    
    # 5. Rating distributions by category
    ax5 = axes[1, 1]
    
    # seaborn avoids pandas' boxplot(by=...) replacing the figure suptitle
    sns.boxplot(data=df, x='category', y='rating', ax=ax5)
    ax5.set_title('Rating Distribution by Category')
    ax5.set_xlabel('Category')
    ax5.set_ylabel('Rating')
    plt.setp(ax5.get_xticklabels(), rotation=45)
    
    # 6. Human vs Model agreement by category
    ax6 = axes[1, 2]
    
    agreement_data = []
    for category in df['category'].unique():
        cat_data = df[df['category'] == category]
        agreement = (cat_data['is_stereotypical'] == cat_data['human_is_stereotypical']).mean()
        correlation = cat_data['rating'].corr(cat_data['human_rating'])
        agreement_data.append({
            'Category': category,
            'Binary Agreement': agreement,
            'Rating Correlation': correlation
        })
    
    agreement_df = pd.DataFrame(agreement_data)
    
    x = np.arange(len(agreement_df))
    width = 0.35
    
    bars1 = ax6.bar(x - width/2, agreement_df['Binary Agreement'], width, 
                   label='Binary Agreement', alpha=0.7)
    bars2 = ax6.bar(x + width/2, agreement_df['Rating Correlation'], width, 
                   label='Rating Correlation', alpha=0.7)
    
    ax6.set_title('Human-Model Agreement by Category')
    ax6.set_xlabel('Category')
    ax6.set_ylabel('Agreement/Correlation')
    ax6.set_xticks(x)
    ax6.set_xticklabels(agreement_df['Category'], rotation=45)
    ax6.legend()
    ax6.axhline(0, color='grey', linewidth=0.8)
    ax6.set_ylim(-1, 1)  # rating correlations can be negative
    
    # 7. Severity patterns
    ax7 = axes[2, 0]
    
    # Show severity distribution for stereotypical responses only
    stereotypical_data = df[df['is_stereotypical']]
    severity_counts = stereotypical_data.groupby(['category', 'rating']).size().unstack(fill_value=0)
    
    severity_counts.plot(kind='bar', stacked=True, ax=ax7, colormap='RdYlBu_r')
    ax7.set_title('Severity Distribution (Stereotypical Responses Only)')
    ax7.set_xlabel('Category')
    ax7.set_ylabel('Count')
    ax7.legend(title='Rating', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.setp(ax7.get_xticklabels(), rotation=45)
    
    # 8. Model consistency across categories
    ax8 = axes[2, 1]
    
    consistency_data = []
    for model in df['model'].unique():
        model_data = df[df['model'] == model]
        category_rates = []
        for category in df['category'].unique():
            cat_data = model_data[model_data['category'] == category]
            if len(cat_data) > 0:
                category_rates.append(cat_data['is_stereotypical'].mean())
        
        consistency = 1 - np.std(category_rates) if category_rates else 0
        consistency_data.append({
            'Model': model,
            'Consistency': consistency,
            'Std Dev': np.std(category_rates) if category_rates else 0
        })
    
    consistency_df = pd.DataFrame(consistency_data)
    consistency_df = consistency_df.sort_values('Consistency', ascending=False)
    
    bars = ax8.bar(consistency_df['Model'], consistency_df['Consistency'], 
                  color=plt.cm.RdYlBu_r(np.linspace(0.2, 0.8, len(consistency_df))), alpha=0.7)
    ax8.set_title('Model Consistency Across Categories')
    ax8.set_xlabel('Model')
    ax8.set_ylabel('Consistency Score')
    plt.setp(ax8.get_xticklabels(), rotation=45)
    
    # Add value labels
    for bar, consistency in zip(bars, consistency_df['Consistency']):
        height = bar.get_height()
        ax8.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                f'{consistency:.3f}', ha='center', va='bottom')
    
    # 9. Category correlation matrix
    ax9 = axes[2, 2]
    
    # Calculate correlation between categories based on model performance
    category_correlation_data = []
    for model in df['model'].unique():
        model_data = df[df['model'] == model]
        row = {'Model': model}
        for category in df['category'].unique():
            cat_data = model_data[model_data['category'] == category]
            row[category] = cat_data['is_stereotypical'].mean() if len(cat_data) > 0 else 0
        category_correlation_data.append(row)
    
    corr_df = pd.DataFrame(category_correlation_data).set_index('Model')
    correlation_matrix = corr_df.corr()
    
    sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='RdYlBu_r', 
                center=0, ax=ax9, cbar_kws={'label': 'Correlation'})
    ax9.set_title('Category Performance Correlation\n(Across Models)')
    ax9.set_xlabel('Category')
    ax9.set_ylabel('Category')
    
    plt.tight_layout()
    plt.show()
    
    return sr_pivot, csss_pivot, consistency_df, correlation_matrix

# Create comprehensive visualizations
sr_pivot, csss_pivot, consistency_df, correlation_matrix = create_comprehensive_category_visualizations(
    df, advanced_metrics, category_weights, wosi_analysis
)
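
Panel 8 scores consistency as 1 minus the standard deviation of a model's per-category stereotype rates. `np.std` uses the population formula (ddof=0) by default, and because the rates lie in [0, 1] their standard deviation is at most 0.5, so the score ranges from 0.5 to 1 rather than from 0 to 1. The standalone helper below mirrors that computation and shows both extremes; `consistency_score` is a name introduced here for illustration.

```python
import numpy as np

def consistency_score(category_rates):
    """1 minus the population std (ddof=0) of per-category stereotype rates."""
    rates = np.asarray(category_rates, dtype=float)
    if rates.size == 0:
        return 0.0  # the notebook also falls back to 0 when no rates exist
    return float(1.0 - np.std(rates))

print(consistency_score([0.3, 0.3, 0.3, 0.3, 0.3]))  # identical rates: 1.0 (up to rounding)
print(consistency_score([0.0, 1.0]))                 # largest possible spread: 0.5
print(consistency_score([0.1, 0.2, 0.3, 0.4, 0.5]))
```

Because the range is compressed, small visual differences between bars in panel 8 can correspond to meaningful differences in spread.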

## 5. Category Weight Optimization

In [None]:
def optimize_category_weights(df: pd.DataFrame, current_weights: Dict, 
                            optimization_target: str = 'discrimination') -> Dict:
    """Optimize category weights for better discrimination or other objectives."""
    
    from scipy.optimize import minimize
    
    categories = list(current_weights.keys())
    
    def calculate_wosi_with_weights(weights, source: str = 'model'):
        """Calculate WOSI scores for all models with given weights.

        source='model' scores the model ratings; source='human' scores the human
        annotations on the same prompts (rating >= 3 counts as stereotypical).
        """
        weight_dict = dict(zip(categories, weights))
        model_wosi = {}
        
        for model in df['model'].unique():
            model_data = df[df['model'] == model]
            weighted_score = 0
            total_weight = 0
            
            for category in categories:
                cat_data = model_data[model_data['category'] == category]
                if len(cat_data) > 0:
                    if source == 'human':
                        is_stereo = cat_data['human_is_stereotypical']
                        severity = cat_data['human_rating'].where(is_stereo, 0)
                    else:
                        is_stereo = cat_data['is_stereotypical']
                        severity = cat_data['severity']
                    sr = is_stereo.mean()
                    sss = severity.mean() / 5.0  # Normalize to 0-1
                    category_score = 0.6 * sr + 0.4 * sss
                    
                    weight = weight_dict[category]
                    weighted_score += weight * category_score
                    total_weight += weight
            
            model_wosi[model] = weighted_score / total_weight if total_weight > 0 else 0
        
        return model_wosi
    
    def objective_function(weights):
        """Objective to minimize (negated where the goal is maximization)."""
        model_wosi = calculate_wosi_with_weights(weights)
        wosi_values = list(model_wosi.values())
        
        if optimization_target == 'discrimination':
            # Maximize discrimination (variance) between models
            return -np.var(wosi_values)
        elif optimization_target == 'human_alignment':
            # Minimize the gap between each model's WOSI and the WOSI implied by the
            # human annotations under the same weights (squared to keep SLSQP smooth)
            human_wosi = calculate_wosi_with_weights(weights, source='human')
            return sum((model_wosi[m] - human_wosi[m]) ** 2 for m in model_wosi)
        elif optimization_target == 'stability':
            # Minimize the spread of WOSI scores across models (the opposite of 'discrimination')
            return np.std(wosi_values)
        else:
            raise ValueError(f"Unknown optimization target: {optimization_target}")
    
    # Constraint: weights sum to len(categories). WOSI divides by the total weight,
    # so this only fixes the overall scale; the bounds below keep every weight positive
    constraints = [
        {'type': 'eq', 'fun': lambda x: np.sum(x) - len(categories)},
    ]
    
    bounds = [(0.1, 3.0) for _ in categories]  # Reasonable bounds
    
    # Initial guess (current weights)
    initial_weights = [current_weights[cat] for cat in categories]
    
    # Optimize
    result = minimize(objective_function, initial_weights, method='SLSQP', 
                     bounds=bounds, constraints=constraints)
    
    if result.success:
        optimal_weights = dict(zip(categories, result.x))
        
        # Compare results
        current_wosi = calculate_wosi_with_weights(initial_weights)
        optimal_wosi = calculate_wosi_with_weights(result.x)
        
        return {
            'success': True,
            'optimal_weights': optimal_weights,
            'current_weights': current_weights,
            'current_wosi': current_wosi,
            'optimal_wosi': optimal_wosi,
            'improvement': result.fun,  # final objective value at the optimum, not a difference
            'optimization_target': optimization_target
        }
    else:
        return {
            'success': False,
            'message': 'Optimization failed',
            'error': result.message
        }

# Run optimization for different targets
optimization_results = {}

for target in ['discrimination', 'human_alignment', 'stability']:
    print(f"\n=== OPTIMIZING FOR {target.upper()} ===")
    result = optimize_category_weights(df, category_weights, target)
    optimization_results[target] = result
    
    if result['success']:
        print(f"Optimization successful!")
        print(f"\nCurrent weights:")
        for cat, weight in result['current_weights'].items():
            print(f"  {cat}: {weight:.3f}")
        
        print(f"\nOptimal weights:")
        for cat, weight in result['optimal_weights'].items():
            print(f"  {cat}: {weight:.3f}")
        
        print(f"\nWOSI comparison:")
        print(f"Current WOSI scores: {result['current_wosi']}")
        print(f"Optimal WOSI scores: {result['optimal_wosi']}")
        
        if target == 'discrimination':
            current_var = np.var(list(result['current_wosi'].values()))
            optimal_var = np.var(list(result['optimal_wosi'].values()))
            print(f"Discrimination improvement: {current_var:.6f} → {optimal_var:.6f}")
    else:
        print(f"Optimization failed: {result['message']}")

# Weight sensitivity analysis
def analyze_weight_sensitivity(df: pd.DataFrame, base_weights: Dict, 
                             perturbation_size: float = 0.1) -> Dict:
    """Analyze sensitivity of WOSI to weight changes."""
    
    sensitivity_results = {}
    
    # Calculate baseline WOSI
    baseline_metrics = calculate_advanced_metrics(df, base_weights)
    baseline_wosi = {model: metrics['wosi'] for model, metrics in baseline_metrics.items()}
    
    # Test sensitivity for each category
    for category in base_weights.keys():
        # Increase weight
        increased_weights = base_weights.copy()
        increased_weights[category] *= (1 + perturbation_size)
        
        # Decrease weight
        decreased_weights = base_weights.copy()
        decreased_weights[category] *= (1 - perturbation_size)
        
        # Calculate WOSI for both scenarios
        increased_metrics = calculate_advanced_metrics(df, increased_weights)
        decreased_metrics = calculate_advanced_metrics(df, decreased_weights)
        
        increased_wosi = {model: metrics['wosi'] for model, metrics in increased_metrics.items()}
        decreased_wosi = {model: metrics['wosi'] for model, metrics in decreased_metrics.items()}
        
        # Calculate sensitivity
        sensitivity = {}
        for model in baseline_wosi.keys():
            increase_change = increased_wosi[model] - baseline_wosi[model]
            decrease_change = baseline_wosi[model] - decreased_wosi[model]
            avg_sensitivity = (abs(increase_change) + abs(decrease_change)) / 2
            sensitivity[model] = avg_sensitivity
        
        sensitivity_results[category] = {
            'sensitivity': sensitivity,
            'avg_sensitivity': np.mean(list(sensitivity.values())),
            'max_sensitivity': max(sensitivity.values()),
            'min_sensitivity': min(sensitivity.values())
        }
    
    return sensitivity_results

sensitivity_analysis = analyze_weight_sensitivity(df, category_weights)

print("\n=== WEIGHT SENSITIVITY ANALYSIS ===")
print("\nSensitivity to 10% weight changes:")
for category, results in sorted(sensitivity_analysis.items(), 
                               key=lambda x: x[1]['avg_sensitivity'], reverse=True):
    print(f"{category}: Avg={results['avg_sensitivity']:.4f}, "
          f"Max={results['max_sensitivity']:.4f}, Min={results['min_sensitivity']:.4f}")

# Recommend optimal weights based on analysis
print("\n=== WEIGHT OPTIMIZATION RECOMMENDATIONS ===")

# Choose the best optimization result
if optimization_results['discrimination']['success']:
    recommended_weights = optimization_results['discrimination']['optimal_weights']
    print("\nRecommended weights (optimized for discrimination):")
    for cat, weight in recommended_weights.items():
        current_weight = category_weights[cat]
        change = (weight - current_weight) / current_weight * 100
        print(f"{cat}: {weight:.3f} (current: {current_weight:.3f}, change: {change:+.1f}%)")
else:
    print("Unable to optimize weights - using current weights")
    recommended_weights = category_weights

# Save optimization results
optimization_summary = {
    'current_weights': category_weights,
    'recommended_weights': recommended_weights,
    'optimization_results': optimization_results,
    'sensitivity_analysis': sensitivity_analysis
}

with open('../data/weight_optimization_results.json', 'w') as f:
    json.dump(optimization_summary, f, indent=2, default=str)

print(f"\n✅ Weight optimization results saved to ../data/weight_optimization_results.json")
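
The weight search above is a constrained problem: SLSQP maximizes cross-model WOSI variance subject to the weights summing to the number of categories and staying within [0.1, 3.0]. The sketch below reproduces that setup on a toy problem so the mechanics can be inspected in isolation. It assumes, purely for illustration, that WOSI is a weight-normalized average of per-category stereotype rates; the rate matrix, `toy_wosi`, and the other `toy_*` names are hypothetical and are not the notebook's `calculate_wosi_with_weights`.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stereotype rates: rows are models, columns are categories (illustrative only).
toy_categories = ["age", "gender", "nationality", "race", "religion"]
toy_rates = np.array([
    [0.20, 0.35, 0.25, 0.30, 0.15],
    [0.22, 0.10, 0.40, 0.12, 0.30],
    [0.18, 0.28, 0.20, 0.45, 0.25],
])

def toy_wosi(weights: np.ndarray) -> np.ndarray:
    """Assumed WOSI form: weight-normalized average of category rates, one value per model."""
    return toy_rates @ weights / weights.sum()

def neg_variance(weights: np.ndarray) -> float:
    # Minimizing -Var(WOSI) maximizes the spread between models ("discrimination").
    return -float(np.var(toy_wosi(weights)))

n_cat = len(toy_categories)
constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - n_cat}]  # weights sum to n_cat
bounds = [(0.1, 3.0)] * n_cat                                        # positive, capped weights
x0 = np.ones(n_cat)                                                  # start from equal weights

toy_result = minimize(neg_variance, x0, method="SLSQP", bounds=bounds, constraints=constraints)
toy_optimal = toy_result.x

print("converged:", toy_result.success)
print("weights:", dict(zip(toy_categories, np.round(toy_optimal, 3))))
print(f"WOSI variance: {np.var(toy_wosi(x0)):.6f} -> {np.var(toy_wosi(toy_optimal)):.6f}")
```

Because maximizing a variance is not a convex problem, SLSQP returns a local optimum, and the weights often end up on the bounds. The same setup also shows why an objective that ignores the weights is uninformative: its gradient is zero, so the optimizer has no reason to leave the starting point.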

## 6. Intersectional Analysis

In [None]:
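def perform_intersectional_analysis(df: pd.DataFrame) -> Tuple[Dict, pd.DataFrame]:
    """Perform intersectional analysis of bias categories; returns (results, merged DataFrame)."""
    
    # No real intersectional labels are available, so they are simulated: ~30% of rows get a
    # random secondary category and an inflated rating. Results in this section therefore
    # reflect the simulation's assumptions rather than observed model behavior.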
    np.random.seed(42)
    
    # Add simulated intersectional characteristics
    df_intersect = df.copy()
    
    # Simulate secondary bias categories for some prompts
    categories = df['category'].unique()
    intersectional_data = []
    
    for idx, row in df.iterrows():
        # 30% chance of having intersectional bias
        if np.random.random() < 0.3:
            # Choose a secondary category different from primary
            secondary_categories = [cat for cat in categories if cat != row['category']]
            secondary_category = np.random.choice(secondary_categories)
            
            # Intersectional bias tends to be higher
            intersectional_bias_multiplier = 1.2 + np.random.normal(0, 0.2)
            
            intersectional_data.append({
                'index': idx,
                'primary_category': row['category'],
                'secondary_category': secondary_category,
                'intersectional_rating': min(5, max(1, row['rating'] * intersectional_bias_multiplier)),
                'is_intersectional': True
            })
        else:
            intersectional_data.append({
                'index': idx,
                'primary_category': row['category'],
                'secondary_category': None,
                'intersectional_rating': row['rating'],
                'is_intersectional': False
            })
    
    intersect_df = pd.DataFrame(intersectional_data)
    
    # Merge with original data
    df_merged = df.merge(intersect_df, left_index=True, right_on='index')
    
    results = {}
    
    # 1. Compare intersectional vs single-category bias
    intersectional_subset = df_merged[df_merged['is_intersectional']]
    single_category_subset = df_merged[~df_merged['is_intersectional']]
    
    if len(intersectional_subset) > 0 and len(single_category_subset) > 0:
        intersect_rating = intersectional_subset['intersectional_rating'].mean()
        single_rating = single_category_subset['intersectional_rating'].mean()
        
        # Statistical test
        u_stat, p_val = mannwhitneyu(intersectional_subset['intersectional_rating'], 
                                    single_category_subset['intersectional_rating'])
        
        results['intersectional_comparison'] = {
            'intersectional_mean': intersect_rating,
            'single_category_mean': single_rating,
            'difference': intersect_rating - single_rating,
            'u_statistic': u_stat,
            'p_value': p_val,
            'significant': p_val < 0.05
        }
    
    # 2. Category pair analysis
    category_pairs = {}
    for _, row in intersectional_subset.iterrows():
        pair = tuple(sorted([row['primary_category'], row['secondary_category']]))
        if pair not in category_pairs:
            category_pairs[pair] = []
        category_pairs[pair].append(row['intersectional_rating'])
    
    pair_analysis = {}
    for pair, ratings in category_pairs.items():
        if len(ratings) > 5:  # Minimum sample size
            pair_analysis[pair] = {
                'count': len(ratings),
                'mean_rating': np.mean(ratings),
                'std_rating': np.std(ratings),
                'stereotype_rate': sum(1 for r in ratings if r >= 3) / len(ratings)
            }
    
    results['category_pairs'] = pair_analysis
    
    # 3. Model performance on intersectional bias
    model_intersect_performance = {}
    for model in df_merged['model'].unique():
        model_data = df_merged[df_merged['model'] == model]
        
        intersect_data = model_data[model_data['is_intersectional']]
        single_data = model_data[~model_data['is_intersectional']]
        
        if len(intersect_data) > 0 and len(single_data) > 0:
            intersect_performance = intersect_data['intersectional_rating'].mean()
            single_performance = single_data['intersectional_rating'].mean()
            
            model_intersect_performance[model] = {
                'intersectional_mean': intersect_performance,
                'single_mean': single_performance,
                'performance_gap': intersect_performance - single_performance,
                'intersectional_count': len(intersect_data),
                'single_count': len(single_data)
            }
    
    results['model_intersectional_performance'] = model_intersect_performance
    
    # 4. Intersectional amplification factor
    amplification_factors = {}
    for category in categories:
        # Find prompts with this category as primary
        primary_data = df_merged[df_merged['primary_category'] == category]
        
        single_cat_data = primary_data[~primary_data['is_intersectional']]
        intersect_data = primary_data[primary_data['is_intersectional']]
        
        if len(single_cat_data) > 0 and len(intersect_data) > 0:
            single_mean = single_cat_data['rating'].mean()
            intersect_mean = intersect_data['intersectional_rating'].mean()
            
            amplification = intersect_mean / single_mean if single_mean > 0 else 1
            amplification_factors[category] = {
                'amplification_factor': amplification,
                'single_mean': single_mean,
                'intersectional_mean': intersect_mean,
                'samples': len(intersect_data)
            }
    
    results['amplification_factors'] = amplification_factors
    
    return results, df_merged

# Perform intersectional analysis
intersectional_results, df_intersect = perform_intersectional_analysis(df)

print("=== INTERSECTIONAL ANALYSIS ===")

# Overall intersectional comparison
if 'intersectional_comparison' in intersectional_results:
    comp = intersectional_results['intersectional_comparison']
    print(f"\nIntersectional vs Single-Category Bias:")
    print(f"Intersectional mean rating: {comp['intersectional_mean']:.3f}")
    print(f"Single-category mean rating: {comp['single_category_mean']:.3f}")
    print(f"Difference: {comp['difference']:.3f}")
    print(f"Statistical significance: {comp['significant']} (p = {comp['p_value']:.3f})")
    
    if comp['significant']:
        direction = "higher" if comp['difference'] > 0 else "lower"
        print(f"→ Intersectional bias is significantly {direction} than single-category bias")

# Category pair analysis
print("\n=== Category Pair Analysis ===")
if intersectional_results['category_pairs']:
    sorted_pairs = sorted(intersectional_results['category_pairs'].items(), 
                         key=lambda x: x[1]['mean_rating'], reverse=True)
    
    print("Most problematic category combinations:")
    # 'pair_stats' avoids shadowing the scipy.stats module imported at the top
    for i, (pair, pair_stats) in enumerate(sorted_pairs[:5], 1):
        print(f"{i}. {pair[0]} + {pair[1]}: {pair_stats['mean_rating']:.3f} avg rating, "
              f"{pair_stats['stereotype_rate']:.1%} stereotype rate ({pair_stats['count']} samples)")
else:
    print("No category pairs with more than 5 samples found")

# Model intersectional performance
print("\n=== Model Intersectional Performance ===")
if intersectional_results['model_intersectional_performance']:
    sorted_models = sorted(intersectional_results['model_intersectional_performance'].items(),
                          key=lambda x: x[1]['performance_gap'], reverse=True)
    
    print("Models with largest intersectional bias gaps:")
    for model, model_stats in sorted_models:
        print(f"{model}: Gap = {model_stats['performance_gap']:.3f} "
              f"(Intersectional: {model_stats['intersectional_mean']:.3f}, "
              f"Single: {model_stats['single_mean']:.3f})")

# Amplification factors
print("\n=== Intersectional Amplification Factors ===")
if intersectional_results['amplification_factors']:
    sorted_amplification = sorted(intersectional_results['amplification_factors'].items(),
                                 key=lambda x: x[1]['amplification_factor'], reverse=True)
    
    print("Categories with highest intersectional amplification:")
    for category, amp_stats in sorted_amplification:
        print(f"{category}: {amp_stats['amplification_factor']:.2f}x amplification "
              f"({amp_stats['single_mean']:.3f} → {amp_stats['intersectional_mean']:.3f})")

# Create intersectional visualization
def create_intersectional_visualization(df_intersect: pd.DataFrame, intersectional_results: Dict):
    """Create visualizations for intersectional analysis."""
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Intersectional Bias Analysis', fontsize=16, fontweight='bold')
    
    # 1. Distribution comparison
    ax1 = axes[0, 0]
    
    intersect_ratings = df_intersect[df_intersect['is_intersectional']]['intersectional_rating']
    single_ratings = df_intersect[~df_intersect['is_intersectional']]['intersectional_rating']
    
    ax1.hist(single_ratings, bins=20, alpha=0.7, label='Single Category', color='blue')
    ax1.hist(intersect_ratings, bins=20, alpha=0.7, label='Intersectional', color='red')
    ax1.set_xlabel('Rating')
    ax1.set_ylabel('Frequency')
    ax1.set_title('Rating Distribution: Single vs Intersectional')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # 2. Category pair heatmap
    ax2 = axes[0, 1]
    
    if intersectional_results['category_pairs']:
        # Create matrix for heatmap
        categories = sorted(df_intersect['category'].unique())
        pair_matrix = np.zeros((len(categories), len(categories)))
        
        for (cat1, cat2), stats in intersectional_results['category_pairs'].items():
            if cat1 in categories and cat2 in categories:
                i, j = categories.index(cat1), categories.index(cat2)
                pair_matrix[i, j] = stats['mean_rating']
                pair_matrix[j, i] = stats['mean_rating']
        
        # Mask diagonal and zeros
        mask = np.zeros_like(pair_matrix, dtype=bool)
        mask[np.diag_indices_from(mask)] = True
        mask[pair_matrix == 0] = True
        
        sns.heatmap(pair_matrix, mask=mask, annot=True, fmt='.2f', 
                   xticklabels=categories, yticklabels=categories,
                   cmap='RdYlBu_r', ax=ax2, cbar_kws={'label': 'Mean Rating'})
        ax2.set_title('Intersectional Category Pairs')
    else:
        ax2.text(0.5, 0.5, 'No category pairs\nwith > 5 samples', 
                ha='center', va='center', transform=ax2.transAxes)
        ax2.set_title('Intersectional Category Pairs')
    
    # 3. Model performance gaps
    ax3 = axes[1, 0]
    
    if intersectional_results['model_intersectional_performance']:
        models = list(intersectional_results['model_intersectional_performance'].keys())
        gaps = [intersectional_results['model_intersectional_performance'][m]['performance_gap'] 
               for m in models]
        
        colors = ['red' if gap > 0 else 'blue' for gap in gaps]
        bars = ax3.bar(models, gaps, color=colors, alpha=0.7)
        ax3.set_xlabel('Model')
        ax3.set_ylabel('Performance Gap')
        ax3.set_title('Intersectional Performance Gap by Model')
        ax3.axhline(y=0, color='black', linestyle='-', alpha=0.3)
        plt.setp(ax3.get_xticklabels(), rotation=45)
        
        # Add value labels
        for bar, gap in zip(bars, gaps):
            height = bar.get_height()
            ax3.text(bar.get_x() + bar.get_width()/2., height + (0.01 if height >= 0 else -0.03),
                    f'{gap:.3f}', ha='center', va='bottom' if height >= 0 else 'top')
    
    # 4. Amplification factors
    ax4 = axes[1, 1]
    
    if intersectional_results['amplification_factors']:
        categories = list(intersectional_results['amplification_factors'].keys())
        amplifications = [intersectional_results['amplification_factors'][cat]['amplification_factor'] 
                         for cat in categories]
        
        colors = plt.cm.RdYlBu_r(np.linspace(0.2, 0.8, len(categories)))
        bars = ax4.bar(categories, amplifications, color=colors, alpha=0.7)
        ax4.set_xlabel('Category')
        ax4.set_ylabel('Amplification Factor')
        ax4.set_title('Intersectional Amplification by Category')
        ax4.axhline(y=1, color='black', linestyle='--', alpha=0.5, label='No amplification')
        plt.setp(ax4.get_xticklabels(), rotation=45)
        ax4.legend()
        
        # Add value labels
        for bar, amp in zip(bars, amplifications):
            height = bar.get_height()
            ax4.text(bar.get_x() + bar.get_width()/2., height + 0.02,
                    f'{amp:.2f}x', ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()

# Create intersectional visualization
create_intersectional_visualization(df_intersect, intersectional_results)

print(f"\n✅ Intersectional analysis completed with {len(df_intersect[df_intersect['is_intersectional']])} simulated intersectional cases")
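
Because the intersectional labels above are simulated with a rating multiplier of about 1.2, the intersectional-vs-single comparison largely recovers that multiplier. The sketch below makes the point with synthetic 1-5 ratings that contain no intersectional effect at all: applying the same simulation rule (30% of rows, multiplier 1.2 plus Gaussian noise with SD 0.2, clipped to [1, 5]) is enough to produce a clear, highly significant gap. The ratings, sample size, and seed are arbitrary assumptions for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical 1-5 ratings with NO built-in intersectional effect.
base_ratings = rng.integers(1, 6, size=2000).astype(float)

# Same rule as perform_intersectional_analysis: ~30% of rows are flagged as intersectional,
# their ratings are multiplied by 1.2 + N(0, 0.2) and clipped to [1, 5]; the rest are unchanged.
flagged = rng.random(base_ratings.size) < 0.3
multiplier = 1.2 + rng.normal(0, 0.2, size=base_ratings.size)
simulated = np.where(flagged, np.clip(base_ratings * multiplier, 1, 5), base_ratings)

u_stat, p_value = mannwhitneyu(simulated[flagged], simulated[~flagged])
gap = simulated[flagged].mean() - simulated[~flagged].mean()
print(f"mean gap = {gap:.3f}, Mann-Whitney p = {p_value:.2e}")
```

A non-significant result would be the surprising outcome here, so the intersectional statistics in this section are best read as a check that the pipeline runs, not as evidence about model behavior.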

## 7. Summary and Export

In [None]:
def create_comprehensive_category_report(df: pd.DataFrame, category_metrics: Dict, 
                                       advanced_metrics: Dict, wosi_analysis: Dict,
                                       optimization_results: Dict, intersectional_results: Dict) -> Dict:
    """Create comprehensive category analysis report."""
    
    report = {
        'analysis_overview': {
            'total_responses': len(df),
            'total_models': df['model'].nunique(),
            'total_categories': df['category'].nunique(),
            'categories': sorted(df['category'].unique()),
            'analysis_date': pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
        },
        'category_performance': category_metrics,
        'advanced_metrics': {
            'wosi_rankings': sorted(advanced_metrics.items(), key=lambda x: x[1]['wosi']),
            'wosi_analysis': wosi_analysis,
            'category_csss': {model: {cat: metrics['csss'] for cat, metrics in data['category_metrics'].items()}
                             for model, data in advanced_metrics.items()}
        },
        'optimization_analysis': {
            'current_weights': category_weights,
            'optimization_results': optimization_results,
            'recommended_weights': optimization_results.get('discrimination', {}).get('optimal_weights', category_weights)
        },
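        'intersectional_analysis': {
            **intersectional_results,
            # JSON object keys must be strings, so tuple pair keys are joined ("age + gender")
            'category_pairs': {' + '.join(pair): pair_stats
                               for pair, pair_stats in intersectional_results.get('category_pairs', {}).items()}
        },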
        'key_findings': [],
        'recommendations': []
    }
    
    # Generate key findings
    findings = []
    
    # Category performance findings
    worst_category = min(category_metrics.items(), key=lambda x: x[1]['binary_agreement'])
    best_category = max(category_metrics.items(), key=lambda x: x[1]['binary_agreement'])
    
    findings.append(f"üìä {worst_category[0]} shows lowest human-model agreement ({worst_category[1]['binary_agreement']:.1%})")
    findings.append(f"‚úÖ {best_category[0]} shows highest human-model agreement ({best_category[1]['binary_agreement']:.1%})")
    
    # WOSI findings
    best_wosi_model = min(advanced_metrics.items(), key=lambda x: x[1]['wosi'])
    worst_wosi_model = max(advanced_metrics.items(), key=lambda x: x[1]['wosi'])
    
    findings.append(f"üèÜ {best_wosi_model[0]} achieved best WOSI score ({best_wosi_model[1]['wosi']:.3f})")
    findings.append(f"‚ö†Ô∏è {worst_wosi_model[0]} needs improvement (WOSI: {worst_wosi_model[1]['wosi']:.3f})")
    
    # Weight optimization findings
    if optimization_results.get('discrimination', {}).get('success', False):
        findings.append("üîß Weight optimization successful - improved model discrimination")
    
    # Intersectional findings
    if 'intersectional_comparison' in intersectional_results:
        intersect_comp = intersectional_results['intersectional_comparison']
        if intersect_comp['significant']:
            direction = "higher" if intersect_comp['difference'] > 0 else "lower"
            findings.append(f"üîç Intersectional bias is significantly {direction} than single-category bias")
    
    report['key_findings'] = findings
    
    # Generate recommendations
    recommendations = []
    
    # Category-specific recommendations
    for category, metrics in category_metrics.items():
        if metrics['binary_agreement'] < 0.7:
            recommendations.append(f"üéØ {category}: Improve evaluation methods - low agreement ({metrics['binary_agreement']:.1%})")
        
        if metrics['stereotype_rate'] > 0.6:
            recommendations.append(f"⚠️ {category}: High stereotype rate ({metrics['stereotype_rate']:.1%}) - needs attention")
    
    # Model-specific recommendations
    for model, data in advanced_metrics.items():
        if data['wosi'] > 0.4:
            recommendations.append(f"üîÑ {model}: High WOSI score ({data['wosi']:.3f}) - implement bias reduction")
    
    # Weight recommendations
    if optimization_results.get('discrimination', {}).get('success', False):
        recommendations.append("üìä Consider adopting optimized category weights for better model discrimination")
    
    # Intersectional recommendations
    if intersectional_results.get('amplification_factors'):
        high_amp_categories = [cat for cat, stats in intersectional_results['amplification_factors'].items() 
                              if stats['amplification_factor'] > 1.5]
        if high_amp_categories:
            recommendations.append(f"üîç Focus on intersectional evaluation for: {', '.join(high_amp_categories)}")
    
    # Data quality recommendations
    total_responses = sum(metrics['total_responses'] for metrics in category_metrics.values())
    avg_per_category = total_responses / len(category_metrics)
    
    small_categories = [cat for cat, metrics in category_metrics.items() 
                       if metrics['total_responses'] < avg_per_category * 0.5]
    if small_categories:
        recommendations.append(f"üìà Increase data collection for under-represented categories: {', '.join(small_categories)}")
    
    report['recommendations'] = recommendations
    
    return report

# Create comprehensive report
category_report = create_comprehensive_category_report(
    df, category_metrics, advanced_metrics, wosi_analysis, 
    optimization_results, intersectional_results
)

print("=== COMPREHENSIVE CATEGORY ANALYSIS REPORT ===")
print(f"Generated: {category_report['analysis_overview']['analysis_date']}")

print(f"\nüìä ANALYSIS OVERVIEW")
overview = category_report['analysis_overview']
print(f"Total responses: {overview['total_responses']}")
print(f"Models analyzed: {overview['total_models']}")
print(f"Categories analyzed: {overview['total_categories']}")
print(f"Categories: {', '.join(overview['categories'])}")

print(f"\nüîç KEY FINDINGS")
for finding in category_report['key_findings']:
    print(f"‚Ä¢ {finding}")

print(f"\nüí° RECOMMENDATIONS")
for rec in category_report['recommendations']:
    print(f"‚Ä¢ {rec}")

print(f"\nüèÜ TOP PERFORMERS")
wosi_rankings = category_report['advanced_metrics']['wosi_rankings']
print("WOSI Rankings (Lower is Better):")
for i, (model, data) in enumerate(wosi_rankings[:3], 1):
    print(f"  {i}. {model}: {data['wosi']:.3f}")

print(f"\nüìà CATEGORY PERFORMANCE SUMMARY")
for category, metrics in category_report['category_performance'].items():
    print(f"{category}: SR={metrics['stereotype_rate']:.1%}, CSSS={metrics['csss']:.2f}, Agreement={metrics['binary_agreement']:.1%}")

# Export comprehensive data
export_data = {
    'category_analysis': df.to_dict('records'),
    'category_metrics': category_metrics,
    'advanced_metrics': advanced_metrics,
    'wosi_analysis': wosi_analysis,
    'optimization_results': optimization_results,
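    'intersectional_results': {
        **intersectional_results,
        # JSON object keys must be strings, so tuple pair keys are joined ("age + gender")
        'category_pairs': {' + '.join(pair): pair_stats
                           for pair, pair_stats in intersectional_results.get('category_pairs', {}).items()}
    }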
}

# Save main analysis data
df.to_csv('../data/category_analysis_data.csv', index=False)
print(f"\n✅ Category analysis data exported to ../data/category_analysis_data.csv")

# Save comprehensive report
with open('../data/category_analysis_report.json', 'w') as f:
    json.dump(category_report, f, indent=2, default=str)
print(f"✅ Comprehensive report saved to ../data/category_analysis_report.json")

# Save detailed analysis data
with open('../data/category_analysis_detailed.json', 'w') as f:
    json.dump(export_data, f, indent=2, default=str)
print(f"✅ Detailed analysis data saved to ../data/category_analysis_detailed.json")

print("\n" + "="*50)
print("BIAS CATEGORIES ANALYSIS COMPLETE")
print("="*50)
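
`json.dump(..., default=str)` only handles unsupported *values*; dict keys must already be `str`, `int`, `float`, `bool`, or `None`, so a tuple key such as a category pair raises `TypeError` even when `default=` is set. A small recursive converter, sketched below, handles tuple keys and NumPy scalars before writing. `to_jsonable` and `_json_key` are hypothetical helpers, not part of the notebook.

```python
import json
import numpy as np

def _json_key(key):
    """Map a dict key to a JSON-compatible key; tuples become 'a + b'."""
    if isinstance(key, tuple):
        return " + ".join(map(str, key))
    if key is None or isinstance(key, (str, int, float, bool)):
        return key
    return str(key)  # e.g. np.int64 keys

def to_jsonable(obj):
    """Recursively convert nested analysis results into JSON-serializable objects."""
    if isinstance(obj, dict):
        return {_json_key(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):  # np.float64, np.int64, np.bool_, ...
        return obj.item()
    return obj

example = {
    ("age", "gender"): {"count": np.int64(12), "mean_rating": np.float64(2.75)},
    "significant": np.bool_(True),
}
print(json.dumps(to_jsonable(example), indent=2))
```

Anything the converter leaves untouched (for example a `pd.Timestamp`) can still fall back to `default=str`, e.g. `json.dump(to_jsonable(category_report), f, indent=2, default=str)`.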

## Conclusion

This notebook provided a comprehensive deep-dive analysis of bias categories in the StereoWipe benchmark, including:

### Key Analyses Performed:

1. **Category-wise Performance Analysis**: Detailed evaluation of stereotype rates, severity scores, and human agreement across all bias categories

2. **CSSS and WOSI Deep Dive**: Advanced metrics analysis including:
   - Conditional Stereotype Severity Score (CSSS) calculation and interpretation
   - Weighted Overall Stereotyping Index (WOSI) computation and rankings
   - Component analysis showing the contribution of different factors

3. **Cross-category Statistical Analysis**: Statistical testing including:
   - Chi-square tests for stereotype rate differences
   - Kruskal-Wallis tests for rating distributions
   - Pairwise Mann-Whitney U tests for specific category comparisons

4. **Category Weight Optimization**: Systematic optimization of category weights for:
   - Model discrimination maximization
   - Human alignment improvement
   - Stability (minimizing the spread of WOSI scores across models)
   - Sensitivity analysis

5. **Intersectional Analysis**: Exploration of bias amplification when multiple categories intersect (using simulated intersectional labels), including:
   - Intersectional vs. single-category bias comparison
   - Category pair analysis
   - Model performance on intersectional cases
   - Amplification factor calculations

6. **Comprehensive Visualization**: Rich visualizations showing:
   - Heatmaps of performance across models and categories
   - WOSI component breakdowns
   - Category correlation matrices
   - Intersectional bias patterns
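
The three cross-category tests listed under item 3 can be sketched on synthetic data as follows. The ratings are hypothetical, the stereotype threshold (rating >= 3) mirrors the one used in the intersectional cell, and the Bonferroni correction on the pairwise Mann-Whitney U tests is an added assumption that may or may not match what the earlier cross-category cell does.

```python
import itertools
import numpy as np
from scipy.stats import chi2_contingency, kruskal, mannwhitneyu

rng = np.random.default_rng(1)
# Hypothetical 1-5 ratings per category (illustrative only); 'religion' is shifted upward.
toy_ratings = {
    "age": rng.integers(1, 6, size=120),
    "gender": rng.integers(1, 6, size=150),
    "religion": np.clip(rng.integers(1, 6, size=100) + 1, 1, 5),
}

# 1. Chi-square test of independence: category vs. stereotype (rating >= 3) / no stereotype.
table = np.array([[np.sum(r >= 3), np.sum(r < 3)] for r in toy_ratings.values()])
chi2, p_chi2, dof, _ = chi2_contingency(table)

# 2. Kruskal-Wallis H test across all categories' rating distributions.
h_stat, p_kw = kruskal(*toy_ratings.values())

# 3. Pairwise two-sided Mann-Whitney U tests, Bonferroni-adjusted for the number of pairs.
pairs = list(itertools.combinations(toy_ratings, 2))
adjusted_p = {}
for a, b in pairs:
    _, p = mannwhitneyu(toy_ratings[a], toy_ratings[b], alternative="two-sided")
    adjusted_p[(a, b)] = min(1.0, p * len(pairs))

print(f"chi2 = {chi2:.2f} (dof = {dof}, p = {p_chi2:.3g})")
print(f"Kruskal-Wallis H = {h_stat:.2f} (p = {p_kw:.3g})")
for (a, b), p_adj in adjusted_p.items():
    print(f"{a} vs {b}: Bonferroni-adjusted p = {p_adj:.3g}")
```

The chi-square test only looks at the binary stereotype rate, while Kruskal-Wallis and the pairwise tests use the full rating distribution, so the two can disagree when categories differ in severity but not in rate.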

### Key Findings:

- **Category Performance Varies**: Different bias categories show distinct patterns in terms of stereotype rates, severity, and human-model agreement
- **WOSI Provides a Single Comparable Ranking**: The weighted index condenses per-category results into one score per model, which makes rankings easy to compare but ties them to the chosen category weights
- **Weight Optimization Improves Discrimination**: Systematic optimization can enhance the benchmark's ability to distinguish between models
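- **Intersectional Bias Appears Amplified (Simulated Data)**: In this notebook, intersectional labels are simulated and their ratings are inflated by a built-in multiplier of about 1.2, so the observed amplification mainly reflects that assumption; confirming it requires genuinely intersectional prompts and ratings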
- **Model Consistency Varies**: Some models perform consistently across categories while others show high variance
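
The amplification factor behind the intersectional finding is a ratio of means per primary category c: `A_c = mean intersectional rating for c / mean single-category rating for c`, with values above 1 indicating amplification. The loop in `perform_intersectional_analysis` can be expressed as two pandas group-bys; the sketch below uses a small hypothetical frame, `toy_merged`, with the same column names as the merged frame.

```python
import pandas as pd

# Hypothetical rows using the column names built in perform_intersectional_analysis.
toy_merged = pd.DataFrame({
    "primary_category": ["age", "age", "age", "gender", "gender", "gender"],
    "is_intersectional": [False, False, True, False, True, True],
    "rating": [2.0, 3.0, 2.0, 1.0, 2.0, 3.0],
    "intersectional_rating": [2.0, 3.0, 3.0, 1.0, 2.4, 3.6],
})

# Single-category baseline uses `rating`; intersectional rows use `intersectional_rating`,
# matching the notebook's loop.
single_mean = (toy_merged[~toy_merged["is_intersectional"]]
               .groupby("primary_category")["rating"].mean())
intersect_mean = (toy_merged[toy_merged["is_intersectional"]]
                  .groupby("primary_category")["intersectional_rating"].mean())

# Series division aligns on the category index; dropna() removes categories missing either side.
amplification = (intersect_mean / single_mean).dropna().rename("amplification_factor")
print(amplification)
```

Dropping categories that lack either single-category or intersectional rows plays the same role as the `len(...) > 0` guards in the loop.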

### Practical Implications:

1. **Targeted Improvement**: Models can focus on specific categories where they perform poorly
2. **Evaluation Refinement**: Category weights can be optimized based on specific objectives
3. **Intersectional Awareness**: Evaluation should include intersectional cases for comprehensive assessment
4. **Data Collection Priorities**: Under-represented categories need more data collection
5. **Metric Selection**: CSSS and WOSI provide complementary insights to basic stereotype rates
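
Re-optimizing or perturbing weights is easier to reason about with a closed form. Assuming, for illustration only, that WOSI is a weight-normalized mean, `W = sum_c(w_c * s_c) / sum_c(w_c)`, changing one weight `w_c` by `d` shifts WOSI by exactly `d * (s_c - W) / (sum_c(w_c) + d)`. A category's weight therefore matters only to the extent that its score `s_c` differs from the model's overall `W`. The sketch compares this first-order estimate with the +/-10% finite-difference approach used by `analyze_weight_sensitivity`; the weights, scores, and helper names are hypothetical.

```python
import numpy as np

def wosi(weights: np.ndarray, scores: np.ndarray) -> float:
    """Assumed WOSI form for illustration: weight-normalized mean of category scores."""
    return float(weights @ scores / weights.sum())

def fd_sensitivity(weights: np.ndarray, scores: np.ndarray, c: int, delta: float = 0.1) -> float:
    """Mirror of the notebook's approach: mean absolute WOSI change for +/- delta on weight c."""
    base = wosi(weights, scores)
    up, down = weights.copy(), weights.copy()
    up[c] *= 1 + delta
    down[c] *= 1 - delta
    return (abs(wosi(up, scores) - base) + abs(base - wosi(down, scores))) / 2

def analytic_sensitivity(weights: np.ndarray, scores: np.ndarray, c: int, delta: float = 0.1) -> float:
    """First-order estimate: |delta * w_c * (s_c - WOSI) / sum(w)|."""
    base = wosi(weights, scores)
    return abs(delta * weights[c] * (scores[c] - base) / weights.sum())

example_weights = np.array([1.0, 1.5, 1.0, 1.5, 1.0])   # hypothetical category weights
example_scores = np.array([0.30, 0.25, 0.40, 0.28, 0.10])  # hypothetical scores for one model

for c in range(len(example_weights)):
    fd = fd_sensitivity(example_weights, example_scores, c)
    an = analytic_sensitivity(example_weights, example_scores, c)
    print(f"category {c}: finite-difference={fd:.5f}, first-order={an:.5f}")
```

Under this assumption, a heavily weighted category can still show near-zero sensitivity if its score sits close to the model's overall WOSI, which is worth keeping in mind when reading the sensitivity ranking.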

### Next Steps:

1. **Implementation**: Apply optimized weights and recommendations to improve the benchmark
2. **Validation**: Test findings with additional data and real-world applications
3. **Publication**: Use comprehensive analysis for research papers and presentations
4. **Continuous Monitoring**: Set up ongoing category-specific performance tracking
5. **Intersectional Expansion**: Develop more sophisticated intersectional evaluation methods

This analysis provides a solid foundation for understanding and improving bias evaluation across different categories, enabling more nuanced and effective stereotype detection in language models.