# Circular Bias Detection Framework - Interactive Demo

This notebook demonstrates how to use the Circular Bias Detection Framework to analyze algorithm evaluation data.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hongping-zh/circular-bias-detection/blob/main/examples/demo_notebook.ipynb)

## Setup

First, let's install the required dependencies and import the framework.

In [None]:
# Install dependencies (uncomment if needed)
# !pip install numpy pandas matplotlib seaborn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Clone repository (if running in Colab)
# !git clone https://github.com/hongping-zh/circular-bias-detection.git
# import sys
# sys.path.insert(0, '/content/circular-bias-detection')

from circular_bias_detector import BiasDetector
from circular_bias_detector.utils import create_synthetic_data

sns.set_style('whitegrid')
print("✓ Setup complete!")

## Part 1: Load and Visualize Sample Data

Let's load the sample dataset and explore its structure.

In [None]:
# Load sample data
df = pd.read_csv('../data/sample_data.csv')

print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print(f"\nAlgorithms: {df['algorithm'].unique()}")
print(f"Time periods: {df['time_period'].min()} to {df['time_period'].max()}")

df.head(10)
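
The later cells rely on six columns (`time_period`, `algorithm`, `performance`, and the three `constraint_*` columns), and `DataFrame.pivot` raises a `ValueError` when a (time period, algorithm) pair appears more than once. A minimal pre-flight check could look like the sketch below. The helper `check_evaluation_frame` and the toy values are hypothetical and are not part of the framework.

```python
import pandas as pd

REQUIRED_COLUMNS = [
    'time_period', 'algorithm', 'performance',
    'constraint_compute', 'constraint_memory', 'constraint_dataset_size',
]

def check_evaluation_frame(frame):
    """Hypothetical pre-flight check; returns a list of problems found."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    elif frame.duplicated(subset=['time_period', 'algorithm']).any():
        # pivot() cannot reshape when an index/column pair repeats
        problems.append("duplicate (time_period, algorithm) rows")
    return problems

# Toy frame with made-up values; algorithm 'A' appears twice in period 1
toy = pd.DataFrame({
    'time_period': [1, 1, 1],
    'algorithm': ['A', 'B', 'A'],
    'performance': [0.60, 0.70, 0.65],
    'constraint_compute': [100, 100, 100],
    'constraint_memory': [8, 8, 8],
    'constraint_dataset_size': [1000, 1000, 1000],
})

print(check_evaluation_frame(toy))
print(check_evaluation_frame(toy.drop(columns=['performance'])))
print(check_evaluation_frame(toy.drop_duplicates(subset=['time_period', 'algorithm'])))
```

An empty list means the frame has the expected columns and can be pivoted without duplicate-entry errors.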

In [None]:
# Visualize performance trends over time
plt.figure(figsize=(12, 6))

for algorithm in df['algorithm'].unique():
    data = df[df['algorithm'] == algorithm]
    plt.plot(data['time_period'], data['performance'], marker='o', label=algorithm, linewidth=2)

plt.xlabel('Time Period', fontsize=12)
plt.ylabel('Performance', fontsize=12)
plt.title('Algorithm Performance Over Time', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Part 2: Prepare Data for Bias Detection

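Convert the long-format dataframe into the two matrices the detector expects: a performance matrix of shape T x K (time periods x algorithms) and a constraint matrix of shape T x p (time periods x constraints).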

In [None]:
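# Create performance matrix (T x K)
# Keep the pivoted frame so the algorithm names can be read from its columns
performance_df = df.pivot(
    index='time_period',
    columns='algorithm',
    values='performance'
)
performance_matrix = performance_df.values

# Create constraint matrix (T x p)
constraint_matrix = df.groupby('time_period')[[
    'constraint_compute',
    'constraint_memory',
    'constraint_dataset_size'
]].first().values

# pivot() sorts the columns, so take the names from the pivoted frame
# rather than df['algorithm'].unique(), which keeps order of appearance
algorithms = performance_df.columns.tolist()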

print(f"Performance matrix shape: {performance_matrix.shape}")
print(f"Constraint matrix shape: {constraint_matrix.shape}")
print(f"\nPerformance matrix:\n{performance_matrix}")
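
`pivot` returns its columns in sorted order, while `unique()` preserves the order of first appearance. The names passed as `algorithm_names` should therefore be read from the pivoted frame's columns so that each name lines up with its matrix column. The toy sketch below uses made-up algorithm names and scores to show the mismatch.

```python
import pandas as pd

# Toy long-format data; the algorithm names and values are made up for illustration
toy = pd.DataFrame({
    'time_period': [1, 1, 2, 2],
    'algorithm': ['Zeta', 'Alpha', 'Zeta', 'Alpha'],
    'performance': [0.70, 0.60, 0.72, 0.65],
})

wide = toy.pivot(index='time_period', columns='algorithm', values='performance')

appearance_order = toy['algorithm'].unique().tolist()  # order of first appearance
column_order = wide.columns.tolist()                   # order of the matrix columns

print(appearance_order)  # ['Zeta', 'Alpha']
print(column_order)      # ['Alpha', 'Zeta']
print(wide.values)       # column 0 holds Alpha's scores, not Zeta's
```

If the data file already lists algorithms alphabetically the two orders coincide, but reading the names from the pivoted columns is safe in either case.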

## Part 3: Run Bias Detection

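Apply the three detection indicators: PSI (parameter stability), CCS (constraint consistency), and ρ_PC (performance-constraint correlation). The thresholds passed to `BiasDetector` below are the same ones the printed report uses to flag each score.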

In [None]:
# Initialize detector
detector = BiasDetector(
    psi_threshold=0.15,
    ccs_threshold=0.85,
    rho_pc_threshold=0.5
)

# Run detection
results = detector.detect_bias(
    performance_matrix=performance_matrix,
    constraint_matrix=constraint_matrix,
    algorithm_names=algorithms
)

# Display results
print("=" * 60)
print("BIAS DETECTION RESULTS")
print("=" * 60)
print(f"\nPSI Score:  {results['psi_score']:.4f} {'⚠️ UNSTABLE' if results['psi_score'] > 0.15 else '✓ Stable'}")
print(f"CCS Score:  {results['ccs_score']:.4f} {'⚠️ INCONSISTENT' if results['ccs_score'] < 0.85 else '✓ Consistent'}")
print(f"ρ_PC Score: {results['rho_pc_score']:+.4f} {'⚠️ DEPENDENT' if abs(results['rho_pc_score']) > 0.5 else '✓ Independent'}")
print(f"\nOverall Bias Detected: {'❌ YES' if results['overall_bias'] else '✅ NO'}")
print(f"Confidence: {results['confidence']:.1%}")
print("=" * 60)
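
The report above flags PSI when it exceeds its threshold, CCS when it falls below its threshold, and ρ_PC when its absolute value exceeds its threshold. The sketch below restates those three comparisons as a small helper so the direction of each test is explicit. `flag_indicators` is a hypothetical function and the scores are made up. The notebook does not show how the framework combines the indicators into `overall_bias`, so this sketch only counts the flags and does not reproduce that decision.

```python
def flag_indicators(psi, ccs, rho_pc,
                    psi_threshold=0.15, ccs_threshold=0.85, rho_pc_threshold=0.5):
    """Return which indicators cross their thresholds (hypothetical helper)."""
    return {
        'psi_unstable': psi > psi_threshold,                 # high PSI is a warning
        'ccs_inconsistent': ccs < ccs_threshold,             # low CCS is a warning
        'rho_pc_dependent': abs(rho_pc) > rho_pc_threshold,  # sign does not matter
    }

# Made-up scores, for illustration only
flags = flag_indicators(psi=0.21, ccs=0.91, rho_pc=-0.62)
print(flags)
print(f"Indicators flagged: {sum(flags.values())} of {len(flags)}")
```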

## Part 4: Visualize Results

Plot each indicator next to its threshold to see which ones were crossed.

In [None]:
# Create indicator comparison chart
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# PSI
axes[0].bar(['PSI'], [results['psi_score']], color='steelblue')
axes[0].axhline(0.15, color='red', linestyle='--', label='Threshold')
axes[0].set_ylabel('Score')
axes[0].set_title('PSI (Parameter Stability)', fontweight='bold')
axes[0].set_ylim(0, max(0.3, results['psi_score'] * 1.2))
axes[0].legend()

# CCS
axes[1].bar(['CCS'], [results['ccs_score']], color='forestgreen')
axes[1].axhline(0.85, color='red', linestyle='--', label='Threshold')
axes[1].set_ylabel('Score')
axes[1].set_title('CCS (Constraint Consistency)', fontweight='bold')
axes[1].set_ylim(0, 1)
axes[1].legend()

# ρ_PC
color = 'coral' if abs(results['rho_pc_score']) > 0.5 else 'steelblue'
axes[2].bar(['ρ_PC'], [results['rho_pc_score']], color=color)
axes[2].axhline(0.5, color='red', linestyle='--', alpha=0.5)
axes[2].axhline(-0.5, color='red', linestyle='--', alpha=0.5, label='Threshold')
axes[2].axhline(0, color='black', linestyle='-', alpha=0.3, linewidth=0.8)
axes[2].set_ylabel('Correlation')
axes[2].set_title('ρ_PC (Performance-Constraint)', fontweight='bold')
axes[2].set_ylim(-1, 1)
axes[2].legend()

plt.tight_layout()
plt.show()

## Part 5: Test with Synthetic Biased Data

Generate synthetic data with known bias to test detection accuracy.

In [None]:
# Generate biased scenario
print("Generating synthetic biased data...\n")

perf_biased, const_biased = create_synthetic_data(
    n_time_periods=15,
    n_algorithms=4,
    n_constraints=3,
    bias_intensity=0.7,  # High bias
    random_seed=123
)

# Detect bias
results_biased = detector.detect_bias(
    performance_matrix=perf_biased,
    constraint_matrix=const_biased,
    algorithm_names=[f'Algo_{i+1}' for i in range(4)]
)

print("BIASED SCENARIO RESULTS:")
print(f"PSI: {results_biased['psi_score']:.4f}")
print(f"CCS: {results_biased['ccs_score']:.4f}")
print(f"ρ_PC: {results_biased['rho_pc_score']:+.4f}")
print(f"\nBias Detected: {'✓ YES (Correct!)' if results_biased['overall_bias'] else '✗ NO (Missed!)'}")
print(f"Confidence: {results_biased['confidence']:.1%}")
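
To build intuition for what a `bias_intensity` knob might do, the sketch below generates toy data in which performance drifts with the constraints as the intensity grows. This is an assumption made for illustration: `toy_biased_data` and `perf_constraint_corr` are hypothetical helpers, and neither is the implementation of `create_synthetic_data` or of the framework's ρ_PC indicator.

```python
import numpy as np

def toy_biased_data(n_time_periods, n_algorithms, n_constraints, bias_intensity, seed=0):
    """Hypothetical generator: performance tracks the constraints as bias grows."""
    rng = np.random.default_rng(seed)
    constraints = rng.uniform(0.0, 1.0, size=(n_time_periods, n_constraints))
    driver = constraints.mean(axis=1)                  # one constraint summary per period
    base = rng.uniform(0.6, 0.8, size=n_algorithms)    # fixed skill per algorithm
    noise = rng.normal(0.0, 0.02, size=(n_time_periods, n_algorithms))
    performance = base[None, :] + bias_intensity * driver[:, None] + noise
    return performance, constraints

def perf_constraint_corr(performance, constraints):
    """Pearson correlation between mean performance and mean constraint per period."""
    return np.corrcoef(performance.mean(axis=1), constraints.mean(axis=1))[0, 1]

for intensity in (0.0, 1.0):
    perf, const = toy_biased_data(15, 4, 3, intensity, seed=123)
    print(f"bias_intensity={intensity}: shapes {perf.shape}, {const.shape}, "
          f"corr={perf_constraint_corr(perf, const):+.3f}")
```

With the intensity at 1.0 the constraint term dominates the small noise, so the correlation comes out close to 1. At 0.0 only the noise remains and the correlation has no systematic direction.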

## Part 6: Batch Analysis

Analyze multiple scenarios with varying bias intensities.

In [None]:
# Test multiple bias levels
bias_levels = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
detection_results = []

for bias in bias_levels:
    perf, const = create_synthetic_data(
        n_time_periods=20,
        n_algorithms=4,
        n_constraints=3,
        bias_intensity=bias,
        random_seed=42
    )
    
    result = detector.detect_bias(perf, const)
    detection_results.append({
        'bias_intensity': bias,
        'psi': result['psi_score'],
        'ccs': result['ccs_score'],
        'rho_pc': result['rho_pc_score'],
        'detected': result['overall_bias']
    })

results_df = pd.DataFrame(detection_results)
results_df

In [None]:
# Plot detection curves
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(results_df['bias_intensity'], results_df['psi'], marker='o', label='PSI', linewidth=2)
ax.plot(results_df['bias_intensity'], results_df['ccs'], marker='s', label='CCS', linewidth=2)
ax.plot(results_df['bias_intensity'], results_df['rho_pc'].abs(), marker='^', label='|ρ_PC|', linewidth=2)

ax.axhline(0.15, color='red', linestyle='--', alpha=0.5, label='PSI Threshold')
ax.axhline(0.85, color='green', linestyle='--', alpha=0.5, label='CCS Threshold')
ax.axhline(0.5, color='orange', linestyle='--', alpha=0.5, label='ρ_PC Threshold')

ax.set_xlabel('True Bias Intensity', fontsize=12)
ax.set_ylabel('Indicator Value', fontsize=12)
ax.set_title('Detection Performance Across Bias Intensities', fontsize=14, fontweight='bold')
ax.legend(loc='best')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate detection accuracy, treating intensities above 0.3 as truly biased
accuracy = (results_df['detected'] == (results_df['bias_intensity'] > 0.3)).mean()
print(f"\n✓ Detection Accuracy: {accuracy:.1%}")
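
A single accuracy figure hides whether the errors are missed detections or false alarms. The sketch below splits the same comparison into confusion-matrix counts, using the same `bias_intensity > 0.3` rule as ground truth. The `toy_detected` values are illustrative only and are not output from the detector.

```python
import numpy as np

# Illustrative values only, not output from the detector
toy_intensity = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
toy_detected = np.array([False, False, False, True, True, True])

toy_truth = toy_intensity > 0.3   # same ground-truth rule as the accuracy line

tp = int(np.sum(toy_detected & toy_truth))     # biased and flagged
fp = int(np.sum(toy_detected & ~toy_truth))    # unbiased but flagged
fn = int(np.sum(~toy_detected & toy_truth))    # biased but missed
tn = int(np.sum(~toy_detected & ~toy_truth))   # unbiased and not flagged

toy_accuracy = (tp + tn) / len(toy_truth)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={toy_accuracy:.1%}")
# TP=3 FP=0 FN=1 TN=2 accuracy=83.3%
```

With only six bias levels, each misclassified scenario moves the accuracy by about 17 percentage points, so the counts are easier to interpret than the percentage.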

## Summary

This notebook demonstrated:
1. Loading and visualizing evaluation data
2. Running the bias detection framework
3. Interpreting the three indicators (PSI, CCS, ρ_PC)
4. Testing with synthetic biased data
5. Batch analysis across bias intensities

For more information:
- **Repository**: https://github.com/hongping-zh/circular-bias-detection
- **Dataset**: https://doi.org/10.5281/zenodo.17201032
- **Documentation**: See README.md