# 🔍 Sleuth - Quick Start Guide

**Detect circular bias in AI evaluations in 5 minutes!**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hongping-zh/circular-bias-detection/blob/main/examples/quickstart_colab.ipynb)
[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/hongping-zh/circular-bias-detection)
[![Demo](https://img.shields.io/badge/Web%20App-Try%20Now-brightgreen)](https://hongping-zh.github.io/circular-bias-detection/)

---

## What You'll Learn

- ✅ Install and set up Sleuth in Google Colab
- ✅ Detect circular bias in your evaluation data
- ✅ Interpret the 3 key indicators (PSI, CCS, ρ_PC)
- ✅ Generate publication-ready reports

**No prior knowledge required!**

## Step 1: Installation (30 seconds)

Run this cell to install Sleuth and its dependencies:

In [None]:
# Install dependencies (-q keeps the output short without hiding the final message)
!pip install -q numpy pandas scipy matplotlib seaborn scikit-learn

# Clone the repository
!git clone -q https://github.com/hongping-zh/circular-bias-detection.git

# Add to Python path
import sys
sys.path.insert(0, '/content/circular-bias-detection')

print("✅ Installation complete!")

## Step 2: Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from circular_bias_detector import BiasDetector

# Configure plotting
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("✅ Libraries imported successfully!")

## Step 3: Quick Example - LLM Evaluation

Let's analyze a typical scenario: You evaluated 4 LLMs across 5 iterations, tweaking temperature each time.

**Question:** Did the parameter changes bias your results?

In [None]:
# Load sample LLM evaluation data
df = pd.read_csv('/content/circular-bias-detection/data/sample_data.csv')

print("📊 Sample Data (first 10 rows):")
print(df.head(10))
print(f"\n📈 Shape: {df.shape[0]} rows x {df.shape[1]} columns")

## Step 4: Run Bias Detection

In [None]:
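# Prepare data matrices
# Performance matrix (T x K): rows are time periods, columns are algorithms
performance_table = df.pivot(
    index='time_period',
    columns='algorithm',
    values='performance'
)
performance_matrix = performance_table.values

# Take names and periods from the pivoted table so they match the matrix order
# (pivot sorts its columns, while unique() keeps order of first appearance)
algorithms = list(performance_table.columns)
time_periods = list(performance_table.index)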

# Constraint matrix (T x p)
constraint_matrix = df.groupby('time_period')[[
    'constraint_compute', 
    'constraint_memory', 
    'constraint_dataset_size'
]].first().values

# Run detection
detector = BiasDetector(
    psi_threshold=0.15,
    ccs_threshold=0.85,
    rho_pc_threshold=0.50
)

results = detector.detect_bias(
    performance_matrix=performance_matrix,
    constraint_matrix=constraint_matrix,
    algorithm_names=list(algorithms),
    enable_bootstrap=True,
    n_bootstrap=1000
)

print("✅ Detection complete!")
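To make the matrix shapes concrete, here is a minimal sketch on a tiny made-up table (two time periods, two algorithms). All values and the `toy_*` names are invented for illustration and are not taken from the sample dataset. The sketch uses only pandas, not Sleuth itself, and shows that the performance matrix comes out as T × K and the constraint matrix as T × p.

```python
import pandas as pd

# Hypothetical toy data: 2 time periods x 2 algorithms (values are made up)
toy = pd.DataFrame({
    "time_period": [1, 1, 2, 2],
    "algorithm": ["model_b", "model_a", "model_b", "model_a"],
    "performance": [0.70, 0.65, 0.74, 0.66],
    "constraint_compute": [100, 100, 120, 120],
    "constraint_memory": [8.0, 8.0, 8.0, 8.0],
    "constraint_dataset_size": [1000, 1000, 1000, 1000],
})

# Long -> wide: one row per time period, one column per algorithm
toy_table = toy.pivot(index="time_period", columns="algorithm", values="performance")
toy_performance = toy_table.values  # shape (T, K)

# Constraints are shared by all algorithms in a period, so keep one row per period
constraint_cols = ["constraint_compute", "constraint_memory", "constraint_dataset_size"]
toy_constraints = toy.groupby("time_period")[constraint_cols].first().values  # shape (T, p)

print(list(toy_table.columns))  # column order of the performance matrix
print(toy_performance.shape, toy_constraints.shape)
```

The pivoted columns come out sorted (`model_a` before `model_b`) even though `model_b` appears first in the table, so any list of names passed alongside the matrix should follow the pivoted column order.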

## Step 5: View Results

In [None]:
# Display results with color coding
from IPython.display import display, Markdown

def format_result(name, value, threshold, lower_is_better=True, ci_lower=None, ci_upper=None, p_value=None):
    if lower_is_better:
        status = "✅ PASS" if value < threshold else "❌ FAIL"
        color = "green" if value < threshold else "red"
    else:
        status = "✅ PASS" if value > threshold else "❌ FAIL"
        color = "green" if value > threshold else "red"
    
    result = f"**{name}**: <span style='color:{color}; font-size:1.2em'>{value:.4f}</span> {status}\n"
    
    if ci_lower is not None and ci_upper is not None:
        result += f"  - 95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]\n"
    
    if p_value is not None:
        result += f"  - p-value: {p_value:.3f}\n"
    
    result += f"  - Threshold: {threshold}\n"
    
    return result

# Format output
output = "# 🔍 Bias Detection Results\n\n"

if results.get('overall_bias'):
    output += "## 🔴 BIAS DETECTED\n\n"
else:
    output += "## ✅ NO BIAS DETECTED\n\n"

output += "### Indicators:\n\n"

output += format_result(
    "PSI (Parameter Stability Index)",
    results['psi'],
    0.15,
    lower_is_better=True,
    ci_lower=results.get('psi_ci_lower'),
    ci_upper=results.get('psi_ci_upper'),
    p_value=results.get('psi_p_value')
) + "\n"

output += format_result(
    "CCS (Constraint Consistency Score)",
    results['ccs'],
    0.85,
    lower_is_better=False,
    ci_lower=results.get('ccs_ci_lower'),
    ci_upper=results.get('ccs_ci_upper'),
    p_value=results.get('ccs_p_value')
) + "\n"

output += format_result(
    "ρ_PC (Performance-Constraint Correlation)",
    results['rho_pc'],
    0.50,
    lower_is_better=True,
    ci_lower=results.get('rho_pc_ci_lower'),
    ci_upper=results.get('rho_pc_ci_upper'),
    p_value=results.get('rho_pc_p_value')
) + "\n"

output += f"### Interpretation:\n\n{results.get('interpretation', 'N/A')}\n"

display(Markdown(output))

## Step 6: Understanding the Indicators

### 📊 PSI (Parameter Stability Index)
**What it measures:** How much your hyperparameters changed during evaluation
- ✅ **Low PSI (<0.15)**: Parameters stayed consistent → Good!
- ❌ **High PSI (≥0.15)**: Parameters changed too much → Risk of bias

### 🎯 CCS (Constraint Consistency Score)
**What it measures:** How consistent your evaluation environment was
- ✅ **High CCS (>0.85)**: Conditions stayed the same → Good!
- ❌ **Low CCS (≤0.85)**: Conditions varied → Unfair comparison

### 🔗 ρ_PC (Performance-Constraint Correlation)
**What it measures:** Whether performance improvements came from resource changes
- ✅ **Low ρ_PC (<0.50)**: Performance independent of resources → Good!
- ❌ **High ρ_PC (≥0.50)**: Performance linked to resource changes → Suspicious
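The three rules above can be sketched as a small threshold check. This is an illustration only: `THRESHOLDS` and `check_indicators` are hypothetical names, not part of Sleuth, and the indicator values are made up. The comparisons mirror the pass/fail logic used in Step 5. How the library combines the indicators into `overall_bias` is not described in this guide, so the sketch reports each indicator separately.

```python
# Thresholds quoted in this guide; comparison directions follow Step 5
THRESHOLDS = {
    "psi": (0.15, "lower"),     # passes when value < 0.15
    "ccs": (0.85, "higher"),    # passes when value > 0.85
    "rho_pc": (0.50, "lower"),  # passes when value < 0.50
}

def check_indicators(values):
    """Return {indicator: bool}, where True means the indicator passes."""
    verdicts = {}
    for name, (threshold, direction) in THRESHOLDS.items():
        value = values[name]
        if direction == "lower":
            verdicts[name] = value < threshold
        else:
            verdicts[name] = value > threshold
    return verdicts

# Made-up indicator values, for illustration only
example = {"psi": 0.08, "ccs": 0.91, "rho_pc": 0.62}
verdicts = check_indicators(example)
print(verdicts)
```

In this made-up case PSI and CCS pass while ρ_PC fails, which would be a cue to look at whether performance gains track resource changes.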

## Step 7: Visualize Results

In [None]:
# Plot performance over time
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Performance trends
for i, algo in enumerate(algorithms):
    axes[0].plot(time_periods, performance_matrix[:, i], marker='o', label=algo, linewidth=2)

axes[0].set_xlabel('Time Period', fontsize=12)
axes[0].set_ylabel('Performance', fontsize=12)
axes[0].set_title('Performance Over Time', fontsize=14, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Constraint changes
axes[1].plot(time_periods, constraint_matrix[:, 0], marker='s', label='Compute', linewidth=2)
axes[1].plot(time_periods, constraint_matrix[:, 1]*100, marker='^', label='Memory x100', linewidth=2)
axes[1].set_xlabel('Time Period', fontsize=12)
axes[1].set_ylabel('Constraint Value', fontsize=12)
axes[1].set_title('Constraint Changes Over Time', fontsize=14, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Step 8: Try Your Own Data

Upload a CSV file with one row per algorithm per time period and these columns:
- `time_period`: Evaluation iteration (1, 2, 3, ...)
- `algorithm`: Model name
- `performance`: Performance metric (0-1)
- `constraint_compute`, `constraint_memory`, `constraint_dataset_size`: Resource limits

In [None]:
from google.colab import files

# Upload your CSV
uploaded = files.upload()

# Get filename
filename = list(uploaded.keys())[0]

# Load your data
your_data = pd.read_csv(filename)

print("✅ Your data loaded successfully!")
print(your_data.head())

# Now run the detection on your data:
# reuse the code from Step 4, replacing `df` with `your_data`
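Before reusing Step 4 on an uploaded file, it can help to check that the table has the expected layout. The sketch below is one possible check, not part of Sleuth: `REQUIRED_COLUMNS` and `find_table_problems` are hypothetical names, and the example table is made up. It looks for missing columns, repeated (time period, algorithm) pairs, which would make `pivot` raise an error, and algorithms that were not evaluated in every period.

```python
import pandas as pd

REQUIRED_COLUMNS = [
    "time_period", "algorithm", "performance",
    "constraint_compute", "constraint_memory", "constraint_dataset_size",
]

def find_table_problems(data):
    """Return a list of readable problems; an empty list means the table looks usable."""
    missing = [col for col in REQUIRED_COLUMNS if col not in data.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    # pivot() raises on repeated (time_period, algorithm) pairs
    if data.duplicated(subset=["time_period", "algorithm"]).any():
        problems.append("duplicate (time_period, algorithm) rows")

    # Every algorithm should be evaluated in every time period
    expected = data["time_period"].nunique() * data["algorithm"].nunique()
    observed = len(data.drop_duplicates(subset=["time_period", "algorithm"]))
    if observed < expected:
        problems.append("some algorithms are missing in some time periods")
    return problems

# Made-up example: model_b has no row for time period 2
sample = pd.DataFrame({
    "time_period": [1, 1, 2],
    "algorithm": ["model_a", "model_b", "model_a"],
    "performance": [0.61, 0.58, 0.64],
    "constraint_compute": [100, 100, 100],
    "constraint_memory": [8.0, 8.0, 8.0],
    "constraint_dataset_size": [1000, 1000, 1000],
})
print(find_table_problems(sample))
```

On the uploaded table this would be called as `find_table_problems(your_data)`. An empty list suggests the Step 4 code can be reused as-is.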

## Next Steps

### 🚀 Try More Features
- [Advanced Tutorial](https://github.com/hongping-zh/circular-bias-detection/blob/main/examples/demo_notebook.ipynb)
- [Python SDK Documentation](https://github.com/hongping-zh/circular-bias-detection#readme)
- [Web App](https://hongping-zh.github.io/circular-bias-detection/)

### 📚 Learn More
- [Full Dataset on Zenodo](https://doi.org/10.5281/zenodo.17201032)
- [Research Paper](https://github.com/hongping-zh/circular-bias-detection/blob/main/paper.md)
- [Report an Issue](https://github.com/hongping-zh/circular-bias-detection/issues)

### ⭐ Support the Project
If this tool helped you, please:
- Star the [GitHub repository](https://github.com/hongping-zh/circular-bias-detection)
- Share with colleagues
- Cite in your papers

---

**Questions?** Open an issue on GitHub or contact: yujjam@uest.edu.gr