# 🔍 Detect Circular Bias in AI Evaluations with Sleuth

**Author**: Hongping Zhang  
**Tool**: [Sleuth](https://github.com/hongping-zh/circular-bias-detection)  
**Goal**: In about 5 minutes, check whether your benchmark results suffer from circular reasoning bias, e.g., repeatedly tuning prompts or hyperparameters on the same benchmark until the scores look good.
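
To see why this kind of tuning misleads, the sketch below simulates it with plain NumPy. It does not use Sleuth. It assumes 50 prompt or hyperparameter variants that all have the same true accuracy (0.70), scores each one on the same 200-item benchmark split, and reports only the best. These numbers are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_ACCURACY = 0.70   # assumption: every candidate config is equally good
N_CONFIGS = 50         # assumption: number of prompts/hyperparameter settings tried
N_TEST = 200           # assumption: size of the reused benchmark split


def evaluate(n_items: int) -> float:
    """Simulated benchmark score: fraction of n_items answered correctly."""
    return rng.binomial(n_items, TRUE_ACCURACY) / n_items


# Circular protocol: tune on the benchmark, then report the best score on that same benchmark.
tuning_scores = np.array([evaluate(N_TEST) for _ in range(N_CONFIGS)])
best = int(tuning_scores.argmax())
reported = tuning_scores[best]

# Honest protocol: re-evaluate the chosen config on fresh, unseen items.
fresh = evaluate(N_TEST)

print(f"reported (selected on test): {reported:.3f}")
print(f"fresh held-out estimate:     {fresh:.3f}")
print(f"true accuracy:               {TRUE_ACCURACY:.3f}")
```

The reported score usually lands several points above 0.70 just because the luckiest draw was selected. The fresh estimate does not carry that selection bias.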

## 📦 Install Sleuth

In [None]:
# %pip installs into the running kernel; quotes keep some shells from treating [cli] as a glob
%pip install "circular-bias-detector[cli]"

## 🧪 Load Sample Data (LLM Evaluation)

In [None]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/hongping-zh/circular-bias-detection/main/data/llm_eval_sample.csv')
df.head()

## 🛠️ Prepare Matrices for Sleuth

In [None]:
from circular_bias_detector import BiasDetector

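# Pivot performance into a time_period × algorithm matrix.
# df.pivot sorts rows (time_period) and columns (algorithm) by label,
# and raises ValueError if a (time_period, algorithm) pair appears more than once.
perf_matrix = df.pivot(index='time_period', columns='algorithm', values='performance').values

# Extract constraints: one row per time_period, in the same sorted order as perf_matrix.
# .first() keeps the first non-null value per period, so constraints should be constant within a period.
constraint_cols = ['constraint_compute', 'constraint_memory', 'max_tokens', 'temperature']
const_matrix = df.groupby('time_period')[constraint_cols].first().values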

print("Performance matrix shape:", perf_matrix.shape)
print("Constraint matrix shape:", const_matrix.shape)
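
Column order matters when algorithm names are passed along with `perf_matrix`. `df.pivot` returns columns sorted by label, but `df['algorithm'].unique()` keeps the order in which each name first appears. The toy frame below uses hypothetical data to show that the two orders can disagree. When they do, names paired by position would be attached to the wrong columns.

```python
import pandas as pd

# Hypothetical toy data in the same long format as the sample CSV.
toy = pd.DataFrame({
    "time_period": [1, 1, 2, 2],
    "algorithm":   ["zeta", "alpha", "zeta", "alpha"],
    "performance": [0.61, 0.72, 0.64, 0.75],
})

toy_pivot = toy.pivot(index="time_period", columns="algorithm", values="performance")

print(toy["algorithm"].unique().tolist())  # ['zeta', 'alpha']  (first-appearance order)
print(toy_pivot.columns.tolist())          # ['alpha', 'zeta']  (sorted; matches the matrix columns)
```

If you take the names from the pivoted frame's `.columns`, or sort them, they stay aligned with the matrix.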

## 🕵️ Run Bias Detection (with Bootstrap & Adaptive Thresholds)

In [None]:
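detector = BiasDetector(
    enable_bootstrap=True,            # bootstrap resampling for uncertainty estimates
    n_bootstrap=1000,                 # number of bootstrap resamples
    enable_adaptive_thresholds=True   # data-dependent decision thresholds (as the name suggests)
)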

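results = detector.detect_bias(
    performance_matrix=perf_matrix,
    constraint_matrix=const_matrix,
    # df.pivot sorts algorithm columns by name; sort the names so they line up with perf_matrix
    algorithm_names=sorted(df['algorithm'].unique().tolist())
)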
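
The detector above is configured with `enable_bootstrap=True` and `n_bootstrap=1000`. This notebook does not document Sleuth's internal procedure, so the code below is only a generic percentile-bootstrap sketch of the idea. It resamples the observations with replacement many times, recomputes a statistic (here the mean), and reads a confidence interval off the percentiles. The `bootstrap_ci` helper and the scores are hypothetical.

```python
import numpy as np


def bootstrap_ci(values, n_bootstrap=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `values` (generic sketch, not Sleuth's code)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Draw n_bootstrap resamples of size n with replacement and record each resample's mean.
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    means = values[idx].mean(axis=1)
    # The middle (1 - alpha) share of the resampled means gives the interval.
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return values.mean(), lo, hi


# Hypothetical per-period scores for one algorithm (one column of a performance matrix).
scores = [0.71, 0.74, 0.69, 0.77, 0.73, 0.75]
mean, lo, hi = bootstrap_ci(scores, n_bootstrap=1000)
print(f"mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

A larger `n_bootstrap` gives more stable interval endpoints but takes longer to run.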

## 📊 View Results

In [None]:
print(detector.generate_report(results))

In [None]:
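# Optional: build an interactive HTML dashboard and display it inline
from circular_bias_detector.visualization import plot_interactive_dashboard
from IPython.display import IFrame

# Sorted names match the column order of perf_matrix (df.pivot sorts columns)
algorithm_names = sorted(df['algorithm'].unique().tolist())
plot_interactive_dashboard(perf_matrix, const_matrix, results, algorithm_names,
                           save_html='sleuth_dashboard.html')
IFrame(src="sleuth_dashboard.html", width=900, height=600)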

## 📥 Try Your Own Data

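Prepare a CSV in the same long format: one row per (`time_period`, `algorithm`) pair, with columns `time_period`, `algorithm`, `performance`, and the constraint columns listed in `constraint_cols`. If your constraint fields differ (e.g., only `temperature` and `max_tokens`), edit that list.
Point `pd.read_csv` at your file instead of the sample URL, then re-run the cells above!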
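
Before you re-run on your own CSV, a quick sanity check can catch the inputs most likely to break the matrix-building cell. `check_sleuth_input` is a hypothetical helper and is not part of Sleuth. It assumes the same column names as the sample data.

```python
import pandas as pd

REQUIRED = ["time_period", "algorithm", "performance"]
CONSTRAINT_COLS = ["constraint_compute", "constraint_memory", "max_tokens", "temperature"]


def check_sleuth_input(df: pd.DataFrame, constraint_cols=CONSTRAINT_COLS) -> list:
    """Return a list of problems that would break the matrix-building cell (empty list = OK)."""
    problems = []
    missing = [c for c in REQUIRED + list(constraint_cols) if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    # df.pivot raises ValueError if a (time_period, algorithm) pair appears twice.
    dupes = df.duplicated(subset=["time_period", "algorithm"]).sum()
    if dupes:
        problems.append(f"{dupes} duplicate (time_period, algorithm) rows")
    # Pairs absent from the data become NaN cells in the pivoted performance matrix.
    n_cells = df["time_period"].nunique() * df["algorithm"].nunique()
    n_pairs = len(df.drop_duplicates(subset=["time_period", "algorithm"]))
    if n_pairs < n_cells:
        problems.append(f"{n_cells - n_pairs} missing (time_period, algorithm) cells")
    # groupby(...).first() keeps one constraint row per period, so values should not vary within a period.
    varying = df.groupby("time_period")[list(constraint_cols)].nunique(dropna=False).gt(1).any()
    if varying.any():
        problems.append(f"constraints vary within a time_period: {varying[varying].index.tolist()}")
    return problems


# Hypothetical example with two deliberate problems.
demo = pd.DataFrame({
    "time_period": [1, 1, 2, 2, 2],
    "algorithm": ["a", "b", "a", "b", "b"],         # (2, "b") appears twice
    "performance": [0.70, 0.65, 0.72, 0.66, 0.80],
    "constraint_compute": [1.0] * 5,
    "constraint_memory": [16] * 5,
    "max_tokens": [512] * 5,
    "temperature": [0.7, 0.7, 0.7, 0.7, 1.0],        # changes inside period 2
})
print(check_sleuth_input(demo))  # reports the duplicate row and the varying temperature
```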