# Fairness Evaluation - Simplified Version

This notebook runs a fairness evaluation using a **simplified, standalone script** (`scripts/run_fairness_eval.py`) designed for broader compatibility and clearer error handling.

## ✅ Key Improvements:
- Single standalone script with clear logic
- Better error handling and compatibility
- Works with multiple versions of the `datasets` library (2.14.0 recommended)
- Clear progress reporting
- Simplified configuration

## 🚀 Setup:
1. Clone the repository
2. Verify the GPU setup
3. Install dependencies
4. Log in to Hugging Face (if needed)
5. Run the evaluation script
6. View results

In [None]:
# Clone repository
!rm -rf fairness-prms
!git clone https://github.com/minhtran1015/fairness-prms
%cd fairness-prms/fairness-prms

In [None]:
# Verify GPU setup
import torch

print("CUDA available:", torch.cuda.is_available())
print("Number of GPUs:", torch.cuda.device_count())
print("PyTorch version:", torch.__version__)

for i in range(torch.cuda.device_count()):
    print(f"\nGPU {i}: {torch.cuda.get_device_name(i)}")
    print(f"  Memory: {torch.cuda.get_device_properties(i).total_memory / 1024**3:.2f} GB")

In [None]:
# Install dependencies
print("📦 Installing dependencies...")
print("=" * 70)

# Install datasets 2.14.0 for best compatibility
!pip uninstall -y datasets 2>/dev/null || true
!pip install -q datasets==2.14.0

# Install other dependencies
!pip install -q transformers torch tqdm vllm==0.6.3

print("=" * 70)
print("✅ Installation complete!")

In [None]:
# Verify datasets version
import datasets
print(f"datasets version: {datasets.__version__}")

if datasets.__version__.startswith('2.'):
    print("✅ Compatible datasets version")
else:
    print("⚠️  Warning: Recommended version is 2.14.0")

In [None]:
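# Log in to Hugging Face (if needed)
import os
from huggingface_hub import login

try:
    # Kaggle only: read the token from the notebook's secrets
    from kaggle_secrets import UserSecretsClient
    hf_token = UserSecretsClient().get_secret("HUGGING_FACE_HUB_TOKEN")
except Exception as exc:  # not running on Kaggle, or the secret is not set
    print(f"⚠️  Kaggle secret unavailable ({exc}); trying the environment variable")
    hf_token = os.environ.get("HUGGING_FACE_HUB_TOKEN")

if hf_token:
    os.environ["HUGGING_FACE_HUB_TOKEN"] = hf_token
    login(token=hf_token)
    print("✅ Logged in to Hugging Face")
else:
    print("⚠️  No token found; continuing without login")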

In [None]:
# View the simplified script
print("📄 Simplified evaluation script:")
print("=" * 70)
!head -50 scripts/run_fairness_eval.py

## Run Evaluation

The script will:
1. Load BBQ dataset (Bias Benchmark for QA)
2. Load language model with vLLM for fast inference
3. Load fairness-aware PRM (Process Reward Model)
4. Generate multiple candidates using Best-of-N sampling
5. Score each candidate with the PRM
6. Select the candidate with the highest PRM score as the fairest response
7. Save results
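
Steps 4 to 6 can be sketched in a few lines of plain Python. This is a minimal illustration of Best-of-N selection, not the code in `scripts/run_fairness_eval.py`: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, and the toy functions stand in for the language model and the PRM.

```python
import itertools
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 8,
) -> Tuple[str, float, List[float]]:
    """Sample n candidates, score each one, and return the top-scoring response."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [score(prompt, candidate) for candidate in candidates]
    best_index = max(range(n), key=scores.__getitem__)  # first index of the highest score
    return candidates[best_index], scores[best_index], scores


# Toy stand-ins (hypothetical): a real run would call the language model and the PRM.
_answers = itertools.cycle(["Answer A", "Answer B", "Cannot be determined"])


def toy_generate(prompt: str) -> str:
    return next(_answers)


def toy_score(prompt: str, response: str) -> float:
    # Pretend the PRM rewards responses that avoid an unsupported assumption.
    return 0.9 if response == "Cannot be determined" else 0.2


best_response, best_score, scores = best_of_n(
    "Example question", toy_generate, toy_score, n=8
)
print(best_response, best_score)
```

The function returns all scores alongside the winner, which mirrors the `best_score` and `scores` fields that the results cells later in this notebook read.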

### Configuration:
- **Dataset**: SES (Socioeconomic status bias)
- **Samples**: 50 examples
- **Candidates**: 8 per example (Best-of-N)
- **GPUs**: 2 T4 GPUs with tensor parallelism
- **Temperature**: 0.7
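
To give a feel for what a temperature of 0.7 does, here is a general sketch of temperature-scaled softmax over made-up logits. It is an illustration of the standard formula only; the actual sampling in the script is handled by vLLM and may apply further steps, and `softmax_with_temperature` is a hypothetical helper.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up values for illustration
probs_default = softmax_with_temperature(logits, 1.0)
probs_sharper = softmax_with_temperature(logits, 0.7)
print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_sharper])
```

A temperature below 1.0 shifts probability toward the most likely token, so the 8 candidates should vary less than they would at 1.0 while still differing from one another.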

In [None]:
# Run evaluation
!python scripts/run_fairness_eval.py \
    --dataset-config SES \
    --num-samples 50 \
    --num-candidates 8 \
    --temperature 0.7 \
    --tensor-parallel-size 2 \
    --output-dir ./fairness_results

## View Results

In [None]:
# View summary statistics
import json

with open('fairness_results/summary_stats.json', 'r') as f:
    summary = json.load(f)

print("=" * 70)
print("EVALUATION SUMMARY")
print("=" * 70)
for key, value in summary.items():
    print(f"{key}: {value}")
print("=" * 70)

In [None]:
# View first few results
import json

print("\n📊 Sample Results:")
print("=" * 70)

with open('fairness_results/fairness_eval_results.jsonl', 'r') as f:
    for i, line in enumerate(f):
        if i >= 3:  # Show first 3 results
            break
        
        result = json.loads(line)
        print(f"\nExample {i+1}:")
        print(f"  Question: {result['question'][:100]}...")
        print(f"  Best Response: {result['best_response'][:100]}...")
        print(f"  PRM Score: {result['best_score']:.4f}")
        print(f"  All Scores: {[f'{s:.4f}' for s in result['scores']]}")

In [None]:
# Analyze score distribution
import json
import matplotlib.pyplot as plt

scores = []
with open('fairness_results/fairness_eval_results.jsonl', 'r') as f:
    for line in f:
        result = json.loads(line)
        scores.append(result['best_score'])

# Compute the mean once; this assumes the results file contains at least one record
mean_score = sum(scores) / len(scores)

plt.figure(figsize=(10, 6))
plt.hist(scores, bins=20, edgecolor='black')
plt.xlabel('PRM Fairness Score')
plt.ylabel('Frequency')
plt.title('Distribution of Best-of-N Fairness Scores')
plt.axvline(mean_score, color='red', linestyle='--', label=f'Mean: {mean_score:.4f}')
plt.legend()
plt.grid(alpha=0.3)
plt.show()
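
The result records read above carry both `best_score` and the full `scores` list, so the gain from Best-of-N selection can be estimated by comparing the two. The following is a minimal sketch, assuming each line of the results file has those two fields as in the cells above; `summarize_results` is a hypothetical helper, and the demo records are made-up values rather than real evaluation output.

```python
import json
import tempfile
from pathlib import Path
from statistics import mean


def summarize_results(path):
    """Compare the mean selected score with the mean score over all candidates."""
    best_scores, candidate_means = [], []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            best_scores.append(record["best_score"])
            candidate_means.append(mean(record["scores"]))
    if not best_scores:
        raise ValueError(f"no records found in {path}")
    return {
        "num_examples": len(best_scores),
        "mean_best_score": mean(best_scores),
        "mean_candidate_score": mean(candidate_means),
    }


# Demo on synthetic records (made-up values, not real evaluation output).
demo_records = [
    {"best_score": 0.9, "scores": [0.9, 0.5, 0.1]},
    {"best_score": 0.7, "scores": [0.7, 0.3, 0.5]},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_results.jsonl"
    demo_path.write_text(
        "\n".join(json.dumps(r) for r in demo_records), encoding="utf-8"
    )
    stats = summarize_results(demo_path)

print(stats)
```

On a real run, pass `'fairness_results/fairness_eval_results.jsonl'` instead of the temporary demo file. The difference between the two means indicates how much selection by the PRM raises the score over an average candidate.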

## Try Different Categories

You can evaluate different bias categories by changing `--dataset-config`:

Available categories:
- `SES` - Socioeconomic status
- `Age` - Age bias
- `Gender_identity` - Gender identity bias
- `Race_ethnicity` - Race and ethnicity bias
- `Disability_status` - Disability status bias
- `Nationality` - Nationality bias
- `Physical_appearance` - Physical appearance bias
- `Religion` - Religious bias
- `Sexual_orientation` - Sexual orientation bias
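
To sweep every category rather than editing the flag by hand, the commands can be generated in a loop. This is a sketch that only builds and prints the commands; the flags are the ones used in this notebook, while `build_command` and the `./results_<category>` directory naming are assumptions modeled on the `./results_age` example.

```python
import shlex

CATEGORIES = [
    "SES", "Age", "Gender_identity", "Race_ethnicity", "Disability_status",
    "Nationality", "Physical_appearance", "Religion", "Sexual_orientation",
]


def build_command(category, num_samples=30, num_candidates=8, tensor_parallel_size=2):
    """Return the argument list for one evaluation run of the given category."""
    return [
        "python", "scripts/run_fairness_eval.py",
        "--dataset-config", category,
        "--num-samples", str(num_samples),
        "--num-candidates", str(num_candidates),
        "--tensor-parallel-size", str(tensor_parallel_size),
        # Assumed naming scheme: one lowercase output directory per category.
        "--output-dir", f"./results_{category.lower()}",
    ]


commands = [build_command(category) for category in CATEGORIES]
for cmd in commands:
    # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping each run in its own output directory avoids overwriting the results of a previous category.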

In [None]:
# Example: Evaluate Age bias
!python scripts/run_fairness_eval.py \
    --dataset-config Age \
    --num-samples 30 \
    --num-candidates 8 \
    --tensor-parallel-size 2 \
    --output-dir ./results_age