# AI Bias Diagnostic API - Analysis Workflow

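This notebook walks through an end-to-end analysis workflow with the AI Bias Diagnostic API: creating and executing evaluations, collecting findings across several systems, visualizing and summarizing the results, and exporting them for further use.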

## Contents
1. Setup and Authentication
2. Basic API Operations
3. Data Collection and Analysis
4. Visualization
5. Statistical Analysis
6. Recommendations Analysis
7. Custom Report Generation
8. Export Data for Further Analysis

## 1. Setup and Authentication

In [None]:
# Import required libraries
import requests
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
from datetime import datetime
from typing import List, Dict

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

In [None]:
# Configure API settings
# Points at a local development server; this client sends no authentication headers.
BASE_URL = "http://localhost:8000"

# Simple API client class
class BiasAPI:
    def __init__(self, base_url: str = BASE_URL):
        self.base_url = base_url
        self.session = requests.Session()
    
    def create_evaluation(self, ai_system_name: str, heuristic_types: List[str], 
                         iteration_count: int = 50) -> Dict:
        url = f"{self.base_url}/api/evaluations"
        payload = {
            "ai_system_name": ai_system_name,
            "heuristic_types": heuristic_types,
            "iteration_count": iteration_count
        }
        response = self.session.post(url, json=payload)
        response.raise_for_status()
        return response.json()
    
    def execute_evaluation(self, evaluation_id: str) -> Dict:
        url = f"{self.base_url}/api/evaluations/{evaluation_id}/execute"
        response = self.session.post(url)
        response.raise_for_status()
        return response.json()
    
    def get_evaluation(self, evaluation_id: str) -> Dict:
        url = f"{self.base_url}/api/evaluations/{evaluation_id}"
        response = self.session.get(url)
        response.raise_for_status()
        return response.json()
    
    def list_evaluations(self, limit: int = 100, offset: int = 0) -> Dict:
        url = f"{self.base_url}/api/evaluations"
        params = {"limit": limit, "offset": offset}
        response = self.session.get(url, params=params)
        response.raise_for_status()
        return response.json()
    
    def get_heuristics(self, evaluation_id: str) -> List[Dict]:
        url = f"{self.base_url}/api/evaluations/{evaluation_id}/heuristics"
        response = self.session.get(url)
        response.raise_for_status()
        return response.json()
    
    def get_recommendations(self, evaluation_id: str, mode: str = "both") -> Dict:
        url = f"{self.base_url}/api/evaluations/{evaluation_id}/recommendations"
        params = {"mode": mode}
        response = self.session.get(url, params=params)
        response.raise_for_status()
        return response.json()

# Initialize API client
api = BiasAPI()
print("✓ API client initialized")
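
The `list_evaluations` method takes `limit` and `offset`, so a larger set of evaluations has to be fetched page by page. The sketch below shows one way to do that. It assumes each page is a dictionary whose `items` key holds the list of evaluations; that key name is hypothetical and not confirmed by this notebook, so adjust `items_key` to the real response shape.

```python
from typing import Callable, Dict, Iterator


def iter_pages(fetch_page: Callable[[int, int], Dict],
               page_size: int = 100,
               items_key: str = "items") -> Iterator[Dict]:
    """Yield every record from a limit/offset-paginated endpoint.

    fetch_page(limit, offset) must return one page as a dict.
    items_key is an ASSUMED name for the list inside that dict.
    """
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        items = page.get(items_key, [])
        yield from items
        # A short (or empty) page means there is nothing left to fetch.
        if len(items) < page_size:
            break
        offset += page_size
```

With the client above, `list(iter_pages(lambda limit, offset: api.list_evaluations(limit=limit, offset=offset)))` would collect every evaluation, provided the assumed response shape holds.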

## 2. Basic API Operations

Let's create a sample evaluation and explore the results.

In [None]:
# Create a new evaluation
evaluation = api.create_evaluation(
    ai_system_name="Sample AI System - Notebook Demo",
    heuristic_types=["anchoring", "loss_aversion", "confirmation_bias"],
    iteration_count=60
)

evaluation_id = evaluation['id']
print(f"Created evaluation: {evaluation_id}")
print(f"System: {evaluation['ai_system_name']}")
print(f"Status: {evaluation['status']}")

In [None]:
# Execute the evaluation
result = api.execute_evaluation(evaluation_id)

print("Evaluation completed!")
print(f"Overall Score: {result['overall_score']:.2f}")
print(f"Zone Status: {result['zone_status']}")

In [None]:
# Get detailed findings
findings = api.get_heuristics(evaluation_id)

# Convert to DataFrame for easier analysis
findings_df = pd.DataFrame(findings)
findings_df.head()

## 3. Data Collection and Analysis

Let's collect data from multiple evaluations for comprehensive analysis.

In [None]:
# Create multiple evaluations for different systems
test_systems = [
    {
        "name": "Customer Service Bot",
        "heuristics": ["anchoring", "availability_heuristic"]
    },
    {
        "name": "Financial Advisor AI",
        "heuristics": ["loss_aversion", "sunk_cost"]
    },
    {
        "name": "Content Moderator",
        "heuristics": ["confirmation_bias", "availability_heuristic"]
    }
]

evaluation_results = []

for system in test_systems:
    # Create evaluation
    eval_data = api.create_evaluation(
        ai_system_name=system['name'],
        heuristic_types=system['heuristics'],
        iteration_count=50
    )
    
    # Execute
    result = api.execute_evaluation(eval_data['id'])
    
    # Store results
    evaluation_results.append({
        'id': result['id'],
        'system_name': result['ai_system_name'],
        'overall_score': result['overall_score'],
        'zone_status': result['zone_status'],
        'heuristic_count': len(system['heuristics'])
    })
    
    print(f"✓ {system['name']}: {result['overall_score']:.2f} ({result['zone_status']})")

# Convert to DataFrame
results_df = pd.DataFrame(evaluation_results)
results_df

In [None]:
# Collect all findings
all_findings = []

for eval_result in evaluation_results:
    findings = api.get_heuristics(eval_result['id'])
    for finding in findings:
        finding['system_name'] = eval_result['system_name']
        all_findings.append(finding)

findings_full_df = pd.DataFrame(all_findings)
print(f"Collected {len(findings_full_df)} findings across {len(evaluation_results)} systems")
findings_full_df.head()

## 4. Visualization

Create visualizations to understand bias patterns.

In [None]:
# Overall scores comparison
plt.figure(figsize=(12, 6))

# Define colors for zones
zone_colors = {
    'green': '#22c55e',
    'yellow': '#eab308',
    'red': '#ef4444'
}

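# Fall back to gray if the API returns a zone that is not in the mapping
colors = [zone_colors.get(str(zone).lower(), '#9ca3af') for zone in results_df['zone_status']]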

plt.bar(results_df['system_name'], results_df['overall_score'], color=colors, alpha=0.7)
plt.axhline(y=40, color='green', linestyle='--', label='Green Zone Threshold', alpha=0.5)
plt.axhline(y=60, color='orange', linestyle='--', label='Yellow Zone Threshold', alpha=0.5)

plt.xlabel('AI System')
plt.ylabel('Overall Bias Score')
plt.title('Bias Scores by AI System')
plt.xticks(rotation=45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()
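
The chart draws its threshold lines at 40 and 60. As a rough sketch, the same cut-offs can be turned into a local classifier, for example to cross-check the `zone_status` values returned by the API. The boundary handling here (scores below 40 are green, 40 up to but not including 60 are yellow, 60 and above are red) is an assumption; the notebook does not state how the API assigns zones.

```python
def classify_zone(score: float,
                  green_below: float = 40.0,
                  yellow_below: float = 60.0) -> str:
    """Map a bias score to a zone name using ASSUMED cut-offs."""
    if score < green_below:
        return "green"
    if score < yellow_below:
        return "yellow"
    return "red"
```

Applied to the results table, `results_df['overall_score'].map(classify_zone)` would give a column to compare against `zone_status`.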

In [None]:
# Heuristic severity comparison
# DataFrame.plot creates its own figure, so a separate plt.figure() call
# would only leave an empty figure in the output.

pivot_data = findings_full_df.pivot_table(
    values='severity_score',
    index='heuristic_type',
    columns='system_name',
    aggfunc='mean'
)

pivot_data.plot(kind='bar', figsize=(14, 6))
plt.xlabel('Heuristic Type')
plt.ylabel('Severity Score')
plt.title('Heuristic Severity Scores by System')
plt.xticks(rotation=45, ha='right')
plt.legend(title='System', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

In [None]:
# Severity vs Confidence scatter plot
plt.figure(figsize=(12, 8))

for system in findings_full_df['system_name'].unique():
    system_data = findings_full_df[findings_full_df['system_name'] == system]
    plt.scatter(
        system_data['severity_score'],
        system_data['confidence_level'],
        label=system,
        s=100,
        alpha=0.6
    )

plt.xlabel('Severity Score')
plt.ylabel('Confidence Level')
plt.title('Bias Severity vs Detection Confidence')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Heatmap of findings
plt.figure(figsize=(10, 6))

heatmap_data = findings_full_df.pivot_table(
    values='severity_score',
    index='system_name',
    columns='heuristic_type',
    aggfunc='mean',
    fill_value=0
)

sns.heatmap(
    heatmap_data,
    annot=True,
    fmt='.1f',
    cmap='YlOrRd',
    cbar_kws={'label': 'Severity Score'}
)

plt.title('Bias Heatmap: System vs Heuristic Type')
plt.xlabel('Heuristic Type')
plt.ylabel('AI System')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

## 5. Statistical Analysis

In [None]:
# Summary statistics
print("Overall Score Statistics:")
print("=" * 50)
print(results_df['overall_score'].describe())

print("\n\nSeverity Score Statistics by Heuristic:")
print("=" * 50)
print(findings_full_df.groupby('heuristic_type')['severity_score'].describe())

In [None]:
# Correlation analysis
print("Correlation between Detection Count and Severity:")
correlation = findings_full_df['detection_count'].corr(
    findings_full_df['severity_score']
)
print(f"Correlation coefficient: {correlation:.3f}")

# Plot correlation
plt.figure(figsize=(10, 6))
plt.scatter(
    findings_full_df['detection_count'],
    findings_full_df['severity_score'],
    alpha=0.5
)
plt.xlabel('Detection Count')
plt.ylabel('Severity Score')
plt.title('Detection Count vs Severity Score')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
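
`Series.corr` computes the Pearson correlation coefficient by default. The pure-Python sketch below spells out that calculation so the number printed above is easier to interpret. It is an illustration of the formula, not a replacement for the pandas call.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)
```

With only a handful of findings, as in this demo, the coefficient can swing widely from run to run, so it is best read as a rough indicator.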

## 6. Recommendations Analysis

In [None]:
# Get recommendations for all evaluations
all_recommendations = []

for eval_result in evaluation_results:
    recs = api.get_recommendations(eval_result['id'], mode="both")
    for rec in recs['recommendations']:
        rec['system_name'] = eval_result['system_name']
        all_recommendations.append(rec)

recs_df = pd.DataFrame(all_recommendations)
print(f"Total recommendations: {len(recs_df)}")
recs_df.head()

In [None]:
# Count recommendations per system, split by estimated impact
# (DataFrame.plot creates its own figure, so no plt.figure() call is needed)

priority_counts = recs_df.groupby(['system_name', 'estimated_impact']).size().unstack(fill_value=0)
priority_counts.plot(kind='bar', stacked=True, figsize=(12, 6))

plt.xlabel('AI System')
plt.ylabel('Number of Recommendations')
plt.title('Recommendations by System and Impact Level')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Impact Level')
plt.tight_layout()
plt.show()

In [None]:
# Implementation difficulty analysis
difficulty_impact = recs_df.groupby(['implementation_difficulty', 'estimated_impact']).size().unstack(fill_value=0)

# DataFrame.plot creates its own figure, so no plt.figure() call is needed
difficulty_impact.plot(kind='bar', figsize=(10, 6))
plt.xlabel('Implementation Difficulty')
plt.ylabel('Count')
plt.title('Implementation Difficulty vs Estimated Impact')
plt.legend(title='Impact Level')
plt.tight_layout()
plt.show()

## 7. Custom Report Generation

In [None]:
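def generate_summary_report(results_df, findings_df, recs_df):
    """
    Build a plain-text summary report.

    results_df: one row per evaluation (system name, overall score, zone).
    findings_df: one row per heuristic finding, with severity scores.
    recs_df: one row per recommendation, with impact and difficulty.
    Returns the report as a single newline-joined string.
    """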
    report = []
    report.append("=" * 80)
    report.append("AI BIAS DIAGNOSTIC - SUMMARY REPORT")
    report.append("=" * 80)
    report.append(f"\nGenerated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    report.append(f"Total Systems Evaluated: {len(results_df)}")
    report.append(f"Total Bias Patterns Detected: {len(findings_df)}")
    report.append(f"Total Recommendations: {len(recs_df)}")
    
    # Zone distribution
    report.append("\n" + "-" * 80)
    report.append("ZONE DISTRIBUTION")
    report.append("-" * 80)
    zone_counts = results_df['zone_status'].value_counts()
    for zone, count in zone_counts.items():
        pct = (count / len(results_df)) * 100
        report.append(f"{zone.upper():10s}: {count:2d} systems ({pct:5.1f}%)")
    
    # Top concerns
    report.append("\n" + "-" * 80)
    report.append("TOP CONCERNS")
    report.append("-" * 80)
    worst_systems = results_df.nlargest(3, 'overall_score')
    for idx, system in worst_systems.iterrows():
        report.append(f"\n• {system['system_name']}")
        report.append(f"  Score: {system['overall_score']:.2f} ({system['zone_status']})")
    
    # Biases ranked by mean severity score
    report.append("\n" + "-" * 80)
    report.append("MOST SEVERE BIASES (MEAN SEVERITY)")
    report.append("-" * 80)
    bias_severity = findings_df.groupby('heuristic_type')['severity_score'].mean().sort_values(ascending=False)
    for heuristic, severity in bias_severity.head(5).items():
        report.append(f"• {heuristic.replace('_', ' ').title():30s}: {severity:5.2f}")
    
    # High-priority recommendations
    report.append("\n" + "-" * 80)
    report.append("HIGH-PRIORITY ACTIONS")
    report.append("-" * 80)
    high_priority = recs_df[
        (recs_df['estimated_impact'] == 'high') & 
        (recs_df['implementation_difficulty'].isin(['easy', 'moderate']))
    ].head(5)
    
    for idx, rec in high_priority.iterrows():
        report.append(f"\n{rec['priority']}. {rec['action_title']}")
        report.append(f"   System: {rec['system_name']}")
        report.append(f"   Impact: {rec['estimated_impact'].upper()} | "
                     f"Difficulty: {rec['implementation_difficulty'].upper()}")
    
    report.append("\n" + "=" * 80)
    
    return "\n".join(report)

# Generate and display report
report = generate_summary_report(results_df, findings_full_df, recs_df)
print(report)

# Save to file
with open('bias_analysis_report.txt', 'w', encoding='utf-8') as f:
    f.write(report)
print("\n✓ Report saved to: bias_analysis_report.txt")
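
The report keeps high-impact recommendations that are easy or moderate to implement, in whatever order they were returned. A possible refinement is to rank all recommendations so that high-impact, low-effort items come first. The sketch below works on plain dictionaries. The values `high`, `easy` and `moderate` appear in the report code; `medium`, `low` and `hard` are assumed here and may not match what the API returns.

```python
from typing import Dict, List

# Lower rank sorts first. Values other than 'high', 'easy' and 'moderate'
# are ASSUMED labels; unknown labels are pushed to the end.
IMPACT_RANK = {"high": 0, "medium": 1, "low": 2}
DIFFICULTY_RANK = {"easy": 0, "moderate": 1, "hard": 2}


def rank_recommendations(recs: List[Dict]) -> List[Dict]:
    """Return recommendations sorted by impact, then by difficulty."""
    def sort_key(rec: Dict):
        impact = IMPACT_RANK.get(rec.get("estimated_impact"), len(IMPACT_RANK))
        difficulty = DIFFICULTY_RANK.get(
            rec.get("implementation_difficulty"), len(DIFFICULTY_RANK)
        )
        return (impact, difficulty)

    return sorted(recs, key=sort_key)
```

Calling `rank_recommendations(recs_df.to_dict('records'))` would apply it to the recommendations collected earlier.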

## 8. Export Data for Further Analysis

In [None]:
# Export all data to CSV files
results_df.to_csv('evaluation_results.csv', index=False)
findings_full_df.to_csv('bias_findings.csv', index=False)
recs_df.to_csv('recommendations.csv', index=False)

print("✓ Data exported to CSV files:")
print("  - evaluation_results.csv")
print("  - bias_findings.csv")
print("  - recommendations.csv")

In [None]:
# Export complete analysis to JSON
export_data = {
    "export_timestamp": datetime.now().isoformat(),
    "summary": {
        "total_evaluations": len(results_df),
        "total_findings": len(findings_full_df),
        "total_recommendations": len(recs_df),
        "average_score": float(results_df['overall_score'].mean()),
        "zone_distribution": results_df['zone_status'].value_counts().to_dict()
    },
    "evaluations": results_df.to_dict('records'),
    "findings": findings_full_df.to_dict('records'),
    "recommendations": recs_df.to_dict('records')
}

with open('complete_analysis.json', 'w') as f:
    json.dump(export_data, f, indent=2, default=str)

print("✓ Complete analysis exported to: complete_analysis.json")
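
The `default=str` argument tells `json.dump` to convert any value it cannot serialize into a string, so such values come back as plain strings when the file is reloaded. The sketch below shows the round trip for a timestamp; the record and its values are made up for illustration.

```python
import json
from datetime import datetime

# Illustrative record only; not real API output.
record = {"created_at": datetime(2024, 1, 15, 9, 30), "overall_score": 42.5}

# datetime is not JSON-serializable, so default=str turns it into text.
encoded = json.dumps(record, default=str)
decoded = json.loads(encoded)

# After reloading, the timestamp is a string and must be parsed explicitly.
restored = datetime.fromisoformat(decoded["created_at"])
```

Anything that needs its original type after loading `complete_analysis.json` has to be converted back in the same way.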

## Conclusion

This notebook demonstrated:

1. **Basic API Usage**: Creating and executing evaluations
2. **Data Collection**: Gathering data from multiple evaluations
3. **Visualization**: Creating charts to understand bias patterns
4. **Statistical Analysis**: Analyzing correlations and distributions
5. **Recommendations**: Understanding and prioritizing actions
6. **Reporting**: Generating comprehensive reports
7. **Export**: Saving data for further analysis

### Next Steps

- Use this workflow for regular monitoring of AI systems
- Customize visualizations for your specific needs
- Integrate with your existing data science pipelines
- Build automated reporting dashboards
- Track trends over time
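
As a starting point for tracking trends, the sketch below compares the earliest and latest score recorded for each system. The snapshot structure (a list of timestamp and score pairs per system name) is an assumption about how repeated runs of this workflow might be stored; nothing in the API is known to provide it directly.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Snapshot = Tuple[datetime, float]  # (when the evaluation ran, overall score)


def score_change(history: Dict[str, List[Snapshot]]) -> Dict[str, float]:
    """Return latest score minus earliest score for each system.

    Systems with fewer than two snapshots are skipped.
    """
    changes = {}
    for name, snapshots in history.items():
        if len(snapshots) < 2:
            continue
        ordered = sorted(snapshots, key=lambda snap: snap[0])
        changes[name] = ordered[-1][1] - ordered[0][1]
    return changes
```

If higher scores indicate more bias, as the report's "top concerns" ranking suggests, a negative change points to improvement.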