# Bias in AI Suggestions: Why Historical Data Can Be Dangerous

This notebook demonstrates how AI suggestions can perpetuate bias from historical data and why validation is essential.

Understanding this is critical because:
- **AI learns from historical data** which may contain past discrimination
- **AI can perpetuate bias** in its suggestions
- **Bias leads to unfair decisions** and can expose the organization to legal liability
- **Human oversight is essential** to identify and prevent bias


## Key Concepts

**Bias in Historical Data**:
- Historical decisions may reflect past discrimination
- The data encodes those decisions as patterns, such as unequal hiring rates for equally qualified groups
- AI learns these patterns and suggests repeating them

**AI Perpetuates Bias**:
- AI suggests constraints based on historical patterns
- If history was biased, suggestions will be biased
- AI doesn't understand fairness - it just finds patterns

**Human Oversight**:
- Humans must review for fairness
- Identify discriminatory patterns
- Reject biased suggestions
- Ensure equal opportunity

**Critical insight**: AI is not neutral. It reflects the biases in its training data. Always review for bias.
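
To make "it just finds patterns" concrete, here is a toy sketch of a pattern-based suggester. The function name `naive_pattern_suggestion` is hypothetical, and the rates are the ones used in the hiring scenario later in this notebook. It is a stand-in for illustration, not how any particular AI tool works.

```python
# Hypothetical stand-in for a tool that only mines historical patterns.
# Rates mirror the hiring scenario used later in this notebook.
historical_rates = {"Group A": 0.80, "Group B": 0.60, "Group C": 0.50}

def naive_pattern_suggestion(rates):
    """Suggest favoring whichever group was hired most often in the past.

    Note that qualifications are never consulted: the function sees only
    outcomes, so any bias in those outcomes flows straight into the output.
    """
    favored = max(rates, key=rates.get)
    disfavored = min(rates, key=rates.get)
    return f"Prioritize {favored}; deprioritize {disfavored}"

suggestion = naive_pattern_suggestion(historical_rates)
print(suggestion)
```

Nothing in the function asks whether the historical outcomes were fair, which is why a separate fairness review is needed.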


## Scenario: Hiring Model Constraints

An AI tool analyzes 10 years of hiring data and suggests constraints for a hiring optimization model. The historical data reflects past bias.


## Step 1: Install Required Packages (Colab)


In [None]:
# Install required packages (if needed in Colab)
%pip install numpy matplotlib pandas -q


## Step 2: Import Libraries


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)


## Step 3: Historical Data Contains Bias


In [None]:
# Historical hiring data (contains past bias)
# In the past, certain groups were hired less often, even when qualified

historical_data = {
    'Group': ['Group A', 'Group B', 'Group C'],
    'Qualified_Candidates': [1000, 1000, 1000],  # Equal qualifications
    'Hired': [800, 600, 500],  # But different hiring outcomes (past bias)
}

df_historical = pd.DataFrame(historical_data)
# Derive the hiring rate from the counts instead of hard-coding it
df_historical['Hiring_Rate'] = df_historical['Hired'] / df_historical['Qualified_Candidates']

print("HISTORICAL HIRING DATA (10 years):")
print("=" * 60)
print(df_historical.to_string(index=False))
print("\n⚠️  Notice: Equal qualifications, but different hiring rates")
print("   This reflects PAST BIAS in hiring decisions")
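
One way to quantify the gap in the table above is an impact ratio: each group's hiring rate divided by the highest group's rate. The sketch below assumes the 0.8 cutoff from the "four-fifths" rule of thumb used in US employment selection guidelines. The column names `Impact_Ratio` and `Below_Threshold` and the constant `THRESHOLD` are introduced here for illustration and are not part of the original notebook. Treat it as a screening heuristic, not a legal determination.

```python
import pandas as pd

# Same figures as the historical hiring table above.
df = pd.DataFrame({
    "Group": ["Group A", "Group B", "Group C"],
    "Qualified_Candidates": [1000, 1000, 1000],
    "Hired": [800, 600, 500],
})
df["Hiring_Rate"] = df["Hired"] / df["Qualified_Candidates"]

# Impact ratio: each group's rate relative to the highest-rate group.
df["Impact_Ratio"] = df["Hiring_Rate"] / df["Hiring_Rate"].max()

# Assumption: 0.8 cutoff, following the "four-fifths" rule of thumb.
THRESHOLD = 0.8
df["Below_Threshold"] = df["Impact_Ratio"] < THRESHOLD

print(df[["Group", "Hiring_Rate", "Impact_Ratio", "Below_Threshold"]].to_string(index=False))
```

With these figures, Group B (ratio about 0.75) and Group C (ratio about 0.625) both fall below the cutoff, which supports the reading that the historical pattern should not be copied into a model.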


## Step 4: AI Suggests Constraints Based on Historical Patterns


In [None]:
# AI analyzes historical data and suggests constraints
# AI sees: Group A hired 80%, Group B 60%, Group C 50%
# AI suggests: "Prioritize Group A, limit Group C"

ai_suggestion = {
    'constraint': 'Prioritize candidates from Group A',
    'reasoning': 'Historical data shows Group A has 80% hiring rate vs 50% for Group C',
    'confidence': 'High'
}

print("AI SUGGESTION (Based on Historical Data):")
print("=" * 60)
print(f"Constraint: {ai_suggestion['constraint']}")
print(f"Reasoning: {ai_suggestion['reasoning']}")
print(f"AI Confidence: {ai_suggestion['confidence']}")
print("\n⚠️  PROBLEM: AI learned from biased historical data")
print("   The suggestion would PERPETUATE past discrimination!")
print("   Group C candidates are equally qualified but would be deprioritized")
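
Before a suggestion like this reaches the model, a simple automated pre-screen can route it to human review. The sketch below is a minimal example under stated assumptions: the helper `find_group_references` and the list `PROTECTED_GROUP_LABELS` are hypothetical names, and plain text matching is a deliberately crude check.

```python
import re

# Hypothetical list of group labels that must not drive a constraint.
PROTECTED_GROUP_LABELS = ["Group A", "Group B", "Group C"]

def find_group_references(suggestion, labels=PROTECTED_GROUP_LABELS):
    """Return the group labels mentioned in the suggested constraint text."""
    text = suggestion["constraint"]
    return [
        label for label in labels
        if re.search(re.escape(label), text, flags=re.IGNORECASE)
    ]

# Same suggestion as in the cell above.
ai_suggestion = {
    "constraint": "Prioritize candidates from Group A",
    "reasoning": "Historical data shows Group A has 80% hiring rate vs 50% for Group C",
    "confidence": "High",
}

flagged = find_group_references(ai_suggestion)
if flagged:
    print(f"Hold for human review: constraint references {flagged}")
else:
    print("No group labels found; still subject to normal review")
```

A check like this only catches explicit references. Constraints that use proxies for group membership would pass it, so it complements human review rather than replacing it.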


## Step 5: Human Review Identifies Bias


In [None]:
# Human review identifies the bias
print("HUMAN REVIEW (Diversity & Inclusion Team):")
print("=" * 60)
print("\n❌ BIAS IDENTIFIED:")
print("   - Historical data reflects past discrimination")
print("   - All groups have equal qualifications")
print("   - Different hiring rates indicate unfair past practices")
print("   - AI suggestion would perpetuate this discrimination")
print("\n✅ CORRECTED APPROACH:")
print("   - Reject AI suggestion")
print("   - Use fair hiring principles:")
print("       - Evaluate all qualified candidates equally")
print("       - No prioritization based on group membership")
print("\n📊 FAIR MODEL:")
print("   - Constraint: All qualified candidates evaluated equally")
print("   - No group-based prioritization")
print("   - Ensures equal opportunity")
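
The "evaluate all qualified candidates equally" constraint can be sketched as a group-blind selection rule. The example below uses synthetic data: the scores, the `N_OPENINGS` value, and the column names are assumptions made for illustration, and every group draws from the same score distribution to mirror the notebook's premise of equal qualifications.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Synthetic candidates: all groups share ONE score distribution,
# mirroring the premise of equal qualifications.
groups = np.repeat(["Group A", "Group B", "Group C"], 1000)
scores = rng.normal(loc=70, scale=10, size=groups.size)
candidates = pd.DataFrame({"Group": groups, "Score": scores})

# Group-blind rule: select the top N by score; 'Group' is never consulted.
N_OPENINGS = 900  # hypothetical number of positions
selected_idx = candidates["Score"].nlargest(N_OPENINGS).index
candidates["Selected"] = candidates.index.isin(selected_idx)

# 'Group' is used only afterwards, to audit the outcome.
selection_rates = candidates.groupby("Group")["Selected"].mean()
print(selection_rates.round(3))
```

Because group membership plays no part in the selection, the per-group rates should differ only by sampling noise. The group column is still kept so the outcome can be audited afterwards.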


## Step 6: Visualize the Problem


In [None]:
# Visualize the bias problem
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Left: Historical data (biased)
ax1.bar(df_historical['Group'], df_historical['Hiring_Rate']*100, 
        color=['#FF6B6B', '#FFA07A', '#FFD700'], alpha=0.7, edgecolor='black')
ax1.axhline(100, color='green', linestyle='--', linewidth=2, label='Equal Opportunity (100%)')
ax1.set_ylabel('Hiring Rate (%)', fontsize=12)
ax1.set_title('Historical Data\n(Contains Bias)', fontsize=14, fontweight='bold')
ax1.set_ylim(0, 110)  # headroom so the 100% reference line is not clipped at the axis edge
ax1.legend()
ax1.grid(axis='y', alpha=0.3)

# Right: Fair approach
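# "Fair" here means equal consideration: 100% of qualified candidates in every
# group are evaluated on the same criteria (it does not mean everyone is hired)
fair_rates = [100, 100, 100]
ax2.bar(df_historical['Group'], fair_rates, 
        color=['#4ECDC4', '#4ECDC4', '#4ECDC4'], alpha=0.7, edgecolor='black')
ax2.axhline(100, color='green', linestyle='--', linewidth=2, label='Equal Opportunity (100%)')
ax2.set_ylabel('Qualified Candidates Given Equal Consideration (%)', fontsize=12)
ax2.set_title('Fair Model\n(Equal Opportunity)', fontsize=14, fontweight='bold')
ax2.set_ylim(0, 110)  # headroom so the 100% reference line is not clipped at the axis edge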
ax2.legend()
ax2.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print("\nüîç KEY INSIGHT:")
print("   Historical data shows unequal hiring (past bias)")
print("   AI learned this pattern and suggested perpetuating it")
print("   Human review identified the bias and corrected it")
print("   Fair model ensures equal opportunity for all qualified candidates")


## Key Takeaways

1. **AI is not neutral**: It learns from data, and if data contains bias, AI will learn and perpetuate that bias.

2. **Historical data can be dangerous**: Past discrimination shows up as patterns that AI suggests continuing.

3. **Human oversight is essential**: Humans must review AI suggestions for fairness and bias.

4. **Bias leads to harm**: Discriminatory models cause real harm to individuals and can violate anti-discrimination laws.

5. **Always review for bias**: Check AI suggestions against fair hiring/decision principles before using them.

**This completes Lesson 11 notebooks!** You now understand simulation, visualization, and responsible AI use in prescriptive analytics.
