# 🔬 Exercise 1: Exploring AI Outputs

**Week 1 | AI in Healthcare Curriculum**

---

## Learning Objectives

By completing this exercise, you will:

- 🎯 Interact with pre-trained AI models
- 🎯 Observe how AI confidence varies with different inputs
- 🎯 Compare rule-based systems with machine learning
- 🎯 Identify when AI might be unreliable

---

## ⏱️ Estimated Time: 90 minutes

---

## Context

Before we learn *how* AI works (Week 2), let's first see *what* AI does. This exercise lets you interact with AI systems as an end user would, developing intuition about their capabilities and limitations.

**Important:** We're using simplified examples for teaching. Real clinical AI requires extensive validation before use.

## Part 1: Setup

First, let's import the libraries we'll need. Run this cell:

In [None]:
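import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Load the synthetic dataset
url = "https://raw.githubusercontent.com/harl00/AIinHealthcare/main/data/AI_in_HealthCare_Dataset.csv"
ed_data = pd.read_csv(url)
ed_data.head()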

## Part 2: Rule-Based Clinical Decision Support

Let's start with something familiar: a **rule-based** system. This is traditional clinical decision support - explicit rules coded by humans.

### Example: Sepsis Screening (qSOFA)

The quick SOFA score uses three criteria:
- Respiratory rate ≥ 22/min
- Altered mental status (GCS < 15)
- Systolic BP ≤ 100 mmHg

A score ≥ 2 suggests possible sepsis and warrants further investigation.

Let's implement this as code:

In [None]:
def calculate_qsofa(respiratory_rate, gcs, systolic_bp):
    """
    Calculate qSOFA score for sepsis screening.

    This is a RULE-BASED system - every decision is explicit and auditable.

    Parameters:
    - respiratory_rate: breaths per minute
    - gcs: Glasgow Coma Scale (3-15)
    - systolic_bp: systolic blood pressure in mmHg

    Returns:
    - Dictionary with score and reasoning
    """

    score = 0
    reasons = []

    # Check respiratory rate
    if respiratory_rate >= 22:
        score += 1
        reasons.append(f"RR {respiratory_rate} ≥ 22 (+1)")
    else:
        reasons.append(f"RR {respiratory_rate} < 22 (0)")

    # Check GCS
    if gcs < 15:
        score += 1
        reasons.append(f"GCS {gcs} < 15 (+1)")
    else:
        reasons.append(f"GCS {gcs} = 15 (0)")

    # Check systolic BP
    if systolic_bp <= 100:
        score += 1
        reasons.append(f"SBP {systolic_bp} ≤ 100 (+1)")
    else:
        reasons.append(f"SBP {systolic_bp} > 100 (0)")

    # Determine risk category
    if score >= 2:
        risk = "HIGH - Possible sepsis, further assessment recommended"
    elif score == 1:
        risk = "MODERATE - Continue monitoring"
    else:
        risk = "LOW - qSOFA criteria not met"

    return {
        'score': score,
        'reasons': reasons,
        'risk': risk
    }


# Test with a sample patient
print("="*60)
print("RULE-BASED SYSTEM: qSOFA Calculator")
print("="*60)

result = calculate_qsofa(
    respiratory_rate=24,
    gcs=14,
    systolic_bp=95
)

print("\nInput Values:")
print("  Respiratory Rate: 24/min")
print("  GCS: 14")
print("  Systolic BP: 95 mmHg")
print("\nScoring Breakdown:")
for reason in result['reasons']:
    print(f"  • {reason}")
print(f"\nTotal Score: {result['score']}/3")
print(f"Assessment: {result['risk']}")
print("="*60)

### 🔧 Your Turn: Test the Rule-Based System

Modify the values below and run the cell to see different results:

In [None]:
# ===== MODIFY THESE VALUES =====
my_respiratory_rate = 18    # Try: 18, 22, 28
my_gcs = 15                 # Try: 15, 14, 12, 8
my_systolic_bp = 120        # Try: 120, 100, 85
# ================================

result = calculate_qsofa(my_respiratory_rate, my_gcs, my_systolic_bp)

print(f"\nqSOFA Score: {result['score']}/3")
print(f"Assessment: {result['risk']}")
print("\nReasoning:")
for reason in result['reasons']:
    print(f"  • {reason}")

### 💡 Key Observations: Rule-Based Systems

Notice these characteristics:

1. **Transparent** - You can see exactly why each decision was made
2. **Deterministic** - Same inputs always give same outputs
3. **Rigid thresholds** - A systolic BP of 100 scores a point; a systolic BP of 101 does not
4. **Limited scope** - Only considers the specific variables programmed

**Question to consider:** What happens to a patient with BP = 101 and clear signs of infection?
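
The rigid-threshold behaviour is easy to check directly. The sketch below re-implements only the three qSOFA comparisons from this notebook in a compact helper (the name `qsofa_score` is introduced here purely for illustration) and scores two hypothetical patients whose only difference is 1 mmHg of systolic BP.

```python
def qsofa_score(respiratory_rate, gcs, systolic_bp):
    """Compact qSOFA: one point per criterion met (same cutoffs as above)."""
    return (
        int(respiratory_rate >= 22)
        + int(gcs < 15)
        + int(systolic_bp <= 100)
    )

# Two hypothetical patients: identical except for 1 mmHg of systolic BP
for sbp in (100, 101):
    score = qsofa_score(respiratory_rate=24, gcs=15, systolic_bp=sbp)
    flagged = score >= 2
    print(f"SBP {sbp}: score {score}/3, flagged = {flagged}")
```

The patient at 100 mmHg is flagged and the patient at 101 mmHg is not, even though no clinician would regard them as meaningfully different.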

## Part 3: Machine Learning-Based Prediction

Now let's see how a **machine learning** approach differs.

We'll create a simple deterioration prediction model. Unlike the rule-based system, this learns patterns from data rather than following explicit rules.
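
To make "learning patterns from data" concrete, here is a deliberately tiny sketch. It is not the Random Forest used later in this exercise. It takes a handful of made-up (heart rate, deteriorated) pairs and searches for the single cutoff that misclassifies the fewest patients. The data values and the helper name `learn_threshold` are assumptions for illustration only.

```python
def learn_threshold(values, labels):
    """Return the cutoff (predict 1 when value > cutoff) with the fewest errors."""
    best_cutoff, best_errors = None, len(values) + 1
    for cutoff in sorted(set(values)):
        # Count patients whose predicted label differs from the true label
        errors = sum(
            int(v > cutoff) != label for v, label in zip(values, labels)
        )
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff, best_errors

# Made-up toy data: heart rate (bpm) and whether the patient deteriorated (1/0)
heart_rates = [62, 70, 78, 85, 92, 104, 110, 118, 125, 131]
deteriorated = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]

cutoff, errors = learn_threshold(heart_rates, deteriorated)
print(f"Learned rule: flag if HR > {cutoff} ({errors} training error(s))")
```

Here the cutoff comes from the data rather than from a guideline, so a different dataset would produce a different rule. A Random Forest builds many such data-driven splits across several variables at once.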

### Creating a Synthetic Training Dataset

First, let's create some synthetic patient data to train our model:

In [None]:
# Generate synthetic patient data for training
# In real life, this would come from actual patient records

np.random.seed(42)  # For reproducibility
n_patients = 1000

# Generate features (vital signs)
data = {
    'heart_rate': np.random.normal(80, 20, n_patients).clip(40, 180),
    'respiratory_rate': np.random.normal(16, 6, n_patients).clip(8, 40),
    'systolic_bp': np.random.normal(120, 25, n_patients).clip(60, 200),
    'temperature': np.random.normal(37, 0.8, n_patients).clip(34, 41),
    'oxygen_saturation': np.random.normal(96, 4, n_patients).clip(70, 100),
    'age': np.random.normal(55, 18, n_patients).clip(18, 95)
}

df = pd.DataFrame(data)

# Create outcome variable (deterioration within 24 hours)
# This is a simplified simulation - real deterioration is more complex!
risk_score = (
    (df['heart_rate'] > 100).astype(int) * 2 +
    (df['respiratory_rate'] > 22).astype(int) * 2 +
    (df['systolic_bp'] < 90).astype(int) * 3 +
    (df['temperature'] > 38.5).astype(int) * 1.5 +
    (df['oxygen_saturation'] < 92).astype(int) * 3 +
    (df['age'] > 70).astype(int) * 1 +
    np.random.normal(0, 1, n_patients)  # Add some noise
)

df['deteriorated'] = (risk_score > 4).astype(int)

print("Synthetic Training Dataset Created")
print("="*50)
print(f"Total patients: {len(df)}")
print(f"Deteriorated: {df['deteriorated'].sum()} ({df['deteriorated'].mean()*100:.1f}%)")
print(f"Did not deteriorate: {len(df) - df['deteriorated'].sum()} ({(1-df['deteriorated'].mean())*100:.1f}%)")
print("\nSample of the data:")
df.head(10)

### Training the ML Model

Now we'll train a Random Forest classifier - a common type of ML model in healthcare.

**What's happening:** The model looks at the vital signs of all 1000 patients and learns patterns that distinguish those who deteriorated from those who didn't.
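
A Random Forest is a collection of decision trees, and its probability output is an average over those trees. The sketch below mimics that idea with hard-coded votes from 10 hypothetical trees. In scikit-learn's implementation each tree contributes a class probability rather than a bare vote, so treat this as a simplified picture rather than the library's exact calculation.

```python
# Hypothetical votes from 10 trees for one patient (1 = deteriorates, 0 = does not)
tree_votes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]

# The forest's probability is the share of trees voting for deterioration
probability = sum(tree_votes) / len(tree_votes)
prediction = "HIGH RISK" if probability > 0.5 else "LOW RISK"

print(f"{sum(tree_votes)} of {len(tree_votes)} trees voted 'deteriorates'")
print(f"Probability: {probability:.0%} -> {prediction}")
```

With 100 trees, as used in this exercise, the same averaging produces finer-grained probabilities.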

In [None]:
# Prepare features and labels
feature_columns = ['heart_rate', 'respiratory_rate', 'systolic_bp',
                   'temperature', 'oxygen_saturation', 'age']

X = df[feature_columns]  # Features (input variables)
y = df['deteriorated']   # Label (what we're predicting)

# Scale the features (tree-based models such as Random Forests do not need this, but many other ML algorithms do)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train a Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_scaled, y)

print("✅ Model trained successfully!")
print("\nThe model learned from 1000 patient examples.")
print("It can now predict deterioration risk for new patients.")

### Using the ML Model for Prediction

Now let's create a function that uses our trained model to predict deterioration risk:

In [None]:
def predict_deterioration_ml(heart_rate, respiratory_rate, systolic_bp,
                             temperature, oxygen_saturation, age):
    """
    Predict deterioration risk using the ML model.

    Unlike the rule-based system, this uses patterns learned from data.
    """

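    # Prepare input as a DataFrame so the column names match those the
    # scaler was fitted with (avoids a scikit-learn feature-name warning)
    patient_data = pd.DataFrame(
        [[heart_rate, respiratory_rate, systolic_bp,
          temperature, oxygen_saturation, age]],
        columns=feature_columns)
    patient_scaled = scaler.transform(patient_data)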

    # Get prediction and probability
    prediction = model.predict(patient_scaled)[0]
    probability = model.predict_proba(patient_scaled)[0]

    # Probability of deterioration (class 1)
    deterioration_prob = probability[1] * 100

    return {
        'prediction': 'HIGH RISK' if prediction == 1 else 'LOW RISK',
        'probability': deterioration_prob,
        'confidence': max(probability) * 100
    }


# Test with a sample patient
print("="*60)
print("ML-BASED SYSTEM: Deterioration Prediction")
print("="*60)

result = predict_deterioration_ml(
    heart_rate=105,
    respiratory_rate=24,
    systolic_bp=95,
    temperature=38.2,
    oxygen_saturation=93,
    age=72
)

print("\nInput Values:")
print("  Heart Rate: 105 bpm")
print("  Respiratory Rate: 24/min")
print("  Systolic BP: 95 mmHg")
print("  Temperature: 38.2°C")
print("  SpO2: 93%")
print("  Age: 72 years")
print("\nML Model Output:")
print(f"  Prediction: {result['prediction']}")
print(f"  Deterioration Probability: {result['probability']:.1f}%")
print(f"  Model Confidence: {result['confidence']:.1f}%")
print("="*60)

### 🔧 Your Turn: Test the ML Model

Try different patient profiles and observe how the model's predictions and confidence change:

In [None]:
# ===== MODIFY THESE VALUES =====
my_heart_rate = 75          # Normal: 60-100
my_respiratory_rate = 16    # Normal: 12-20
my_systolic_bp = 120        # Normal: 90-140
my_temperature = 37.0       # Normal: 36.5-37.5
my_oxygen_sat = 98          # Normal: 95-100
my_age = 45                 # Years
# ================================

result = predict_deterioration_ml(
    my_heart_rate, my_respiratory_rate, my_systolic_bp,
    my_temperature, my_oxygen_sat, my_age
)

print(f"\nPrediction: {result['prediction']}")
print(f"Deterioration Probability: {result['probability']:.1f}%")
print(f"Model Confidence: {result['confidence']:.1f}%")

### 🧪 Experiment: Finding Edge Cases

Try these specific scenarios and note what happens:

1. **Normal patient:** HR=75, RR=14, BP=120, Temp=37.0, SpO2=98, Age=35
2. **Clearly unwell:** HR=120, RR=28, BP=80, Temp=39.0, SpO2=88, Age=80
3. **Young but tachycardic:** HR=130, RR=16, BP=120, Temp=37.2, SpO2=99, Age=25
4. **Elderly but stable:** HR=70, RR=16, BP=130, Temp=36.8, SpO2=95, Age=85

**Questions to consider:**
- Does the probability change gradually or suddenly?
- Which vital sign has the biggest impact on prediction?
- Does age alone significantly affect the prediction?
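
One way to approach the second question is a one-at-a-time comparison: start from the normal patient, swap in a single value from the clearly unwell patient, and record how much the output moves. The sketch below shows the mechanics with a stand-in scorer, `toy_risk`, built from the same point weights used to simulate outcomes earlier in this notebook (without the random noise). It is an assumption for illustration, not the trained model; to test the real model, replace `toy_risk` with a small wrapper around `predict_deterioration_ml`.

```python
def toy_risk(v):
    """Stand-in scorer: the simulation's point weights, with no noise term."""
    return (
        2 * (v["heart_rate"] > 100)
        + 2 * (v["respiratory_rate"] > 22)
        + 3 * (v["systolic_bp"] < 90)
        + 1.5 * (v["temperature"] > 38.5)
        + 3 * (v["oxygen_saturation"] < 92)
        + 1 * (v["age"] > 70)
    )

# Scenarios 1 and 2 from the list above
normal = {"heart_rate": 75, "respiratory_rate": 14, "systolic_bp": 120,
          "temperature": 37.0, "oxygen_saturation": 98, "age": 35}
unwell = {"heart_rate": 120, "respiratory_rate": 28, "systolic_bp": 80,
          "temperature": 39.0, "oxygen_saturation": 88, "age": 80}

baseline = toy_risk(normal)
effects = {}
for name in normal:
    changed = dict(normal, **{name: unwell[name]})  # swap in one abnormal value
    effects[name] = toy_risk(changed) - baseline

# Largest effect first
for name, delta in sorted(effects.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<18} +{delta}")
```

With this stand-in, systolic BP and oxygen saturation move the score most (3 points each). One-at-a-time changes ignore interactions between variables, so they give a first look rather than a full explanation.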

## Part 4: Comparing Rule-Based vs ML Predictions

Let's run both systems on the same patients and compare their outputs:

In [None]:
# Test cases to compare both systems
test_patients = [
    {'name': 'Patient A', 'hr': 75, 'rr': 16, 'sbp': 120, 'temp': 37.0, 'spo2': 98, 'age': 45, 'gcs': 15},
    {'name': 'Patient B', 'hr': 105, 'rr': 24, 'sbp': 95, 'temp': 38.5, 'spo2': 93, 'age': 72, 'gcs': 14},
    {'name': 'Patient C', 'hr': 88, 'rr': 20, 'sbp': 100, 'temp': 37.8, 'spo2': 95, 'age': 55, 'gcs': 15},
    {'name': 'Patient D', 'hr': 110, 'rr': 18, 'sbp': 140, 'temp': 36.5, 'spo2': 99, 'age': 30, 'gcs': 15},
]

print("="*80)
print("COMPARISON: Rule-Based (qSOFA) vs Machine Learning")
print("="*80)

for patient in test_patients:
    # Rule-based (qSOFA)
    qsofa = calculate_qsofa(patient['rr'], patient['gcs'], patient['sbp'])

    # ML-based
    ml = predict_deterioration_ml(
        patient['hr'], patient['rr'], patient['sbp'],
        patient['temp'], patient['spo2'], patient['age']
    )

    print(f"\n{patient['name']}:")
    print(f"  Vitals: HR={patient['hr']}, RR={patient['rr']}, BP={patient['sbp']}, "
          f"Temp={patient['temp']}, SpO2={patient['spo2']}, Age={patient['age']}")
    print(f"  qSOFA Score: {qsofa['score']}/3 → {qsofa['risk'].split(' - ')[0]}")
    print(f"  ML Prediction: {ml['probability']:.0f}% deterioration risk → {ml['prediction']}")

    # Flag disagreements
    qsofa_high = qsofa['score'] >= 2
    ml_high = ml['probability'] > 50
    if qsofa_high != ml_high:
        print("  ⚠️  DISAGREEMENT between systems!")

print("\n" + "="*80)

### 💡 Key Observations: Comparing Approaches

**Rule-Based (qSOFA):**
- Only considers 3 specific variables
- Uses fixed thresholds
- Completely transparent - you know exactly why
- Same result every time

**Machine Learning:**
- Considers every variable it was trained on (six here)
- Learns complex patterns (not just thresholds)
- Provides probability estimates
- Less transparent - "black box" element

**Neither is inherently better** - they have different strengths and use cases.
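
The comparison cell flags a disagreement when qSOFA is 2 or more but the ML probability is not above 50%, or the reverse. That 50% cutoff is a choice, not a property of the model. The sketch below uses made-up scores and probabilities for four hypothetical patients to show how the set of disagreements shifts as the cutoff moves.

```python
# Made-up (qSOFA score, ML probability %) pairs for four hypothetical patients
patients = {
    "P1": (0, 5),
    "P2": (3, 85),
    "P3": (1, 40),
    "P4": (2, 35),
}

def disagreements(ml_cutoff):
    """Names of patients where 'qSOFA >= 2' and 'probability > cutoff' differ."""
    return [
        name
        for name, (qsofa, prob) in patients.items()
        if (qsofa >= 2) != (prob > ml_cutoff)
    ]

for cutoff in (30, 50, 70):
    print(f"ML cutoff {cutoff}%: disagreements = {disagreements(cutoff)}")
```

Lowering the cutoff to 30% removes the disagreement on P4 but creates one on P3, so the choice of cutoff shapes which patients look like "conflicts" between the two systems.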

## Part 5: Exploring Model Confidence

One important concept in AI is **confidence** - how certain is the model about its prediction?
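
In this notebook, "confidence" has a specific meaning: `predict_deterioration_ml` returns the larger of the two class probabilities. For a two-class model that is max(p, 1 - p), so it can never fall below 50%, and a probability near 50% is the least confident case. A short worked example using arbitrary probabilities (the helper name `confidence_from_probability` is introduced here for illustration):

```python
def confidence_from_probability(p_deteriorate):
    """Confidence as defined in this notebook: the larger class probability."""
    return max(p_deteriorate, 1 - p_deteriorate)

for p in (0.05, 0.30, 0.50, 0.73, 0.95):
    label = "HIGH RISK" if p > 0.5 else "LOW RISK"
    conf = confidence_from_probability(p)
    print(f"P(deteriorate) = {p:.0%} -> {label}, confidence {conf:.0%}")
```

A low-risk patient at 5% and a high-risk patient at 95% both show 95% confidence, so confidence tells you how far the model is from "undecided", not which way it leans.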

Let's visualise how the model's predicted probability changes as we gradually worsen one vital sign at a time:

In [None]:
# Let's see how the predicted deterioration probability changes as heart rate increases

heart_rates = range(60, 150, 5)
probabilities = []

# Keep other vitals constant
for hr in heart_rates:
    result = predict_deterioration_ml(
        heart_rate=hr,
        respiratory_rate=18,
        systolic_bp=110,
        temperature=37.5,
        oxygen_saturation=95,
        age=60
    )
    probabilities.append(result['probability'])

# Plot the results
plt.figure(figsize=(10, 5))
plt.plot(list(heart_rates), probabilities, 'b-o', linewidth=2, markersize=6)
plt.axhline(y=50, color='r', linestyle='--', label='50% threshold')
plt.axvline(x=100, color='g', linestyle='--', alpha=0.5, label='Normal HR upper limit')
plt.xlabel('Heart Rate (bpm)', fontsize=12)
plt.ylabel('Deterioration Probability (%)', fontsize=12)
plt.title('How Heart Rate Affects ML Deterioration Prediction\n(Other vitals held constant)', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.ylim(0, 100)
plt.show()

print("\n📈 Notice how the probability changes gradually - not a sudden jump at a threshold.")
print("This is different from rule-based systems with hard cutoffs.")

In [None]:
# Now let's look at oxygen saturation (often a critical parameter)

spo2_values = range(80, 101)
probabilities = []

for spo2 in spo2_values:
    result = predict_deterioration_ml(
        heart_rate=85,
        respiratory_rate=18,
        systolic_bp=110,
        temperature=37.2,
        oxygen_saturation=spo2,
        age=60
    )
    probabilities.append(result['probability'])

# Plot the results
plt.figure(figsize=(10, 5))
plt.plot(list(spo2_values), probabilities, 'b-o', linewidth=2, markersize=4)
plt.axhline(y=50, color='r', linestyle='--', label='50% threshold')
plt.axvline(x=92, color='orange', linestyle='--', alpha=0.5, label='Clinical concern (<92%)')
plt.axvline(x=88, color='red', linestyle='--', alpha=0.5, label='Severe hypoxia (<88%)')
plt.xlabel('Oxygen Saturation (%)', fontsize=12)
plt.ylabel('Deterioration Probability (%)', fontsize=12)
plt.title('How SpO2 Affects ML Deterioration Prediction\n(Other vitals held constant)', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.ylim(0, 100)
plt.gca().invert_xaxis()  # Lower SpO2 on right (worse)
plt.show()

print("\n📈 The model learned that low oxygen saturation is a strong predictor of deterioration.")
print("Notice the steep increase in probability below 92%.")

## Part 6: Feature Importance - What Does the Model Think Matters?

ML models can tell us which features (variables) they consider most important for making predictions:

In [None]:
# Get feature importances from the model
importances = model.feature_importances_
feature_importance_df = pd.DataFrame({
    'Feature': feature_columns,
    'Importance': importances
}).sort_values('Importance', ascending=True)

# Plot
plt.figure(figsize=(10, 5))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'], color='steelblue')
plt.xlabel('Importance Score', fontsize=12)
plt.title('What the ML Model Considers Important for Predicting Deterioration', fontsize=14)
plt.tight_layout()
plt.show()

print("\n💡 This shows which vital signs the model 'pays attention to' most.")
print("Higher importance = greater influence on predictions.")
print("\n⚠️  Important: This reflects patterns in the TRAINING DATA.")
print("If the training data was biased, these importances could be misleading!")

## Part 7: Reflection Questions

Take a few minutes to consider these questions. Write your answers in the cell below:

1. **Predictability:** In what situations would you prefer a rule-based system over ML? When would ML be better?

2. **Trust:** The ML model gives a probability (e.g., "73% chance of deterioration"). How would you use this in clinical practice? Would you trust a 60% prediction differently than a 90% prediction?

3. **Transparency:** The rule-based system clearly explains why it made its decision. The ML system just gives a probability. Does this matter? When?

4. **Edge cases:** We saw that qSOFA doesn't consider SpO2 or temperature. The ML model considers everything but we can't easily see how. What are the risks of each approach?

5. **Clinical judgment:** How should AI predictions interact with your clinical experience and intuition?

In [None]:
# ===== YOUR REFLECTION =====
# Double-click this cell and write your thoughts

"""
1. Rule-based vs ML preference:



2. Using probability in practice:



3. Importance of transparency:



4. Risks of each approach:



5. AI and clinical judgment:



"""
print("Reflection saved! ✅")

## 📝 Deliverable

**For your portfolio:**

Write a brief reflection (300 words) comparing what you observed from the rule-based and ML systems. Consider:

- What did the AI get right?
- What concerns do you have?
- How did this change your understanding of clinical AI?

Submit via the LMS by the deadline for Week 1.

## 🏁 Summary

In this exercise, you:

✅ Interacted with a rule-based clinical decision support system (qSOFA)

✅ Trained and used a machine learning model for deterioration prediction

✅ Compared the outputs and saw how they differ

✅ Explored how ML confidence changes with different inputs

✅ Saw what features the ML model considers important

**Key takeaway:** AI systems - whether rule-based or ML - are tools with specific strengths and limitations. Understanding these is essential for safe clinical use.

---

**Next week:** We'll dive deeper into *how* ML systems learn patterns from data.