# Lab 10 - Module 4: Synthesis and Implications

**Time:** ~10-12 minutes

In this final module, you'll synthesize your findings from Modules 1-3 and explore real-world implications.

You've discovered:
- How often AI's confidence matches its actual accuracy
- Which types of prompts lead to overconfidence
- Patterns in AI failures across different categories

Now you'll learn:
- How overconfidence affects real-world AI usage
- Strategies for verifying AI-generated information
- When to trust AI vs. when to be cautious

## Setup: Load Your Complete Analysis

In [None]:
import os
import numpy as np
import pandas as pd
import json
from IPython.display import display, HTML, Markdown

print("✓ Libraries loaded successfully!")

## Google Drive Setup

**IMPORTANT:** This module loads data from your Google Drive (saved in previous modules).

Run the cell below to mount Google Drive. You may need to authorize access again.

In [None]:
from google.colab import drive
print("Mounting Google Drive...")
drive.mount('/content/drive')
LAB_DIR = '/content/drive/MyDrive/DATA1010/Lab10'
print("Google Drive mounted successfully!")
print(f"Lab directory: {LAB_DIR}")

## Load Your Group's Data

In [None]:
group_code = int(input("Enter your group code: "))

# Load complete analysis
try:
    complete_df = pd.read_csv(f"{LAB_DIR}/lab10_group_{group_code}_complete.csv")
    
    with open(f"{LAB_DIR}/lab10_group_{group_code}_summary.json", 'r') as f:
        summary = json.load(f)
    
    print(f"✓ Loaded complete analysis for group {group_code}")
    print()
    print("=" * 60)
    print("YOUR GROUP'S KEY FINDINGS (from Module 3)")
    print("=" * 60)
    print(f"Total prompts tested: {summary['total_prompts']}")
    print(f"Accurate responses: {summary['accurate_count']}/{summary['total_prompts']} ({summary['accuracy_rate']:.1f}%)")
    print(f"Confident responses: {summary['confident_count']}/{summary['total_prompts']}")
    print(f"Overconfident responses: {summary['overconfident_count']} ({summary['overconfidence_rate']:.1f}% of confident)")
    print(f"Well-calibrated responses: {summary['well_calibrated_count']}")
    print()
    print(f"Category with most errors: {summary['worst_category']} ({summary['worst_category_error_rate']:.1f}% error rate)")
    print("=" * 60)
    
except FileNotFoundError:
    # Either the CSV or the JSON summary from Module 3 is missing from LAB_DIR
    print("❌ ERROR: Could not find analysis files")
    print("Please run Module 3 first to generate the complete analysis.")
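
The headline numbers above can be sanity-checked by recomputing them from per-prompt records. The sketch below uses a small made-up table; the column names `confident` and `accurate` are assumptions for illustration and may not match the columns in your `complete_df`. It also assumes the definition implied by the printout above: an overconfident response is one that is confident but not accurate, and the overconfidence rate is a share of confident responses.

```python
import pandas as pd

# Toy per-prompt records; column names are hypothetical, not the lab's actual schema.
toy_df = pd.DataFrame({
    "confident": [True, True, True, False, True, False, True, True],
    "accurate":  [True, False, True, False, True, True, False, True],
})

def summarize(df):
    """Recompute headline counts from boolean 'confident' and 'accurate' columns."""
    total = len(df)
    confident = int(df["confident"].sum())
    accurate = int(df["accurate"].sum())
    # Overconfident: the AI sounded confident but the answer was wrong.
    overconfident = int((df["confident"] & ~df["accurate"]).sum())
    return {
        "total_prompts": total,
        "accurate_count": accurate,
        "accuracy_rate": 100 * accurate / total if total else 0.0,
        "confident_count": confident,
        "overconfident_count": overconfident,
        "overconfidence_rate": 100 * overconfident / confident if confident else 0.0,
    }

toy_summary = summarize(toy_df)
print(toy_summary)
```

If your recomputed values disagree with the saved summary, check which denominator each rate uses before assuming the data is wrong.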

## Real-World Consequences of AI Overconfidence

The patterns you observed in your data have real-world consequences. Below are four documented cases where AI overconfidence led to serious problems.

### Case Study 1: Legal Brief Hallucinations (2023)

**What Happened:**
- A lawyer used ChatGPT to research case law for a legal brief
- The AI confidently cited 6 court cases with specific case numbers and quotations
- All 6 cases were completely fabricated; they never existed
- The lawyer submitted the brief to federal court without verification

**The Outcome:**
- The judge discovered the fabricated citations
- The lawyer faced sanctions and professional embarrassment
- The case became national news

**What Went Wrong:**
- The AI presented hallucinated citations with **high confidence** (no caveats)
- The lawyer trusted the AI without verification
- The fabricated citations looked plausible (correct format, realistic details)

**Connection to Your Data:**
- This is an example of the **"Citation Request"** category from your prompts
- Did your group find overconfidence in citation-related prompts?
- How often did the AI fabricate sources in your tests?
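
One way to approach these questions is to break errors and overconfident responses down by prompt category. The sketch below uses made-up rows, not real lab results; the column names `category`, `confident`, and `accurate` are assumptions and may differ from the columns in your group's CSV.

```python
import pandas as pd

# Made-up rows for illustration only; not real lab results.
toy_df = pd.DataFrame({
    "category":  ["Citation Request", "Citation Request", "Factual Recall",
                  "Factual Recall", "Mathematical", "Recent Events"],
    "confident": [True, True, True, False, True, True],
    "accurate":  [False, False, True, True, True, False],
})

# Flag wrong answers, and wrong answers that were delivered confidently.
toy_df["error"] = ~toy_df["accurate"]
toy_df["overconfident"] = toy_df["confident"] & toy_df["error"]

by_category = (
    toy_df.groupby("category")
    .agg(prompts=("accurate", "size"),
         errors=("error", "sum"),
         overconfident=("overconfident", "sum"))
    .sort_values("overconfident", ascending=False)
)
print(by_category)
```

With only a handful of prompts per category, treat differences between categories as hints to investigate rather than firm conclusions.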

### Case Study 2: Medical Misinformation

**What Happened:**
- A patient used an AI chatbot to research symptoms and treatment options
- The AI confidently recommended a specific medication dosage
- The dosage was incorrect and potentially dangerous
- The AI did not express uncertainty or recommend consulting a doctor

**The Outcome:**
- The patient followed the AI's advice initially
- Fortunately, a pharmacist caught the error before harm occurred
- The incident highlighted risks of AI in healthcare contexts

**What Went Wrong:**
- The AI was overconfident in a domain requiring extreme accuracy
- No disclaimers were prominent enough to prevent misuse
- The user assumed confidence indicated medical expertise

**Connection to Your Data:**
- This relates to **"Factual Recall"** and **"Mathematical"** categories
- Did the AI in your tests express uncertainty for health-related or numerical prompts?
- Should AI always express strong caveats in medical contexts, even when confident?

### Case Study 3: Academic Paper Fabrication

**What Happened:**
- A student asked an AI to summarize research on a scientific topic
- The AI cited 12 academic papers with authors, titles, and journals
- The student included these citations in their paper
- The professor discovered that 8 of the 12 papers didn't exist

**The Outcome:**
- The student received a failing grade for academic dishonesty
- The student claimed they didn't know AI could fabricate citations
- The incident sparked policy discussions about AI use in coursework

**What Went Wrong:**
- The AI generated plausible-sounding but fake academic references
- The student didn't verify sources (assumed AI was accurate)
- The fabrications were detailed enough to seem legitimate

**Connection to Your Data:**
- Another **"Citation Request"** failure mode
- Did your group's AI hallucinate any sources?
- How would you verify academic citations from an AI?

### Case Study 4: News Article Factual Errors

**What Happened:**
- A journalist used an AI to check facts about a historical event
- The AI confidently provided dates, names, and statistics
- Some details were accurate, but several key facts were wrong
- The journalist published the article with the errors

**The Outcome:**
- Readers noticed the factual errors and complained
- The publication had to issue a correction
- The journalist's credibility was damaged

**What Went Wrong:**
- The AI mixed accurate and inaccurate information seamlessly
- The **partial accuracy** made the errors harder to detect
- The confident tone suggested thorough fact-checking had occurred

**Connection to Your Data:**
- This relates to **"Factual Recall"** and **"Recent Events"** categories
- Did you find cases where AI was "mostly accurate" but had critical errors?
- How do you detect errors when most of the response is correct?

## When to Trust AI: A Framework

Based on research and your experimental data, here's a framework for deciding when AI is reliable.

### ✅ When AI Is Generally Reliable

AI systems tend to be accurate when:

1. **Well-established facts with broad agreement**
   - Example: "What is the capital of France?"
   - The answer appears millions of times in training data

2. **Simple mathematical calculations**
   - Example: "What is 25% of 80?"
   - Straightforward, verifiable, no ambiguity

3. **Commonsense reasoning**
   - Example: "Why do we refrigerate milk?"
   - Basic knowledge with clear, consistent answers

4. **General explanations of concepts**
   - Example: "Explain photosynthesis"
   - Well-documented, stable knowledge

**But even for these, verification is wise for high-stakes decisions.**

### ⚠️ When to Use Extra Caution

AI systems are prone to errors when:

1. **Specific citations or sources are requested**
   - AI often fabricates paper titles, authors, or case law
   - Always verify citations independently

2. **Recent events or current information**
   - AI training has a cutoff date (often months or years old)
   - Without a search or browsing tool, the model cannot access real-time information

3. **Numerical data or statistics**
   - AI may confidently state wrong numbers
   - Always check statistics against primary sources

4. **Ambiguous or underspecified questions**
   - AI may guess what you meant and answer the wrong question
   - Be specific in your prompts

5. **Niche or specialized domains**
   - Less training data = higher error rates
   - Domain expertise is essential for verification

6. **Multi-step reasoning chains**
   - AI can make logical errors in complex reasoning
   - Check each step of the logic

7. **High-stakes decisions** (medical, legal, financial)
   - Never rely solely on AI
   - Always consult qualified professionals

### 🔍 Verification Strategies

How to verify AI-generated information:

**For Factual Claims:**
1. Search Google for authoritative sources (government sites, academic institutions)
2. Check Wikipedia (but verify with additional sources)
3. Look for consensus across multiple independent sources
4. Prefer primary sources over AI summaries

**For Citations:**
1. Search for the exact paper/book title in quotes
2. Use Google Scholar for academic papers
3. Verify authors exist and work in the relevant field
4. Check if the journal or publisher is legitimate
5. If you can't find it anywhere, assume it's fabricated
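
Steps 1 and 2 go faster with a small helper that builds exact-phrase search links. This is a sketch only: the function name `citation_search_urls` and the example title are made up, and the URL patterns are assumed from the public Google and Google Scholar search pages and may change. The helper only builds links; you still need to open them and judge the results yourself.

```python
from urllib.parse import urlencode

def citation_search_urls(title):
    """Build exact-phrase search links for a citation title (hypothetical helper)."""
    quoted = f'"{title}"'  # quotes ask the search engine for an exact match
    return {
        "google": "https://www.google.com/search?" + urlencode({"q": quoted}),
        "google_scholar": "https://scholar.google.com/scholar?" + urlencode({"q": quoted}),
    }

# Made-up title for illustration; substitute a title the AI gave you.
urls = citation_search_urls("An Example Paper Title")
for name, url in urls.items():
    print(f"{name}: {url}")
```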

**For Numerical Data:**
1. Find the original data source (government statistics, research papers)
2. Recalculate if possible
3. Check units and magnitudes for reasonableness
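
Steps 2 and 3 can be sketched in a few lines: recompute the value yourself and compare it with what the AI claimed. The helper name `check_claim` and the 1% tolerance are arbitrary choices for illustration, and the "claimed" values are made up.

```python
import math

def check_claim(claimed, recomputed, rel_tol=0.01):
    """Return True if the claimed value is within rel_tol of the recomputed one."""
    return math.isclose(claimed, recomputed, rel_tol=rel_tol)

# Recalculate "25% of 80" instead of trusting a stated answer.
recomputed = 0.25 * 80

print(check_claim(20, recomputed))    # claimed value agrees with the recalculation
print(check_claim(200, recomputed))   # off by a factor of 10: a magnitude error
```

A tolerance is useful because published statistics are often rounded; a mismatch far beyond rounding suggests a wrong number or a unit mix-up.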

**For Reasoning:**
1. Work through the logic yourself
2. Test the conclusion with simple examples
3. Check for common logical fallacies

**For Recent Events:**
1. Check news sources directly
2. Note the AI's training cutoff date
3. Be skeptical of specific recent claims

## Reflection Questions

Answer these questions on your lab handout to synthesize your learning.

### Q19: AI Failure Patterns

Based on your data from Modules 1-3 and the case studies above, complete this sentence:

**"AI models are most likely to fail when..."**

List at least 3 specific conditions or prompt types that led to errors in your testing.

*(Answer on your handout)*

### Q20: Verification Strategies

You're writing a research paper and used AI to find sources on climate change impacts.

The AI provided 5 citations to scientific papers. What specific steps would you take to verify these citations before including them in your paper?

List at least 3 concrete verification steps.

*(Answer on your handout)*

### Q21: Design Trade-offs

Consider this dilemma:

**Option A:** AI always expresses strong uncertainty and caveats, even when it's very likely to be correct.
- Result: Users lose trust and find AI less helpful

**Option B:** AI sounds confident to be helpful, even when accuracy is uncertain.
- Result: Users may be misled by overconfident errors

Which option is better? Or is there a middle ground? Explain your reasoning.

*(Answer on your handout)*

### Q22: Personal AI Use

How will this lab change the way you use AI tools (ChatGPT, Claude, Gemini, etc.) in the future?

Describe at least 2 specific changes you'll make to how you:
- Ask questions
- Interpret responses
- Verify information

*(Answer on your handout)*

### Q23: Explaining to Others

Imagine a friend says: *"I don't understand why AI can't just tell us when it's going to make a mistake. Can't it check its own answers?"*

Using what you learned in this lab, write a 3-4 sentence explanation of why AI cannot reliably predict its own errors.

Use evidence from your group's data if possible.

*(Answer on your handout)*

## Summary: Key Takeaways from Lab 10

You've completed an investigation into AI self-assessment and discovered several critical insights:

### 🎯 Core Findings

1. **AI Confidence ≠ AI Accuracy**
   - AI can sound very confident while being completely wrong
   - Tone and caveats are unreliable indicators of correctness

2. **Systematic Weaknesses Exist**
   - Citation requests often trigger hallucinations
   - Ambiguous questions lead to confident but potentially wrong answers
   - Recent events exceed training data boundaries

3. **Overconfidence is Common**
   - In your data, you likely found cases where AI was confident but wrong
   - This pattern appears across different models and contexts

4. **AI Cannot Reliably Self-Assess**
   - AI cannot consistently predict when it will make mistakes
   - Meta-predictions about AI behavior are themselves unreliable

### ✅ Practical Guidelines for AI Use

**Always Verify When:**
- Stakes are high (academic, professional, medical, legal)
- Specific sources or citations are provided
- Numerical data or statistics are given
- Information relates to recent events

**Trust But Verify:**
- Use multiple independent sources
- Check original sources, not summaries
- Apply domain knowledge to spot implausible claims

**Use AI Effectively:**
- Great for brainstorming and initial research
- Excellent for explaining concepts
- Useful for generating ideas and drafts
- **But always verify critical information**

### 🔬 Connection to Data Science

This lab connects to broader themes in DATA 1010:

- **Model Evaluation:** AI systems are evaluated on accuracy, but calibration (confidence matching accuracy) is equally important
- **Uncertainty Quantification:** Good models should express appropriate uncertainty
- **Human-AI Collaboration:** AI is a tool that requires human judgment and verification
- **Responsible AI:** Understanding limitations is essential for ethical deployment
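
Calibration can be made concrete by grouping responses by stated confidence and comparing each group's average confidence with its observed accuracy. The sketch below uses made-up numbers; the numeric `confidence` scores (0 to 1), the 0.75 bucket boundary, and the column names are all assumptions, since this lab recorded confidence in its own format.

```python
import pandas as pd

# Made-up data: stated confidence (0-1) and whether the answer was correct.
toy_df = pd.DataFrame({
    "confidence": [0.95, 0.90, 0.92, 0.97, 0.60, 0.55, 0.65, 0.58],
    "correct":    [1,    0,    1,    0,    1,    0,    1,    0],
})

# Bucket responses into low and high stated confidence.
toy_df["bucket"] = pd.cut(toy_df["confidence"], bins=[0.0, 0.75, 1.0],
                          labels=["low", "high"])

calibration = toy_df.groupby("bucket", observed=True).agg(
    mean_confidence=("confidence", "mean"),
    accuracy=("correct", "mean"),
)
# Positive gap: the model sounds more sure than its accuracy justifies.
calibration["gap"] = calibration["mean_confidence"] - calibration["accuracy"]
print(calibration.round(3))
```

A well-calibrated model would show gaps near zero in every bucket; in this toy data the high-confidence bucket has the larger gap, which is the overconfidence pattern this lab investigates.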

### 🌍 Real-World Impact

As AI becomes more prevalent:
- Critical thinking and verification skills become more valuable
- Understanding AI limitations protects against misinformation
- Responsible AI use requires active engagement, not passive acceptance

**You now have the tools to use AI effectively and responsibly.**

## Congratulations! 🎉

You've completed Lab 10: AI Self-Assessment and the Hallucination Boundary.

### What You've Accomplished:

✓ Generated and tested 8 diverse prompts on a real AI system

✓ Recorded AI confidence levels and made accuracy predictions

✓ Verified actual accuracy through independent research

✓ Visualized patterns of overconfidence in your data

✓ Connected findings to real-world AI failures

✓ Developed strategies for responsible AI use

### Final Checklist:

- [ ] Completed all questions (Q1-Q23) on your handout
- [ ] Saved all CSV files from Modules 1-3
- [ ] Reviewed your group's summary statistics
- [ ] Compared findings with other groups (if time permits)
- [ ] Reflected on how this changes your AI usage

### Remember:

**AI is a powerful tool, but it requires human judgment, verification, and critical thinking to use effectively.**

Thank you for your careful work in this lab!