# Lab 10 - Module 4: Synthesis and Implications

**Time:** ~10-12 minutes

In this final module, you'll synthesize your findings from Modules 1-3 and explore real-world implications.

You've discovered:
- How often AI's confidence matches its actual accuracy
- Which types of prompts lead to overconfidence
- Patterns in AI failures across different categories

Now you'll learn:
- How overconfidence affects real-world AI usage
- Strategies for verifying AI-generated information
- When to trust AI vs. when to be cautious

## Setup: Load Your Complete Analysis

In [None]:
import os
import numpy as np
import pandas as pd
import json
from IPython.display import display, HTML, Markdown

print("✓ Libraries loaded successfully!")

## Optional: Regenerate Prompts

If you want to review your original prompts, you can regenerate them here using your group code.
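
Regeneration works because the prompt picks come from NumPy's global random generator, which is seeded with the group code: the same seed reproduces the same sequence of draws. The sketch below illustrates only that idea. The helper `draw_indices` and the seed `1234` are hypothetical, and the pool size and number of draws are assumed to match the 8 pools of 10 prompts used in this lab.

```python
import numpy as np

def draw_indices(seed, pool_size=10, num_draws=8):
    """Seed NumPy's global RNG, then draw one index per (hypothetical) category."""
    np.random.seed(seed)
    return [int(np.random.randint(0, pool_size)) for _ in range(num_draws)]

first = draw_indices(1234)   # 1234 is a made-up group code
second = draw_indices(1234)
print(first == second)  # True: same seed, same picks
```

A different group code will generally produce a different set of picks, which is why each group needs its own code to recover its prompts.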

In [None]:
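def generate_group_prompts(group_code, num_prompts=8):
    """Generate deterministic prompts for a group: one prompt from each category pool.

    Note: num_prompts is currently unused; the result has one prompt per pool.
    """
    import numpy as np
    # Seeding with the group code makes the random picks below repeatable
    np.random.seed(group_code)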

    # Prompt pools (10 per category)
    pools = {
        'factual_recall': [
            "List the 5 largest freshwater lakes in Africa by volume.",
            "What was the population of Iceland in 1950?",
            "Name all countries that border Mongolia.",
            "Who was the Prime Minister of Canada in 1985?",
            "What is the capital city of Bhutan?",
            "List all chemical elements discovered before 1800.",
            "What year was the first transatlantic telegraph cable completed?",
            "Name the five longest rivers in South America by length.",
            "Who won the Nobel Prize in Literature in 1962?",
            "What is the elevation of Mount Kilimanjaro in meters?"
        ],
        'reasoning_chain': [
            "If it takes 5 machines 5 minutes to make 5 widgets, how long does it take 100 machines to make 100 widgets?",
            "A bat and ball cost $1.10 total. The bat costs $1.00 more than the ball. How much does the ball cost?",
            "If you overtake the person in second place in a race, what place are you in?",
            "A farmer has 17 sheep. All but 9 die. How many sheep remain?",
            "You have a 3-gallon jug and a 5-gallon jug. How can you measure exactly 4 gallons?",
            "If a plane crashes on the border of the US and Canada, where do you bury the survivors?",
            "A rope ladder hangs over the side of a ship. The rungs are 1 foot apart. At low tide, 10 rungs are below water. When the tide rises 5 feet, how many rungs are below water?",
            "Mary's father has 5 daughters: Nana, Nene, Nini, Nono. What is the 5th daughter's name?",
            "A book costs $10 plus half its price. What's the total price?",
            "If a doctor gives you 3 pills and says to take one every half hour, how long until you've taken all of them?"
        ],
        'citation_request': [
            "List 3 peer-reviewed studies published since 2020 about standing desks and productivity, including authors and journal names.",
            "Provide 2 court cases from 2023 about cryptocurrency regulation in Texas, including case numbers.",
            "Cite 3 research papers on the health effects of intermittent fasting published in Nature or Science since 2022.",
            "List some quotes from Shakespeare plays that mention the word 'computer'.",
            "Provide citations for 3 studies showing that coffee consumption prevents Alzheimer's disease, published after 2020.",
            "Give me 2 New York Times articles from March 2024 about artificial intelligence regulation.",
            "Cite 3 peer-reviewed sources on the effectiveness of homeopathy from major medical journals.",
            "List 2 Supreme Court cases from 2024 about social media content moderation.",
            "Provide 3 academic sources confirming that humans only use 10% of their brains.",
            "Cite research papers from Harvard Medical School published in 2023 about vitamin C curing cancer."
        ],
        'ambiguous_query': [
            "Is it safe to eat?",
            "How long does it take?",
            "What's the best programming language?",
            "Should I invest in that?",
            "Is this normal?",
            "Will it rain tomorrow?",
            "What's the right temperature?",
            "How much should I pay?",
            "Is this good enough?",
            "When should I do it?"
        ],
        'recent_events': [
            "Who won the Nobel Prize in Physics this year?",
            "What were today's closing stock prices for Apple and Microsoft?",
            "What is the current COVID-19 vaccination rate in Japan?",
            "Who won yesterday's football game?",
            "What was the outcome of last week's election?",
            "What is the current inflation rate in the United States?",
            "Who is the current Secretary-General of the United Nations?",
            "What are the latest updates on the Mars rover mission?",
            "What movies are currently in theaters?",
            "What is today's temperature in Paris?"
        ],
        'mathematical': [
            "What is 17 × 23 × 19?",
            "If I invest $1,000 at 7% annual compound interest for 15 years, how much will I have?",
            "Convert 47°C to Fahrenheit.",
            "What is the area of a circle with radius 8.5 cm?",
            "Calculate 15% of 840.",
            "If a car travels 285 miles using 12 gallons of gas, what is the miles per gallon?",
            "What is the square root of 2,704?",
            "Convert 5.5 kilometers to miles (1 km = 0.621371 miles).",
            "What is 2^10?",
            "Calculate the sum of all integers from 1 to 100."
        ],
        'commonsense': [
            "What safety precautions should you take when using a ladder?",
            "Why do we refrigerate milk?",
            "How can you tell if water is boiling?",
            "What should you do if you smell gas in your house?",
            "Why is it important to wash your hands before eating?",
            "What are the signs that a banana is ripe?",
            "How do you know when it's safe to cross the street?",
            "Why do cars have seat belts?",
            "What should you do if you see smoke coming from a building?",
            "How can you tell if an egg is fresh?"
        ],
        'edge_case': [
            "How many months have 28 days?",
            "A doctor has a brother, but the brother has no brothers. How is this possible?",
            "How many animals did Moses take on the ark?",
            "What do you call a person who keeps talking when nobody is listening?",
            "If you have a bowl with 6 apples and you take away 4, how many do you have?",
            "What occurs once in a minute, twice in a moment, but never in a thousand years?",
            "How much dirt is in a hole that's 2 feet wide, 3 feet long, and 4 feet deep?",
            "Before Mount Everest was discovered, what was the highest mountain in the world?",
            "Is it legal for a man to marry his widow's sister?",
            "What word is always spelled incorrectly?"
        ]
    }

    prompts = []
    for i, (category, pool) in enumerate(pools.items()):
        idx = np.random.randint(0, len(pool))
        prompts.append({
            'prompt_id': i + 1,
            'category': category,
            'category_display': category.replace('_', ' ').title(),
            'prompt_text': pool[idx]
        })

    return prompts

# Optional: Regenerate prompts if you want to review them
regenerate = input("Do you want to see your original prompts again? (yes/no): ")
if regenerate.strip().lower() in ('yes', 'y'):
    group_code = int(input("Enter your group code: "))
    prompts = generate_group_prompts(group_code)
    prompts_df = pd.DataFrame(prompts)
    print(f"\n✓ Regenerated {len(prompts_df)} prompts for group {group_code}")
    display(prompts_df[['prompt_id', 'category_display', 'prompt_text']])
else:
    print("Proceeding with synthesis questions...")

print("\n" + "="*70)
print("📋 Answer all synthesis questions on your Lab 10 Answer Sheet.")
print("="*70)

### Case Study 1: Legal Brief Hallucinations (2023)

**What Happened:**
- A lawyer used ChatGPT to research case law for a legal brief
- The AI confidently cited 6 court cases with specific case numbers and quotations
- All 6 cases were completely fabricated; they never existed
- The lawyer submitted the brief to federal court without verification

**The Outcome:**
- The judge discovered the fabricated citations
- The lawyer faced sanctions and professional embarrassment
- The case became national news

**What Went Wrong:**
- The AI presented hallucinated citations with **high confidence** (no caveats)
- The lawyer trusted the AI without verification
- The fabricated citations looked plausible (correct format, realistic details)

**Connection to Your Data:**
- This is an example of the **"Citation Request"** category from your prompts
- Did your group find overconfidence in citation-related prompts?
- How often did the AI fabricate sources in your tests?
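
One way to answer these questions from your own data is to compare the AI's average stated confidence with its actual accuracy in each category. The sketch below assumes your results can be written as rows with `category`, `confidence` (0-100), and `correct` fields; those field names and the sample rows are hypothetical placeholders, not real lab results, and your Module 1-3 data may be organized differently.

```python
from collections import defaultdict

# Hypothetical placeholder rows for illustration only; substitute your group's results.
# "confidence" is the AI's stated confidence (0-100); "correct" is your own verdict.
results = [
    {"category": "citation_request", "confidence": 90, "correct": False},
    {"category": "citation_request", "confidence": 80, "correct": False},
    {"category": "mathematical", "confidence": 95, "correct": True},
    {"category": "mathematical", "confidence": 85, "correct": True},
]

def overconfidence_gap(rows):
    """Return {category: mean confidence minus accuracy}, in percentage points."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["category"]].append(row)
    gaps = {}
    for category, items in grouped.items():
        mean_confidence = sum(r["confidence"] for r in items) / len(items)
        accuracy = 100 * sum(r["correct"] for r in items) / len(items)
        gaps[category] = mean_confidence - accuracy
    return gaps

for category, gap in overconfidence_gap(results).items():
    print(f"{category}: gap = {gap:+.1f} points")
```

A large positive gap means the AI sounded more certain than its accuracy justified; a gap near zero or below suggests it was well calibrated or underconfident in that category.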

### Case Study 2: Medical Misinformation

**What Happened:**
- A patient used an AI chatbot to research symptoms and treatment options
- The AI confidently recommended a specific medication dosage
- The dosage was incorrect and potentially dangerous
- The AI did not express uncertainty or recommend consulting a doctor

**The Outcome:**
- The patient followed the AI's advice initially
- Fortunately, a pharmacist caught the error before harm occurred
- The incident highlighted risks of AI in healthcare contexts

**What Went Wrong:**
- The AI was overconfident in a domain requiring extreme accuracy
- No disclaimers were prominent enough to prevent misuse
- The user assumed confidence indicated medical expertise

**Connection to Your Data:**
- This relates to **"Factual Recall"** and **"Mathematical"** categories
- Did the AI in your tests express uncertainty for health-related or numerical prompts?
- Should AI always express strong caveats in medical contexts, even when confident?

### Case Study 3: Academic Paper Fabrication

**What Happened:**
- A student asked an AI to summarize research on a scientific topic
- The AI cited 12 academic papers with authors, titles, and journals
- The student included these citations in their paper
- The professor discovered that 8 of the 12 papers didn't exist

**The Outcome:**
- The student received a failing grade for academic dishonesty
- The student claimed they didn't know AI could fabricate citations
- The incident sparked policy discussions about AI use in coursework

**What Went Wrong:**
- The AI generated plausible-sounding but fake academic references
- The student didn't verify sources (assumed AI was accurate)
- The fabrications were detailed enough to seem legitimate

**Connection to Your Data:**
- Another **"Citation Request"** failure mode
- Did your group's AI hallucinate any sources?
- How would you verify academic citations from an AI?

### Case Study 4: News Article Fabrication

**What Happened:**
- A journalist used an AI to check facts about a historical event
- The AI confidently provided dates, names, and statistics
- Some details were accurate, but several key facts were wrong
- The journalist published the article with the errors

**The Outcome:**
- Readers noticed the factual errors and complained
- The publication had to issue a correction
- The journalist's credibility was damaged

**What Went Wrong:**
- The AI mixed accurate and inaccurate information seamlessly
- The **partial accuracy** made the errors harder to detect
- The confident tone suggested thorough fact-checking had occurred

**Connection to Your Data:**
- This relates to **"Factual Recall"** and **"Recent Events"** categories
- Did you find cases where AI was "mostly accurate" but had critical errors?
- How do you detect errors when most of the response is correct?

## When to Trust AI: A Framework

Based on research and your experimental data, here's a framework for deciding when AI is reliable.

### ✅ When AI Is Generally Reliable

AI systems tend to be accurate when:

1. **Well-established facts with broad agreement**
   - Example: "What is the capital of France?"
   - The answer appears very frequently in training data

2. **Simple mathematical calculations**
   - Example: "What is 25% of 80?"
   - Straightforward, verifiable, no ambiguity

3. **Commonsense reasoning**
   - Example: "Why do we refrigerate milk?"
   - Basic knowledge with clear, consistent answers

4. **General explanations of concepts**
   - Example: "Explain photosynthesis"
   - Well-documented, stable knowledge

**But even for these, verification is wise for high-stakes decisions.**

### ⚠️ When to Use Extra Caution

AI systems are prone to errors when:

1. **Specific citations or sources are requested**
   - AI often fabricates paper titles, authors, or case law
   - Always verify citations independently

2. **Recent events or current information**
   - AI training has a cutoff date (often months or years in the past)
   - Without live search or other tools, it cannot access real-time information

3. **Numerical data or statistics**
   - AI may confidently state wrong numbers
   - Always check statistics against primary sources

4. **Ambiguous or underspecified questions**
   - AI may guess what you meant and answer the wrong question
   - Be specific in your prompts

5. **Niche or specialized domains**
   - Less training data generally means higher error rates
   - Domain expertise is essential for verification

6. **Multi-step reasoning chains**
   - AI can make logical errors in complex reasoning
   - Check each step of the logic

7. **High-stakes decisions** (medical, legal, financial)
   - Never rely solely on AI
   - Always consult qualified professionals

### 🔍 Verification Strategies

How to verify AI-generated information:

**For Factual Claims:**
1. Search Google for authoritative sources (government sites, academic institutions)
2. Check Wikipedia (but verify with additional sources)
3. Look for consensus across multiple independent sources
4. Prefer primary sources over AI summaries

**For Citations:**
1. Search for the exact paper/book title in quotes
2. Use Google Scholar for academic papers
3. Verify authors exist and work in the relevant field
4. Check if the journal or publisher is legitimate
5. If you can't find it anywhere, assume it's fabricated
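
The exact-title search in steps 1 and 2 can be scripted. This is a minimal sketch, assuming Google Scholar accepts a query through the `q` parameter of its `/scholar` page; the helper name and the sample title are hypothetical. It only builds the search link and does not contact any website, so you still need to open the link and judge the results yourself.

```python
from urllib.parse import quote_plus

def scholar_search_url(title):
    """Build a search URL that looks for the exact title as a quoted phrase."""
    exact_phrase = f'"{title}"'  # quotes ask the search engine for an exact match
    return "https://scholar.google.com/scholar?q=" + quote_plus(exact_phrase)

# Hypothetical title, standing in for one the AI gave you
print(scholar_search_url("Standing desks and productivity"))
```

If the exact-phrase search returns nothing, try the authors and journal separately before concluding that the citation is fabricated.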

**For Numerical Data:**
1. Find the original data source (government statistics, research papers)
2. Recalculate if possible
3. Check units and magnitudes for reasonableness
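
Recalculating is often only a few lines of code. The sketch below recomputes several prompts from this lab's mathematical pool and compares each result with a claimed answer. The helper `claim_matches` and the claimed values are hypothetical, chosen only to show one matching and one mismatching case.

```python
import math

def claim_matches(claimed, recomputed, rel_tol=1e-6):
    """Return True if the claimed number agrees with our own calculation."""
    return math.isclose(claimed, recomputed, rel_tol=rel_tol)

# Recompute a few prompts from the 'mathematical' pool independently.
product = 17 * 23 * 19
compound = 1000 * (1 + 0.07) ** 15   # $1,000 at 7%, compounded annually for 15 years
fahrenheit = 47 * 9 / 5 + 32         # 47 degrees Celsius in Fahrenheit
total = sum(range(1, 101))           # 1 + 2 + ... + 100

# Hypothetical AI claims, made up for illustration.
print(claim_matches(7429, product))      # True
print(claim_matches(7439, product))      # False: off by 10
print(claim_matches(116.6, fahrenheit))  # True
print(claim_matches(5050, total))        # True
print(round(compound, 2))                # compare with the AI's dollar figure
```

The tolerance matters: a tight tolerance suits exact arithmetic, while rounded figures such as dollar amounts are better compared after rounding both sides the same way.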

**For Reasoning:**
1. Work through the logic yourself
2. Test the conclusion with simple examples
3. Check for common logical fallacies
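
Testing a conclusion can be as simple as plugging the proposed answer back into the problem. The sketch below does this for the bat-and-ball prompt from the reasoning pool, working in whole cents to avoid rounding issues. The function name is hypothetical.

```python
def check_bat_and_ball(ball_cents, total_cents=110, difference_cents=100):
    """Return True if the proposed ball price satisfies both conditions of the puzzle."""
    bat_cents = ball_cents + difference_cents  # the bat costs $1.00 more than the ball
    return bat_cents + ball_cents == total_cents

print(check_bat_and_ball(10))  # False: the bat would be $1.10, so the total is $1.20
print(check_bat_and_ball(5))   # True: $1.05 + $0.05 = $1.10

# Brute force over every whole-cent price to confirm the answer is unique.
solutions = [c for c in range(0, 111) if check_bat_and_ball(c)]
print(solutions)  # [5]
```

The intuitive answer of 10 cents fails the check, which is exactly the kind of error that a confident-sounding response can hide.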

**For Recent Events:**
1. Check news sources directly
2. Note the AI's training cutoff date
3. Be skeptical of specific recent claims

## Reflection Questions

Answer these questions on your Lab 10 Answer Sheet to synthesize your learning.

### Q19: AI Failure Patterns

Based on your data from Modules 1-3 and the case studies above, complete this sentence:

**"AI models are most likely to fail when..."**

List at least 3 specific conditions or prompt types that led to errors in your testing.

**Answer this question on your Lab 10 Answer Sheet.**

### Q21: Design Trade-offs

Consider this dilemma:

**Option A:** AI always expresses strong uncertainty and caveats, even when it's very likely to be correct.
- Result: Users lose trust and find AI less helpful

**Option B:** AI sounds confident to be helpful, even when accuracy is uncertain.
- Result: Users may be misled by overconfident errors

Which option is better? Or is there a middle ground? Explain your reasoning.

**Answer this question on your Lab 10 Answer Sheet.**

### Q22: Personal AI Use

How will this lab change the way you use AI tools (ChatGPT, Claude, Gemini, etc.) in the future?

Describe at least 2 specific changes you'll make to how you:
- Ask questions
- Interpret responses
- Verify information

**Answer this question on your Lab 10 Answer Sheet.**

### Q23: Explaining to Others

Imagine a friend says: *"I don't understand why AI can't just tell us when it's going to make a mistake. Can't it check its own answers?"*

Using what you learned in this lab, write a 3-4 sentence explanation of why AI cannot reliably predict its own errors.

Use evidence from your group's data if possible.

**Answer this question on your Lab 10 Answer Sheet.**