# Lab 11 - Module 3: Revise & Compare (Stage 3)

**Time:** ~20-25 minutes

## Stage 3: Revision and Improvement

In Module 2, you identified the 2 weakest criteria in the AI's output. Now you'll test whether AI can **improve when given explicit feedback**.

### Learning Objectives

- Use rubric feedback to guide AI revision
- Evaluate whether explicit criteria help AI improve
- Identify trade-offs in revision (gains vs. new problems)
- Compare original vs. revised quality systematically
- Understand AI as a revision tool (not an authority)

### What You'll Do

1. Review the 2 weakest criteria from Module 2
2. Create a targeted revision prompt
3. Get the AI to revise its work
4. Score the revised version using the same rubric
5. Compare: Did it improve? Where? Did new problems appear?
6. Make an overall judgment: original or revised?

## Setup: Import Libraries

In [None]:
import numpy as np
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML, Markdown

print("✓ Libraries loaded!")

## Step 1: Regenerate Your Scenario

Enter your group code to regenerate your unique scenario and review your original prompt.

In [None]:
# Prompt generation system
def generate_group_scenario(group_code):
    """
    Generates a deterministic creative prompt based on group code.
    8 prompt families × 5-6 variants each = unique scenarios for each group.
    """
    np.random.seed(group_code)
    
    # 8 Prompt Families
    families = [
        'Science Explainer',
        'Public Service Announcement',
        'Museum Exhibit Panel',
        'Product Pitch/Campaign',
        'Infographic Structure',
        'Short Narrative',
        'Educational Analogy',
        'Debate Position Statement'
    ]
    
    # Select family (deterministic based on group code)
    family_idx = group_code % 8
    family = families[family_idx]
    
    # Rubric mapping
    rubric_map = {
        'Science Explainer': 'General Communication',
        'Public Service Announcement': 'Persuasion/Campaign',
        'Museum Exhibit Panel': 'General Communication',
        'Product Pitch/Campaign': 'Persuasion/Campaign',
        'Infographic Structure': 'General Communication',
        'Short Narrative': 'Creative/Narrative',
        'Educational Analogy': 'General Communication',
        'Debate Position Statement': 'Persuasion/Campaign'
    }
    
    # Family-specific prompt generation
    if family == 'Science Explainer':
        topics = ['quantum entanglement', 'CRISPR gene editing', 'dark matter', 
                  'neural plasticity', 'photosynthesis']
        audiences = ['middle school students', 'first-year college students', 'general public']
        tones = ['serious and formal', 'playful and engaging', 'inspiring and aspirational']
        
        topic = np.random.choice(topics)
        audience = np.random.choice(audiences)
        tone = np.random.choice(tones)
        
        prompt = f"""Write a 300-word explanation of {topic} for {audience}.

Requirements:
- Use a {tone} tone
- Avoid unnecessary jargon
- Include at least one concrete example or analogy
- Make it engaging and accessible"""
        
        variants = {'topic': topic, 'audience': audience, 'tone': tone}
    
    elif family == 'Public Service Announcement':
        issues = ['water conservation', 'mental health awareness', 'digital privacy', 
                  'food waste reduction', 'voter registration']
        audiences = ['young adults (18-25)', 'seniors (65+)', 'parents of young children', 'general public']
        constraints = ['include a specific statistic', 'include a clear call to action', 
                       'address a common misconception']
        urgencies = ['moderate', 'high']
        
        issue = np.random.choice(issues)
        audience = np.random.choice(audiences)
        constraint = np.random.choice(constraints)
        urgency = np.random.choice(urgencies)
        
        prompt = f"""Create a Public Service Announcement about {issue} targeting {audience}.

Requirements:
- Urgency level: {urgency}
- Length: 250-350 words
- Must {constraint}
- Be persuasive but not preachy"""
        
        variants = {'issue': issue, 'audience': audience, 'urgency': urgency, 'constraint': constraint}
    
    elif family == 'Museum Exhibit Panel':
        topics = ['Apollo 11 moon landing', 'invention of the printing press', 
                  'discovery of DNA structure', 'fall of the Berlin Wall', 
                  'development of the internet']
        audience_levels = ['general public', 'high school students', 'history enthusiasts']
        tones = ['formal and academic', 'accessible and engaging', 'narrative storytelling']
        
        topic = np.random.choice(topics)
        audience = np.random.choice(audience_levels)
        tone = np.random.choice(tones)
        
        prompt = f"""Write a museum exhibit panel about the {topic} for {audience}.

Requirements:
- Length: 300-400 words
- Tone: {tone}
- Include historical context and significance
- Make it informative yet engaging"""
        
        variants = {'topic': topic, 'audience': audience, 'tone': tone}
    
    elif family == 'Product Pitch/Campaign':
        products = ['reusable water bottle with built-in filter', 'productivity app for students',
                    'sustainable fashion clothing line', 'meal-prep kit service', 
                    'noise-canceling headphones for remote work']
        markets = ['college students', 'busy professionals', 'environmentally conscious consumers', 
                   'health-focused individuals']
        benefits = ['environmental impact', 'cost savings', 'convenience', 'health benefits', 'quality/durability']
        tones = ['aspirational and lifestyle-focused', 'practical and problem-solving', 
                 'humorous and memorable']
        
        product = np.random.choice(products)
        market = np.random.choice(markets)
        benefit = np.random.choice(benefits)
        tone = np.random.choice(tones)
        
        prompt = f"""Create a campaign outline for a {product} targeting {market}.

Requirements:
- Length: 300-400 words
- Primary benefit to emphasize: {benefit}
- Tone: {tone}
- Include key messaging and target audience insights"""
        
        variants = {'product': product, 'market': market, 'benefit': benefit, 'tone': tone}
    
    elif family == 'Infographic Structure':
        topics = ['renewable energy sources comparison', 'stages of sleep and their functions',
                  'water cycle explained', 'history of social media platforms', 
                  'nutrition basics: macro vs micronutrients']
        formats = ['timeline', 'comparison chart', 'process flow', 'hierarchical breakdown']
        audiences = ['students', 'policymakers', 'general public', 'professionals in the field']
        messages = ['emphasize cost-effectiveness', 'highlight environmental impact', 
                    'focus on practical applications', 'stress health benefits']
        
        topic = np.random.choice(topics)
        format_type = np.random.choice(formats)
        audience = np.random.choice(audiences)
        message = np.random.choice(messages)
        
        prompt = f"""Outline an infographic about {topic} using a {format_type} format for {audience}.

Requirements:
- Describe the visual structure and organization
- Key message to {message}
- Length: 250-350 words
- Focus on clear, scannable information hierarchy"""
        
        variants = {'topic': topic, 'format': format_type, 'audience': audience, 'message': message}
    
    elif family == 'Short Narrative':
        settings = ['space station orbiting Mars', 'lighthouse during a storm', 
                    'abandoned subway tunnel', 'research lab in Antarctica', 
                    'small café in a foreign city']
        tones = ['hopeful and uplifting', 'tense and suspenseful', 'mysterious and contemplative', 
                 'bittersweet and nostalgic']
        constraints = ['include a metaphor about light or darkness', 
                       'avoid clichés about fate or destiny',
                       'include dialogue that reveals character',
                       'use sensory details (sound, smell, texture)']
        
        setting = np.random.choice(settings)
        tone = np.random.choice(tones)
        constraint = np.random.choice(constraints)
        
        prompt = f"""Write a 400-500 word short story set in a {setting}.

Requirements:
- Tone: {tone}
- Must {constraint}
- Create a complete narrative arc (beginning, middle, end)
- Show, don't just tell"""
        
        variants = {'setting': setting, 'tone': tone, 'constraint': constraint}
    
    elif family == 'Educational Analogy':
        concepts = ['machine learning', 'blockchain technology', 'quantum computing', 
                    'climate feedback loops', 'compound interest']
        analogy_domains = ['cooking', 'sports', 'gardening', 'everyday household objects', 
                           'transportation systems']
        audiences = ['non-technical managers', 'high school students', 'curious beginners', 
                     'professionals from another field']
        
        concept = np.random.choice(concepts)
        domain = np.random.choice(analogy_domains)
        audience = np.random.choice(audiences)
        
        prompt = f"""Explain {concept} using an analogy from {domain} for {audience}.

Requirements:
- Length: 300-400 words
- Make the analogy clear and accurate
- Explain how the analogy maps to the concept
- Avoid oversimplification that loses key insights"""
        
        variants = {'concept': concept, 'domain': domain, 'audience': audience}
    
    else:  # Debate Position Statement
        topics = ['universal basic income', 'social media age verification requirements',
                  'mandatory voting', 'AI regulation in creative industries', 
                  'genetic modification of food crops']
        positions = ['FOR (supporting)', 'AGAINST (opposing)']
        audiences = ['academic/scholarly', 'general public', 'policymakers', 'industry professionals']
        
        topic = np.random.choice(topics)
        position = np.random.choice(positions)
        audience = np.random.choice(audiences)
        
        prompt = f"""Write a position statement {position} {topic} for an {audience} audience.

Requirements:
- Length: 350-450 words
- Present clear arguments with supporting evidence
- Acknowledge and address counterarguments
- Maintain logical structure and persuasive tone"""
        
        variants = {'topic': topic, 'position': position, 'audience': audience}
    
    default_rubric = rubric_map[family]
    
    return {
        'group_code': group_code,
        'family': family,
        'prompt': prompt,
        'default_rubric': default_rubric,
        'variants': variants
    }

# Enter group code
group_code = int(input("Enter your group code: "))
scenario = generate_group_scenario(group_code)

print(f"✓ Regenerated scenario for Group {group_code}")
print(f"\nPrompt Family: {scenario['family']}")
print(f"Your prompt:\n{scenario['prompt']}")
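
Entering the same group code reproduces the same scenario because the function seeds NumPy's global random generator before making any choices. The sketch below isolates that idea. `pick_variant` is a hypothetical helper written for illustration, not part of the lab code, and the option list is copied from the Science Explainer family only as an example.

```python
import numpy as np

def pick_variant(group_code, options):
    """Hypothetical helper: seed NumPy's global RNG, then draw one option."""
    np.random.seed(group_code)  # same seed -> same sequence of draws
    return str(np.random.choice(options))

tones = ['serious and formal', 'playful and engaging', 'inspiring and aspirational']

first = pick_variant(7, tones)
second = pick_variant(7, tones)
print(first == second)  # prints True: same group code, same draw
```

If the regenerated prompt does not match the one you used in Module 1, the most likely cause is a mistyped group code.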

## 📋 Recording Instructions

Refer to your Module 2 scores on your answer sheet.
Record revised scores and comparison notes on your answer sheet.

## Step 2: Review Your Module 2 Findings

Before creating your revision prompt, review:
- Your original AI output (from Module 1)
- Your scores from Module 2 (recorded on your answer sheet)
- The 2 weakest criteria you identified

You'll use this information to create a targeted revision prompt.

## Step 3: Create a Revision Prompt

You'll ask the AI to revise its work, focusing specifically on the 2 weakest criteria.

### Effective Revision Prompts:

**Good approach:**
- Be specific about what to improve
- Reference the rubric criteria by name
- Give concrete examples of what's missing

**Less effective:**
- "Make it better"
- "Fix the problems"
- Vague, non-actionable feedback

In [None]:
# Template for revision prompt
print("SUGGESTED REVISION PROMPT TEMPLATE")
print("="*70)
print("(Fill in the [brackets] with specific feedback)")
print()
print("""Please revise your previous response to improve these two specific areas:

1. **[First Weakest Criterion]**: [Explain specifically what needs improvement based on the rubric]

2. **[Second Weakest Criterion]**: [Explain specifically what needs improvement based on the rubric]

Keep roughly the same word count and maintain the strengths of your original response, but focus on addressing these two weaknesses.

Provide the complete revised version.""")
print()
print("="*70)
print("\nWrite your revision prompt, then give it to the AI.")
print("Record the AI's revised output on your answer sheet.")
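
If you would rather assemble the prompt in code than fill in the brackets by hand, something like the following could work. This is only a sketch: `build_revision_prompt` is a hypothetical helper, and the criterion names and feedback text are made-up placeholders, so substitute the criteria from your own rubric.

```python
def build_revision_prompt(feedback):
    """Hypothetical helper: feedback maps criterion name -> specific issue."""
    if len(feedback) != 2:
        raise ValueError("Provide feedback for exactly 2 criteria.")
    lines = [
        "Please revise your previous response to improve these two specific areas:",
        "",
    ]
    for i, (criterion, issue) in enumerate(feedback.items(), start=1):
        lines.append(f"{i}. **{criterion}**: {issue}")
        lines.append("")
    lines.append(
        "Keep roughly the same word count and maintain the strengths of your "
        "original response, but focus on addressing these two weaknesses."
    )
    lines.append("")
    lines.append("Provide the complete revised version.")
    return "\n".join(lines)

# Placeholder criteria and feedback, for illustration only
example_feedback = {
    "Audience Fit": "Several terms are never defined for the stated audience; "
                    "define them or replace them with plain language.",
    "Use of Examples": "The response has no concrete example; add one analogy "
                       "tied to everyday experience.",
}
print(build_revision_prompt(example_feedback))
```

Requiring exactly two criteria keeps the revision targeted, which matches the lab's focus on the 2 weakest criteria.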

## Step 4: Get the Revised Version from AI

### Instructions:

1. Go back to your AI conversation
2. Copy your revision prompt
3. Paste it into the AI
4. Wait for the complete revised response
5. Record the ENTIRE revised output on your answer sheet
6. Note any changes the AI mentions making

## Step 5: Score the Revised Version

Now evaluate the **revised** output using the **same rubric** you used in Module 2.

Record your revised scores on your answer sheet.

For each criterion:
- Give a score (1-5)
- Note whether it improved from your original score
- Justify your score

## Step 6: Compare Original vs. Revised

Create a comparison table on your answer sheet:

| Criterion | Original Score | Revised Score | Change | Notes |
|-----------|---------------|---------------|--------|-------|
| [Criterion 1] | X/5 | Y/5 | +/- | Did it improve? |
| [Criterion 2] | X/5 | Y/5 | +/- | Did it improve? |
| ... | ... | ... | ... | ... |

Calculate:
- Total original score
- Total revised score
- Overall change

Note:
- Which criteria improved?
- Which criteria got worse (if any)?
- Did the targeted weaknesses improve?
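
The calculations above can also be checked with a few lines of plain Python. The sketch below uses a hypothetical `compare_scores` helper and made-up criterion names and scores; replace them with the values from your answer sheet.

```python
def compare_scores(original, revised, targeted):
    """Hypothetical helper: build comparison rows and totals from 1-5 scores."""
    rows = []
    for criterion, before in original.items():
        after = revised[criterion]
        rows.append({
            "criterion": criterion,
            "original": before,
            "revised": after,
            "change": after - before,
            "targeted": criterion in targeted,
        })
    totals = {
        "original": sum(original.values()),
        "revised": sum(revised.values()),
    }
    totals["change"] = totals["revised"] - totals["original"]
    return rows, totals

# Made-up scores, for illustration only
original = {"Clarity": 4, "Audience Fit": 2, "Use of Examples": 2, "Tone": 4}
revised = {"Clarity": 3, "Audience Fit": 4, "Use of Examples": 3, "Tone": 4}

rows, totals = compare_scores(
    original, revised, targeted={"Audience Fit", "Use of Examples"}
)
for row in rows:
    flag = "*" if row["targeted"] else " "  # * marks a targeted criterion
    print(f"{flag} {row['criterion']:<16} {row['original']}/5 -> "
          f"{row['revised']}/5 ({row['change']:+d})")
print(f"Total: {totals['original']} -> {totals['revised']} ({totals['change']:+d})")

improved = [r["criterion"] for r in rows if r["change"] > 0]
worse = [r["criterion"] for r in rows if r["change"] < 0]
print("Improved:", improved)
print("Got worse:", worse)
```

In this made-up example both targeted criteria improve while Clarity drops by one point, which is the kind of trade-off the comparison is meant to surface. The numbers only summarize your scores; the overall judgment is still yours to make.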

## Step 7: Analysis Questions

Answer on your answer sheet:

1. **New Problems**: Did the revision introduce any NEW problems? Did improving one area make another area worse?

2. **Overall Judgment**: Overall, is the revised version better than the original? Why or why not? Consider ALL criteria, not just the targeted ones.
   - [ ] Original is better
   - [ ] Revised is better
   - [ ] About the same
   - [ ] Mixed (some better, some worse)

3. **Further Revision**: If you had to revise AGAIN, what would you focus on? What's still weak?

## Step 8 (Optional): Ask AI to Score the Revision

If time permits, you can ask the AI to score its **revised** version and see if its self-evaluation improved.

Record on your answer sheet:
- AI's self-evaluation of the revision
- Was AI more/less accurate in evaluating the revision compared to the original?

## Module 3 Questions

Answer these on your Lab 11 Answer Sheet.

### Q12: Revision Prompt

What were the 2 weakest criteria? What revision prompt did you use? Was it specific and actionable?

*(Answer on your answer sheet)*

### Q13: Improvement Analysis

Did the revised version improve on the targeted criteria? Show the comparison (original score → revised score for each targeted criterion).

*(Answer on your answer sheet with comparison table)*

### Q14: New Problems

Did the revision introduce any NEW problems? Did improving one area make another area worse?

*(Answer on your answer sheet)*

### Q15: Overall Judgment

Overall, is the revised version better than the original? Why or why not? Consider ALL criteria, not just the targeted ones.

*(Answer on your answer sheet)*

### Q16 (Optional): AI Self-Evaluation Accuracy

If you asked AI to score its revision, was it more or less accurate than its original self-evaluation? Did it learn from the feedback?

*(Answer on your answer sheet)*

### Q17: Further Revision

If you had to revise AGAIN, what would you focus on? What's still weak?

*(Answer on your answer sheet)*

## Summary: What You've Accomplished

✓ Identified the 2 weakest criteria from Module 2

✓ Created a targeted revision prompt

✓ Got AI to revise its work

✓ Scored the revised version using the same rubric

✓ Compared original vs. revised systematically

✓ Identified improvements, regressions, and trade-offs

✓ Made an overall judgment about quality

### Key Insights

You've likely discovered:

1. **Explicit feedback helps**: When you give clear, rubric-based guidance, AI can often improve

2. **Trade-offs exist**: Improving one aspect sometimes makes another worse

3. **Iteration has limits**: AI revision is useful but not perfect

4. **Human judgment remains essential**: You, not the AI, must decide if the revision is actually better

### The Bigger Picture

AI is a **collaborator**, not an **authority**:
- ✓ Good at generating drafts
- ✓ Good at revising when given specific feedback
- ✗ Poor at evaluating its own quality
- ✗ Cannot reliably decide what's "good"

**Humans must provide the judgment, standards, and oversight.**

### What's Next?

In **Module 4 (Reflection)**, you will:
- Synthesize your findings from all 3 stages
- Read case studies about AI evaluation failures
- Articulate when AI is a good collaborator vs. when humans must lead
- Develop principles for responsible human-AI collaboration