# Lab 10 - Module 1: Collecting AI Self-Predictions

**Time:** ~15-20 minutes

In this module, you'll test your 8 prompts on a real AI model and record:
- How confident the AI sounds (tone, caveats, hedging)
- Your prediction of whether the response is actually accurate
- Specific phrases the AI uses to express certainty or uncertainty

**Important:** You will NOT verify accuracy yet; that happens in Module 2. For now, just observe and record.

## Setup: Import Libraries and Load Prompts

In [None]:
import numpy as np
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML, Markdown
import json
import os

print("✓ Libraries loaded successfully!")

## 📁 Google Drive Setup

**IMPORTANT:** This module loads data from your Google Drive (saved in Module 0).

Run the cell below to mount Google Drive. You may need to authorize access again.

In [None]:
# Mount Google Drive
from google.colab import drive

print("Mounting Google Drive...")
drive.mount('/content/drive')

# Set lab directory
LAB_DIR = '/content/drive/MyDrive/DATA1010/Lab10'

print("✓ Google Drive mounted successfully!")
print(f"✓ Lab directory: {LAB_DIR}")

## Enter Your Group Code

Use the **same group code** from Module 0 to load your prompts.

In [None]:
group_code = int(input("Enter your group code: "))

# Load prompts from Module 0 (from Google Drive)
filename = f"{LAB_DIR}/lab10_group_{group_code}_prompts.csv"

try:
    prompts_df = pd.read_csv(filename)
    print(f"✓ Loaded {len(prompts_df)} prompts from Google Drive")
    print(f"✓ Group Code: {group_code}")
except FileNotFoundError:
    print(f"❌ ERROR: Could not find {filename}")
    print("❌ Make sure you:")
    print("   1. Ran Module 0 first")
    print("   2. Used the same group code")
    print("   3. Mounted Google Drive (run the cell above)")
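
The later cells read four columns from `prompts_df`: `prompt_id`, `category`, `category_display`, and `prompt_text`. As a minimal sketch (the helper name `missing_columns` is hypothetical and not part of the lab code), a check like the following could confirm that the loaded file has them before you continue:

```python
# Columns the data collection cell reads from each row of prompts_df
REQUIRED_COLUMNS = ["prompt_id", "category", "category_display", "prompt_text"]

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]

# Illustrative column list; in the notebook you would pass prompts_df.columns
example_columns = ["prompt_id", "category", "prompt_text"]
print(missing_columns(example_columns))  # ['category_display']
```

An empty list means every expected column is present. If anything is listed, re-running Module 0 with the same group code is probably the simplest fix.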

## Testing Instructions

### How to Test Each Prompt

For each of the 8 prompts, follow these steps:

1. **Open a fresh AI chat window**
   - Use ChatGPT, Claude, Gemini, or another LLM
   - Open a **private/incognito window** for each prompt
   - This keeps context from earlier prompts from carrying over

2. **Copy and paste the prompt exactly**
   - Don't modify it or add context
   - Just paste the prompt text and send

3. **Read the AI's complete response**
   - Pay attention to tone (confident, cautious, uncertain)
   - Note any caveats ("I might be wrong", "Please verify")
   - Look for hedging language ("typically", "generally", "often")

4. **Record your observations using the widgets below**
   - AI's confidence level (dropdown)
   - Your prediction of accuracy (dropdown)
   - Specific phrases (text area)

5. **Close the window and move to the next prompt**

### What to Look For

**High confidence signals:**
- "Definitely", "certainly", "without a doubt"
- No caveats or warnings
- Specific details (numbers, dates, names)
- Authoritative tone

**Uncertainty signals:**
- "I might be wrong", "I'm not certain"
- "You should verify this", "Please double-check"
- "This may vary", "typically", "generally"
- Refusal to answer
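
If you want a rough first pass before reading closely, the phrase lists above can be turned into a simple keyword scan. This is only a sketch: the function name `find_signal_phrases` and the sample sentence are made up for illustration, and a keyword match cannot judge tone or context, so it does not replace your own reading of the response.

```python
import re

# Phrase lists copied from the signals above; extend them as you see fit.
HIGH_CONFIDENCE_PHRASES = ["definitely", "certainly", "without a doubt"]
UNCERTAINTY_PHRASES = [
    "i might be wrong", "i'm not certain", "you should verify",
    "please double-check", "this may vary", "typically", "generally",
]

def find_signal_phrases(response_text):
    """Return the confidence and uncertainty phrases found in a response."""
    # Lowercase and normalize curly apostrophes so "I'm" matches either style
    text = response_text.lower().replace("\u2019", "'")

    def matches(phrases):
        # Whole-phrase match with word boundaries on both ends
        return [p for p in phrases
                if re.search(r"\b" + re.escape(p) + r"\b", text)]

    return {
        "high_confidence": matches(HIGH_CONFIDENCE_PHRASES),
        "uncertainty": matches(UNCERTAINTY_PHRASES),
    }

# Made-up response text, for illustration only
sample = "The bridge is definitely 1,280 m long, but this may vary by source."
signals = find_signal_phrases(sample)
print(signals)
```

A response can contain both kinds of signal, as the sample does; in that case, record the overall impression in the dropdown and quote the phrases in your notes.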

### Time Budget

Spend about **2 minutes per prompt**:
- 30 seconds: Open window and paste prompt
- 30 seconds: Read response
- 1 minute: Record observations

**Total: ~16 minutes for all 8 prompts**

## Data Collection Interface

Run this cell to start the interactive data collection process.

You'll see each prompt one at a time with dropdowns and text areas to record your observations.

In [None]:
# Initialize data storage
predictions_data = []

# Confidence level options
confidence_options = [
    'Select...',
    'No caveats - answered confidently',
    'Mild caveats (e.g., "This might...", "Generally...")',
    'Strong caveats (e.g., "I may be wrong", "Please verify")',
    'Refused or heavily qualified the answer'
]

# Student prediction options
prediction_options = [
    'Select...',
    'Will be accurate',
    'Might have minor errors',
    'Likely to have major errors',
    'Completely failed/refused'
]

def create_prompt_interface(idx, row):
    """Create widgets for a single prompt."""
    
    # Display prompt information
    print("="*70)
    print(f"PROMPT #{row['prompt_id']} of {len(prompts_df)}")
    print(f"Category: {row['category_display']}")
    print("="*70)
    print()
    print("COPY THIS PROMPT TO YOUR AI:")
    print("-"*70)
    print(row['prompt_text'])
    print("-"*70)
    print()
    print("After pasting into AI and reading the response, record your observations below:")
    print()
    
    # Create widgets
    confidence_widget = widgets.Dropdown(
        options=confidence_options,
        value='Select...',
        description='AI Confidence:',
        style={'description_width': 'initial'},
        layout={'width': '650px'}
    )
    
    prediction_widget = widgets.Dropdown(
        options=prediction_options,
        value='Select...',
        description='Your Prediction:',
        style={'description_width': 'initial'},
        layout={'width': '650px'}
    )
    
    notes_widget = widgets.Textarea(
        value='',
        placeholder='Optional: Record specific phrases the AI used (e.g., "I am confident that...", "This might vary...", etc.)',
        description='Notes/Quotes:',
        style={'description_width': 'initial'},
        layout={'width': '650px', 'height': '100px'}
    )
    
    save_button = widgets.Button(
        description=f'Save and Continue to Prompt #{row["prompt_id"] + 1}' if idx < len(prompts_df) - 1 else 'Save Final Prompt',
        button_style='success',
        layout={'width': '300px'}
    )
    
    output = widgets.Output()
    
    def on_save(b):
        with output:
            clear_output()
            
            # Validate inputs
            if confidence_widget.value == 'Select...' or prediction_widget.value == 'Select...':
                print("⚠️ Please select values for both dropdowns before saving.")
                return
            
            # Drop any earlier entry for this prompt so that clicking Save again
            # updates the record instead of adding a duplicate row
            predictions_data[:] = [d for d in predictions_data if d['prompt_id'] != row['prompt_id']]
            
            # Save data
            predictions_data.append({
                'prompt_id': row['prompt_id'],
                'category': row['category'],
                'ai_confidence': confidence_widget.value,
                'student_prediction': prediction_widget.value,
                'notes': notes_widget.value
            })
            
            print(f"✓ Prompt #{row['prompt_id']} saved!")
            print(f"Progress: {len(predictions_data)}/{len(prompts_df)} prompts completed")
            
            if idx < len(prompts_df) - 1:
                print(f"\n→ Scroll down for Prompt #{row['prompt_id'] + 1}")
            else:
                print("\n✓ All prompts completed!")
                print("→ Scroll down to save your data.")
    
    save_button.on_click(on_save)
    
    # Display widgets
    display(confidence_widget)
    display(prediction_widget)
    display(notes_widget)
    display(save_button)
    display(output)
    print()
    print()

# Display interface for each prompt
for idx, row in prompts_df.iterrows():
    create_prompt_interface(idx, row)

## Save Your Predictions

After completing all 8 prompts above, run this cell to save your data.

In [None]:
if len(predictions_data) < len(prompts_df):
    print(f"⚠️ WARNING: You've only completed {len(predictions_data)}/{len(prompts_df)} prompts.")
    print("Please complete all prompts above before saving.")
else:
    # Create DataFrame
    predictions_df = pd.DataFrame(predictions_data)
    
    # Save to Google Drive
    predictions_filename = f"{LAB_DIR}/lab10_group_{group_code}_predictions.csv"
    predictions_df.to_csv(predictions_filename, index=False)
    
    # Display summary
    print("="*70)
    print("✓ Data Saved Successfully to Google Drive!")
    print("="*70)
    print(f"File: {predictions_filename}")
    print(f"Prompts completed: {len(predictions_df)}")
    print()
    print("Summary of AI Confidence Levels:")
    print(predictions_df['ai_confidence'].value_counts())
    print()
    print("Summary of Your Predictions:")
    print(predictions_df['student_prediction'].value_counts())
    print("="*70)
    print()
    print("Next Steps:")
    print("1. Answer Q4-Q7 on your lab handout")
    print("2. Continue to Module 2 to verify actual accuracy")
    print("3. Use the same group code in Module 2!")
    print("="*70)
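
The save cell only compares the number of records with the number of prompts, so it cannot tell whether one prompt was recorded twice while another was skipped. As a hedged sketch, a check along these lines could catch that; `check_prompt_ids` is a hypothetical helper, not part of the lab code, and the example records are invented.

```python
from collections import Counter

def check_prompt_ids(records, expected_ids):
    """Return (missing, duplicated) prompt IDs for a list of record dicts."""
    counts = Counter(rec["prompt_id"] for rec in records)
    missing = [pid for pid in expected_ids if counts[pid] == 0]
    duplicated = [pid for pid in expected_ids if counts[pid] > 1]
    return missing, duplicated

# Invented records: prompt 2 appears twice and prompt 3 is absent
example_records = [{"prompt_id": 1}, {"prompt_id": 2}, {"prompt_id": 2}]
missing, duplicated = check_prompt_ids(example_records, expected_ids=[1, 2, 3])
print("Missing:", missing)        # Missing: [3]
print("Duplicated:", duplicated)  # Duplicated: [2]
```

In the notebook, you would call it as `check_prompt_ids(predictions_data, prompts_df['prompt_id'].tolist())`; both lists should come back empty before you rely on the saved file.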

## Preview Your Data

Optional: View the data you just collected.

In [None]:
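# predictions_df only exists after the save cell above has run successfully
if 'predictions_df' in globals() and len(predictions_df) == len(prompts_df):
    display(HTML(predictions_df.to_html(index=False, classes='table table-striped')))
else:
    print("Complete all prompts and run the save cell first.")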
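
To compare confidence across categories, it can help to tally the recorded confidence labels per category. The following is a minimal sketch that works on a list of dictionaries shaped like `predictions_data`; the function name `confidence_by_category` and the category names in the example are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def confidence_by_category(records):
    """Count how often each ai_confidence label appears within each category."""
    tally = defaultdict(Counter)
    for rec in records:
        tally[rec["category"]][rec["ai_confidence"]] += 1
    # Convert to plain dicts for easier printing
    return {category: dict(counts) for category, counts in tally.items()}

# Invented records in the same shape as predictions_data
example_records = [
    {"prompt_id": 1, "category": "math",
     "ai_confidence": "No caveats - answered confidently"},
    {"prompt_id": 2, "category": "math",
     "ai_confidence": "No caveats - answered confidently"},
    {"prompt_id": 3, "category": "history",
     "ai_confidence": "Refused or heavily qualified the answer"},
]

summary = confidence_by_category(example_records)
for category, counts in summary.items():
    print(category, counts)
```

Calling `confidence_by_category(predictions_data)` on your own records gives a quick view of which categories drew caveats or refusals.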

## Questions for Module 1

Answer these questions on your lab handout using the data you just collected.

### Q4: Uncertainty Language in Prompt #1

Looking at Prompt #1, did the AI express any uncertainty or caveats? Quote specific phrases from the response.

*(Answer on your handout)*

### Q5: Refusals and Strong Uncertainty

For which prompt(s) did the AI refuse to answer or express strong uncertainty? List the prompt ID numbers and categories.

*(Answer on your handout)*

### Q6: Variation in Confidence

Did the AI use similar language for all prompts, or did confidence levels vary across different categories? Give specific examples.

*(Answer on your handout)*

### Q7: Predicting Overconfidence

PREDICTION: Looking at the 8 responses you collected, for which prompts do you think the AI's self-assessment will be accurate? Which ones do you suspect might show overconfidence (confident tone but actually wrong)?

*(Answer on your handout)*

## Next Steps

1. **Answer Q4-Q7** on your lab handout
2. **Remember your group code** (write it down again!)
3. **Continue to Module 2** where you'll verify the actual accuracy of each response
4. **Use the same group code** in Module 2

In Module 2, you'll discover whether the AI's confidence matched reality!