# Lab 2: Prompt Chains & Testing Workflows

In this hands-on lab, you'll move beyond single prompts and learn to build **multi-step prompt workflows** — sequences of prompts where the output of one step feeds into the next. You'll also learn to systematically test and compare prompts using A/B testing techniques.

## Learning Objectives
- Understand and debug multi-step prompt chains
- Design prompt workflows using sequential, fan-out, and iterative patterns
- Critically evaluate biased A/B test designs
- Architect complex multi-step pipelines for real business tasks

**Duration:** 55–65 minutes | **Difficulty:** Intermediate

**What You Need:** Access to an AI tool (ChatGPT, Claude, Copilot, or similar)

---

## Part 1: Understanding Prompt Chains

A **prompt chain** breaks a complex task into discrete steps where the output of one step becomes the input for the next. This mirrors how humans tackle large problems: research first, then outline, then draft, then polish.

### Example: Article Writing Chain

Here's a 3-step chain for writing an article about AI in healthcare:

**Step 1 — Extract Topics:**
> Extract the 5 most important topics from the following subject: AI in healthcare

*Output:*
> 1. Current adoption rates and growth trajectory
> 2. Primary use cases in clinical settings
> 3. Regulatory landscape and compliance
> 4. Cost-benefit analysis for hospital systems
> 5. Patient outcome improvements backed by studies

**Step 2 — Create Outline** (uses Step 1's output as input):
> Create a detailed article outline organized around these topics:
> [paste Step 1 output]

*Output:*
> I. Introduction — The AI revolution in healthcare
>    A. Hook: startling statistic on diagnostic errors
>    B. Thesis: AI is transforming clinical outcomes
> II. Adoption Landscape
>    A. Current rates: 38% of hospitals using some form of AI
>    B. Growth: projected 45% CAGR through 2028
> III. Clinical Use Cases
> *(and so on...)*

**Step 3 — Write Introduction** (uses Step 2's output as input):
> Write a compelling 150-word introduction for an article with this outline:
> [paste Step 2 output]

*Output:*
> Every year, an estimated 12 million Americans receive a misdiagnosis. For hospital administrators, artificial intelligence offers a compelling answer...

### Why Chains Work
- Each step has a **focused, manageable scope**
- The AI tends to produce better output when each prompt has one specific, narrow task
- You can **inspect and fix** intermediate results before they cascade
- You can **reuse steps** across different workflows
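
If you later automate a chain instead of pasting outputs by hand, the article chain above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `fake_model` is a hypothetical stand-in for whatever AI tool you use, and `run_chain` and the `{input}` placeholder are names invented for this illustration. It keeps every intermediate output so you can inspect each step before trusting the final one.

```python
from typing import Callable, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real AI tool call; it only echoes a tag."""
    return f"<output for: {prompt[:40]}>"


def run_chain(templates: List[str], first_input: str,
              model: Callable[[str], str]) -> List[str]:
    """Run prompts in order, feeding each output into the next template."""
    outputs = []
    current = first_input
    for template in templates:
        prompt = template.format(input=current)
        current = model(prompt)
        outputs.append(current)  # keep every intermediate result for inspection
    return outputs


# The three prompts from the article-writing example above.
templates = [
    "Extract the 5 most important topics from the following subject: {input}",
    "Create a detailed article outline organized around these topics:\n{input}",
    "Write a compelling 150-word introduction for an article with this outline:\n{input}",
]

results = run_chain(templates, "AI in healthcare", fake_model)
for step, text in enumerate(results, start=1):
    print(f"Step {step}: {text}")
```

Swapping `fake_model` for a real call is the only change needed to run the same chain against an actual tool.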

---

## Workflow Patterns

Real-world prompt workflows use three common patterns:

### 1. Sequential Pattern
Steps run one after another. Each step's output feeds the next.

```
[Step 1: Research] → [Step 2: Outline] → [Step 3: Draft] → [Step 4: Polish]
```

**Best for:** Article writing, report generation, email drafting

### 2. Fan-Out Pattern
The same prompt is applied to multiple inputs independently. Results are collected.

```
[Input A] → [Same Prompt] → [Result A]
[Input B] → [Same Prompt] → [Result B]
[Input C] → [Same Prompt] → [Result C]
```

**Best for:** Batch analysis, processing multiple documents, scoring candidates
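
In code, the fan-out diagram above amounts to applying one template to each input with no shared state between runs. The sketch below assumes a hypothetical `fake_model` in place of a real AI tool; `fan_out` and the example template are illustrative names, not part of any library.

```python
from typing import Callable, List, Tuple


def fan_out(template: str, inputs: List[str],
            model: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Apply one prompt template to each input independently."""
    return [(item, model(template.format(input=item))) for item in inputs]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an AI tool; reports the prompt length."""
    return f"{len(prompt)} characters received"


results = fan_out(
    "Summarize this document in one sentence: {input}",
    ["Input A", "Input B", "Input C"],
    fake_model,
)
for item, result in results:
    print(f"{item} -> {result}")
```

Because no run depends on another, the order of the inputs does not affect any individual result.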

### 3. Iterative Pattern
The same prompt is applied repeatedly to progressively refine output.

```
[Draft v1] → [Refine Prompt] → [Draft v2] → [Refine Prompt] → [Draft v3]
```

**Best for:** Editing, polishing, improving quality
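
An iterative loop needs a stopping rule, otherwise it refines forever. The sketch below stops when the text no longer changes or after a maximum number of rounds. Everything here is an assumption made for illustration: `REFINE_PROMPT`, `refine`, and the toy `fake_model`, which removes one filler word per call so the loop has something to converge on.

```python
from typing import Callable, List

REFINE_PROMPT = "Tighten this draft without changing its meaning:\n{input}"


def refine(draft: str, model: Callable[[str], str],
           max_rounds: int = 5) -> List[str]:
    """Reapply one refine prompt until the text stops changing or rounds run out."""
    versions = [draft]
    for _ in range(max_rounds):
        candidate = model(REFINE_PROMPT.format(input=versions[-1]))
        if candidate == versions[-1]:
            break  # no further change, so stop early
        versions.append(candidate)
    return versions


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in: drops one filler word from the draft per call."""
    draft = prompt.split("\n", 1)[1]
    return draft.replace("very ", "", 1)


versions = refine("This is a very very good plan.", fake_model)
for number, text in enumerate(versions, start=1):
    print(f"Draft v{number}: {text}")
```

With a real model the output rarely becomes exactly stable, so in practice the `max_rounds` cap usually ends the loop.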

### Combining Patterns

Real workflows often combine patterns. For example, a hiring pipeline might:
1. **Fan-out:** Score each candidate on 3 dimensions
2. **Sequential:** Rank → Select finalists → Generate interview questions → Write recommendation memo

---

## Exercise 1: Debug the Broken Chain (10 minutes)

Below is a 4-step prompt chain for generating a product launch email for **CloudSync Pro** (an AI-powered cloud storage product). The prompt for each step is shown, along with the output for Steps 1–3, but **3 of the 4 steps have problems** that cascade through the chain.

**Your task:**
1. Read each step's prompt and output
2. Identify the bug in Steps 1, 2, and 3
3. Write corrected prompts for the broken steps
4. Run your corrected chain in your AI tool (step by step, pasting each output into the next step)
5. Paste the final email output

---

### The Broken Chain

**Product:** CloudSync Pro — AI-powered cloud storage with smart file organization, 2TB capacity, real-time collaboration, end-to-end encryption, and intelligent search.

**Step 1: Identify Features**

*Prompt:* "List the 5 most compelling features of this product: CloudSync Pro — AI-powered cloud storage with smart file organization, 2TB capacity, real-time collaboration, end-to-end encryption, and intelligent search."

*Output:*
> Key Features of SmartFit Pro Fitness Tracker:
> 1. Heart rate monitoring with medical-grade sensors
> 2. 7-day battery life
> 3. Water resistant to 50m
> 4. Sleep tracking with REM analysis
> 5. GPS route mapping for outdoor workouts

**Bug in Step 1:** _______________________________________________

**Your corrected prompt for Step 1:**

> _______________________________________________

**Step 2: Prioritize Features** (receives Step 1's output)

*Prompt:* "Rank these features by customer impact, highest first: [Step 1 output]"

*Output:*
> Prioritized Features (by importance):
> 1. [LOW] Water resistance — nice-to-have
> 2. [LOW] GPS mapping — niche appeal
> 3. [MED] Sleep tracking — growing interest
> 4. [HIGH] Battery life — key differentiator
> 5. [HIGH] AI-powered insights — unique selling point

**Bug in Step 2:** _______________________________________________

**Your corrected prompt for Step 2:**

> _______________________________________________

**Step 3: Draft Email** (receives Step 2's output)

*Prompt:* "Write a launch announcement email leading with the top features: [Step 2 output]"

*Output:*
> Subject: Exciting News!
> 
> Hi there,
> 
> We have something new for you. Our product is great and you should buy it. It has many features that you will love. It is better than the competition in every way.
> 
> Thanks,
> The Team

**Bug in Step 3:** _______________________________________________

**Your corrected prompt for Step 3:**

> _______________________________________________

**Step 4: Add CTA** (receives Step 3's output) — *This step works correctly*

*Prompt:* "Add a compelling call-to-action to this email draft: [Step 3 output]"

### Run Your Fixed Chain

Now run your corrected Steps 1–3 in your AI tool (one at a time, pasting each output as input to the next step), then run Step 4.

**Paste your final email here:**

> _______________________________________________
> _______________________________________________
> _______________________________________________

---

## Part 2: Prompt Testing & A/B Comparison

Professional prompt engineers don't just write prompts and hope for the best. They **test** prompts systematically by:

1. **Defining test cases** — specific inputs with expected outputs
2. **Running both prompts** against the same inputs
3. **Scoring outputs** on consistent dimensions
4. **Comparing results** in a structured table
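
The first two steps above can be sketched as a small harness that runs every prompt against every test case and collects the outputs for scoring. This is a rough sketch under stated assumptions: `fake_model` is a hypothetical stand-in for your AI tool, and the prompt templates and test cases are made up for illustration.

```python
from typing import Callable, Dict, List


def run_ab_test(prompts: Dict[str, str], test_cases: List[str],
                model: Callable[[str], str]) -> List[Dict[str, str]]:
    """Run every prompt template against every test case."""
    rows = []
    for case in test_cases:
        for label, template in prompts.items():
            rows.append({
                "test_case": case,
                "prompt": label,
                "output": model(template.format(input=case)),
            })
    return rows


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real AI tool call."""
    return f"[reply to: {prompt}]"


# Illustrative templates and inputs only.
prompts = {
    "A": "Reply briefly: {input}",
    "B": "Reply with a greeting, a solution, and next steps: {input}",
}
test_cases = ["Where is my invoice?", "How do I reset my password?"]

rows = run_ab_test(prompts, test_cases, fake_model)
for row in rows:
    print(row["test_case"], "|", row["prompt"], "|", row["output"])
```

Each test case is sent to both prompts unchanged, which is the property that makes the later comparison meaningful.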

### Evaluation Dimensions

When comparing two prompts, score each output on these dimensions:

| Dimension | What to Look For | Score 1-5 |
|-----------|-----------------|----------|
| **Relevance** | Does the output directly address the input? | |
| **Completeness** | Does it cover all requested points? | |
| **Format Compliance** | Does it follow the requested structure? | |
| **Consistency** | Is the quality consistent across different inputs? | |

### Example: Testing Customer Support Prompts

**Prompt A (basic):** "Reply to this customer message: {input}"

**Prompt B (CRAFT):** "Context: You're handling a support ticket for a SaaS product. Role: Senior customer success rep with 8 years experience. Task: Respond to: {input}. Acknowledge concern, provide solution, include timeline. Format: Greeting, empathy, solution, next steps. Tone: Empathetic and professional."

| Test Case | Prompt A Score | Prompt B Score |
|-----------|---------------|---------------|
| Refund request | 10/20 | 18/20 |
| Feature question | 11/20 | 17/20 |
| Complaint | 8/20 | 19/20 |
| **Total** | **29/60** | **54/60** |

Prompt B wins, but only if the test was fair. Exercise 3 asks you to work out what makes a test fair.
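
The totals in the table above can be re-derived with a few lines of code, which is a useful habit once you have more than a handful of test cases. The scores are copied from the table; the variable names are illustrative, and the maximum of 20 per case assumes four dimensions scored out of 5 each.

```python
MAX_PER_CASE = 20  # 4 dimensions x 5 points each

# Per-test-case scores copied from the table above.
scores = {
    "Refund request": {"A": 10, "B": 18},
    "Feature question": {"A": 11, "B": 17},
    "Complaint": {"A": 8, "B": 19},
}

totals = {"A": 0, "B": 0}
for case, case_scores in scores.items():
    for label, value in case_scores.items():
        if not 0 <= value <= MAX_PER_CASE:
            raise ValueError(f"score out of range for {case}: {value}")
        totals[label] += value

max_total = MAX_PER_CASE * len(scores)
for label in sorted(totals):
    print(f"Prompt {label}: {totals[label]}/{max_total}")
# Prints:
# Prompt A: 29/60
# Prompt B: 54/60
```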

---

## Exercise 2: Design a Feedback Pipeline (15 minutes)

Process the following 5 customer feedback items through a multi-step prompt workflow.

### The Feedback Items

1. "The export feature crashes every time I try to save as PDF. This is blocking my entire team's workflow!!!"
2. "Love the new dashboard redesign! The charts are so much clearer and the dark mode option is fantastic."
3. "It would be great if you could add integration with Slack so we get notifications when reports are ready."
4. "Your billing system charged me twice this month. I need an immediate refund. This is unacceptable and I'm considering switching to a competitor."
5. "The search function is slow when filtering by date range. Takes about 10 seconds to load results for large datasets."

### Your Pipeline

**Step 1 (Fan-Out) — Categorize:** Write a prompt that categorizes each item as: Bug, Feature Request, Praise, or Complaint. Run it for all 5 items.

**Step 2 (Fan-Out) — Score Urgency:** Write a prompt that scores urgency 1–5 for each item. Run it for all 5 items.

**Step 3 (Sequential) — Plan Responses:** Using the categorized and scored results, write a prompt that generates an action plan for addressing the feedback, prioritized by urgency.
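
The hand-off between the two fan-out steps and Step 3 is where results get merged and ordered. A minimal sketch of that merge is shown below. `FeedbackResult` and `build_step3_input` are hypothetical names, and the sample records use made-up item numbers and values, not answers for the five feedback items above.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_CATEGORIES = {"Bug", "Feature Request", "Praise", "Complaint"}


@dataclass
class FeedbackResult:
    item_id: int
    category: str
    urgency: int  # 1 (low) to 5 (high)


def build_step3_input(results: List[FeedbackResult]) -> str:
    """Validate Step 1 and Step 2 results, then list them most urgent first."""
    for r in results:
        if r.category not in ALLOWED_CATEGORIES:
            raise ValueError(f"item {r.item_id}: unknown category {r.category!r}")
        if not 1 <= r.urgency <= 5:
            raise ValueError(f"item {r.item_id}: urgency {r.urgency} not in 1-5")
    ordered = sorted(results, key=lambda r: r.urgency, reverse=True)
    return "\n".join(
        f"#{r.item_id} | {r.category} | urgency {r.urgency}/5" for r in ordered
    )


# Made-up records for illustration only.
sample = [
    FeedbackResult(101, "Praise", 1),
    FeedbackResult(102, "Bug", 4),
    FeedbackResult(103, "Feature Request", 2),
]
step3_input = build_step3_input(sample)
print(step3_input)
```

Rejecting unexpected categories or scores at this point keeps a malformed Step 1 or Step 2 output from cascading into the action plan.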

---

### Step 1: Categorization Prompt

**Your prompt:**

> _______________________________________________
> _______________________________________________

**Results:**

| # | Feedback (excerpt) | Category |
|---|--------------------------|----------|
| 1 | The export feature crashes every time... | ___________ |
| 2 | Love the new dashboard redesign... | ___________ |
| 3 | It would be great if you could add... | ___________ |
| 4 | Your billing system charged me twice... | ___________ |
| 5 | The search function is slow when... | ___________ |

### Step 2: Urgency Scoring Prompt

**Your prompt:**

> _______________________________________________
> _______________________________________________

**Results:**

| # | Feedback (excerpt) | Category | Urgency (1-5) |
|---|--------------------------|----------|---------------|
| 1 | The export feature crashes every time... | ___________ | ___/5 |
| 2 | Love the new dashboard redesign... | ___________ | ___/5 |
| 3 | It would be great if you could add... | ___________ | ___/5 |
| 4 | Your billing system charged me twice... | ___________ | ___/5 |
| 5 | The search function is slow when... | ___________ | ___/5 |

### Step 3: Response Planning Prompt

**Your prompt** (paste the categorized + scored results as input):

> _______________________________________________
> _______________________________________________
> _______________________________________________

**Paste the AI-generated action plan here:**

> _______________________________________________
> _______________________________________________
> _______________________________________________

### Summary

**What did you learn about designing multi-step workflows?**

_______________________________________________

**Which step was hardest to get right, and why?**

_______________________________________________

---

## Exercise 3: Expose the Rigged A/B Test (15 minutes)

The A/B test below compares two prompts for writing product descriptions. It concludes that Prompt B is the clear winner. But the test is **rigged** — the setup is deliberately biased to make Prompt B win regardless of actual quality.

**Your task:**
1. Read the test setup carefully
2. Identify **at least 3 specific biases** in the test design
3. Design a **fair test** with the same two prompts
4. Run your fair test in your AI tool and determine the real winner

---

### The Rigged Test

**Prompt A (basic):** "Describe this product: {product}"

**Prompt B (CRAFT):** "Context: You are writing for an e-commerce platform. The customer is tech-savvy and values detailed specifications. Role: Senior product copywriter. Task: Write a compelling product description for: {product}. Include key features, benefits, and a comparison with alternatives. Format: Opening hook, 3 bullet points for features, closing CTA. Tone: Enthusiastic and persuasive."

**Test Products:** NovaBuds Pro wireless earbuds, ErgoRise laptop stand, HydroTrack smart water bottle

**"Results":**

| Product | Prompt A Score | Prompt B Score |
|---------|---------------|---------------|
| NovaBuds Pro earbuds | 6/20 | 19/20 |
| ErgoRise laptop stand | 5/20 | 18/20 |
| HydroTrack water bottle | 7/20 | 20/20 |
| **Total** | **18/60** | **57/60** |

**"Conclusion":** Prompt B is dramatically better!

### Part 1: Identify the Biases

**Bias 1:** _______________________________________________

**Bias 2:** _______________________________________________

**Bias 3:** _______________________________________________

*(Optional) Bias 4:* _______________________________________________

### Part 2: Design a Fair Test

**Your 3 test products:**
1. _______________________________________________
2. _______________________________________________
3. _______________________________________________

**Your evaluation criteria (what makes a "good" product description?):**

| Dimension | What to Look For |
|-----------|------------------|
| ____________ | ____________ |
| ____________ | ____________ |
| ____________ | ____________ |
| ____________ | ____________ |
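
One technique you might consider for a fair test is scoring outputs without knowing which prompt produced them. The sketch below shuffles the outputs and replaces the prompt labels with neutral codes, keeping a separate key for un-blinding after scoring. `blind_outputs` is a hypothetical helper, and the placeholder output strings are not real results.

```python
import random
from typing import Dict, List, Tuple


def blind_outputs(outputs: Dict[str, str],
                  seed: int) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Shuffle outputs and hide which prompt produced each one."""
    rng = random.Random(seed)  # seeded so the shuffle can be reproduced
    labels = list(outputs)
    rng.shuffle(labels)
    blinded = []
    key = {}
    for number, label in enumerate(labels, start=1):
        code = f"Output {number}"
        blinded.append((code, outputs[label]))
        key[code] = label  # keep the key hidden until scoring is done
    return blinded, key


# Placeholder text standing in for two real model outputs.
outputs = {
    "Prompt A": "(paste Prompt A output here)",
    "Prompt B": "(paste Prompt B output here)",
}
blinded, key = blind_outputs(outputs, seed=7)
for code, text in blinded:
    print(code, "->", text)
```

Score the blinded outputs first, then use `key` to map each code back to its prompt.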

### Part 3: Run Your Fair Test

**Test each product with Prompt A, then Prompt B. Paste outputs and score.**

**Product 1: _______________**

*Prompt A output:*
> _______________________________________________

*Prompt B output:*
> _______________________________________________

| Dimension | Prompt A | Prompt B |
|-----------|---------|----------|
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| **Total** | ___/20 | ___/20 |

**Product 2: _______________**

*Prompt A output:*
> _______________________________________________

*Prompt B output:*
> _______________________________________________

| Dimension | Prompt A | Prompt B |
|-----------|---------|----------|
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| **Total** | ___/20 | ___/20 |

**Product 3: _______________**

*Prompt A output:*
> _______________________________________________

*Prompt B output:*
> _______________________________________________

| Dimension | Prompt A | Prompt B |
|-----------|---------|----------|
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| | ___/5 | ___/5 |
| **Total** | ___/20 | ___/20 |

### Fair Test Results

| Product | Prompt A | Prompt B |
|---------|---------|----------|
| Product 1 | ___/20 | ___/20 |
| Product 2 | ___/20 | ___/20 |
| Product 3 | ___/20 | ___/20 |
| **Total** | ___/60 | ___/60 |

**Does Prompt B still win in a fair test?** _______________________________________________

**What did the biases in the original test hide?** _______________________________________________

---

## Exercise 4: Workflow Architect — Hiring Pipeline (15 minutes)

Design a multi-step prompt workflow to evaluate 4 job candidates and produce a hiring recommendation.

### The Candidates

1. **Alex Chen** — 8 years Python/ML experience, built recommendation systems at scale. Quiet in interviews but code samples are excellent. Prefers remote work.
2. **Jordan Rivera** — 3 years experience, bootcamp graduate. Very articulate and enthusiastic presenter. Built a popular open-source CLI tool. Wants mentorship.
3. **Sam Patel** — 12 years full-stack experience, led teams of 10+. Strong opinions about architecture, sometimes clashes with peers. Deep distributed systems expertise.
4. **Morgan Kim** — 5 years experience, PhD in NLP. Published 4 papers on transformer architectures. Limited industry experience but strong theoretical foundation.

### Your Pipeline

1. **Fan-Out — Score:** Write a prompt that scores each candidate on 3 dimensions: Technical Skills (1-5), Communication (1-5), Culture Fit (1-5). Run it for all 4 candidates.
2. **Sequential — Rank:** Use the scores to rank candidates. Write a prompt to determine the ranking.
3. **Sequential — Select:** Write a prompt to select the top 2 finalists and explain why.
4. **Sequential — Interview Questions:** Write a prompt to generate 2 tailored interview questions per finalist, targeting their weakest dimension.
5. **Sequential — Recommendation Memo:** Write a prompt to produce the final hiring recommendation.
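
Some of the hand-offs in this pipeline are plain arithmetic, so you may prefer to compute them yourself and give the AI only the judgment calls. The sketch below totals the three dimension scores, ranks candidates, and finds each finalist's weakest dimension for Step 4. The function names and the alphabetical tie-break are assumptions, and the candidates and scores are made up, not suggested scores for the four people above.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, int]


def rank_candidates(scores: Dict[str, Scores]) -> List[Tuple[str, int]]:
    """Rank by total score, highest first; ties are broken alphabetically."""
    totals = [(name, sum(dims.values())) for name, dims in scores.items()]
    return sorted(totals, key=lambda pair: (-pair[1], pair[0]))


def weakest_dimension(dims: Scores) -> str:
    """Return the lowest-scoring dimension (the first one listed if tied)."""
    return min(dims, key=lambda d: dims[d])


# Made-up candidates and scores for illustration only.
scores = {
    "Candidate W": {"Technical Skills": 4, "Communication": 2, "Culture Fit": 3},
    "Candidate X": {"Technical Skills": 5, "Communication": 3, "Culture Fit": 4},
    "Candidate Y": {"Technical Skills": 3, "Communication": 5, "Culture Fit": 4},
}

ranking = rank_candidates(scores)
finalists = [name for name, _ in ranking[:2]]
for name in finalists:
    print(name, "-> weakest:", weakest_dimension(scores[name]))
```

A tie on total score, as between the two finalists here, is worth flagging to a human reviewer instead of leaving it to the tie-break rule.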

---

### Step 1: Scoring Prompt (Fan-Out)

**Your scoring prompt:**

> _______________________________________________
> _______________________________________________

**Run for each candidate and fill in the table:**

| Candidate | Technical Skills (1-5) | Communication (1-5) | Culture Fit (1-5) | Total (/15) |
|-----------|----------------------|--------------------|--------------------|-------------|
| Alex Chen | ___ | ___ | ___ | ___ |
| Jordan Rivera | ___ | ___ | ___ | ___ |
| Sam Patel | ___ | ___ | ___ | ___ |
| Morgan Kim | ___ | ___ | ___ | ___ |

### Step 2: Ranking Prompt

**Your ranking prompt** (include the scores table as input):

> _______________________________________________

**Paste the ranking output:**

> _______________________________________________

### Step 3: Selection Prompt

**Your selection prompt:**

> _______________________________________________

**Paste the selection output (top 2 finalists + reasoning):**

> _______________________________________________

### Step 4: Interview Questions Prompt

**Your interview questions prompt:**

> _______________________________________________

**Paste the interview questions:**

> Finalist 1: _______________
> - Q1: _______________________________________________
> - Q2: _______________________________________________
> 
> Finalist 2: _______________
> - Q1: _______________________________________________
> - Q2: _______________________________________________

### Step 5: Recommendation Memo Prompt

**Your memo prompt:**

> _______________________________________________

**Paste the final recommendation memo:**

> _______________________________________________
> _______________________________________________
> _______________________________________________

---

## Key Takeaways

### Prompt Engineering Best Practices Checklist

- [ ] **Use the CRAFT framework** for every important prompt
- [ ] **Break complex tasks into chains** — don't ask the AI to do everything at once
- [ ] **Inspect intermediate outputs** in chains before they cascade
- [ ] **Test prompts systematically** with multiple inputs, not just one
- [ ] **Watch for A/B test biases** — fair evaluation requires fair setup
- [ ] **Adapt prompts for your audience** — the same information needs different framing
- [ ] **Build a template library** — reuse what works instead of starting from scratch
- [ ] **Score prompts with a rubric** — but remember rubrics have blind spots too

### When to Use Each Workflow Pattern

| Pattern | Best For | Example |
|---------|---------|----------|
| **Sequential** | Multi-step processes with dependencies | Research → Outline → Draft → Edit |
| **Fan-Out** | Same task applied to many items | Scoring multiple candidates, analyzing multiple documents |
| **Iterative** | Progressive refinement | Draft → Improve → Polish |
| **Combined** | Complex real-world tasks | Fan-out scoring + sequential ranking |

---

## Final Reflection

Take 5 minutes to reflect on what you learned in this lab.

**1. What was the most surprising thing you learned about prompt chains?**

_______________________________________________

**2. Think of a task in your daily work that could benefit from a multi-step prompt workflow. Describe the steps:**

- Step 1: _______________________________________________
- Step 2: _______________________________________________
- Step 3: _______________________________________________
- Step 4: _______________________________________________

**3. What's one bias in prompt testing that you'll now watch for?**

_______________________________________________

**4. If you could only give one piece of prompt engineering advice to a colleague, what would it be?**

_______________________________________________

*Congratulations on completing Lab 2! You now have the skills to build sophisticated prompt workflows and test them systematically.*

---