<a href="https://colab.research.google.com/github/prishaarorain-collab/Prompt-Engineering/blob/main/Exercise3_Self_Reflection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 3 — Self-Reflection Prompt for Improving Output

## Goal
Ask Claude to:
1. Generate an **initial summary** of an article
2. **Critique** its own summary against explicit requirements
3. Produce an **improved revision** based on the critique

This demonstrates the **Critique-and-Revise** pattern in prompt engineering.
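
As a rough sketch of the control flow, the three steps above can be written as one function that accepts any text-generating callable. The `llm` parameter, the `critique_and_revise` name, and the `fake_llm` stub are hypothetical stand-ins so the example runs offline; the notebook itself calls the Anthropic API instead.

```python
from typing import Callable, Dict


def critique_and_revise(llm: Callable[[str], str], article: str, requirements: str) -> Dict[str, str]:
    """Run draft -> critique -> revision once and return all three texts."""
    # Step 1: initial summary
    draft = llm(f"Summarize this article:\n\n{article}")
    # Step 2: critique the draft against the explicit requirements
    critique = llm(
        f"ARTICLE:\n{article}\n\nSUMMARY:\n{draft}\n\n"
        f"REQUIREMENTS:\n{requirements}\n\nEvaluate the summary against each requirement."
    )
    # Step 3: revise the draft using the critique as the list of problems
    revision = llm(
        f"ARTICLE:\n{article}\n\nORIGINAL SUMMARY:\n{draft}\n\n"
        f"PROBLEMS TO FIX:\n{critique}\n\nWrite a new summary that fixes all problems."
    )
    return {"draft": draft, "critique": critique, "revision": revision}


def fake_llm(prompt: str) -> str:
    """Hypothetical offline stub: returns a canned reply based on the prompt type."""
    if prompt.startswith("Summarize"):
        return "A long first draft."
    if "Evaluate the summary" in prompt:
        return "Requirement 1: FAIL - too long."
    return "A shorter revision."


result = critique_and_revise(fake_llm, "Some article text.", "1. LENGTH: short.")
print(result["revision"])
```

Passing the model in as a callable keeps the pattern independent of any particular client, which also makes the flow easy to test without an API key.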

**Tool used:** Python + Anthropic Claude API

In [1]:
!pip install anthropic -q


In [2]:
import anthropic
import re

from google.colab import userdata
api_key = userdata.get('ANTHROPIC_API_KEY')

client = anthropic.Anthropic(api_key=api_key)

def call_claude(system_prompt: str, user_message: str, max_tokens: int = 512) -> str:
    """Helper: send a prompt to Claude and return the text response."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": user_message}],
        system=system_prompt
    )
    text = response.content[0].text.strip()
    # Strip markdown code fences if present (e.g. ```json ... ```)
    text = re.sub(r"^```[a-z]*\n?", "", text)
    text = re.sub(r"```$", "", text).strip()
    return text

print("Setup complete.")


Setup complete.


---
## Source Article
Use a factual excerpt about AI in healthcare as the document to summarize.

In [3]:
ARTICLE = """
Artificial intelligence is rapidly transforming healthcare delivery across the globe.
Machine learning algorithms can now detect certain cancers from medical imaging with accuracy
that matches or exceeds that of experienced radiologists. For example, Google's DeepMind
developed an AI system that identified over 50 eye diseases from retinal scans with 94%
accuracy. Similarly, AI tools are being deployed to predict patient deterioration in ICUs
hours before clinical signs appear, allowing early intervention.

However, widespread adoption faces significant barriers. Data privacy regulations such as
HIPAA in the US and GDPR in Europe restrict the sharing of patient data needed to train
robust models. Algorithmic bias remains a serious concern: models trained predominantly on
data from certain demographics may underperform for others, potentially widening health
disparities. Healthcare providers also cite high integration costs and the need for
clinician retraining as major obstacles.

Despite these challenges, investment in healthcare AI reached $6.6 billion in 2021 and is
projected to exceed $45 billion by 2026. Startups and established companies alike are
racing to bring AI-powered diagnostics, drug discovery tools, and administrative automation
to market. Regulatory bodies such as the FDA have begun approving AI-based medical devices,
with over 500 such approvals granted as of 2023.

Experts emphasize that AI should augment, not replace, clinical judgment. The most successful
implementations treat AI as a decision-support tool that flags anomalies and reduces
administrative burden, freeing clinicians to focus on patient care.
"""

# Clear, numbered requirements used for critique
SUMMARY_REQUIREMENTS = """
1. LENGTH: Between 80 and 100 words, inclusive. Count every word.
2. AUDIENCE: Written for non-technical healthcare administrators. No unexplained jargon.
3. STRUCTURE: Must include all three: (a) one benefit of AI in healthcare, (b) one challenge, (c) one statistic from the article.
4. TONE: Neutral and informative. No hype or alarmism.
5. ACCURACY: Every claim must be supported by the source article. Do not fabricate or combine examples inaccurately.
6. FORMAT: Plain prose only. No bullet points or headers.
"""

print("Article and requirements loaded.")


Article and requirements loaded.
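
Two of the requirements (length and format) can be checked without a model at all. The following is a minimal sketch of such a check, assuming that "plain prose" means no markdown headers and no bullet or numbered list markers at the start of a line; the function name `check_mechanical_requirements` and its return keys are illustrative and not part of the notebook.

```python
import re
from typing import Dict


def check_mechanical_requirements(summary: str, lo: int = 80, hi: int = 100) -> Dict[str, bool]:
    """Check the requirements that need no judgment: length (1) and plain-prose format (6)."""
    lines = summary.splitlines()
    # A markdown header is one to six '#' characters followed by whitespace
    has_header = any(re.match(r"\s*#{1,6}\s", line) for line in lines)
    # A list item starts with -, *, + or "N." followed by whitespace
    has_bullet = any(re.match(r"\s*([-*+]|\d+\.)\s", line) for line in lines)
    word_count = len(summary.split())
    return {
        "length_ok": lo <= word_count <= hi,
        "format_ok": not (has_header or has_bullet),
    }


sample = "## AI Transforming Healthcare: A Summary\n\nArtificial intelligence is changing care."
print(check_mechanical_requirements(sample))
```

Checks like these complement the model's critique rather than replace it: audience, tone, and accuracy still need a reader's judgment.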


---
## Step 1 — Generate Initial Summary (Before Reflection)

In [4]:
INITIAL_SUMMARY_SYSTEM = """
You are a content writer. Summarize the article below in a few sentences.
Keep it informative.
"""

initial_summary = call_claude(
    INITIAL_SUMMARY_SYSTEM,
    f"Summarize this article:\n\n{ARTICLE}",
    max_tokens=300
)

word_count_initial = len(initial_summary.split())

print("=" * 60)
print("INITIAL SUMMARY (BEFORE SELF-REFLECTION)")
print("=" * 60)
print(initial_summary)
print(f"\n[Word count: {word_count_initial}]")


INITIAL SUMMARY (BEFORE SELF-REFLECTION)
## AI Transforming Healthcare: A Summary

Artificial intelligence is revolutionizing healthcare by enabling early disease detection, predicting patient deterioration, and streamlining clinical workflows, with some AI systems matching or surpassing the diagnostic accuracy of experienced specialists. However, adoption faces major hurdles, including **data privacy regulations, algorithmic bias, high integration costs, and the need for clinician retraining**.

Despite these challenges, the sector is experiencing rapid growth, with healthcare AI investment expected to surge from $6.6 billion in 2021 to over $45 billion by 2026, backed by growing regulatory approval from bodies like the FDA. Experts stress that AI should serve as a **decision-support tool** to assist clinicians rather than replace their judgment, ultimately allowing healthcare professionals to focus more on patient care.

[Word count: 121]
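
Note that `len(text.split())` counts every whitespace-separated token, so a stray markdown marker such as `##` is counted as a word. Below is a minimal sketch of a slightly stricter counter, assuming a "word" is any token that contains at least one letter or digit. The helper name `count_words` is hypothetical and is not used elsewhere in the notebook.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


heading = "## AI Transforming Healthcare: A Summary"
# Naive count includes the '##' marker; the stricter count skips it
print(len(heading.split()), count_words(heading))
```

Whichever definition is chosen, using the same counter in every step keeps the reported numbers comparable.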


---
## Step 2 — Self-Critique Prompt

Claude is asked to evaluate the initial summary against explicit criteria and identify all gaps.
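
Because the critique is requested in a fixed `Requirement [N]: PASS/FAIL - [reason]` format, its verdicts can be read back programmatically. The sketch below assumes one verdict per line and tolerates optional markdown bold around the label, since models sometimes add it; `parse_critique` and `failed_requirements` are hypothetical helper names.

```python
import re
from typing import Dict, List

# Matches "Requirement 3: PASS - reason" and "**Requirement 3: PASS** - reason"
VERDICT_RE = re.compile(
    r"^\**Requirement[ \t]+(\d+):[ \t]*(PASS|FAIL)\**[ \t]*-[ \t]*(.*)$",
    re.MULTILINE,
)


def parse_critique(critique_text: str) -> Dict[int, Dict[str, str]]:
    """Map each requirement number to its verdict and one-line reason."""
    return {
        int(number): {"verdict": verdict, "reason": reason.strip()}
        for number, verdict, reason in VERDICT_RE.findall(critique_text)
    }


def failed_requirements(critique_text: str) -> List[int]:
    """Return the sorted numbers of all requirements marked FAIL."""
    parsed = parse_critique(critique_text)
    return sorted(n for n, item in parsed.items() if item["verdict"] == "FAIL")


sample_critique = (
    "I'll evaluate the summary.\n\n"
    "**Requirement 1: FAIL** - Too long.\n\n"
    "Requirement 2: PASS - Plain language.\n"
)
print(failed_requirements(sample_critique))
```

A parsed list of failures could be used to decide whether a revision step is needed at all, or to confirm that a later critique reports no remaining FAIL verdicts.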

In [5]:
CRITIQUE_SYSTEM = """
You are an editorial reviewer checking a summary against a list of requirements.
For each requirement, state PASS or FAIL and give a one-sentence reason.
Be specific and actionable. End with a list of the top changes needed.

Format each line as:
Requirement [N]: PASS/FAIL - [reason]
"""

critique = call_claude(
    CRITIQUE_SYSTEM,
    f"""ARTICLE:\n{ARTICLE}\n\nSUMMARY:\n{initial_summary}\n\nREQUIREMENTS:\n{SUMMARY_REQUIREMENTS}\n\nEvaluate the summary against each requirement.""",
    max_tokens=500
)

print("=" * 60)
print("SELF-CRITIQUE")
print("=" * 60)
print(critique)


SELF-CRITIQUE
I'll evaluate the summary against each requirement systematically.

**Requirement 1: FAIL** - The summary contains approximately 117 words, which exceeds the 80–100 word limit. A reduction of roughly 17–37 words is needed.

**Requirement 2: FAIL** - The term "algorithmic bias" is used without explanation, which may not be immediately clear to non-technical healthcare administrators. It should be briefly clarified (e.g., "bias in AI models that can disadvantage certain patient groups").

**Requirement 3: PASS** - The summary includes a benefit (early disease detection and predicting patient deterioration), a challenge (data privacy regulations, bias, costs), and a statistic ($6.6 billion in 2021 projected to $45 billion by 2026).

**Requirement 4: PASS** - The tone is measured and informative throughout, with no alarmist language or excessive enthusiasm. The word "revolutionizing" leans slightly promotional but is not disqualifying.

**Requirement 5: PASS** - All claims ar

---
## Step 3 — Generate Improved Summary (After Reflection)

In [6]:
REVISION_SYSTEM = """
You are rewriting a summary to fix specific problems.
Output ONLY the revised summary. No explanation, no preamble, no word count.

REQUIRED PHRASES - you must use these exact wordings:
- Say "matches or exceeds" when describing AI accuracy (not just "matches")
- Reference "medical imaging" specifically when describing disease detection
- Use the $6.6 billion statistic from 2021
"""

def get_revised_summary(base_critique, max_attempts=6):
    """Retry the revision until it is 80-100 words; otherwise return the closest attempt."""
    current_critique = base_critique
    best_result, best_wc, best_gap = "", 0, None
    for attempt in range(1, max_attempts + 1):
        prompt = f"""ARTICLE:
{ARTICLE}

REQUIREMENTS:
{SUMMARY_REQUIREMENTS}

ORIGINAL SUMMARY:
{initial_summary}

PROBLEMS TO FIX:
{current_critique}

Write a new summary that fixes all problems.
IMPORTANT: Count your words. It must be 80-100 words, no more and no fewer.
Output ONLY the summary.
"""
        result = call_claude(REVISION_SYSTEM, prompt, max_tokens=250)
        wc = len(result.split())
        print(f"Attempt {attempt}: {wc} words", "OK" if 80 <= wc <= 100 else "")
        if 80 <= wc <= 100:
            return result, wc
        # Build targeted feedback: how many words to cut or add
        if wc > 100:
            gap = wc - 100
            feedback = f"PREVIOUS VERSION HAD {wc} WORDS - that is {gap} words TOO LONG. Remove {gap} words."
        else:
            gap = 80 - wc
            feedback = f"PREVIOUS VERSION HAD {wc} WORDS - that is {gap} words TOO SHORT. Add {gap} words."
        current_critique = base_critique + "\n\n" + feedback
        # Keep the attempt closest to the target range, not merely the latest one
        if best_gap is None or gap < best_gap:
            best_result, best_wc, best_gap = result, wc, gap
    return best_result, best_wc

improved_summary, word_count_improved = get_revised_summary(critique)

print("=" * 60)
print("IMPROVED SUMMARY (AFTER SELF-REFLECTION)")
print("=" * 60)
print(improved_summary)
print(f"\n[Word count: {word_count_improved}]")
print(f"[Within 80-100: {'YES' if 80 <= word_count_improved <= 100 else 'NO'}]")


Attempt 1: 101 words 
Attempt 2: 95 words OK
IMPROVED SUMMARY (AFTER SELF-REFLECTION)
Artificial intelligence is improving healthcare by enabling earlier disease detection through medical imaging, with some AI systems achieving accuracy that matches or exceeds that of experienced specialists. However, adoption faces real hurdles, including data privacy regulations, high integration costs, and bias in AI models that can disadvantage certain patient groups. Despite these challenges, investment in healthcare AI reached $6.6 billion in 2021 and is projected to exceed $45 billion by 2026. Experts emphasize that AI should serve as a decision-support tool, helping clinicians identify problems and reduce administrative burden rather than replacing their professional judgment.

[Word count: 95]
[Within 80-100: YES]


---
## Final Comparison: Before vs After

In [7]:
print("=" * 60)
print("BEFORE (Initial Summary)")
print("=" * 60)
print(initial_summary)
print(f"[{word_count_initial} words]")

print()
print("=" * 60)
print("AFTER (Improved Summary)")
print("=" * 60)
print(improved_summary)
print(f"[{word_count_improved} words]")

print()
print("=" * 60)
print("IMPROVEMENT SUMMARY")
print("=" * 60)
print(f"Word count: {word_count_initial} words -> {word_count_improved} words (target: 80-100)")
print(f"Within required range: {'YES' if 80 <= word_count_improved <= 100 else 'NO'}")


BEFORE (Initial Summary)
## AI Transforming Healthcare: A Summary

Artificial intelligence is revolutionizing healthcare by enabling early disease detection, predicting patient deterioration, and streamlining clinical workflows, with some AI systems matching or surpassing the diagnostic accuracy of experienced specialists. However, adoption faces major hurdles, including **data privacy regulations, algorithmic bias, high integration costs, and the need for clinician retraining**.

Despite these challenges, the sector is experiencing rapid growth, with healthcare AI investment expected to surge from $6.6 billion in 2021 to over $45 billion by 2026, backed by growing regulatory approval from bodies like the FDA. Experts stress that AI should serve as a **decision-support tool** to assist clinicians rather than replace their judgment, ultimately allowing healthcare professionals to focus more on patient care.
[121 words]

AFTER (Improved Summary)
Artificial intelligence is improving healt

---
## Step 4 — Verify Improved Summary Against Requirements

In [8]:
# Count words precisely in Python first
actual_word_count = len(improved_summary.split())

final_critique = call_claude(
    CRITIQUE_SYSTEM,
    f"""ARTICLE:
{ARTICLE}

SUMMARY TO EVALUATE:
{improved_summary}

REQUIREMENTS:
{SUMMARY_REQUIREMENTS}

IMPORTANT: The summary has been programmatically counted and contains EXACTLY {actual_word_count} words.
Use this number for Requirement 1. Do not recount manually.

Evaluate the summary against each requirement.""",
    max_tokens=500
)

print("=" * 60)
print("FINAL VERIFICATION (improved summary)")
print("=" * 60)
print(final_critique)
print(f"\nWord count: {actual_word_count} | In range: {'YES' if 80 <= actual_word_count <= 100 else 'NO'}")


FINAL VERIFICATION (improved summary)
Requirement 1: PASS - The summary contains exactly 95 words, which falls within the required 80–100 word range.

Requirement 2: PASS - The summary uses plain, accessible language appropriate for non-technical healthcare administrators, with no unexplained jargon; terms like "decision-support tool" and "administrative burden" are self-explanatory in context.

Requirement 3: PASS - The summary includes (a) a benefit (earlier disease detection via medical imaging matching specialist accuracy), (b) a challenge (data privacy regulations, high integration costs, and algorithmic bias), and (c) a statistic ($6.6 billion investment in 2021 projected to exceed $45 billion by 2026).

Requirement 4: PASS - The tone is consistently neutral and informative, presenting both benefits and challenges without overstating AI's promise or exaggerating its risks.

Requirement 5: PASS - All claims are directly supported by the source article; the summary accurately refle

---
## Iteration Notes

**Prompt engineering lessons from this exercise:**

| Issue | Fix Applied |
|-------|-------------|
| Initial summary outside the target length | Added strict `80–100 word` constraint |
| Vague critique ("could be better") | Required numbered requirement-by-requirement format |
| Revision added preamble text | Added `Output ONLY the revised summary` constraint |
| Model's own word count was unreliable (critique estimated about 117 words; Python counted 121) | Counted words in Python and passed the exact number into the Step 4 prompt |
| No verification of improvement | Added Step 4 re-critique for confirmation |

**Key takeaway:** Explicit, numbered criteria in the critique prompt prevent vague feedback and
force measurable improvement between versions.