##### Copyright 2025 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Self-critique prompt optimization

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/prompting/Self_critique_prompt_optimization.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

Prompt engineering often involves manual trial and error. You write a prompt, evaluate the output, tweak the prompt, and repeat. This notebook demonstrates how to automate this process by having Gemini critique its own outputs and suggest prompt improvements.

This technique, sometimes called **meta-prompting** or **self-critique**, uses the model to:

1. Generate a response from an initial prompt
2. Critique the quality of that response
3. Identify specific weaknesses
4. Rewrite the prompt to address those weaknesses
5. Generate an improved response

By the end of this notebook, you will understand how to implement an iterative prompt optimization loop that can help you develop better prompts faster.

## Setup

### Install SDK

In [None]:
%pip install -U -q "google-genai>=1.0.0"

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [None]:
from google.colab import userdata
from google import genai

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
client = genai.Client(api_key=GOOGLE_API_KEY)

Select the model you want to use from the available options:

In [None]:
MODEL_ID = "gemini-2.5-flash"  # @param ["gemini-2.5-flash-lite", "gemini-2.5-flash", "gemini-2.5-pro", "gemini-2.5-flash-preview", "gemini-3-flash-preview", "gemini-3-pro-preview"] {"allow-input":true, isTemplate: true}

In [None]:
from IPython.display import Markdown, display

## The problem: weak prompts produce weak results

Consider a common scenario: you need the model to explain a technical concept, but your initial prompt is vague. The output might be generic, miss key details, or lack the structure you need.

Here's a deliberately weak prompt to demonstrate:

In [None]:
# Define the task context - this stays constant throughout optimization
task_description = "Explain how neural networks learn"

# Initial weak prompt - vague and lacks specificity
initial_prompt = "Explain how neural networks learn."

### Generate the initial response

In [None]:
initial_response = client.models.generate_content(
    model=MODEL_ID,
    contents=initial_prompt
)

print("=" * 60)
print("INITIAL PROMPT:")
print("=" * 60)
print(initial_prompt)
print("\n" + "=" * 60)
print("INITIAL RESPONSE:")
print("=" * 60)
display(Markdown(initial_response.text))
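Later steps feed `initial_response.text` into other prompts, so it can help to fail early when a response carries no text. The sketch below assumes that `.text` may be `None` or empty (for example, when output is blocked). The helper name `text_or_raise` is hypothetical, and the `SimpleNamespace` objects are stand-ins that mimic only the `.text` attribute, not real SDK responses.

```python
from types import SimpleNamespace


def text_or_raise(response, label="response"):
    """Return response.text, raising a clear error if it is missing or blank."""
    text = getattr(response, "text", None)
    if not text or not text.strip():
        raise ValueError(
            f"The {label} contained no text; check for blocked or empty output."
        )
    return text


# Stand-in objects (assumption): they only imitate the `.text` attribute.
ok = SimpleNamespace(text="Neural networks learn by adjusting weights.")
empty = SimpleNamespace(text=None)

print(text_or_raise(ok))
try:
    text_or_raise(empty, label="initial response")
except ValueError as err:
    print(f"Caught: {err}")
```

In the notebook, you could wrap each `generate_content` result with this helper before passing its text to the critique step.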

## Step 1: Critique the output

Now, ask the model to critically evaluate its own response. The critique prompt should ask for specific, actionable feedback.

In [None]:
def critique_response(task, prompt, response_text):
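    """
    Ask the model to critique a response and identify weaknesses.

    Args:
        task: Description of what the prompt should accomplish
        prompt: The prompt that produced the response
        response_text: The text of the response to critique

    Returns:
        The critique text (strengths, weaknesses, prompt issues, quality score)
    """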
    critique_prompt = f"""
You are a prompt engineering expert. Analyze the following prompt and its 
response, then provide a detailed critique.

TASK: {task}

PROMPT USED:
{prompt}

RESPONSE GENERATED:
{response_text}

Provide your critique in this format:

STRENGTHS:
- List what the response did well

WEAKNESSES:
- List specific problems with the response
- Focus on: clarity, completeness, structure, accuracy, relevance

PROMPT ISSUES:
- Identify what was missing or unclear in the original prompt
- Explain how prompt weaknesses led to response weaknesses

QUALITY SCORE: [1-10]
"""
    
    critique = client.models.generate_content(
        model=MODEL_ID,
        contents=critique_prompt
    )
    return critique.text

In [None]:
critique_1 = critique_response(task_description, initial_prompt, initial_response.text)

print("=" * 60)
print("CRITIQUE OF INITIAL RESPONSE:")
print("=" * 60)
display(Markdown(critique_1))
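The critique format ends with a `QUALITY SCORE` line, which makes it possible to track quality numerically across iterations. The following is a minimal sketch of pulling that number out of the critique text. The helper name `extract_quality_score` is hypothetical, and the pattern assumes the model follows the requested format closely (it tolerates Markdown bold markers and brackets, but returns `None` for anything it cannot read as an integer from 1 to 10).

```python
import re

# Matches "QUALITY SCORE" followed by punctuation/whitespace and then 1-2 digits.
_SCORE_PATTERN = re.compile(r"QUALITY\s+SCORE\W*?(\d{1,2})", re.IGNORECASE)


def extract_quality_score(critique_text):
    """Return the 1-10 score found in a critique, or None if none is found."""
    match = _SCORE_PATTERN.search(critique_text or "")
    if match is None:
        return None
    score = int(match.group(1))
    # Reject values outside the scale requested in the critique prompt.
    return score if 1 <= score <= 10 else None


# Sample strings written by hand for illustration, not real model output.
print(extract_quality_score("WEAKNESSES:\n- Too generic\n\nQUALITY SCORE: 6"))
print(extract_quality_score("**QUALITY SCORE:** 7/10"))
print(extract_quality_score("No score was given."))
```

Calling `extract_quality_score(critique_1)` would then give a number you can compare against later iterations.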

## Step 2: Rewrite the prompt

Based on the critique, ask the model to generate an improved prompt that addresses the identified weaknesses.

In [None]:
def rewrite_prompt(task, original_prompt, critique):
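    """
    Generate an improved prompt based on the critique.

    Args:
        task: Description of what the prompt should accomplish
        original_prompt: The prompt to improve
        critique: The critique text produced by critique_response

    Returns:
        The rewritten prompt, with surrounding whitespace removed
    """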
    rewrite_instruction = f"""
You are a prompt engineering expert. Based on the critique below, rewrite the 
prompt to address all identified weaknesses.

TASK: {task}

ORIGINAL PROMPT:
{original_prompt}

CRITIQUE:
{critique}

Write an improved prompt that:
1. Addresses all weaknesses mentioned in the critique
2. Is clear and specific about expectations
3. Includes relevant constraints (format, length, audience, etc.)
4. Guides the model toward a higher quality response

Return ONLY the improved prompt, nothing else. Do not include explanations 
or commentary.
"""
    
    result = client.models.generate_content(
        model=MODEL_ID,
        contents=rewrite_instruction
    )
    return result.text.strip()

In [None]:
improved_prompt_1 = rewrite_prompt(task_description, initial_prompt, critique_1)

print("=" * 60)
print("IMPROVED PROMPT (Iteration 1):")
print("=" * 60)
print(improved_prompt_1)
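Even when asked to return only the prompt, a model may wrap its answer in a Markdown code fence. The sketch below shows one way to unwrap it before reusing the text as a prompt. The helper name `clean_rewritten_prompt` is hypothetical, and it handles only the case where the whole output is a single fenced block; other stray commentary is left untouched.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal fence

# Whole string is: fence, optional language tag, newline, body, fence.
_FENCED = re.compile(FENCE + r"[\w-]*\n(.*?)\n?" + FENCE, re.DOTALL)


def clean_rewritten_prompt(text):
    """Strip whitespace and unwrap the text if it is one fenced code block."""
    cleaned = text.strip()
    match = _FENCED.fullmatch(cleaned)
    if match:
        cleaned = match.group(1).strip()
    return cleaned


# Hand-written sample that imitates a fenced answer (not real model output).
wrapped = f"{FENCE}text\nExplain backpropagation.\nUse bullet points.\n{FENCE}\n"
print(clean_rewritten_prompt(wrapped))
print(clean_rewritten_prompt("  Explain backpropagation.  "))
```

You could apply this to the return value of `rewrite_prompt` before generating the next response.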

## Step 3: Generate response with improved prompt

In [None]:
improved_response_1 = client.models.generate_content(
    model=MODEL_ID,
    contents=improved_prompt_1
)

print("=" * 60)
print("IMPROVED RESPONSE (Iteration 1):")
print("=" * 60)
display(Markdown(improved_response_1.text))

## Iteration 2: Further refinement

Run the critique-rewrite cycle again to see if additional improvements are possible.

In [None]:
critique_2 = critique_response(task_description, improved_prompt_1, improved_response_1.text)

print("=" * 60)
print("CRITIQUE (Iteration 2):")
print("=" * 60)
display(Markdown(critique_2))

In [None]:
improved_prompt_2 = rewrite_prompt(task_description, improved_prompt_1, critique_2)

print("=" * 60)
print("IMPROVED PROMPT (Iteration 2):")
print("=" * 60)
print(improved_prompt_2)

In [None]:
improved_response_2 = client.models.generate_content(
    model=MODEL_ID,
    contents=improved_prompt_2
)

print("=" * 60)
print("IMPROVED RESPONSE (Iteration 2):")
print("=" * 60)
display(Markdown(improved_response_2.text))

## Iteration 3: Final refinement

Run the cycle one more time to see whether the critique still surfaces meaningful weaknesses.
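Rather than fixing the number of iterations in advance, you could decide after each critique whether another round is worthwhile. The sketch below assumes you have collected one numeric quality score per iteration (for example, parsed from the `QUALITY SCORE` line of each critique). The function name `should_stop` and the `target` and `patience` parameters are hypothetical choices for illustration.

```python
def should_stop(scores, target=9, patience=1):
    """Decide whether to stop iterating, given one score per iteration.

    Stops when the latest score reaches `target`, or when the last
    `patience` scores failed to beat the best score seen before them.
    """
    if not scores:
        return False
    if scores[-1] >= target:
        return True
    if len(scores) <= patience:
        return False  # not enough history to judge a plateau
    best_before = max(scores[:-patience])
    return max(scores[-patience:]) <= best_before


# Illustrative score sequences, not measured results.
print(should_stop([5, 7]))     # still improving
print(should_stop([5, 7, 7]))  # plateau
print(should_stop([5, 9]))     # target reached
```

A larger `patience` tolerates a temporary dip in score before giving up, at the cost of extra model calls.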

In [None]:
critique_3 = critique_response(task_description, improved_prompt_2, improved_response_2.text)

print("=" * 60)
print("CRITIQUE (Iteration 3):")
print("=" * 60)
display(Markdown(critique_3))

In [None]:
improved_prompt_3 = rewrite_prompt(task_description, improved_prompt_2, critique_3)

print("=" * 60)
print("FINAL OPTIMIZED PROMPT (Iteration 3):")
print("=" * 60)
print(improved_prompt_3)

In [None]:
final_response = client.models.generate_content(
    model=MODEL_ID,
    contents=improved_prompt_3
)

print("=" * 60)
print("FINAL RESPONSE (Iteration 3):")
print("=" * 60)
display(Markdown(final_response.text))

## Compare: Before and after

Let's compare the prompt evolution and have the model evaluate the improvement.

In [None]:
print("=" * 60)
print("PROMPT EVOLUTION")
print("=" * 60)

print("\n[ORIGINAL PROMPT]")
print(initial_prompt)

print("\n" + "-" * 40)
print("\n[ITERATION 1]")
print(improved_prompt_1)

print("\n" + "-" * 40)
print("\n[ITERATION 2]")
print(improved_prompt_2)

print("\n" + "-" * 40)
print("\n[FINAL PROMPT]")
print(improved_prompt_3)

In [None]:
comparison_prompt = f"""
Compare these two responses to the task: "{task_description}"

RESPONSE A (from weak prompt):
{initial_response.text[:2000]}...

RESPONSE B (from optimized prompt):
{final_response.text[:2000]}...

Provide a brief comparison:
1. What specific improvements do you see in Response B?
2. Rate each response on a scale of 1-10
3. What made the optimized prompt more effective?
"""

comparison = client.models.generate_content(
    model=MODEL_ID,
    contents=comparison_prompt
)

print("=" * 60)
print("FINAL COMPARISON:")
print("=" * 60)
display(Markdown(comparison.text))
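Labeling one response as coming from the "weak" prompt may bias the judge toward the other one. One possible mitigation is a blind comparison with shuffled labels, sketched below. The helper names `truncate` and `build_blind_comparison` are hypothetical, and the wording of the comparison prompt is an assumption. The `truncate` helper also adds an ellipsis only when text was actually cut.

```python
import random


def truncate(text, limit=2000):
    """Shorten text to `limit` characters, adding "..." only when it was cut."""
    return text if len(text) <= limit else text[:limit] + "..."


def build_blind_comparison(task, initial_text, final_text, seed=None):
    """Return (prompt, labels), where labels maps "A"/"B" to "initial"/"final"."""
    rng = random.Random(seed)
    pairs = [("initial", initial_text), ("final", final_text)]
    rng.shuffle(pairs)  # hide which response came from which prompt
    labels = {"A": pairs[0][0], "B": pairs[1][0]}
    prompt = (
        f'Compare these two responses to the task: "{task}"\n\n'
        f"RESPONSE A:\n{truncate(pairs[0][1])}\n\n"
        f"RESPONSE B:\n{truncate(pairs[1][1])}\n\n"
        "Rate each response on a scale of 1-10 and briefly justify the ratings."
    )
    return prompt, labels


# Placeholder texts stand in for the real responses.
prompt, labels = build_blind_comparison(
    "Explain how neural networks learn",
    "First response text.",
    "Second response text.",
    seed=0,
)
print(prompt)
print(labels)
```

After the model answers, use `labels` to map its ratings for A and B back to the initial and final responses.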

## Bonus: Automated optimization loop

Here's a reusable function that combines all steps into a single optimization loop.

In [None]:
def optimize_prompt(task, initial_prompt, iterations=3, verbose=True):
    """
    Automatically optimize a prompt through iterative self-critique.
    
    Args:
        task: Description of what the prompt should accomplish
        initial_prompt: The starting prompt to optimize
        iterations: Number of critique-rewrite cycles
        verbose: Whether to print intermediate results
    
    Returns:
        Dictionary with optimization history and final results
    """
    history = {
        "prompts": [initial_prompt],
        "responses": [],
        "critiques": []
    }
    
    current_prompt = initial_prompt
    
    for i in range(iterations):
        if verbose:
            print(f"\n{'='*60}")
            print(f"ITERATION {i + 1}")
            print("=" * 60)
        
        # Generate response
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=current_prompt
        )
        history["responses"].append(response.text)
        
        # Critique
        critique = critique_response(task, current_prompt, response.text)
        history["critiques"].append(critique)
        
        if verbose:
            print(f"\nPrompt: {current_prompt[:100]}...")
            print(f"\nCritique summary: {critique[:200]}...")
        
        # Rewrite
        current_prompt = rewrite_prompt(task, current_prompt, critique)
        history["prompts"].append(current_prompt)
    
    # Generate final response with optimized prompt
    final_response = client.models.generate_content(
        model=MODEL_ID,
        contents=current_prompt
    )
    history["responses"].append(final_response.text)
    
    return {
        "initial_prompt": initial_prompt,
        "final_prompt": current_prompt,
        "initial_response": history["responses"][0],
        "final_response": final_response.text,
        "history": history
    }
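The loop above returns the last rewritten prompt, but the last one is not guaranteed to be the best. If you record a score for each evaluated prompt, you could pick the highest-scoring one instead. The sketch below assumes `scores[i]` is the score for `prompts[i]` (in `optimize_prompt`, critique `i` evaluates `history["prompts"][i]`, so the final rewritten prompt has no score yet). The helper name `select_best_prompt` is hypothetical.

```python
def select_best_prompt(prompts, scores):
    """Return (index, prompt, score) for the highest-scoring evaluated prompt.

    `scores[i]` is the score for `prompts[i]`; None entries are skipped.
    Ties keep the earliest prompt. Returns None if nothing was scored.
    """
    scored = [
        (score, index)
        for index, score in enumerate(scores)
        if score is not None and index < len(prompts)
    ]
    if not scored:
        return None
    # Highest score first; among equal scores, prefer the lowest index.
    best_score, best_index = max(scored, key=lambda pair: (pair[0], -pair[1]))
    return best_index, prompts[best_index], best_score


# Illustrative values: four prompts, three of them evaluated, one unparsed score.
prompts = ["prompt 0", "prompt 1", "prompt 2", "prompt 3"]
scores = [4, 8, None]
print(select_best_prompt(prompts, scores))
```

To include the final prompt in the selection, you would need one more critique call to score its response.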

### Try it with a different task

In [None]:
# Try optimizing a different weak prompt
result = optimize_prompt(
    task="Write a product description for a fitness tracker",
    initial_prompt="Write about a fitness tracker.",
    iterations=2,
    verbose=True
)

print("\n" + "=" * 60)
print("OPTIMIZATION COMPLETE")
print("=" * 60)
print(f"\nInitial prompt: {result['initial_prompt']}")
print(f"\nFinal prompt: {result['final_prompt']}")

## Key learnings

This self-critique approach reveals common prompt improvements:

1. **Specificity**: Vague prompts get vague responses, so the rewritten prompts typically add explicit requirements.

2. **Structure**: Optimized prompts often request specific formats (bullet points, sections, examples).

3. **Audience**: Defining the target audience helps calibrate complexity and tone.

4. **Constraints**: Adding length limits, focus areas, or exclusions improves relevance.

5. **Context**: Providing background information leads to more informed responses.

You can use this technique to:
- Rapidly iterate on prompts for production applications
- Learn what makes prompts effective for specific tasks
- Generate prompt templates for common use cases
- Debug why certain prompts underperform
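The five improvement types listed above can be captured in a small prompt template. The following is a minimal sketch, not the output of the optimization loop: the function name `build_prompt`, its parameters, and the section labels are all hypothetical choices for illustration.

```python
def build_prompt(task, audience=None, output_format=None, constraints=(), context=None):
    """Assemble a prompt from a task plus optional, labeled sections."""
    sections = [task.strip()]
    if context:
        sections.append(f"Context: {context.strip()}")
    if audience:
        sections.append(f"Audience: {audience.strip()}")
    if output_format:
        sections.append(f"Format: {output_format.strip()}")
    if constraints:
        bullet_lines = "\n".join(f"- {item}" for item in constraints)
        sections.append(f"Constraints:\n{bullet_lines}")
    return "\n\n".join(sections)


example = build_prompt(
    task="Explain how neural networks learn.",
    audience="Software engineers new to machine learning",
    output_format="Three short sections with one worked example",
    constraints=["Under 400 words", "Avoid heavy math notation"],
    context="This is for an internal onboarding guide.",
)
print(example)
```

Comparing a template like this with the prompts the loop produces can help you see which sections matter for your task.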

## Next steps

### Related prompting techniques

Explore other prompting examples in this repository:

- [Chain of thought prompting](./Chain_of_thought_prompting.ipynb) - Guide the model through reasoning steps
- [Few-shot prompting](./Few_shot_prompting.ipynb) - Provide examples to guide output format
- [Role prompting](./Role_prompting.ipynb) - Assign personas for specialized responses
- [Self-ask prompting](./Self_ask_prompting.ipynb) - Have the model decompose complex questions

### Useful API references

- [Prompt design guide](https://ai.google.dev/gemini-api/docs/prompting-intro)
- [System instructions](https://ai.google.dev/gemini-api/docs/system-instructions)
- [JSON mode for structured outputs](../json_capabilities/)