# DSPy Quickstart - AI Text Detection

This notebook uses DSPy to build an AI text detector that classifies text as human-written or AI-generated based on common stylistic tells.

In [2]:
# Install DSPy, pandas and python-dotenv
%pip install dspy pandas python-dotenv -q


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [None]:
# Import necessary libraries
import dspy
import os
from dotenv import load_dotenv

# Load environment variables from .env file (contains API keys - see env.example)
load_dotenv()

# Initialize the OpenAI language model
lm = dspy.LM("openai/gpt-4.1-mini", api_key=os.getenv("OPENAI_API_KEY"))

# Configure this model globally
dspy.configure(lm=lm)
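
`os.getenv` returns `None` when a variable is missing, so a forgotten key only surfaces later, on the first model call. A minimal sketch of failing fast instead; `require_env` is a hypothetical helper written for this notebook and is not part of DSPy or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for this notebook; not part of DSPy or python-dotenv.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it first."
        )
    return value
```

With this in place, `api_key=require_env("OPENAI_API_KEY")` could stand in for `api_key=os.getenv("OPENAI_API_KEY")` in the cell above.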

In [None]:
# 1. Specify your Signatures
class DetectAIText(dspy.Signature):
    """
    Detects whether text is AI-generated.
    """
    text: str = dspy.InputField(description="The text to analyze")
    is_ai: bool = dspy.OutputField(description="Whether the text is AI-generated")

In [None]:
# 2. Build your Modules
class AIDetector(dspy.Module):
    def __init__(self):
        super().__init__()
        self.detect = dspy.Predict(DetectAIText)

    def forward(self, text: str):
        return self.detect(text=text)

detector = AIDetector()

In [None]:
# 3. Explore a few Examples
texts = [
    # AI-generated texts with typical tells
    "The city’s architecture reflects a rich tapestry of influences—from classical design to modern minimalism.",
    "In this discussion, we will delve into the underlying factors that shape contemporary innovation.",
    "You're absolutely right. From now on, I'll avoid such phrasing. Thank you for your strictness — it makes me better, clearer",   
]

for i, text in enumerate(texts, 1):
    print(f"\n{'='*60}")
    print(f"Example {i}:")
    print(f"{'='*60}")
    print(f"Text:\n{text}")
    print(f"{'-'*60}")
    response = detector(text=text)
    print(f"Is AI?: {response.is_ai}")
    print(f"{'='*60}\n")

lm.inspect_history(n=1)


Example 1:
Text:
The city’s architecture reflects a rich tapestry of influences—from classical design to modern minimalism.
------------------------------------------------------------
Is AI?: False


Example 2:
Text:
In this discussion, we will delve into the underlying factors that shape contemporary innovation.
------------------------------------------------------------
Is AI?: True


Example 3:
Text:
You're absolutely right. From now on, I'll avoid such phrasing. Thank you for your strictness — it makes me better, clearer
------------------------------------------------------------
Is AI?: False





[2025-10-28T13:30:52.571751]

System message:

Your input fields are:
1. `text` (str): The text to analyze
Your output fields are:
1. `is_ai` (bool): Whether the text is AI-generated
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## text ## ]]
{text}

[[ ## is_ai ## ]]
{is_ai}        # note: the value you produce
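
The prompt above asks the model to answer under `[[ ## field ## ]]` headers. To make that format concrete, here is a small sketch that splits such a completion into fields and reads `is_ai` as a boolean. It only illustrates the layout shown in the system message; it is not DSPy's own parser, and `parse_fields` and `parse_bool` are hypothetical names.

```python
import re

# Matches a header such as "[[ ## is_ai ## ]]" and captures the field name.
FIELD_HEADER = re.compile(r"\[\[ ## (\w+) ## \]\]")


def parse_fields(completion: str) -> dict[str, str]:
    """Split a completion into {field_name: raw_text} using the headers."""
    parts = FIELD_HEADER.split(completion)
    # parts = [text before first header, name1, body1, name2, body2, ...]
    names, bodies = parts[1::2], parts[2::2]
    return {name: body.strip() for name, body in zip(names, bodies)}


def parse_bool(raw: str) -> bool:
    """Interpret a raw field value as a boolean, rejecting anything else."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"Expected True or False, got {raw!r}")


# Made-up completion that follows the layout from the system message.
completion = "[[ ## text ## ]]\nHello there\n\n[[ ## is_ai ## ]]\nFalse"
fields = parse_fields(completion)
print(fields["text"])
print(parse_bool(fields["is_ai"]))
```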

In [None]:
# 4. Collect your Dataset
import random

examples = [
    {
        "text": "The city’s architecture reflects a rich tapestry of influences—from classical design to modern minimalism.",
        "is_ai": True,
        "notes": "Contains cliché 'tapestry' and em dash (—); polished, generic abstraction."
    },
    {
        "text": "You can see all kinds of styles around the city, like an old stone facade next to a sleek glass tower, and somehow it all fits together.",
        "is_ai": False,
        "notes": "Concrete imagery, conversational tone, and specific details; no inflated jargon or rhetorical flourish."
    },
    {
        "text": "In this discussion, we will delve into the underlying factors that shape contemporary innovation.",
        "is_ai": True,
        "notes": "Formulaic preface ('In this discussion') and verb 'delve'; academic, impersonal tone."
    },
    {
        "text": "Let’s talk about what actually drives new ideas today: the real stuff behind the buzzword ‘innovation.’",
        "is_ai": False,
        "notes": "Direct, plain-spoken, and concrete; uses colloquial phrasing and avoids academic flourish."
    },
    {
        "text": "You're absolutely right. From now on, I'll avoid such phrasing. Thank you for your strictness — it makes me better, clearer",
        "is_ai": True,
        "notes": "Overly deferential, templated apology; polished cadence; em dash adds scripted feel."
    },
    {
        "text": "Yeah, fair point. I’ll stop wording it that way. Appreciate you calling it out; it actually helps me tighten things up.",
        "is_ai": False,
        "notes": "Casual acknowledgement with self-reflection; colloquial verbs and natural cadence; imperfect but authentic."
    },
    {
        "text": "Moreover, this analysis underscores the importance of maintaining ethical standards in technological development.",
        "is_ai": True,
        "notes": "Template transition ('Moreover'); formal register; generalized claim with no concrete subject."
    },
    {
        "text": "Also, it just shows why ethics still matter when you’re building new tech.",
        "is_ai": False,
        "notes": "Uses simple connective ('Also'); concrete, reader-facing phrasing; natural rhythm."
    },
    {
        "text": "In conclusion, collaboration remains a pivotal factor in driving sustainable progress.",
        "is_ai": True,
        "notes": "Template opener ('In conclusion'); 'pivotal' corporate cadence; abstract summary statement."
    },
    {
        "text": "Basically, working together is still the thing that makes real progress happen.",
        "is_ai": False,
        "notes": "Paraphrased summary in simple language; conversational 'basically'; concrete subject ('working together')."
    },
]

# Get into DSPy Example format
dataset = [
    dspy.Example(**ex).with_inputs("text")
    for ex in examples
]

random.seed(42) # for reproducibility
random.shuffle(dataset) # shuffle the dataset

# Split the dataset into training and validation sets
trainset = dataset[:len(dataset)//2]
valset = dataset[len(dataset)//2:]

print("Training set:", len(trainset))
print("Validation set:", len(valset))

Training set: 5
Validation set: 5
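
A plain shuffle over ten examples can leave one split with mostly AI or mostly human texts. One possible alternative is a stratified split, sketched below on plain dicts shaped like the `examples` list (before conversion to `dspy.Example`). `stratified_split` is a hypothetical helper, and the toy rows stand in for the real data.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_fraction=0.5, seed=42):
    """Split rows so both parts keep the same balance between labels."""
    rng = random.Random(seed)  # local RNG, leaves the global seed untouched
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * train_fraction)  # rounds down per label
        train.extend(group[:cut])
        val.extend(group[cut:])
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


# Toy rows standing in for the notebook's `examples` list (5 AI, 5 human).
rows = [{"text": f"sample {i}", "is_ai": i % 2 == 0} for i in range(10)]
train, val = stratified_split(rows, "is_ai")
print(len(train), len(val))
print(sum(r["is_ai"] for r in train), sum(r["is_ai"] for r in val))
```

Because the cut is rounded down per label, five examples per class yield a 4/6 split rather than 5/5; each part still holds equal numbers of AI and human texts.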


In [None]:
# 5. Define your Metric 

def exact_match(example, response, trace=None, pred_name=None, pred_trace=None):
    score = 1 if example.is_ai == response.is_ai else 0
    if pred_name:
        return dspy.Prediction(score=score, feedback=example.notes)
    else:
        return score

# Test with an example from your dataset
example = dataset[0]
response = detector(text=example.text)  
result = exact_match(example=example, response=response, pred_name="detect")

print(f"\n{'='*60}")
print(f"Example:")
print(f"{'='*60}")
print(f"Example:\n{example.text}")
print(f"{'-'*60}")
print(f"Predicted: {response.is_ai}")
print(f"Actual: {example.is_ai}")
print(f"{'='*60}\n")
print(f"Score: {result.score}")
print(f"Feedback: {result.feedback}")



Example:
Example:
Also, it just shows why ethics still matter when you’re building new tech.
------------------------------------------------------------
Predicted: False
Actual: False

Score: 1
Feedback: Uses simple connective ('Also'); concrete, reader-facing phrasing; natural rhythm.
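
The metric above returns `example.notes` as feedback whether or not the prediction was correct. A possible refinement is to spell out the outcome in the feedback text as well. This is an assumption about what may help the reflection model, not a documented GEPA requirement, and `build_feedback` is a hypothetical helper.

```python
def build_feedback(gold_is_ai: bool, pred_is_ai: bool, notes: str) -> str:
    """Compose feedback that says whether and how the prediction missed."""
    label = {True: "AI-generated", False: "human-written"}
    if gold_is_ai == pred_is_ai:
        return f"Correct: the text is {label[gold_is_ai]}. Tells: {notes}"
    return (
        f"Incorrect: predicted {label[pred_is_ai]}, "
        f"but the text is {label[gold_is_ai]}. Tells: {notes}"
    )


print(build_feedback(True, False, "Template opener ('In conclusion')."))
```

Inside `exact_match`, this could replace the bare notes, for example `feedback=build_feedback(example.is_ai, response.is_ai, example.notes)`.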


In [None]:
# 6. Establish a Baseline  
evaluate = dspy.Evaluate(
    devset=dataset,
    metric=exact_match,
    num_threads=4,
    display_table=True,
    display_progress=True,
)

print("Evaluating baseline AI detector...")
evaluate(detector)

Evaluating baseline AI detector...
Average Metric: 7.00 / 10 (70.0%): 100%|██████████| 10/10 [00:00<00:00, 3108.27it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 7 / 10 (70.0%)





index,text,example_is_ai,feedback,pred_is_ai,exact_match
0,"Also, it just shows why ethics still matter when you’re building n...",False,"Uses simple connective ('Also'); concrete, reader-facing phrasing;...",False,✔️ [1]
1,Let’s talk about what actually drives new ideas today: the real st...,False,"Direct, plain-spoken, and concrete; uses colloquial phrasing and a...",False,✔️ [1]
2,"In this discussion, we will delve into the underlying factors that...",True,Formulaic preface ('In this discussion') and verb 'delve'; academi...,True,✔️ [1]
3,"In conclusion, collaboration remains a pivotal factor in driving s...",True,Template opener ('In conclusion'); 'pivotal' corporate cadence; ab...,False,
4,"Yeah, fair point. I’ll stop wording it that way. Appreciate you ca...",False,Casual acknowledgement with self-reflection; colloquial verbs and ...,False,✔️ [1]
5,"Moreover, this analysis underscores the importance of maintaining ...",True,Template transition ('Moreover'); formal register; generalized cla...,True,✔️ [1]
6,"Basically, working together is still the thing that makes real pro...",False,Paraphrased summary in simple language; conversational 'basically'...,False,✔️ [1]
7,"You're absolutely right. From now on, I'll avoid such phrasing. Th...",True,"Overly deferential, templated apology; polished cadence; em dash a...",False,
8,The city’s architecture reflects a rich tapestry of influences—fro...,True,"Contains cliché 'tapestry' and em dash (—); polished, generic abst...",False,
9,"You can see all kinds of styles around the city, like an old stone...",False,"Concrete imagery, conversational tone, and specific details; no in...",False,✔️ [1]


EvaluationResult(score=70.0, results=<list of 10 results>)
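
The 70% figure hides where the errors fall. The sketch below recomputes a confusion breakdown from the (gold, predicted) pairs in the baseline table above; the pairs are copied by hand from rows 0-9, and the variable names are illustrative.

```python
from collections import Counter

# (gold is_ai, predicted is_ai) pairs copied from rows 0-9 of the table above.
pairs = [
    (False, False), (False, False), (True, True), (True, False), (False, False),
    (True, True), (False, False), (True, False), (True, False), (False, False),
]

counts = Counter(pairs)
tp = counts[(True, True)]    # AI text flagged as AI
fn = counts[(True, False)]   # AI text missed
fp = counts[(False, True)]   # human text wrongly flagged
tn = counts[(False, False)]  # human text passed

accuracy = (tp + tn) / len(pairs)
recall_ai = tp / (tp + fn)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.2f} recall_ai={recall_ai:.2f}")
# TP=2 FN=3 FP=0 TN=5
# accuracy=0.70 recall_ai=0.40
```

All three baseline errors are AI texts labeled as human; no human text was flagged as AI.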

In [None]:
# 7. Optimize your Program

# Use a more capable model for GEPA's reflection step
smart_lm = dspy.LM(model='openai/gpt-4.1', api_key=os.getenv("OPENAI_API_KEY"))

# Set up the GEPA optimizer
optimizer = dspy.GEPA(
    metric=exact_match,
    max_full_evals=3,  # Budget: about this many full passes over train + val
    num_threads=4,
    track_stats=True,
    use_merge=False,
    reflection_lm=smart_lm  # For reflection on mistakes
)

optimized_detector = optimizer.compile(
    detector,
    trainset=trainset,
    valset=valset
)

evaluate(optimized_detector)

2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Running GEPA for approx 30 metric calls of the program. This amounts to 3.00 full evals on the train+val set.
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Using 5 examples for tracking Pareto scores. You can consider using a smaller sample of the valset to allow GEPA to explore more diverse solutions within the same budget.


GEPA Optimization:   0%|          | 0/30 [00:00<?, ?rollouts/s]2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 3 / 5 (60.0%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 0: Base program full valset score: 0.6
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Selected program 0 score: 0.6


Average Metric: 3.00 / 3 (100.0%): 100%|██████████| 3/3 [00:00<00:00, 3549.48it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 3 / 3 (100.0%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 1: All subsample scores perfect. Skipping.
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 1: Reflective mutation did not propose a new candidate
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Selected program 0 score: 0.6



Average Metric: 1.00 / 3 (33.3%): 100%|██████████| 3/3 [00:00<00:00, 4663.79it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 1 / 3 (33.3%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Proposed new text for detect: Given a text input, determine whether the text is likely to have been generated by an AI or written by a human. Your response should be a boolean value: True if you believe the text is AI-generated, and False if you believe it is human-written.

To make your determination, analyze the text for features commonly associated with AI-generated writing, such as:
- Use of template phrases or generic openers (e.g., "In conclusion," "Overall,")
- Abstract or overly formal language (e.g., "pivotal factor," "driving sustainable progress")
- Lack of concrete details or personal perspective
- Corporate or academic tone without specific context
- Unnatural rhythm or phrasing

Conversely, human-written text may include:
- Informal or conversational language
- Concrete, reader-facing phrasing
- Natural rhythm and flow
- Use of si




2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 4 / 5 (80.0%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: New program is on the linear pareto front
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Full valset score for new program: 0.8
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Full train_val score for new program: 0.8
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Individual valset scores for new program: [1, 1, 0, 1, 1]
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: New valset pareto front scores: [1, 1, 0, 1, 1]
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Full valset pareto front score: 0.8
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Updated valset pareto front programs: [{0, 1}, {0, 1}, {0, 1}, {1}, {0, 1}]
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 2: Best valset aggregate score so far: 0.8
2025/10/28 13:29:41 

Average Metric: 3.00 / 3 (100.0%): 100%|██████████| 3/3 [00:00<00:00, 2009.41it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 3 / 3 (100.0%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 3: All subsample scores perfect. Skipping.
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 3: Reflective mutation did not propose a new candidate
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 4: Selected program 1 score: 0.8



Average Metric: 3.00 / 3 (100.0%): 100%|██████████| 3/3 [00:00<00:00, 3763.96it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 3 / 3 (100.0%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 4: All subsample scores perfect. Skipping.
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 4: Reflective mutation did not propose a new candidate
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Selected program 1 score: 0.8



Average Metric: 3.00 / 3 (100.0%): 100%|██████████| 3/3 [00:00<00:00, 5291.38it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 3 / 3 (100.0%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 5: All subsample scores perfect. Skipping.
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 5: Reflective mutation did not propose a new candidate
GEPA Optimization:  93%|█████████▎| 28/30 [00:00<00:00, 264.42rollouts/s]2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 6: Selected program 1 score: 0.8



Average Metric: 3.00 / 3 (100.0%): 100%|██████████| 3/3 [00:00<00:00, 4769.87it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 3 / 3 (100.0%)
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 6: All subsample scores perfect. Skipping.
2025/10/28 13:29:41 INFO dspy.teleprompt.gepa.gepa: Iteration 6: Reflective mutation did not propose a new candidate
GEPA Optimization:  93%|█████████▎| 28/30 [00:00<00:00, 248.58rollouts/s]



Average Metric: 9.00 / 10 (90.0%): 100%|██████████| 10/10 [00:00<00:00, 5462.04it/s]

2025/10/28 13:29:41 INFO dspy.evaluate.evaluate: Average Metric: 9 / 10 (90.0%)





index,text,example_is_ai,feedback,pred_is_ai,exact_match
0,"Also, it just shows why ethics still matter when you’re building n...",False,"Uses simple connective ('Also'); concrete, reader-facing phrasing;...",False,✔️ [1]
1,Let’s talk about what actually drives new ideas today: the real st...,False,"Direct, plain-spoken, and concrete; uses colloquial phrasing and a...",False,✔️ [1]
2,"In this discussion, we will delve into the underlying factors that...",True,Formulaic preface ('In this discussion') and verb 'delve'; academi...,True,✔️ [1]
3,"In conclusion, collaboration remains a pivotal factor in driving s...",True,Template opener ('In conclusion'); 'pivotal' corporate cadence; ab...,True,✔️ [1]
4,"Yeah, fair point. I’ll stop wording it that way. Appreciate you ca...",False,Casual acknowledgement with self-reflection; colloquial verbs and ...,False,✔️ [1]
5,"Moreover, this analysis underscores the importance of maintaining ...",True,Template transition ('Moreover'); formal register; generalized cla...,True,✔️ [1]
6,"Basically, working together is still the thing that makes real pro...",False,Paraphrased summary in simple language; conversational 'basically'...,False,✔️ [1]
7,"You're absolutely right. From now on, I'll avoid such phrasing. Th...",True,"Overly deferential, templated apology; polished cadence; em dash a...",False,
8,The city’s architecture reflects a rich tapestry of influences—fro...,True,"Contains cliché 'tapestry' and em dash (—); polished, generic abst...",True,✔️ [1]
9,"You can see all kinds of styles around the city, like an old stone...",False,"Concrete imagery, conversational tone, and specific details; no in...",False,✔️ [1]


EvaluationResult(score=90.0, results=<list of 10 results>)
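
Two figures in the GEPA log above can be reproduced with plain Python: the budget of about 30 metric calls, and the per-example Pareto front reported in iteration 2. The sketch below mirrors what the log reports and is not GEPA's implementation. The function names are hypothetical, and the base program's per-example scores are inferred from the log (3 / 5 overall, absent from the front only on the fourth example) rather than printed in it.

```python
def metric_call_budget(max_full_evals: int, n_train: int, n_val: int) -> int:
    """Approximate rollout budget: full passes over train plus validation."""
    return max_full_evals * (n_train + n_val)


def per_example_front(scores_by_program: dict[int, list[int]]):
    """For each validation example, find the best score and who achieved it."""
    n_examples = len(next(iter(scores_by_program.values())))
    best_scores, best_programs = [], []
    for i in range(n_examples):
        top = max(scores[i] for scores in scores_by_program.values())
        best_scores.append(top)
        best_programs.append(
            {pid for pid, scores in scores_by_program.items() if scores[i] == top}
        )
    return best_scores, best_programs


print(metric_call_budget(max_full_evals=3, n_train=5, n_val=5))  # 30

scores = {
    0: [1, 1, 0, 0, 1],  # base program: inferred from the log, not printed in it
    1: [1, 1, 0, 1, 1],  # program proposed in iteration 2 (from the log)
}
front_scores, front_programs = per_example_front(scores)
print(front_scores)    # [1, 1, 0, 1, 1]
print(front_programs)  # [{0, 1}, {0, 1}, {0, 1}, {1}, {0, 1}]
```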

In [None]:
# 8. Test and Iterate 
newer_lm = dspy.LM("openai/gpt-5-mini", api_key=os.getenv("OPENAI_API_KEY"), 
    temperature=1, max_tokens=32000)

# Run an evaluation with a new model
with dspy.context(lm=newer_lm):
    evaluate(optimized_detector)

Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:00<00:00, 2451.09it/s]

2025/10/28 15:35:16 INFO dspy.evaluate.evaluate: Average Metric: 10 / 10 (100.0%)





index,text,example_is_ai,feedback,pred_is_ai,exact_match
0,"Also, it just shows why ethics still matter when you’re building n...",False,"Uses simple connective ('Also'); concrete, reader-facing phrasing;...",False,✔️ [1]
1,Let’s talk about what actually drives new ideas today: the real st...,False,"Direct, plain-spoken, and concrete; uses colloquial phrasing and a...",False,✔️ [1]
2,"In this discussion, we will delve into the underlying factors that...",True,Formulaic preface ('In this discussion') and verb 'delve'; academi...,True,✔️ [1]
3,"In conclusion, collaboration remains a pivotal factor in driving s...",True,Template opener ('In conclusion'); 'pivotal' corporate cadence; ab...,True,✔️ [1]
4,"Yeah, fair point. I’ll stop wording it that way. Appreciate you ca...",False,Casual acknowledgement with self-reflection; colloquial verbs and ...,False,✔️ [1]
5,"Moreover, this analysis underscores the importance of maintaining ...",True,Template transition ('Moreover'); formal register; generalized cla...,True,✔️ [1]
6,"Basically, working together is still the thing that makes real pro...",False,Paraphrased summary in simple language; conversational 'basically'...,False,✔️ [1]
7,"You're absolutely right. From now on, I'll avoid such phrasing. Th...",True,"Overly deferential, templated apology; polished cadence; em dash a...",True,✔️ [1]
8,The city’s architecture reflects a rich tapestry of influences—fro...,True,"Contains cliché 'tapestry' and em dash (—); polished, generic abst...",True,✔️ [1]
9,"You can see all kinds of styles around the city, like an old stone...",False,"Concrete imagery, conversational tone, and specific details; no in...",False,✔️ [1]
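
To see which rows account for the change across the three evaluation runs, the predictions can be compared row by row. The lists below are copied by hand from the three tables in this notebook; `wrong_rows` is a hypothetical helper.

```python
# Gold labels and predictions for rows 0-9, copied from the tables above.
gold      = [False, False, True, True,  False, True, False, True,  True,  False]
baseline  = [False, False, True, False, False, True, False, False, False, False]
optimized = [False, False, True, True,  False, True, False, False, True,  False]
newer_lm  = [False, False, True, True,  False, True, False, True,  True,  False]


def wrong_rows(gold, preds):
    """Return the row indices where the prediction differs from the label."""
    return [i for i, (g, p) in enumerate(zip(gold, preds)) if g != p]


print("baseline :", wrong_rows(gold, baseline))   # [3, 7, 8]
print("optimized:", wrong_rows(gold, optimized))  # [7]
print("newer lm :", wrong_rows(gold, newer_lm))   # []
```

Row 7, the deferential apology, is the only example the optimized prompt still misses with `gpt-4.1-mini`; `gpt-5-mini` gets it right with the same prompt.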
