# Optimizing Prompts with **Automatic Prompt Engineer** (APE)

This notebook demonstrates how to use Automatic Prompt Engineer (APE) (arxiv link) to optimize prompts for text generation. In its simplest form, APE takes two inputs: a dataset (a list of inputs and a list of outputs) and a prompt template. It then searches for the instruction that, once inserted into the template, makes the model generate the given outputs from the given inputs.

APE accomplishes this in three steps. First, it uses a language model to generate a set of candidate prompts. Next, it scores each candidate with a prompt evaluation function. Finally, it returns the prompt with the highest evaluation score.
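
The generate, score, and select loop can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `automatic_prompt_engineer` implementation. The language model is stubbed out as a function that returns a list of completions. The wording of the generation query only loosely follows the "forward generation" idea from the APE paper, and its exact text is an assumption. `build_generation_query` and `ape_search` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: an LM call is modelled as a plain function that maps a
# text query to a list of sampled completions. Swap in a real client here.
Generator = Callable[[str], List[str]]
Scorer = Callable[[str, Sequence[str], Sequence[str]], float]


def build_generation_query(inputs: Sequence[str], outputs: Sequence[str]) -> str:
    """Format input/output demos into a query asking the LM to infer the instruction.

    The wording is an assumption modelled loosely on the APE paper's
    forward-generation template, not the library's exact text.
    """
    demos = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in zip(inputs, outputs))
    return (
        "I gave a friend an instruction. Based on the instruction they produced "
        "the following input-output pairs:\n\n"
        f"{demos}\n\nThe instruction was to"
    )


def ape_search(
    inputs: Sequence[str],
    outputs: Sequence[str],
    generate: Generator,
    score: Scorer,
) -> Tuple[str, float]:
    """Generate candidates, deduplicate them, score each one, and return the best."""
    # Step 1: propose candidate instructions from the demonstrations.
    candidates = generate(build_generation_query(inputs, outputs))
    # Deduplicate while preserving the original order.
    candidates = list(dict.fromkeys(c.strip() for c in candidates))
    # Step 2: score every candidate on the dataset.
    scored = [(c, score(c, inputs, outputs)) for c in candidates]
    # Step 3: return the highest-scoring (prompt, score) pair.
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins for a real LM and a real evaluation function.
def fake_generate(query: str) -> List[str]:
    return ["write the antonym of the word", "repeat the word", "write the antonym of the word"]


def fake_score(prompt: str, xs: Sequence[str], ys: Sequence[str]) -> float:
    return 1.0 if "antonym" in prompt else 0.0


best, best_score = ape_search(["sane"], ["insane"], fake_generate, fake_score)
print(best, best_score)  # write the antonym of the word 1.0
```

Passing the generator and scorer in as functions keeps the search loop independent of any particular model client or metric.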

In [1]:
# First, let's define a simple dataset consisting of words and their antonyms.
words = ["sane", "direct", "informally", "unpopular", "subtractive", "nonresidential",
    "inexact", "uptown", "incomparable", "powerful", "gaseous", "evenly", "formality",
    "deliberately", "off"]
antonyms = ["insane", "indirect", "formally", "popular", "additive", "residential",
    "exact", "downtown", "comparable", "powerless", "solid", "unevenly", "informality",
    "accidentally", "on"]

In [2]:
# Next, define the evaluation template. [PROMPT], [INPUT] and [OUTPUT] are placeholders
# for the candidate instruction, a dataset input, and its expected output.

eval_template = \
"""Instruction: [PROMPT]
Input: [INPUT]
Output: [OUTPUT]"""
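
To see what a single evaluation query looks like, the sketch below fills the placeholders for one dataset pair. It assumes plain string substitution; the library's internal mechanism may differ. `fill_template` is a hypothetical helper and is not part of `automatic_prompt_engineer`.

```python
eval_template = """Instruction: [PROMPT]
Input: [INPUT]
Output: [OUTPUT]"""


def fill_template(template: str, prompt: str, x: str, y: str) -> str:
    """Hypothetical helper: substitute the three placeholders by plain string replacement."""
    return (
        template.replace("[PROMPT]", prompt)
        .replace("[INPUT]", x)
        .replace("[OUTPUT]", y)
    )


query = fill_template(eval_template, "Write an antonym to the following word.", "sane", "insane")
print(query)
# Instruction: Write an antonym to the following word.
# Input: sane
# Output: insane
```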

In [3]:
# Now, let's use APE to find prompts that generate antonyms for each word.
from automatic_prompt_engineer import ape

results = ape.eape(
    dataset=(words, antonyms),
    eval_template=eval_template,
)

Generating prompts...
[GPT_forward] Generating 50 completions, split into 1 batches of size 5000


100%|██████████| 1/1 [00:01<00:00,  1.24s/it]


Model returned 50 prompts. Deduplicating...
Deduplicated to 50 prompts.
First 3 prompts: Output: pythony
Output: informality
Output: get into his dream

Evaluating prompts... ea
ea


Evaluating prompts: 100%|██████████| 5/5 [00:00<00:00, 2520.62it/s]


50 50 50 !!!!!
child Output: informals original purpose Output: avert itity
mutate Output: informals original purpose Output: avert itity
child Output: informality Output: use their hands to hit them
mutate Output: informality Output: use their hands to hit them
child Output: pythony Output: instruct a friend to send a flier.
mutate Output: pythony Output: instruct a friend to send a flier.
child Output: get into his dream Output: use their hands to hit them
mutate Output: get into his dream Output: use their hands to hit them
child Output: get into his dream Output: informal
mutate Output: get into his dream Output: informal
50 50 50 !!!!!
child Output: unassemble Output: sane
mutate Output: unassemble Output: sane
child Output: to be ht a friend to send a flier. Output: instruconest
mutate Output: to be ht a friend to send a flier. Output: instruconest
child Output: unassemble Output: pythony
mutate Output: unassemble Output: pythony
child Output: go out there with no one Output: pyt

Let's compare these results with a prompt written by a human:

"*Write an antonym to the following word.*"
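
A concrete metric makes it easier to see what "comparing" two prompts means. This notebook does not show which metric `ape.simple_eval` uses. The sketch below assumes exact-match accuracy purely for illustration. `predict` stands in for "query the model with the filled-in template and read back its completion", and `prefix_guess` is a toy rule-based predictor rather than a language model.

```python
from typing import Callable, Sequence


def exact_match_accuracy(
    predict: Callable[[str], str],
    inputs: Sequence[str],
    outputs: Sequence[str],
) -> float:
    """Fraction of inputs for which predict(x) matches the reference output exactly.

    Comparison ignores surrounding whitespace and letter case.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if not inputs:
        raise ValueError("dataset is empty")
    hits = sum(
        predict(x).strip().lower() == y.strip().lower() for x, y in zip(inputs, outputs)
    )
    return hits / len(inputs)


# Toy predictor: strip an "un"/"in" prefix if present, otherwise add "in".
def prefix_guess(word: str) -> str:
    for prefix in ("un", "in"):
        if word.startswith(prefix):
            return word[len(prefix):]
    return "in" + word


acc = exact_match_accuracy(
    prefix_guess,
    ["sane", "unpopular", "inexact", "powerful"],
    ["insane", "popular", "exact", "powerless"],
)
print(acc)  # 0.75
```

A prefix rule gets three of these four words right but fails on "powerful", which is why a prompt has to be scored across the whole dataset rather than on a single example.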

In [9]:
from automatic_prompt_engineer import ape

manual_prompt = "Write an antonym to the following word."

human_result = ape.simple_eval(
    dataset=(words, antonyms),
    eval_template=eval_template,
    prompts=[manual_prompt],
)

ea


Evaluating prompts:   0%|          | 0/5 [00:00<?, ?it/s]

1 1 1 !!!!!

IndexError: list index out of range

In [None]:
print(human_result)