Your prompts, evolved. A CLI tool that uses LLMs to automatically improve your prompts β inspired by the Automatic Prompt Engineer (APE) paper.
"Large Language Models Are Human-Level Prompt Engineers" β Zhou et al., ICLR 2023
ape takes a prompt you've written, generates multiple improved variants using an LLM, scores each variant on clarity, specificity, and effectiveness, then ranks them best-to-worst. It's the APE paper's generate β score β select pipeline in your terminal.
Three stages, two API calls:
- 𧬠Generate β An LLM produces N improved variants of your prompt (high temperature for diversity)
- π Score β The same LLM judges each variant on 3 criteria (low temperature for consistency)
- π Rank β Results sorted by overall score, displayed with color-coded ratings
The original APE paper (Zhou et al., ICLR 2023) framed prompt optimization as natural language program synthesis β a black-box optimization problem where LLMs both propose candidate instructions and evaluate them. APE achieved human-level or better performance on 24/24 Instruction Induction tasks and famously discovered "Let's work this out in a step by step way to be sure we have the right answer" β beating the hand-crafted "Let's think step by step" CoT prompt.
This CLI implements a simplified version of that pipeline, adapted for interactive use. Instead of execution accuracy on test pairs, we use multi-criteria LLM-as-judge scoring β making it work for any prompt, not just those with known answers.
This tool is designed to be invoked by AI agents as a prompt improvement skill. When an AI receives a user's prompt, it can shell out to improve it before responding:
# AI agent calls this, parses JSON, uses the top-ranked variant
echo "summarize this article for me" | ape --json --count 3The --json flag outputs machine-readable results that an AI can parse to:
- Pick the highest-scoring variant as the improved prompt
- Present multiple options to the user for selection
- Explain why the improved prompt is better using the scoring criteria
Pipe-in, pipe-out, zero interactivity β a composable Unix tool for AI agent workflows.
Requires Bun runtime.
# Run directly without installing
bunx ape-cli --help
# Or install globally
bun install -g ape-cli
# Or clone and link locally
git clone https://github.com/kouryuu/ape-cli.git && cd ape-cli
bun install
bun linkNow you can use ape anywhere.
# Basic usage β pipe in a prompt
echo "Summarize this article" | ape
# Read from a file
cat prompt.txt | ape
# Generate 3 variants using Claude
echo "Write a poem about space" | ape --provider claude --count 3
# JSON output for scripting / AI skill usage
echo "Explain quantum computing" | ape --json
# Use with OpenRouter
echo "Help me debug" | ape --base-url https://openrouter.ai/api/v1 --model openai/gpt-4o
# Use with local Ollama
echo "Translate this" | ape --base-url http://localhost:11434/v1 --model llama3
# Extract just the top prompt
echo "My prompt" | ape --json | jq -r '.[0].text'| Flag | Default | Description |
|---|---|---|
--provider |
auto-detect | openai or claude. Auto-detected from env vars. |
--model |
gpt-4o-mini / claude-sonnet-4-20250514 |
Model for generation and scoring |
--base-url |
provider default | Override API base URL |
--count |
5 |
Number of variants to generate |
--json |
false |
Machine-readable JSON output |
--help |
β | Show usage |
| Variable | Provider | Description |
|---|---|---|
OPENAI_API_KEY |
openai |
OpenAI (or compatible) API key |
ANTHROPIC_API_KEY |
claude |
Anthropic Claude API key |
Auto-detection: If no --provider is specified, the CLI checks which key is set. If both are set, defaults to openai.
| Provider | --provider |
--base-url |
Notes |
|---|---|---|---|
| OpenAI | openai |
(default) | Uses OPENAI_API_KEY |
| Claude | claude |
(default) | Uses ANTHROPIC_API_KEY |
| OpenRouter | openai |
https://openrouter.ai/api/v1 |
Set OPENAI_API_KEY to your OpenRouter key |
| Ollama | openai |
http://localhost:11434/v1 |
No API key needed (set dummy value) |
| Any OpenAI-compatible | openai |
<your-url> |
Works with any compatible endpoint |
β¦ Prompt Improver β 5 variants generated and scored
#1 ββββββββββββββββββββ 8.7/10
β You are an expert summarizer. Given the following article, produce a...
β
β Clarity: 9 Specificity: 8 Effectiveness: 9
β Feedback: Added role assignment and output format constraints
β Changes: Assigned expert role, added structured output requirement
ββββ
[
{
"rank": 1,
"text": "You are an expert summarizer...",
"reasoning": "Added role assignment and structured output",
"scores": { "clarity": 9, "specificity": 8, "effectiveness": 9 },
"overall": 8.7,
"feedback": "Clear role and format guidance"
}
]bun testTests use mocked LLM providers β no API key required.
This tool is inspired by:
@inproceedings{zhou2023large,
title={Large Language Models are Human-Level Prompt Engineers},
author={Zhou, Yongchao and Muresanu, Andrei Ioan and Han, Ziwen and Paster, Keiran and Pitis, Silviu and Chan, Harris and Ba, Jimmy},
booktitle={ICLR},
year={2023}
}This CLI has zero runtime dependencies. Everything is built on Bun built-ins:
fetchfor API callsBun.stdinfor inputBun.argvfor arg parsing- Raw ANSI escape codes for colors
Dev dependencies (typescript, @types/bun) are only used for type-checking.
MIT Β© 2025 Rodrigo Reyes
Built with π by an ape who believes your prompts deserve better.