llm-diff

git diff for prompts — compare LLM responses across prompt versions.

See token count changes, cost deltas, latency shifts, and a word-level diff of the actual responses — all in one command.

$ llm-diff --a prompt-v1.txt --b prompt-v2.txt --model gpt-4o

llm-diff  openai/gpt-4o

  tokens          312 →        289   -23  (-7.4%)
    input          45 →         38   -7
    output        267 →        251   -16
  cost       $0.0041 →    $0.0038   -$0.0003  (-7.3%)
  latency      1247ms →      943ms   -304ms  (-24.4%)

--- prompt A
+++ prompt B
  The capital of France is Paris.
- It is located in northern France and has a population of approximately 2.1 million people...
+ Paris, with ~2.1M residents, serves as the political and cultural center of the country...

Install

npx llm-diff --a v1.txt --b v2.txt --model gpt-4o

Or install globally:

npm install -g llm-diff

Quick start

1. Set your API key:

export OPENAI_API_KEY=sk-...
# or ANTHROPIC_API_KEY, GEMINI_API_KEY, GROQ_API_KEY

2. Compare two prompts:

# From files
llm-diff --a prompt-v1.txt --b prompt-v2.txt --model gpt-4o

# Inline text
llm-diff -a "Explain gravity" -b "Explain gravity to a child" -m gpt-4o-mini

# With a system prompt
llm-diff -a v1.txt -b v2.txt -m claude-sonnet-4-20250514 -s "You are a science teacher"

Usage

llm-diff --a <prompt-a> --b <prompt-b> --model <model> [options]

Required

| Flag | Description |
| --- | --- |
| `--a, -a` | Prompt A: file path or inline text |
| `--b, -b` | Prompt B: file path or inline text |
| `--model, -m` | Model name (see `--models` for full list) |

Options

| Flag | Default | Description |
| --- | --- | --- |
| `--system, -s` | - | System prompt (file path or inline text) |
| `--base-url` | - | Gateway URL override |
| `--max-tokens` | 2048 | Max output tokens |
| `--temperature` | 0 | Temperature |
| `--timeout` | 60000 | Request timeout (ms) |
| `--runs` | 1 | Number of runs to average |
| `--no-parallel` | - | Run A and B sequentially |
| `--full` | - | Show full inline diff with highlighting |
| `--json` | - | JSON output for scripting |
| `--models` | - | List supported models and pricing |

Supported models

llm-diff --models

OpenAI

gpt-4o · gpt-4o-mini · gpt-4-turbo · gpt-4 · gpt-3.5-turbo · o1 · o1-mini · o3-mini

Anthropic

claude-sonnet-4-20250514 · claude-3.5-haiku · claude-3-opus

Google Gemini

gemini-2.0-flash · gemini-2.0-pro · gemini-1.5-pro · gemini-1.5-flash

Groq

llama-3.3-70b · llama-3.1-8b · mixtral-8x7b · gemma2-9b

Gateway support

Route requests through a custom gateway (like llmhut) instead of direct API calls:

llm-diff --a v1.txt --b v2.txt -m gpt-4o --base-url https://gw.llmhut.com/v1

The gateway handles authentication, so you don't need provider-specific API keys.

Averaging multiple runs

LLM responses vary. Average over multiple runs for stable comparisons:

llm-diff --a v1.txt --b v2.txt -m gpt-4o --runs 5

Token counts and latency are averaged. The last response text is used for the diff.
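
As a rough illustration of that averaging rule, the sketch below averages the numeric metrics and keeps the text of the final run. The per-run shape (`totalTokens`, `latencyMs`, `text`) and the `averageRuns` helper are assumptions made for this example, not llm-diff's internal code.

```javascript
// Hypothetical per-run shape: { totalTokens, latencyMs, text }.
// Field names are assumptions for illustration only.
function averageRuns(runs) {
  if (runs.length === 0) {
    throw new Error('at least one run is required');
  }
  // Arithmetic mean of one numeric field across all runs.
  const mean = (key) =>
    runs.reduce((sum, run) => sum + run[key], 0) / runs.length;

  return {
    totalTokens: mean('totalTokens'),
    latencyMs: mean('latencyMs'),
    // The diff uses the text of the final run; texts are not merged.
    text: runs[runs.length - 1].text,
  };
}

const sampleRuns = [
  { totalTokens: 310, latencyMs: 1200, text: 'first response' },
  { totalTokens: 314, latencyMs: 1300, text: 'second response' },
  { totalTokens: 312, latencyMs: 1250, text: 'third response' },
];

const averaged = averageRuns(sampleRuns);
console.log(averaged);
```

Because only the last response is diffed, the text diff can still vary between invocations even when the averaged metrics are stable.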

JSON output

Pipe results into scripts, dashboards, or eval pipelines:

llm-diff --a v1.txt --b v2.txt -m gpt-4o --json | jq '.delta'

{
  "totalTokens": -23,
  "totalTokensPct": -7.4,
  "cost": -0.000293,
  "costPct": -7.1,
  "latencyMs": -304,
  "latencyPct": -24.4
}
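
One way a script might consume this output is as a regression gate in CI. The sketch below is a minimal example under assumptions: it relies only on the top-level `delta` key implied by the `jq '.delta'` filter and on the fields shown above, and the thresholds and the `findRegressions` helper are hypothetical.

```javascript
// Sample values copied from the delta object above, wrapped in a top-level
// "delta" key. No other keys of the JSON output are assumed.
const sampleOutput = JSON.stringify({
  delta: {
    totalTokens: -23,
    totalTokensPct: -7.4,
    cost: -0.000293,
    costPct: -7.1,
    latencyMs: -304,
    latencyPct: -24.4,
  },
});

// Hypothetical thresholds: maximum allowed increase, in percent.
const limits = { totalTokensPct: 5, costPct: 5, latencyPct: 20 };

// Returns the names of metrics whose percentage change exceeds its limit.
function findRegressions(jsonText, maxIncreasePct) {
  const { delta } = JSON.parse(jsonText);
  return Object.keys(maxIncreasePct).filter(
    (metric) => delta[metric] > maxIncreasePct[metric]
  );
}

const regressions = findRegressions(sampleOutput, limits);
console.log(
  regressions.length === 0
    ? 'within budget'
    : `regressed: ${regressions.join(', ')}`
);
```

In a real pipeline the JSON text would come from the command's standard output rather than a hard-coded sample, and a non-empty result would typically set a non-zero exit code.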

Programmatic API

import { runDiff } from 'llm-diff';

const result = await runDiff({
  promptA: 'Explain gravity',
  promptB: 'Explain gravity to a 5-year-old',
  model: 'gpt-4o-mini',
});

console.log(result.delta);
// { totalTokens: -23, cost: -0.0003, latencyMs: -304, ... }

How it works

1. Resolves the model → provider, pricing, API adapter
2. Reads prompt A and B (from files or inline text)
3. Fires both requests in parallel (or sequentially with `--no-parallel`)
4. Collects token counts, cost, and latency from the API response
5. Computes deltas between A and B
6. Generates a word-level diff of the response text
7. Renders everything to the terminal (or as JSON)
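
The delta and word-level diff steps above can be sketched as follows. This is an illustrative approximation, not the tool's actual implementation: the `computeDelta` and `wordDiff` helpers are hypothetical, the percentage is assumed to be relative to prompt A and rounded to one decimal place, and the diff uses a plain longest-common-subsequence algorithm, which may differ from what llm-diff does internally.

```javascript
// Maps each metric to the name of its percentage field (names taken from
// the JSON output example; the mapping itself is an assumption).
const PCT_KEYS = { totalTokens: 'totalTokensPct', latencyMs: 'latencyPct' };

// Absolute and percentage change from A to B for each metric.
function computeDelta(a, b) {
  const delta = {};
  for (const [key, pctKey] of Object.entries(PCT_KEYS)) {
    const change = b[key] - a[key];
    delta[key] = change;
    // Percent change relative to A, rounded to one decimal place.
    delta[pctKey] =
      a[key] === 0 ? null : Math.round((change / a[key]) * 1000) / 10;
  }
  return delta;
}

// Word-level diff based on the longest common subsequence (LCS).
function wordDiff(textA, textB) {
  const a = textA.split(/\s+/).filter(Boolean);
  const b = textB.split(/\s+/).filter(Boolean);

  // lcs[i][j] = LCS length of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both word lists, emitting kept (' '), removed ('-'), added ('+').
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ op: ' ', word: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ op: '-', word: a[i++] });
    } else {
      ops.push({ op: '+', word: b[j++] });
    }
  }
  while (i < a.length) ops.push({ op: '-', word: a[i++] });
  while (j < b.length) ops.push({ op: '+', word: b[j++] });
  return ops;
}

// Metrics taken from the sample terminal output at the top of this README.
const delta = computeDelta(
  { totalTokens: 312, latencyMs: 1247 },
  { totalTokens: 289, latencyMs: 943 }
);
console.log(delta.totalTokensPct, delta.latencyPct); // -7.4 -24.4

const ops = wordDiff('the quick fox', 'the slow fox');
console.log(ops.map((o) => o.op + o.word).join('\n'));
```

The LCS table costs time and memory proportional to the product of the two word counts, which is acceptable for responses capped by `--max-tokens`.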

Roadmap

- Eval pipeline integration (named experiments, history)
- Side-by-side diff view
- Cross-model comparison (`--model-a gpt-4o --model-b claude-sonnet-4-20250514`)
- HTML report output
- Config file support (`.llm-diff.json`)
- Streaming output with live token counting
- Mistral, Cohere, Together AI providers

Contributing

See CONTRIBUTING.md.

License

Apache License — see LICENSE.
