Merged
2 changes: 1 addition & 1 deletion src/langsmith/evaluation-concepts.mdx
@@ -110,7 +110,7 @@ Learn about [how to define an LLM-as-a-judge evaluator](/langsmith/llm-as-judge)

### Pairwise

-Pairwise evaluators allow you to compare the outputs of two versions of an application. Think [LMSYS Chatbot Arena](https://chat.lmsys.org/) - this is the same concept, but applied to AI applications more generally, not just models! This can use either a heuristic ("which response is longer"), an LLM (with a specific pairwise prompt), or human (asking them to manually annotate examples).
+Pairwise evaluators allow you to compare the outputs of two versions of an application. This can use either a heuristic ("which response is longer"), an LLM (with a specific pairwise prompt), or human (asking them to manually annotate examples).

**When should you use pairwise evaluation?**
