diff --git a/src/langsmith/evaluation-concepts.mdx b/src/langsmith/evaluation-concepts.mdx
index 1406365d65..4f9dd8b235 100644
--- a/src/langsmith/evaluation-concepts.mdx
+++ b/src/langsmith/evaluation-concepts.mdx
@@ -110,7 +110,7 @@ Learn about [how to define an LLM-as-a-judge evaluator](/langsmith/llm-as-judge)
 
 ### Pairwise
 
-Pairwise evaluators allow you to compare the outputs of two versions of an application. Think [LMSYS Chatbot Arena](https://chat.lmsys.org/) - this is the same concept, but applied to AI applications more generally, not just models! This can use either a heuristic ("which response is longer"), an LLM (with a specific pairwise prompt), or human (asking them to manually annotate examples).
+Pairwise evaluators allow you to compare the outputs of two versions of an application. This can use a heuristic ("which response is longer?"), an LLM (with a specific pairwise prompt), or a human (asking them to manually annotate examples).
 
 **When should you use pairwise evaluation?**