From 90c323f55763692f852bae87130b1431a31a85a8 Mon Sep 17 00:00:00 2001
From: Cleo Schneider
Date: Wed, 1 May 2024 21:51:23 +0000
Subject: [PATCH 1/2] Remove preview language from evals docs and expand supported evaluators

---
 docs/evaluation.md | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/docs/evaluation.md b/docs/evaluation.md
index 5f1b13c0cc..a1c63b1c8a 100644
--- a/docs/evaluation.md
+++ b/docs/evaluation.md
@@ -1,6 +1,4 @@
-# Evaluation (Preview)
-
-Note: **Evaluation in Firebase Genkit is currently in early preview** with a limited set of available evaluation metrics. You can try out the current experience by following the documentation below. If you run into any issues or have suggestions for improvements, please [file an issue](http://github.com/google/genkit/issues). We would love to see your feedback as we refine the evaluation experience!
+# Evaluation
 
 Evaluations are a form of testing which helps you validate your LLM’s responses
 and ensure they meet your quality bar.
of your LLM-powered applications. Genkit tooling helps you automatically extract
 
 For example, if you have a RAG flow, Genkit will extract the set of documents
 that was returned by the retriever so that you can evaluate the
-quality of your retriever while it runs in the context of the flow as shown below with the RAGAS faithfulness and answer relevancy metrics:
+quality of your retriever while it runs in the context of the flow as shown below with the Genkit faithfulness and answer relevancy metrics:
 
 ```js
 import { GenkitMetric, genkitEval } from '@genkit-ai/evaluator';
 
 export default configureGenkit({
   plugins: [
     genkitEval({
       judge: geminiPro,
       metrics: [GenkitMetric.FAITHFULNESS, GenkitMetric.ANSWER_RELEVANCY],
       embedder: textEmbeddingGecko, // GenkitMetric.ANSWER_RELEVANCY requires an embedder
     }),
   ],
   // ...
 });
 ```
 
-We only support a small number of evaluators to help developers get started that are inspired by [RAGAS](https://docs.ragas.io/en/latest/index.html) metrics including: Faithfulness, Answer Relevancy, and Maliciousness.
 
 Start by defining a set of inputs that you want to use as an input dataset called `testQuestions.json`. This input dataset represents the test cases you will use to generate output for evaluation.
 
 ```json
...
genkit eval:flow bobQA --input testQuestions.json --output eval-result.json
 
 Note: Below you can see an example of how an LLM can help you generate the test cases.
 
+## Supported Evaluator Plugins
+
+### Genkit Eval
+
+We have created a small number of native evaluators to help developers get started that are inspired by [RAGAS](https://docs.ragas.io/en/latest/index.html) metrics including:
+
+- Faithfulness
+- Answer Relevancy
+- Maliciousness
+
+### VertexAI Rapid Evaluators
+
+We support a handful of VertexAI Rapid Evaluators via the [VertexAI Plugin](/docs/plugins/vertex-ai#evaluation).
+
+### Langchain Evaluators
+
+Firebase Genkit supports [Langchain Criteria Evaluation](https://python.langchain.com/docs/guides/productionization/evaluation/string/criteria_eval_chain/) via the Langchain Plugin.
+
 ## Advanced use
 
 `eval:flow` is a convenient way quickly evaluate the flow, but sometimes you

From 32109f7a509b1b400471bb14ebb8726d1fcc3312 Mon Sep 17 00:00:00 2001
From: Cleo Schneider
Date: Thu, 2 May 2024 11:55:46 +0000
Subject: [PATCH 2/2] Incorporating feedback

---
 docs/evaluation.md | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/docs/evaluation.md b/docs/evaluation.md
index a1c63b1c8a..e326ab876e 100644
--- a/docs/evaluation.md
+++ b/docs/evaluation.md
@@ -53,23 +53,22 @@ genkit eval:flow bobQA --input testQuestions.json --output eval-result.json
 
 Note: Below you can see an example of how an LLM can help you generate the test cases.
 
-## Supported Evaluator Plugins
+## Supported evaluators
 
-### Genkit Eval
+### Genkit evaluators
 
-We have created a small number of native evaluators to help developers get started that are inspired by [RAGAS](https://docs.ragas.io/en/latest/index.html) metrics including:
+Genkit includes a small number of native evaluators, inspired by RAGAS, to help you get started:
 
 - Faithfulness
 - Answer Relevancy
 - Maliciousness
 
-### VertexAI Rapid Evaluators
+### Evaluator plugins
 
-We support a handful of VertexAI Rapid Evaluators via the [VertexAI Plugin](/docs/plugins/vertex-ai#evaluation).
+Genkit supports additional evaluators through plugins:
 
-### Langchain Evaluators
-
-Firebase Genkit supports [Langchain Criteria Evaluation](https://python.langchain.com/docs/guides/productionization/evaluation/string/criteria_eval_chain/) via the Langchain Plugin.
+- VertexAI Rapid Evaluators via the [VertexAI Plugin](plugins/vertex-ai#evaluation).
+- [LangChain Criteria Evaluation](https://python.langchain.com/docs/guides/productionization/evaluation/string/criteria_eval_chain/) via the [LangChain plugin](plugins/langchain.md).
 
 ## Advanced use
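
The contents of the `testQuestions.json` dataset referenced in the docs are elided in the diff above. As a purely illustrative sketch (not part of either patch), such an input dataset might look like the following, assuming the `bobQA` flow accepts a plain question string; the questions themselves are made up:

```json
[
  "How old is Bob?",
  "Where does Bob live?",
  "What are Bob's hobbies?"
]
```

Running `genkit eval:flow bobQA --input testQuestions.json --output eval-result.json`, as shown in the patched docs, would then execute the flow for each input, score the responses with the configured metrics, and write the results to `eval-result.json`.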