Merged
docs/evaluate/index.md: 19 changes (9 additions, 10 deletions)
@@ -72,12 +72,10 @@ This approach involves creating individual test files, each representing a singl
through the right path to generate the final response.
- `Final Response`: The expected final response from the agent.

You can give the file any name, for example `evaluation.test.json`. The framework only checks for the `.test.json` suffix; the preceding part of the filename is not constrained. The test files are backed by a formal Pydantic data model. The two key schema files are
[Eval Set](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py) and
[Eval Case](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_case.py).
Here is a test file with a few examples:

*(Note: Comments are included for explanatory purposes and should be removed for the JSON to be valid.)*
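As a rough illustration of the two rules above (only the `.test.json` suffix matters, and the explanatory comments must be stripped before the JSON is valid), the following Python sketch checks a filename and removes comments before parsing. The helper names `is_test_file` and `parse_commented_json` are hypothetical and not part of ADK. The sketch assumes the comments use `//` line syntax and does not handle `/* */` block comments.

```python
import json
from pathlib import Path
from typing import Any

# Hypothetical constant: the only part of the filename the framework checks.
TEST_FILE_SUFFIX = ".test.json"


def is_test_file(path: str) -> bool:
    """Return True if the file name ends with `.test.json`; any prefix is allowed."""
    return Path(path).name.endswith(TEST_FILE_SUFFIX)


def strip_line_comments(text: str) -> str:
    """Remove `//` comments that appear outside JSON string literals (assumed comment style)."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False  # this character was escaped, so it cannot end the string
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_commented_json(text: str) -> Any:
    """Strip `//` comments, then parse with the standard json module."""
    return json.loads(strip_line_comments(text))
```

Calling `json.loads` directly on the commented text raises `json.JSONDecodeError`, because standard JSON has no comment syntax; this is why the comments have to be removed. Tracking whether the scanner is inside a string keeps URLs such as `http://...` in string values intact.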

@@ -154,12 +152,13 @@ The evalset approach utilizes a dedicated dataset called an "evalset" for evalua

An evalset file contains multiple "evals," each representing a distinct session. Each eval consists of one or more "turns," which include the user query, expected tool use, expected intermediate agent responses, and a reference response. These fields have the same meaning as they do in the test file approach. Each eval is identified by a unique name. Furthermore, each eval includes an associated initial session state.
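To make the evalset structure described above concrete, here is a simplified, stdlib-only model of it. It is not ADK's `EvalSet`/`EvalCase` Pydantic schema: the class and field names (`Turn`, `Eval`, `EvalSetSketch`, `user_query`, `initial_session_state`, and so on) are hypothetical and simply mirror the prose. It enforces the two constraints stated here: each eval has at least one turn, and eval names are unique within the set.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Turn:
    """One user/agent exchange. Hypothetical field names that mirror the prose."""
    user_query: str
    expected_tool_use: list[dict[str, Any]] = field(default_factory=list)
    expected_intermediate_responses: list[str] = field(default_factory=list)
    reference_response: str = ""


@dataclass
class Eval:
    """One eval, i.e. one distinct session, identified by a unique name."""
    name: str
    turns: list[Turn]
    initial_session_state: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # An eval consists of one or more turns.
        if not self.turns:
            raise ValueError(f"eval {self.name!r} must have at least one turn")


@dataclass
class EvalSetSketch:
    """A collection of evals whose names must be unique within the set."""
    evals: list[Eval]

    def __post_init__(self) -> None:
        seen: set[str] = set()
        for ev in self.evals:
            if ev.name in seen:
                raise ValueError(f"duplicate eval name: {ev.name!r}")
            seen.add(ev.name)
```

Running the checks in `__post_init__` keeps validation next to the data; a Pydantic model would express the same constraints declaratively with field types and validators.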

Creating evalsets manually can be complex, so UI tools are provided to help capture relevant sessions and convert them into evals within your evalset. Learn more about using the web UI for evaluation below. The eval set files are backed by a formal Pydantic data model. The two key schema files are
[Eval Set](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py) and
[Eval Case](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_case.py).
Here is an example evalset containing two sessions:

!!! warning
    This evalset evaluation method requires the use of a paid service,
    [Vertex Gen AI Evaluation Service API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/evaluation).

*(Note: Comments are included for explanatory purposes and should be removed for the JSON to be valid.)*
