Description
- I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
When running evaluate() with the answer_correctness metric, the evaluation fails with an OutputParserException caused by a pydantic.ValidationError.
The error occurs because the statement_generator_prompt inside answer_correctness returns JSON with a statements field (e.g., {"statements": [...]}), while the output parser validates against a model (StringIO) that expects a single text field. This mismatch makes the parser raise a "Field required" error for text.
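For illustration, the mismatch can be reproduced in isolation with pydantic alone; the StringIO model below is a stand-in mirroring the parser's schema, not the actual ragas class:

```python
from pydantic import BaseModel, ValidationError

class StringIO(BaseModel):  # stand-in for the parser's expected schema
    text: str

# the shape that statement_generator_prompt actually returns
llm_output = {"statements": ["The answer mentions X.", "The answer claims Y."]}

try:
    StringIO.model_validate(llm_output)
except ValidationError as e:
    print(e)  # 1 validation error for StringIO: text -> Field required
```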
Ragas version: v0.2.15
Python version: 3.9.x
Code to Reproduce (pseudocode; the full reproduction code contains proprietary information and cannot be shared)
```python
# Minimal sketch of the failing run (LLM/embeddings redacted; any
# langchain-compatible provider reproduces the failure)
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness

# prepare a minimal dataset
dataset = Dataset.from_dict({
    "question": ["..."],      # list[str]
    "answer": ["..."],        # list[str]
    "ground_truth": ["..."],  # list[str]
})

# (depending on environment) prepare LLM and embeddings
llm = ...         # any langchain-compatible LLM, e.g. a local stub or provider-backed
embeddings = ...  # any langchain-compatible embeddings, shape compatible with ragas

# run evaluation
score = evaluate(
    dataset,
    metrics=[answer_correctness],
    llm=llm,
    embeddings=embeddings,
)

# observe failure: OutputParserException bubbles up from
# statement_generator_prompt -> PydanticOutputParser(StringIO[text])
# -> ValidationError (missing "text")
print(score.to_pandas())
```
Error trace

```
OutputParserException: Failed to parse StringIO from completion {"statements": ["..."]}.
Got: 1 validation error for StringIO
text
  Field required [type=missing, input_value={'statements': [...]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
```
Expected behavior
The statement_generator_prompt output should parse successfully instead of raising a ValidationError; the parser schema and the prompt's output format should be aligned, as sketched below.
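A hedged sketch of what an aligned schema could look like (the class name here is illustrative, not the actual ragas internal); a parser model with a statements field validates the prompt's documented output cleanly:

```python
from typing import List
from pydantic import BaseModel

class StatementGeneratorOutput(BaseModel):  # illustrative name, not the ragas class
    statements: List[str]

# the prompt's actual output now parses without error
parsed = StatementGeneratorOutput.model_validate({"statements": ["..."]})
print(parsed.statements)
```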
Additional context
The statement_generator_prompt examples in answer_correctness.get_prompts() show an expected model with a statements: List[str] field, but the parser is still configured for a StringIO model with a single text field. This mismatch causes the evaluation to fail when the LLM outputs match the examples but not the parser schema.
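For reference, this is how the prompts were inspected; get_prompts() is the call mentioned above, though the exact attributes of the returned prompt objects may vary by ragas version:

```python
from ragas.metrics import answer_correctness

# list the metric's prompts and their output models, where exposed
for name, prompt in answer_correctness.get_prompts().items():
    output_model = getattr(prompt, "output_model", None)  # attribute name may vary
    print(name, type(prompt).__name__, output_model)
```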