docs: test based eval documentation #916
The PR description has been updated. Please fill out the template for your PR to be reviewed.
planetf1
left a comment
A few minor suggestions, but I think one syntax correction is needed, since users will hopefully follow the docs.
Co-authored-by: Nigel Jones <nigel.l.jones+git@gmail.com>
One other thing I noticed (not touched by this PR, but in the same file): we don't explain 'verdict', at least not in terms of its content. A reader might think it's a boolean. We refer to a boolean in the docs when talking about LLM-as-a-judge, but that uses a different pattern, with conversion. Here I think it could be anything, whatever the model returns. That may be worth clarifying? Just saying it's the raw LLM output?
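To make the distinction concrete, here is a minimal sketch of the kind of conversion a reader might otherwise assume happens automatically. It assumes the verdict arrives as a free-form string; `verdict_to_bool` and its accepted token sets are hypothetical and not part of Mellea:

```python
from typing import Optional

# Hypothetical helper, not part of Mellea: maps a raw judge verdict string
# onto a boolean, or None when the text is not a recognizable yes/no token.
_TRUE_TOKENS = {"true", "yes", "pass", "1"}
_FALSE_TOKENS = {"false", "no", "fail", "0"}


def verdict_to_bool(raw: str) -> Optional[bool]:
    # Trim whitespace, then surrounding punctuation and quotes, then lowercase.
    token = raw.strip().strip(".!\"'").lower()
    if token in _TRUE_TOKENS:
        return True
    if token in _FALSE_TOKENS:
        return False
    # Free-form output (e.g. an explanation) cannot be treated as a boolean.
    return None
```

Returning `None` for unrecognized text, rather than defaulting to `False`, keeps a verbose judge response from being silently counted as a failure.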
One structural suggestion: the three-level table at the top sets up a useful mental model, but A small addition could help — a fourth row in the table:
And a bridging sentence after the table: "For levels 1–3, use pytest with the patterns below. For semantic evaluation against reference examples — where you want a judge model to score your model's outputs in bulk — see The |
The Next steps section currently points to the Requirements System and Handling Exceptions. Worth adding a link to Evaluate with LLM-as-a-Judge here; it covers the
The default judge prompt in the Jinja template does use a boolean value. This could be adjusted, of course.
Added this |
I tested the code example end-to-end in a fresh project ( Bug 1 —
208ca9b
Misc PR
Type of PR
Description
We are pleased to see our TestBasedEval contribution covered in the Mellea documentation. We have made some adjustments to clarify the functionality and advantages of using Generative Unit Tests via LLM-as-a-Judge:
Testing
Attribution