✨ add groundtruth-based scoring and evaluation prompt #151
Conversation
Introduced groundtruth/compliance metrics and new evaluation prompt file.
Pull Request Overview
This PR introduces groundtruth-based scoring in the evaluation prompt and extends test data structures to include groundtruth information.
- Add a new prompt file (use_groundtruth_rules.prompty) that defines evaluation guidelines incorporating a groundtruth value.
- Update testrun.mts and resolvers.mts to propagate groundtruth and groundtruthModel properties.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/prompts/use_groundtruth_rules.prompty | New evaluation prompt file defining groundtruth-based scoring and compliance checks |
| src/genaisrc/src/testrun.mts | Added optional groundtruth and groundtruthModel fields to the test result structure |
| src/genaisrc/src/resolvers.mts | Extended resolver arguments to include groundtruth and groundtruthModel properties |
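For orientation, here is a minimal sketch of how the test result structure in testrun.mts might look after the change. The type name and any fields beyond scenario, input, output, groundtruth, and groundtruthModel are assumptions for illustration, not the project's actual definitions.

```ts
// Hypothetical sketch only; field set beyond the PR description is assumed.
export interface TestResultSketch {
    scenario?: string
    input: string
    output: string
    // New optional fields propagated by this PR:
    groundtruth?: string // expected answer used for groundtruth-based scoring
    groundtruthModel?: string // model that produced the groundtruth value
}
```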
Improved test result consistency, fixed a prompt file error, and cleaned up command args.
Standardizes metric key separator with METRIC_SEPARATOR and adds a test script.
src/genaisrc/promptpex.genai.mts (outdated)
    scenario,
    input,
    output,
    groundtruth: groundtruth,
groundtruth,
If the variable and the field have the same name, you can just write the variable and TypeScript's shorthand property syntax does the rest.
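For illustration, a minimal sketch of the shorthand the reviewer is pointing at; the surrounding object and values are hypothetical.

```ts
const groundtruth = "expected output"

// Explicit form repeats the name:
const explicit = { scenario: "s1", groundtruth: groundtruth }

// Shorthand form: when the property name matches the variable name,
// TypeScript/JavaScript lets you write it once.
const shorthand = { scenario: "s1", groundtruth }
```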
    dbg(`evaluating ${metrics.length} metrics with eval model(s) %O`, eModel)
    for (const metric of metrics) {
-       const key = metricName(metric)+"|em|"+eModel
+       const key = metricName(metric)+METRIC_SEPARATOR+eModel
run the linter
Why not use JSON to serialize the key?
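A sketch of what the reviewer may mean, reusing the metric/model names from the diff above; the separator value and helper signatures here are assumptions for illustration.

```ts
// Separator-based key: compact, but ambiguous if a metric name or model id
// ever contains the separator string (the value below is assumed).
const METRIC_SEPARATOR = "|em|"
const separatorKey = (metric: string, eModel: string) =>
    metric + METRIC_SEPARATOR + eModel

// JSON-based key: unambiguous for any characters and trivially reversible.
const jsonKey = (metric: string, eModel: string) =>
    JSON.stringify([metric, eModel])

// jsonKey("compliance", "gpt-4o") === '["compliance","gpt-4o"]'
// JSON.parse(...) recovers [metric, eModel] without custom splitting.
```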
Refactor groundtruth parsing / mangling into functions (probably in a new file) to avoid repeating parsing code in various places.
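A minimal sketch of such a helper module. The file name and the "model:value" encoding are assumptions used only to illustrate the refactor, not the project's actual format.

```ts
// Hypothetical helper module (e.g. src/genaisrc/src/groundtruth.mts).
export interface GroundtruthInfo {
    groundtruth?: string
    groundtruthModel?: string
}

// Parse a serialized groundtruth entry into its parts (assumed "model:value" form).
export function parseGroundtruth(raw?: string): GroundtruthInfo {
    if (!raw) return {}
    const sep = raw.indexOf(":")
    if (sep < 0) return { groundtruth: raw.trim() }
    return {
        groundtruthModel: raw.slice(0, sep).trim(),
        groundtruth: raw.slice(sep + 1).trim(),
    }
}

// Serialize back to the same assumed encoding.
export function formatGroundtruth(info: GroundtruthInfo): string {
    return info.groundtruthModel
        ? `${info.groundtruthModel}:${info.groundtruth ?? ""}`
        : (info.groundtruth ?? "")
}
```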
pelikhan
left a comment
Refactor key parsing code
Added full CLI and script parameter documentation, improved model suggestions, fixed groundtruth model handling, and clarified output formatting in reports.
Refactored runTests options for clarity; normalized groundtruth checks.
Introduces example-use and test-collection-review docs, updates scripts, fixes CLI usage.