Adding ground truth generation #145
Conversation
* 🎛️ feat: switch evalModel to evalModelSet for test evaluation. Replaces evalModel with evalModelSet, allowing multiple evaluation models.
* ✨ feat: add multi-model evaluation to metrics and compliance. Support evaluating metrics and compliance with multiple models via evalModelSet.
* ✨ Refine evaluation model handling and debug logging. Improved evalModelSet defaults, header levels, and debugging output.
* ✨ Enhance evalModelSet sourcing and logging in promptpex. Now supports sourcing evalModelSet from env var, adds validation and logging.
* ✨ refactor test metric evaluation and overview model handling. Refined evalModelSet parsing and updated test metric iteration logic.
* ✨ feat: Add combined avg metric across eval models. Compute and store average metric score for all evaluation models used.
* ✨ Enhance promptpex test evaluation and script logic. Added separate eval-only/test-run modes, improved metric evaluations.
* ♻️ Rename evalModelSet to evalModel throughout codebase. Standardizes config and variable naming from evalModelSet to evalModel.
…#141)
* ✨ Enhance test results saving and eval metrics workflow. Improved control of results file writing and evaluation metrics assignment.
* ✨ Add evals config flag to control evaluation execution. Introduces an evals boolean for toggling evaluation of test results.
* ✨ Enable direct context-loading from JSON files. Refactored CLI to load PromptPexContext from JSON, updating file flow.
* ✨ Add scripts and logic for multi-stage sample evaluations. Introduces zx scripts for gen/run/eval sample tests and conditional test executions.
* 🔀 rename: Samples scripts renamed to .zx.mjs extensions. All run-samples-*.mjs scripts updated to .zx.mjs for zx compatibility.
* ♻️ refactor: Rename sample scripts to .zx.mjs extensions. Updated script names in package.json and renamed a sample file for zx compatibility.
Pull Request Overview
This PR enhances ground truth generation by revising evaluation parameters and updating how evaluation models are handled throughout the code.
- Introduces a new evaluation flag (evals) and updates the evalModel type in types.mts.
- Adjusts test and metric evaluation functions to operate on evalModel as an array, and updates related scripts and configurations accordingly (a sketch of the revised options shape follows this list).
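For orientation, the revised options surface might look roughly like this. This is a sketch inferred from the review summary below; the interface name `PromptPexOptions` and the `ModelType` alias are assumptions, not verified against the codebase.

```ts
// Sketch of the revised options described in this PR.
// Assumption: ModelType is a string alias defined in types.mts.
type ModelType = string

interface PromptPexOptions {
    evals?: boolean // new flag: toggle evaluation of test results
    evalModel?: ModelType[] | "eval" // now an array, or the "eval" alias
}
```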
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/genaisrc/src/types.mts | Added new evals flag and updated the evalModel type signature. |
| src/genaisrc/src/testrun.mts | Modified test run logic to extract the first element from evalModel for evaluation. |
| src/genaisrc/src/testresulteval.mts | Updated evaluateTestResult signature to accept an explicit evalModel parameter. |
| src/genaisrc/src/testquality.mts | Changed how evalModel is accessed (using index 0) for quality evaluation. |
| src/genaisrc/src/testevalmetric.mts | Iterates over options.evalModel for computing metrics. |
| src/genaisrc/src/rulesgroundeness.mts | Uses the first element of evalModel for rule grounding. |
| src/genaisrc/promptpex.genai.mts | Adjusted option parsing and ensured evalModel is processed as an array. |
| Other files (samples, package.json, GitHub workflows) | Updated scripts and configuration settings to be consistent with evaluation changes. |
Comments suppressed due to low confidence (1)

src/genaisrc/src/types.mts:132

- The evalModel property now accepts an array or the literal "eval". Ensure that its usage is consistent throughout the codebase, as some modules assume it is always an array (one possible normalization helper is sketched below).

```ts
evalModel?: ModelType[] | "eval"
```
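To address that consistency concern, the repeated array checks below could be centralized. Here is a minimal sketch of such a helper; the name `normalizeEvalModel`, the `ModelType` alias, and the semicolon-delimited string format are assumptions, not part of this PR.

```ts
// Hypothetical helper: coerce evalModel into a string array.
// Assumption: string inputs may be semicolon-separated lists
// such as "modelA;modelB".
type ModelType = string

function normalizeEvalModel(evalModel?: ModelType[] | string): ModelType[] {
    if (Array.isArray(evalModel)) return evalModel
    if (typeof evalModel === "string")
        return evalModel
            .split(";")
            .map((s) => s.trim())
            .filter(Boolean)
    return []
}
```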
In src/genaisrc/src/testrun.mts:

```diff
+ const eModel = options.evalModel[0]
  testRes.compliance = undefined
- const compliance = await evaluateTestResult(files, testRes, options)
+ const compliance = await evaluateTestResult(files, eModel, testRes, options)
  testRes.complianceText = compliance.content
  updateTestResultCompliant(testRes)
```
Copilot AI · Jun 5, 2025
Indexing evalModel without verifying its type assumes it is an array. Add a type check or normalization to prevent unexpected behavior when evalModel is a string.
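One way to satisfy this, as a sketch: guard (or normalize) before indexing, so a plain string cannot slip through to evaluateTestResult. The error message and control flow here are illustrative only; the surrounding identifiers come from the diff above.

```ts
// Sketch: guard before indexing; could equally reuse the
// normalizeEvalModel helper sketched above.
const models = Array.isArray(options.evalModel)
    ? options.evalModel
    : typeof options.evalModel === "string"
      ? [options.evalModel]
      : []
if (models.length === 0) throw new Error("no evaluation model configured")
const eModel = models[0]
const compliance = await evaluateTestResult(files, eModel, testRes, options)
```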
In src/genaisrc/src/testquality.mts:

```diff
- const { force, evalModel = MODEL_ALIAS_EVAL } = options || {}
+ const { force } = options || {}
+ const evalModel = options.evalModel[0]
```
Copilot AI · Jun 5, 2025
Accessing the first element of evalModel assumes it is always an array. Ensure that evalModel is normalized to an array type or add a safeguard before indexing.
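A fail-fast variant of that safeguard, as a sketch (the error text is illustrative):

```ts
// Sketch: fail fast rather than silently index into a string.
if (!Array.isArray(options.evalModel) || options.evalModel.length === 0)
    throw new Error("evalModel must be a non-empty array of model ids")
const evalModel = options.evalModel[0]
```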
In src/genaisrc/src/testevalmetric.mts:

```diff
  // Remove all previous metrics before computing new ones
  testResult.metrics = {}
+ for (const eModel of options.evalModel) {
```
Copilot AI · Jun 5, 2025
Iterating over evalModel assumes that it is an array. If evalModel might be a string, ensure it is converted to an array to avoid iterating over its characters.
Suggested change:

```diff
- for (const eModel of options.evalModel) {
+ let evalModelArr: string[] = []
+ if (typeof options.evalModel === "string") {
+     evalModelArr = options.evalModel.split(";").map(s => s.trim()).filter(Boolean)
+ } else if (Array.isArray(options.evalModel)) {
+     evalModelArr = options.evalModel
+ }
+ for (const eModel of evalModelArr) {
```
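This suggestion normalizes in place at a single call site. Since the same array assumption recurs in the other comments on this review, hoisting the logic into a shared helper (such as the normalizeEvalModel sketch above) would keep the call sites from drifting apart.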
In src/genaisrc/src/rulesgroundeness.mts:

```diff
  {
-     ...modelOptions(evalModel, options),
+     // assume first evalModel is the one to use for rule grounding
+     ...modelOptions(evalModel[0], options),
```
Copilot AI · Jun 5, 2025
Using evalModel[0] without ensuring evalModel is an array can lead to errors if a string is passed. Ensure consistent type normalization for evalModel.
Suggested change:

```diff
- ...modelOptions(evalModel[0], options),
+ ...modelOptions(normalizedEvalModel[0], options),
```
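Note that the suggestion presupposes a normalizedEvalModel binding that does not appear in the diff; with a helper like the one sketched earlier, it could be derived once near the top of the function (normalizeEvalModel is hypothetical):

```ts
// Hypothetical: normalize once, then index safely where needed.
const normalizedEvalModel = normalizeEvalModel(evalModel)
```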
Meant to have this PR target the dev branch.