Conversation

@bzorn (Contributor) commented Jun 5, 2025

No description provided.

pelikhan and others added 5 commits June 2, 2025 23:45
* 🎛️ feat: switch evalModel to evalModelSet for test evaluation

Replaces evalModel with evalModelSet, allowing multiple evaluation models.

* ✨ feat: add multi-model evaluation to metrics and compliance

Support evaluating metrics and compliance with multiple models via evalModelSet.

* ✨ Refine evaluation model handling and debug logging

Improved evalModelSet defaults, header levels, and debugging output.

* ✨ Enhance evalModelSet sourcing and logging in promptpex

Now supports sourcing evalModelSet from env var, adds validation, and logging.

* ✨ refactor test metric evaluation and overview model handling

Refined evalModelSet parsing and updated test metric iteration logic.

* ✨ feat: Add combined avg metric across eval models

Compute and store average metric score for all evaluation models used.

* ✨ Enhance promptpex test evaluation and script logic

Added separate eval-only/test-run modes, improved metric evaluations

* ♻️ Rename evalModelSet to evalModel throughout codebase (…#141)

Standardizes config and variable naming from evalModelSet to evalModel.

* ✨ Enhance test results saving and eval metrics workflow

Improved control of results file writing and evaluation metrics assignment.

* ✨ Add evals config flag to control evaluation execution

Introduces evals boolean for toggling evaluation of test results.

* ✨ Enable direct context-loading from JSON files

Refactored CLI to load PromptPexContext from JSON, updating file flow.

* ✨ Add scripts and logic for multi-stage sample evaluations

Introduces zx scripts for gen/run/eval sample tests and conditional test executions.

* 🔀 rename: Samples scripts renamed to .zx.mjs extensions

All run-samples-*.mjs scripts updated to .zx.mjs for zx compatibility.

* ♻️ refactor: Rename sample scripts to .zx.mjs extensions

Updated script names in package.json and renamed a sample file for zx compatibility
@bzorn bzorn requested review from Copilot and pelikhan June 5, 2025 20:22
Copilot AI left a comment


Pull Request Overview

This PR enhances ground truth generation by revising evaluation parameters and updating how evaluation models are handled throughout the code.

  • Introduces a new evaluation flag (evals) and updates the evalModel type in types.mts.
  • Adjusts test and metric evaluation functions to operate on evalModel as an array, and modifies related scripts and configurations.

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

Summary of changes per file:

src/genaisrc/src/types.mts — Added the new evals flag and updated the evalModel type signature.
src/genaisrc/src/testrun.mts — Modified test run logic to extract the first element from evalModel for evaluation.
src/genaisrc/src/testresulteval.mts — Updated the evaluateTestResult signature to accept an explicit evalModel parameter.
src/genaisrc/src/testquality.mts — Changed how evalModel is accessed (using index 0) for quality evaluation.
src/genaisrc/src/testevalmetric.mts — Iterates over options.evalModel when computing metrics.
src/genaisrc/src/rulesgroundeness.mts — Uses the first element of evalModel for rule grounding.
src/genaisrc/promptpex.genai.mts — Adjusted option parsing and ensured evalModel is processed as an array.
Other files (samples, package.json, GitHub workflows) — Updated scripts and configuration settings to be consistent with the evaluation changes.
Comments suppressed due to low confidence (1)

src/genaisrc/src/types.mts:132

  • The evalModel property now accepts an array or the literal 'eval'. Ensure that its usage is consistent throughout the codebase, as some modules assume it is always an array.
evalModel?: ModelType[] | "eval"

Comment on lines +234 to 238
+const eModel = options.evalModel[0]
 testRes.compliance = undefined
-const compliance = await evaluateTestResult(files, testRes, options)
+const compliance = await evaluateTestResult(files, eModel, testRes, options)
 testRes.complianceText = compliance.content
 updateTestResultCompliant(testRes)

Copilot AI Jun 5, 2025


Indexing evalModel without verifying its type assumes it is an array. Add a type check or normalization to prevent unexpected behavior when evalModel is a string.
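The normalization the reviewer asks for could be sketched as follows. This is a hypothetical helper, not code from the PR; the name normalizeEvalModel is invented here, and the semicolon separator is an assumption borrowed from the reviewer's later suggested change:

```typescript
// Hypothetical sketch (not in the PR): coerce evalModel, which may arrive as a
// semicolon-separated string or as an array, into a string[] before indexing.
function normalizeEvalModel(evalModel?: string[] | string): string[] {
    if (Array.isArray(evalModel)) return evalModel
    if (typeof evalModel === "string")
        return evalModel
            .split(";")
            .map((s) => s.trim())
            .filter(Boolean)
    // undefined or any other shape: no evaluation models configured
    return []
}

console.log(normalizeEvalModel("gpt-4o; gpt-4o-mini")) // ["gpt-4o", "gpt-4o-mini"]
console.log(normalizeEvalModel(undefined)) // []
```

Callers could then safely write normalizeEvalModel(options.evalModel)[0] or iterate the result without risking iterating over a string's characters.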

-const { force, evalModel = MODEL_ALIAS_EVAL } = options || {}
+const { force } = options || {}
+const evalModel = options.evalModel[0]

Copilot AI Jun 5, 2025


Accessing the first element of evalModel assumes it is always an array. Ensure that evalModel is normalized to an array type or add a safeguard before indexing.

// Remove all previous metrics before computing new ones
testResult.metrics = {}

for (const eModel of options.evalModel) {

Copilot AI Jun 5, 2025


Iterating over evalModel assumes that it is an array. If evalModel might be a string, ensure it is converted to an array to avoid iterating over its characters.

Suggested change:

-for (const eModel of options.evalModel) {
+let evalModelArr: string[] = []
+if (typeof options.evalModel === "string") {
+    evalModelArr = options.evalModel.split(";").map(s => s.trim()).filter(Boolean)
+} else if (Array.isArray(options.evalModel)) {
+    evalModelArr = options.evalModel
+}
+for (const eModel of evalModelArr) {

 {
-    ...modelOptions(evalModel, options),
+    // assume first evalModel is the one to use for rule grounding
+    ...modelOptions(evalModel[0], options),

Copilot AI Jun 5, 2025


Using evalModel[0] without ensuring evalModel is an array can lead to errors if a string is passed. Ensure consistent type normalization for evalModel.

Suggested change:

-    ...modelOptions(evalModel[0], options),
+    ...modelOptions(normalizedEvalModel[0], options),

@bzorn bzorn closed this Jun 5, 2025
@bzorn (Contributor, Author) commented Jun 5, 2025

Meant to open this PR against dev.

@bzorn bzorn deleted the ground-truth branch June 5, 2025 21:11