✨ add groundtruth-based scoring and evaluation prompt #151

bzorn · 2025-06-06T19:01:21Z

Introduced groundtruth/compliance metrics and new evaluation prompt file.

Copilot

Pull Request Overview

This PR introduces groundtruth-based scoring in the evaluation prompt and extends test data structures to include groundtruth information.

- Add a new prompt file (use_groundtruth_rules.prompty) that defines evaluation guidelines incorporating a groundtruth value.
- Update testrun.mts and resolvers.mts to propagate groundtruth and groundtruthModel properties.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/prompts/use_groundtruth_rules.prompty | New evaluation prompt file defining groundtruth-based scoring and compliance checks |
| src/genaisrc/src/testrun.mts | Added optional groundtruth and groundtruthModel fields to the test result structure |
| src/genaisrc/src/resolvers.mts | Extended resolver arguments to include groundtruth and groundtruthModel properties |
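
As a rough sketch of what the optional fields could look like, assuming a simplified test result shape: the `TestResultSketch` name, the `hasGroundtruth` helper, and the sample values are illustrative and not taken from `testrun.mts`. Only the field names `scenario`, `input`, `output`, `groundtruth`, and `groundtruthModel` come from this PR.

```typescript
// Hypothetical, simplified shape; the real type in testrun.mts may differ.
interface TestResultSketch {
    scenario?: string
    input: string
    output: string
    // Optional: expected output for this input, when one is available.
    groundtruth?: string
    // Optional: identifier of the model that produced the groundtruth.
    groundtruthModel?: string
}

// Both new fields are optional, so results without groundtruth stay valid.
const withoutGroundtruth: TestResultSketch = { input: "2+2", output: "4" }

const withGroundtruth: TestResultSketch = {
    ...withoutGroundtruth,
    groundtruth: "4",
    groundtruthModel: "some-model", // placeholder model name
}

// Illustrative guard: check presence before scoring against groundtruth.
function hasGroundtruth(result: TestResultSketch): boolean {
    return result.groundtruth !== undefined
}
```

Keeping the fields optional means code that scores against groundtruth has to handle the case where none is present, which is what the guard illustrates.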

src/prompts/use_groundtruth_rules.prompty


pelikhan · 2025-06-06T21:24:55Z

src/genaisrc/promptpex.genai.mts

                scenario,
                input,
                output,
+                groundtruth: groundtruth,


groundtruth,

If the variable and the field have the same name, you can write just the variable and TypeScript does the rest.
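
A minimal illustration of the shorthand the reviewer is pointing at. The variable values here are made up for the example. Only the property names come from the diff.

```typescript
// Placeholder values for illustration only.
const scenario = "default"
const input = "hello"
const output = "HELLO"
const groundtruth = "HELLO"

// Longhand: the property name is repeated as the value.
const longhand = { scenario, input, output, groundtruth: groundtruth }

// Shorthand: the property takes its name and value from the variable.
const shorthand = { scenario, input, output, groundtruth }
```

Both objects have the same properties and values. The shorthand form only removes the repetition.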

src/genaisrc/src/resolvers.mts

pelikhan · 2025-06-06T21:26:26Z

src/genaisrc/src/testevalmetric.mts

        dbg(`evaluating ${metrics.length} metrics with eval model(s) %O`, eModel)
        for (const metric of metrics) {
-            const key = metricName(metric)+"|em|"+eModel
+            const key = metricName(metric)+METRIC_SEPARATOR+eModel


Run the linter.

Why not use JSON to serialize the key?
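
One way to read this suggestion, sketched under assumptions: the key is the pair (metric name, eval model), serialized as a JSON array so that no separator string has to be reserved. The helper names `toKey` and `fromKey` are hypothetical and do not appear in the PR.

```typescript
// Hypothetical helpers; not part of the PR.
function toKey(metric: string, evalModel: string): string {
    // JSON escapes special characters, so any string is safe in either slot.
    return JSON.stringify([metric, evalModel])
}

function fromKey(key: string): { metric: string; evalModel: string } {
    const [metric, evalModel] = JSON.parse(key) as [string, string]
    return { metric, evalModel }
}

// A metric name that happens to contain the old "|em|" separator
// still round-trips, which plain concatenation cannot guarantee.
const key = toKey("accuracy|em|v2", "eval-model")
const parsed = fromKey(key)
```

The trade-off is that JSON keys are slightly longer and less readable in logs than a concatenated string.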

pelikhan · 2025-06-06T21:27:52Z

Refactor groundtruth parsing / mangling into functions (probably in a new file) to avoid repeating the parsing code in various places.

pelikhan

Refactor key parsing code
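
A sketch of what centralizing the key handling could look like if the separator approach is kept: one place builds keys and the same place parses them. The function names are hypothetical, and the value of `METRIC_SEPARATOR` is assumed to match the old `"|em|"` literal, since the PR only shows that the constant exists.

```typescript
// Assumed value; the diff shows the constant but not its definition.
const METRIC_SEPARATOR = "|em|"

// Hypothetical helper: the single place where keys are built.
function buildMetricKey(metric: string, evalModel: string): string {
    return metric + METRIC_SEPARATOR + evalModel
}

// Hypothetical helper: the single place where keys are parsed.
// Returns undefined for strings that are not metric keys.
function parseMetricKey(
    key: string
): { metric: string; evalModel: string } | undefined {
    // Split on the last separator; this assumes model names never contain it.
    const index = key.lastIndexOf(METRIC_SEPARATOR)
    if (index < 0) return undefined
    return {
        metric: key.slice(0, index),
        evalModel: key.slice(index + METRIC_SEPARATOR.length),
    }
}
```

Callers would then use `buildMetricKey` and `parseMetricKey` instead of concatenating or splitting strings inline, so a later change of key format touches only one file.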


✨ add groundtruth-based scoring and evaluation prompt

ad6ef85

Introduced groundtruth/compliance metrics and new evaluation prompt file.

bzorn requested review from Copilot and pelikhan June 6, 2025 19:01

Copilot AI reviewed Jun 6, 2025

src/prompts/use_groundtruth_rules.prompty (outdated)

bzorn added 2 commits June 6, 2025 20:15

♻️ Refactor test run logic and groundtruth handling

d2407c3

Improved test result consistency, prompt file error, and command args.

✨ Unify metric separator usage and add new test script

e5e6794

Standardizes metric key separator with METRIC_SEPARATOR and adds a test script.

pelikhan reviewed Jun 6, 2025

src/genaisrc/src/resolvers.mts (outdated)

pelikhan reviewed Jun 6, 2025

pelikhan requested changes Jun 6, 2025

bzorn added 3 commits June 6, 2025 22:15

📝: Add CLI parameter docs, improve model handling

bfdaabd

Added full CLI and script parameter documentation, improved model suggestions, fixed groundtruth model handling, and clarified output formatting in reports.

✨ Enhance groundtruth/runTests param handling and clarity

1655361

Refactored runTests options for clarity; normalized groundtruth checks.

✨: Add PromptPex usage and review docs, update commands

0f91669

- Introduces example-use and test-collection-review docs, updates scripts, fixes CLI usage.

bzorn merged commit da90957 into dev Jun 6, 2025
4 checks passed

bzorn deleted the new-metric-with-groundtruth branch June 6, 2025 23:53
