feat: promptfoo eval --filter-failing outputFile.json #742

mikkoh · 2024-04-30T20:54:10Z

Lets you iterate more quickly by re-running only the tests that failed in a previous run, given that run's output file.

Example Flow

First run: promptfoo eval --output result.json

Next run: promptfoo eval --filter-failing result.json
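
As a rough illustration of the idea (not the code in this PR), the sketch below narrows a test list to the cases that failed in a previous run. The simplified `OutputFile` shape, the rule of matching tests to results by their `vars`, and the names `sameVars` and `keepFailing` are all assumptions made for this example; the real output format and matching logic may differ.

```typescript
// Hypothetical, simplified shapes; the real promptfoo output format may differ.
interface TestCase {
  description?: string;
  vars?: Record<string, string>;
}

interface EvalResult {
  success: boolean;
  vars: Record<string, string>;
}

interface OutputFile {
  results: { version: number; results: EvalResult[] };
}

// Assumption: a test is matched to a previous result by comparing its vars.
function sameVars(
  a: Record<string, string> = {},
  b: Record<string, string> = {},
): boolean {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length && keysA.every((k) => a[k] === b[k]);
}

// Keep only the tests that have at least one failed result in the previous output.
function keepFailing(tests: TestCase[], output: OutputFile): TestCase[] {
  const failed = output.results.results.filter((r) => !r.success);
  return tests.filter((t) => failed.some((r) => sameVars(t.vars, r.vars)));
}
```

Matching on `vars` is only one possible choice; a description or a stable test ID would work as well if the output file records one.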

mikkoh · 2024-04-30T21:14:01Z

src/commands/eval/filterFailingTests.ts

+  const {results} = await readOutput(outputPath);
+
+  if (results.version < 2) {
+    throw new Error(`Unsupported output version: ${results.version}`);


I put this here just because I'm not familiar with the differences between versions.

Should be safe to remove this check 👍

mikkoh · 2024-04-30T21:15:20Z

src/main.ts

          firstN: cmdObj.firstN,
          pattern: cmdObj.pattern,
+          failing: cmdObj.failing,


I'm wondering if these command line params should be changed to be something like:

filterFirstN filterPattern filterFailing

I support this

Ok... I'll make the change in this PR

typpo

LGTM - responses to your comments inline. Let me know if you'd like to rename the params to filter... in this PR or a separate one.

typpo · 2024-05-01T03:54:30Z

src/main.ts

          firstN: cmdObj.firstN,
          pattern: cmdObj.pattern,
+          failing: cmdObj.failing,


I support this

typpo · 2024-05-01T03:55:02Z

src/commands/eval/filterFailingTests.ts

+  const {results} = await readOutput(outputPath);
+
+  if (results.version < 2) {
+    throw new Error(`Unsupported output version: ${results.version}`);


Should be safe to remove this check 👍

typpo · 2024-05-01T04:05:25Z

src/util.ts

@@ -1228,3 +1244,26 @@ export function getStandaloneEvals(): StandaloneEval[] {
  });
  return flatResults;
 }
+
+export function providerToIdentifier(provider: TestCase['provider']): string | undefined {


Really appreciate you adding these helper functions. I'm aware that ApiProvider vs ProviderOptions and similar variations are some of the ugliest parts of the code :(

No prob. I love the flexibility of promptfoo. You can use it in many ways, but that also brings complexity. I suspect little utility functions that simplify logic can go a long way.
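
To make the discussion concrete, here is a minimal sketch of what a helper like `providerToIdentifier` could look like. Only the function name and its `string | undefined` return type come from the diff above; the `ApiProviderLike` and `ProviderOptionsLike` types are hypothetical stand-ins, and the assumption that one variant exposes `id` as a function while the other carries it as a plain field is mine.

```typescript
// Hypothetical, simplified stand-ins for promptfoo's provider types.
interface ApiProviderLike {
  id: () => string;
}

interface ProviderOptionsLike {
  id?: string;
}

type ProviderLike = string | ApiProviderLike | ProviderOptionsLike | undefined;

// Collapse the different provider representations into a single string identifier.
function providerToIdentifierSketch(provider: ProviderLike): string | undefined {
  if (provider === undefined) {
    return undefined;
  }
  if (typeof provider === 'string') {
    return provider;
  }
  // Assumption: id is either a function (ApiProvider-style) or an optional
  // plain string (ProviderOptions-style).
  const id = provider.id;
  return typeof id === 'function' ? id() : id;
}
```

Callers can then compare providers by identifier without caring which representation a given test case uses.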

mikkoh · 2024-05-01T13:24:26Z

src/main.ts

@@ -552,8 +552,9 @@ async function main() {
      'Run providers interactively, one at a time',
      defaultConfig?.evaluateOptions?.interactiveProviders,
    )
-    .option('-n, --first-n <number>', 'Only run the first N tests')
-    .option('--pattern <pattern>', 'Only run tests whose description matches the regular expression pattern')
+    .option('-n, --filter-first-n <number>', 'Only run the first N tests')


This will be a breaking change, so it will need a version bump for the next release. I'm unsure whether there's a mechanism you use (e.g. GitHub labels) to determine if a version bump is needed for the next release.

Should we keep -n? Should the other two filters have short forms also?

My selfish preference is to keep -n because it feels familiar, like head -n :). Open to other short forms but I don't think it's necessary. If we find ourselves getting tired of typing everything out let's add it separately.

I do version bumps manually - since we're pre-1.0 I've included breaking changes in minor versions, but generally trying to avoid them. I think this is acceptable though. I'll merge with feat! breaking change notation and make a note in the release notes.

mikkoh changed the title ~~promptfoo eval --failing outputFile.json~~ feat: promptfoo eval --failing outputFile.json Apr 30, 2024

mikkoh force-pushed the mikkoh/run-failing branch 2 times, most recently from 921864d to 3bd36a3 Compare April 30, 2024 21:01

mikkoh commented Apr 30, 2024

mikkoh marked this pull request as ready for review April 30, 2024 21:21

typpo approved these changes May 1, 2024

promptfoo eval --failing outputFile.json

1e5ec6d

mikkoh force-pushed the mikkoh/run-failing branch from 89bbcbd to 1e5ec6d Compare May 1, 2024 13:23

mikkoh commented May 1, 2024

mikkoh changed the title ~~feat: promptfoo eval --failing outputFile.json~~ feat: promptfoo eval --filter-failing outputFile.json May 1, 2024

mikkoh mentioned this pull request May 1, 2024

docs: Update to include --filter-* cli args #747

Merged

typpo merged commit 767caca into promptfoo:main May 1, 2024
9 checks passed
