Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: promptfoo eval --filter-failing outputFile.json #742

Merged
merged 1 commit into from
May 1, 2024

Conversation

mikkoh
Copy link
Contributor

@mikkoh mikkoh commented Apr 30, 2024

Makes it so that you can iterate more quickly by simply running failing tests given an output file.

Example Flow

First run: promptfoo eval --output result.json
Screenshot 2024-04-30 at 5 17 28 PM

Next run: promptfoo eval --failing result.json
Screenshot 2024-04-30 at 5 18 41 PM

@mikkoh mikkoh changed the title promptfoo eval --failing outputFile.json feat: promptfoo eval --failing outputFile.json Apr 30, 2024
@mikkoh mikkoh force-pushed the mikkoh/run-failing branch 2 times, most recently from 921864d to 3bd36a3 Compare April 30, 2024 21:01
const {results} = await readOutput(outputPath);

if (results.version < 2) {
throw new Error(`Unsupported output version: ${results.version}`);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here just cause I'm not familiar with the differences between versions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be safe to remove this check 👍

src/main.ts Outdated
firstN: cmdObj.firstN,
pattern: cmdObj.pattern,
failing: cmdObj.failing,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if these command line params should be changed to be something like:

filterFirstN
filterPattern
filterFailing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... I'll make the change in this PR

@mikkoh mikkoh marked this pull request as ready for review April 30, 2024 21:21
Copy link
Collaborator

@typpo typpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - responses to your comments inline. Let me know if you'd like to rename the params to filter... in this PR or a separate one.

src/main.ts Outdated
firstN: cmdObj.firstN,
pattern: cmdObj.pattern,
failing: cmdObj.failing,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this

const {results} = await readOutput(outputPath);

if (results.version < 2) {
throw new Error(`Unsupported output version: ${results.version}`);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be safe to remove this check 👍

@@ -1228,3 +1244,26 @@ export function getStandaloneEvals(): StandaloneEval[] {
});
return flatResults;
}

export function providerToIdentifier(provider: TestCase['provider']): string | undefined {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate you adding these helper functions. I'm aware that ApiProvider vs ProviderOptions and similar variations are some of the ugliest parts of the code :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No prob. I love the flexibility of promptfoo. You can use it many ways. But it also does brings on complexity. I suspect little utility functions to simplify logic can go a long way.

@@ -552,8 +552,9 @@ async function main() {
'Run providers interactively, one at a time',
defaultConfig?.evaluateOptions?.interactiveProviders,
)
.option('-n, --first-n <number>', 'Only run the first N tests')
.option('--pattern <pattern>', 'Only run tests whose description matches the regular expression pattern')
.option('-n, --filter-first-n <number>', 'Only run the first N tests')
Copy link
Contributor Author

@mikkoh mikkoh May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a breaking change so will need to do a version bump for the next release. Unsure if theres a mechanism you use eg Github labels to determine if a version bump is needed for the next release

Should we keep -n? Should the other two filters have short forms also?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My selfish preference is to keep -n because it feels familiar, like head -n :). Open to other short forms but I don't think it's necessary. If we find ourselves getting tired of typing everything out let's add it separately.

I do version bumps manually - since we're pre-1.0 I've included breaking changes in minor versions, but generally trying to avoid them. I think this is acceptable though. I'll merge with feat! breaking change notation and make a note in the release notes.

@mikkoh mikkoh changed the title feat: promptfoo eval --failing outputFile.json feat: promptfoo eval --filter-failing outputFile.json May 1, 2024
@typpo typpo merged commit 767caca into promptfoo:main May 1, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants