Allow separation of 'tests' vs. 'scenarios' #73

therealpaulgg · 2023-07-18T23:00:17Z

Another feature that we think would be desirable is the ability to run the same set of tests against multiple 'test cases' or sets of 'test data'.

For example, there may be the following definition:

tests:
  - description: Is Valid JSON
    assert:
      - type: is-json
  - description: Similarity
    assert:
      - type: similar

This means there are two tests that will be run. In our case, there could be 5-10 'scenarios' we would like to run, each with a somewhat different input and a different expected output.

Rough idea of what a config with 'scenarios' could look like:

prompts: [prompts.txt]
providers: [openai:gpt-3.5-turbo]
scenarios:
  - testData: testData1.txt
    expectedOutput: output1.txt
    expectedSimilarity: 0.8
  
  - testData: testData2.txt
    expectedOutput: output2.txt
    expectedSimilarity: 0.9
  
  - testData: testData3.txt
    expectedOutput: output3.txt
    expectedSimilarity: 0.5

tests:
  - description: Is Valid JSON
    vars:
      testData: '{{testData}}'
    assert:
      - type: is-json
      - type: javascript
        value: typeof JSON.parse(output) === 'object'

  - description: Meets Expected Output
    vars:
      expectedOutput: '{{expectedOutput}}'
    assert:
      - type: similar
        value: '{{expectedOutput}}'
        threshold: '{{expectedSimilarity}}'

This would result in a total of 6 tests being run: 3 scenarios, 2 tests, 1 provider, 1 prompt.
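To make the counting concrete, here is a minimal Python sketch of how the rough config above could expand into individual test cases. It is only an illustration of the idea, not how promptfoo works: the `substitute` and `expand` helpers are hypothetical, and the placeholder handling only covers values that are a single `{{name}}` token.

```python
from itertools import product

# Hypothetical in-memory copy of the rough config above (not promptfoo's API).
scenarios = [
    {"testData": "testData1.txt", "expectedOutput": "output1.txt", "expectedSimilarity": 0.8},
    {"testData": "testData2.txt", "expectedOutput": "output2.txt", "expectedSimilarity": 0.9},
    {"testData": "testData3.txt", "expectedOutput": "output3.txt", "expectedSimilarity": 0.5},
]
tests = [
    {"description": "Is Valid JSON", "assert": [{"type": "is-json"}]},
    {
        "description": "Meets Expected Output",
        "assert": [
            {
                "type": "similar",
                "value": "{{expectedOutput}}",
                "threshold": "{{expectedSimilarity}}",
            }
        ],
    },
]
prompts = ["prompts.txt"]
providers = ["openai:gpt-3.5-turbo"]


def substitute(value, variables):
    """Replace a value that is exactly one {{name}} placeholder with the scenario's value."""
    if isinstance(value, str) and value.startswith("{{") and value.endswith("}}"):
        return variables[value[2:-2].strip()]
    return value


def expand(scenarios, tests, prompts, providers):
    """Yield one concrete test case per (scenario, test, provider, prompt) combination."""
    for scenario, test, provider, prompt in product(scenarios, tests, providers, prompts):
        asserts = [
            {key: substitute(val, scenario) for key, val in assertion.items()}
            for assertion in test["assert"]
        ]
        yield {
            "description": test["description"],
            "vars": dict(scenario),
            "provider": provider,
            "prompt": prompt,
            "assert": asserts,
        }


cases = list(expand(scenarios, tests, prompts, providers))
print(len(cases))  # 3 scenarios x 2 tests x 1 provider x 1 prompt
```

Each scenario supplies the variables, and every test is repeated once per scenario, so adding a fourth scenario would add two more test cases without touching the `tests` list.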

We also would love to be able to import variables from text files so the yaml test configuration can be cleaner.
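As a rough sketch of what importing variables from text files could mean, the Python snippet below swaps any variable value that names an existing `.txt` file for that file's contents. The `load_vars` helper and the "ends in `.txt`" convention are assumptions made for illustration, not an existing promptfoo feature.

```python
import tempfile
from pathlib import Path


def load_vars(raw_vars, base_dir):
    """Return a copy of raw_vars where values naming a .txt file under base_dir
    are replaced by that file's text; all other values pass through unchanged."""
    resolved = {}
    for name, value in raw_vars.items():
        path = Path(base_dir) / value if isinstance(value, str) else None
        if path is not None and path.suffix == ".txt" and path.is_file():
            resolved[name] = path.read_text(encoding="utf-8")
        else:
            resolved[name] = value
    return resolved


# Small demo using a temporary directory in place of a real test-data folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "testData1.txt").write_text('{"name": "example"}', encoding="utf-8")
    resolved = load_vars({"testData": "testData1.txt", "expectedSimilarity": 0.8}, tmp)

print(resolved)
```

With something like this, the YAML config would only need to list file names, and the long inputs and expected outputs could live in separate files.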

Interested in thoughts on this one, and I can help out where needed.

typpo · 2023-07-22T17:22:11Z

Picking this one up soon - thinking about ways to clean up:

- the number of configs (enabling multiple test suites)
- the number of test cases (enabling better importing/organization for test cases and vars)

therealpaulgg · 2023-07-23T01:27:45Z

Appreciate it!

Striving to achieve a cleaner config with a lot of tests has been the most challenging aspect so far. We used $ref in our config extensively.

I definitely like the idea of being able to run multiple TestSuite objects, one for each dataset. All those test suites could use the exact same configuration.

typpo · 2023-08-10T15:23:01Z

Closing this as complete! Scenarios are documented here: https://promptfoo.dev/docs/configuration/scenarios

typpo mentioned this issue Jul 24, 2023

Add support for loading test cases from file/directory path #88

Merged

Skylertodd mentioned this issue Jul 25, 2023

Promptfoo Theories #89

Merged

typpo closed this as completed Aug 10, 2023
