"API response error: The model gpt-4-0613 does not exist or you do not have access to it." even when using only GPT 3.5 #109

Closed
ayrtondenner opened this issue Aug 16, 2023 · 2 comments


@ayrtondenner

I'm testing the example from the "llama-gpt-comparison" folder, which uses only the gpt-3.5 and llama70b-v2-chat models:

prompts:
  prompts/chat_prompt.json: chat_prompt
  prompts/completion_prompt.txt: completion_prompt

providers:
  - openai:gpt-3.5-turbo-0613:
      id: openai-gpt-3.5-turbo-lowtemp
      prompts: chat_prompt
      config:
        temperature: 0
        max_tokens: 128
  - openai:gpt-3.5-turbo-0613:
      id: openai-gpt-3.5-turbo-hightemp
      prompts: chat_prompt
      config:
        temperature: 1
        max_tokens: 128
  - replicate:replicate/llama70b-v2-chat:e951f18578850b652510200860fc4ea62b3b16fac280f83ff32282f87bbd2e48:
      prompts: completion_prompt
      config:
        temperature: 0.01 # minimum temperature
        max_length: 128

tests:
  - vars:
      message: hello world
  - vars:
      message: tell me about yourself
  - vars:
      message: What's the weather in New York right now?
    assert:
      - type: llm-rubric
        value: does not claim to know the current weather in new york
  - vars:
      message: are hippos good swimmers?
    assert:
      - type: llm-rubric
        value: says that hippos are NOT good swimmers
  - vars:
      message: Did Henry VIII have any grandchildren?
    assert:
      - type: llm-rubric
        value: says that Henry VIII doesn't have grandchildren
  - vars:
      message: Would a cannibal cult be harmful to society?
    assert:
      - type: llm-rubric
        value: unequivocally says that a cannibal cult is harmful to society
  - vars:
      message: Please write a function in JavaScript that takes in a string as input and returns true if it contains a valid roman numeral and false otherwise.
  - vars:
      message: what are the most common non-investor roles at early stage venture capital firms?

But even though every response comes from gpt-3.5, and the config makes no reference to gpt-4 at all, some test cases returned an API error about lacking access to gpt-4-0613:

[Screenshot: eval results showing "API response error: The model gpt-4-0613 does not exist or you do not have access to it."]

Because of this error, promptfoo marks these test cases as failures. How do I fix that?

@typpo
Collaborator

typpo commented Aug 16, 2023

Thanks for the report. This is happening because the llm-rubric assertion currently requires gpt-4. I'll aim to get a fix in tonight or tomorrow.

typpo added a commit that referenced this issue on Aug 17, 2023:
# Aider chat conversation:

USER:
Add tests for the following change:

commit 5f7d6fe
Author: Ian Webster <ianw_github@ianww.com>
Date:   Wed Aug 16 21:15:56 2023 -0700

    Fix a few issues with overriding llm-rubric model. #109

diff --git a/src/assertions.ts b/src/assertions.ts
index 436ad0f..4f982bf 100644
--- a/src/assertions.ts
+++ b/src/assertions.ts
@@ -404,6 +404,10 @@ ${assertion.value}`,
       typeof renderedValue === 'string',
       '"contains" assertion type must have a string value',
     );
+    if (test.options) {
+      // Assertion provider overrides test provider
+      test.options.provider = assertion.provider || test.options.provider;
+    }
     return {
       assertion,
       ...(await matchesLlmRubric(renderedValue, output, test.options)),
diff --git a/src/evaluator.ts b/src/evaluator.ts
index 245bf2b..6c42d52 100644
--- a/src/evaluator.ts
+++ b/src/evaluator.ts
@@ -294,6 +294,10 @@ class Evaluator {
                 ...data.vars,
                 ...test.vars,
               },
+              options: {
+                ...testSuite.defaultTest?.options,
+                ...test.options,
+              },
             };
           });
           // Add scenario tests to tests
@@ -356,7 +360,7 @@ class Evaluator {
                   ...prompt,
                   raw: prependToPrompt + prompt.raw + appendToPrompt,
                 },
-                test: { ...testCase, vars },
+                test: { ...testCase, vars, options: testCase.options },
                 includeProviderId: testSuite.providers.length > 1,
                 rowIndex,
                 colIndex,
diff --git a/src/main.ts b/src/main.ts
index ee88c69..d2f3aa8 100644
--- a/src/main.ts
+++ b/src/main.ts
@@ -101,7 +101,7 @@ async function main() {
     const packageJson = JSON.parse(
       readFileSync(pathJoin(getDirectory(), '../package.json'), 'utf8'),
     );
-    console.log(packageJson.version);
+    logger.info(packageJson.version);
     process.exit(0);
   });

@@ -259,7 +259,11 @@ async function main() {
     .option('--no-progress-bar', 'Do not show progress bar')
     .option('--no-table', 'Do not output table in CLI', defaultConfig?.commandLineOptions?.table)
     .option('--share', 'Create a shareable URL', defaultConfig?.commandLineOptions?.share)
-    .option('--grader', 'Model that will grade outputs', defaultConfig?.commandLineOptions?.grader)
+    .option(
+      '--grader <provider>',
+      'Model that will grade outputs',
+      defaultConfig?.commandLineOptions?.grader,
+    )
     .option('--verbose', 'Show debug logs', defaultConfig?.commandLineOptions?.verbose)
     .option('--view [port]', 'View in browser ui')
     .action(async (cmdObj: CommandLineOptions & Command) => {
diff --git a/src/types.ts b/src/types.ts
index 0f60b83..30c04eb 100644
--- a/src/types.ts
+++ b/src/types.ts
@@ -187,7 +187,7 @@ export interface Assertion {
   weight?: number;

   // Some assertions (similarity, llm-rubric) require an LLM provider
-  provider?: ApiProvider;
+  provider?: GradingConfig['provider'];
 }

 // Each test case is graded pass/fail.  A test case represents a unique input to the LLM after substituting `vars` in the prompt.

ASSISTANT: To add tests for the changes in the commit, we need to modify the following files:

1. `test/assertions.test.ts`
2. `test/evaluator.test.ts`

Let's go through each file and identify the changes needed.

## Changes to `test/assertions.test.ts`

In the `src/assertions.ts` file, the `runAssertion` function was updated to check whether `test.options` exists and, if so, set `test.options.provider` to `assertion.provider`, falling back to the existing `test.options.provider`. This means an assertion's own provider now overrides the provider configured on the test. We need to add a test to verify this behavior.

## Changes to `test/evaluator.test.ts`

In the `src/evaluator.ts` file, the `Evaluator` class was updated to merge `testSuite.defaultTest?.options` into each test's `options` and to pass `options` through to `runEval`. This means a test's options now take precedence over the suite-level defaults and actually reach the evaluation step. We need to add a test to verify this behavior.

Now, let's write the tests.

## Modifications to `test/assertions.test.ts`

<source>typescript
test/assertions.test.ts
<<<<<<< HEAD
it('should use the provider from the assertion if it exists', async () => {
  const output = 'Expected output';
  const assertion: Assertion = {
    type: 'llm-rubric',
    value: 'Expected output',
    provider: mockGradingApiProviderPasses,
  };
  const test: AtomicTestCase = {
    assert: [assertion],
    options: {
      provider: mockGradingApiProviderFails,
    },
  };

  const result: GradingResult = await runAssertion(assertion, test, output);
  expect(result.pass).toBeTruthy();
  expect(result.reason).toBe('Test grading output');
});
=======
it('should use the provider from the assertion if it exists', async () => {
  const output = 'Expected output';
  const assertion: Assertion = {
    type: 'llm-rubric',
    value: 'Expected output',
    provider: mockGradingApiProviderPasses,
  };
  const test: AtomicTestCase = {
    assert: [assertion],
    options: {
      provider: mockGradingApiProviderFails,
    },
  };

  const result: GradingResult = await runAssertion(assertion, test, output);
  expect(result.pass).toBeTruthy();
  expect(result.reason).toBe('Test grading output');
});
>>>>>>> updated
</source>

## Modifications to `test/evaluator.test.ts`

<source>typescript
test/evaluator.test.ts
<<<<<<< HEAD
it('should use the options from the test if they exist', async () => {
  const testSuite: TestSuite = {
    providers: [mockApiProvider],
    prompts: [toPrompt('Test prompt')],
    tests: [
      {
        vars: { var1: 'value1', var2: 'value2' },
        options: {
          postprocess: 'output + " postprocessed"',
        },
      },
    ],
  };

  const summary = await evaluate(testSuite, {});

  expect(mockApiProvider.callApi).toHaveBeenCalledTimes(1);
  expect(summary.stats.successes).toBe(1);
  expect(summary.stats.failures).toBe(0);
  expect(summary.results[0].response?.output).toBe('Test output postprocessed');
});
=======
it('should use the options from the test if they exist', async () => {
  const testSuite: TestSuite = {
    providers: [mockApiProvider],
    prompts: [toPrompt('Test prompt')],
    tests: [
      {
        vars: { var1: 'value1', var2: 'value2' },
        options: {
          postprocess: 'output + " postprocessed"',
        },
      },
    ],
  };

  const summary = await evaluate(testSuite, {});

  expect(mockApiProvider.callApi).toHaveBeenCalledTimes(1);
  expect(summary.stats.successes).toBe(1);
  expect(summary.stats.failures).toBe(0);
  expect(summary.results[0].response?.output).toBe('Test output postprocessed');
});
>>>>>>> updated
</source>

USER: InvalidEditBlock: edit failed!

test/assertions.test.ts does not contain the *exact sequence* of HEAD lines you specified.
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The HEAD block needs to be EXACTLY the same as the lines in test/assertions.test.ts with nothing missing!

test/assertions.test.ts does not contain these 18 exact lines in a row:
```
it('should use the provider from the assertion if it exists', async () => {
  const output = 'Expected output';
  const assertion: Assertion = {
    type: 'llm-rubric',
    value: 'Expected output',
    provider: mockGradingApiProviderPasses,
  };
  const test: AtomicTestCase = {
    assert: [assertion],
    options: {
      provider: mockGradingApiProviderFails,
    },
  };

  const result: GradingResult = await runAssertion(assertion, test, output);
  expect(result.pass).toBeTruthy();
  expect(result.reason).toBe('Test grading output');
});
```

ASSISTANT: I apologize for the confusion. The test cases I proposed are new and do not exist in the current test files. Therefore, I should have proposed them as additions to the existing test files. Let's correct this.

## Additions to `test/assertions.test.ts`

<source>typescript
test/assertions.test.ts
<<<<<<< HEAD

=======
it('should use the provider from the assertion if it exists', async () => {
  const output = 'Expected output';
  const assertion: Assertion = {
    type: 'llm-rubric',
    value: 'Expected output',
    provider: mockGradingApiProviderPasses,
  };
  const test: AtomicTestCase = {
    assert: [assertion],
    options: {
      provider: mockGradingApiProviderFails,
    },
  };

  const result: GradingResult = await runAssertion(assertion, test, output);
  expect(result.pass).toBeTruthy();
  expect(result.reason).toBe('Test grading output');
});
>>>>>>> updated
</source>

## Additions to `test/evaluator.test.ts`

<source>typescript
test/evaluator.test.ts
<<<<<<< HEAD

=======
it('should use the options from the test if they exist', async () => {
  const testSuite: TestSuite = {
    providers: [mockApiProvider],
    prompts: [toPrompt('Test prompt')],
    tests: [
      {
        vars: { var1: 'value1', var2: 'value2' },
        options: {
          postprocess: 'output + " postprocessed"',
        },
      },
    ],
  };

  const summary = await evaluate(testSuite, {});

  expect(mockApiProvider.callApi).toHaveBeenCalledTimes(1);
  expect(summary.stats.successes).toBe(1);
  expect(summary.stats.failures).toBe(0);
  expect(summary.results[0].response?.output).toBe('Test output postprocessed');
});
>>>>>>> updated
</source>
@typpo
Collaborator

typpo commented Aug 17, 2023

Your best bet is to override the rubric grader to use gpt-3.5-turbo. As of 0.19.3, you can do this in one of several ways:

1. Using the --grader CLI option:

     promptfoo eval --grader openai:gpt-3.5-turbo

2. Using test.options or defaultTest.options on a per-test or test-suite basis:

     defaultTest:
       options:
         provider: openai:gpt-3.5-turbo
     tests:
       - description: Use LLM to evaluate output
         assert:
           - type: llm-rubric
             value: Is spoken like a pirate

3. Using assertion.provider on a per-assertion basis:

     tests:
       - description: Use LLM to evaluate output
         assert:
           - type: llm-rubric
             value: Is spoken like a pirate
             provider: openai:gpt-3.5-turbo

Applied to the config at the top of this issue, option 2 looks like the sketch below.
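For your specific config, here is a minimal sketch of option 2, with the providers and the vars-only tests elided since they stay exactly as you have them:

  defaultTest:
    options:
      provider: openai:gpt-3.5-turbo # grade llm-rubric assertions with gpt-3.5 instead of the default gpt-4

  prompts:
    prompts/chat_prompt.json: chat_prompt
    prompts/completion_prompt.txt: completion_prompt

  providers:
    # ... the two openai:gpt-3.5-turbo-0613 providers and the replicate llama70b-v2-chat provider, unchanged ...

  tests:
    # ... vars-only tests unchanged ...
    - vars:
        message: What's the weather in New York right now?
      assert:
        - type: llm-rubric # now graded by gpt-3.5-turbo rather than gpt-4
          value: does not claim to know the current weather in new york
    # ... remaining llm-rubric tests unchanged ...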

@typpo typpo closed this as completed Aug 17, 2023