@njhale njhale commented Oct 14, 2024

Smoke tests flake for gpt-4o because of non-determinism in how it interprets the test case instructions (e.g. it fails to interpolate a string variable consistently). This change reduces ambiguity in the tool instructions so that the model produces consistent results across smoke test runs.

Note: I regenerated golden files across all models and ran the tests 10 times per model to vet this change.
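
The vetting described above amounts to repeating each smoke test and comparing every run with its golden output. A rough sketch of that check, where `run_once` and `steady` are hypothetical stand-ins for invoking one smoke test against one model (they are not part of GPTScript's test harness):

```python
from typing import Callable


def is_stable(run_once: Callable[[], str], golden: str, attempts: int = 10) -> bool:
    """Return True only if all `attempts` runs produce exactly `golden`."""
    # all() stops at the first mismatch, so a flaky test fails fast.
    return all(run_once() == golden for _ in range(attempts))


# Hypothetical stand-in for a model that always answers the same way.
def steady() -> str:
    return "ok"


print(is_stable(steady, golden="ok"))
```

A single mismatching run is enough to mark the test as flaky, which is the behavior this change is meant to eliminate.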

@njhale njhale requested a review from thedadams October 14, 2024 21:59
@njhale njhale changed the title from "test/smoke tweak tc bodies" to "test: tweak smoke test tool bodies to standardize response text" Oct 14, 2024
thedadams previously approved these changes Oct 14, 2024
@njhale njhale requested review from thedadams and removed request for iwilltry42 and ryanhopperlowe October 14, 2024 22:41
args: question: The question to ask Bob.

- When asked how I am doing, respond with exactly "Thanks for asking "${QUESTION}", I'm doing great fellow friendly AI tool!"
+ When asked how I am doing, respond with the following exactly: "Thanks for asking '${question}'! I'm doing great fellow friendly AI tool!" with ${question} replaced with the question text as given.
Contributor

I thought we changed this such that question should be QUESTION?

Member Author

yup, we did.

Fixed and pushed.

@drpebcak drpebcak Oct 15, 2024

Curious whether this makes any functional difference, since this isn't a code tool. ${QUESTION} is made to look like an environment variable here, but nothing is actually setting or reading env vars... it's just the LLM being a smarty pants.

@njhale njhale Oct 15, 2024

Yeah, this is more for convention's sake than anything.

The extra text explaining how to "interpolate" the variable is there because 4o isn't actually that much of a smarty pants after all.
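
For illustration only: the substitution the reworded instruction asks the model to perform is ordinary template interpolation. A minimal sketch using Python's `string.Template`, assuming the final tool body uses `${QUESTION}` with the response wording from the diff in this thread; the constant `RESPONSE`, the helper `expected_response`, and the sample question are all hypothetical and not taken from the merged file.

```python
from string import Template

# Assumed final wording of the response text, with ${QUESTION} as agreed
# in review. This is not copied from the merged tool body.
RESPONSE = Template(
    "Thanks for asking '${QUESTION}'! I'm doing great fellow friendly AI tool!"
)


def expected_response(question: str) -> str:
    """Return the response text with the question substituted in verbatim."""
    return RESPONSE.substitute(QUESTION=question)


print(expected_response("How are you doing?"))
```

Nothing in GPTScript performs this substitution for a non-code tool; the sketch only shows the single output that the model is expected to reproduce on every run.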

Tweak the tool bodies for smoke test GPTScripts to reduce ambiguity in
the response. This prevents models -- like gpt-4o -- from doing things
like failing to interpolate strings consistently between runs.

Signed-off-by: Nick Hale <4175918+njhale@users.noreply.github.com>
@njhale njhale merged commit b7d31f2 into gptscript-ai:main Oct 15, 2024
10 checks passed