Empty responses should not be tested but should fail #92

Closed
2 tasks done
zimmski opened this issue May 6, 2024 · 1 comment

@zimmski
Member

zimmski commented May 6, 2024

See https://github.com/symflower/eval-dev-quality/blob/3e7dc8c5beab65f5958a458a593823ba5c25698e/docs/reports/v0.4.0/openrouter_databricks_dbrx-instruct/java/java/plain.log for an example. You can see that the Java tests are executed even though there is not a single character to compile or execute.

Tasks:

  • Return an error if the response is empty (trim whitespace from the content before checking; missing trimming might have been the problem in the first place)
  • A model that returns an empty response should not receive any additional metrics. There should be a test for that, so we make sure that such a model response never leads to more points. An empty response is an almost fail.
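
A minimal sketch of the two tasks above, in Go. All names here (`ErrEmptyResponse`, `Assessment`, `validateResponse`, `assess`, and the metric key `response-accepted`) are hypothetical and are not taken from the repository. The idea is to trim the response before checking it, and to return early with an error so that no further metrics can be collected.

```go
package main

import (
	"errors"
	"strings"
)

// ErrEmptyResponse is a hypothetical sentinel error for responses without content.
var ErrEmptyResponse = errors.New("empty response from model")

// Assessment is a hypothetical stand-in for the metrics a model collects.
type Assessment map[string]uint

// validateResponse returns ErrEmptyResponse if the response is empty
// once leading and trailing whitespace has been trimmed.
func validateResponse(response string) error {
	if strings.TrimSpace(response) == "" {
		return ErrEmptyResponse
	}

	return nil
}

// assess validates the response before anything is compiled or executed.
// On an empty response it returns the error together with an assessment
// that has no metrics set.
func assess(response string) (Assessment, error) {
	assessment := Assessment{}
	if err := validateResponse(response); err != nil {
		return assessment, err
	}

	// Hypothetical metric key, only reached for non-empty responses.
	assessment["response-accepted"]++
	// Compiling and executing the generated tests would happen here.

	return assessment, nil
}
```

A test for the second task could then assert that `assess` returns `ErrEmptyResponse` and an assessment with zero entries for inputs such as `""` and `" \n\t "`.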
@zimmski zimmski added the enhancement New feature or request label May 6, 2024
@zimmski zimmski added this to the v0.5.0 milestone May 6, 2024
@bauersimon
Member

That means we get rid of the no-empty metric, because an empty response will become an error.

Munsio added a commit that referenced this issue May 7, 2024
Munsio added a commit that referenced this issue May 7, 2024
Munsio added a commit that referenced this issue May 7, 2024
@bauersimon bauersimon mentioned this issue Jun 3, 2024
45 tasks