Evaluation Methods: Similarity Check ? #6
Correction: link is https://github.com/squidgyai/squidgy-testy
I appreciate the feedback! Right now, evaluation is done in one of three ways:
In short, keyword matching and exact overlap should be handled by case 1. Semantic similarity testing is a great suggestion. I'll look into this and see if I can get it added :) See also: https://www.promptfoo.dev/docs/configuration/expected-outputs
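Semantic similarity testing of the kind suggested above typically embeds both the expected and actual outputs and compares the vectors with cosine similarity, passing when the score clears a threshold. A minimal self-contained sketch (function names are hypothetical, and toy vectors stand in for real embeddings from an embeddings API):

```typescript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pass/fail check: does the actual output's embedding land close enough
// to the expected output's embedding?
function passesSimilarity(
  expected: number[],
  actual: number[],
  threshold = 0.8
): boolean {
  return cosineSimilarity(expected, actual) >= threshold;
}

// Toy vectors: same direction -> similarity 1; orthogonal -> similarity 0.
console.log(passesSimilarity([1, 0, 1], [2, 0, 2])); // true
console.log(passesSimilarity([1, 0, 0], [0, 1, 0])); // false
```

In practice the vectors would come from an embeddings endpoint, and the threshold is the main tuning knob: higher values demand closer paraphrases, lower values tolerate looser rewordings.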
Support for semantic similarity is added in #7. When it lands, I'll deploy a new version of the library, 0.5.0. It works like this: semantic similarity uses the … directive. For example, the directive … Hope this helps!
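The inline code spans showing the directive were lost in scraping. For reference, current promptfoo documentation expresses semantic similarity as a `similar` assertion with an adjustable threshold; the following config is a sketch based on today's docs, not necessarily the exact syntax that shipped in the 0.5.0-era release discussed here:

```yaml
# Hypothetical promptfooconfig.yaml fragment, based on current promptfoo docs.
prompts:
  - "Answer concisely: {{question}}"
tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: similar          # embedding-based semantic similarity
        value: "Paris is the capital of France."
        threshold: 0.8         # pass when similarity score >= 0.8
```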
Thank you for the super-swift reply @typpo, and for planning #7! With semantic similarity added, promptfoo should be among the very best open-source prompt-testing frameworks, if not the best (and even on par with what commercial platforms like Vellum offer for testing). Just one more quick suggestion: you might want to consolidate your documentation into one place. It's excellent on https://www.promptfoo.dev/docs/intro, which is also where, after a bit of digging, I found the evaluation methods you mentioned. But if you also keep a version with different content in the readme.md, it can be confusing, as there's no single source of truth (SSOT). I would suggest keeping the intro and "promo gifts" along with the icon grid in the readme, and adding a prominent link directly to the documentation at https://www.promptfoo.dev/docs/intro. :)
Thank you, and darn, that was quick! I just finished writing a blog post about open-source PT frameworks and already had to update it. 😅
Thanks for the suggestions, @MentalGear! I've simplified the GitHub readme and pointed users toward the docs website, which is definitely easier to navigate.
First off: thank you for providing a Node FOSS prompt-testing framework! Also, the web view is really handy!
Yet when it comes to explaining how evaluation is done, I find the docs lacking in detail: how exactly are outputs scored? Simply by keyword matching or exact overlap, or are advanced functions like distance similarity based on embeddings built in?
An excellent library to draw inspiration from that does semantic similarity testing (Python) is squidgy-testy.
EDIT: Corrected Link.