feat: update evaluation flow sample for abstractive summarization with g-eval method to enable GPT-4-Turbo #3317
…version under sampling based
…nged gpt-4-turbo back to gpt-4 to pass CI's model
Hi, thank you for your interest in helping to improve the prompt flow experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment.
Description
This PR updates an evaluation flow example that was introduced by #2037. The example previously supported only GPT-4, because GPT-4-Turbo performed poorly with the previous approach. With this update, GPT-4-Turbo is introduced and meta-evaluated, and the implementation changes from a sampling-based approach to a weighted average over probabilities. According to the meta-evaluation results, the new implementation outperforms the previous one. The new approach also reduces the estimated cost of evaluation from $6.19 to $1.32 per 100 documents.
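To illustrate the difference between the two approaches, here is a minimal sketch rather than the code in this PR. It assumes the sampling-based approach averages integer scores parsed from several sampled completions, and that the weighted-average approach takes the log-probabilities of candidate score tokens from a single completion. The function names, the 1 to 5 score range, and the `top_logprobs` dictionary shape are all hypothetical.

```python
import math
from statistics import mean


def sampling_based_score(sampled_scores):
    """Sampling-based estimate: average the scores parsed from n sampled completions."""
    return mean(sampled_scores)


def weighted_average_score(top_logprobs, min_score=1, max_score=5):
    """Probability-weighted estimate from a single completion.

    top_logprobs (hypothetical shape) maps candidate tokens to log-probabilities
    at the position where the model emits the score.
    """
    # Only tokens that spell a score in the allowed range contribute.
    valid = {str(s): s for s in range(min_score, max_score + 1)}
    weights = {}
    for token, logprob in top_logprobs.items():
        score = valid.get(token.strip())
        if score is not None:
            weights[score] = weights.get(score, 0.0) + math.exp(logprob)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no valid score token among the top log-probabilities")
    # Renormalize over valid score tokens, then take the expected score.
    return sum(score * p for score, p in weights.items()) / total


# Example with made-up probabilities for the score token.
example = {"4": math.log(0.6), "5": math.log(0.3), "3": math.log(0.1)}
print(weighted_average_score(example))
```

Under these assumptions, the weighted average needs one completion per document and criterion instead of many samples, which is consistent with the cost reduction reported above.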
The previous approach is kept under the `sampling_based` directory to provide backward compatibility with the GPT-4 evaluator and a reference for meta-evaluation.
All Promptflow Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines