Implement MMLU_Evaluator.run() #10
MMLU_Evaluator.run()
If anyone needs to test this PR locally: after initializing the virtual env in the repo and installing all the dependencies, this Python script executes the function:
very nice PR, makes sense to me. couple of nits.
I've mentioned this in some of my code review comments: getting MMLU working would be a big deal for the library, so I propose we do the following and then merge the PR ASAP:
Missing model path; run tox -e ruff as well! Looks good though, I will approve after those changes.
@alinaryan can you rebase?
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Replicates the functionality of the backend evaluation code for MMLU. The model under test is served internally by the lm-eval code.
Signed-off-by: Ali Maredia <amaredia@redhat.com>
src/instructlab/eval/mmlu.py
individual_scores: dict = {}
agg_score: float = 0.0
model_args = "pretrained=" + self.model_path + ",dtype=" + self.model_dtype
Nit: Use f""
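Applying the nit, the model_args line could be written as an f-string. Below is a minimal sketch; build_model_args is a hypothetical standalone helper used only for illustration (in the PR the values come from self.model_path and self.model_dtype), and the model path is a placeholder.

```python
def build_model_args(model_path: str, model_dtype: str) -> str:
    # Hypothetical helper: same string as the "+" concatenation in the diff, as an f-string
    return f"pretrained={model_path},dtype={model_dtype}"


print(build_model_args("path/to/model", "bfloat16"))
# prints: pretrained=path/to/model,dtype=bfloat16
```

The output is byte-for-byte identical to the concatenated version, so the change is purely about readability.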
LGTM
This PR updates the MMLUEvaluator child class to include the lm-evaluation-harness library as a dependency and adds an API call to simple_evaluate() to run MMLU.
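The PR's own run() body is not reproduced on this page, so the following is only a sketch of how a call to lm-eval's simple_evaluate() could fill individual_scores and agg_score. Assumptions, all to be checked against the installed lm-eval version: the "hf" model backend, the keyword arguments model, model_args, tasks, num_fewshot and batch_size, and a result layout where output["results"] maps each task name to a metrics dict with an "acc,none" entry. The function name run_mmlu, the evaluate_fn injection parameter, the default few-shot count and batch size, and computing agg_score as an unweighted mean are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple


def run_mmlu(
    model_path: str,
    model_dtype: str = "bfloat16",
    few_shots: int = 5,    # hypothetical default
    batch_size: int = 8,   # hypothetical default
    evaluate_fn: Optional[Callable[..., dict]] = None,
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical sketch of an MMLU run via lm-eval's simple_evaluate()."""
    if evaluate_fn is None:
        # Imported lazily so the sketch can be exercised without lm-eval installed
        from lm_eval import simple_evaluate
        evaluate_fn = simple_evaluate

    model_args = f"pretrained={model_path},dtype={model_dtype}"
    output = evaluate_fn(
        model="hf",
        model_args=model_args,
        tasks=["mmlu"],
        num_fewshot=few_shots,
        batch_size=batch_size,
    )

    individual_scores: Dict[str, float] = {}
    for task, metrics in output["results"].items():
        # Skip the top-level "mmlu" row; keep per-task accuracies
        # (assumed to live under the "acc,none" key)
        if task == "mmlu" or "acc,none" not in metrics:
            continue
        individual_scores[task] = metrics["acc,none"]

    # Assumption: unweighted mean over tasks; lm-eval's own group score may differ
    agg_score = (
        sum(individual_scores.values()) / len(individual_scores)
        if individual_scores
        else 0.0
    )
    return agg_score, individual_scores
```

Injecting evaluate_fn keeps the score bookkeeping testable with a fake results dict, without downloading a model. Note that a real MMLU output may also contain subgroup rows (for example per-category groups), which would need filtering before averaging.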
ToDo: