Implement `MMLU_Evaluator.run()` #10

alinaryan · 2024-06-19T01:09:45Z

This PR updates the MMLUEvaluator child class to include the lm-evaluation-harness library dependency and adds an API call to `simple_evaluate()` to run MMLU.

ToDo:

- test running in a VM
- observe the output of results to help decide how to access individual scores vs. the overall score
- finish writing the helper function to calculate the overall score based on the results
- parameterize these 57 tasks (I think the CLI needs to do this?) https://github.com/instructlab/evaluation/blob/main/scripts/run_mmlu.sh#L24
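
As a rough illustration of the helper mentioned in the ToDo list, here is a minimal sketch of computing an overall score from per-task results. It is not the code from this PR. It assumes the results dictionary has a top-level `"results"` key mapping task names to metric dictionaries, and that accuracy is stored under `"acc,none"` (as lm-evaluation-harness 0.4.x appears to do). The function name `summarize_scores`, the unweighted mean, and the sample numbers are all hypothetical.

```python
from statistics import mean

# Assumption: each task's metrics dict stores accuracy under "acc,none";
# adjust this key if the observed output uses a different name.
ACC_KEY = "acc,none"


def summarize_scores(results: dict, tasks: list[str]) -> tuple[float, dict[str, float]]:
    """Return (overall_score, individual_scores) for the given tasks.

    The overall score here is an unweighted mean of per-task accuracy,
    which is one possible choice and not necessarily what the PR uses.
    """
    task_results = results["results"]
    individual_scores = {task: float(task_results[task][ACC_KEY]) for task in tasks}
    overall_score = mean(individual_scores.values()) if individual_scores else 0.0
    return overall_score, individual_scores


# Made-up sample shaped like an evaluation result; the numbers are not real scores.
sample_results = {
    "results": {
        "mmlu_anatomy": {"acc,none": 0.5, "acc_stderr,none": 0.04},
        "mmlu_astronomy": {"acc,none": 0.75, "acc_stderr,none": 0.03},
    }
}

overall, individual = summarize_scores(sample_results, ["mmlu_anatomy", "mmlu_astronomy"])
print(overall)
print(individual)
```

Keeping the aggregation in a separate function means it can be checked against a hand-written dictionary without loading a model.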

alimaredia · 2024-06-20T11:41:01Z

If anyone needs to test this PR locally: after initializing the virtual env in the repo and installing all the dependencies, this Python script executes the function:

from instructlab.eval.mmlu import MMLU_Evaluator

tasks = ["mmlu_anatomy","mmlu_astronomy"]

# this path refers to the granite-7b-lab model on Hugging Face, not on the local filesystem
model_path = "instructlab/granite-7b-lab"
mmlu = MMLU_Evaluator(model_path, tasks)
overall_score, individual_scores = mmlu.run()
print(overall_score)
print(individual_scores)

JamesKunstle

very nice PR, makes sense to me. couple of nits.

src/instructlab/eval/mmlu.py

alimaredia · 2024-06-25T14:33:53Z

I've mentioned this in some of my code review comments. Getting MMLU working would be a big deal for the library, so I propose we do the following and then merge the PR ASAP:

- Remove the testing bits from mmlu.py
- Have someone besides the authors (Alina or me) test the PR by hand to confirm the scores are output properly
- Create follow-up issues for anything else that we know needs to be done.

cdoern

Missing model path; run `tox -e ruff` as well! Looks good though, I will approve after those changes.

src/instructlab/eval/mmlu.py

nathan-weinberg · 2024-06-25T19:16:50Z

@alinaryan can you rebase?

Signed-off-by: Alina Ryan <aliryan@redhat.com>

Replicates functionality in backend evaluation code for mmlu. Model that is tested is served by lm-eval code internally. Signed-off-by: Ali Maredia <amaredia@redhat.com>

Signed-off-by: Alina Ryan <aliryan@redhat.com>

danmcp · 2024-06-25T20:53:45Z

src/instructlab/eval/mmlu.py

+
+        individual_scores: dict = {}
+        agg_score: float = 0.0
+        model_args = "pretrained=" + self.model_path + ",dtype=" + self.model_dtype


Nit: Use an f-string (`f""`).
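
For illustration, the nit above could be addressed along these lines. This is a sketch, not the committed change: the standalone function `build_model_args` is hypothetical (the PR builds the string inside the class from `self.model_path` and `self.model_dtype`), and `"bfloat16"` is an assumed example value.

```python
def build_model_args(model_path: str, model_dtype: str) -> str:
    """Build the comma-separated model_args string using an f-string."""
    return f"pretrained={model_path},dtype={model_dtype}"


# Example values; "bfloat16" is an assumed dtype chosen for illustration.
model_args = build_model_args("instructlab/granite-7b-lab", "bfloat16")
print(model_args)
```

The f-string yields the same string as the `+` concatenation while keeping the `key=value` layout visible at a glance.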

Signed-off-by: Alina Ryan <aliryan@redhat.com>

cdoern

LGTM

alinaryan self-assigned this Jun 19, 2024

alinaryan changed the title ~~WIP: Introduce lm-evaluation-harness dependency~~ WIP: MMLU: Introduce lm-evaluation-harness dependency Jun 19, 2024

alimaredia changed the title ~~WIP: MMLU: Introduce lm-evaluation-harness dependency~~ Implement MMLU_Evaluator.run() Jun 20, 2024

alimaredia marked this pull request as ready for review June 20, 2024 11:19

JamesKunstle requested changes Jun 21, 2024

danmcp reviewed Jun 22, 2024

nathan-weinberg self-requested a review June 24, 2024 02:02

nathan-weinberg reviewed Jun 24, 2024

alinaryan force-pushed the mmlu branch from 2a68d9e to 8901ecb on June 25, 2024 15:26

cdoern reviewed Jun 25, 2024

danmcp reviewed Jun 25, 2024

alinaryan and others added 5 commits June 25, 2024 15:50

Introduce lm-evaluation-harness dependency

6db41b1

Signed-off-by: Alina Ryan <aliryan@redhat.com>

working MMLU_Evaluator.run()

8e024b8

Replicates functionality in backend evaluation code for mmlu. Model that is tested is served by lm-eval code internally. Signed-off-by: Ali Maredia <amaredia@redhat.com>

Remove testing code and change the MMLU class names and descriptions

05233b4

Signed-off-by: Alina Ryan <aliryan@redhat.com>

Add missing model_path param and description

2e607f5

Signed-off-by: Alina Ryan <aliryan@redhat.com>

Fix lint errors

270bf53

Signed-off-by: Alina Ryan <aliryan@redhat.com>

alinaryan force-pushed the mmlu branch from b8ff955 to 270bf53 on June 25, 2024 20:02

suppress import err

88b2e65

Signed-off-by: Alina Ryan <aliryan@redhat.com>

alinaryan mentioned this pull request Jun 25, 2024

Add mmlu testing code #15

Closed

danmcp reviewed Jun 25, 2024

alinaryan mentioned this pull request Jun 25, 2024

Add specific run methods to mmlu #16

Closed

alinaryan requested review from danmcp, alimaredia, cdoern and nathan-weinberg June 25, 2024 21:10

Add fstring

899aaf9

Signed-off-by: Alina Ryan <aliryan@redhat.com>

alinaryan force-pushed the mmlu branch from 953eeae to 899aaf9 on June 25, 2024 21:13

cdoern approved these changes Jun 25, 2024

alimaredia approved these changes Jun 25, 2024

JamesKunstle approved these changes Jun 25, 2024

danmcp approved these changes Jun 25, 2024

alimaredia merged commit 100c512 into instructlab:main Jun 25, 2024
6 checks passed
