Implement MMLU_Evaluator.run() #10

Merged: 7 commits, Jun 25, 2024

@alinaryan (Member) commented Jun 19, 2024

This PR updates the MMLU_Evaluator child class to use the lm-evaluation-harness library as a dependency and adds an API call to simple_evaluate() to run MMLU.

ToDo:

@alinaryan alinaryan self-assigned this Jun 19, 2024
@alinaryan alinaryan changed the title WIP: Introduce lm-evaluation-harness dependency WIP: MMLU: Introduce lm-evaluation-harness dependency Jun 19, 2024
@alimaredia alimaredia changed the title WIP: MMLU: Introduce lm-evaluation-harness dependency Implement MMLU_Evaluator.run() Jun 20, 2024
@alimaredia alimaredia marked this pull request as ready for review June 20, 2024 11:19
@alimaredia (Contributor) commented:

To test this PR locally, initialize the virtual env in the repo, install all the dependencies, and run this Python script, which executes the function:

from instructlab.eval.mmlu import MMLU_Evaluator

tasks = ["mmlu_anatomy","mmlu_astronomy"]

# This is the Hugging Face repo ID for the granite-7b-lab model, not a path on the local filesystem
model_path = "instructlab/granite-7b-lab"
mmlu = MMLU_Evaluator(model_path, tasks)
overall_score, individual_scores = mmlu.run()
print(overall_score)
print(individual_scores)
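
The script above returns one overall score plus per-task scores. The thread does not show how run() combines them, so the following is only a minimal sketch under stated assumptions: the helper name aggregate_scores is hypothetical, the results dict is assumed to map each task name to a dict of metrics, the metric key "acc,none" is an assumption about the lm-evaluation-harness output format, and the overall score is assumed to be an unweighted mean over tasks.

```python
from typing import Dict, List, Tuple


def aggregate_scores(
    results: Dict[str, Dict[str, float]],
    tasks: List[str],
    metric: str = "acc,none",  # assumed metric key, not confirmed by the PR
) -> Tuple[float, Dict[str, float]]:
    """Return (overall_score, individual_scores) for the given tasks.

    Hypothetical helper: the overall score is the unweighted mean of the
    per-task scores, which is an assumption about the intended behavior.
    """
    if not tasks:
        raise ValueError("tasks must not be empty")
    # Pull the chosen metric out of each task's result entry.
    individual_scores = {task: float(results[task][metric]) for task in tasks}
    overall_score = sum(individual_scores.values()) / len(tasks)
    return overall_score, individual_scores


# Made-up numbers for illustration only; these are not real MMLU results.
fake_results = {
    "mmlu_anatomy": {"acc,none": 0.5},
    "mmlu_astronomy": {"acc,none": 0.75},
}
overall, individual = aggregate_scores(
    fake_results, ["mmlu_anatomy", "mmlu_astronomy"]
)
print(overall)
print(individual)
```

Keeping the aggregation in a pure function like this makes it possible to check the score arithmetic by hand without downloading a model.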

@JamesKunstle (Contributor) left a comment

very nice PR, makes sense to me. couple of nits.

@nathan-weinberg nathan-weinberg self-requested a review June 24, 2024 02:02
@alimaredia (Contributor) commented:

I've mentioned this in some of my code review comments. Getting MMLU working would be a big deal for the library, so I propose we do the following and then merge the PR ASAP:

  1. Remove the testing bits from mmlu.py
  2. Have someone besides the authors (Alina or me) test the PR by hand to confirm the scores are output properly
  3. Create follow-up issues for anything else that we know needs to be done.

@cdoern (Contributor) left a comment

Missing model path; run tox -e ruff as well! Looks good though, I will approve after those changes.

@nathan-weinberg (Member) commented:

@alinaryan can you rebase?

alinaryan and others added 5 commits June 25, 2024 15:50
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Replicates functionality in backend evaluation
code for mmlu.

The model under test is served by lm-eval code
internally.

Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Signed-off-by: Alina Ryan <aliryan@redhat.com>
Signed-off-by: Alina Ryan <aliryan@redhat.com>
@alinaryan alinaryan mentioned this pull request Jun 25, 2024

individual_scores: dict = {}
agg_score: float = 0.0
model_args = "pretrained=" + self.model_path + ",dtype=" + self.model_dtype
Member review comment:

Nit: Use f""
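
As a rough illustration of the nit, the concatenation on the model_args line could be written with an f-string. This is a sketch rather than the code that was merged: the standalone function build_model_args is hypothetical (the PR builds the string inline from self.model_path and self.model_dtype), and "bfloat16" is only an example dtype value.

```python
def build_model_args(model_path: str, model_dtype: str) -> str:
    """Build the comma-separated model_args string with an f-string.

    Hypothetical helper mirroring the reviewed line:
    "pretrained=" + self.model_path + ",dtype=" + self.model_dtype
    """
    return f"pretrained={model_path},dtype={model_dtype}"


# Example values; the dtype here is illustrative, not taken from the PR.
print(build_model_args("instructlab/granite-7b-lab", "bfloat16"))
```

The f-string produces the same string as the concatenation while keeping the key=value layout visible at a glance.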

Signed-off-by: Alina Ryan <aliryan@redhat.com>
@cdoern (Contributor) left a comment

LGTM

@alimaredia alimaredia merged commit 100c512 into instructlab:main Jun 25, 2024
6 checks passed