Expand options for MMLU. #811

Merged (3 commits) on Dec 19, 2023

mansheej
Contributor

Modify the MMLU evaluation so that each entry in the choices list is the full answer string instead of only the answer letter. This provides better signal for models trained at smaller FLOP counts.
Previously, a typical MMLU example looked like this:

{
    "query": "Question: For which of these two scenarios does the main character (who uses I/me/my) do something clearly morally wrong, according to ordinary moral standards in the US as of 2020? Scenario 1 | I slammed on my breaks to miss the ball as it came in to the road. Scenario 2 | I taught my children to play the xylophone.\nChoices\nA. Wrong, Wrong\nB. Wrong, Not wrong\nC. Not wrong, Wrong\nD. Not wrong, Not wrong", 
    "gold": 3, 
    "choices": ["A", "B", "C", "D"], 
    "category": "moral_scenarios"
}

The same example now looks like this:

{
    "query": "Question: For which of these two scenarios does the main character (who uses I\/me\/my) do something clearly morally wrong, according to ordinary moral standards in the US as of 2020? Scenario 1 | I slammed on my breaks to miss the ball as it came in to the road. Scenario 2 | I taught my children to play the xylophone.\nChoices\nA. Wrong, Wrong\nB. Wrong, Not wrong\nC. Not wrong, Wrong\nD. Not wrong, Not wrong",
    "gold": 3,
    "category": "moral_scenarios",
    "choices": ["A. Wrong, Wrong","B. Wrong, Not wrong","C. Not wrong, Wrong","D. Not wrong, Not wrong"]
}

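The difference is in the choices field: each entry now contains the letter followed by the full answer text, rather than the letter alone. The query, gold index, and category are unchanged.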
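For readers who want to reproduce the transformation on their own data, here is a minimal sketch of how the old format could be converted to the new one. It is not the script used for this PR; the function name `expand_choices` is hypothetical, and the sketch assumes that every query ends with a "Choices" line followed by one lettered option per line, as in the example above.

```python
import json


def expand_choices(example: dict) -> dict:
    """Return a copy of `example` whose choices carry the full answer text.

    Hypothetical helper: assumes the query contains a "Choices" line
    followed by one option per line, each starting with its letter.
    """
    # Everything after the "Choices" marker lists the options.
    _, _, options_block = example["query"].partition("\nChoices\n")
    options = [line.strip() for line in options_block.split("\n") if line.strip()]

    # Refuse to guess if the query does not match the assumed layout.
    if len(options) != len(example["choices"]):
        raise ValueError("options in query do not match the choices list")
    for letter, option in zip(example["choices"], options):
        if not option.startswith(f"{letter}. "):
            raise ValueError(f"option {option!r} does not start with {letter!r}")

    # Only the choices field changes; query, gold and category are kept.
    return {**example, "choices": options}


old = {
    "query": (
        "Question: For which of these two scenarios does the main character "
        "(who uses I/me/my) do something clearly morally wrong, according to "
        "ordinary moral standards in the US as of 2020? Scenario 1 | I slammed "
        "on my breaks to miss the ball as it came in to the road. Scenario 2 | "
        "I taught my children to play the xylophone.\nChoices\nA. Wrong, Wrong\n"
        "B. Wrong, Not wrong\nC. Not wrong, Wrong\nD. Not wrong, Not wrong"
    ),
    "gold": 3,
    "choices": ["A", "B", "C", "D"],
    "category": "moral_scenarios",
}

new = expand_choices(old)
print(json.dumps(new["choices"]))
```

The length and prefix checks are there so that an example with an unexpected layout raises an error instead of silently producing mismatched choices.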

Member

@abhi-mosaic abhi-mosaic left a comment

Skimmed some of the samples, trust that this was all code-generated.

I think this will be great for testing on small models and a potential candidate for Gauntlet v0.3.

@mansheej mansheej merged commit dc77451 into mosaicml:main Dec 19, 2023
10 checks passed
@mansheej mansheej deleted the mmlu-expand branch January 14, 2024 02:22