Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs for SUTs #369

Merged
merged 2 commits into from
May 17, 2024
Merged

Docs for SUTs #369

merged 2 commits into from
May 17, 2024

Conversation

dhosterman
Copy link
Collaborator

Add tutorial for the simplest way to add a new SUT to ModelBench. Note: this will not work with the version of ModelBench on PyPi. We will have to do a release. If you want to test this, install either from this branch in Git or from your local filesystem.

…e: this will not work with the version of ModelBench on PyPi. We will have to do a release. If you want to test this, install either from this branch in Git or from your local filesystem.
@dhosterman dhosterman requested a review from a team as a code owner May 16, 2024 13:40
@dhosterman dhosterman self-assigned this May 16, 2024
Copy link

github-actions bot commented May 16, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Copy link
Contributor

@wpietri wpietri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Very straightforward. I especially like the inclusion of the DemoYesNo code. Would it be worth adding a brief link to a more elaborate SUT, one that makes API calls?

@bkorycki
Copy link
Contributor

Nice! I think this does a good job bridging ModelGauge and ModelBench without being repetitive. Couple of thoughts:

  • Could we remove "Running the benchmark on your SUT" section from the README now? It seems a bit duplicative now.
  • What do you think about explicitly stating that users don’t necessarily need to make a new SUT class? For example, if you want to test some hugging face model that isn’t already in ModelGauge/Bench, you can just register a new HuggingFaceSUT. Unless that is not really the use case being targeted here.

number_of_words: int
text: str

@modelgauge_sut(capabilities=[AcceptsTextPrompt, AcceptsChatPrompt])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify, you could limit this SUT example to just have capabilities=[AcceptsTextPrompt] if you'd like. And then you wouldn't need the translate_chat_prompt() method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is the culmination of all of the code snippets from the ModelGauge Creating a basic SUT example here (+ some necessary imports that are missing from that example). I'd be happy to simplify this example if we also simplify that one in the same way.

What I'm trying to avoid is users looking at that example, and looking at this example, and trying to figure out why they're different.

) -> SUTResponse:
return SUTResponse(completions=[SUTCompletion(text=response.text)])

SUTS.register(DemoYesNoSUT, "demo_yes_no")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename this? Just to account for the small possibility that the user might already have the modelgauge-demo plugin installed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like above, I'd be happy to rename this if we do the same in the ModelGauge code it is pulled from. I'd like this to work identically, regardless of which repo they wind up pulling it from -- the code is all just compiled here for convenience. And like you said, the possibility of them having modelgauge-demo installed is small. It should be zero if they're following the tutorial step-by-step.

@dhosterman
Copy link
Collaborator Author

Could we remove "Running the benchmark on your SUT" section from the README now? It seems a bit duplicative now.

That's a great point! I think I'll remove that and just leave a link to this.

What do you think about explicitly stating that users don’t necessarily need to make a new SUT class? For example, if you want to test some hugging face model that isn’t already in ModelGauge/Bench, you can just register a new HuggingFaceSUT. Unless that is not really the use case being targeted here.

I'd prefer not to add any particular details on creating a SUT here and have ModelGauge's docs cover all of that.

@dhosterman dhosterman merged commit 68dd857 into main May 17, 2024
4 checks passed
@dhosterman dhosterman deleted the doc-for-suts branch May 17, 2024 16:36
@github-actions github-actions bot locked and limited conversation to collaborators May 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants