Docs for SUTs #369

dhosterman · 2024-05-16T13:40:14Z

Add a tutorial for the simplest way to add a new SUT to ModelBench. Note: this will not work with the version of ModelBench on PyPI, so we will have to do a release. If you want to test this, install either from this branch in Git or from your local filesystem.


github-actions · 2024-05-16T13:40:26Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

wpietri

This looks great. Very straightforward. I especially like the inclusion of the DemoYesNo code. Would it be worth adding a brief link to a more elaborate SUT, one that makes API calls?

bkorycki · 2024-05-17T01:58:03Z

Nice! I think this does a good job bridging ModelGauge and ModelBench without being repetitive. Couple of thoughts:

Could we remove the "Running the benchmark on your SUT" section from the README now? It seems a bit duplicative.
What do you think about explicitly stating that users don’t necessarily need to make a new SUT class? For example, if you want to test a Hugging Face model that isn’t already in ModelGauge/Bench, you can just register a new HuggingFaceSUT. Unless that is not really the use case being targeted here.
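The idea of reusing an existing SUT class rests on registration accepting constructor arguments, so one class can back many uids. A minimal dependency-free sketch of that pattern, assuming a registry that stores a class plus its arguments and builds instances on demand. `ToyRegistry`, `ExistingModelSUT`, the uids, and the model names are all hypothetical stand-ins, not ModelGauge's actual `SUTS` registry or `HuggingFaceSUT` constructor.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class ExistingModelSUT:
    """Hypothetical stand-in for a reusable SUT class that wraps a named model."""

    uid: str
    model_name: str


class ToyRegistry:
    """Hypothetical registry: maps a uid to a class plus the arguments used to build it."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Callable[..., Any], tuple]] = {}

    def register(self, cls: Callable[..., Any], uid: str, *args: Any) -> None:
        self._entries[uid] = (cls, args)

    def make_instance(self, uid: str) -> Any:
        cls, args = self._entries[uid]
        # The uid is passed first, followed by the registered arguments.
        return cls(uid, *args)


suts = ToyRegistry()
# One existing class registered twice with different (hypothetical) model names:
# adding the second model needs no new SUT class.
suts.register(ExistingModelSUT, "model_a", "org/model-a")
suts.register(ExistingModelSUT, "model_b", "org/model-b")

sut = suts.make_instance("model_b")
print(sut.model_name)  # org/model-b
```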

bkorycki · 2024-05-17T02:11:28Z

docs/add-a-sut.md

+    number_of_words: int
+    text: str
+
+@modelgauge_sut(capabilities=[AcceptsTextPrompt, AcceptsChatPrompt])


To simplify, you could limit this SUT example to just capabilities=[AcceptsTextPrompt] if you'd like; then you wouldn't need the translate_chat_prompt() method.

This code combines all of the code snippets from the ModelGauge "Creating a basic SUT" example (plus some necessary imports that are missing from that example). I'd be happy to simplify this example if we also simplify that one in the same way.

What I'm trying to avoid is users looking at that example, and looking at this example, and trying to figure out why they're different.
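For comparison, a text-only SUT needs just one path: translate the text prompt into a request, evaluate it, and translate the result into a response. A dependency-free sketch of that flow, assuming the method names of ModelGauge's `PromptResponseSUT` (`translate_text_prompt`, `evaluate`, `translate_response`) and assuming the demo's rule of answering "Yes" for an even word count. The dataclasses are hypothetical stand-ins for ModelGauge's `TextPrompt`, `SUTResponse`, and `SUTCompletion`, so the `@modelgauge_sut` decorator and `SUTS.register` call are omitted.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins for ModelGauge's prompt and response types.
@dataclass
class TextPrompt:
    text: str


@dataclass
class SUTCompletion:
    text: str


@dataclass
class SUTResponse:
    completions: List[SUTCompletion]


# Request/response models with the same fields as the reviewed snippet.
@dataclass
class DemoYesNoRequest:
    text: str


@dataclass
class DemoYesNoResponse:
    number_of_words: int
    text: str


class TextOnlyYesNoSUT:
    """Text-only sketch: with no chat capability there is no translate_chat_prompt()."""

    def translate_text_prompt(self, prompt: TextPrompt) -> DemoYesNoRequest:
        return DemoYesNoRequest(text=prompt.text)

    def evaluate(self, request: DemoYesNoRequest) -> DemoYesNoResponse:
        # Assumed demo rule: "Yes" for an even number of words, otherwise "No".
        number_of_words = len(request.text.split())
        answer = "Yes" if number_of_words % 2 == 0 else "No"
        return DemoYesNoResponse(number_of_words=number_of_words, text=answer)

    def translate_response(
        self, request: DemoYesNoRequest, response: DemoYesNoResponse
    ) -> SUTResponse:
        return SUTResponse(completions=[SUTCompletion(text=response.text)])


sut = TextOnlyYesNoSUT()
request = sut.translate_text_prompt(TextPrompt(text="Is this safe?"))
result = sut.translate_response(request, sut.evaluate(request))
print(result.completions[0].text)  # No
```

Dropping the chat capability removes one translation method and nothing else; the evaluate and response steps are unchanged.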

bkorycki · 2024-05-17T02:15:56Z

docs/add-a-sut.md

+) -> SUTResponse:
+    return SUTResponse(completions=[SUTCompletion(text=response.text)])
+
+SUTS.register(DemoYesNoSUT, "demo_yes_no")


Could we rename this, just to account for the small possibility that the user already has the modelgauge-demo plugin installed?

Like above, I'd be happy to rename this if we do the same in the ModelGauge code it is pulled from. I'd like this to work identically regardless of which repo they wind up pulling it from; the code is all just compiled here for convenience. And like you said, the possibility of them having modelgauge-demo installed is small. It should be zero if they're following the tutorial step by step.
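The concern behind renaming is that a registry keyed by uid cannot hold two different entries under the same key. A minimal sketch, assuming registration rejects a duplicate uid (ModelGauge's registry may report this differently); `UidRegistry`, the two placeholder classes, and the uid `my_demo_yes_no` are hypothetical.

```python
from typing import Dict


class UidRegistry:
    """Hypothetical registry that refuses to overwrite an existing uid."""

    def __init__(self) -> None:
        self._entries: Dict[str, type] = {}

    def register(self, cls: type, uid: str) -> None:
        if uid in self._entries:
            raise ValueError(f"uid {uid!r} is already registered")
        self._entries[uid] = cls


class PluginYesNoSUT:  # placeholder for the class a demo plugin would register
    pass


class TutorialYesNoSUT:  # placeholder for the class written while following the tutorial
    pass


registry = UidRegistry()
registry.register(PluginYesNoSUT, "demo_yes_no")

try:
    registry.register(TutorialYesNoSUT, "demo_yes_no")
except ValueError as err:
    print(err)  # uid 'demo_yes_no' is already registered

# A distinct uid sidesteps the collision.
registry.register(TutorialYesNoSUT, "my_demo_yes_no")
```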

dhosterman · 2024-05-17T13:21:00Z

> Could we remove the "Running the benchmark on your SUT" section from the README now? It seems a bit duplicative.

That's a great point! I think I'll remove that and just leave a link to this.

> What do you think about explicitly stating that users don’t necessarily need to make a new SUT class? For example, if you want to test a Hugging Face model that isn’t already in ModelGauge/Bench, you can just register a new HuggingFaceSUT. Unless that is not really the use case being targeted here.

I'd prefer not to add any particular details on creating a SUT here and have ModelGauge's docs cover all of that.

Add tutorial for the simplest way to add a new SUT to ModelBench.

e498b01

dhosterman requested a review from a team as a code owner May 16, 2024 13:40

dhosterman requested review from bkorycki, bollacker and wpietri May 16, 2024 13:40

dhosterman self-assigned this May 16, 2024

wpietri approved these changes May 16, 2024


bkorycki reviewed May 17, 2024


fix spelling error and link to add-a-sut.md from README.md.

950a416

bkorycki approved these changes May 17, 2024


dhosterman merged commit 68dd857 into main May 17, 2024
4 checks passed

dhosterman deleted the doc-for-suts branch May 17, 2024 16:36

github-actions bot locked and limited conversation to collaborators May 17, 2024
