-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs for SUTs #369
Docs for SUTs #369
Conversation
…e: this will not work with the version of ModelBench on PyPi. We will have to do a release. If you want to test this, install either from this branch in Git or from your local filesystem.
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Very straightforward. I especially like the inclusion of the DemoYesNo code. Would it be worth adding a brief link to a more elaborate SUT, one that makes API calls?
Nice! I think this does a good job bridging ModelGauge and ModelBench without being repetitive. Couple of thoughts:
|
number_of_words: int | ||
text: str | ||
|
||
@modelgauge_sut(capabilities=[AcceptsTextPrompt, AcceptsChatPrompt]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify, you could limit this SUT example to just have capabilities=[AcceptsTextPrompt]
if you'd like. And then you wouldn't need the translate_chat_prompt()
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is the culmination of all of the code snippets from the ModelGauge Creating a basic SUT example here (+ some necessary imports that are missing from that example). I'd be happy to simplify this example if we also simplify that one in the same way.
What I'm trying to avoid is users looking at that example, and looking at this example, and trying to figure out why they're different.
) -> SUTResponse: | ||
return SUTResponse(completions=[SUTCompletion(text=response.text)]) | ||
|
||
SUTS.register(DemoYesNoSUT, "demo_yes_no") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename this? Just to account for the small possibility that the user might already have the modelgauge-demo plugin installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above, I'd be happy to rename this if we do the same in the ModelGauge code it is pulled from. I'd like this to work identically, regardless of which repo they wind up pulling it from -- the code is all just compiled here for convenience. And like you said, the possibility of them having modelgauge-demo
installed is small. It should be zero if they're following the tutorial step-by-step.
That's a great point! I think I'll remove that and just leave a link to this.
I'd prefer not to add any particular details on creating a SUT here and have ModelGauge's docs cover all of that. |
Add tutorial for the simplest way to add a new SUT to ModelBench. Note: this will not work with the version of ModelBench on PyPi. We will have to do a release. If you want to test this, install either from this branch in Git or from your local filesystem.