Dynamic SUTs refactor#1141
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
|
Maybe I'm missing something here, but given the dynamic nature of things, can't we just try making the SUT and then blow up if it doesn't work? For example, if I remove the
If I do it with an invalid model name, it blows up: From a user perspective, the stack trace is annoying, but since it's a nicely specific exception, I think we can just catch that at the top and print out a nice error message. If we were doing more than one SUT per run, this would be more of a problem; we'd want the command to fail fast. But since we're getting rid of multi-SUT runs anyhow, this should either work fine or blow up quickly, shouldn't it? If that's true, I think you can just be a bit more radical in your trimming, ending up with less complexity and a good user experience. |
|
@wpietri Thanks for your feedback. Do you propose I remove the |
If you can get as a good user experience without them, then yes. If not, then you can at least put those checks in the pre-registered SUTs path only. Trying it out, the validate_uid check for non-dynamic SUTs does produce a list of other options, which is a nice feature given that the keys are things we made up and can't be found in other docs. So maybe I'd keep that in the non-dynamic path. It's a pretty big list these days, though, so I might take pity on the user and give just the closest matches. That help-the-user-with-a-typo experience is also something we could probably include in the dynamic SUTs, come to think of it. Or at least we should give people a link whatever docs page has all the valid SUT ids for the dynamic provider invoked. |
|
@wpietri Okay! I made it so that we do validation and secret checks for pre registered suts and not dynamic ones. |
I'm not following. I was expecting to see |
|
@wpietri You're right, the call to
|
|
@bkorycki I'm saying that you could also drop the Unless maybe there's some advantage to doing the work twice? |
|
@wpietri What's wrong with validating UIDs at the start of execution? That's the only place we call that method, so I'm not sure what you mean by doing it twice. |
|
@bkorycki: As I showed before, you can remove the Pre-checking the SUT uids was mildly helpful before, because if somebody put in two bad |
|
@wpietri Okay, I think I did what you want. I removed the |
rogthefrog
left a comment
There was a problem hiding this comment.
This is a great improvement. I have a couple of questions/suggestions in the review, but they can be addressed later (if ever) and shouldn't block merging.
| UNKNOWN = "unknown" | ||
|
|
||
|
|
||
| # TODO: Auto-collect. Maybe make "driver" keys a constant in each factory module. |
There was a problem hiding this comment.
They are, at least in some. E.g. huggingface_sut_factory line 19. That was the intent originally.
There was a problem hiding this comment.
Yeah, I think it would be nice if it were fully automatic though:)
| # Maps a string to the module and factory function in that module | ||
| # that can be used to create a dynamic sut | ||
| DYNAMIC_SUT_FACTORIES: dict = { | ||
| "proxied": {"hfrelay": HuggingFaceSUTFactory}, |
There was a problem hiding this comment.
I have a vague memory of chatting about flattening this and ditching the proxied vs direct distinction. Did you decide to keep it here?
There was a problem hiding this comment.
I decided that there was already too much in this PR haha. I still think it would be valuable to do though
| def _make_dynamic_sut(self, uid: str) -> SUT: | ||
| sut_metadata: DynamicSUTMetadata = DynamicSUTMetadata.parse_sut_uid(uid) | ||
|
|
||
| if sut_metadata.is_proxied(): |
There was a problem hiding this comment.
This would be simpler if we removed the proxied vs direct distinction.
There was a problem hiding this comment.
Yes! This whole chunk would be unnecessary.
| class TogetherSUTFactory(DynamicSUTFactory): | ||
| def __init__(self, raw_secrets: RawSecrets): | ||
| super().__init__(raw_secrets) | ||
| self._client = None # Lazy load. |
There was a problem hiding this comment.
Did you want to add a type hint like for the OpenAI client above?
There was a problem hiding this comment.
I think I only added it for openai because mypy was complaining
| benchmark_options.extend(["--prompt-set", "practice"]) | ||
|
|
||
| with pytest.raises(SUTNotFoundException): | ||
| with pytest.raises(ValueError, match="No registration for bogus"): |
There was a problem hiding this comment.
What's the reason to test for a more generic exception? SUTNotFoundException and related exception classes are more expressive IMO.
There was a problem hiding this comment.
Yeah, it is more expressive. I only changed it to accommodate for the fact that SUTs dont get validated with validate_uid anymore. I think the failure actually happens in when check_secrets tries to get the thing from the registry, and the registry raises that error. I unfortunately can't change it to raise a SUTNotFoundException because the code is in the InstanceFactory class, which handles other registry types as well e.g. tests, annotators. I agree this is less ideal.
| with pytest.raises(SUTNotFoundException): | ||
| _ = run_cli("run-test", "--sut", "unknown-uid", "--test", "fake-test") | ||
| del TESTS._lookup["fake-test"] | ||
| with pytest.raises(ValueError, match="No registration for unknown-uid"): |
There was a problem hiding this comment.
Same question about the specific exception classes.
wpietri
left a comment
There was a problem hiding this comment.
Huge progress. Thanks for doing this!
This is an attempt to refactor the dynamic SUTs to address the concerns mentioned in PR #1131 , summarized below:
make_sut, instead of a registry-tuple. (Related to above).This is what I tried to do:
SUTFactorythat is responsibly for making all SUTs. It encapsulates the dynamic SUT functionality.SUTFactory.make_instance()for dynamic suts and pre-registered SUTs. The caller doesn't have to know anything about what type of SUT it is. We can strip the registry functionality from the object once we get to that point.Here's where I got stuck:
validate_uidandcheck_secretsfunctions both work well with the registry system. To extend those pre-flight functions to work on dynamic SUTs, they would have to attempt to create the dynamic SUT. But if a SUT is successfully created, they might as well register that SUT. Otherwise it will just have to do the work again whenmake_instanceis called.(Less important) I think there is room for further clean up with how we map driver names to dynamic SUT factories. ie I think the driver name can be a part of the dynamic SUT factory classes, and we can collect those objects automatically instead of populating a dictionary. But I think this is work that can be done after sorting out this PR.