-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skops integration: Load tabular classification and regression models from the hub #2126
Conversation
All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2126-all-demos |
0a62f8f
to
c6285d2
Compare
For skops models, instead of reading the sample data from the README, it would also be possible to read the Example: https://huggingface.co/scikit-learn/tabular-playground/blob/main/config.json |
as long as |
Good point about the config.json vs README @BenjaminBossan @adrinjalali ! Since reading the input data from the README will work for models that were uploaded with and without skops, e.g. (https://huggingface.co/julien-c/wine-quality), I think I will stick with that. |
c6285d2
to
06db951
Compare
example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]])) | ||
example_data = example_yaml.get("widget", {}).get("structuredData", {}) | ||
if not example_data: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning for error-ing if there is not example data in the repo is that without it we'd display a bare dataframe as input and it'd be cumbersome for users to type out all the feature names and inputs. Cumbersome enough that it defeats the shareability of gradio demos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of the feature names being provided, each feature has it's own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with dynamic dataframe and still provide an example themselves in the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up: ended up filing the issue we talked about #2155 . Once this is fixed it may be possible to show an empty dataframe and have users type in all the values themselves.
@@ -42,6 +49,53 @@ def load_blocks_from_repo(name, src=None, api_key=None, alias=None, **kwargs): | |||
return blocks | |||
|
|||
|
|||
def get_tabular_examples(model_name) -> Dict[str, List[float]]: | |||
readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can either get the example data from the README or the config.json
but the config.json
will only have the example data if the model was uploaded with skops.
I think it would be better if gradio could create a demo for any tabular model and not just those created with skops. Downside is that it introduces a pyyaml dependency.
In the future, once the skops config json file contains richer metadata about feature types (categorical vs null) etc we can read from the config.json if it's present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was something we didn't want to have in model card specifically
(cc @adrinjalali is working on having dtypes atm)
Maybe you could check for both? @freddyaboulton
] | ||
|
||
|
||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should cover most of the weirdness from malformed READMEs. Haven't actually come across a repo with bad README data but it's possible.
@BenjaminBossan @adrinjalali @merveenoyan Can't officially tag you as a reviewer but feel free to give this a look when you get a chance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much of a feedback but to help you out with perspective 🙂
example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]])) | ||
example_data = example_yaml.get("widget", {}).get("structuredData", {}) | ||
if not example_data: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of the feature names being provided, each feature has it's own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with dynamic dataframe and still provide an example themselves in the interface.
@@ -242,5 +243,64 @@ def test_interface_load_cache_examples(tmp_path): | |||
) | |||
|
|||
|
|||
def test_cols_to_rows(): | |||
assert cols_to_rows({"a": [1, 2, "NaN"], "b": [1, "NaN", 3]}) == ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small question, if there's a cell left empty, how is it handled? Do you impute "NaN" directly? (how is it sent to inference?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it would be left empty but that doesn't work for the inference API so I'll replace with "NaN" instead! Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. I know too little about what we can expect from the input data to judge whether cols_to_rows
and rows_to_cols
cover all edge cases, but I couldn't spot anything obviously incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR, I will make sure this is well adopted 🙂❤️
@@ -42,6 +49,53 @@ def load_blocks_from_repo(name, src=None, api_key=None, alias=None, **kwargs): | |||
return blocks | |||
|
|||
|
|||
def get_tabular_examples(model_name) -> Dict[str, List[float]]: | |||
readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was something we didn't want to have in model card specifically
(cc @adrinjalali is working on having dtypes atm)
Maybe you could check for both? @freddyaboulton
@freddyaboulton when would this get merged? I'm planning to do a blog post on skops want to include gradio :) |
@merveenoyan just merged! Looking forward to the post :) |
Description
Ability to load tabular classification and regression models from the hub and turn it into a demo.
Closes: #2015
Tabular Regression
Users can edit the dataframe to get new predictions
Tabular classification
Demo with missing data in the input widget
What happens when api request fails
Checklist: