Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skops integration: Load tabular classification and regression models from the hub #2126

Merged
merged 7 commits into from
Sep 6, 2022

Conversation

freddyaboulton
Copy link
Collaborator

@freddyaboulton freddyaboulton commented Aug 30, 2022

Description

Ability to load tabular classification and regression models from the hub and turn it into a demo.

Closes: #2015

Tabular Regression

import gradio as gr

gr.Interface.load("models/skops-ci/test-3255bd22-bb75-4641-8655-824cd25d140f").launch()

Users can edit the dataframe to get new predictions

tabular_regression_demo

Tabular classification

import gradio as gr

gr.Interface.load("models/scikit-learn/tabular-playground").launch()

image

Demo with missing data in the input widget

import gradio as gr

gr.Interface.load("models/demo-org/tabular-playground").launch()

image

What happens when api request fails

image

Checklist:

  • I have performed a self-review of my own code
  • My code follows the style guidelines of this project
  • I have commented my code in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@github-actions
Copy link
Contributor

All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-2126-all-demos

@BenjaminBossan
Copy link

BenjaminBossan commented Aug 31, 2022

For skops models, instead of reading the sample data from the README, it would also be possible to read the config.json, which also contains the sample data (or is there any reason not to @adrinjalali?). That way, you can get rid of the regex and pyyaml dependency.

Example: https://huggingface.co/scikit-learn/tabular-playground/blob/main/config.json

@adrinjalali
Copy link

as long as skops is concerned, we only deal with the config.json file, so we don't have pyyaml and regex work in it. Those are only in the README file since the widget expects them.

@freddyaboulton
Copy link
Collaborator Author

Good point about the config.json vs README @BenjaminBossan @adrinjalali ! Since reading the input data from the README will work for models that were uploaded with and without skops, e.g. (https://huggingface.co/julien-c/wine-quality), I think I will stick with that.

example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]]))
example_data = example_yaml.get("widget", {}).get("structuredData", {})
if not example_data:
raise ValueError(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning for error-ing if there is not example data in the repo is that without it we'd display a bare dataframe as input and it'd be cumbersome for users to type out all the feature names and inputs. Cumbersome enough that it defeats the shareability of gradio demos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the feature names being provided, each feature has it's own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with dynamic dataframe and still provide an example themselves in the interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up: ended up filing the issue we talked about #2155 . Once this is fixed it may be possible to show an empty dataframe and have users type in all the values themselves.

@@ -42,6 +49,53 @@ def load_blocks_from_repo(name, src=None, api_key=None, alias=None, **kwargs):
return blocks


def get_tabular_examples(model_name) -> Dict[str, List[float]]:
readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can either get the example data from the README or the config.json but the config.json will only have the example data if the model was uploaded with skops.

I think it would be better if gradio could create a demo for any tabular model and not just those created with skops. Downside is that it introduces a pyyaml dependency.

In the future, once the skops config json file contains richer metadata about feature types (categorical vs null) etc we can read from the config.json if it's present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was something we didn't want to have in model card specifically
(cc @adrinjalali is working on having dtypes atm)
Maybe you could check for both? @freddyaboulton

]


@pytest.mark.parametrize(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should cover most of the weirdness from malformed READMEs. Haven't actually come across a repo with bad README data but it's possible.

@freddyaboulton freddyaboulton marked this pull request as ready for review September 1, 2022 14:27
@freddyaboulton
Copy link
Collaborator Author

@BenjaminBossan @adrinjalali @merveenoyan Can't officially tag you as a reviewer but feel free to give this a look when you get a chance!

Copy link
Contributor

@merveenoyan merveenoyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much of a feedback but to help you out with perspective 🙂

example_yaml = next(yaml.safe_load_all(readme.text[: yaml_regex.span()[-1]]))
example_data = example_yaml.get("widget", {}).get("structuredData", {})
if not example_data:
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the feature names being provided, each feature has it's own value range or feature type anyway, so it doesn't make sense even if you provide everything. What would make sense would be people calling it and loading the interface with dynamic dataframe and still provide an example themselves in the interface.

@@ -242,5 +243,64 @@ def test_interface_load_cache_examples(tmp_path):
)


def test_cols_to_rows():
assert cols_to_rows({"a": [1, 2, "NaN"], "b": [1, "NaN", 3]}) == (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small question, if there's a cell left empty, how is it handled? Do you impute "NaN" directly? (how is it sent to inference?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it would be left empty but that doesn't work for the inference API so I'll replace with "NaN" instead! Thank you

Copy link

@BenjaminBossan BenjaminBossan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. I know too little about what we can expect from the input data to judge whether cols_to_rows and rows_to_cols cover all edge cases, but I couldn't spot anything obviously incorrect.

Copy link
Contributor

@merveenoyan merveenoyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR, I will make sure this is well adopted 🙂❤️

@@ -42,6 +49,53 @@ def load_blocks_from_repo(name, src=None, api_key=None, alias=None, **kwargs):
return blocks


def get_tabular_examples(model_name) -> Dict[str, List[float]]:
readme = requests.get(f"https://huggingface.co/{model_name}/resolve/main/README.md")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was something we didn't want to have in model card specifically
(cc @adrinjalali is working on having dtypes atm)
Maybe you could check for both? @freddyaboulton

@merveenoyan
Copy link
Contributor

@freddyaboulton when would this get merged? I'm planning to do a blog post on skops want to include gradio :)

@freddyaboulton freddyaboulton merged commit eb81fa2 into main Sep 6, 2022
@freddyaboulton freddyaboulton deleted the 2015-load-skops-models branch September 6, 2022 13:15
@freddyaboulton
Copy link
Collaborator Author

@merveenoyan just merged! Looking forward to the post :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Gradio Demo from skops pipeline
4 participants