Create Gradio Demo from skops pipeline #2015

Closed
1 task done
freddyaboulton opened this issue Aug 15, 2022 · 8 comments · Fixed by #2126
Labels: enhancement (New feature or request)

Comments

@freddyaboulton
Collaborator

freddyaboulton commented Aug 15, 2022

  • I have searched to see if a similar issue already exists.

Is your feature request related to a problem? Please describe.
skops (https://huggingface.co/blog/skops) makes it easy to host a scikit-learn model on the model hub and tag it with a model card that includes important metadata, such as the sklearn version, input data types, and example inputs.

It would be really helpful for users if gradio demos could be created automatically from scikit-learn models on the hub.

Describe the solution you'd like

The existing API for loading transformers pipelines from the hub should also work for skops pipelines.

gr.Interface.load("models/<user-name>/<sklearn-pipeline-name>").launch(...)

Additional context
Out of scope: a "rich" UI that depends on the input types, e.g. a drop-down for categorical features. skops does not track this information in its metadata yet.

@abidlabs abidlabs added the enhancement New feature or request label Aug 15, 2022
@freddyaboulton freddyaboulton self-assigned this Aug 25, 2022
@merveenoyan
Contributor

x-posting

I would like a huggingface transformers-like integration for gradio where you can call the model from the Hub and build an interface in one line of code, like below:

interface = gr.Interface.load("huggingface/scikit-learn/tabular-playground")

What I have in mind is a dataframe-in, dataframe-out UI.

Wanted to open discussion for requirements to have this and what to do (might be cool if you could somehow put links on to-do's for previous transformers integration). @freddyaboulton @abidlabs (might be interested)

The UI looks like this: https://huggingface.co/spaces/scikit-learn/tabular-playground (dataframe in, dataframe out)
However, I couldn't manage to pass examples the way we used to, as a list of lists where each inner list is a row (an example), so I opened an issue (#2089).

We can automatically read the headers and example inputs from the config.json files of skops repositories and construct the dataframe component from them.
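
As a rough sketch of the idea above: assuming the config.json keeps its column names under `sklearn.columns` and its example inputs under `sklearn.example_input` as a column-to-values mapping (this layout is an assumption, and `headers_and_rows` is a hypothetical helper, not part of skops or gradio), the headers and example rows for a dataframe component could be derived like this:

```python
import json

# Hypothetical config.json contents; the key layout is an assumption.
CONFIG_TEXT = """
{
  "sklearn": {
    "columns": ["age", "fare"],
    "example_input": {"age": [22, 38], "fare": [7.25, 71.28]},
    "task": "tabular-classification"
  }
}
"""


def headers_and_rows(config):
    """Return (headers, rows), where each row is one example in column order."""
    section = config["sklearn"]
    headers = list(section["columns"])
    example = section.get("example_input", {})
    # Transpose the column-oriented example into a list of rows.
    columns = [example.get(name, []) for name in headers]
    rows = [list(row) for row in zip(*columns)]
    return headers, rows


headers, rows = headers_and_rows(json.loads(CONFIG_TEXT))
print(headers)
print(rows)
```

The resulting `rows` value has the list-of-lists shape described above, with one inner list per example.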

Also pinging @adrinjalali and @BenjaminBossan here.

@freddyaboulton
Collaborator Author

@merveenoyan Thanks for filing! Really excited for this.

In terms of requirements, what if instead of dataframe in, dataframe out, we go from List[Component] to either Label or Number depending on whether it's classification or regression?

Something like this: https://huggingface.co/spaces/gradio/xgboost-income-prediction-with-explainability . My reasoning is that if we focus the demo on a single input at a time, we can provide a nicer UI based on the type of each model feature (Slider vs. Dropdown), as opposed to putting everything in the same dataframe. The Label component might also be a better output than a dataframe column.
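
To make the proposal concrete, here is a minimal sketch of how the component choice might work. It only returns component names as strings, so it does not need gradio installed. The helpers `choose_output` and `choose_input` are hypothetical, and inferring the input type from example values is an assumption, since feature types are not tracked in the skops metadata:

```python
def choose_output(task):
    """Pick an output component name from the task string."""
    if task == "tabular-classification":
        return "Label"
    if task == "tabular-regression":
        return "Number"
    raise ValueError(f"Unsupported task: {task!r}")


def choose_input(values):
    """Pick an input component name from a feature's example values."""
    if not values:
        # Nothing to infer from, so fall back to free text.
        return "Textbox"
    if all(isinstance(v, bool) for v in values):
        return "Checkbox"
    # bool is a subclass of int, so exclude it explicitly here.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "Slider"
    return "Dropdown"


print(choose_output("tabular-classification"))
print([choose_input(v) for v in ([1, 2.5], ["red", "blue"], [True, False])])
```

A real integration would build the actual components and would likely need explicit type information, since a numeric column can still be categorical.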

@adrinjalali

I think there are at least two use cases here. One is that we want predictions for a single input, in which case @freddyaboulton's suggestion makes sense, and another one is to have a csv or a data file for which we'd like the output. To me, most tabular use cases don't want the output for a single input. They need the output for a batch, and if they need the output for a single input, it's not the user wanting to get that, it's a program sending API calls.

I'm not sure how these components work, but I think having a table with inputs and having the output in the same or a different table would make sense.

@merveenoyan
Contributor

merveenoyan commented Aug 26, 2022

[Screenshot: 2022-08-26 20:55:42]

@adrinjalali I'm gonna make a template of how it should look for each task. For the one above, I feel like it might be good if, for tabular-classification models, we could call predict_proba and then show a bar chart of each class probability. (never tried for tabular though)
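
A small sketch of what that could involve: turning one row of class probabilities into a mapping from class name to probability, which is the kind of input a label-style component with confidence bars would take. `to_label_scores` is a hypothetical helper, and the class names and probabilities are made up for illustration:

```python
def to_label_scores(classes, probabilities):
    """Map each class name to its probability, highest first."""
    if len(classes) != len(probabilities):
        raise ValueError("classes and probabilities must have the same length")
    pairs = sorted(zip(classes, probabilities), key=lambda p: p[1], reverse=True)
    return {str(name): float(prob) for name, prob in pairs}


# One row as it might come back from predict_proba, with hypothetical classes.
scores = to_label_scores(["low", "medium", "high"], [0.2, 0.5, 0.3])
print(scores)
```

With a fitted scikit-learn classifier, the class names would come from its `classes_` attribute and the probabilities from one row of `predict_proba`.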

@merveenoyan
Contributor

merveenoyan commented Aug 29, 2022

@freddyaboulton if you need help on skops side feel free to ping me for discussions & more!

@freddyaboulton
Collaborator Author

freddyaboulton commented Aug 30, 2022

@adrinjalali @merveenoyan @BenjaminBossan I have an MVP for loading skops models from the hub in #2126

Before going any deeper, I wanted to check in with you!

You can test as follows:

Tabular Regression

import gradio as gr

gr.Interface.load("models/skops-ci/test-de5a7f55-f6de-4879-899c-d9ca096be32b").launch()

[Image: tabular_regression_demo]

Tabular Classification

import gradio as gr

gr.Interface.load("models/julien-c/wine-quality").launch()

[Image: tabular_classification_demo]

I can't find a tabular classification model that works: they all emit warnings, which makes the API return a 400 error code even though it is able to return predictions.

Slack discussion about this here: https://huggingface.slack.com/archives/C016D661PAN/p1661878721296949

The examples are displayed a bit weird but we’re working on fixing that.

Questions:

  1. Do we want to return predicted class labels or probabilities for classification problems? And if we want probabilities, is there a way to get them from the inference API?
  2. What's the long-term plan regarding warnings emitted by sklearn models? In my PR, gradio will error out and display the warnings/errors in the log, but ideally I think the inference API should return a 200 code.

@adrinjalali

Thanks for the work here, looking great.

As for a classification one, you could use this one for instance: https://huggingface.co/scikit-learn/tabular-playground

The output of the API will be labels instead of numbers if the model is trained on labels, otherwise numbers. We'll be working on putting better examples online which return labels.

Returning probabilities and soft values is on the roadmap; we still need to figure out when and how to do it.

As for warnings, if there are warnings, for now we shouldn't be trusting model output. The model might not even be getting the input features in the right order with some of those warnings. So the output is really not reliable.

cc @Narsil @BenjaminBossan

@freddyaboulton
Collaborator Author

Thank you for the helpful comment @adrinjalali ! I will use the tabular-playground model to test and I'll keep returning an error in gradio if the api doesn't return a 200.
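
The behaviour described here could look roughly like the sketch below. It works on a status code and an already-decoded payload, so it makes no network calls. `InferenceError` and `predictions_or_error` are hypothetical names, and the `error` and `warnings` keys in the payload are an assumption about the response shape, not a documented contract:

```python
class InferenceError(Exception):
    """Raised when the (hypothetical) API response is not a success."""


def predictions_or_error(status_code, payload):
    """Return predictions on 200; otherwise raise with any reported details."""
    if status_code == 200:
        return payload
    details = []
    if isinstance(payload, dict):
        # Assumed keys: "error" (string) and "warnings" (list of strings).
        if payload.get("error"):
            details.append(str(payload["error"]))
        details.extend(str(w) for w in payload.get("warnings", []))
    message = f"Inference API returned status {status_code}"
    if details:
        message += ": " + "; ".join(details)
    raise InferenceError(message)


print(predictions_or_error(200, [1, 0, 1]))
try:
    predictions_or_error(400, {"error": "warnings raised", "warnings": ["bad feature names"]})
except InferenceError as exc:
    print(exc)
```

Raising instead of returning partial output matches the point made earlier in the thread that predictions accompanied by warnings should not be trusted.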
