-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add metric template #14
Conversation
I've building on @osanseviero's POC to add more functionalities (including displaying the metric cards): https://huggingface.co/spaces/huggingface/metric-explorer |
Maybe we can have something similar to the CLI command in transformers that creates a new model and renames the classes automatically ? |
As for cookiecutter, this is a template done by @nateraw that might be useful here https://github.com/nateraw/spaces-template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice!
templates/README.md
Outdated
# Metric | ||
|
||
## Metric description | ||
|
||
## How to use | ||
|
||
## Examples | ||
|
||
## References | ||
|
||
## Limitations and bias |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't show up in the space, so I wonder if this should instead be in some place that will later be displayed in the Space. You already do this with _DESCRIPTION
for example, so I was wondering if all of this should be over there instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed there is some duplication here. I was thinking we could read the README.md
in the gradio app and display it. The reasons why I thought it is nicer as a separate file:
- it's easier to edit a markdown file directly than a string in Python
- if we ever decide to make a dedicated metrics/evaluate repository type we would already have a README for all repos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's good. We can read the md
file content and add it to the article section of the demo.
templates/tests.py
Outdated
@@ -0,0 +1,12 @@ | |||
test_cases = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice idea!
templates/app.py
Outdated
|
||
iface = gr.Interface( | ||
fn=compute, | ||
inputs=gr.inputs.Dataframe(headers=metric_features, col_width=len(metric_features), datatype="number"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the datatype also be specified programatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was thinking about that, too. With #15 we should be able to infer that. I think we also need to add a case for when we can't infer the type (e.g. somebody implements a metric for a new modality). Maybe we can then just display a text saying that the widget is not available for that metric but it could be implemented in app.py
.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
I have been working on the Gradio widget a bit more to make it more generic. Here's the ideas: Parsing between
|
This is great!! I particularly like the idea of moving the Gradio code inside General thoughts, by topic: READMEs Input/outputs Comparison feature |
I've updated the PR with a working cookiecutter template and CLI. You can now run: evaluate-cli create "Aweeesoooome Metric" which creates a new gradio space and clones it, populates the template and adds it to the space repo and pushes the changes. One then only needs to adapt the folder and push the changes again. The following message is displayed at the end of the command:
The resulting space of that command can be found here: @sashavor Regarding the README.md: At the moment the README is displayed after the widget so I don't think it is a big issue if it is too long. We can open a issue on the gradio repo should we need it, but let's have a look first how they'll look. Next steps: Setup a separate PR to enable loading metrics from the Hub. The last step in the instructions does not work, yet, as it will look for the metric in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few nits:
import subprocess | ||
from pathlib import Path | ||
|
||
from cookiecutter.main import cookiecutter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to add cookiecutter in setup.py ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Thanks @lhoestq for your suggestions - I added them! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super nice! Thanks for this 🔥
@@ -89,6 +89,8 @@ | |||
# Utilities from PyPA to e.g., compare versions | |||
"packaging", | |||
"responses<0.19", | |||
# to populate metric template | |||
"cookiecutter" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want this to be a required dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved it to a "template" requirement group.
REGEX_YAML_BLOCK = re.compile(r"---[\n\r]+([\S\s]*?)[\n\r]+---[\n\r]") | ||
|
||
|
||
def infer_gradio_input_types(feature_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you do this in the repo rn, but WDYT of making internal functions a bit more explicitly internal/private? That way other people don't import it and handling backwards compatibility is easier.
I would just add _
prefix and don't export it. Same for other functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this specifically the reason I think it would be a good idea is to expose them to the user is that some metrics might require a custom gradio widget and the user can easily reuse these helper functions to make it more useful.
Do you think that's not necessary or would you avoid custom widgets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be a bit too early of an optimization, but I Have no strong opinion, so feel free to make it public if you think it will be useful to users
return examples | ||
|
||
|
||
def launch_gradio_widget(metric): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super nice
BTW not sure if this was mentioned but you'll be able to list all those Gradio apps with |
This PR adds a proposal template to create new metrics incl. an
app.py
such that they can be pushed toSpaces
and displayed with Gradio (@osanseviero made a PoC here). The template includes:new_metric_script.py
: the main code for the metric is hereREADME.md
: includes the Spaces tags in meta and takes inspiration from @sashavor's and @emibaylor's template for the body: Create SQuAD metric README.md datasets#3873requirements.txt
: file to include dependancies specific to a metrictests.py
: includes a few input/output pairs that we could use this to automatically test metrics and populate the spaces widget with examples. the idea of this vs. the doctests was to be more thorough and include edge casesapp.py
: the code for the Gradio appWe could use
cookiecutter
to easily setup a new metric and populate some of the information, such that the main manual work would be addin content instead of renaming files/classes etc.What do you think? @lhoestq @sashavor @osanseviero