Alignment on modelcard metadata specification #39

Merged: 3 commits into main on May 21, 2021
Conversation

@LysandreJik (Member) commented on May 6, 2021

Hi all, opening this PR so that we can all align on the metadata spec for model cards. This metadata is important as it bridges tasks, datasets, and metrics for a given checkpoint. It will eventually allow programmatic analysis and handling of model cards' metadata on the hub.

The metadata was drafted during the collaboration with papers-with-code in order to have a ranked leaderboard for the XLSR sprint. The following format was adopted:

model-index:
- name: {model_id}
  results:                      # one entry per evaluated task/dataset pair
  - task:
      name: {task_name}
      type: {task_type}
    dataset:
      name: {dataset_name}
      type: {dataset_type}
      args: {arg_0}
    metrics:                    # one or more metrics for this task/dataset
      - name: {metric_name}
        type: {metric_type}
        value: {metric_value}
        args: {arg_0}

This format should allow for multiple tasks and multiple metrics within each task.
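
As an illustration, here is what a hypothetical filled-in entry with one task and two metrics could look like (the model name, dataset, and metric values below are made up):

model-index:
- name: my-org/wav2vec2-large-xlsr-53-demo   # hypothetical model id
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice zh-CN
      type: common_voice
      args: zh-CN
    metrics:
      - name: Test WER
        type: wer
        value: 28.5     # made-up value
      - name: Test CER
        type: cer
        value: 9.2      # made-up value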

Existing examples of model cards using this format include those uploaded during the XLSR sprint, such as ydshieh/wav2vec2-large-xlsr-53-chinese-zh-cn-gpt.

Looking forward to your feedback.

Transformers: @sgugger @patrickvonplaten
Datasets: @lhoestq
AutoNLP: @abhi1thakur @SBrandeis
Evaluation: @lewtun

@Pierrci @julien-c @thomwolf

@lhoestq (Member) commented on May 6, 2021

Cool, thanks!

One note regarding metrics: they might need args as well.
For example, model-based metrics such as BLEURT, BERTScore, or COMET may require a model_name argument.
Another example is BLEU, which has a max_order parameter (the maximum n-gram order to use when computing the BLEU score), as well as smoothing parameters and the tokenizer to use.
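
For illustration, here is one possible way such metric args could be expressed in this format (the exact field shape and values are just a sketch, not part of the adopted spec):

metrics:
  - name: BLEU
    type: bleu
    value: 34.1            # made-up value
    args:
      max_order: 4         # max n-gram order used when computing BLEU
      smooth: true         # whether smoothing is applied
  - name: BERTScore
    type: bertscore
    value: 0.89            # made-up value
    args:
      model_name: roberta-large   # model backing the metric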

@julien-c (Member) commented on May 6, 2021

Very excited about this, and let's also ping the Paperswithcode team here for validation?

@LysandreJik (Member, Author)

Absolutely! Pinged them offline for a review.

@jspisak commented on May 6, 2021

validation and integration? :)

@rstojnic commented on May 6, 2021

Thanks for looping us in! Yeah the format looks good. Since we already have the integration for wav2vec2 running, it should be pretty easy from our side to extend it to any result on any benchmark.

A couple of things to consider:

  1. To make sure we are using the same task/dataset names, I think it would be great to ask users to check the task and dataset names on https://paperswithcode.com/sota and https://paperswithcode.com/datasets. This would ensure all results land on the correct leaderboard on Papers with Code, and that the data remains interoperable with anyone else wanting to use it.
  2. It would be great to have an automated badge or similar that links from the model page to the Papers with Code leaderboard. We could create a URL that is easy to construct from the metadata, e.g. for the example above it could be https://paperswithcode.com/sota/?task=Speech Recognition&dataset=Common Voice zh-CN, which would redirect you to the correct leaderboard. Alternatively, there could be a small embeddable graph or just a badge with the ranking.
  3. For us to be able to efficiently track this metadata, it would be useful to have an API endpoint where we can access all the latest model card changes, i.e. something similar to what we are using in the current integration (https://huggingface.co/api/models), but with the ability to order by last changed.

@lewtun (Member) commented on May 11, 2021

One question re the front-end: can we use this schema to enable sorting by metric value on the Hub?

For example, it would be cool to help users answer questions like "Which model achieves the highest metric value X on dataset Y?" I realise this overlaps with PWC's leaderboards (example), but still think there's value in providing this kind of overview to Hub users.

@julien-c (Member)

@lewtun At some point we might want to display some sort of leaderboard-lite on the hf.co hub, but for now I feel like our main goal is on the data side, i.e. to ensure that as many models as possible contain the correct metadata in a format that's easily validated and leveraged by tools, including Paperswithcode.

@LysandreJik (Member, Author)

I'll add an args field to the metrics, as mentioned by @lhoestq, and merge by Friday if no one is opposed to that change.

@LysandreJik mentioned this pull request on May 19, 2021
@lhoestq (Member) commented on May 20, 2021

One question regarding dataset versioning: is this also something we want to include in the dataset card? IMO this would be nice for reproducibility.
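
As an illustration only: if dataset versions were recorded in this metadata, one hypothetical shape could be an extra field on the dataset entry (not part of the adopted spec):

dataset:
  name: Common Voice zh-CN
  type: common_voice
  args: zh-CN
  version: 6.1    # hypothetical field for the dataset version / revision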

@julien-c (Member)

@rstojnic Regarding your point 3, you'll have this here when #41 is merged (or you're welcome to just call the underlying API if it's simpler; its params should be mostly stable now).

@LysandreJik merged commit 2828953 into main on May 21, 2021
@LysandreJik deleted the modelcard-spec branch on May 21, 2021 at 07:54