Enable automated model list copying for localized READMEs #13465

Merged: 10 commits, Sep 8, 2021

@qqaatw qqaatw commented Sep 7, 2021

What does this PR do?

Currently, the model list in each localized README, such as README_zh-hans.md, is updated manually. This PR introduces automated model list copying for localized READMEs and includes a test for the change.

The model list of a localized README is updated through the following steps:

  1. Check whether each model in the README.md model list exists in the localized README, matching by model name (e.g. BERT).
  2. If a model doesn't exist in the localized README, its metadata is fetched from README.md and substituted into a predefined localized format string.
  3. Repeat steps 1 and 2 until all models are checked.
  4. Sort the models in the localized model list alphabetically.
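
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in utils/check_copies.py: the names `model_name`, `sync_model_list`, and the `localize` callback are hypothetical, and the sketch assumes each entry starts its model name with a bolded Markdown link such as `**[BERT](...)**`.

```python
import re

# Assumption: every entry contains the model name as a bolded link, e.g. **[BERT](...)**.
TITLE_RE = re.compile(r"\*\*\[([^\]]+)\]")


def model_name(entry: str) -> str:
    """Extract the model name used as the lookup key for an entry."""
    match = TITLE_RE.search(entry)
    if match is None:
        raise ValueError(f"no model name found in entry: {entry!r}")
    return match.group(1)


def sync_model_list(english_entries, localized_entries, localize):
    """Return the localized list with missing models added, sorted by name.

    `localize` is a hypothetical callback that turns an English entry
    into a localized one (for example via a format string).
    """
    # Existing localized entries are kept untouched, keyed by model name.
    by_name = {model_name(entry): entry for entry in localized_entries}
    for entry in english_entries:  # steps 1 and 3: check every English entry
        name = model_name(entry)
        if name not in by_name:  # step 2: only missing models are generated
            by_name[name] = localize(entry)
    # Step 4: case-insensitive alphabetical order by model name.
    return [by_name[name] for name in sorted(by_name, key=str.lower)]
```

Keying on the model name means entries that were already translated by hand are never overwritten; only models missing from the localized list are generated.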

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger

Collaborator

@sgugger sgugger left a comment

Thanks a lot for your PR, it looks very cool! I have left a few comments and let's also see if @JetRunner has some comments.

@JetRunner
Contributor

@qqaatw Thanks for taking care of this. Since the Traditional Chinese version of the README currently doesn't translate this list, maybe we can just leave it untranslated for Simplified Chinese as well. Does that simplify things?

@qqaatw
Contributor Author

qqaatw commented Sep 8, 2021

@JetRunner Thanks for your reply!

I think this depends on the preference of Simplified Chinese users. Whether the list is translated makes no difference to the approach, since this PR captures metadata directly from the English version, and the captured metadata can be substituted into any predefined localized model description format string. If more localized READMEs are added in the future, say a Japanese version, this method can also be applied to their model lists.

In addition, some models have supplemental data that can be translated manually after the automated copying. For example, the text below is the supplemental data for DistilBERT:

The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.

The simplified Chinese version has the supplemental data translated, which looks good to me:

同样的方法也应用于压缩 GPT-2 到 [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa 到 [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT 到 [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) 和德语版 DistilBERT。
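
To make the metadata capture and substitution concrete, here is a minimal sketch. It is not the PR's implementation: `ENTRY_RE`, `ZH_TEMPLATE`, and `localize_entry` are hypothetical names, the entry layout and the Chinese template wording are assumptions, and the simple author pattern would break on names that contain a period (e.g. initials).

```python
import re

# Assumed layout of an English entry; the real README wording may differ.
ENTRY_RE = re.compile(
    r"\*\*\[(?P<title>[^\]]+)\]\((?P<model_link>[^)]+)\)\*\*"
    r" \(from (?P<institution>[^)]+)\)"
    r" released with the paper \[(?P<paper_title>[^\]]+)\]\((?P<paper_link>[^)]+)\)"
    r" by (?P<authors>.+?)\.(?:\s+(?P<supplement>.+))?$"
)

# Hypothetical localized format string for Simplified Chinese.
ZH_TEMPLATE = (
    "1. **[{title}]({model_link})** (来自 {institution}) "
    "伴随论文 [{paper_title}]({paper_link}) 由 {authors} 发布。"
)


def localize_entry(entry: str, template: str) -> str:
    """Capture metadata from an English entry and fill a localized template."""
    match = ENTRY_RE.search(entry)
    if match is None:
        raise ValueError(f"entry does not match the expected format: {entry!r}")
    fields = match.groupdict()
    # Supplemental sentences are not part of the template; they are carried
    # over untranslated so a human can translate them afterwards.
    supplement = fields.pop("supplement") or ""
    localized = template.format(**fields)
    return f"{localized} {supplement}".rstrip()
```

Because the template is just a string with named placeholders, supporting another language under this scheme would only require another template; the captured fields stay the same.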

@JetRunner
Contributor

Sounds great ;)

@qqaatw
Contributor Author

qqaatw commented Sep 8, 2021

One exception I found is that a few models (GPT-J, GPT-Neo, and T5v1.1) use the wording "released in the repository" instead of the regular "released with the paper". The localized descriptions of these models have to be updated manually.
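
One way to surface such exceptions, sketched under the assumption that the regular wording appears verbatim in every standard entry, is to flag entries that lack it. The names `REGULAR_PHRASE`, `needs_manual_update`, and `split_entries` are hypothetical and not taken from the PR.

```python
# Assumption: standard entries contain this phrase verbatim.
REGULAR_PHRASE = "released with the paper"


def needs_manual_update(entry: str) -> bool:
    """Return True when an entry does not use the regular wording."""
    return REGULAR_PHRASE not in entry


def split_entries(entries):
    """Separate entries that can be copied automatically from the exceptions."""
    automatic = [e for e in entries if not needs_manual_update(e)]
    manual = [e for e in entries if needs_manual_update(e)]
    return automatic, manual
```

Entries in the `manual` list could then be reported to the maintainer rather than being pushed through the format string.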

@qqaatw qqaatw requested a review from sgugger September 8, 2021 10:19
@qqaatw
Contributor Author

qqaatw commented Sep 8, 2021

@sgugger The suggestions you provided have been applied. Thanks for the review!

@sgugger sgugger merged commit 18447c2 into huggingface:master Sep 8, 2021
@sgugger
Collaborator

sgugger commented Sep 8, 2021

Just committed one last typo fix, thanks a lot for your PR!
