
Add DistilRoberta Model to OSS (cherry picked commit) #1998

Merged: 3 commits merged into main from distilroberta on Dec 1, 2022
Conversation

@rshraga (Contributor) commented on Dec 1, 2022

Summary:
This diff adds a DistilRoberta model to torchtext OSS.

The model is a distilled version of the full RoBERTa-base model. Weights are taken from the Hugging Face checkpoint: https://huggingface.co/distilroberta-base

The state dict is loaded and modified to work with the internal Roberta implementation here: https://www.internalfb.com/intern/anp/view/?id=2794739
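The exact conversion lives in the linked internal notebook; as a rough, hypothetical sketch of that step (the key-renaming rules below are illustrative placeholders, not the actual mapping used):

```python
# Hypothetical sketch of the HF -> torchtext state-dict conversion step.
# The real key mapping is defined in the internal notebook linked above;
# the renaming rules here are illustrative only.
import re

import torch
from transformers import AutoModel  # assumes the `transformers` package is installed

# Download the distilled checkpoint from the Hugging Face hub.
hf_model = AutoModel.from_pretrained("distilroberta-base")
hf_state_dict = hf_model.state_dict()

def rename_key(key: str) -> str:
    # Illustrative renaming: adjust module names so the keys line up with
    # torchtext's internal Roberta implementation (placeholder rules).
    key = key.replace("embeddings.word_embeddings", "token_embedding")
    key = re.sub(r"encoder\.layer\.(\d+)\.", r"layers.\1.", key)
    return key

converted = {rename_key(k): v for k, v in hf_state_dict.items()}
torch.save(converted, "distilroberta_base.pt")
```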

Comparison of DistilRoberta to RoBERTa-base on the GLUE benchmark (as reported in https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md) {F806809901}: DistilRoBERTa reaches 95% of RoBERTa-base's performance on GLUE while being twice as fast and 35% smaller.
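With the weights published, the model should be usable through torchtext's existing bundle API; a minimal usage sketch, assuming the new checkpoint is exposed as a bundle named ROBERTA_DISTILLED_ENCODER (the exact bundle name is not stated in this summary and may differ):

```python
# Minimal usage sketch. ROBERTA_DISTILLED_ENCODER is an assumed bundle name,
# following the naming of the existing ROBERTA_BASE_ENCODER bundle.
import torch
from torchtext import functional as F
from torchtext import models

bundle = models.ROBERTA_DISTILLED_ENCODER  # assumed name for the new bundle
model = bundle.get_model()
transform = bundle.transform()

input_batch = ["Hello world", "DistilRoBERTa is smaller and faster than RoBERTa-base."]
# Tokenize, pad with RoBERTa's padding index (1), and run the encoder.
model_input = F.to_tensor(transform(input_batch), padding_value=1)
with torch.no_grad():
    output = model(model_input)
print(output.shape)  # (batch_size, seq_len, embedding_dim)
```

The `padding_value=1` matches RoBERTa's pad token index, consistent with how the other Roberta bundles in torchtext are used.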

Reviewed By: Nayef211

Differential Revision: D41590601

fbshipit-source-id: 394d10c45bbee5d2e71e14e30edf9b1a9d9380e6

@rshraga (Contributor, Author) commented on Dec 1, 2022

Check failures look unrelated.

@joecummings (Contributor) commented:

I think you need to rebase on main. The failures are related to a fix that went in yesterday, @rshraga.

Roman Shraga and others added 2 commits on December 1, 2022 at 14:58
@Nayef211 (Contributor) left a comment:

LGTM!

@rshraga merged commit 1020fae into main on Dec 1, 2022.
@rshraga deleted the distilroberta branch on December 1, 2022 at 21:54.