
Community contribution - BetterTransformer integration for more models! #20372

Closed
6 of 14 tasks
younesbelkada opened this issue Nov 22, 2022 · 80 comments · Fixed by huggingface/optimum#542 · May be fixed by huggingface/optimum#548, huggingface/optimum#907 or huggingface/optimum#1065

Comments

@younesbelkada
Contributor

younesbelkada commented Nov 22, 2022

BetterTransformer integration for more models!

The BetterTransformer API provides faster inference on CPU & GPU through a simple interface!

Models can benefit from very interesting speedups with a one-liner, provided the latest version of PyTorch is installed. A complete guide on how to convert a new model is available in the BetterTransformer documentation!
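For reference, the conversion itself looks like this with optimum (the checkpoint name here is only an example):

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model = AutoModel.from_pretrained("bert-base-uncased")
# the one-liner: swaps supported layers for their PyTorch fastpath equivalents
model = BetterTransformer.transform(model)
```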

Here is a list of models that could potentially be supported; pick one of the architectures below and let's discuss the conversion!

Text models 🖊️ :

Vision models 📷 :

Audio models 🔉 :

Let us also know if there are architectures we missed that could be supported. Note that for the encoder-decoder based models below, we expect to convert the encoder only.

Support for decoder-based models coming soon!

cc @michaelbenayoun @fxmarty

huggingface/optimum#488

@hamishdickson

NotImplementedError: The Better Transformers implementation for the model DebertaV2Model has not been implemented yet. Please open an issue requesting the addition of this model with its BetterTransformer implementation.
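For context, the error above comes from a call along these lines (the checkpoint name is illustrative):

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model = AutoModel.from_pretrained("microsoft/deberta-v2-xlarge")
model = BetterTransformer.transform(model)  # raises the NotImplementedError above
```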

It's not on your list, but would you complain if I did this for DebertaV2Model?

@michaelbenayoun
Member

It is not in the list because DebertaV2 does not have a regular attention mechanism, so it is not possible to use it with BetterTransformer.
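For the curious, here is a very rough, simplified view of why (illustrative tensors, not the actual modeling code):

```python
import torch

# DeBERTa's "disentangled attention" mixes content (c) and relative-position (p)
# projections: content->content, content->position and position->content terms
# are summed, so the score cannot be expressed as the single q @ k^T that the
# BetterTransformer fastpath kernel computes.
q_c, k_c = torch.randn(128, 64), torch.randn(128, 64)  # content projections
q_p, k_p = torch.randn(128, 64), torch.randn(128, 64)  # position projections
scores = q_c @ k_c.T + q_c @ k_p.T + q_p @ k_c.T       # c2c + c2p + p2c
```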

@younesbelkada
Contributor Author

younesbelkada commented Nov 22, 2022

Yes, I second what @michaelbenayoun said; please see the related issue: huggingface/optimum#487

@hamishdickson

Makes a lot of sense - sorry, I should have thought about that a bit harder before posting!

@GenVr

GenVr commented Nov 23, 2022

I noticed that BetterTransformer support for the T5 model has not been implemented yet. Will it be implemented in the future (if possible)? Thanks.

@younesbelkada
Contributor Author

Hi @GenVr
Thanks a lot for your question! Unfortunately, T5 cannot be supported because of the nature of its attention mechanism: T5 adds a relative attention bias to its attention scores, and this is not supported by BetterTransformer.
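A minimal sketch of what T5's attention does differently (simplified, not the actual modeling code):

```python
import torch

# Simplified view of T5 attention: a learned relative-position bias is added
# to the raw attention scores before the softmax. The fused BetterTransformer
# kernel has no input for such a bias, so T5 cannot take the fastpath.
def t5_style_attention(q, k, v, position_bias):
    scores = torch.matmul(q, k.transpose(-1, -2))  # note: T5 also skips 1/sqrt(d)
    scores = scores + position_bias                # <- the unsupported extra term
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```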
Thanks!

@RJZauner

Hi :) I would like to work on the implementation for RemBertLayer.

What are the next steps in getting started?

Thank you!

@younesbelkada
Contributor Author

Hey @RJZauner !
Thanks so much for your interest in helping us integrate more models with BetterTransformer!
RemBERT seems to use the same attention mechanism as BERT; the only difference should be in the embedding layer, which is fine for us! So I would say you can go ahead: fork the optimum library, create a new branch, and open a draft PR. Feel free to take inspiration from what has been done in huggingface/optimum#494 and huggingface/optimum#508 to see exactly what needs to be done ;) Ping us (myself, @michaelbenayoun & @fxmarty) whenever you feel you need help!
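If it helps, the general shape of a conversion is roughly this (class and attribute names are an illustrative sketch, not the exact optimum API; see the linked PRs for the real pattern):

```python
import torch

# Illustrative sketch only: a BetterTransformer layer concatenates the original
# layer's separate Q/K/V projections into the single in_proj tensors that the
# PyTorch fastpath (torch.nn.TransformerEncoderLayer) expects.
class RemBertLayerBetterTransformer(torch.nn.Module):
    def __init__(self, layer, config):
        super().__init__()
        attn = layer.attention.self  # BERT-style self-attention module
        self.in_proj_weight = torch.nn.Parameter(
            torch.cat([attn.query.weight, attn.key.weight, attn.value.weight])
        )
        self.in_proj_bias = torch.nn.Parameter(
            torch.cat([attn.query.bias, attn.key.bias, attn.value.bias])
        )
        # ... the out-projection, layer norms and feed-forward weights are
        # collected the same way, and forward() then calls the fused kernel.
```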

@shogohida
Contributor

Hi @younesbelkada, I would like to work on the easiest of the models mentioned above. Which one do you recommend? This might sound a bit weird, but I want to tackle a simple one since I'm not very familiar with these models 🙏

@JuheonChu
Contributor

Hello, I would like to tackle the implementation for TapasLayer.

May I ask what the next steps are to get started?

Thank you for your time.

@michaelbenayoun
Member

Hi @shogohida and @JuheonChu ,

You can read this page to learn how to contribute. You can then open a PR with your code and ask questions there; we will be glad to help!

Also @shogohida, I think they are all similar in terms of difficulty, so do not get blocked on that; maybe choose a model in the modality most familiar to you.

@younesbelkada
Contributor Author

Seconding what @michaelbenayoun said, feel free to check some example PRs, huggingface/optimum#508 or huggingface/optimum#494, for reference!
@shogohida, you can take RocBERT; it actually copies from BERT, so the conversion will be very easy :)

@shogohida
Contributor

Thanks guys for your replies! I will take RocBERT then!

@JuheonChu
Contributor

Thanks @michaelbenayoun ! I will take TapasLayer !

@ravenouse

ravenouse commented Nov 26, 2022

Hi! Thank you so much for opening this issue.

  1. I was implementing RemBERT and had some questions, but then I noticed that @RJZauner is already working on it. I will hold off on that and look forward to seeing RJZauner's implementation!
  2. I will work on mBART.
  3. I also found some dead links and some unclear points on this page. How should I report the problems I found and help solve them?

@blakechi

Hello @younesbelkada,

I would like to take DetrLayer. Nice tutorial btw 😀

@younesbelkada
Contributor Author

Hi @blakechi !
Sure you can take it ;) let me know if you need help opening a PR!

@younesbelkada
Contributor Author

Hi @ravenouse !
Thanks for your help! Yes, you can take MBart ;)
Regarding the dead links, could you open an issue on optimum?
Thanks!

@RJZauner

> Hey @RJZauner !
> Thanks so much for your interest in helping us integrate more models with BetterTransformer! RemBERT seems to use the same attention mechanism as BERT; the only difference should be in the embedding layer [...] Ping us (myself, @michaelbenayoun & @fxmarty) whenever you feel you need help!

Thank you for the info!

@lucaspct

Hello @michaelbenayoun and @younesbelkada !

First time contributing for me :)

I would like to handle the implementation for Speech2Text

What are the first steps? Create a PR?

Thanks in advance.

@JuheonChu
Contributor

> Hello @michaelbenayoun and @younesbelkada !
> First time contributing for me :)
> I would like to handle the implementation for Speech2Text. What are the first steps? Create a PR?

Hello, I am absolutely sure they will give you a better suggestion than mine, but I would like to share that it is good to read CONTRIBUTING.md in the transformers repository.
I read through all of it very carefully and made my first contribution!

@lucaspct

> Hello, I am absolutely sure they will give you a better suggestion than mine, but I would like to share that it is good to read CONTRIBUTING.md in the transformers repository. I read through all of it very carefully and made my first contribution!

Hello @JuheonChu :)

I will definitely have a look at it! Thanks.

@michaelbenayoun
Member

Hi @lucaspct,

Yes, the first step would be to read the guide explaining how to contribute to optimum.bettertransformer, and then to open a PR on Optimum; we will support you there!

@miyu386
Contributor

miyu386 commented Nov 30, 2022

Hi @younesbelkada @michaelbenayoun I'd love to take on the RoFormer model if it isn't claimed yet. Will open a PR after I read through the guide!

@adit299
Contributor

adit299 commented Dec 1, 2022

I would like to take a crack at the ProphetNet encoder if it has not been claimed yet

@younesbelkada
Contributor Author

Thank you very much @miyu386 & @adit299!
Of course, you can give that a try ;) feel free to open a PR on optimum and we'll guide you from there 💪

@younesbelkada
Contributor Author

Hi @HVjay,
Thanks for your interest! I think Detr can be supported, as well as ConditionalDetr, since both seem to use a classic attention mechanism - this is also confirmed by the paper, which states that the method uses classic transformer-based models. Note, however, that only the encoder part can be converted.

Hi @mszsorondo,
Thank you for your message! BLIP has recently been added; the model should support the BetterTransformer integration (vision + text).

@younesbelkada
Contributor Author

Hi @HVjay ,

Actually there is already someone working on Detr, check: huggingface/optimum#684

@JanFidor

JanFidor commented Feb 12, 2023

Hi @younesbelkada, could I pick up RoFormer?

@dewasahu2003
Contributor

@sushmanthreddy, are you still working on Detr? If so, please let us know.

@dewasahu2003
Contributor

@younesbelkada Hi 👋 could I take Speech2Text? 🙂

@y3sar
Contributor

y3sar commented May 4, 2023

@younesbelkada Hello, I would love to contribute to this issue. I am new to contributing to transformers. Can you please tell me which of the model layers are still unclaimed? I would like to take one up :)

@awinml
Contributor

awinml commented May 4, 2023

@younesbelkada I would like to work on Detr.

@mszsorondo Are you still working on it? There has not been any activity on your PR since Jan 8. I can pull from your PR and fix the failing tests.

@dewasahu2003
Contributor

dewasahu2003 commented May 4, 2023

@awinml I actually submitted the PR for the Detr model.

  • I forgot to mention it earlier, sorry buddy
  • you can look for another available model
  • here is the PR

@awinml
Contributor

awinml commented May 4, 2023

@dewasahu2003 No problem.

It's always better to inform the original author and pull from their PR so they get due credit. Hence the question was aimed at @mszsorondo.

@dewasahu2003
Contributor

@younesbelkada Hey 👋

I have submitted the PR adding BetterTransformer support for Detr.
I mentioned you in the PR.
Next time I will keep in mind to ask the PR authors first.

@mobley-trent

Hi, @younesbelkada I'd like to work on ProphetNet 😀

@mszsorondo

> @younesbelkada I would like to work on Detr.
>
> @mszsorondo Are you still working on it? There has not been any activity on your PR since Jan 8. I can pull from your PR and fix the failing tests.

Go for it! Sorry for the delay.

@Jack-Chuang

Jack-Chuang commented May 27, 2023

Hi @younesbelkada, @michaelbenayoun, and @fxmarty,

I would like to work on Speech2TextLayer.

What are the next steps in getting started?

Thank you!

@jucamohedano

Hi! @younesbelkada @michaelbenayoun @fxmarty
I'm interested in adding support for one of the models in the list, though I believe the only model left might be Speech2TextLayer, which has already been claimed by @Jack-Chuang.

@mobley-trent

mobley-trent commented May 28, 2023

Hello @younesbelkada @fxmarty and @michaelbenayoun
I would like to work on the RoFormer layer, since I saw that someone has already worked on ProphetNet. Has the model been claimed?

@RoboTuan

Hello @younesbelkada @fxmarty and @michaelbenayoun
I would love to help you with the integration of more models for BetterTransformer! I'm happy to take whatever is left, since a lot of developers are already contributing to most of the models. Let me know if I can still help with something!

@mohammedElfatihSalah

@younesbelkada is there anything I can help with in this issue?

@deepwilson

@younesbelkada could you please update the original list of pending items?
Or has this project stalled?

@sam-h-bean
Contributor

Is SPLADE possible?

@ghost

ghost commented Oct 6, 2023

Hi @younesbelkada! I'm new to the open-source community but have good experience with torch, transformers, numpy, etc. Can I be assigned the RoFormer task? I'd like to give it a shot!

@adeepbiswas

Hi @younesbelkada,
Can I take up the ProphetNet task? I'm new to open source and might take some time, but I'm eager to try my hand at this.

@younesbelkada
Contributor Author

Hi everyone,
Sorry for the delay in replying to this issue and the community contribution - after some internal discussion, we decided to migrate the BetterTransformer API into transformers core by directly supporting torch.scaled_dot_product_attention in the modeling files. Check out this issue: #26557 for more details and this PR for the PoC: #26572
We may open a community contribution to extend the support to all architectures, but that is not decided yet. I will keep you all posted!
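For the curious, a minimal sketch of what this change looks like inside a modeling file (shapes are illustrative; requires PyTorch >= 2.0):

```python
import torch
import torch.nn.functional as F

# Instead of a hand-written matmul + softmax attention, the modeling code can
# call the fused kernel directly:
q = torch.randn(2, 12, 128, 64)  # (batch, num_heads, seq_len, head_dim)
k = torch.randn(2, 12, 128, 64)
v = torch.randn(2, 12, 128, 64)
out = F.scaled_dot_product_attention(q, k, v)  # dispatches to flash / memory-efficient kernels when available
```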
Thanks again for all your effort and amazing contribution! 🎉

@vu0607

vu0607 commented Jan 24, 2024

Hi @younesbelkada, @michaelbenayoun, and @fxmarty
The model type vision-encoder-decoder is not yet supported with BetterTransformer!
Hope you can support it soon <3
