CodeGen2 compatibility #202
Can't tell if the model architecture has changed – any idea?
I don't know if this is really what you want to know, but as far as I can see you can use it the same way (causal), and they added an infill mode on top. It also seems to support a lot more languages.
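For example, following the Salesforce/codegen2-1B model card (the `<mask_1>`/`<sep>`/`<|endoftext|>` sentinel format and `trust_remote_code=True` are taken from that card, so double-check against it):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# codegen2 ships custom modeling code, hence trust_remote_code (per the model card)
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-1B")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen2-1B", trust_remote_code=True)

# Causal mode: works the same as CodeGen 1, just continue the prompt
prompt = "def hello_world():"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(input_ids, max_length=64)[0], skip_special_tokens=True))

# Infill mode: mark the hole with <mask_1>, terminate, then ask for <mask_1>'s contents
prefix = "def hello_world():\n    "
suffix = "\n    return name"
infill = prefix + "<mask_1>" + suffix + "<|endoftext|>" + "<sep>" + "<mask_1>"
input_ids = tokenizer(infill, return_tensors="pt").input_ids
out = tokenizer.decode(model.generate(input_ids, max_length=128)[0], skip_special_tokens=False)
print(out[len(infill):])  # everything generated after the prompt is the proposed infill
```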
It's probably possible with a single modification of setup.sh.
@moyix how did you come up with the calculations & code in codegen_gptj_convert.py? It seems like this conversion from CodeGen to GPT-J is the most difficult part of supporting a new model type. I was able to modify the Python backend to support bigcode/starcoder. It's obviously really slow because we are just loading the model via the transformers library in the Python backend (are we sure that is the right way to do the Python backend thing?). I got fairly far along with the FasterTransformer conversion but stopped when I saw the bit of math / calculations going on in codegen_gptj_convert.py.
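(For reference, the quick-and-dirty Python-backend route I mean is roughly the sketch below. It's a sketch only: the tensor names and generation parameters are illustrative, not fauxpilot's actual model config.)

```python
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForCausalLM, AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Loading through transformers is easy but slow compared to FT
        self.tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
        self.model = AutoModelForCausalLM.from_pretrained(
            "bigcode/starcoder", torch_dtype="auto", device_map="auto"
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # "prompt" / "completion" are illustrative I/O names
            raw = pb_utils.get_input_tensor_by_name(request, "prompt").as_numpy()
            text = raw[0].decode("utf-8")
            input_ids = self.tokenizer(text, return_tensors="pt").input_ids.to(self.model.device)
            output_ids = self.model.generate(input_ids, max_new_tokens=64)
            completion = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
            out = pb_utils.Tensor("completion", np.array([completion.encode("utf-8")], dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```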
@moyix Let me shed some light on this. Below are the explanations for the Salesforce/codegen2-1B sizes.
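The short version, as far as I can tell from codegen_gptj_convert.py and HF's modeling_codegen.py: the fused qkv_proj is stored as mp_num=4 interleaved chunks (a TPU model-parallelism artifact), each chunk holding a (query, value, key) slice, so the conversion gathers the rows back into contiguous q/v/k blocks. A condensed sketch (my reading, not the literal script):

```python
import torch

def split_codegen_qkv(qkv_proj_weight: torch.Tensor, embed_dim: int, mp_num: int = 4):
    """Un-interleave CodeGen's fused qkv_proj into GPT-J-style q/k/v matrices.

    qkv_proj_weight has shape (3 * embed_dim, embed_dim). It is laid out as
    mp_num chunks, each containing a (query, value, key) slice of local_dim
    rows. We permute *rows* because nn.Linear computes x @ W.T.
    """
    local_dim = embed_dim // mp_num
    # gather all q slices (0,3,6,9), then v (1,4,7,10), then k (2,5,8,11)
    base_permutation = [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
    permutation = torch.cat(
        [torch.arange(i * local_dim, (i + 1) * local_dim) for i in base_permutation]
    )
    q, v, k = qkv_proj_weight[permutation, :].split(embed_dim, dim=0)
    return q, k, v
```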
All you need to do is to make the permutation configurable. Anyhow, the Triton server is not really performant compared to CTranslate2 (https://github.com/OpenNMT/CTranslate2). It can also do batching, and there is no need to pad to certain shapes in the FastAPI proxy. (CTranslate2 codegen2 on int8 CPU is around 4.1x faster and takes ~4x less memory than huggingface codegen2.) I'll try to add some models for CodeGen 1 and CodeGen 2 in all sizes for the CTranslate2 framework, stay tuned.
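If you want to try it before those are up, the usual CTranslate2 flow looks roughly like this (a sketch: it assumes the transformers converter handles the codegen2 architecture, and the output path is arbitrary):

```python
import ctranslate2
from transformers import AutoTokenizer

# one-time conversion; trust_remote_code because codegen2 ships custom modeling code
ctranslate2.converters.TransformersConverter(
    "Salesforce/codegen2-1B", trust_remote_code=True
).convert("codegen2-1B-ct2", quantization="int8")

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-1B")
generator = ctranslate2.Generator("codegen2-1B-ct2", device="cpu")

# CTranslate2 consumes token strings, not ids
prompt = tokenizer.convert_ids_to_tokens(tokenizer.encode("def hello_world():"))
results = generator.generate_batch([prompt], max_length=64, sampling_topk=1)
print(tokenizer.decode(results[0].sequences_ids[0]))
```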
Oops, really sorry that I didn't see this before you figured it out on your own. I wrote up an article explaining how the permutation was derived here: https://gist.github.com/moyix/7896575befbe1b99162ccfec8d135566 I'll look into CTranslate2 – are there gains over FT when using GPUs for inference?
Not sure about FT. On my GPU, the smaller models (2B etc.) should see more like a 3x speedup; for large models the tensor sizes benefit less from the C++ implementation, so 16B is more like 1.5x.
Edit: I found another of your markdown posts, which helped me derive the codegen2 conversion.
Here are some benchmarks for codegen2 on FasterTransformer, with A6000s.
Do you have a comparison with the transformers float16 or bitsandbytes int8 versions? I can't benchmark those myself. While you are at it, you can pull the CTranslate2 model from here; it should just take 2-3 min to install plus a download, see:
Do we have any updates on this?
A new version of CodeGen was released: https://github.com/salesforce/CodeGen2