
Add support codegen2 #209

Closed
wants to merge 3 commits

Conversation

@michaelfeil michaelfeil commented May 22, 2023


1. General Description

This PR adds support for converting models from CodeGen2 to GPT-J.
It does not modify the functionality of the existing converter. The PR seems quite small, but it took me hours of debugging to figure out that the CodeGen2 architecture is actually fully compatible with GPT-J for the large models (7B and 16B versions).
For the smaller models (1B and 3.7B versions), a different permutation order is required, because they were trained with a different TPU setting.
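To illustrate the kind of permutation involved: a minimal sketch, assuming a fused QKV weight whose rows are stored as interleaved per-shard `[q, k, v]` blocks that must be regrouped into contiguous `[Q; K; V]` rows. The `n_splits` parameter and the exact layout here are hypothetical simplifications, not the precise CodeGen2 checkpoint format; they stand in for the TPU-sharding difference between the small and large models.

```python
import numpy as np

def reorder_qkv(fused: np.ndarray, n_splits: int) -> np.ndarray:
    """Regroup a fused QKV weight laid out as n_splits interleaved
    [q_i, k_i, v_i] blocks (one triple per shard) into contiguous
    [Q; K; V] rows. Hypothetical illustration of the permutation idea."""
    hidden = fused.shape[0]
    block = hidden // (n_splits * 3)
    # View rows as (shard, qkv, block, cols), then swap the shard and
    # qkv axes so all Q rows come first, then all K rows, then all V rows.
    parts = fused.reshape(n_splits, 3, block, -1)
    return parts.transpose(1, 0, 2, 3).reshape(hidden, -1)

# Toy example: 12 labeled rows, 2 shards -> block size 2.
fused = np.arange(12).reshape(12, 1)
print(reorder_qkv(fused, n_splits=2)[:, 0].tolist())
```

In this framing, converting a small model versus a large one would only differ in the assumed split count; the regrouping logic stays the same.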

2. Changes proposed in this PR:

  1. CodeGen2 compatibility #202

Resolves: #202

3. How to evaluate:

  1. Describe how to evaluate such that it may be reproduced by the reviewer(s).

  2. Self assessment:

@michaelfeil michaelfeil requested a review from moyix as a code owner May 22, 2023 18:55
@moyix
Collaborator

moyix commented May 26, 2023

Oh this is wonderful! I didn't realize CodeGen2 was still a standard GPT-J model! I'll try to test this out as soon as possible and get it merged :)

@michaelfeil
Author

Can it be merged? :)

@liasece

liasece commented Jun 30, 2023

Great job. Has anyone tried CodeGen2? Is it worth upgrading?

@pgarba

pgarba commented Jul 7, 2023

Any update? Would really like to try this out.

@pai4451

pai4451 commented Jul 7, 2023

Also interested whether anyone has tried CodeGen2. Is it worth upgrading?
The CodeGen2 model card also emphasizes its ability to do infilling. Is it possible to do that on the FauxPilot FasterTransformer backend?
It seems that Salesforce just introduced another CodeGen family member: CodeGen2.5.

@richjohnson-wwt

Is it possible to test this PR by pulling the branch, following the build steps, and editing the setup.sh to add the codegen2.5 model as an option?

@michaelfeil
Author

CodeGen 2.5 is based on the Llama architecture, no longer on the CodeGen architecture.

@pai4451

pai4451 commented Aug 2, 2023

@michaelfeil Hi, I read the CodeGen 2.5 blog post, and Salesforce does indeed serve it and evaluate its latency on the NVIDIA Triton server. Do you know how to serve CodeGen 2.5 with Triton? I feel like there are other ways to get CodeGen-based models supported besides converting to GPT-J.

@michaelfeil
Author

I would look for tutorials on how to run llama-2-7b on Triton, and start from there.

@Hoekz Hoekz mentioned this pull request Aug 21, 2023
2 tasks
@michaelfeil
Author

@Hoekz Should I close this one in favor of #230?

@fdegier fdegier changed the base branch from main to dev February 7, 2024 13:28
@fdegier
Collaborator

fdegier commented Feb 7, 2024

Closed in favor of #230

@fdegier fdegier closed this Feb 7, 2024