CodeGen2 compatibility #202
Can't tell if the model architecture has changed – any idea?
I don't know if this is really what you want to know, but as far as I can see you can use it the same way (causal), and they added an infill mode on top. It also seems to support a lot more languages.
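For example, following the Salesforce/codegen2-1B model card (the `<mask_1>`/`<sep>`/`<|endoftext|>` sentinel format and `trust_remote_code=True` are taken from that card, so double-check against it):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# codegen2 ships custom modeling code, hence trust_remote_code (per the model card)
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-1B")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen2-1B", trust_remote_code=True)

# Causal mode: works the same as CodeGen 1, just continue the prompt
prompt = "def hello_world():"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(input_ids, max_length=64)[0], skip_special_tokens=True))

# Infill mode: mark the hole with <mask_1>, terminate, then ask for <mask_1>'s contents
prefix = "def hello_world():\n    "
suffix = "\n    return name"
infill = prefix + "<mask_1>" + suffix + "<|endoftext|>" + "<sep>" + "<mask_1>"
input_ids = tokenizer(infill, return_tensors="pt").input_ids
out = tokenizer.decode(model.generate(input_ids, max_length=128)[0], skip_special_tokens=False)
print(out[len(infill):])  # everything generated after the prompt is the proposed infill
```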
It's probably possible with a single modification of setup.sh.
@moyix how did you come up with the calculations & code in codegen_gptj_convert.py? It seems like this conversion from CodeGen to GPT-J is the most difficult part of supporting a new model type. I was able to modify the Python backend to support bigcode/starcoder. It's obviously really slow because we are just loading the model via the transformers library in the Python backend (are we sure that is the right way to do the Python backend thing?). I got fairly far along with the FasterTransformer conversion but stopped when I saw the bit of math / calculations going on in codegen_gptj_convert.py.
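(For reference, the quick-and-dirty Python-backend route I mean is roughly the sketch below. It's a sketch only: the tensor names and generation parameters are illustrative, not fauxpilot's actual model config.)

```python
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForCausalLM, AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Loading through transformers is easy but slow compared to FT
        self.tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
        self.model = AutoModelForCausalLM.from_pretrained(
            "bigcode/starcoder", torch_dtype="auto", device_map="auto"
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # "prompt" / "completion" are illustrative I/O names
            raw = pb_utils.get_input_tensor_by_name(request, "prompt").as_numpy()
            text = raw[0].decode("utf-8")
            input_ids = self.tokenizer(text, return_tensors="pt").input_ids.to(self.model.device)
            output_ids = self.model.generate(input_ids, max_new_tokens=64)
            completion = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
            out = pb_utils.Tensor("completion", np.array([completion.encode("utf-8")], dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```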
@moyix Let me shed some light on this. Below are the explanations for the Salesforce/codegen2-1B sizes.
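The short version, as far as I can tell from codegen_gptj_convert.py and HF's modeling_codegen.py: the fused qkv_proj is stored as mp_num=4 interleaved chunks (a TPU model-parallelism artifact), each chunk holding a (query, value, key) slice, so the conversion gathers the rows back into contiguous q/v/k blocks. A condensed sketch (my reading, not the literal script):

```python
import torch

def split_codegen_qkv(qkv_proj_weight: torch.Tensor, embed_dim: int, mp_num: int = 4):
    """Un-interleave CodeGen's fused qkv_proj into GPT-J-style q/k/v matrices.

    qkv_proj_weight has shape (3 * embed_dim, embed_dim). It is laid out as
    mp_num chunks, each containing a (query, value, key) slice of local_dim
    rows. We permute *rows* because nn.Linear computes x @ W.T.
    """
    local_dim = embed_dim // mp_num
    # gather all q slices (0,3,6,9), then v (1,4,7,10), then k (2,5,8,11)
    base_permutation = [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
    permutation = torch.cat(
        [torch.arange(i * local_dim, (i + 1) * local_dim) for i in base_permutation]
    )
    q, v, k = qkv_proj_weight[permutation, :].split(embed_dim, dim=0)
    return q, k, v
```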
All you need to do is to make the permutation configurable. Anyhow, the Triton server is not really performant compared to CTranslate2 (https://github.com/OpenNMT/CTranslate2). It can also do batching, and there is no need to pad to certain shapes in the FastAPI proxy. (CTranslate2 codegen2 on int8 CPU is around 4.1x faster and takes ~4x less memory than huggingface codegen2.) I'll try to add some models for CodeGen 1 and CodeGen 2 in all sizes for the CTranslate2 framework, stay tuned.
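If you want to try it before those are up, the usual CTranslate2 flow looks roughly like this (a sketch: it assumes the transformers converter handles the codegen2 architecture, and the output path is arbitrary):

```python
import ctranslate2
from transformers import AutoTokenizer

# one-time conversion; trust_remote_code because codegen2 ships custom modeling code
ctranslate2.converters.TransformersConverter(
    "Salesforce/codegen2-1B", trust_remote_code=True
).convert("codegen2-1B-ct2", quantization="int8")

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-1B")
generator = ctranslate2.Generator("codegen2-1B-ct2", device="cpu")

# CTranslate2 consumes token strings, not ids
prompt = tokenizer.convert_ids_to_tokens(tokenizer.encode("def hello_world():"))
results = generator.generate_batch([prompt], max_length=64, sampling_topk=1)
print(tokenizer.decode(results[0].sequences_ids[0]))
```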
Oops, really sorry that I didn't see this before you figured it out on your own. I wrote up an article explaining how the permutation was derived here: https://gist.github.com/moyix/7896575befbe1b99162ccfec8d135566 I'll look into CTranslate2 – are there gains over FT when using GPUs for inference?
Not sure about FT. On my GPU, the smaller models (2B etc.) should see more like a 3x speedup; for large models the tensor sizes benefit less from the C++ implementation, so 16B is more like 1.5x.
Edit: I found another of your markdown posts, which helped me derive the codegen2 conversion.
Here are some benchmarks for codegen2 on FasterTransformer, with A6000s.
Do you have a comparison with the transformers float16 or bitsandbytes int8 versions? I can't benchmark those myself. While you are at it, you can pull the CTranslate2 model from here; it should just take 2-3 min to install plus a download, see:
Do we have any updates on this?
A new version of CodeGen was released: https://github.com/salesforce/CodeGen2