How to fine tune Codegen? #16

smith-co · 2022-06-09T05:50:50Z

I would like to fine-tune the CodeGen model. Can you provide any documentation in this regard?

shmuelhizmi · 2022-06-12T07:35:04Z

+1

enijkamp · 2022-06-21T20:52:33Z

The converted PyTorch models can be fine-tuned similarly to other causal LMs in HuggingFace.

See tutorials like http://reyfarhan.com/posts/easy-gpt2-finetuning-huggingface/.

TheodoreGalanos · 2022-06-25T03:00:11Z

Would you be releasing training code for the original models? Would be nice to try some on v3s (if possible).

thisisanshgupta · 2022-06-26T06:06:19Z

I think this script might help in finetuning:

https://colab.research.google.com/drive/13dZVYEOMhXhkXWfvSMVM1TTtUDrT6Aeh?usp=sharing#scrollTo=vCPohrZ-CTWu

enijkamp · 2022-06-28T20:26:29Z

@TheodoreGalanos I am working on a release of the JAX code. I trained the models on TPU-v4 and have to resolve a blocker for v3.

smith-co · 2022-06-28T22:12:02Z

@enijkamp @thisisanshgupta I am checking the link you have shared.

Still, I think it would greatly help everyone if fine-tuning steps could be provided in the repo. 🙏

Ontopic · 2022-07-13T07:18:11Z

I for one would appreciate any code/directions needed to run things on a TPU-v4. Great work all!

enijkamp · 2022-07-13T07:26:00Z

@thisisanshgupta @Ontopic Yes, I'm working on the release of my training library for TPU-v3/v4 and will keep you posted.

tlkh · 2022-08-17T05:29:19Z

Hello @enijkamp thank you for your work. Looking forward to some fine-tuning instructions and code.

Currently, I have tried to fine-tune as if it is GPT-2, but I am running into issues where the model's quality degrades significantly.

Is there any particular way the data has to be structured for fine-tuning? Currently, I am just concatenating together the prompts and code as follows:

def xyz():
    """abc"""
    code()

def xyz():
    """abc"""
    code()

enijkamp · 2022-08-17T05:39:41Z

@smith-co @thisisanshgupta @tlkh

For torch, I wrote up a minimal example in DeepSpeed, which can train the 16B model on a ~24 GB GPU. You would need to sanity test this, optimize the configuration, plug in the data loader, and save the weights to disk:
https://github.com/salesforce/CodeGen/blob/main/jaxformer/hf/train_deepspeed.py

For JAX, the training library is undergoing sanity checks on TPU-v3 and should be released soon.
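
Fitting a 16B model on a ~24 GB GPU generally means keeping most parameters and optimizer state off the GPU. The sketch below builds a DeepSpeed configuration that does this with ZeRO stage 3 and CPU offload. It is an assumption about the general approach, not a copy of the linked script: the helper name `make_zero3_offload_config` is hypothetical, and the batch-size and accumulation values are placeholders to tune.

```python
import json


def make_zero3_offload_config(micro_batch_size: int = 1, grad_accum_steps: int = 8) -> dict:
    """Build a DeepSpeed config dict using ZeRO stage 3 with CPU offload.

    Hypothetical helper for illustration; the values are placeholders,
    not settings taken from the CodeGen repository.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # Mixed precision halves the memory needed for weights and gradients.
        "fp16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 partitions parameters, gradients, and optimizer state.
            "stage": 3,
            # Keep optimizer state and parameters in host RAM, not on the GPU.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
    }


def write_config(path: str, config: dict) -> None:
    """Write the config as JSON so it can be passed to a DeepSpeed run."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```

The trade-off of offloading is speed: each step moves data between host and GPU, so throughput is usually much lower than with everything resident on the device.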

enijkamp · 2022-09-29T20:48:10Z

@smith-co @thisisanshgupta @tlkh @Ontopic @TheodoreGalanos @shmuelhizmi A first release of the training code for TPU-v3/v4 is here:

https://github.com/salesforce/jaxformer

zhangybuaa · 2022-11-02T07:14:52Z

@enijkamp I want to fine-tune the model on my own code data. How should I build the dataset? Are there any requirements for its format? Does the data need to be labeled, and if so, in what format? Could you give some guidance or examples? Thanks!
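
For causal language modeling in general, no separate labels are needed: the labels are the input tokens themselves, and Hugging Face causal LM classes shift them by one position internally. A common way to prepare data is to tokenize each sample, join the samples into one stream with an end-of-text token between them, and cut the stream into fixed-length blocks. The sketch below shows that packing step on already-tokenized samples. It is an illustration under assumptions, not the format the CodeGen authors confirmed: `pack_examples` is a hypothetical helper, and the default `EOS_ID` is the GPT-2-style end-of-text id, which should be checked against the actual tokenizer.

```python
from typing import Dict, List

# Assumption: GPT-2-style end-of-text id; verify against tokenizer.eos_token_id.
EOS_ID = 50256


def pack_examples(
    tokenized: List[List[int]],
    block_size: int,
    eos_id: int = EOS_ID,
) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized samples and split them into equal-length blocks."""
    stream: List[int] = []
    for ids in tokenized:
        stream.extend(ids)
        stream.append(eos_id)  # mark the boundary between samples

    # Drop the ragged tail so that every block has exactly block_size tokens.
    usable = (len(stream) // block_size) * block_size

    blocks: List[Dict[str, List[int]]] = []
    for start in range(0, usable, block_size):
        chunk = stream[start:start + block_size]
        # For causal LM training the labels are a copy of the inputs.
        blocks.append({"input_ids": chunk, "labels": list(chunk)})
    return blocks
```

Each returned dictionary has the `input_ids` and `labels` keys that a causal LM forward pass expects; whether an explicit separator improves fine-tuning quality for CodeGen is something to verify empirically.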

calix · 2022-11-18T17:09:11Z

> @smith-co @thisisanshgupta @tlkh
>
> For torch, I wrote up a minimal example in DeepSpeed, which can train the 16B model on a ~24 GB GPU. You would need to sanity test this, optimize the configuration, plug in the data loader, and save the weights to disk: https://github.com/salesforce/CodeGen/blob/main/jaxformer/hf/train_deepspeed.py
>
> For JAX, the training library is undergoing sanity checks on TPU-v3 and should be released soon.

Besides the VRAM, how much RAM would be required to train the model?
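
A rough lower bound can be worked out from the usual accounting for mixed-precision Adam training: about 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state (master weights, momentum, variance). The sketch below applies that arithmetic. It is an estimate under those assumptions, not a measurement of the linked script: it ignores activations, buffers, and framework overhead, and `estimate_training_state_gb` is a hypothetical helper.

```python
def estimate_training_state_gb(
    n_params: int,
    bytes_weights: int = 2,     # fp16 weights
    bytes_grads: int = 2,       # fp16 gradients
    bytes_optimizer: int = 12,  # fp32 master weights + Adam momentum + variance
) -> float:
    """Estimate memory for weights, gradients, and optimizer state in GB (1e9 bytes)."""
    bytes_per_param = bytes_weights + bytes_grads + bytes_optimizer
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, n_params in [("350M", 350_000_000), ("16B", 16_000_000_000)]:
        print(f"{name}: ~{estimate_training_state_gb(n_params):.1f} GB of training state")
```

Under these assumptions the 16B model needs on the order of 256 GB for training state alone. With CPU offload most of that would sit in host RAM, so the machine would need at least that much memory plus headroom.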

glicerico · 2023-01-28T01:09:58Z

@enijkamp, or anyone who has used jaxformer to fine-tune on TPU-v4: what is the approximate cost?

enijkamp · 2023-01-28T20:45:10Z

@glicerico Roughly speaking, cost is a function of the size of the model and data. How much data do you have? Which model do you want to fine-tune?

glicerico · 2023-01-30T04:36:33Z

@enijkamp, I am trying to reproduce the work by Shin and Van Durme, who used a few hundred (sentence, parse) pairs to fine-tune Codex for semantic parsing. I would like to do this with CodeGen. Seeing your results, I would probably want to fine-tune the 16B model.

srnsrn120 · 2023-02-01T14:03:55Z

@enijkamp: I want to fine-tune the mono model. Can you please share the dataset format for Python, along with detailed steps or a notebook?

watreyoung · 2023-06-09T15:00:32Z

> @glicerico Roughly speaking, cost is a function of the size of the model and data. How much data do you have? Which model do you want to fine-tune?

Is there a simpler script template, without DeepSpeed, for fine-tuning CodeGen (350M)?
Also, is the data format the same as for other pre-trained models such as CodeT5 or CodeBERT?
Looking forward to your reply.

xanderdunn mentioned this issue Oct 4, 2022

How did you train the large-sized models without out-of-memory? #27

Closed
