
about chatGPT three steps #2793

Closed
leizhu1989 opened this issue Feb 17, 2023 · 20 comments

Labels
documentation: Improvements or additions to documentation

Comments

@leizhu1989

📚 The doc issue

Hello author!
I'm not sure how the examples correspond to the training steps; maybe my understanding is wrong.
In /applications/ChatGPT/examples/, as far as I can tell:
'Train with dummy prompt data' is the first step of ChatGPT,
'Train the reward model' is the second step of ChatGPT,
but I don't know about the third step, RLHF using the pre-trained language model together with the reward model. And what is the 'Train with real prompt data' step about?

leizhu1989 added the documentation label on Feb 17, 2023
@zhouzhou12

Same question. I would also like to know how to implement ChatGPT's three-step training with ColossalAI.

@cloudfool

I think 'Train with dummy prompt data' is the 3rd step of ChatGPT.

@Muzzypepper

I have the same question; any guidance would be appreciated.

@Muzzypepper

I think train_prompts.py is the first step, training the SFT model, and train_reward_models.py is the second step, training the RM.
train_dummy.py uses PPO training: initial_model uses the model from the first step and critic_model uses the model from the second step, so this is the third step, RLHF. As for train_prompts.py, PPOTrainer is also used there, and initial_model and critic_model can use the original pre-trained model. I don't know if this is the case.
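For anyone trying to picture that third step, the heart of PPO is the clipped policy-gradient objective. Below is a minimal, self-contained PyTorch sketch with dummy tensors; it is not the repo's PPOTrainer internals, just the standard PPO-clip loss that such a trainer computes:

```python
import torch

# Dummy stand-ins: per-token log-probs from the actor being trained and from
# the frozen snapshot (the role initial_model plays), plus advantage estimates
# (derived from critic_model values and reward-model scores in real RLHF).
logprobs_new = torch.randn(8, requires_grad=True)
logprobs_old = torch.randn(8)
advantages = torch.randn(8)

ratio = torch.exp(logprobs_new - logprobs_old)  # importance-sampling ratio
clip_eps = 0.2
unclipped = ratio * advantages
clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
# PPO maximizes the clipped objective, so the loss is its negation.
ppo_loss = -torch.min(unclipped, clipped).mean()
ppo_loss.backward()
```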

@leizhu1989
Author

I think train_prompts.py is the first step, training the SFT model, and train_reward_models.py is the second step, training the RM. train_dummy.py uses PPO training: initial_model uses the model from the first step and critic_model uses the model from the second step, so this is the third step, RLHF. As for train_prompts.py, PPOTrainer is also used there, and initial_model and critic_model can use the original pre-trained model. I don't know if this is the case.

Thank you for your reply.

@yaoing

yaoing commented Feb 22, 2023

I think train_prompts.py is the last step. As for the first step, it doesn't seem to be provided in the code; it is introduced as a pre-trained model in a later step. We can train it ourselves by fine-tuning.

@Muzzypepper

I think train_prompts.py is the last step. As for the first step, it doesn't seem to be provided in the code; it is introduced as a pre-trained model in a later step. We can train it ourselves by fine-tuning.

Looking at the paper, the first and second steps use prompt data, and the last step does not seem to require prompt data. I'm not sure either. Also, do you know how to use the trained model for inference or deployment prediction?

@yaoing

yaoing commented Feb 22, 2023

train_dummy.py is copied from train_prompts.py, with only one line of code added to generate dummy data.

As the figure in the paper shows, the third step uses the prompt data and the GPT-3 model to generate some results, and then uses reinforcement learning to learn how to choose better responses. So I think the third step is actually doing prompt training as well.

As for the model training, I am also exploring it; there is a lack of data at the moment.
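To make "dummy data" concrete, here is a minimal sketch of what generating fake prompt data can look like: random token ids shaped like a batch of prompts. The exact line in train_dummy.py may differ; all sizes below are arbitrary:

```python
import torch

vocab_size, batch_size, prompt_len = 50257, 4, 16  # GPT-2-sized vocab, arbitrary batch shape
# Random token ids standing in for real prompts, just to exercise the PPO pipeline.
dummy_prompts = torch.randint(0, vocab_size, (batch_size, prompt_len))
print(dummy_prompts.shape)  # torch.Size([4, 16])
```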

@leizhu1989
Author

Looking at the paper, the first and second steps use prompt data, and the last step does not seem to require prompt data. I'm not sure either. Also, do you know how to use the trained model for inference or deployment prediction?

I think inference works like GPT-2: it also predicts tokens one by one. Load the last trained model and it can be used for inference just like GPT-2.
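A minimal sketch of that token-by-token inference with the Hugging Face transformers library (stock gpt2 is loaded here; in practice you would point from_pretrained at your trained checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # or your RLHF-trained checkpoint

inputs = tokenizer("Explain the three steps of ChatGPT training:", return_tensors="pt")
# generate() produces the continuation one token at a time under the hood.
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```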

@leizhu1989
Author

As for the model training, I am also exploring it; there is a lack of data at the moment.

OK. My QQ: 805650606

@Muzzypepper

Thanks for your reply!

@cloudfool

I think train_prompts.py is the last step. As for the first step, it doesn't seem to be provided in the code; it is introduced as a pre-trained model in a later step. We can train it ourselves by fine-tuning.

Now we need to do the fine-tuning (1st) step by ourselves. Do you know of any fine-tuning code that could be integrated into this project?

@yaoing

yaoing commented Feb 22, 2023

I think train_prompts.py is the last step. As for the first step, it doesn't seem to be provided in the code; it is introduced as a pre-trained model in a later step. We can train it ourselves by fine-tuning.

Now we need to do the fine-tuning (1st) step by ourselves. Do you know of any fine-tuning code that could be integrated into this project?

Training with the Transformers framework is relatively simple. There are plenty of tutorials on the web for fine-tuning, or you can refer to the official documentation.
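For instance, a minimal supervised fine-tuning sketch with the transformers Trainer might look like the following. The two toy strings stand in for a real demonstration dataset:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

texts = ["Q: What is RLHF? A: Reinforcement learning from human feedback.",
         "Q: What is SFT? A: Supervised fine-tuning on demonstration data."]
train_dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_dataset,
    # mlm=False pads the batch and sets causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```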

@ht-zhou
Contributor

ht-zhou commented Feb 24, 2023

Thank you for your feedback, and sorry about the late reply.
In /applications/ChatGPT/examples/ we have 3 examples:
train_dummy -> shows the vanilla way to start training step 3.
train_prompts -> uses prompts to train in training step 3.
train_reward_model -> trains the RM in training step 2.
Because training step 1 is a simple supervised fine-tuning process, as for many other models, we don't implement it here.
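For step 2, the reward model is usually trained with the pairwise ranking loss from the InstructGPT paper: the RM should score the human-preferred response above the rejected one. A minimal sketch with dummy scores (not the exact train_reward_model implementation):

```python
import torch
import torch.nn.functional as F

# Dummy scalar rewards the RM would emit for a batch of (chosen, rejected) pairs.
reward_chosen = torch.randn(4, requires_grad=True)
reward_rejected = torch.randn(4)

# Maximize log sigmoid(r_chosen - r_rejected), i.e. chosen should outrank rejected.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()
```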

@cloudfool

Thank you for your feedback, and sorry about the late reply. In /applications/ChatGPT/examples/ we have 3 examples: train_dummy -> shows the vanilla way to start training step 3. train_prompts -> uses prompts to train in training step 3. train_reward_model -> trains the RM in training step 2. Because training step 1 is a simple supervised fine-tuning process, as for many other models, we don't implement it here.

Thanks! Could you please add vanilla inference code for ChatGPT?

@wqw547243068

Could you show this simple SFT code?

@graciechen

I have the same problem too. Could you show this simple SFT code?

@binmakeswell
Member

Hi @graciechen @wqw547243068 @cloudfool, we have updated a lot. Please check the latest code and docs:
https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples
This issue was closed due to inactivity. Thanks.
