GitHub - ssbuild/llm_rlhf: reinforcement learning training for LLMs such as gpt2, llama, and bloom

llm reinforcement learning

Implements reinforcement learning training for gpt2, llama, bloom, cpm-ant, and other LLMs.

update information

deep_training

    06-13 fix llama resize_token_embeddings
    06-01 support lora deepspeed training; merge 0.1.9 and 0.1.10
    05-27 add qlora (transformers>=4.30)

install

python >= 3.10

pip install -U -r requirements.txt
If installation fails, switch to the official PyPI index: pip install -i https://pypi.org/simple -U -r requirements.txt

weights: select the one that suits you

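Supported weights include, but are not limited to, the following:

bloom pretrained models
bloom third-party Chinese-trained models # note: edit tokenizer_config.json, BloomTokenizer -> BloomTokenizerFast
tigerbot
opt pretrained models
llama official weights (converted) # llama vocabulary and tokenizer files: https://huggingface.co/hf-internal-testing/llama-tokenizer
llama vicuna-7B third-party weights 1
llama vicuna-7B third-party weights 2
cpm-ant-10b
rwkv (weights need conversion)
rwkv hf weights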
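The tokenizer_config.json change noted for the third-party bloom weights can be scripted. The following is a minimal sketch, assuming the class name is stored under the `tokenizer_class` key (where Hugging Face tokenizer configs usually record it). The function name `use_fast_bloom_tokenizer` and the `model_dir` argument are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def use_fast_bloom_tokenizer(model_dir: str) -> bool:
    """Rewrite tokenizer_class from BloomTokenizer to BloomTokenizerFast.

    Returns True if the file was changed, False if no change was needed.
    Assumes the class name lives under the "tokenizer_class" key.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if config.get("tokenizer_class") != "BloomTokenizer":
        return False
    config["tokenizer_class"] = "BloomTokenizerFast"
    # Keep non-ASCII characters readable and leave all other keys untouched.
    config_path.write_text(
        json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return True
```

Calling it a second time on the same directory is a no-op, so it is safe to run before every training job.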

data sample

{
	"prompt": "\n\nHuman:你能描述如何清除笔记本计算机屏幕上的指纹和污垢的步骤吗?\n\nAssistant:是的,也许。为了清洁你的屏幕,你首先需要使用微纤维布或软溼布,轻轻地擦拭屏幕表面。接下来,你会想抓住一个软,无毛巾,微纤维清洁布,轻轻地擦回屏幕上去除指纹和污垢。\n\nHuman:我可以将伊索普罗素酒精喷射到布上,并以这种方式清洗它吗?\n\nAssistant:",
	"response": " 是的,你可以这样做,以帮助布从屏幕上收集更多的污垢。 请确保始终使用干净的软布,而不是一块粗糙的、粗糙的或结构化的材料,并确保它是无毛布的。",
	"chosen": " 是的,你可以这样做,以帮助布从屏幕上收集更多的污垢。 请确保始终使用干净的软布,而不是一块粗糙的、粗糙的或结构化的材料,并确保它是无毛布的。",
	"rejected": " 是的,你可以直接将它喷射到布上。"
}
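A record in this format can be checked before building the dataset. The sketch below assumes one JSON record per line on disk (the sample above is pretty-printed, so the actual file layout is an assumption) and that all four fields are strings. `parse_record` and `load_records` are hypothetical helpers, not functions from this repository.

```python
import json

REQUIRED_KEYS = ("prompt", "response", "chosen", "rejected")


def parse_record(line: str) -> dict:
    """Parse one JSON record and check that it has the four string fields."""
    record = json.loads(line)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in REQUIRED_KEYS:
        if not isinstance(record[key], str):
            raise ValueError(f"{key!r} must be a string")
    return record


def load_records(path: str) -> list[dict]:
    """Load a file with one JSON record per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_record(line) for line in f if line.strip()]
```

In the sample, `prompt` ends with `Assistant:` and both `chosen` and `rejected` are candidate continuations of it, so a stricter check could also verify that the prompt ends with that marker.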

infer

# infer_finetuning.py: inference with a fine-tuned model
# infer_lora_finetuning.py: inference with a LoRA fine-tuned model
# infer_ptuning.py: inference with a p-tuning-v2 fine-tuned model
python infer_finetuning.py

training

    # build the dataset
    python data_utils.py
    Note: num_process_worker sets the number of worker processes used to build the dataset; for large datasets, raise it toward the number of CPUs.
    dataHelper.make_dataset_with_args(data_args.train_file,mixed_data=False, shuffle=True,mode='train',num_process_worker=0)

    # train
    python train.py
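The note on num_process_worker can be turned into a small helper. This is only a sketch: it assumes that 0 keeps dataset building in the main process (as the default in the call above suggests), and the `threshold` of 100,000 samples is an arbitrary illustration, not a value from this repository. `pick_num_process_worker` is a hypothetical name.

```python
import os


def pick_num_process_worker(num_samples: int, threshold: int = 100_000) -> int:
    """Return 0 for small datasets, otherwise the number of CPUs.

    Assumption: 0 means "no extra worker processes"; the threshold is
    illustrative and should be tuned for the actual data.
    """
    if num_samples < threshold:
        return 0
    # os.cpu_count() can return None on some platforms; fall back to 1.
    return os.cpu_count() or 1
```

The result would then be passed as the num_process_worker argument of dataHelper.make_dataset_with_args in data_utils.py.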

training parameters

related links

pure and clean code
