
ChatGLM-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware: an implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.


Table of Contents

  • Environment Setup
  • Todo List
  • Run
  • Notes
  • Reference
  • Star-History
  • Donation
  • License

Environment Setup

GPU (a budget card): 2080Ti, 12G VRAM
torch==2.0.0
cuda==11.8

Todo List

  • SFT: Supervised Finetune
  • Merge Adapter into Model
  • RLHF
    • train reward model
    • tuning with RL

Run


Data Process

Convert the alpaca dataset to jsonl:

python cover_alpaca2jsonl.py --data_path data/alpaca_data.json --save_path data/alpaca_data.jsonl
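For reference, this step simply rewrites each Alpaca record (instruction / input / output) as one JSON object per line. Below is a minimal sketch of the idea; the exact field names written by cover_alpaca2jsonl.py come from the ChatGLM-Tuning project and may differ from the illustrative "context" / "target" used here.

import json

# Illustrative sketch only: read the Alpaca JSON array and emit one JSON object per line.
# Field names ("context", "target") are assumptions, not necessarily the script's schema.
with open("data/alpaca_data.json", "r", encoding="utf-8") as f:
    examples = json.load(f)

with open("data/alpaca_data.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        prompt = ex["instruction"]
        if ex.get("input"):
            prompt += "\n" + ex["input"]
        record = {"context": prompt, "target": ex["output"]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")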

Tokenization:

python tokenize_dataset_rows.py --jsonl_path data/alpaca_data.jsonl --save_path data/alpaca --max_seq_length 200 --skip_overlength True
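Conceptually, this step loads the ChatGLM tokenizer and turns each jsonl row into token ids, dropping rows longer than --max_seq_length when --skip_overlength is set. A rough sketch of that idea (the actual tokenize_dataset_rows.py may concatenate prompt and target differently; the "context" / "target" fields are the same illustrative names as above):

import json
from transformers import AutoTokenizer

# Sketch only: tokenize each jsonl row with the ChatGLM tokenizer and skip overlength rows.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
max_seq_length = 200

features = []
with open("data/alpaca_data.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        ids = tokenizer.encode(row["context"] + row["target"])
        if len(ids) > max_seq_length:
            continue  # behaviour of --skip_overlength True
        features.append({"input_ids": ids})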

Supervised Finetune

You must use the latest peft version:

pip uninstall peft -y
pip install git+https://github.com/huggingface/peft.git  # latest version, >= 0.3.0.dev0
python supervised_finetune.py --dataset_path data/alpaca --lora_rank 8 --per_device_train_batch_size 1 --gradient_accumulation_steps 32 --save_steps 200 --save_total_limit 3  --learning_rate 1e-4 --fp16 --remove_unused_columns false --logging_steps 10 --output_dir output
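Internally, supervised_finetune.py wraps the base model with a LoRA adapter via peft; --lora_rank 8 corresponds to r in LoraConfig. A minimal sketch of that wrapping is shown below; the target_modules value "query_key_value" (ChatGLM's fused QKV projection) and the alpha/dropout values are assumptions for illustration, not taken verbatim from the script.

from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

# Sketch: attach a rank-8 LoRA adapter to ChatGLM. Hyperparameters here are assumed.
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                 # --lora_rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # assumption: ChatGLM's fused QKV projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()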

Merge PEFT adapter into Model

pip uninstall peft -y
pip install peft==0.2.0  # 0.3.0.dev0 raises many errors
python merge_peft_adapter.py --model_name ./output 
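What the merge does mathematically: for every LoRA-augmented weight, the adapter is folded back into the base weight so the model can be used without peft at inference time, i.e. W_merged = W + (B @ A) * (alpha / r). A toy illustration of that update (not the actual merge_peft_adapter.py code):

import torch

# Toy illustration of folding a LoRA adapter into a base weight matrix.
d_out, d_in, r, alpha = 64, 64, 8, 32
W = torch.randn(d_out, d_in)        # frozen base weight
lora_A = torch.randn(r, d_in)       # LoRA down-projection
lora_B = torch.zeros(d_out, r)      # LoRA up-projection (zero-initialized before training)

W_merged = W + (lora_B @ lora_A) * (alpha / r)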

Reward Modeling

python train_reward_model.py --model_name 'THUDM/chatglm-6b' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 100 --eval_subset 10 --local_rank 0 --bf16 False

Merge the reward model adapter into the model:

python merge_peft_adapter.py --model_name ./reward_model_chatglm-6b

Notes

  1. PEFT version: the version currently installed from git is 0.3.0.dev0, which breaks merge_peft_adapter; switch to peft==0.2.0 (0.3.0.dev0 no longer has the _get_submodules() function).
  2. Because huggingface transformers does not yet ship a wrapper for ChatGLM, you need to download the model code from the ChatGLM hub yourself and put it in the local models directory for later use.
  3. Likewise, since ChatGLM's model code lives in its own repo and is not merged into huggingface, you must pass trust_remote_code=True whenever you load the model.
  4. Training the reward model requires the SeqCLS task: huggingface transformers provides the AutoModelForSequenceClassification class for this, but ChatGLM only ships ChatGLMForConditionalGeneration.
  5. So the reward model is implemented by hand in reward_model.py, which carries out the reward-model training; see the sketch after this list.
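Because ChatGLM has no sequence-classification head, a common way to build the reward model is to add a scalar value head on top of the transformer's hidden states and train it with a pairwise ranking loss over (chosen, rejected) pairs. The sketch below illustrates that idea; it is not the repo's reward_model.py, and the class name, the use of the last token's hidden state, and the tensor layout are assumptions.

import torch
import torch.nn as nn
from transformers import AutoModel

class RewardModel(nn.Module):
    """Sketch: scalar reward head on top of ChatGLM hidden states (assumed design)."""
    def __init__(self, model_name="THUDM/chatglm-6b"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name, trust_remote_code=True)
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids):
        out = self.backbone(input_ids, output_hidden_states=True)
        hidden = out.hidden_states[-1]
        # Assumes (batch, seq, hidden) layout; ChatGLM's custom modelling code may put
        # the sequence dimension first, in which case the indexing must be adjusted.
        return self.value_head(hidden[:, -1, :]).squeeze(-1)  # one reward per sequence

def pairwise_loss(reward_chosen, reward_rejected):
    # Standard ranking loss: push the chosen response's reward above the rejected one's.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()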

Reference

Data preprocessing: cover_alpaca2jsonl.py and tokenize_dataset_rows.py come from the ChatGLM-Tuning project.

Requirements: the environment is set up mainly by following alpaca-lora.


Star-History

(star-history chart)


Donation

If this project helps you reduce development time, you can buy me a cup of coffee :)

AliPay (支付宝)

(AliPay QR code)

WeChat Pay (微信)

(WeChat Pay QR code)

License

MIT © Kun
