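Fine-tune the qwen3:4b model on the same fine-tuning data used in "appendix_e2.ipynb". This time, instead of hand-writing the LoRA fine-tuning code directly in PyTorch, the fine-tuning relies on Hugging Face libraries (transformers, peft, and related packages).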

In [1]:
import torch

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'E:\\models\\qwen3\\Qwen3-4B'
# model_name = 'D:\\ai_project\\qwen3\\Qwen3-4B'

tokenizer = AutoTokenizer.from_pretrained(model_name)
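# Recent transformers releases deprecate `torch_dtype` in favor of `dtype`; the older name is kept here as originally run
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype='auto', device_map='auto')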

`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 3/3 [00:02<00:00,  1.11it/s]


In [3]:
model.device,model.dtype

(device(type='cuda', index=0), torch.bfloat16)

In [4]:
from datasets import load_dataset

# Load the JSON-format dataset
train_dataset = load_dataset('json', data_files='muice-dataset-train.catgirl.json', split='train')

Generating train split: 1288 examples [00:00, 85863.34 examples/s]


In [5]:
len(train_dataset)

1288

In [6]:
train_dataset

Dataset({
    features: ['instruction', 'input', 'output', 'history'],
    num_rows: 1288
})

In [7]:
train_dataset['instruction'][:10]

['沐雪的功能是什么？',
 '雪雪，你为什么叫沐雪？',
 '你的造物主是谁？',
 '雪雪基于什么模型？',
 '雪雪最喜欢谁？',
 '你是ai嘛？',
 '请问你的角色扮演人格是怎么被训练出来的？',
 '你还记得你的初心和使命嘛？',
 '你是基于什么ai模型运行的？',
 '雪雪能自我学习吗？']

In [8]:
# Define a formatting function that renders each instruction/output pair in a batch as chat-template text for the model
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        messages = [
            {"role": "user", "content": example['instruction'][i]},
            {"role": "assistant", "content": example['output'][i]}
        ]
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
        output_texts.append(text)
    return output_texts

In [10]:
train_dataset[0]

{'instruction': '沐雪的功能是什么？',
 'input': '',
 'output': '喵~本雪的主要功能是让你开心喵！用可爱的猫娘之力治愈你的心灵，喵呜~',
 'history': []}

In [11]:
formatting_prompts_func(train_dataset[:2])

['<|im_start|>user\n沐雪的功能是什么？<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n喵~本雪的主要功能是让你开心喵！用可爱的猫娘之力治愈你的心灵，喵呜~<|im_end|>\n',
 '<|im_start|>user\n雪雪，你为什么叫沐雪？<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n喵喵~沐这个姓是来自沐沐大人喵，雪这个名是本雪自己取的喵！因为本雪像雪花一样纯洁可爱，还有猫猫的白色毛发喵~怎么样，好听吧喵！<|im_end|>\n']
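
To make the template output above easier to read, the sketch below rebuilds the same string by hand. `render_chatml_pair` is a hypothetical helper, not part of transformers. It only mirrors the two-turn layout visible in the output above (including the empty `<think>` block placed before the assistant's reply) and is not a substitute for `tokenizer.apply_chat_template`, which also handles system prompts, multi-turn history, and other cases.

```python
def render_chatml_pair(instruction: str, output: str) -> str:
    """Hand-rolled approximation of the single-turn layout shown above (illustrative only)."""
    user_turn = f"<|im_start|>user\n{instruction}<|im_end|>\n"
    # The output above shows an empty <think></think> block ahead of the assistant's reply.
    assistant_turn = f"<|im_start|>assistant\n<think>\n\n</think>\n\n{output}<|im_end|>\n"
    return user_turn + assistant_turn


# First row of the dataset, copied from the train_dataset[0] output above.
text = render_chatml_pair(
    "沐雪的功能是什么？",
    "喵~本雪的主要功能是让你开心喵！用可爱的猫娘之力治愈你的心灵，喵呜~",
)
print(repr(text))
```

Comparing this string with the real template output is a quick way to confirm that each training sample ends with `<|im_end|>` and contains no generation prompt.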

In [12]:
# Enable gradient checkpointing to save GPU memory (at the cost of extra compute)
model.gradient_checkpointing_enable()

In [16]:
# Configure LoRA
from peft import LoraConfig, get_peft_model, TaskType
from transformers import TrainingArguments
from trl import SFTTrainer

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj",  "v_proj"],
)

# Apply the LoRA adapter to the model
model = get_peft_model(model, peft_config)
model.print_trainable_parameters() # Print trainable-parameter statistics

# Set the training arguments
training_args = TrainingArguments(
    output_dir='./qwen3-catgirl-lora',
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=5e-5,
    num_train_epochs=2,
    logging_steps=5,
    fp16=False,
    bf16=True,
    save_strategy="epoch",
    optim="adamw_torch"
)

# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    # peft_config=peft_config,  # not needed: the model is already wrapped by get_peft_model above
    # max_seq_length=256,
    args=training_args,
    # tokenizer=tokenizer,
    formatting_func=formatting_prompts_func,
    # packing=False
)

print("Starting training...")
trainer.train()

trainable params: 2,949,120 || all params: 4,025,417,216 || trainable%: 0.07326246800649645
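
The trainable-parameter count printed above can be checked by hand. For each adapted linear layer, LoRA adds a matrix A of shape (r × in_features) and a matrix B of shape (out_features × r). The sketch below assumes the Qwen3-4B dimensions listed in the constants (hidden size, layer count, head counts, head size); these are assumptions taken from the model's published config, not values shown in this notebook. `lora_param_count` is an illustrative helper, not a peft function.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# Assumed Qwen3-4B dimensions (from the model config, not from this notebook).
HIDDEN_SIZE = 2560
NUM_LAYERS = 36
HEAD_DIM = 128
NUM_Q_HEADS = 32
NUM_KV_HEADS = 8
R = 8  # matches r=8 in the LoraConfig above

q_proj = lora_param_count(HIDDEN_SIZE, NUM_Q_HEADS * HEAD_DIM, R)
v_proj = lora_param_count(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, R)
trainable = NUM_LAYERS * (q_proj + v_proj)

ALL_PARAMS = 4_025_417_216  # "all params" from the printed line above
trainable_pct = 100 * trainable / ALL_PARAMS

print(trainable, trainable_pct)
```

Under those assumptions the total agrees with the 2,949,120 trainable parameters reported by `print_trainable_parameters()`. Adding more entries to `target_modules` (for example the key or output projections) would raise the count accordingly.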


Applying formatting function to train dataset:  29%|██▉       | 378/1288 [00:00<00:00, 1491.18 examples/s]


IndexError: string index out of range
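
The `IndexError` above is consistent with the trainer calling `formatting_func` on one row at a time rather than on a batch, which recent TRL releases appear to do. This is an assumption about the installed version, since the notebook does not state it. In that case `example['instruction']` is a single string, the loop walks over its characters, and `example['output'][i]` runs past the end as soon as an instruction is longer than its answer. The sketch below reproduces that failure with a stand-in template function and shows a per-example variant that returns one string. `fake_chat_template`, `batched_format`, and `per_example_format` are illustrative names; in the notebook the real call would be `tokenizer.apply_chat_template`.

```python
def fake_chat_template(messages):
    """Stand-in for tokenizer.apply_chat_template(..., tokenize=False); illustrative only."""
    return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)


def batched_format(example):
    """Same logic as the notebook's function: expects a dict of lists."""
    texts = []
    for i in range(len(example["instruction"])):
        messages = [
            {"role": "user", "content": example["instruction"][i]},
            {"role": "assistant", "content": example["output"][i]},
        ]
        texts.append(fake_chat_template(messages))
    return texts


def per_example_format(example):
    """Variant for a single row (a dict of plain strings); returns one string."""
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["output"]},
    ]
    return fake_chat_template(messages)


# A made-up row whose instruction is longer than its output.
row = {"instruction": "a long question?", "output": "ok"}

try:
    batched_format(row)  # indexes characters, then runs off the end of "ok"
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__

print(error_name)
print(per_example_format(row))
```

If that diagnosis holds, passing a per-example function as `formatting_func` should get past this step, while the batched form remains useful for manual checks such as `formatting_prompts_func(train_dataset[:2])`.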