# Getting Started with LLM Fine-Tuning (Qwen2.5-1.5B + Ruozhiba)

This notebook walks you through running the `finetune-simple-llm` project on Google Colab.

**Note**: In the menu bar, select **Runtime** -> **Change runtime type** and set the hardware accelerator to **T4 GPU** (or better).
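
Before installing anything, it can help to confirm that the runtime actually exposes a GPU. The following is a minimal sketch using only the standard library. It assumes the NVIDIA tool `nvidia-smi` is on the `PATH`, which is typically true on Colab GPU runtimes but is an assumption elsewhere. The helper name `gpu_visible` is hypothetical and not part of the project.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tooling on PATH, so treat it as "no GPU".
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ...".
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU detected.")
else:
    print("No GPU detected; check the runtime type.")
```

If this reports no GPU, revisit the runtime setting described in the note above before continuing.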

## 1. Install dependencies

Install the Python libraries the project requires.

In [None]:
!pip install torch "transformers>=4.37.0" datasets peft accelerate bitsandbytes scipy tiktoken einops modelscope
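
After the install finishes, a quick way to see what actually landed in the environment is to query package metadata. This is a sketch using the standard library's `importlib.metadata`. The helper name `installed_versions` and the particular subset of packages checked are illustrative choices, not something the project prescribes.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# A subset of the packages from the install command above.
required = ["torch", "transformers", "datasets", "peft", "accelerate", "bitsandbytes"]
for name, ver in installed_versions(required).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests the install cell failed partway and should be rerun.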

## 2. Clone the repository

Clone the project code from GitHub.

In [None]:
import os

repo_url = "https://github.com/metaxiuyi/finetune-simple-llm.git"
repo_name = "finetune-simple-llm"

if not os.path.exists(repo_name):
    print(f"Cloning {repo_url}...")
    ret = os.system(f"git clone {repo_url}")
    if ret != 0:
        print("\n\033[91mError: Git clone failed!\033[0m")
        print("Most likely reason: The repository is PRIVATE.")
        print("Please go to GitHub Settings -> General -> Danger Zone -> Change visibility and set it to PUBLIC.")
    else:
        print("Clone successful!")
else:
    print(f"Repository {repo_name} already exists. Skipping clone.")

if os.path.exists(repo_name):
    os.chdir(repo_name)
    print(f"Current working directory: {os.getcwd()}")
else:
    print("\n\033[91mFailed to enter repository directory.\033[0m")
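
The later cells call `prepare_data.py` and `train.py`, and the text mentions `inference.py`, so it may be worth checking that the clone produced them. The sketch below assumes those three scripts sit at the repository root, which is inferred from how this notebook invokes them. `missing_scripts` is a hypothetical helper name.

```python
from pathlib import Path


def missing_scripts(root=".", names=("prepare_data.py", "train.py", "inference.py")):
    """Return the script names that are not present directly under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_file()]


missing = missing_scripts()
if missing:
    print(f"Missing scripts: {missing}")
else:
    print("All expected scripts found.")
```

An empty result means the working directory looks like the project root. Otherwise the clone or the `os.chdir` step probably did not succeed.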

## 3. Prepare the data

Download and preprocess the Ruozhiba dataset.

In [None]:
!python prepare_data.py
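
The output format of `prepare_data.py` is not shown in this notebook. As a rough illustration only, the sketch below shows one common way a question/answer pair can be wrapped in chat-style messages, mirroring the `system`/`user` structure used in the inference cell later on. The function name `to_chat_record`, the `messages` key, and the placeholder answer are all assumptions. Inspect the script's actual output before relying on this shape.

```python
import json


def to_chat_record(question, answer, system_prompt="You are a helpful assistant."):
    """Wrap one question/answer pair in a chat-style message list (hypothetical schema)."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


# The answer is a placeholder, not real dataset content.
record = to_chat_record("钢筋混凝土是荤菜还是素菜？", "<reference answer from the dataset>")

# ensure_ascii=False keeps the Chinese text readable when written as JSON.
print(json.dumps(record, ensure_ascii=False))
```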

## 4. Fine-tune the model (LoRA)

Start training the model. On a T4 GPU this may take anywhere from a few minutes to over ten minutes.

In [None]:
!python train.py
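
To give a feel for why LoRA training fits on a T4, here is a small worked calculation. LoRA replaces the update to a weight matrix of shape `d_out x d_in` with two low-rank factors, adding `rank * (d_in + d_out)` trainable parameters. The hidden size of 1536 and the rank of 8 below are illustrative assumptions. The values actually used are defined in `train.py` and the model config.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


d = 1536   # illustrative hidden size (assumption)
rank = 8   # illustrative LoRA rank (assumption; see train.py for the real value)

full = d * d                          # parameters in one square projection matrix
added = lora_param_count(d, d, rank)  # parameters in the LoRA factors for it

print(f"full matrix: {full:,} params")
print(f"LoRA update: {added:,} params ({added / full:.2%} of the full matrix)")
```

Under these assumptions the adapter trains roughly 1% as many parameters per adapted matrix as full fine-tuning would, which is where most of the memory savings come from.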

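## 5. Inference test

Load the fine-tuned model and chat with it.
Note: `inference.py` is interactive, which can be awkward to use from a Colab cell, so the cell below runs the inference code directly in the notebook instead.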

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

def run_inference():
    base_model_name = "Qwen/Qwen2.5-1.5B-Instruct"
    adapter_path = "qwen_ruozhiba_finetuned"
    
    print(f"Loading base model: {base_model_name}")
    tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        device_map="auto",
        torch_dtype=torch.float16,
        trust_remote_code=True
    )

    print(f"Loading LoRA adapter from: {adapter_path}")
    try:
        model = PeftModel.from_pretrained(base_model, adapter_path)
    except Exception as e:
        print(f"Could not load adapter: {e}")
        model = base_model

    model.eval()
    
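    # Test questions (kept in Chinese to match the fine-tuning data)
    test_questions = [
        "钢筋混凝土是荤菜还是素菜？",  # Is reinforced concrete a meat dish or a vegetable dish?
        "蓝牙耳机坏了，去医院挂牙科还是耳科？",  # My Bluetooth earphones broke; should I see a dentist or an ear doctor?
        "如果我吃了感冒药，是不是就不能生病了？"  # If I take cold medicine, does that mean I can no longer get sick?
    ]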
    
    print("\n" + "="*50)
    print("Start Inference Testing")
    print("="*50)
    
    for question in test_questions:
        print(f"\nUser: {question}")
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ]
        
        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        
        model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
        
        with torch.no_grad():
            generated_ids = model.generate(
                **model_inputs,
                max_new_tokens=512,
                do_sample=True,  # temperature/top_p only take effect when sampling
                temperature=0.7,
                top_p=0.9
            )
            
        generated_ids = [
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]
        
        response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        print(f"Assistant: {response}")

run_inference()
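
One step in the cell above that is easy to overlook is the slicing after `model.generate`. For decoder-only models, the returned sequence starts with the prompt tokens, so the code drops the first `len(input_ids)` tokens before decoding. The toy example below reproduces that step on plain Python lists. The token IDs are made up and `strip_prompt` is a hypothetical helper name.

```python
def strip_prompt(input_ids, output_ids):
    """Drop the prompt tokens that appear at the start of the generated sequence."""
    return output_ids[len(input_ids):]


prompt = [101, 7592, 2088]             # made-up token IDs for the prompt
generated = [101, 7592, 2088, 42, 43]  # the prompt followed by two new tokens

print(strip_prompt(prompt, generated))  # only the newly generated tokens remain
```

Without this step, the decoded response would repeat the system prompt and the user's question before the model's answer.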