<a href="https://colab.research.google.com/github/nana-lyj/BilingualChildEmo/blob/main/DeepSeek-R1-Distill-Qwen-7B-unsloth-bnb-4bit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
# 1. Install dependencies (trl provides SFTTrainer, which is imported below)
!pip install -q transformers datasets accelerate peft bitsandbytes trl unsloth

# 2. Initial setup
from unsloth import FastLanguageModel
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset
import pandas as pd

max_seq_length = 1024
model_name = "unsloth/DeepSeek-R1-Distill-Qwen-7B-unsloth-bnb-4bit"
dtype = torch.float16
load_in_4bit = True

In [5]:
# 3. Load the model (with Unsloth optimizations)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2025.3.19: Fast Qwen2 patching. Transformers: 4.50.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [6]:
# 4. Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
)


Not an error, but Unsloth cannot patch MLP layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth 2025.3.19 patched 28 layers with 28 QKV layers, 28 O layers and 0 MLP layers.
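
The size of the adapters added in this cell can be estimated by hand. A LoRA pair on a linear layer with input width d_in and output width d_out adds r * (d_in + d_out) weights. The sketch below assumes the Qwen2 7B attention shapes (hidden size 3584, 4 key/value heads, head dimension 128, 28 layers). Those figures are not stated in this notebook and should be checked against the model's config.json. Under that assumption the total comes to 10,092,544, which agrees with the "Trainable parameters" line Unsloth prints when training starts.

```python
# Assumed Qwen2 7B attention shapes (not stated in this notebook; verify in config.json)
hidden_size = 3584
num_kv_heads = 4
head_dim = 128
num_layers = 28
r = 16  # LoRA rank passed to get_peft_model

kv_dim = num_kv_heads * head_dim  # output width of k_proj and v_proj

# (d_in, d_out) for each targeted projection
shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (hidden_size, hidden_size),
}

# Each LoRA pair holds A (r x d_in) and B (d_out x r)
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * num_layers
print(per_layer, total)
```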


In [7]:
# 5. Load and preprocess the data
dataset = load_dataset("nanaaaa/BilingualChildEmo")

The repository for nanaaaa/BilingualChildEmo contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/nanaaaa/BilingualChildEmo.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


Downloading data:   0%|          | 0.00/169k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/36.4k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/35.6k [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]

Generating test split: 0 examples [00:00, ? examples/s]

In [8]:
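# Define the classification template. The Chinese instruction line reads:
# "Analyze the sentiment of the text and choose the most suitable category
# from [joy, sadness, anger, fear, love]."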
classification_template = """### Instruction:
分析文本情感并从[joy, sadness, anger, fear, love]中选择最合适的类别。

### Input:
{text}

### Response:
{label}"""



In [9]:
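# Formatting function: fill the template for each example and append the EOS
# token so the model learns to stop after the label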
def format_data(examples):
    label_map = {0: "joy", 1: "sadness", 2: "anger", 3: "fear", 4: "love"}
    texts = []
    for sentence, label in zip(examples["sentence"], examples["label"]):
        formatted = classification_template.format(
            text=sentence,
            label=label_map[label]
        ) + tokenizer.eos_token
        texts.append(formatted)
    return {"text": texts}

# Apply the formatting to every split
dataset = dataset.map(format_data, batched=True)

Map:   0%|          | 0/1524 [00:00<?, ? examples/s]

Map:   0%|          | 0/326 [00:00<?, ? examples/s]

Map:   0%|          | 0/327 [00:00<?, ? examples/s]
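
To make the result of format_data concrete, the sketch below renders a single training string outside the notebook. The sentence and the "<eos>" string are hypothetical stand-ins (the real EOS token comes from tokenizer.eos_token); the template and label map are copied from the cells above.

```python
# Same template as in the notebook
classification_template = """### Instruction:
分析文本情感并从[joy, sadness, anger, fear, love]中选择最合适的类别。

### Input:
{text}

### Response:
{label}"""

label_map = {0: "joy", 1: "sadness", 2: "anger", 3: "fear", 4: "love"}

# Hypothetical stand-ins: a made-up sentence and a placeholder EOS string
sentence = "I got a new puppy today!"
label = 0
eos_token = "<eos>"

# One training string: prompt, gold label, then EOS
formatted = classification_template.format(
    text=sentence,
    label=label_map[label],
) + eos_token
print(formatted)
```

The label is the only text after "### Response:", so the model is trained to emit one label word followed by EOS.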

In [10]:
# 6. Training configuration
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],
    eval_dataset = dataset["validation"],
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        output_dir = "./results",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 2e-5,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        eval_strategy = "steps",  # older Transformers releases call this evaluation_strategy
        eval_steps = 200,
        save_strategy = "steps",
        save_steps = 200,
    ),
)



Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/1524 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/326 [00:00<?, ? examples/s]

In [11]:
# 7. Start training
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,524 | Num Epochs = 3 | Total steps = 570
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 10,092,544/7,000,000,000 (0.14% trained)
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.




Step  Training Loss  Validation Loss
 200         2.3095         2.339494
 400         2.3111         2.202253


Unsloth: Will smartly offload gradients to save VRAM!


Unsloth: Not an error, but Qwen2ForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


TrainOutput(global_step=570, training_loss=2.7246931444134628, metrics={'train_runtime': 1665.4602, 'train_samples_per_second': 2.745, 'train_steps_per_second': 0.342, 'total_flos': 1.307470405420032e+16, 'train_loss': 2.7246931444134628})
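
The step counts in this log follow from the training configuration. The sketch below reproduces the arithmetic, assuming the trainer floors the number of micro-batches per epoch by the accumulation factor. That is an assumption about the Trainer internals, though it does reproduce the 570 total steps and the two evaluation points (steps 200 and 400) seen above.

```python
import math

# Values taken from the training cell and its log
num_examples = 1524
per_device_batch = 2
grad_accum = 4
epochs = 3
eval_steps = 200

# Micro-batches the dataloader yields per epoch
micro_batches_per_epoch = math.ceil(num_examples / per_device_batch)

# Optimizer updates per epoch (assumed floor division)
updates_per_epoch = micro_batches_per_epoch // grad_accum

total_steps = updates_per_epoch * epochs

# Steps at which evaluation and checkpointing trigger
eval_points = list(range(eval_steps, total_steps + 1, eval_steps))

print(micro_batches_per_epoch, updates_per_epoch, total_steps, eval_points)
```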

In [12]:
# 8. Save the model (LoRA adapter weights) and the tokenizer
model.save_pretrained("classifier_model")
tokenizer.save_pretrained("classifier_model")

('classifier_model/tokenizer_config.json',
 'classifier_model/special_tokens_map.json',
 'classifier_model/tokenizer.json')

In [13]:
# 9. Inference function
def predict_emotion(text):
    # Build the prompt with an empty label so the model completes the response
    prompt = classification_template.format(text=text, label="").strip()
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to("cuda")

    # Greedy decoding; temperature is omitted because it has no effect when do_sample=False
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

    generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = generated.split("### Response:")[-1].strip().lower()

    label_map = {"joy": 0, "sadness": 1, "anger": 2, "fear": 3, "love": 4}
    return label_map.get(response, -1)  # -1 marks a failed prediction (no exact label match)
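
predict_emotion accepts a response only when it is exactly one label, so any extra generated text after the label counts as a failure. A more tolerant parser is one possible alternative. The sketch below is not part of the original notebook: `parse_label` is a hypothetical helper that returns the first known label appearing as a whole word in the response. Using it would change which test samples get scored, so the evaluation results recorded in this notebook would no longer apply.

```python
import re

LABEL_MAP = {"joy": 0, "sadness": 1, "anger": 2, "fear": 3, "love": 4}

def parse_label(generated: str) -> int:
    """Return the id of the first known label in the response, or -1 if none is found."""
    # Keep only the text after the last response marker
    response = generated.split("### Response:")[-1].lower()
    # Scan whole words in order so that the earliest label wins
    for word in re.findall(r"[a-z]+", response):
        if word in LABEL_MAP:
            return LABEL_MAP[word]
    return -1

print(parse_label("### Response:\njoy"))
print(parse_label("### Response:\nSadness\n\n### Instruction:"))
print(parse_label("### Response:\nI am not sure"))
```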


In [14]:
from sklearn.metrics import classification_report, f1_score
import json

In [15]:
# 10. Evaluate model performance on the test set
def evaluate_on_test_set():
    test_data = dataset["test"]
    test_sentences = test_data["sentence"]
    true_labels = test_data["label"]

    # Predict each test sentence in turn
    predicted_labels = []
    for sentence in test_sentences:
        predicted_labels.append(predict_emotion(sentence))

    # Drop samples whose prediction failed (-1); the metrics below cover
    # only the samples that produced a valid label
    valid_indices = [i for i, label in enumerate(predicted_labels) if label != -1]
    filtered_true = [true_labels[i] for i in valid_indices]
    filtered_pred = [predicted_labels[i] for i in valid_indices]

    if len(filtered_true) == 0:
        print("All predictions failed; evaluation metrics cannot be computed")
        return

    # Compute the evaluation metrics
    print("\nTest set evaluation results:")
    print(classification_report(
        filtered_true,
        filtered_pred,
        target_names=["joy", "sadness", "anger", "fear", "love"],
        digits=4
    ))

    # Compute the F1 scores
    f1_micro = f1_score(filtered_true, filtered_pred, average='micro')
    f1_macro = f1_score(filtered_true, filtered_pred, average='macro')
    f1_weighted = f1_score(filtered_true, filtered_pred, average='weighted')

    print(f"\nF1 Micro: {f1_micro:.4f}")
    print(f"F1 Macro: {f1_macro:.4f}")
    print(f"F1 Weighted: {f1_weighted:.4f}")

    # Save the evaluation results
    results = {
        "f1_micro": f1_micro,
        "f1_macro": f1_macro,
        "f1_weighted": f1_weighted,
        "classification_report": classification_report(
            filtered_true,
            filtered_pred,
            target_names=["joy", "sadness", "anger", "fear", "love"],
            output_dict=True
        )
    }

    with open("evaluation_results.json", "w") as f:
        json.dump(results, f, indent=4)

    print("\nEvaluation results saved to evaluation_results.json")

# Run the evaluation
evaluate_on_test_set()



Test set evaluation results:
              precision    recall  f1-score   support

         joy     0.6071    0.8831    0.7196        77
     sadness     0.6250    0.6164    0.6207        73
       anger     0.5000    0.2895    0.3667        38
        fear     0.4444    0.4348    0.4396        46
        love     0.9000    0.3333    0.4865        27

    accuracy                         0.5862       261
   macro avg     0.6153    0.5114    0.5266       261
weighted avg     0.5982    0.5862    0.5671       261


F1 Micro: 0.5862
F1 Macro: 0.5266
F1 Weighted: 0.5671

Evaluation results saved to evaluation_results.json
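
Because failed predictions are filtered out before scoring, the report covers 261 samples, fewer than the whole test split. The sketch below recomputes accuracy with the failures counted as errors. It uses only numbers from the report, and it assumes the 327-example split in the Map output is the test split. The per-class correct counts are recovered by rounding recall times support.

```python
# (recall, support) per class, copied from the classification report
report = {
    "joy": (0.8831, 77),
    "sadness": (0.6164, 73),
    "anger": (0.2895, 38),
    "fear": (0.4348, 46),
    "love": (0.3333, 27),
}
test_size = 327  # assumed size of the test split, from the Map progress output

# Samples that produced a valid label, and how many of those were correct
scored = sum(support for _, support in report.values())
correct = sum(round(recall * support) for recall, support in report.values())
failed = test_size - scored

print(f"scored={scored}, failed={failed}, correct={correct}")
print(f"accuracy on scored samples: {correct / scored:.4f}")
print(f"accuracy with failures as errors: {correct / test_size:.4f}")
```

Under these assumptions roughly one test sentence in five produced no usable label, so the filtered metrics are an upper bound on end-to-end performance.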
