# Full-parameter fine-tuning (SFT): a causal LM is trained, so the dataset is not split into train and validation sets here

## Step 1: Import the required packages

In [1]:
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer


## Step 2: Load the dataset

In [3]:
ds = Dataset.load_from_disk("./data/alpaca_data_zh/")
ds

Dataset({
    features: ['output', 'input', 'instruction'],
    num_rows: 26858
})

In [5]:
print(ds[0])

{'output': '以下是保持健康的三个提示：\n\n1. 保持身体活动。每天做适当的身体运动，如散步、跑步或游泳，能促进心血管健康，增强肌肉力量，并有助于减少体重。\n\n2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物，避免高糖、高脂肪和加工食品，以保持健康的饮食习惯。\n\n3. 睡眠充足。睡眠对人体健康至关重要，成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力，促进身体恢复，并提高注意力和记忆力。', 'input': '', 'instruction': '保持健康的三个提示。'}


## Step 3: Preprocess the dataset

In [6]:
tokenizer = AutoTokenizer.from_pretrained("D:/pretrained_model/models--Langboat--bloom-389m-zh")
tokenizer

BloomTokenizerFast(name_or_path='D:/pretrained_model/models--Langboat--bloom-389m-zh', vocab_size=42437, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False)

In [18]:
def process_func(example):
    MAX_LENGTH = 256
    # Prompt: "Human: <instruction>\n<input>" followed by the assistant marker.
    instruction = tokenizer("\n".join(["Human: " + example["instruction"], example["input"]]).strip() + "\n\nAssistant: ")
    # Response: the target text plus EOS so the model learns where to stop.
    response = tokenizer(example["output"] + tokenizer.eos_token)
    input_ids = instruction["input_ids"] + response["input_ids"]
    attention_mask = instruction["attention_mask"] + response["attention_mask"]
    # Mask the prompt with -100 so the loss covers only the response tokens.
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"]
    # Right-truncate overlong samples; note that this can also cut off the trailing EOS.
    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }
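
The prompt string assembled inside `process_func` can be checked without a tokenizer. The sketch below uses a hypothetical helper, `build_prompt`, which is not part of the notebook; it only mirrors the string concatenation above to show what happens when `input` is empty.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Hypothetical helper mirroring the prompt string built in process_func."""
    # Join instruction and optional input with a newline; strip() removes the
    # trailing newline that is left behind when input_text is empty.
    head = "\n".join(["Human: " + instruction, input_text]).strip()
    return head + "\n\nAssistant: "


with_input = build_prompt("解释为什么以下分数等同于1/4", "输入：4/16")
without_input = build_prompt("保持健康的三个提示。")
print(repr(with_input))
print(repr(without_input))
```

With an empty `input`, the prompt collapses to the instruction line followed directly by the `Assistant: ` marker, which is the same format used at inference time in Step 8.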

In [19]:
tokenized_ds = ds.map(process_func, remove_columns=ds.column_names)
tokenized_ds

Map: 100%|██████████| 26858/26858 [00:11<00:00, 2245.00 examples/s]


Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 26858
})
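
To make the label masking and truncation in `process_func` concrete, here is a toy sketch that works on plain lists of made-up token ids instead of real tokenizer output. The function name `build_example` and all ids are assumptions chosen for illustration only.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def build_example(prompt_ids, response_ids, max_length):
    """Toy version of process_func operating on plain lists of token ids."""
    input_ids = prompt_ids + response_ids
    attention_mask = [1] * len(input_ids)
    # Prompt positions are masked so only response tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    # Right-truncate all three sequences to the same length.
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": attention_mask[:max_length],
        "labels": labels[:max_length],
    }


# Made-up ids: 3 prompt tokens, then 4 response tokens where 2 stands in for EOS.
example = build_example([11, 12, 13], [21, 22, 23, 2], max_length=6)
print(example)
```

In this toy case the sequence is one token too long, so truncation removes the stand-in EOS id. Samples longer than `MAX_LENGTH` in the real dataset are affected the same way, which may be worth keeping in mind when choosing `MAX_LENGTH`.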

In [21]:
print(tokenizer.decode(tokenized_ds[1]['input_ids']))

Human: 解释为什么以下分数等同于1/4
输入：4/16

Assistant: 4/16等于1/4是因为我们可以约分分子分母都除以他们的最大公约数4，得到（4÷4）/ (16÷4）=1/4。分数的约分是用分子和分母除以相同的非零整数，来表示分数的一个相同的值，这因为分数实际上表示了分子除以分母，所以即使两个数同时除以同一个非零整数，分数的值也不会改变。所以4/16 和1/4是两种不同的书写形式，但它们的值相等。</s>


In [23]:
# tokenizer.decode(tokenized_ds[1]['labels'])  # does not work: the labels contain -100, which is not a valid token id
print(tokenizer.decode(list(filter(lambda x: x!=-100, tokenized_ds[1]['labels']))))

4/16等于1/4是因为我们可以约分分子分母都除以他们的最大公约数4，得到（4÷4）/ (16÷4）=1/4。分数的约分是用分子和分母除以相同的非零整数，来表示分数的一个相同的值，这因为分数实际上表示了分子除以分母，所以即使两个数同时除以同一个非零整数，分数的值也不会改变。所以4/16 和1/4是两种不同的书写形式，但它们的值相等。</s>


## Step 4: Create the model

In [25]:
model = AutoModelForCausalLM.from_pretrained("D:/pretrained_model/models--Langboat--bloom-389m-zh")


## Step 5: Configure the training arguments

In [26]:
args = TrainingArguments(
    output_dir="./chatbot",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=1
)
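
As a rough sanity check on these settings, the effective batch size and the number of optimizer steps can be worked out by hand. The sketch below assumes a single device and assumes that the number of optimizer steps per epoch is the floor of batches divided by accumulation steps; under those assumptions the result agrees with the `global_step=1678` reported after training.

```python
import math

num_rows = 26858       # dataset size from the summary in Step 2
per_device_batch = 2   # per_device_train_batch_size
grad_accum = 8         # gradient_accumulation_steps
num_devices = 1        # assumption: training runs on a single GPU

# Samples that contribute to one optimizer update.
effective_batch = per_device_batch * grad_accum * num_devices

# Mini-batches the dataloader yields in one epoch (last one may be partial).
batches_per_epoch = math.ceil(num_rows / (per_device_batch * num_devices))

# Assumption: optimizer steps per epoch use floor division.
steps_per_epoch = batches_per_epoch // grad_accum

print(effective_batch, batches_per_epoch, steps_per_epoch)
```

Gradient accumulation here trades wall-clock time for memory: each update sees 16 samples while only 2 are held on the device at once.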

## Step 6: Create the trainer

In [27]:
trainer = Trainer(
    model=model,
    args=args,
    tokenizer=tokenizer,
    train_dataset=tokenized_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True)
)
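
`process_func` returns sequences of different lengths, so the collator has to pad each batch to a common length. The following is a simplified pure-Python sketch of that idea, not the library's implementation: `pad_batch` is a hypothetical function, the pad id `0` is made up, and left padding is chosen because the tokenizer above reports `padding_side='left'`. Labels are padded with -100 so that padded positions are ignored by the loss.

```python
def pad_batch(features, pad_token_id, padding_side="left", label_pad_id=-100):
    """Pad every feature in the batch to the length of the longest sequence."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        gap = max_len - len(f["input_ids"])
        # Each field gets its own filler value for the padded positions.
        fills = {
            "input_ids": [pad_token_id] * gap,
            "attention_mask": [0] * gap,
            "labels": [label_pad_id] * gap,
        }
        for key, fill in fills.items():
            seq = list(f[key])
            batch[key].append(fill + seq if padding_side == "left" else seq + fill)
    return batch


features = [
    {"input_ids": [5, 6, 7], "attention_mask": [1, 1, 1], "labels": [-100, 6, 7]},
    {"input_ids": [8], "attention_mask": [1], "labels": [8]},
]
batch = pad_batch(features, pad_token_id=0)  # 0 is a made-up pad id
print(batch)
```

Padding per batch rather than to a fixed `MAX_LENGTH` keeps short batches short, which is the reason `padding=True` is passed to the collator instead of padding during preprocessing.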

## Step 7: Train the model

In [28]:
trainer.train()

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.

{'loss': 3.279, 'learning_rate': 4.9702026221692494e-05, 'epoch': 0.01}
{'loss': 3.2787, 'learning_rate': 4.9404052443384986e-05, 'epoch': 0.01}
{'loss': 3.533, 'learning_rate': 4.910607866507748e-05, 'epoch': 0.02}
{'loss': 3.4023, 'learning_rate': 4.880810488676997e-05, 'epoch': 0.02}
{'loss': 3.1344, 'learning_rate': 4.851013110846246e-05, 'epoch': 0.03}
{'loss': 3.1611, 'learning_rate': 4.821215733015495e-05, 'epoch': 0.04}
{'loss': 3.0368, 'learning_rate': 4.791418355184744e-05, 'epoch': 0.04}
{'loss': 3.0464, 'learning_rate': 4.7616209773539935e-05, 'epoch': 0.05}
{'loss': 2.9177, 'learning_rate': 4.7318235995232426e-05, 'epoch': 0.05}
{'loss': 2.9482, 'learning_rate': 4.702026221692492e-05, 'epoch': 0.06}
{'loss': 3.0042, 'learning_rate': 4.672228843861741e-05, 'epoch': 0.07}
{'loss': 3.141, 'learning_rate': 4.6424314660309894e-05, 'epoch': 0.07}
{'loss': 3.1005, 'learning_rate': 4.6126340882002386e-05, 'epoch': 0.08}
{'loss': 3.0337, 'learning_rate': 4.582836710369488e-05, 'epoch': 0.08}
{'loss': 2.9104, 'learning_rate': 4.553039332538737e-05, 'epoch': 0.09}
{'loss': 3.0624, 'learning_rate': 4.523241954707986e-05, 'epoch': 0.1}
{'loss': 3.1202, 'learning_rate': 4.4934445768772345e-05, 'epoch': 0.1}
{'loss': 3.0363, 'learning_rate': 4.463647199046484e-05, 'epoch': 0.11}
{'loss': 2.9799, 'learning_rate': 4.433849821215733e-05, 'epoch': 0.11}
{'loss': 2.8807, 'learning_rate': 4.404052443384982e-05, 'epoch': 0.12}
{'loss': 3.0459, 'learning_rate': 4.374255065554231e-05, 'epoch': 0.13}
{'loss': 2.933, 'learning_rate': 4.34445768772348e-05, 'epoch': 0.13}
{'loss': 2.962, 'learning_rate': 4.3146603098927295e-05, 'epoch': 0.14}
{'loss': 2.8912, 'learning_rate': 4.2848629320619786e-05, 'epoch': 0.14}
{'loss': 2.8362, 'learning_rate': 4.255065554231228e-05, 'epoch': 0.15}
{'loss': 2.8152, 'learning_rate': 4.225268176400477e-05, 'epoch': 0.15}
{'loss': 2.9008, 'learning_rate': 4.195470798569726e-05, 'epoch': 0.16}
{'loss': 2.99, 'learning_rate': 4.165673420738975e-05, 'epoch': 0.17}
{'loss': 2.924, 'learning_rate': 4.1358760429082244e-05, 'epoch': 0.17}
{'loss': 2.8344, 'learning_rate': 4.1060786650774736e-05, 'epoch': 0.18}
{'loss': 2.8214, 'learning_rate': 4.076281287246723e-05, 'epoch': 0.18}
{'loss': 2.8091, 'learning_rate': 4.046483909415972e-05, 'epoch': 0.19}
{'loss': 2.8178, 'learning_rate': 4.016686531585221e-05, 'epoch': 0.2}
{'loss': 2.825, 'learning_rate': 3.98688915375447e-05, 'epoch': 0.2}
{'loss': 2.8523, 'learning_rate': 3.9570917759237194e-05, 'epoch': 0.21}
{'loss': 2.7406, 'learning_rate': 3.927294398092968e-05, 'epoch': 0.21}
{'loss': 2.7398, 'learning_rate': 3.897497020262217e-05, 'epoch': 0.22}
{'loss': 2.7246, 'learning_rate': 3.867699642431466e-05, 'epoch': 0.23}
{'loss': 2.8834, 'learning_rate': 3.837902264600715e-05, 'epoch': 0.23}
{'loss': 2.7544, 'learning_rate': 3.8081048867699645e-05, 'epoch': 0.24}
{'loss': 2.8128, 'learning_rate': 3.7783075089392136e-05, 'epoch': 0.24}
{'loss': 2.7465, 'learning_rate': 3.748510131108463e-05, 'epoch': 0.25}
{'loss': 2.6899, 'learning_rate': 3.718712753277712e-05, 'epoch': 0.26}
{'loss': 2.7636, 'learning_rate': 3.688915375446961e-05, 'epoch': 0.26}
{'loss': 2.822, 'learning_rate': 3.6591179976162096e-05, 'epoch': 0.27}
{'loss': 2.7316, 'learning_rate': 3.629320619785459e-05, 'epoch': 0.27}
{'loss': 2.8172, 'learning_rate': 3.599523241954708e-05, 'epoch': 0.28}
{'loss': 2.737, 'learning_rate': 3.569725864123957e-05, 'epoch': 0.29}
{'loss': 2.7425, 'learning_rate': 3.539928486293206e-05, 'epoch': 0.29}
{'loss': 2.6966, 'learning_rate': 3.5101311084624553e-05, 'epoch': 0.3}
{'loss': 2.8204, 'learning_rate': 3.4803337306317045e-05, 'epoch': 0.3}
{'loss': 2.7103, 'learning_rate': 3.4505363528009537e-05, 'epoch': 0.31}
{'loss': 2.7144, 'learning_rate': 3.420738974970203e-05, 'epoch': 0.32}
{'loss': 2.6901, 'learning_rate': 3.390941597139452e-05, 'epoch': 0.32}
{'loss': 2.6668, 'learning_rate': 3.361144219308701e-05, 'epoch': 0.33}
{'loss': 2.621, 'learning_rate': 3.33134684147795e-05, 'epoch': 0.33}
{'loss': 2.7327, 'learning_rate': 3.3015494636471994e-05, 'epoch': 0.34}
{'loss': 2.6707, 'learning_rate': 3.2717520858164486e-05, 'epoch': 0.35}
{'loss': 2.7587, 'learning_rate': 3.241954707985698e-05, 'epoch': 0.35}
{'loss': 2.6334, 'learning_rate': 3.212157330154946e-05, 'epoch': 0.36}
{'loss': 2.568, 'learning_rate': 3.1823599523241954e-05, 'epoch': 0.36}
{'loss': 2.7312, 'learning_rate': 3.1525625744934445e-05, 'epoch': 0.37}
{'loss': 2.5713, 'learning_rate': 3.122765196662694e-05, 'epoch': 0.38}
{'loss': 2.6151, 'learning_rate': 3.092967818831943e-05, 'epoch': 0.38}
{'loss': 2.715, 'learning_rate': 3.063170441001192e-05, 'epoch': 0.39}
{'loss': 2.5989, 'learning_rate': 3.0333730631704412e-05, 'epoch': 0.39}
{'loss': 2.5898, 'learning_rate': 3.00357568533969e-05, 'epoch': 0.4}
{'loss': 2.6637, 'learning_rate': 2.973778307508939e-05, 'epoch': 0.41}
{'loss': 2.6558, 'learning_rate': 2.9439809296781883e-05, 'epoch': 0.41}
{'loss': 2.6496, 'learning_rate': 2.9141835518474375e-05, 'epoch': 0.42}
{'loss': 2.6456, 'learning_rate': 2.8843861740166866e-05, 'epoch': 0.42}
{'loss': 2.5743, 'learning_rate': 2.8545887961859358e-05, 'epoch': 0.43}
{'loss': 2.5314, 'learning_rate': 2.824791418355185e-05, 'epoch': 0.43}
{'loss': 2.6191, 'learning_rate': 2.794994040524434e-05, 'epoch': 0.44}
{'loss': 2.5727, 'learning_rate': 2.7651966626936832e-05, 'epoch': 0.45}
{'loss': 2.6351, 'learning_rate': 2.7353992848629324e-05, 'epoch': 0.45}
{'loss': 2.5667, 'learning_rate': 2.7056019070321816e-05, 'epoch': 0.46}
{'loss': 2.7063, 'learning_rate': 2.6758045292014307e-05, 'epoch': 0.46}
{'loss': 2.5885, 'learning_rate': 2.6460071513706795e-05, 'epoch': 0.47}
{'loss': 2.6194, 'learning_rate': 2.6162097735399287e-05, 'epoch': 0.48}
{'loss': 2.5526, 'learning_rate': 2.586412395709178e-05, 'epoch': 0.48}
{'loss': 2.534, 'learning_rate': 2.556615017878427e-05, 'epoch': 0.49}
{'loss': 2.5839, 'learning_rate': 2.526817640047676e-05, 'epoch': 0.49}
{'loss': 2.5468, 'learning_rate': 2.497020262216925e-05, 'epoch': 0.5}
{'loss': 2.6666, 'learning_rate': 2.467222884386174e-05, 'epoch': 0.51}
{'loss': 2.5744, 'learning_rate': 2.4374255065554233e-05, 'epoch': 0.51}
{'loss': 2.4634, 'learning_rate': 2.4076281287246724e-05, 'epoch': 0.52}
{'loss': 2.6992, 'learning_rate': 2.3778307508939216e-05, 'epoch': 0.52}
{'loss': 2.6921, 'learning_rate': 2.3480333730631707e-05, 'epoch': 0.53}
{'loss': 2.5812, 'learning_rate': 2.3182359952324196e-05, 'epoch': 0.54}
{'loss': 2.652, 'learning_rate': 2.2884386174016687e-05, 'epoch': 0.54}
{'loss': 2.5326, 'learning_rate': 2.258641239570918e-05, 'epoch': 0.55}
{'loss': 2.6264, 'learning_rate': 2.228843861740167e-05, 'epoch': 0.55}
{'loss': 2.5591, 'learning_rate': 2.1990464839094162e-05, 'epoch': 0.56}
{'loss': 2.5254, 'learning_rate': 2.169249106078665e-05, 'epoch': 0.57}
{'loss': 2.7223, 'learning_rate': 2.139451728247914e-05, 'epoch': 0.57}
{'loss': 2.4981, 'learning_rate': 2.1096543504171633e-05, 'epoch': 0.58}
{'loss': 2.5844, 'learning_rate': 2.0798569725864125e-05, 'epoch': 0.58}
{'loss': 2.5601, 'learning_rate': 2.0500595947556616e-05, 'epoch': 0.59}
{'loss': 2.5934, 'learning_rate': 2.0202622169249108e-05, 'epoch': 0.6}
{'loss': 2.5926, 'learning_rate': 1.99046483909416e-05, 'epoch': 0.6}
{'loss': 2.5231, 'learning_rate': 1.9606674612634088e-05, 'epoch': 0.61}
{'loss': 2.4268, 'learning_rate': 1.930870083432658e-05, 'epoch': 0.61}
{'loss': 2.5049, 'learning_rate': 1.901072705601907e-05, 'epoch': 0.62}
{'loss': 2.5418, 'learning_rate': 1.8712753277711562e-05, 'epoch': 0.63}
{'loss': 2.5673, 'learning_rate': 1.8414779499404054e-05, 'epoch': 0.63}
{'loss': 2.5023, 'learning_rate': 1.8116805721096545e-05, 'epoch': 0.64}
{'loss': 2.4782, 'learning_rate': 1.7818831942789037e-05, 'epoch': 0.64}
{'loss': 2.5405, 'learning_rate': 1.7520858164481525e-05, 'epoch': 0.65}
{'loss': 2.4342, 'learning_rate': 1.7222884386174017e-05, 'epoch': 0.66}
{'loss': 2.49, 'learning_rate': 1.692491060786651e-05, 'epoch': 0.66}
{'loss': 2.4585, 'learning_rate': 1.6626936829559e-05, 'epoch': 0.67}
{'loss': 2.4504, 'learning_rate': 1.632896305125149e-05, 'epoch': 0.67}
{'loss': 2.4517, 'learning_rate': 1.603098927294398e-05, 'epoch': 0.68}
{'loss': 2.4387, 'learning_rate': 1.573301549463647e-05, 'epoch': 0.69}
{'loss': 2.4971, 'learning_rate': 1.5435041716328963e-05, 'epoch': 0.69}
{'loss': 2.4493, 'learning_rate': 1.5137067938021454e-05, 'epoch': 0.7}
{'loss': 2.4004, 'learning_rate': 1.4839094159713946e-05, 'epoch': 0.7}
{'loss': 2.5376, 'learning_rate': 1.4541120381406437e-05, 'epoch': 0.71}
{'loss': 2.5211, 'learning_rate': 1.4243146603098927e-05, 'epoch': 0.71}
{'loss': 2.3994, 'learning_rate': 1.3945172824791419e-05, 'epoch': 0.72}
{'loss': 2.4107, 'learning_rate': 1.364719904648391e-05, 'epoch': 0.73}
{'loss': 2.4586, 'learning_rate': 1.3349225268176402e-05, 'epoch': 0.73}
{'loss': 2.4632, 'learning_rate': 1.3051251489868894e-05, 'epoch': 0.74}
{'loss': 2.4765, 'learning_rate': 1.2753277711561385e-05, 'epoch': 0.74}
{'loss': 2.3726, 'learning_rate': 1.2455303933253875e-05, 'epoch': 0.75}
{'loss': 2.4195, 'learning_rate': 1.2157330154946365e-05, 'epoch': 0.76}
{'loss': 2.4797, 'learning_rate': 1.1859356376638856e-05, 'epoch': 0.76}
{'loss': 2.3348, 'learning_rate': 1.1561382598331346e-05, 'epoch': 0.77}
{'loss': 2.3751, 'learning_rate': 1.1263408820023838e-05, 'epoch': 0.77}
{'loss': 2.5221, 'learning_rate': 1.096543504171633e-05, 'epoch': 0.78}
{'loss': 2.3937, 'learning_rate': 1.0667461263408821e-05, 'epoch': 0.79}
{'loss': 2.438, 'learning_rate': 1.0369487485101313e-05, 'epoch': 0.79}
{'loss': 2.3657, 'learning_rate': 1.0071513706793802e-05, 'epoch': 0.8}
{'loss': 2.4145, 'learning_rate': 9.773539928486292e-06, 'epoch': 0.8}
{'loss': 2.4197, 'learning_rate': 9.475566150178784e-06, 'epoch': 0.81}
{'loss': 2.4907, 'learning_rate': 9.177592371871275e-06, 'epoch': 0.82}
{'loss': 2.4205, 'learning_rate': 8.879618593563767e-06, 'epoch': 0.82}
{'loss': 2.5285, 'learning_rate': 8.581644815256259e-06, 'epoch': 0.83}
{'loss': 2.4104, 'learning_rate': 8.28367103694875e-06, 'epoch': 0.83}
{'loss': 2.303, 'learning_rate': 7.98569725864124e-06, 'epoch': 0.84}
{'loss': 2.3716, 'learning_rate': 7.68772348033373e-06, 'epoch': 0.85}
{'loss': 2.4535, 'learning_rate': 7.389749702026222e-06, 'epoch': 0.85}
{'loss': 2.4235, 'learning_rate': 7.091775923718713e-06, 'epoch': 0.86}
{'loss': 2.375, 'learning_rate': 6.7938021454112046e-06, 'epoch': 0.86}
{'loss': 2.3313, 'learning_rate': 6.495828367103695e-06, 'epoch': 0.87}
{'loss': 2.4056, 'learning_rate': 6.197854588796186e-06, 'epoch': 0.88}
{'loss': 2.2971, 'learning_rate': 5.8998808104886775e-06, 'epoch': 0.88}
{'loss': 2.382, 'learning_rate': 5.601907032181168e-06, 'epoch': 0.89}
{'loss': 2.3929, 'learning_rate': 5.30393325387366e-06, 'epoch': 0.89}
{'loss': 2.3235, 'learning_rate': 5.0059594755661505e-06, 'epoch': 0.9}
{'loss': 2.3925, 'learning_rate': 4.707985697258641e-06, 'epoch': 0.91}
{'loss': 2.3807, 'learning_rate': 4.410011918951133e-06, 'epoch': 0.91}
{'loss': 2.3646, 'learning_rate': 4.1120381406436235e-06, 'epoch': 0.92}
{'loss': 2.385, 'learning_rate': 3.8140643623361143e-06, 'epoch': 0.92}
{'loss': 2.3076, 'learning_rate': 3.516090584028606e-06, 'epoch': 0.93}
{'loss': 2.4728, 'learning_rate': 3.218116805721097e-06, 'epoch': 0.94}
{'loss': 2.4863, 'learning_rate': 2.920143027413588e-06, 'epoch': 0.94}
{'loss': 2.4004, 'learning_rate': 2.622169249106079e-06, 'epoch': 0.95}
{'loss': 2.366, 'learning_rate': 2.32419547079857e-06, 'epoch': 0.95}
{'loss': 2.3148, 'learning_rate': 2.026221692491061e-06, 'epoch': 0.96}
{'loss': 2.394, 'learning_rate': 1.728247914183552e-06, 'epoch': 0.97}
{'loss': 2.2375, 'learning_rate': 1.430274135876043e-06, 'epoch': 0.97}
{'loss': 2.3483, 'learning_rate': 1.132300357568534e-06, 'epoch': 0.98}
{'loss': 2.3245, 'learning_rate': 8.343265792610251e-07, 'epoch': 0.98}
{'loss': 2.3588, 'learning_rate': 5.363528009535162e-07, 'epoch': 0.99}
{'loss': 2.2839, 'learning_rate': 2.3837902264600714e-07, 'epoch': 0.99}

{'train_runtime': 2444.6557, 'train_samples_per_second': 10.986, 'train_steps_per_second': 0.686, 'train_loss': 2.6338295271626815, 'epoch': 1.0}





TrainOutput(global_step=1678, training_loss=2.6338295271626815, metrics={'train_runtime': 2444.6557, 'train_samples_per_second': 10.986, 'train_steps_per_second': 0.686, 'train_loss': 2.6338295271626815, 'epoch': 1.0})
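
The logged learning rates and throughput figures can be cross-checked with a few lines of arithmetic. This sketch assumes a base learning rate of 5e-5 and a linear decay to zero with no warmup (both taken to be the `TrainingArguments` defaults, since neither is set in Step 5), and that the first log entry corresponds to step 10 because `logging_steps=10`.

```python
BASE_LR = 5e-5      # assumption: default learning_rate, not set explicitly in Step 5
TOTAL_STEPS = 1678  # global_step from the TrainOutput above


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Linear decay from base_lr at step 0 down to 0 at total_steps, no warmup."""
    return base_lr * max(0, total_steps - step) / total_steps


print(linear_lr(10))    # compare with the first logged learning_rate
print(linear_lr(1670))  # compare with the last logged learning_rate

# Throughput figures derived from the dataset size and train_runtime.
samples_per_second = round(26858 / 2444.6557, 3)
steps_per_second = round(1678 / 2444.6557, 3)
print(samples_per_second, steps_per_second)
```

Under these assumptions the computed values line up with the log, which suggests the run used the default schedule. The training loss falls from about 3.28 in the first log entry to about 2.28 in the last, with an average of roughly 2.63 over the epoch.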

## Step 8: Run inference

In [29]:
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


In [32]:
ipt = "Human: {}\n{}".format("考试有哪些技巧？", "").strip() + "\n\nAssistant: "
res = pipe(ipt, max_length=256, do_sample=True, num_return_sequences=4)
print(res)

[{'generated_text': 'Human: 考试有哪些技巧？\n\nAssistant: 考试有很多技巧。这里列举一些常见的方法:\n\n1. 提前准备：使用备考资料中的提醒来准备考试。例如，列出过去几年的考试时间和题源，以便在考试前做好准备。\n\n2. 保持良好的学习习惯：坚持每天学习，保持充足的睡眠质量和良好的生活环境，并且不断更新自己和学习资料，以备考更加顺利。\n\n3. 复习巩固：根据考试科目和考试主题，通过不断练习来巩固所学知识，并确保充分掌握所要技能。\n\n4. 勤奋努力：多参加各类考试，以便在考试中取得优异的成绩。这样，可以提高你的学习效率和考试能力。\n\n总之，坚持上述方法，不仅能够帮助你应对考试的氛围，还会帮助你在学习和生活中取得更好的成绩。\n\n再次提示，无论是在线教学还是传统课堂教学，都要根据自己需要来学习和备考知识。只要保持学习态度和知识技能，就能取得优异的成绩。'}, {'generated_text': 'Human: 考试有哪些技巧？\n\nAssistant: 考试技巧有很多种，每种技巧都会涉及不同的类型。例如，题目技巧通常会使用一些结构化的步骤和语法组织来帮助考生更好地思考题。解题技巧也常常用于快速解决题目，帮助学生更好地掌握考试内容。除此之外，复习技巧也会影响到考试的成功，帮助学生取得更好的成绩。总之，考试技巧对于考试的成功至关重要。'}, {'generated_text': 'Human: 考试有哪些技巧？\n\nAssistant: 考试技巧包括多种方面，其中最常见的技巧有：考试时间限制，准备考试材料，认真复习，以及应对考试的策略。考试技巧包括书写技巧，逻辑思考技巧，考试技巧和经验。此外，每个人都可以学习和改进考试技巧，以保持学习的有效性和有效性。'}, {'generated_text': 'Human: 考试有哪些技巧？\n\nAssistant: 考试技巧有很多，这里有一些常见的技巧：\n\n1. 组织：考前制定计划并按计划执行。将目标组织好，并在每道题目完成后按时完成。\n2. 掌握公式：一题一答案。可以按照逻辑顺序整理答案。\n3. 思考问题：设题时，要注意仔细思考每一个题目，这样可以避免因遗漏或拼写错误而犯错时。\n4. 控制时间: 在考试开始之前，要认真准备练习，避免在考试时不掌握题目要求
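
Each `generated_text` repeats the prompt before the model's answer. If only the answer is needed, it can be split off at the `Assistant: ` marker. The helper below, `extract_reply`, is a hypothetical addition rather than part of the notebook, and the sample string is a shortened stand-in for real pipeline output.

```python
def extract_reply(generated_text: str, marker: str = "Assistant: ") -> str:
    """Return the text after the first marker, or '' if the marker is absent."""
    # partition splits on the first occurrence only, so a marker that happens
    # to appear again inside the answer is left untouched.
    _, _, reply = generated_text.partition(marker)
    return reply.strip()


# Shortened stand-in for one element of `res`.
sample = {"generated_text": "Human: 考试有哪些技巧？\n\nAssistant: 考试技巧有很多种。"}
reply = extract_reply(sample["generated_text"])
print(reply)
```

Applied to the real output, this would be called as `extract_reply(item["generated_text"])` for each `item` in `res`.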

In [33]:
for item in res:
    print('+++'*20)
    print(item)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'generated_text': 'Human: 考试有哪些技巧？\n\nAssistant: 考试有很多技巧。这里列举一些常见的方法:\n\n1. 提前准备：使用备考资料中的提醒来准备考试。例如，列出过去几年的考试时间和题源，以便在考试前做好准备。\n\n2. 保持良好的学习习惯：坚持每天学习，保持充足的睡眠质量和良好的生活环境，并且不断更新自己和学习资料，以备考更加顺利。\n\n3. 复习巩固：根据考试科目和考试主题，通过不断练习来巩固所学知识，并确保充分掌握所要技能。\n\n4. 勤奋努力：多参加各类考试，以便在考试中取得优异的成绩。这样，可以提高你的学习效率和考试能力。\n\n总之，坚持上述方法，不仅能够帮助你应对考试的氛围，还会帮助你在学习和生活中取得更好的成绩。\n\n再次提示，无论是在线教学还是传统课堂教学，都要根据自己需要来学习和备考知识。只要保持学习态度和知识技能，就能取得优异的成绩。'}
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'generated_text': 'Human: 考试有哪些技巧？\n\nAssistant: 考试技巧有很多种，每种技巧都会涉及不同的类型。例如，题目技巧通常会使用一些结构化的步骤和语法组织来帮助考生更好地思考题。解题技巧也常常用于快速解决题目，帮助学生更好地掌握考试内容。除此之外，复习技巧也会影响到考试的成功，帮助学生取得更好的成绩。总之，考试技巧对于考试的成功至关重要。'}
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'generated_text': 'Human: 考试有哪些技巧？\n\nAssistant: 考试技巧包括多种方面，其中最常见的技巧有：考试时间限制，准备考试材料，认真复习，以及应对考试的策略。考试技巧包括书写技巧，逻辑思考技巧，考试技巧和经验。此外，每个人都可以学习和改进考试技巧，以保持学习的有效性和有效性。'}
+++++++++++++++++++++++++++++++++++++++++++++++++