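# Lightweight Fine-Tuning and Inference with Alpaca
The current Alpaca model is fine-tuned from the 7B LLaMA model, with some modifications, on 52K instruction-following examples generated with the technique from the Self-Instruct paper. Using Alpaca as the example, this article shows how to fine-tune the model and run inference in PAI-DSW.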

## Runtime requirements

Python 3.9 or later, and a GPU machine with at least 32 GB of GPU memory.
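The two requirements above can be checked before anything is installed. The following is a minimal sketch, not part of the original tutorial: the helper names are made up for illustration, it assumes `nvidia-smi` is on the `PATH`, and it uses a threshold of 32000 MiB on the assumption that cards sold as 32 GB typically report slightly less than 32768 MiB.

```python
import subprocess
import sys

MIN_PYTHON = (3, 9)
# Assumption: treat "32 GB" as roughly 32000 MiB, since nvidia-smi usually
# reports a little less than the advertised capacity.
MIN_GPU_MEMORY_MIB = 32_000


def python_ok(version=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


def parse_gpu_memory_mib(nvidia_smi_csv):
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_csv.splitlines() if line.strip()]


def query_gpu_memory_mib():
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    cmd = ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_memory_mib(result.stdout)


print("Python version OK:", python_ok())
gpus = query_gpu_memory_mib()
print("GPU memory (MiB):", gpus)
print("GPU memory OK:", any(mib >= MIN_GPU_MEMORY_MIB for mib in gpus))
```

If the last line prints `False`, the reduced settings described later in this article are the safer starting point.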

## Preparation
#### Download stanford_alpaca

In [None]:
!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/stanford_alpaca.tgz
!tar -xvf stanford_alpaca.tgz

### Install dependencies

In [None]:
!cd stanford_alpaca && pip uninstall -y torch torchvision && pip install -r requirements.txt && pip install gradio

#### Configure the transformers dependency

In [None]:
!git clone https://ghproxy.com/https://github.com/huggingface/transformers.git && \
cd transformers && \
git checkout 165dd6dc916a43ed9b6ce8c1ed62c3fe8c28b6ef && \
pip install -e .
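After the editable install, it can help to confirm that the checked-out revision exposes the two classes used later for inference. This is a small sketch with a hypothetical helper, `module_exposes`; it only reports whether the names can be resolved and does not load any weights.

```python
import importlib


def module_exposes(module_name, attribute):
    """Return True if the module can be imported and has the given attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


# These are the class names used in the inference cells of this article.
for name in ("LlamaTokenizer", "LlamaForCausalLM"):
    print(name, "available:", module_exposes("transformers", name))
```

If either line prints `False` inside the notebook, restarting the kernel so that it picks up the new install is worth trying first.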

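### Prepare the data

The data format is described below. To fine-tune on your own data, convert it to this form:<br>
"instruction": describes the task the model should perform.<br>
"input": optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article.<br>
"output": the answer the model is expected to produce.<br>

An example record:
```json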
[
    {
        "instruction": "Give three tips for staying healthy.",
        "input": "",
        "output": "1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
    }
]
```
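Converting your own data into this layout can be sketched as follows. The function names, the sample rows, and the output file name `my_data.json` are illustrative assumptions, not part of the tutorial's code.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}


def to_alpaca_record(instruction, output, input_text=""):
    """Build one record with the three keys described above."""
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    return {"instruction": instruction, "input": input_text, "output": output}


def validate_records(records):
    """Raise ValueError unless every record has the three string fields."""
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {index} is missing {sorted(missing)}")
        for key in REQUIRED_KEYS:
            if not isinstance(record[key], str):
                raise ValueError(f"record {index}: '{key}' must be a string")


def save_records(records, path):
    """Write the records as a JSON list, keeping non-ASCII text readable."""
    validate_records(records)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=4)


# Hypothetical source rows in (instruction, context, answer) order.
rows = [
    ("Summarize the following article.", "Article text goes here.", "A short summary."),
    ("Give three tips for staying healthy.", "", "Eat well, exercise, sleep enough."),
]
records = [to_alpaca_record(question, answer, context) for question, context, answer in rows]
validate_records(records)
print(json.dumps(records, ensure_ascii=False, indent=4))
# save_records(records, "my_data.json")  # then pass this file as --data_path
```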

In [4]:
# Download the dataset. If a file with the same name already exists, rename the existing file first.
!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/alpaca_data.json

--2023-07-06 11:16:39--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/alpaca_data.json
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8799 (8.6K) [application/json]
Saving to: ‘alpaca_data.json’


2023-07-06 11:16:39 (99.4 MB/s) - ‘alpaca_data.json’ saved [8799/8799]



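## Fine-tune the model

#### Prepare the weights
Before training, download the pretrained weights. The archive is large (12 GB), so downloading and extracting it takes about 15 minutes. To be safe, copy the command below, remove the leading `!`, and run it in a **terminal**.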

In [None]:
!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/llama-7b-hf.tar.gz && tar -xvf llama-7b-hf.tar.gz

--2023-07-06 11:20:44--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/llama-7b-hf.tar.gz
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12432047721 (12G) [application/gzip]
Saving to: ‘llama-7b-hf.tar.gz’
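Because the archive is about 12 GB and is then extracted next to it, checking free disk space first can save a failed 15-minute download. The sketch below uses only the standard library. The figure of roughly twice the archive size plus a small margin is an assumption, since the extracted size is not stated in this article.

```python
import shutil

ARCHIVE_GIB = 12  # size reported by wget for llama-7b-hf.tar.gz
# Assumption: archive plus extracted files need about twice the archive size.
REQUIRED_GIB = 2 * ARCHIVE_GIB + 1


def free_gib(path="."):
    """Return the free space of the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path=".", required_gib=REQUIRED_GIB):
    """Return True when the filesystem has at least `required_gib` free."""
    return free_gib(path) >= required_gib


print(f"free: {free_gib():.1f} GiB, enough for download and extraction: {has_room()}")
```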


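#### Adjust parameters
After downloading the pretrained weights, adjust a few parameters to fit the machine; otherwise fine-tuning can easily run out of GPU memory. The changes below make it possible to test the workflow on a single GPU with less memory.<br>

Locate the config.json file under the pretrained weights path<br>
and edit the **config.json** file under **./llama-7b-hf/** as shown below.<br>
Set **max_sequence_length** to 4 and **num_hidden_layers** to 4 in config.json, and pass **model_max_length=4** on the training command line, so that training fits in a smaller amount of GPU memory. With only 4 layers and a sequence length of 4, the result is suitable for verifying the pipeline rather than for producing a high-quality model.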
```json
{
    "architectures": ["LLaMAForCausalLM"], 
    "bos_token_id": 0, 
    "eos_token_id": 1, 
    "hidden_act": "silu", 
    "hidden_size": 4096, 
    "intermediate_size": 11008, 
    "initializer_range": 0.02, 
    "max_sequence_length": 4, 
    "model_type": "llama", 
    "num_attention_heads": 32, 
    "num_hidden_layers": 4, 
    "pad_token_id": -1, 
    "rms_norm_eps": 1e-06, 
    "torch_dtype": "float16", 
    "transformers_version": "4.27.0.dev0", 
    "use_cache": true, 
    "vocab_size": 32000
}
```
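The same edit can be scripted instead of done by hand. This is a minimal sketch under the assumption that only the two keys named above need to change; the helper name `shrink_config` and the `.bak` backup file are illustrative additions.

```python
import json
from pathlib import Path


def shrink_config(config_path, max_sequence_length=4, num_hidden_layers=4):
    """Rewrite config.json with the reduced settings and return the new config.

    The original file is saved once as config.json.bak so it can be restored.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        backup.write_text(json.dumps(config, indent=4), encoding="utf-8")
    config["max_sequence_length"] = max_sequence_length
    config["num_hidden_layers"] = num_hidden_layers
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


CONFIG_PATH = Path("./llama-7b-hf/config.json")
if CONFIG_PATH.exists():
    print(shrink_config(CONFIG_PATH))
else:
    print(f"{CONFIG_PATH} not found; download and extract the weights first")
```

To return to the full model later, copy `config.json.bak` back over `config.json`.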


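#### Training
Before training, set **model_name_or_path** to the path of the pretrained weights. The number of training epochs, **num_train_epochs**, can be changed as needed. Partway through startup, a prompt asks whether to save logs to **wandb**, so we recommend copying the command below and running it in a **terminal**. When the **wandb** prompt appears, enter 3.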

In [8]:
# Run the training command
!torchrun --nproc_per_node=1 --master_port=29588 ./stanford_alpaca/train.py \
 --model_name_or_path "./llama-7b-hf" \
 --data_path ./alpaca_data.json \
 --bf16 False \
 --output_dir ./models/alpaca-2 \
 --num_train_epochs 1 \
 --per_device_train_batch_size 1 \
 --per_device_eval_batch_size 1 \
 --gradient_accumulation_steps 8 \
 --evaluation_strategy "no" \
 --save_strategy "steps" \
 --save_steps 2000 \
 --save_total_limit 1 \
 --learning_rate 2e-5 \
 --model_max_length 4 \
 --weight_decay 0. \
 --warmup_ratio 0.03 \
 --lr_scheduler_type "cosine" \
 --logging_steps 1 \
 --fsdp "full_shard auto_wrap" \
 --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
 --tf32 False 

When training runs successfully, the output looks like the screenshot below.<br>


![png](./img/pp.png)
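The batch-related flags in the training command combine into an effective batch size, which in turn determines roughly how many optimizer steps a run takes. The sketch below works through that arithmetic. The step and warmup counts are approximations and may differ slightly from what the trainer reports, and the example count of 52,000 refers to the full dataset mentioned in the introduction, not the small sample file downloaded here.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_processes):
    """Examples consumed per optimizer step across all processes."""
    return per_device_batch * grad_accum_steps * num_processes


def estimate_schedule(num_examples, epochs, batch, warmup_ratio):
    """Approximate (total optimizer steps, warmup steps) for a run."""
    steps_per_epoch = max(1, math.ceil(num_examples / batch))
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# Values from the command above: batch size 1, 8 accumulation steps, 1 process.
batch = effective_batch_size(per_device_batch=1, grad_accum_steps=8, num_processes=1)
total, warmup = estimate_schedule(num_examples=52_000, epochs=1, batch=batch, warmup_ratio=0.03)
print(f"effective batch size: {batch}")
print(f"approx. optimizer steps: {total}, approx. warmup steps: {warmup}")
```

One practical consequence: with `--save_steps 2000`, a run shorter than 2000 optimizer steps writes no intermediate checkpoint.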

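## Inference
The following code runs inference.<br>
Before running inference in the notebook, we recommend **restarting** the notebook kernel; otherwise the Python environment may not pick up the newly installed packages and imports can fail.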

In [None]:
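import transformers

# Load the fine-tuned tokenizer and model from the training output directory.
tokenizers = transformers.LlamaTokenizer.from_pretrained("./models/alpaca-2")
model = transformers.LlamaForCausalLM.from_pretrained("./models/alpaca-2").cuda()
model.eval()


def gen(req):
    """Generate a completion for the prompt `req` and print the decoded text."""
    batch = tokenizers(req, return_tensors="pt", add_special_tokens=False)
    # Move the input tensors to the GPU, where the model lives.
    batch = {k: v.cuda() for k, v in batch.items()}
    # Sample with temperature 0.7 and nucleus (top-p) sampling at 0.9.
    full_completion = model.generate(
        inputs=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        num_beams=1,
        max_new_tokens=600,
        eos_token_id=tokenizers.eos_token_id,
        pad_token_id=tokenizers.pad_token_id,
    )
    # The decoded text contains the prompt followed by the generated tokens.
    print(tokenizers.decode(full_completion[0]))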

In [None]:
gen("List all Canadian provinces in alphabetical order.")
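The call above passes the raw instruction. The upstream stanford_alpaca `train.py` wraps every training example in a fixed prompt template, so a model fine-tuned with it may respond better when the inference prompt uses the same wrapper. The sketch below reproduces that template, assuming the bundled `train.py` has not changed it; `build_prompt` is a helper name introduced here for illustration.

```python
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction, input_text=""):
    """Format an instruction (and optional input) like the training examples."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


prompt = build_prompt("List all Canadian provinces in alphabetical order.")
print(prompt)
# gen(prompt)  # pass the formatted prompt to gen() defined above
```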

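Inference can also be run from the following script.<br>
Remember to change the model path in gen.py to the path given by the **output_dir** argument in the training command above.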

In [None]:
!wget  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/gen.py

In [None]:
!python gen.py


## Try the model

In [None]:
import gradio as gr
import transformers

# Load the fine-tuned tokenizer and model from the training output directory.
tokenizers = transformers.LlamaTokenizer.from_pretrained("./models/alpaca-2")
model = transformers.LlamaForCausalLM.from_pretrained("./models/alpaca-2").cuda()
model.eval()


def inference(text):
    """Generate a completion for `text` and return the decoded output."""
    batch = tokenizers(text, return_tensors="pt", add_special_tokens=False)
    batch = {k: v.cuda() for k, v in batch.items()}
    full_completion = model.generate(
        inputs=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        num_beams=1,
        max_new_tokens=600,
        eos_token_id=tokenizers.eos_token_id,
        pad_token_id=tokenizers.pad_token_id,
    )
    # Decode once, log the result to the notebook, and return it to the UI.
    result = tokenizers.decode(full_completion[0])
    print(result)
    return result


demo = gr.Blocks()
with demo:
    input_prompt = gr.Textbox(
        label="Enter your request",
        value="Help me write a news article about a safety inspection.",
        lines=6,
    )
    generated_txt = gr.Textbox(lines=6)

    b1 = gr.Button("Send")
    b1.click(inference, inputs=[input_prompt], outputs=generated_txt)

# Newer Gradio releases no longer accept enable_queue in launch(); enable the queue explicitly.
demo.queue()
demo.launch(share=True)
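The `inference` function returns the decoded prompt together with the completion, because `generate` returns the prompt tokens followed by the new tokens. If only the newly generated text should appear in the output box, the prompt tokens can be sliced off before decoding. The helper below is a sketch of that step on plain Python lists; `completion_only` is a name introduced here, and the token ids in the demo are made up.

```python
def completion_only(token_ids, prompt_length, eos_token_id=None):
    """Return the ids generated after the prompt, cut at the first EOS if given."""
    new_ids = list(token_ids[prompt_length:])
    if eos_token_id is not None and eos_token_id in new_ids:
        new_ids = new_ids[: new_ids.index(eos_token_id)]
    return new_ids


# Toy example: 3 prompt tokens, 2 generated tokens, then EOS (2) and padding (0).
ids = [5, 6, 7, 42, 43, 2, 0]
print(completion_only(ids, prompt_length=3, eos_token_id=2))
```

Inside `inference`, this could be applied as `completion_only(full_completion[0].tolist(), batch["input_ids"].shape[1], tokenizers.eos_token_id)`, with the resulting list passed to the tokenizer's `decode` method.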