<a href="https://colab.research.google.com/github/shake/colab-Llama-2-ipynb/blob/main/step_by_step_llama_2_7b_shake.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install the Hugging Face Hub client (provides huggingface-cli)
!pip install -q huggingface_hub

In [None]:
# Load the Hugging Face access token from Colab secrets
from google.colab import userdata
hf_token = userdata.get('huggingface')
!git config --global credential.helper store
!huggingface-cli login --token $hf_token --add-to-git-credential

In [None]:
# Install the packages needed for fine-tuning
!pip install git+https://github.com/huggingface/transformers
!pip install sentencepiece
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2  trl==0.4.7

In [4]:
# Model
MODEL_ID = "meta-llama/Llama-2-7b"
MODEL_NAME = MODEL_ID.split('/')[-1]

In [None]:
# download Llama-2-7b
!huggingface-cli download \
	--local-dir=/content/$MODEL_NAME \
	$MODEL_ID \
	checklist.chk consolidated.00.pth params.json \
	tokenizer.model tokenizer_checklist.chk
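
The download includes `checklist.chk` and `tokenizer_checklist.chk`. Assuming these are `md5sum`-style lists (one `<digest>  <file name>` pair per line), the downloaded files could be checked before conversion with a sketch like the one below. The helper names `parse_checklist`, `md5_of`, and `verify_checklist` are hypothetical and not part of any library.

```python
import hashlib
from pathlib import Path


def parse_checklist(text):
    """Parse md5sum-style lines ("<hex digest>  <file name>") into a dict."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading "*"; drop it if present.
            expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(directory, checklist_name="checklist.chk"):
    """Return {file name: True/False}; a missing file counts as a failure."""
    directory = Path(directory)
    expected = parse_checklist((directory / checklist_name).read_text())
    results = {}
    for name, digest in expected.items():
        target = directory / name
        results[name] = target.is_file() and md5_of(target) == digest
    return results
```

In this notebook the call would be `verify_checklist("/content/Llama-2-7b")`, and again with `checklist_name="tokenizer_checklist.chk"` for the tokenizer file.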

In [None]:
# Download the script that converts the weights to Hugging Face format
!wget https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py

In [7]:
# The conversion script expects this directory to exist beforehand
!mkdir -p /content/Llama-2-7b/7B
!cp /content/Llama-2-7b/params.json /content/Llama-2-7b/7B/params.json


In [None]:
# update cache
from transformers.utils.hub import move_cache

In [9]:
# Work around the Colab locale error by forcing UTF-8 as the preferred encoding
import locale
def getpreferredencoding(do_setlocale=True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

In [None]:
# Run the conversion
!python convert_llama_weights_to_hf.py \
    --input_dir /content/$MODEL_NAME  --model_size 7B --output_dir $MODEL_NAME-hf

In [None]:
# Inspect the conversion output
!ls ./Llama-2-7b-hf
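
Beyond listing the directory, the result can be checked programmatically. The sketch below assumes that a usable converted model contains at least `config.json` and `tokenizer.model`, and that weight shards end in `.safetensors` or `.bin`. Those file names are assumptions about the script's output, and `missing_files` and `weight_files` are hypothetical helpers.

```python
from pathlib import Path


def missing_files(model_dir, required=("config.json", "tokenizer.model")):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in required if not (model_dir / name).is_file()]


def weight_files(model_dir, suffixes=(".safetensors", ".bin")):
    """List weight shards by file suffix, sorted by name."""
    model_dir = Path(model_dir)
    return sorted(p.name for p in model_dir.iterdir() if p.suffix in suffixes)
```

An empty list from `missing_files("./Llama-2-7b-hf")` and a non-empty list from `weight_files("./Llama-2-7b-hf")` suggest the conversion finished.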

In [12]:
# Test the model before fine-tuning
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "/content/Llama-2-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
# Load the weights in 8-bit to reduce GPU memory use
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)



In [13]:
# Test prompt
eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a pu
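
The output above starts with the prompt because, for decoder-only models, `generate` returns the prompt tokens followed by the new tokens. A minimal sketch of keeping only the continuation, using plain lists in place of tensors (`continuation_only` is a hypothetical helper and the ids are made up):

```python
def continuation_only(output_ids, prompt_length):
    """Drop the first prompt_length ids, which generate() copies from the prompt."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt_length exceeds the generated sequence")
    return output_ids[prompt_length:]


# Toy ids standing in for tokenizer output; no model is needed to see the idea.
prompt_ids = [1, 15043, 3186]
generated = prompt_ids + [29991, 2]
print(continuation_only(generated, len(prompt_ids)))  # [29991, 2]
```

With the tensors in this notebook, the same slice would be `output[model_input["input_ids"].shape[1]:]` applied to `output = model.generate(...)[0]` before decoding.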

# Fine-tuning


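In natural language processing (NLP), PEFT (parameter-efficient fine-tuning) is a family of techniques for adapting a pretrained language model by training only a small number of extra or selected parameters while the original weights stay frozen. LoRA, for instance, adds a pair of small low-rank matrices next to weight matrices in the attention layers, and only those matrices are updated during training.

PEFT methods fall into three groups: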
* Prefix/Prompt-Tuning
* Adapter-Tuning
* LoRA

The example below uses LoRA.
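
As a rough illustration of the LoRA idea, the sketch below freezes a weight matrix `W` and adds a low-rank update `(alpha / r) * B @ A`, where only `A` and `B` would be trained. It uses plain Python lists to stay dependency-free. The rank, `alpha`, and toy dimensions are illustrative assumptions, not the settings used by the fine-tuning run in this notebook.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the effective weight with the adapter applied."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, rank):
    """Count parameters in B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


random.seed(0)
d_out, d_in, rank = 4, 6, 2  # toy sizes, chosen only for illustration
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
b = [[0.0] * rank for _ in range(d_out)]  # trainable, zero-initialised

# With B = 0 the adapter is a no-op, so training starts from the base model.
assert lora_weight(w, a, b, alpha=16) == w

# One 4096 x 4096 projection (the Llama-2-7b hidden size) at an assumed rank of 8:
full = 4096 * 4096
lora = trainable_params(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

The last line shows why the method is parameter-efficient: at that rank, the adapter for one projection holds well under 1% of the parameters of the matrix it modifies.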
*This fine-tuning run needs an A100 GPU with 40 GB of memory.*
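
For a sense of scale, the arithmetic below estimates the memory taken by the model weights alone at different precisions. It assumes roughly 6.74 billion parameters for Llama-2-7b and deliberately ignores activations, adapter gradients, and optimizer state, which depend on batch size and context length. It is a lower bound, not an explanation of the full 40 GB figure.

```python
# Approximate parameter count for Llama-2-7b (assumption: about 6.74 billion).
N_PARAMS = 6.74e9

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(N_PARAMS, dtype):.1f} GB")
# float32: 27.0 GB
# float16: 13.5 GB
# int8: 6.7 GB
```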

In [None]:
# Install llama-recipes directly with pip
!pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-recipes


In [None]:
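# Each `!` command runs in its own subshell, so `!export` would not persist; `%env` sets the variable for the notebook process
%env CUDA_VISIBLE_DEVICES=0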
!python -m llama_recipes.finetuning  --use_peft --peft_method lora --quantization  \
--model_name {model_id} \
--output_dir {model_id}-peft

In [None]:
# Run inference with the fine-tuned model
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig

In [None]:
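# Assuming the fine-tuning output directory holds only the LoRA adapter (what PEFT's
# save_pretrained writes), load the tokenizer and base weights from the converted model
model_id = "/content/Llama-2-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)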

In [None]:
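# Wrap the base model with the LoRA adapter; the adapter weights are stored separately from the base weights, hence this second load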
model = PeftModel.from_pretrained(model, "/content/Llama-2-7b-hf-peft")

In [None]:
# Test the result after fine-tuning
eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))