<a href="https://colab.research.google.com/github/shake/colab-Llama-2-ipynb/blob/main/02-quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Quick Start Notebook

This notebook shows how to fine-tune a Llama 2 model on a single GPU (e.g. an A10 with 24 GB) using int8 quantization and LoRA.

### Step 0: Install prerequisites and convert checkpoint

The example uses the Hugging Face trainer and model, which means the checkpoint has to be converted from its original format into the Hugging Face format.
The conversion is done by the `convert_llama_weights_to_hf.py` script that ships with the `transformers` package.
Assuming the original checkpoint resides under `models/7B`, we can install all requirements and convert the checkpoint with:

In [None]:
# %%bash
# pip install llama-recipes transformers datasets accelerate sentencepiece protobuf==3.20 py7zr scipy peft bitsandbytes fire torch_tb_profiler ipywidgets
# TRANSFORM=`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`
# python ${TRANSFORM} --input_dir models --model_size 7B --output_dir models_hf/7B

In [None]:
# Install the required packages
!pip -q install gradio huggingface_hub

# import
import os
import shutil
import huggingface_hub as hh
import pandas as pd

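# Download Llama 2. This requires a Hugging Face token that has been granted access to the model;
# for models that are not gated, this step can be skipped.
# Configure git to store the credentials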
! git config --global credential.helper store
!huggingface-cli login --token hf_FqOyPDAURgkbG --add-to-git-credential

# Download the model: set the Hugging Face repo_id. To switch to a different model,
# only repo_id needs to change; nothing else has to be adjusted.
repo_id = "meta-llama/Llama-2-7b"
repo_name = repo_id.replace("/","---")

# Helpers for displaying file sizes and downloading files

def format_size(num_bytes, precision=2):
	"""
	Convert a file size in bytes to a human-readable string (KB, MB, GB, ...).
	Hugging Face uses decimal units (1000), not binary units (1024).
	"""
	units = ["B", "KB", "MB", "GB", "TB", "PB"]
	size = float(num_bytes)
	index = 0

	while size >= 1000 and index < len(units) - 1:
		index += 1
		size /= 1000

	return f"{size:.{precision}f} {units[index]}"


def list_repo_files_info(repo_id, token=None):
	"""Return a DataFrame of file names and sizes, plus the plain list of file names."""
	data_ls = []
	for file in hh.list_files_info(repo_id, token=token):
		data_ls.append([file.path, format_size(file.size)])
	files = [row[0] for row in data_ls]
	data = pd.DataFrame(data_ls, columns=['File name', 'Size'])
	return data, files

# Download the model files into "./download/<repo_name>" under the current directory
def download_file(repo_id, filenames):
	print(filenames)
	repo_name = repo_id.replace("/", "---")
	out_path = f"./download/{repo_name}"

	for filename in filenames:
		print(filename)
		hh.hf_hub_download(
			repo_id=repo_id,
			filename=filename,
			local_dir=out_path,
			local_dir_use_symlinks=False,
			force_download=True,
		)
	return out_path

# List the files in the model repository
data, filenames = list_repo_files_info(repo_id)
filenames

# Start downloading the model
out_path = download_file(repo_id, filenames)

In [None]:
!wget https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 huggingface_hub sentencepiece

In [None]:
!mv /content/download/meta-llama---Llama-2-7b /content/Llama-2-7b
!mkdir /content/Llama-2-7b/7B
!cp /content/Llama-2-7b/params.json /content/Llama-2-7b/7B/params.json

In [None]:
ls /content/Llama-2-7b/7B/

In [None]:
from transformers.utils.hub import move_cache

In [None]:
# Convert the checkpoint from the original Llama format to the Hugging Face (hf) format
!python3 convert_llama_weights_to_hf.py \
    --input_dir ./Llama-2-7b  --model_size 7B --output_dir ./Llama-2-7b-hf

In [None]:
# Work around the Colab character-encoding error
import locale
def getpreferredencoding(do_setlocale = True):
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

In [None]:
!pip3 install -q gekko pandas
!git clone https://github.com/PanQiWei/AutoGPTQ

In [None]:
%cd AutoGPTQ
!pwd


In [None]:
!pip3 install .

In [None]:
%cd /content/
!pwd

/content
/content


In [None]:
!wget https://gist.githubusercontent.com/TheBloke/b47c50a70dd4fe653f64a12928286682/raw/ebcee019d90a178ee2e6a8107fdd7602c8f1192a/quant_autogptq.py

In [None]:
!ls


AutoGPTQ			download    Llama-2-7b-hf      sample_data
convert_llama_weights_to_hf.py	Llama-2-7b  quant_autogptq.py


In [None]:
!python3 quant_autogptq.py ./Llama-2-7b-hf ./llama-2-7b-hf-gptq \
wikitext --bits 4 --group_size 128 --desc_act 0 --damp 0.1 \
--dtype float16 --seqlen 4096 --num_samples 128 --use_fast

In [None]:
!ls

AutoGPTQ			download    Llama-2-7b-hf	quant_autogptq.py
convert_llama_weights_to_hf.py	Llama-2-7b  llama-2-7b-hf-gptq	sample_data


In [None]:
!du -sh ./llama-2-7b-hf-gptq

3.7G	./llama-2-7b-hf-gptq
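
The `--bits 4 --group_size 128` flags above ask for 4-bit weights with quantization parameters shared by groups of 128 values. The sketch below illustrates only that storage idea, using plain round-to-nearest quantization on a list of floats. It is not the GPTQ algorithm itself (which additionally compensates for quantization error using calibration data), and the function names `quantize_groupwise` and `dequantize_groupwise` are hypothetical.

```python
import random


def quantize_groupwise(weights, bits=4, group_size=128):
    """Round-to-nearest quantization with one scale and offset per group.

    Simplified illustration only; GPTQ uses a more elaborate procedure.
    """
    levels = 2 ** bits - 1  # 15 distinct steps for 4-bit codes (0..15)
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Fall back to 1.0 when all values in the group are identical.
        scale = (hi - lo) / levels or 1.0
        codes = [round((w - lo) / scale) for w in group]
        groups.append((codes, scale, lo))
    return groups


def dequantize_groupwise(groups):
    """Map the integer codes back to approximate float values."""
    return [code * scale + lo for codes, scale, lo in groups for code in codes]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(512)]  # made-up weights
groups = quantize_groupwise(weights)
restored = dequantize_groupwise(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(groups)} groups, max reconstruction error {max_error:.6f}")
```

Smaller groups track the local value range more closely and reduce the error, at the cost of storing more scales and offsets.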


### Step 1: Load the model

Point `model_id` to the folder that contains the converted Hugging Face model weights.

In [None]:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id="./Llama-2-7b-hf"

tokenizer = LlamaTokenizer.from_pretrained(model_id)

model =LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)
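
To see why `load_in_8bit=True` matters on a 24 GB card, a rough estimate of the weight memory helps. The sketch below uses the parameter counts that the notebook reports in Step 4 and assumes one byte per parameter for int8 and two for fp16. It deliberately ignores layers that stay in fp16, activations, gradients, and optimizer state, so treat the numbers as a lower bound rather than a measurement.

```python
# Parameter counts as printed by print_trainable_parameters() in Step 4.
ALL_PARAMS_WITH_LORA = 6_742_609_920
LORA_PARAMS = 4_194_304
BASE_PARAMS = ALL_PARAMS_WITH_LORA - LORA_PARAMS


def weight_gigabytes(num_params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


fp16_gb = weight_gigabytes(BASE_PARAMS, 2)
int8_gb = weight_gigabytes(BASE_PARAMS, 1)
print(f"fp16 weights: {fp16_gb:.2f} GB")  # fp16 weights: 13.48 GB
print(f"int8 weights: {int8_gb:.2f} GB")  # int8 weights: 6.74 GB
```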

### Step 2: Load the preprocessed dataset

We load and preprocess the samsum dataset, which consists of curated pairs of dialogs and their summaries:

In [None]:
!pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-recipes


In [None]:
from llama_recipes.utils.dataset_utils import get_preprocessed_dataset
from llama_recipes.configs.datasets import samsum_dataset

train_dataset = get_preprocessed_dataset(tokenizer, samsum_dataset, 'train')
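
For intuition about what a preprocessed sample might look like, the sketch below turns one dialog/summary pair into a single training string. The template is an assumption inferred from the evaluation prompt used in Step 3, not a copy of what `get_preprocessed_dataset` does internally. The sample itself is made up, and `build_training_text` is a hypothetical helper.

```python
# Assumed template, mirroring the "Summarize this dialog: ... Summary:" prompt.
PROMPT_TEMPLATE = "Summarize this dialog:\n{dialog}\n---\nSummary:\n"


def build_training_text(sample, eos_token="</s>"):
    """Concatenate prompt, reference summary, and end-of-sequence token."""
    prompt = PROMPT_TEMPLATE.format(dialog=sample["dialogue"])
    return prompt + sample["summary"] + eos_token


# Made-up sample using the samsum field names "dialogue" and "summary".
sample = {
    "dialogue": "A: Are you free tonight?\nB: Yes, let's meet at 7.",
    "summary": "A and B will meet at 7 tonight.",
}
text = build_training_text(sample)
print(text)
```

Appending the end-of-sequence token teaches the model to stop after the summary instead of continuing to generate text.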

### Step 3: Check base model

Run the base model on an example input:

In [None]:
eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))


Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a pu

We can see that the base model only repeats the conversation.
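
Because a causal language model returns the prompt followed by its continuation, the printed output always starts with the full dialog. A small helper that drops the prompt tokens makes the comparison between base and fine-tuned model easier to read. The sketch below works on plain lists of token ids so it runs without a model; `strip_prompt` is a hypothetical helper, and the token ids are made up.

```python
def strip_prompt(generated_ids, prompt_ids):
    """Return only the tokens that were generated after the prompt."""
    prompt_len = len(prompt_ids)
    if list(generated_ids[:prompt_len]) != list(prompt_ids):
        raise ValueError("generated sequence does not start with the prompt")
    return list(generated_ids[prompt_len:])


prompt_ids = [1, 306, 4966]               # made-up prompt token ids
generated_ids = [1, 306, 4966, 393, 2]    # prompt plus two new tokens
new_tokens = strip_prompt(generated_ids, prompt_ids)
print(new_tokens)  # [393, 2]

# With the notebook's tensors, the equivalent slice would be:
#   output = model.generate(**model_input, max_new_tokens=100)[0]
#   new_tokens = output[model_input["input_ids"].shape[1]:]
```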

### Step 4: Prepare model for PEFT

Let's prepare the model for Parameter-Efficient Fine-Tuning (PEFT):

In [None]:
model.train()

def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_int8_training,
    )

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules = ["q_proj", "v_proj"]
    )

    # prepare int-8 model for training
    model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model, peft_config

# create peft config
model, lora_config = create_peft_config(model)



trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
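
The trainable parameter count can be reproduced by hand. Each LoRA adapter adds two low-rank matrices, `A` (rank × input features) and `B` (output features × rank), to every targeted projection. The sketch below assumes the Llama 2 7B dimensions (hidden size 4096, 32 decoder layers), which are not stated in this notebook, and that `q_proj` and `v_proj` are both square 4096 × 4096 projections.

```python
HIDDEN_SIZE = 4096   # assumed Llama 2 7B hidden dimension
NUM_LAYERS = 32      # assumed number of decoder layers
RANK = 8             # r in the LoraConfig above
TARGET_MODULES = ["q_proj", "v_proj"]
ALL_PARAMS = 6_742_609_920  # "all params" from the output above


def lora_params_per_module(in_features, out_features, rank):
    """Parameters in A (rank x in_features) plus B (out_features x rank)."""
    return rank * in_features + out_features * rank


per_module = lora_params_per_module(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
trainable = NUM_LAYERS * len(TARGET_MODULES) * per_module
print(trainable)                                # 4194304
print(f"{100 * trainable / ALL_PARAMS:.4f}%")   # 0.0622%
```

Doubling `r` or adding further target modules scales the trainable count linearly, which is a quick way to budget adapter size before training.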




### Step 5: Define an optional profiler

In [None]:
from transformers import TrainerCallback
from contextlib import nullcontext
enable_profiler = False
output_dir = "tmp/llama-output"

config = {
    'lora_config': lora_config,
    'learning_rate': 1e-4,
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 2,
    'per_device_train_batch_size': 2,
    'gradient_checkpointing': False,
}

# Set up profiler
if enable_profiler:
    wait, warmup, active, repeat = 1, 1, 2, 1
    total_steps = (wait + warmup + active) * (1 + repeat)
    schedule =  torch.profiler.schedule(wait=wait, warmup=warmup, active=active, repeat=repeat)
    profiler = torch.profiler.profile(
        schedule=schedule,
        on_trace_ready=torch.profiler.tensorboard_trace_handler(f"{output_dir}/logs/tensorboard"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True)

    class ProfilerCallback(TrainerCallback):
        def __init__(self, profiler):
            self.profiler = profiler

        def on_step_end(self, *args, **kwargs):
            self.profiler.step()

    profiler_callback = ProfilerCallback(profiler)
else:
    profiler = nullcontext()
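
The `wait`, `warmup`, `active`, and `repeat` values determine which training steps are actually traced. The sketch below is a simplified pure-Python model of that schedule, based on the documented behaviour of `torch.profiler.schedule` (skip `wait` steps, warm up for `warmup` steps, record for `active` steps, and stop after `repeat` cycles). The function `profiler_phase` and its phase names are hypothetical and are not part of PyTorch.

```python
def profiler_phase(step, wait=1, warmup=1, active=2, repeat=1):
    """Return the assumed profiler phase for a zero-based training step."""
    cycle = wait + warmup + active
    if repeat > 0 and step >= cycle * repeat:
        return "idle"      # all requested cycles are finished
    position = step % cycle
    if position < wait:
        return "wait"      # profiler inactive
    if position < wait + warmup:
        return "warmup"    # tracing started, results discarded
    return "record"        # results kept for the trace


wait, warmup, active, repeat = 1, 1, 2, 1
total_steps = (wait + warmup + active) * (1 + repeat)
phases = [profiler_phase(s, wait, warmup, active, repeat) for s in range(total_steps)]
print(total_steps)  # 8
print(phases)
# ['wait', 'warmup', 'record', 'record', 'idle', 'idle', 'idle', 'idle']
```

Under these assumptions only two of the eight training steps end up in the trace; the remaining steps after the first cycle run without profiling.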

### Step 6: Fine tune the model

Here, we fine-tune the model for a single epoch, which takes a bit more than an hour on an A100.

In [None]:
from transformers import default_data_collator, Trainer, TrainingArguments



# Define training args
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    bf16=True,  # Use BF16 if available
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
    optim="adamw_torch_fused",
    max_steps=total_steps if enable_profiler else -1,
    **{k:v for k,v in config.items() if k != 'lora_config'}
)

with profiler:
    # Create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=default_data_collator,
        callbacks=[profiler_callback] if enable_profiler else [],
    )

    # Start training
    trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Step,Training Loss
10,1.9424
20,1.8251
30,1.7823
40,1.7504
50,1.7215
60,1.6919
70,1.6968
80,1.7011
90,1.6792
100,1.6898
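
Each row in the log above covers 10 optimizer steps, and each optimizer step consumes several samples. The sketch below works out the effective batch size from the `config` values in Step 5, assuming a single GPU. The dataset size of 1,000 samples is a made-up figure used purely for illustration, and the step formula is an approximation of how the Hugging Face `Trainer` counts update steps.

```python
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_devices = 1      # assumption: single-GPU run
logging_steps = 10

# Samples that contribute to one optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)


def optimizer_steps_per_epoch(num_samples):
    """Approximate number of optimizer updates in one pass over the data."""
    batches = math.ceil(num_samples / (per_device_train_batch_size * num_devices))
    return max(batches // gradient_accumulation_steps, 1)


print(effective_batch_size)                   # 4
print(effective_batch_size * logging_steps)   # 40 samples per logged row
print(optimizer_steps_per_epoch(1000))        # 250 (hypothetical dataset size)
```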


### Step 7: Save the model checkpoint

In [None]:
model.save_pretrained(output_dir)
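
Since `model` is a PEFT model at this point, `save_pretrained` writes only the LoRA adapter weights and configuration, not the full base model. To reuse the result in a fresh session, the base model has to be loaded again and the adapter attached on top. The sketch below shows one way to do that; `load_finetuned_model` is a hypothetical helper, and the default paths are assumed to match the ones used earlier in this notebook.

```python
def load_finetuned_model(base_model_dir="./Llama-2-7b-hf",
                         adapter_dir="tmp/llama-output"):
    """Load the 8-bit base model and attach the saved LoRA adapter."""
    # Imported lazily so the helper can be defined without a GPU environment.
    import torch
    from peft import PeftModel
    from transformers import LlamaForCausalLM, LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained(base_model_dir)
    base_model = LlamaForCausalLM.from_pretrained(
        base_model_dir,
        load_in_8bit=True,
        device_map="auto",
        torch_dtype=torch.float16,
    )
    # Wrap the base model with the adapter weights stored in adapter_dir.
    model = PeftModel.from_pretrained(base_model, adapter_dir)
    model.eval()
    return model, tokenizer


# Usage (requires a GPU, the converted base model, and the saved adapter):
#   model, tokenizer = load_finetuned_model()
```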

### Step 8: Try the fine-tuned model

Run the fine-tuned model on the same example again to see the learning progress:

In [None]:
model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))



Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
A wants to get a puppy for his son. He took him to the animal shelter last Monday. He showed him one that he really liked. He wants to name it after his dead hamster - Lemmy.
