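## Loading a model without instruction tuning
* This kernel shows how to load a model that has not been instruction-tuned and use it to translate between Classical Chinese and vernacular Chinese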

### If running on Colab, install the packages first

In [None]:
# ! pip install transformers datasets torch bitsandbytes peft accelerate nvidia-ml-py3 wandb trl flash-attn
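
Before running the install, it can help to see which packages are already present. The following is a minimal sketch, not part of the original notebook: `missing_packages` is a hypothetical helper, and it takes import names, which may differ from pip names (for example, `flash-attn` is imported as `flash_attn`).

```python
import importlib.util
import sys


def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Colab preloads the google.colab module, so checking sys.modules is a common way to detect it.
running_in_colab = "google.colab" in sys.modules

# Import names for some of the packages in the pip command above.
required = ["transformers", "datasets", "torch", "bitsandbytes", "peft", "accelerate", "wandb", "trl"]
print("In Colab:", running_in_colab)
print("Missing:", missing_packages(required))
```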

In [1]:
from transformers import pipeline

In [None]:
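pipe = pipeline("text-generation", model="zake7749/gemma-2-2b-it-chinese-kyara-dpo", device_map="auto")
# Prompt: "Translate the following into Classical Chinese: The weather is nice today, and the seaside scenery is beautiful."
pipe("將下面翻譯成文言文：今天天氣很好，海邊的風景也很美。")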

In [None]:
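pipe2 = pipeline("text-generation", model="huchiahsi/merged_model", device_map="auto")
# Prompt: "The weather is nice today, and the seaside scenery is beautiful. The sentence above, translated into Classical Chinese, is:"
pipe2("今天天氣很好，海邊的風景也很美，上面這句話翻譯成文言文是：")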
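
The two cells above phrase the request differently: the first gives the model an instruction, while the second ends with an unfinished sentence for the model to continue, a form that models without instruction tuning generally tend to handle better. Below is a minimal sketch of the two prompt shapes. The helper names are hypothetical and not part of the notebook.

```python
def instruction_prompt(text):
    # Instruction style: state the task first, then give the text.
    return f"將下面翻譯成文言文：{text}"


def completion_prompt(text):
    # Completion style: give the text, then leave a sentence for the model to finish.
    # A trailing full stop is dropped so the text joins the rest of the sentence.
    return f"{text.rstrip('。')}，上面這句話翻譯成文言文是："


sentence = "今天天氣很好，海邊的風景也很美。"
print(instruction_prompt(sentence))
print(completion_prompt(sentence))
```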

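## Loading an Adapter without quantization
* Import the packages: the regular transformers package plus the adapter package (peft)
* Specify the names of the models to load, both the base model and the Adapter, which are hosted on Hugging Face
* Load the base model, here a Traditional Chinese model trained from gemma-2-it, which is better suited to Traditional Chinese
* Load the previously trained Adapter, here one for translating Classical Chinese into vernacular Chinese, which has been uploaded to the Hugging Face Hub
* Specify the model the `tokenizer` comes from
* Finally, move the model to the `device`, here the GPU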

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model = "zake7749/gemma-2-2b-it-chinese-kyara-dpo"
peft_model = "huchiahsi/peft-model-repo"

# Load the base model, then wrap it with the trained Adapter weights
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, peft_model)
# The tokenizer comes from the base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = model.to("cuda")

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "mistralai/Mistral-7B-v0.1"
adapter_model = "dfurman/Mistral-7B-Instruct-v0.2"

model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

model.eval()

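## Loading an Adapter with quantization
* Import the packages, this time including `BitsAndBytesConfig`, the configuration class for quantization
* Set the quantization parameters, which must be exactly the same as those used in training
    * `load_in_4bit=True` quantizes the model to 4 bits when it is loaded
    * `bnb_4bit_use_double_quant=True` uses a nested quantization scheme, which additionally quantizes the quantization constants of the already-quantized weights
    * `bnb_4bit_quant_type="nf4"` uses a special 4-bit data type for weights initialized from a normal distribution
    * `bnb_4bit_compute_dtype=torch.bfloat16` uses bfloat16 to speed up computation
* When loading the base model, this configuration must be passed in through the `quantization_config` argument
* The rest is the same as before: load the Adapter and specify the `tokenizer` and the `device`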
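
To get a feel for why 4-bit loading matters, the weight memory can be estimated from the parameter count. This is a rough back-of-the-envelope sketch: the 7 billion figure is a nominal assumption for a 7B-class model, and the estimate ignores quantization constants, activations, and the KV cache.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


nominal_params = 7_000_000_000  # assumption: nominal size of a 7B-class model

for label, bits in [("16-bit (bfloat16)", 16), ("4-bit (nf4)", 4)]:
    print(f"{label}: about {weight_memory_gb(nominal_params, bits):.1f} GB")
```

Under these assumptions the 4-bit weights take a quarter of the 16-bit footprint, which is what lets larger models fit on a single consumer or Colab GPU.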

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = "zake7749/gemma-2-2b-it-chinese-kyara-dpo"
peft_model = "huchiahsi/peft-model-repo" 
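model = AutoModelForCausalLM.from_pretrained(base_model,
                                             quantization_config=bnb_config,  # pass the BitsAndBytesConfig here, not via `config`
                                             device_map="auto")
model = PeftModel.from_pretrained(model, peft_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
# device_map="auto" has already placed the quantized model on the GPU;
# some transformers versions raise an error when .to() is called on a 4-bit model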

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = "mistralai/Mistral-7B-v0.1"
adapter_model = "dfurman/Mistral-7B-Instruct-v0.2"

model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

model.eval()

### Screenshot
![alt text](image.png)

In [None]:
import torch

# Prompt: "Taipei is beautiful. How do you say that in Classical Chinese?"
# Tokenize and move input_ids and attention_mask to the GPU together
inputs = tokenizer("台北市很漂亮，文言文怎麼說？", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=222)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
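
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded string above repeats the question. One way to keep only the answer is to slice off the prompt tokens before decoding. The sketch below shows the slicing on plain Python lists with made-up token ids. `continuation_ids` is a hypothetical helper, not part of the notebook.

```python
def continuation_ids(output_ids, prompt_len):
    """Return only the token ids generated after the prompt."""
    return output_ids[prompt_len:]


prompt_ids = [2, 101, 102, 103]          # made-up ids standing in for the tokenized prompt
output_ids = prompt_ids + [201, 202, 1]  # generate() output: prompt followed by new tokens

print(continuation_ids(output_ids, len(prompt_ids)))  # [201, 202, 1]
```

With the tensors from the cell above, the same idea would be `outputs[0][inputs["input_ids"].shape[1]:]`, passed to `tokenizer.decode`.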