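- Lab material for the Chung-Ang University AI/SW capability-building program for military personnel: Advanced Natural Language Processing, 2024-08-05 (Mon).
- Written by teaching assistant Kim Young-hwa (김영화), a master's student at IIPL (Intelligent Information Processing Lab).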

---
## 06
- Floating Points
- Mixed Precision
- LoRA

[ref](https://jiogenes.github.io/%ED%81%B4%EB%9D%BC%EC%9A%B0%EB%93%9C/2024/01/10/runpod-2.html#h-llm-%ED%8C%8C%EC%9D%B8%ED%8A%9C%EB%8B%9D)

---
## Floating Points

In [10]:
binary_fp32 = "0 10000000 10010010000111111011011"

### Binary -> decimal conversion based on the lecture material

In [11]:
# Remove spaces
binary_string = binary_fp32.replace(" ", "")

# Split into sign, exponent, and mantissa (the exponent bias for fp32 is 127)
sign = int(binary_string[0])
exponent = int(binary_string[1:9], 2) - 127

# Compute the mantissa: implicit leading 1 plus the 23 fraction bits
mantissa = 1.0
for i, bit in enumerate(binary_string[9:], 1):
    if bit == '1':
        mantissa += 2**(-i)

# Apply the IEEE 754 formula
result = (-1)**sign * 2**exponent * mantissa

# Print the result
print(result)

3.1415927410125732


### Conversion using the struct library

In [14]:
import struct

# Remove spaces
binary_string = binary_fp32.replace(" ", "")

# Convert the binary string to an integer
integer = int(binary_string, 2)

# Pack the integer into 4 big-endian bytes
packed = struct.pack('!I', integer)

# Reinterpret the bytes as fp32 and convert to decimal
result = struct.unpack('!f', packed)[0]

print(result)

3.1415927410125732
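
Going the other way is a handy check on the two conversions above. The sketch below packs a Python float into fp32 bytes with `struct` and splits the bits into the sign, exponent, and mantissa fields. `float_to_fp32_bits` is a hypothetical helper name, not part of the lecture material.

```python
import struct

def float_to_fp32_bits(value):
    """Return the fp32 bit pattern of value as 'sign exponent mantissa'."""
    # Pack as big-endian fp32, then reinterpret the 4 bytes as an unsigned int
    (integer,) = struct.unpack('!I', struct.pack('!f', value))
    bits = f"{integer:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

bits_pi = float_to_fp32_bits(3.1415927410125732)
print(bits_pi)  # 0 10000000 10010010000111111011011
```

The printed string matches `binary_fp32`, so the round trip between the decimal value and the bit pattern is consistent.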


## Mixed Precision example
- fp16=True

```
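# When loading the model: load the weights directly in half precision
model = AutoModel.from_pretrained("model_name", torch_dtype=torch.float16)

# When using Trainer: enable fp16 mixed precision
# (default_args, ds, and print_summary are assumed to be defined elsewhere)
# Note: training fp16-loaded weights with fp16=True typically fails when the
# gradient scaler unscales fp16 gradients; for training, load the model in fp32.
training_args = TrainingArguments(fp16=True, **default_args)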

trainer = Trainer(model=model, args=training_args, train_dataset=ds)
result = trainer.train()
print_summary(result)
```
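
To illustrate why fp16 training usually keeps some values in fp32 and scales the loss, here is a minimal NumPy sketch of two fp16 failure modes: rounding of small increments and underflow of small gradients. The gradient value and the loss scale of 1024 are assumptions chosen for illustration, and the code does not reproduce what `Trainer` does internally.

```python
import numpy as np

# fp16 has an 11-bit significand, so not every integer above 2048 is representable
big = np.float16(2048)
absorbed = (big + np.float16(1)) == big
print(absorbed)  # True: the +1 is rounded away

# A small gradient underflows to zero when cast to fp16
grad = 1e-8
print(np.float16(grad))  # 0.0

# Loss scaling: scale before the fp16 cast, then unscale in fp32
loss_scale = 1024.0  # hypothetical scale factor
scaled_fp16 = np.float16(grad * loss_scale)
recovered = np.float32(scaled_fp16) / np.float32(loss_scale)
print(recovered)  # close to 1e-8 again
```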

## LoRA

### Install libraries

In [1]:
!pip install trl transformers accelerate peft datasets bitsandbytes

Collecting trl
  Downloading trl-0.9.6-py3-none-any.whl.metadata (12 kB)
Collecting peft
  Downloading peft-0.12.0-py3-none-any.whl.metadata (13 kB)
Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl.metadata (19 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.43.3-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting tyro>=0.5.11 (from trl)
  Downloading tyro-0.8.5-py3-none-any.whl.metadata (8.2 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting requests (from transformers)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-

### Import libraries

In [2]:
from datasets import load_dataset

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# from huggingface_hub import notebook_login

### Model and dataset configuration
- Request access to "meta-llama/Llama-2-7b-hf"
  - https://huggingface.co/meta-llama/Llama-2-7b
  - https://huggingface.co/TinyLlama/TinyLlama_v1.1

In [3]:
model_name = 'TinyLlama/TinyLlama_v1.1'  # or 'meta-llama/Llama-2-7b-hf' (requires access)
data_name = 'heegyu/open-korean-instructions'
fine_tuning_model_name = f'{model_name}-finetuned-open-korean-instructions'
device_map = 'auto'

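### LoRA hyperparameter configuration
- alpha: 16, scaling factor; the low-rank update is scaled by alpha / r
- r: 64, rank of the low-rank update matrices (each weight update is factored through a 64-dimensional bottleneck)
- bnb_4bit_use_double_quant: enables double (nested) quantization, which also quantizes the quantization constants
- bnb_4bit_quant_type: 4-bit data type used to store the weights
- bnb_4bit_compute_dtype: data type used for computation after dequantization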
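
The roles of `r` and `alpha` can be sketched in NumPy. Assuming the standard LoRA formulation, a frozen weight `W` gets a trainable low-rank update `B @ A` scaled by `alpha / r`. The layer shape `2048 x 2048` is an illustrative assumption, not a value read from the model config.

```python
import numpy as np

d, k = 2048, 2048         # illustrative layer shape (assumption)
r, lora_alpha = 64, 16    # values used in this section
scaling = lora_alpha / r  # 0.25

rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, k))  # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init

x = rng.normal(size=(k,))
h = W @ x + scaling * (B @ (A @ x))      # LoRA forward pass

full_params = d * k
lora_params = r * (d + k)
print(full_params, lora_params, lora_params / full_params)
# 4194304 262144 0.0625
```

Because `B` starts at zero, the adapted layer initially produces the same output as the frozen layer, and only about 6% as many parameters are trained for a layer of this shape.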

In [4]:
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias='none',
    task_type='CAUSAL_LM',
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
)

In [5]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype='float16',
)

### Hugging Face setup
- Log in
  - https://huggingface.co/settings/tokens

In [6]:
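# Paste an access token from the settings page above when prompted;
# never hard-code a token in the notebook.
from huggingface_hub import notebook_login
notebook_login()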


### Load and inspect the dataset

In [28]:
dataset = load_dataset(data_name, split='train[:100]')
print(dataset[0]['text'])

<usr> 유언장이 있는 것이 좋다는 말을 들었습니다. 유언장이란 무엇입니까?
<bot> 유언장은 귀하가 사망한 후 귀하의 재산이 어떻게 분배되어야 하는지를 지정하는 법적 문서입니다. 또한 귀하가 가질 수 있는 자녀나 기타 부양가족을 누가 돌봐야 하는지 명시할 수 있습니다. 유언장에 적용되는 법률이 주마다 다르기 때문에 귀하의 유언장이 유효하고 최신인지 확인하는 것이 중요합니다.


In [29]:
dataset

Dataset({
    features: ['source', 'text'],
    num_rows: 100
})
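
The printed sample shows that each `text` entry marks turns with `<usr>` and `<bot>` tags. Below is a minimal sketch for splitting such a string into (speaker, utterance) pairs. `split_turns` and the sample string are hypothetical and assume only this tag layout.

```python
import re

def split_turns(text):
    """Split '<usr> ... <bot> ...' text into (speaker, utterance) pairs."""
    # The capture group keeps the tags in the result of re.split
    parts = re.split(r'(<usr>|<bot>)', text)
    tags, bodies = parts[1::2], parts[2::2]
    return [(tag.strip('<>'), body.strip()) for tag, body in zip(tags, bodies)]

sample = "<usr> What is a will?\n<bot> A will is a legal document."
turns = split_turns(sample)
print(turns)
# [('usr', 'What is a will?'), ('bot', 'A will is a legal document.')]
```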

### Load the model

In [13]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # load the weights in 4-bit
    use_cache=False,                 # the KV cache is not needed during training
    device_map=device_map,
)
base_model.config.pretraining_tp = 1
base_model.gradient_checkpointing_enable()
# Prepare the quantized model for training, then attach the LoRA adapters
base_model = prepare_model_for_kbit_training(base_model)
peft_model = get_peft_model(base_model, peft_config)


In [14]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'


### Training hyperparameter configuration

In [31]:
training_args = TrainingArguments(
    output_dir=fine_tuning_model_name,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim='paged_adamw_32bit',
    logging_steps=5,
    save_strategy='epoch',
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    group_by_length=False,
    lr_scheduler_type='cosine',
    # disable_tqdm=True,
    seed=42
)
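
A rough sketch of what these settings imply: the effective batch size per device is the batch size times the accumulation steps, and the learning rate warms up linearly for the first 3% of steps before following a cosine decay. `lr_at` is a hypothetical helper that approximates this schedule, and `total_steps = 100` is an assumed value chosen for illustration.

```python
import math

def lr_at(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# per_device_train_batch_size * gradient_accumulation_steps
effective_batch = 4 * 2

total_steps = 100  # hypothetical number of optimizer steps
schedule = [lr_at(step, total_steps) for step in range(total_steps + 1)]
print(effective_batch, schedule[0], schedule[3], schedule[-1])
```

With these assumed values the peak learning rate of `2e-4` is reached at step 3 and the rate decays to zero at the final step.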

### Training

In [32]:
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=dataset,
    dataset_text_field='text',
    max_seq_length=min(tokenizer.model_max_length, 2048),
    tokenizer=tokenizer,
    packing=True,
    args=training_args
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
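
`packing=True` asks the trainer to combine short examples into fixed-length sequences instead of padding each one. Roughly, the tokenized examples are concatenated with a separator token and cut into blocks of `max_seq_length`. The sketch below is a simplified illustration of that idea, not TRL's implementation. `pack_sequences` and the toy token ids are hypothetical.

```python
def pack_sequences(tokenized_examples, seq_length, eos_id):
    """Concatenate examples (each followed by EOS) and cut into full blocks."""
    buffer = []
    for ids in tokenized_examples:
        buffer.extend(ids + [eos_id])
    n_blocks = len(buffer) // seq_length  # an incomplete trailing block is dropped
    return [buffer[i * seq_length:(i + 1) * seq_length] for i in range(n_blocks)]

examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = pack_sequences(examples, seq_length=4, eos_id=0)
print(packed)
# [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Packing is why the number of training sequences, and therefore the number of optimizer steps, can be much smaller than the number of dataset rows.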



In [33]:
trainer.train()





TrainOutput(global_step=2, training_loss=1.4305939674377441, metrics={'train_runtime': 81.3471, 'train_samples_per_second': 0.172, 'train_steps_per_second': 0.025, 'total_flos': 186650437091328.0, 'train_loss': 1.4305939674377441, 'epoch': 1.0})

### Save the model

In [34]:
trainer.save_model()

In [35]:
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    training_args.output_dir,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map
)

lora_merged_model = trained_model.merge_and_unload()
lora_merged_model.save_pretrained('merged', safe_serialization=True)
tokenizer.save_pretrained('merged')

('merged/tokenizer_config.json',
 'merged/special_tokens_map.json',
 'merged/tokenizer.model',
 'merged/added_tokens.json',
 'merged/tokenizer.json')
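
Conceptually, merging folds the low-rank update into the original weight so that no adapter is needed at inference time. Here is a minimal NumPy sketch, assuming the standard LoRA formulation `W + (alpha / r) * B @ A`. The matrix sizes are small hypothetical values and the code does not call PEFT.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, lora_alpha = 32, 32, 4, 16  # small hypothetical sizes
scaling = lora_alpha / r

W = rng.normal(size=(d, k))  # frozen base weight
A = rng.normal(size=(r, k))  # trained adapter matrices
B = rng.normal(size=(d, r))
x = rng.normal(size=(k,))

# Adapter path: base output plus the scaled low-rank correction
adapter_out = W @ x + scaling * (B @ (A @ x))

# Merged path: fold the correction into a single weight matrix
W_merged = W + scaling * (B @ A)
merged_out = W_merged @ x

print(np.allclose(adapter_out, merged_out))  # True
```

The merged matrix has the same shape as `W`, so a merged model can be saved and loaded like an ordinary checkpoint.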

### Model inference

In [None]:
prompt = '<usr> 누가 "공산당 선언" 이라는 책을 썼습니까?\n<bot>'
input_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids.cuda()

print(f"-------------------------\n")
print(f"Prompt:\n{prompt}\n")
print(f"-------------------------\n")

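print(f"Base Model Response :\n")
# Note: get_peft_model injects the LoRA layers into base_model in place, so this output
# may already reflect the trained adapter rather than the untouched base weights.
output_base = base_model.generate(input_ids=input_ids, max_new_tokens=500, do_sample=True, top_p=0.9, temperature=0.5)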
print(f"{tokenizer.batch_decode(output_base.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
print(f"-------------------------\n")

print(f"Trained Model Response :\n")
# Store the generated ids in a new variable instead of overwriting trained_model
output_trained = trained_model.generate(input_ids=input_ids, max_new_tokens=500, do_sample=True, top_p=0.9, temperature=0.5)
print(f"{tokenizer.batch_decode(output_trained.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
print(f"-------------------------\n")

print(f"LORA Model Response :\n")
output_trained_lora = lora_merged_model.generate(input_ids=input_ids, max_new_tokens=500, do_sample=True, top_p=0.9,temperature=0.5)
print(f"{tokenizer.batch_decode(output_trained_lora.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
print(f"-------------------------\n")
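
The `generate` calls above sample with `temperature=0.5` and `top_p=0.9`. As a rough illustration of what these two settings do to a next-token distribution, the sketch below applies temperature scaling and then nucleus (top-p) filtering in NumPy. It is a simplified stand-in, not the transformers implementation. `top_p_filter` and the toy logits are hypothetical.

```python
import numpy as np

def top_p_filter(logits, top_p=0.9, temperature=0.5):
    """Return the renormalized distribution over the top-p nucleus."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)             # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative probability reaches top_p
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

logits = [2.0, 1.0, 0.0, -1.0]
filtered = top_p_filter(logits)
print(filtered)  # only the two most likely tokens keep nonzero probability

rng = np.random.default_rng(0)
next_token = rng.choice(len(logits), p=filtered)
```

A temperature below 1 sharpens the distribution before filtering, so fewer tokens are needed to reach the `top_p` mass.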