## Improving performance on a specific task with SFT (Supervised Fine-Tuning)

- Using an LLM is easy, but getting the performance you want out of it is not. When that happens, fine-tuning is the usual next step.
- Fine-tuning every parameter of an LLM requires too many resources, so many techniques have emerged that aim for the best performance with minimal resources. [LoRA](https://arxiv.org/abs/2106.09685) is a representative example.
- HuggingFace's [PEFT](https://github.com/huggingface/peft) library makes these techniques easy to use, so fine-tuning becomes quick and simple.
- This example follows a [blog post](https://www.philschmid.de/bloom-sagemaker-peft) that fine-tunes HF's BLOOM model with PEFT.
- Reference code: https://github.com/huggingface/notebooks/tree/main/sagemaker/24_train_bloom_peft_lora
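
To make the resource argument concrete, here is a rough sketch of why LoRA is cheap. LoRA freezes a weight matrix `W` of shape `d_out x d_in` and trains only two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`). The layer size and rank below are illustrative assumptions, not values taken from this notebook.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # updating W directly
    lora = rank * (d_out + d_in)   # updating only B (d_out x r) and A (r x d_in)
    return full, lora


# Assumed sizes for illustration: a 4096 x 4096 projection with rank 8
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
# full: 16,777,216  lora: 65,536  ratio: 0.3906%
```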


### Tested version

Tested on `Python 3.9.15`

```
sagemaker: 2.146.0
transformers: 4.29.2
torch: 1.13.1
accelerate: 0.19.0
datasets: 2.12.0
py7zr: 0.20.5
peft: 0.3.0
bitsandbytes: 0.38.1
```


In [None]:
!pip install -q transformers datasets py7zr

In [None]:
import transformers
import sagemaker
print(transformers.__version__)
print(sagemaker.__version__)

In [None]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
sm_client = sagemaker_session.sagemaker_client
sm_runtime_client = sagemaker_session.sagemaker_runtime_client

### Downloading the dataset

- We use the [samsum dataset](https://huggingface.co/datasets/samsum), a dataset for summarizing chat dialogues between people.
- After downloading the data, we tokenize it and upload it to S3.

In [None]:
%store -r

In [None]:
from datasets import load_dataset
dataset = load_dataset("samsum", split="train")

In [None]:
print(f"Training dataset size: {len(dataset)}")

In [None]:
model_download_path

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
model_location = model_download_path
tokenizer = AutoTokenizer.from_pretrained(model_location, padding_side="left")
tokenizer.model_max_length = 2048

In [None]:
tokenizer.eos_token

In [None]:
from random import randint
from itertools import chain
from functools import partial

In [None]:
prompt_template = f"Summarize the chat dialogue:\n{{dialogue}}\n---\nSummary:\n{{summary}}{{eos_token}}"

In [None]:
def template_dataset(sample):
    sample["text"] = prompt_template.format(dialogue=sample["dialogue"],
                                            summary=sample["summary"],
                                            eos_token=tokenizer.eos_token)
    return sample

In [None]:
dataset = dataset.map(template_dataset, remove_columns=list(dataset.features))

In [None]:
# randint is inclusive on both ends, so the upper bound must be len(dataset) - 1
print(dataset[randint(0, len(dataset) - 1)]["text"])

In [None]:
remainder = {"input_ids": [], "attention_mask": []}

In [None]:
def chunk(sample, chunk_length=2048):
    # define global remainder variable to save remainder from batches to use in next batch
    global remainder
    # Concatenate all texts and add remainder from previous batch
    concatenated_examples = {k: list(chain(*sample[k])) for k in sample.keys()}
    concatenated_examples = {k: remainder[k] + concatenated_examples[k] for k in concatenated_examples.keys()}
    # get total number of tokens for batch
    batch_total_length = len(concatenated_examples[list(sample.keys())[0]])

    # largest multiple of chunk_length that fits in the batch
    # (0 when the batch is shorter than one chunk, so everything carries over)
    batch_chunk_length = (batch_total_length // chunk_length) * chunk_length

    # Split by chunks of max_len.
    result = {
        k: [t[i : i + chunk_length] for i in range(0, batch_chunk_length, chunk_length)]
        for k, t in concatenated_examples.items()
    }
    # add remainder to global variable for next batch
    remainder = {k: concatenated_examples[k][batch_chunk_length:] for k in concatenated_examples.keys()}
    # prepare labels
    result["labels"] = result["input_ids"].copy()
    return result

In [None]:
lm_dataset = dataset.map(
    lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(dataset.features)
).map(
    partial(chunk, chunk_length=2048),
    batched=True,
)

In [None]:
print(f"Total number of samples: {len(lm_dataset)}")
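
The packing logic above can be checked on toy data. The sketch below is a simplified stand-in for `chunk`: it handles a single list of token ids and takes the remainder as an explicit argument instead of a global variable. The function name `pack` and the tiny `chunk_length` are assumptions made for illustration.

```python
from itertools import chain


def pack(batch, remainder, chunk_length):
    """Concatenate a batch of token lists, cut fixed-size chunks, return the leftover."""
    tokens = remainder + list(chain(*batch))
    usable = (len(tokens) // chunk_length) * chunk_length
    chunks = [tokens[i:i + chunk_length] for i in range(0, usable, chunk_length)]
    return chunks, tokens[usable:]


# Two consecutive batches; the leftover of the first is prepended to the second
chunks1, rest1 = pack([[1, 2, 3], [4, 5, 6]], remainder=[], chunk_length=4)
chunks2, rest2 = pack([[7], [8, 9, 10]], remainder=rest1, chunk_length=4)
print(chunks1, rest1)  # [[1, 2, 3, 4]] [5, 6]
print(chunks2, rest2)  # [[5, 6, 7, 8]] [9, 10]
```

No token is dropped between batches; only the final remainder after the last batch is left unused.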

In [None]:
training_input_path = f"s3://{sagemaker_session.default_bucket()}/llm/databricks/dolly-v2-7b/dataset/samsum"

In [None]:
lm_dataset.save_to_disk(training_input_path)
print(f"Data uploaded : {training_input_path}")

In [None]:
lm_dataset.save_to_disk("./samsum-data")

### Writing the fine-tuning code and running SageMaker managed training

- Now that the training data is uploaded, we need to write the training code.
- The packages required to run the example training code are listed below. See the `sft-src` directory for the example code.
- For exact versions, see the Tested version section above.
```
- transformers (4.27 or later; earlier versions do not support int8 training)
- peft
- datasets
- bitsandbytes
- accelerate
```
- At the time of writing, SageMaker's default HuggingFace DLC supports transformers only up to 4.26, so when a newer version is needed you can pin it in requirements.txt to have it installed.
- The code in the `sft-src` directory is slightly modified from the original blog code. For example, the original pulls the pretrained model from the HuggingFace Model Hub, which is slower and less reliable, so it was changed to train from a copy of the model uploaded to S3 in advance.
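
The contents of `sft-src` are not reproduced in this notebook. As a rough sketch of what an int8 + LoRA setup typically looks like with the library versions listed above (`peft` 0.3.0, `transformers` 4.29), something along these lines would work. The function name, the LoRA values and the `target_modules` entry are assumptions, not the actual training script.

```python
def build_peft_model(model_path: str):
    """Load a causal LM in int8 and wrap it with LoRA adapters (illustrative sketch)."""
    # Imports live inside the function so the module can be imported
    # on a machine without the GPU training stack installed.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_int8_training

    model = AutoModelForCausalLM.from_pretrained(
        model_path,          # local directory, e.g. the pretrained-model channel
        load_in_8bit=True,   # needs bitsandbytes
        device_map="auto",   # needs accelerate
    )
    # Prepares the quantized model for training (API name as of peft 0.3.0)
    model = prepare_model_for_int8_training(model)

    lora_config = LoraConfig(
        r=8,                                 # assumed rank
        lora_alpha=32,                       # assumed scaling factor
        target_modules=["query_key_value"],  # assumed name of the attention projection
        lora_dropout=0.05,                   # assumed dropout
        bias="none",
        task_type=TaskType.CAUSAL_LM,
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
    return model
```

Calling `build_peft_model("/opt/ml/input/data/pretrained-model")` inside the training container would return a model in which only the LoRA adapter weights are trainable.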

### Local debugging

- Before launching a SageMaker training job, you can debug locally to check that the training code is set up correctly.
- See `sft-src/local_debug.sh` for the local debugging script.


In [None]:
import time
from sagemaker.utils import name_from_base
from sagemaker.huggingface import HuggingFace
job_name = name_from_base("dolly-peft-sft-train")
print(job_name)

In [None]:
instance_type = 'ml.p3.2xlarge'
# instance_type = 'ml.g5.4xlarge'
# instance_type = 'ml.g4dn.4xlarge'

hyperparameters = {
  'pretrain_model_path': '/opt/ml/input/data/pretrained-model',  # where SageMaker places the pretrained model downloaded from S3
  'dataset_path': '/opt/ml/input/data/training',                 # where SageMaker places the training dataset
  'epochs': 3,                                         # number of training epochs
  'per_device_train_batch_size': 1,                    # batch size for training
  'lr': 2e-4,                                          # learning rate used during training
}

huggingface_estimator = HuggingFace(
    entry_point          = 'run_peft_train.py',  # training script
    source_dir           = 'sft-src',            # directory with all files needed for training
    instance_type        = instance_type,        # instance type used for the training job
    instance_count       = 1,                    # number of instances used for training
    base_job_name        = job_name,             # base name of the training job; SageMaker appends its own timestamp
    role                 = role,                 # IAM role the training job uses to access AWS resources, e.g. S3
    volume_size          = 300,                  # size of the EBS volume in GB
    transformers_version = '4.26',               # transformers version of the training container
    pytorch_version      = '1.13',               # PyTorch version of the training container
    py_version           = 'py39',               # Python version of the training container
    hyperparameters      = hyperparameters
)
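
SageMaker passes each entry of `hyperparameters` to the entry point as a command-line argument (`--epochs 3`, `--lr 0.0002`, and so on). The argument handling inside `run_peft_train.py` is not shown in this notebook, so the following is only a minimal sketch of how a script could read them, assuming `argparse` and the same key names as above.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters defined for the estimator (illustrative sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pretrain_model_path", type=str,
                        default="/opt/ml/input/data/pretrained-model")
    parser.add_argument("--dataset_path", type=str,
                        default="/opt/ml/input/data/training")
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--per_device_train_batch_size", type=int, default=1)
    parser.add_argument("--lr", type=float, default=2e-4)
    # parse_known_args tolerates any extra arguments that may be passed in
    args, _ = parser.parse_known_args(argv)
    return args


# Simulate the command line that the training container would receive
args = parse_args(["--epochs", "3", "--lr", "2e-4", "--per_device_train_batch_size", "1"])
print(args.epochs, args.lr, args.dataset_path)
```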

### Starting training

- Now that the [HuggingFace estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html) is defined, training can start.
- When calling `fit()` below to start training, we pass the S3 URI of the pretrained model in addition to the S3 location of the training data.
- With this setup, the environment variable `SM_CHANNEL_PRETRAINED-MODEL` is set to `/opt/ml/input/data/pretrained-model`.
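
The relationship between channel names, environment variables and local paths can be sketched as follows. The helper `channel_env` is hypothetical and only mirrors the naming pattern described above; the S3 URIs are placeholders.

```python
def channel_env(channels):
    """Map fit() channel names to the env var name and container path described above."""
    return {
        f"SM_CHANNEL_{name.upper()}": f"/opt/ml/input/data/{name}"
        for name in channels
    }


channels = {
    "training": "s3://<bucket>/path/to/dataset",        # placeholder URI
    "pretrained-model": "s3://<bucket>/path/to/model",  # placeholder URI
}
for var, path in channel_env(channels).items():
    print(var, "->", path)
```

Because the channel name contains a hyphen, the resulting variable cannot be referenced as a plain shell variable; inside the training script, reading it through `os.environ` or passing the path as a hyperparameter (as done above) avoids that problem.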

In [None]:
pretrained_uri = model_artifact
data = {'training': training_input_path, 'pretrained-model': pretrained_uri}
print(data)

In [None]:
huggingface_estimator.fit(data, wait=False)


### Deployment and testing

- Training takes roughly 6 hours on `g5.4xlarge`.
- Once training completes, you can deploy and test the model as shown below.
- The training job runs remotely, so even if the kernel session is disconnected you can retrieve it with `attach()` as shown below.
- DJL is generally the better choice for deploying LLMs, but if the compressed size is well under 30GB and the inference code is included in the archive, deploying with [HuggingFaceModel](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html#hugging-face-model) is fine.

In [None]:
from sagemaker.estimator import Estimator
training_job_name = "dolly-peft-train-2023-04-24-10-16-54-73-2023-04-24-10-17-14-290"
estimator = Estimator.attach(training_job_name)

In [None]:
print(estimator.model_data)

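### Deploying the trained model

An example of the model.tar.gz structure needed for deployment after training is shown below.
- The inference code is included in the archive, which saves time. If you specify the inference script separately, the `decompress existing model -> add code and compress -> upload to S3` process is run again.
- Here the script goes inside the `code` directory and the model files sit in the top-level folder, but you can also collect the model files in a dedicated directory and change the inference code to load the model from there.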
```
- code
  - inference.py
  - requirements.txt
- config.json
- tokenizer.json
- tokenizer_config.json
- pytorch_model_xxx.bin
- special_tokens_map.json
```
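
Before deploying, it can be worth checking that an archive actually follows this layout. The helper below is a hypothetical sketch using only the standard library; the list of required entries is an assumption and should be adapted to the files your model really needs. The demo builds a toy archive in a temporary directory rather than touching a real model.

```python
import io
import os
import tarfile
import tempfile


def missing_entries(archive_path, required=("code/inference.py", "config.json")):
    """Return the required paths that are absent from a model.tar.gz."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Normalise "./config.json" style names to "config.json"
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}
    return [path for path in required if path not in names]


# Demo on a toy archive that lacks tokenizer.json
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for name in ("code/inference.py", "config.json"):
            tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    demo_missing = missing_entries(
        archive, required=("code/inference.py", "config.json", "tokenizer.json")
    )
print(demo_missing)  # ['tokenizer.json']
```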

In [None]:
from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=estimator.model_data,
   #model_data="s3://hf-sagemaker-inference/model.tar.gz",  # you can also pass the model path directly
   role=role, 
   transformers_version="4.26", 
   pytorch_version="1.13", 
   py_version="py39",
   model_server_workers=1
)


In [None]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type= "ml.g5.4xlarge"
)

### Testing the deployed model

- You can test the endpoint with samples from the test split of the `samsum` dataset.

In [None]:
from random import randint
from datasets import load_dataset

# Load dataset from the hub
test_dataset = load_dataset("samsum", split="test")

In [None]:

# select a random test sample (randint is inclusive on both ends)
sample = test_dataset[randint(0, len(test_dataset) - 1)]

# format sample
prompt_template = f"Summarize the chat dialogue:\n{{dialogue}}\n---\nSummary:\n"

formatted_sample = {
  "inputs": prompt_template.format(dialogue=sample["dialogue"]),
  "parameters": {
    "do_sample": True,
    "top_p": 0.9,
    "temperature": 0.1,
    "max_new_tokens": 100,
  }
}

print(formatted_sample["inputs"])

In [None]:
%%time
# predict
res = predictor.predict(formatted_sample)

output = res[0]["generated_text"].split("Summary:")[-1]

print(output)

## Comparing results
- Compared with the model before SFT, dialogue summarization quality is clearly better in the three samples below.
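
The comparison in this section is qualitative. If you want a number, one option is to score generated summaries against the reference summaries in the dataset with a ROUGE-style metric. The function below is a simplified unigram-overlap F1 (whitespace tokens, no stemming), not the official ROUGE implementation, and the example strings are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical strings, for illustration only
score = rouge1_f1("paul will be home late", "paul will come home late")
print(f"{score:.2f}")
```

Averaging such a score over the test split for both the base and the fine-tuned endpoint would give a more objective comparison than a handful of samples.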


## Question 1
```
Summarize the chat dialogue:
John: hey laurel?
Laurel: hey 
John: whats your plan for tomorrow?
Laurel: aint that sure yet, why?
John: nothing much, just wanted to go with you and buy a birthday gift for Diana.
Laurel: OMG! i also totally forgot that her birthday is on saturday, shit!
John: you see im not the only late one here. haha
Laurel: I guess we can meet up tomorrow and go fetch something for her.
John: cool, at what time?
Laurel: lets just meet at jades at around 5 pm
John: At Jade's collection? in town?
Laurel: yeah, that place..
John: see you then.
---
Summary:
```

### Base answer 1
```
Laurel: hey 
John: whats your plan for tomorrow?
Laurel: aint that sure yet, why?
John: nothing much, just wanted to go with you and buy a birthday gift for Diana.
Laurel: OMG! i also totally forgot that her birthday is on saturday, shit!
John: you see im not the only late one here. haha
Laurel: I guess we can meet up tomorrow and go fetch something for her.
John: cool, at what time?
Laurel: lets just meet at jades at around 5 pm
John: At Jade's collection? in town?
Laurel: yeah, that place..
John: see you then.
```

### SFT answer 1
```
John and Laurel are going to meet at Jade's collection at 5 pm to buy a birthday gift for Diana.
```

## Question 2
```
Summarize the chat dialogue:
Avery: You went to Ethan's house?
David: yeah I had to babysit
Avery: Aww, how do you babysit, just curious
David: I had to go through a lot :/
Avery: Was his sister naughty
David: Tooo much
Avery: Lol
David: I will just refuse net time :/
Avery: As you wish
David: :/
Avery: I just got his text 
David: What is he saying
Avery: He is asking me to say thanks to you
David: yeah whatever<3
Avery: He was saying that your phone was switched off
David: Yeah i have just turned it on
Avery: I have told him about that
David: k
Avery: Gotta go now
---
Summary:
```

### Base answer 2
```
- I went to Ethan's house
- His sister was naughty
- I had to babysit her
- She was too much
- I had to turn off his phone
- I just turned it on
```

### SFT answer 2
```
David babysits Ethan's sister. Ethan asked David to thank him for the babysitting. David turned his phone on.
```


## Question 3
```
Summarize the chat dialogue:
Paul: Do you have your keys?
Lena: Yup. What's up?
Paul: I will be home later than I thought, so just don't wait for me and get home.
Lena: What happened?
Paul: I'll call you in 15 minutes and then I'll tell you.
Lena: Ok.
---
Summary:
```

### Base answer 3
```
Lena: Yup. What's up?
Paul: I will be home later than I thought, so just don't wait for me and get home.
Lena: What happened?
Paul: I'll call you in 15 minutes and then I'll tell you.
Lena: Ok.
---
Paul: I will be home later than I thought, so just don't wait for me and get home.
Lena: What happened?
Paul: I'll call you in 15 minutes and then I'll tell you.
Lena: Ok.
---
Paul: I'll tell you in 15 minutes.
Lena: Ok.
Paul: Bye!
```

### SFT answer 3
```
Paul will be home later than he thought. Lena should not wait for him. Paul will call her in 15 minutes.
```
