## Steering model behavior in a desired direction with RLHF (Reinforcement Learning from Human Feedback)

- RLHF collects human feedback data, trains a reward model on it, and then uses that reward model to optimize the model with reinforcement learning via the [PPO](https://huggingface.co/learn/deep-rl-course/unit8/introduction?fw=pt) algorithm. The human feedback data is what steers the model toward the desired behavior.
- This example follows a [blog post](https://huggingface.co/blog/trl-peft) that trains with a small amount of GPU resources using TRL and PEFT.
- That post does not train a separate reward model. Instead, it reuses an existing DistilBERT-based sentiment classification [model](https://huggingface.co/lvwerra/distilbert-imdb) trained on the IMDB dataset.
- Reference code: https://github.com/lvwerra/trl/tree/main/examples/sentiment/scripts/gpt-neox-20b_peft
- The workflow is almost the same as the SFT training in the previous example. Only the training script changes; the SageMaker usage is nearly identical.
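
To make the reward step concrete, here is a minimal sketch of how a sentiment classifier's output could be turned into a scalar reward per generated text. It assumes the classifier returns, for each text, a list of `{"label", "score"}` dictionaries (the shape a transformers text-classification pipeline produces when all scores are requested). The function name `positive_rewards`, the label names, and the sample scores are illustrative assumptions, not values taken from the referenced code.

```python
from typing import Dict, List, Union

# One classified text: a list of {"label": ..., "score": ...} entries.
ClassifierOutput = List[Dict[str, Union[str, float]]]


def positive_rewards(
    outputs: List[ClassifierOutput], target_label: str = "POSITIVE"
) -> List[float]:
    """Return the score of `target_label` for every classified text."""
    rewards = []
    for scores in outputs:
        by_label = {item["label"]: float(item["score"]) for item in scores}
        if target_label not in by_label:
            raise KeyError(f"label {target_label!r} not found in {sorted(by_label)}")
        rewards.append(by_label[target_label])
    return rewards


# Made-up classifier outputs (raw logits) for two generated texts.
sample_outputs = [
    [{"label": "NEGATIVE", "score": -2.1}, {"label": "POSITIVE", "score": 2.4}],
    [{"label": "NEGATIVE", "score": 1.3}, {"label": "POSITIVE", "score": -1.0}],
]
print(positive_rewards(sample_outputs))
```

Using the positive-class score as the reward is what pushes the policy toward positive-sounding text; choosing a different target label would steer it the other way.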

### Tested version

Tested on `Python 3.9.15`

```
sagemaker: 2.146.0
transformers: 4.29.2
torch: 1.13.1
accelerate: 0.19.0
datasets: 2.12.0
py7zr: 0.20.5
peft: 0.3.0
bitsandbytes: 0.38.1
trl: 0.4.1
```


In [None]:
!pip install -q transformers datasets py7zr trl

In [None]:
# peft 0.3 and later provide a model merge function, so pin version 0.3.0 to make merging easy.
!pip install -q peft==0.3.0

In [None]:
import transformers
import sagemaker
import peft
print(transformers.__version__)
print(sagemaker.__version__)
print(peft.__version__)

In [None]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
sm_client = sagemaker_session.sagemaker_client
sm_runtime_client = sagemaker_session.sagemaker_runtime_client

### Preparing the dataset

- As in the previous example, prepare the IMDB dataset and upload it to S3.

In [None]:
%store -r

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer
from trl.core import LengthSampler

In [None]:
# Below is an example function to build the dataset. In our case, we use the IMDB dataset
# from the `datasets` library. One should customize this function to train the model on
# its own dataset.
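def build_dataset(model_path, dataset_name="imdb", input_min_text_length=2, input_max_text_length=8):
    """
    Build dataset for training. This builds the dataset from `load_dataset`, one should
    customize this function to train the model on its own dataset.

    Args:
        model_path (`str`):
            Path or name of the pretrained model whose tokenizer is used.
        dataset_name (`str`):
            The name of the dataset to be loaded.
        input_min_text_length (`int`):
            Lower bound passed to `LengthSampler` for the query length in tokens.
        input_max_text_length (`int`):
            Upper bound passed to `LengthSampler` for the query length in tokens.

    Returns:
        ds (`datasets.Dataset`):
            The tokenized dataset with added `input_ids` and `query` columns.
    """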
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    tokenizer.pad_token = tokenizer.eos_token
    
    # load imdb with datasets
    ds = load_dataset(dataset_name, split="train")
    ds = ds.rename_columns({"text": "review"})
    ds = ds.filter(lambda x: len(x["review"]) > 200, batched=False)

    input_size = LengthSampler(input_min_text_length, input_max_text_length)

    def tokenize(sample):
        sample["input_ids"] = tokenizer.encode(sample["review"])[: input_size()]
        sample["query"] = tokenizer.decode(sample["input_ids"])
        return sample

    ds = ds.map(tokenize, batched=False)
    ds.set_format(type="torch")
    return ds

In [None]:
model_location = model_download_path

In [None]:
dataset = build_dataset(model_location)
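
The `tokenize` step above cuts each review down to a short, randomly sized prefix that serves as the query for generation. The following is a minimal standard-library sketch of that idea. It assumes `LengthSampler` draws uniformly from `range(min, max)`, so the upper bound is excluded; whitespace-split words stand in for tokenizer ids, and `sample_query` is a hypothetical helper name.

```python
import random
from typing import List, Optional


def sample_query(
    tokens: List[str],
    min_len: int = 2,
    max_len: int = 8,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Keep a random-length prefix with min_len <= length < max_len."""
    rng = rng or random.Random()
    length = rng.randrange(min_len, max_len)
    return tokens[:length]


review = "This movie was far better than I expected and the cast was great"
tokens = review.split()  # stand-in for tokenizer.encode(...)

rng = random.Random(0)  # seeded only to make the demo repeatable
queries = [sample_query(tokens, rng=rng) for _ in range(5)]
for query in queries:
    print(len(query), " ".join(query))
```

Varying the prefix length gives the policy a mix of very short and slightly longer prompts to continue during PPO rollouts.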

In [None]:
from random import randint
# randint includes both endpoints, so subtract 1 to avoid an out-of-range index
print(dataset[randint(0, len(dataset) - 1)])

In [None]:
training_input_path = f"s3://{sagemaker_session.default_bucket()}/llm/databricks/dolly-v2-7b/dataset/imdb"
dataset.save_to_disk(training_input_path)
print(f"Data uploaded : {training_input_path}")

In [None]:
dataset.save_to_disk("./imdb-data")

### Running the training

- The training procedure is almost the same as in the earlier fine-tuning example.
- See the `rlhf-src` directory.
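
As a rough illustration of what the PPO step optimizes: PPO-based RLHF implementations commonly combine the reward-model score with a per-token KL penalty against a frozen reference model, so that the policy does not drift too far from the base model. The sketch below shows that shaping in plain Python. The function name `shaped_rewards`, the coefficient `beta`, and all numbers are illustrative assumptions, not values taken from `rlhf-src`.

```python
from typing import List


def shaped_rewards(
    logprobs: List[float],
    ref_logprobs: List[float],
    score: float,
    beta: float = 0.2,
) -> List[float]:
    """Per-token rewards: a KL penalty on every token, plus the score on the last one."""
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("logprobs and ref_logprobs must be non-empty and equally long")
    # Penalize tokens where the policy assigns more log-probability than the reference.
    rewards = [-beta * (lp - ref_lp) for lp, ref_lp in zip(logprobs, ref_logprobs)]
    # The classifier score applies to the whole response, so add it at the final token.
    rewards[-1] += score
    return rewards


policy_logprobs = [-1.0, -0.5, -2.0]      # made-up log-probs of the sampled tokens
reference_logprobs = [-1.5, -0.5, -1.0]   # made-up log-probs under the frozen reference
example = shaped_rewards(policy_logprobs, reference_logprobs, score=2.0, beta=0.2)
print([round(r, 3) for r in example])
```

A larger `beta` keeps the tuned model closer to the base model, at the cost of a weaker push toward high classifier scores.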

### Debugging locally

- The training data was uploaded directly to S3 above, but you can also save it to local storage and use it for debugging.
- In a real development workflow, you would usually test thoroughly in local mode before submitting a SageMaker training job, rather than submitting one right away.
- So, as in the example below, it is a good idea to debug locally first and then run the training job.
```
python run_rlhf_train.py --dataset_path [local_dataset_path] --model_name {model_artifact}
```
- For the local debugging script, see `rlhf-src/local_debug.sh`.
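
SageMaker script mode passes each hyperparameter to the entry point as a command-line argument, which is why the same script can be run locally with `--dataset_path` and `--model_name`. The following is a hypothetical sketch of such an argument parser; the real `run_rlhf_train.py` may define different arguments, and the `batch_size` and `mini_batch_size` defaults are placeholders borrowed from the commented-out hyperparameters in this notebook.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the arguments shared by local runs and SageMaker training jobs."""
    parser = argparse.ArgumentParser(description="RLHF training entry point (illustrative)")
    parser.add_argument("--model_name", type=str, required=True,
                        help="Local path or name of the pretrained model")
    parser.add_argument("--dataset_path", type=str, required=True,
                        help="Directory containing the dataset saved with save_to_disk")
    # Placeholder defaults; assumptions, not confirmed values from rlhf-src.
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--mini_batch_size", type=int, default=16)
    return parser.parse_args(argv)


# Simulate the argument list a training job would pass to the script.
args = parse_args([
    "--model_name", "/opt/ml/input/data/pretrained-model",
    "--dataset_path", "/opt/ml/input/data/training",
])
print(args.model_name, args.dataset_path, args.batch_size, args.mini_batch_size)
```

Accepting an explicit `argv` list keeps the parser easy to exercise from a notebook or a unit test without touching `sys.argv`.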


In [None]:
import time
from sagemaker.utils import name_from_base
from sagemaker.huggingface import HuggingFace
job_name = name_from_base("dolly-rlhf-train")
print(job_name)

In [None]:
instance_type = 'ml.g5.4xlarge'
# instance_type = 'ml.g4dn.4xlarge'

hyperparameters ={                         
  'model_name': '/opt/ml/input/data/pretrained-model',
  'dataset_path': '/opt/ml/input/data/training', # path where sagemaker will save training dataset
  # 'mini_batch_size': 16,
  # 'batch_size': 256,
}

huggingface_estimator = HuggingFace(
    entry_point          = 'run_rlhf_train.py',  # training script
    source_dir           = 'rlhf-src',           # directory that contains all files needed for training
    instance_type        = instance_type,        # instance type used for the training job
    instance_count       = 1,                    # number of instances used for training
    base_job_name        = job_name,             # job name prefix; SageMaker appends its own timestamp
    role                 = role,                 # IAM role used by the training job to access AWS resources, e.g. S3
    volume_size          = 300,                  # size of the EBS volume in GB
    transformers_version = '4.26',               # transformers version used in the training job
    pytorch_version      = '1.13',               # PyTorch version used in the training job
    py_version           = 'py39',               # Python version used in the training job
    hyperparameters      = hyperparameters
)

In [None]:
pretrained_uri = "s3://sagemaker-us-west-2-723597067299/llm/databricks/dolly-v2-7b/model/"
# pretrained_uri = model_artifact
data = {'training': training_input_path, 'pretrained-model': pretrained_uri}
print(data)
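
Each key of the `data` dictionary becomes an input channel, and SageMaker makes a channel's contents available inside the training container under `/opt/ml/input/data/<channel name>`. That is why the hyperparameters above point to `/opt/ml/input/data/training` and `/opt/ml/input/data/pretrained-model`. The sketch below spells out that mapping; `channel_paths` is a hypothetical helper and the S3 URIs are placeholders.

```python
import posixpath
from typing import Dict

# Directory under which SageMaker exposes input channels in the container.
SM_INPUT_DATA_DIR = "/opt/ml/input/data"


def channel_paths(channels: Dict[str, str]) -> Dict[str, str]:
    """Map each channel name to the directory it is mounted at in the container."""
    return {name: posixpath.join(SM_INPUT_DATA_DIR, name) for name in channels}


example_channels = {
    "training": "s3://example-bucket/dataset/imdb",    # placeholder URI
    "pretrained-model": "s3://example-bucket/model/",  # placeholder URI
}
paths = channel_paths(example_channels)
for name, path in paths.items():
    print(f"{name:>16} -> {path}")
```

If a channel is renamed in `data`, the matching hyperparameter path has to be updated as well.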

In [None]:
huggingface_estimator.fit(data, wait=False)
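
Because `fit` is called with `wait=False`, the cell returns immediately and the job keeps running in the background. One way to check on it later is to poll `describe_training_job` on the SageMaker client. The sketch below is an illustrative helper, not part of the original notebook; a small fake client stands in for boto3 so the example runs offline.

```python
import time
from typing import Callable, Iterable

# Statuses after which a training job will not change any more.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(
    sm_client,
    job_name: str,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the training job until it reaches a terminal status and return that status."""
    while True:
        response = sm_client.describe_training_job(TrainingJobName=job_name)
        status = response["TrainingJobStatus"]
        print(f"{job_name}: {status}")
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)


class FakeSageMakerClient:
    """Offline stand-in for the boto3 SageMaker client (for this demo only)."""

    def __init__(self, statuses: Iterable[str]):
        self._statuses = iter(statuses)

    def describe_training_job(self, TrainingJobName: str) -> dict:
        return {"TrainingJobStatus": next(self._statuses)}


final_status = wait_for_training_job(
    FakeSageMakerClient(["InProgress", "InProgress", "Completed"]),
    "example-job",
    poll_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
```

In this notebook, the real call would presumably pass `sm_client` and `huggingface_estimator.latest_training_job.name` instead of the fake client and the placeholder job name.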


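### Deployment and testing

- Training takes roughly 10 hours on `g5.4xlarge`.
- Once training is complete, you can deploy and test the model as shown below.
- As in the previous example, even if the kernel session is lost, you can reattach to the training job with `attach()` as shown below.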

In [None]:
from sagemaker.estimator import Estimator
training_job_name = "dolly-rlhf-train-2023-04-25-15-41-52-29-2023-04-25-15-42-00-690"
estimator = Estimator.attach(training_job_name)

In [None]:
print(estimator.model_data)

### Merging the model

- When training finishes successfully, only the low-rank adapter is saved to S3. The previous example merged the model at the end of training before uploading it, but this example uploads only the adapter.
- The adapter can be merged into the original model before use: [reference](https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt-neox-20b_peft/merge_peft_adapter.py)
- peft 0.3 provides the `merge_and_unload()` function: [reference](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora.py#L299)
- Accordingly, this example uses a peft release that includes that function (0.3.0, installed above) to merge the adapter.
- If merging separately is inconvenient, the training code itself can merge the base model and the adapter after training and save the result to S3.
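
Conceptually, merging a LoRA adapter folds the low-rank update into the base weight: `W_merged = W + (lora_alpha / r) * B @ A`. The sketch below checks on tiny hand-written matrices that the merged weight gives the same output as the base weight plus the adapter path. It is a plain-Python illustration of the arithmetic, not peft's implementation, and all values are made up.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add_scaled(base: Matrix, delta: Matrix, scale: float) -> Matrix:
    """Return base + scale * delta, element by element."""
    return [
        [b_ij + scale * d_ij for b_ij, d_ij in zip(b_row, d_row)]
        for b_row, d_row in zip(base, delta)
    ]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, lora_alpha: float, r: int) -> Matrix:
    """Fold the low-rank update B @ A into the base weight."""
    return add_scaled(weight, matmul(lora_b, lora_a), lora_alpha / r)


# Shapes: W is (out=2, in=3), A is (r=1, in=3), B is (out=2, r=1).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [-1.0]]
lora_alpha, r = 2.0, 1

merged = merge_lora(W, A, B, lora_alpha, r)

x = [[1.0], [1.0], [1.0]]  # one input column vector
y_adapter = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), lora_alpha / r)
y_merged = matmul(merged, x)
print(merged)
print(y_adapter, y_merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded like an ordinary checkpoint.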




In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel
import torch

In [None]:
finetune_path = "./dolly-imdb-finetune"
adapter_path = f"{finetune_path}/adapter"
print(adapter_path)

In [None]:
!mkdir -p {adapter_path}
!aws s3 cp {estimator.model_data} {adapter_path}
!cd {adapter_path}; tar zxvf model.tar.gz

In [None]:
peft_config = PeftConfig.from_pretrained(adapter_path)
model = AutoModelForCausalLM.from_pretrained(
    model_location,
    return_dict=True,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

In [None]:
merged_model = model.merge_and_unload()
merged_model.save_pretrained(finetune_path)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_location)
tokenizer.save_pretrained(finetune_path)


### Testing the merged model

- The merged model can be loaded locally or deployed to an endpoint.
- When you load the final merged model and ask it questions, it gives a positive answer almost every time, as the examples below show.
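
The notebook does not show the code used to query the merged model, so the following is only a sketch of one way to do it with plain `generate`. The sampling settings, the function names, and the choice to pass the question without any instruction template are assumptions. Because a causal LM typically returns the prompt followed by the continuation, a small helper strips the echoed question.

```python
def strip_prompt(generated: str, prompt: str) -> str:
    """Drop the prompt if the model echoed it at the start of its output."""
    text = generated.strip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


def generate_answer(model_dir: str, question: str, max_new_tokens: int = 128) -> str:
    """Load the merged model from `model_dir` and sample one answer."""
    # Imported lazily so strip_prompt can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype=dtype, low_cpu_mem_usage=True
    ).to(device)
    model.eval()

    inputs = tokenizer(question, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=True,   # sampling settings are illustrative assumptions
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return strip_prompt(text, question)


# Offline demo of the helper on a made-up echoed output.
echoed = "What do you think about the movie?\nI loved it."
print(strip_prompt(echoed, "What do you think about the movie?"))
```

With the merged model saved as above, a call such as `generate_answer(finetune_path, "How do you feel after watching the movie?")` would load it from `./dolly-imdb-finetune`; outputs vary from run to run because sampling is enabled.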

## Question 1
Explain to me how to use aws serverless services

### Base answer 1
You can use serverless framework to create functions in various languages like node.js, java, c#, go, python etc. These functions are hosted on S3 objects or anything that is exposed by serverless framework in the form of acloudfunction. The way you deploy these functions is by registering a region and a package where the functions and dependencies are stored. A separate deployment package is generated for each function. You can refer this link for more details on serverless framework and serverless functions.

### RLHF answer 1
serverless is a very powerful and flexible platform. The best way to get started is to go through their comprehensive documentation. A good starting point is the Playground which allows you to quickly get started with different services.
Enjoy exploring!


----



## Question 2
How do you feel after watching the avartar 2?

### Base answer 2
I felt like watching a action packed movie. The action sequences were good, and there's lots of gun fights, car chases, and fight rigging sequences. The lead actors did a great job. Also, the music and cinematography were top notch.

### RLHF answer 2
The movie is amazing, very well made and really brings life to the story. The cast is amazing as always. Overall highly recommended!

----

## Question 3
I heard that the disney movie is very bad recently. What do you think?

### Base answer 3
The movie has so many characters. Different styles of animation andDirection. I think that is what make the movie look so wild. I also like the love story that includes hero and heroine in high school and the make out in the end, very love story sweet. Also the touching moment between children. Overall, I think it is a good movie to watch with family. If you are a child, you can watch it yourself.

### RLHF answer 3
I totally disagree with you. I loved the disney movie. It is so good! The story, the characters, the music, the animation, the effects are all so perfect. I love it. I also love the new disney movie. I love the old, too, but this one is the best! It is so good! I love it!

----

## Question 4
I hate sam, what do you think about him?

### Base answer 4
I hate sam is a subjective opinion, therefore I can not express hate for him. There are many people that love him, and hate him. I do not like him because he is a bully, he has an ego, and he lies, he is not honest. He is a con artist.

### RLHF answer 4
I hate sam, what do you think about him?
I totally agree with you. He is a rude, selfish and annoying person, but he has such a great heart. He always try to help others and be kind to everyone. I love his attitude towards life. I admire him.





