# Supervised finetuning with Hugging Face 🤗 `transformers`

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/martin-ku-hku/ai-sharing/blob/main/beyond-chatgpt-how-poor-people-recreate-chatgpt/supervised-finetuning.ipynb)

Make sure that you select a GPU runtime in the Colab environment!

## Install and import the libraries

First, we install the necessary libraries.

In [None]:
!pip install transformers datasets accelerate peft trl sentencepiece bitsandbytes einops gradio

Then, import the modules that we need.

In [None]:
import torch
import transformers

from datasets import load_dataset

from peft import (
    LoraConfig,
    PeftConfig,
    PeftModel,
    get_peft_model
)

from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainerCallback,
    GenerationConfig
)

from trl import (
    SFTTrainer,
    DataCollatorForCompletionOnlyLM
)

device="cuda:0"

## Load the base model and tokenizer

We will use a small language model, `bloom-560m`, as the base model for our experiment. To load the model and the corresponding tokenizer, we simply use `AutoModelForCausalLM` and `AutoTokenizer` from the `transformers` library.

In [None]:
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", device_map=device)
model.config.use_cache = False  # use_cache=True is incompatible with gradient checkpointing, so disable it for training
model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

Let's have a look at the architecture of the model.

In [None]:
print(model)

## Test the original base model

Let's see how the original base model performs before we train the LoRA model. First, we define two prompt templates: one for instructions without an additional input and one for instructions that come with an input.

In [None]:
prompt_template_no_input = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Response:"""

prompt_template_with_input = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Input: {input}
### Response:"""

With the prompt templates, we can construct our prompt by calling the `format_map` method:

In [None]:
test_data = {
    'instruction': 'Describe the structure of an atom.', 
    'output': 'An atom consists of a nucleus, which contains protons and neutrons. The nucleus is surrounded by electrons.'}
test_prompt = prompt_template_no_input.format_map(test_data)
print(test_prompt)
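
Since a sample may or may not carry an `input`, choosing the right template can be wrapped in a small helper. The following is only a sketch: `build_prompt` is a hypothetical name that does not appear elsewhere in this notebook, and the two templates are repeated so that the snippet runs on its own.

```python
# Templates repeated from the cell above so that this snippet is self-contained.
PROMPT_NO_INPUT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Response:"""

PROMPT_WITH_INPUT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Input: {input}
### Response:"""


def build_prompt(sample):
    """Hypothetical helper: pick the template depending on whether `input` is non-empty."""
    if (sample.get("input") or "").strip():
        return PROMPT_WITH_INPUT.format_map(sample)
    # format_map ignores extra keys such as `output`, so the sample can be passed as is.
    return PROMPT_NO_INPUT.format_map(sample)


print(build_prompt({"instruction": "Translate to French.", "input": "Good morning"}))
print(build_prompt({"instruction": "Describe the structure of an atom.", "input": ""}))
```

An empty or missing `input` falls back to the shorter template, so the prompt never contains a dangling `### Input:` line.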

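We also configure how the text is generated. We turn sampling off so that decoding is greedy and deterministic, which means the temperature (how random, or "creative", the sampling is) plays no role, and we cap the number of newly generated tokens.

In [None]:
gen_config = GenerationConfig(
    do_sample=False, # greedy decoding; temperature only matters when sampling and must then be strictly positive
    max_new_tokens=100, # limit on generated tokens, not counting the prompt
)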
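
To give some intuition for what temperature does and how it relates to greedy decoding, here is a small pure-Python sketch. It is not how `transformers` implements generation internally; the function names and the three logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 1.0))  # fairly spread out
print(softmax_with_temperature(logits, 0.1))  # almost all mass on the first token
print(greedy_pick(logits))
```

As the temperature approaches zero, sampling concentrates on the top-scoring token, which is the token greedy decoding would pick anyway. The temperature itself cannot be zero, since the logits are divided by it.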

Let's generate some text with the original base model!

In [None]:
def generate_text(model, tokenizer, gen_config, prompt):
    tokenized = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        input_ids=tokenized.input_ids,
        attention_mask=tokenized.attention_mask,
        generation_config=gen_config
    )

    input_len = len(tokenized.input_ids[0])
    gen_text = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True)
    return gen_text

In [None]:
print(generate_text(model, tokenizer, gen_config, test_prompt))

The base model is clearly not doing well! Let's finetune the model with the Alpaca dataset.

## Load the Alpaca dataset from Hugging Face 🤗

We will use the Alpaca dataset created by a Stanford research team. The dataset is available on Hugging Face 🤗 ([link to the dataset page](https://huggingface.co/datasets/tatsu-lab/alpaca)), which means we can load it directly with the `datasets` library.

In [None]:
dataset = load_dataset("tatsu-lab/alpaca", split="train")

Let's have a look at the dataset and its data.

In [None]:
print(dataset)
print(dataset[0])

As we can see, each sample in the dataset has 4 fields:
* `instruction`: the instruction given to the LLM
* `input`: additional input for completing the instructed task, which can be empty
* `output`: the ideal response
* `text`: A string that combines a general instruction (`Below is an instruction that describes a task. Write a response that appropriately completes the request.`), the instruction, the optional input and the ideal response

We could finetune the model on the `text` field as it is. However, we want to compute the loss on the output response only. Therefore, we will use `DataCollatorForCompletionOnlyLM`, which excludes everything up to and including the response template from the loss when it collates the training batches.

In [None]:
response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
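
The idea behind completion-only training can be sketched in a few lines of plain Python. This is a simplified illustration, not the actual implementation in `trl`: the helper names and the token ids are hypothetical, and a real collator works on padded tensors. The key point is that every label up to and including the response template is set to `-100`, the value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of `pattern`, or -1 if absent."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids to labels, masking everything up to the end of the template."""
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # In this sketch, a sample without the template contributes nothing to the loss.
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(response_template_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Made-up token ids: three prompt tokens, a two-token template, three response tokens.
input_ids = [11, 12, 13, 901, 902, 21, 22, 23]
template_ids = [901, 902]
print(mask_prompt_labels(input_ids, template_ids))
```

Only the last three positions keep their labels, so the gradient comes from the response tokens alone.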

## Create a LoRA configuration

Next, we create a LoRA configuration for our LoRA model. We will attach the new matrices to the attention layers and dense layers of the base model.

In [None]:
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)


Then, we can simply get the combined model by calling the `get_peft_model` function.

In [None]:
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

We will only train 1.11% of the total number of parameters!
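
The percentage can be sanity-checked by hand. For a linear layer with `in_features` inputs and `out_features` outputs, LoRA adds two matrices of shapes `r x in_features` and `out_features x r`. The sketch below assumes the dimensions of `bloom-560m` (hidden size 1024, 24 transformer blocks) and a base parameter count of 559,214,592; these figures are assumptions not stated in this notebook, so check them against the output of `print(model)` and `print_trainable_parameters()`.

```python
def lora_param_count(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) to one linear layer."""
    return r * (in_features + out_features)


hidden = 1024             # assumed hidden size of bloom-560m
num_layers = 24           # assumed number of transformer blocks
r = 16                    # rank from lora_config
base_params = 559_214_592  # assumed parameter count of the base model

# (in_features, out_features) of each targeted linear layer inside one block
target_shapes = {
    "query_key_value": (hidden, 3 * hidden),
    "dense": (hidden, hidden),
    "dense_h_to_4h": (hidden, 4 * hidden),
    "dense_4h_to_h": (4 * hidden, hidden),
}

per_layer = sum(lora_param_count(i, o, r) for i, o in target_shapes.values())
lora_params = per_layer * num_layers
share = 100 * lora_params / (base_params + lora_params)
print(f"LoRA parameters: {lora_params:,} ({share:.2f}% of all parameters)")
```

Under these assumptions the adapter has about 6.3 million trainable parameters, which is consistent with the roughly 1.11% quoted above.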

## Train the LoRA model

Define some hyperparameters for the training.

In [None]:
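training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # accumulate 4 batches before each optimizer step
    num_train_epochs=1,             # overridden by max_steps below
    learning_rate=2e-4,
    max_steps=100,                  # stop after 100 optimizer steps
    optim="paged_adamw_32bit",      # paged AdamW from bitsandbytes
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0,
    max_grad_norm=0.3,              # gradient clipping threshold
    output_dir="output",
    logging_steps=1
)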
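
To see what these numbers imply, the sketch below computes the effective batch size, the number of samples the run touches, and an approximate learning-rate curve. The `cosine_lr` function is a hypothetical helper that mirrors the usual linear-warmup-then-cosine shape; the exact schedule inside `transformers` may differ in details, and the warmup length of 3 steps is an assumption derived from `warmup_ratio * max_steps`.

```python
import math


def cosine_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup to peak_lr, then cosine decay towards zero (illustrative only)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


per_device_batch, grad_accum, max_steps = 1, 4, 100
effective_batch = per_device_batch * grad_accum  # samples per optimizer step on one GPU
samples_seen = effective_batch * max_steps       # samples used in the whole run
warmup_steps = 3                                 # assumed: about 0.03 * 100 steps

print("effective batch size:", effective_batch)
print("samples seen:", samples_seen)
for step in (0, 3, 50, 100):
    print(step, cosine_lr(step, 2e-4, warmup_steps, max_steps))
```

With these settings the run sees only a small fraction of the dataset, which keeps the demo short; raising `max_steps` trains on more samples.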


We can train the model with the `SFTTrainer` class from the `trl` module.

In [None]:
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=dataset,
    dataset_text_field='text',
    data_collator=collator,
    args=training_args
)

We will also pre-process the model by upcasting the layer norms to float32 for more stable training.

In [None]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        # nn.Module.to() converts the module in place, so no reassignment is needed
        module.to(torch.float32)

Finally, we can start the training by calling the `train` method of the Trainer.

In [None]:
trainer.train()

## Use the finetuned model

In [None]:
print(generate_text(peft_model, tokenizer, gen_config, test_prompt))