# Steps for fine-tuning a model with several LoRA methods, using the Hugging Face Trainer API

## 1. Install necessary libraries

In [None]:
# Install some libraries
! pip install transformers datasets evaluate peft bitsandbytes

## 2. Log in to your Hugging Face account (this step requires a token in order to use some models and datasets)

Some models require permission before you can use them. If a model does not load normally, make sure you are logged in to your Hugging Face account with a token. Also, to evaluate your model on the multiple-choice task, it is recommended to log in to your Hugging Face account and then push the model to the Hub. Further details are in Section 7.5.

In [None]:
# Log in to Hugging Face to use certain datasets and models (remove the triple quotes and fill in your token to enable this cell)
"""
from huggingface_hub import login

login(token="")
"""

## 3. Load the dataset for training and evaluating

In [None]:
from datasets import load_dataset

# Take the first 500 examples of the test split
wiki = load_dataset("vlsp-2023-vllm/wikipediaqa_vi", split="test[:500]")

Split the first 500 examples of the dataset into a train and test set with the [train_test_split](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.train_test_split) method:

In [None]:
wiki = wiki.train_test_split(test_size=0.2)
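To make the 80/20 split concrete, here is a minimal pure-Python sketch of what a shuffled split does to 500 example indices. The `split_indices` helper and the seed value are hypothetical and only illustrate the idea; the real `train_test_split` method also accepts a `seed` argument if you want the split to be reproducible across runs.

```python
import random


def split_indices(n_examples, test_size, seed):
    """Shuffle example indices and cut them into train and test parts."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = round(n_examples * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(500, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))
```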

Then take a look at an example:

In [None]:
wiki["train"][0]

While this may look like a lot, you're only really interested in the `question` field and the `text` of the correct choice. What's cool about language modeling
tasks is that you don't need labels (also known as an unsupervised task) because the next word *is* the label.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vlsp-2023-vllm/hoa-1b4")

You'll notice from the example above, the `text` field is actually nested inside `choices`. This means you'll need to
extract the `text` subfield from its nested structure with the [`flatten`](https://huggingface.co/docs/datasets/process.html#flatten) method:

In [None]:
wiki = wiki.flatten()
wiki["train"][0]

In [None]:
wiki["train"][0:2]
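To see why the columns end up with dotted names such as `choices.text`, here is a minimal pure-Python sketch of flattening one nested record. The `flatten_record` helper and the sample record are hypothetical illustrations; they are not part of 🤗 Datasets, which flattens the dataset's column types rather than plain dictionaries.

```python
def flatten_record(record, parent_key=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested field and prefix its keys.
            flat.update(flatten_record(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical record shaped like the example above.
sample = {
    "question": "Q?",
    "choices": {"labels": ["A", "B"], "text": ["first", "second"]},
    "answerKey": "B",
}
flat_sample = flatten_record(sample)
print(sorted(flat_sample))
```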

## 4. Preprocessing the dataset

Each subfield is now a separate column, as indicated by the `choices` prefix, and the `choices.text` field is now a list of candidate answers. Instead
of tokenizing every candidate, keep only the correct one: find the position of `answerKey` in `choices.labels` and take the text at the same position in `choices.text`.

Here is a first preprocessing function that appends the correct answer to the question for each example and tokenizes the result:

In [None]:
def preprocess_function(examples):
    extracted_answers = []

    for i in range(len(examples['question'])):
        # Get the index of the correct answer
        answer_index = examples['choices.labels'][i].index(examples['answerKey'][i])
        # Use the index to find the corresponding text in 'choices.text'
        correct_answer = examples['choices.text'][i][answer_index]
        question = examples['question'][i]
        # Training text: the question followed by its correct answer
        correct_answer = question + " " + correct_answer
        extracted_answers.append(correct_answer)
    return tokenizer(extracted_answers)
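The lookup inside `preprocess_function` can be checked without a tokenizer. The sketch below runs the same index logic on a tiny hand-made batch; the `build_texts` name and the batch contents are made up for illustration.

```python
def build_texts(examples):
    """Return 'question + correct answer' strings for a batch of flattened examples."""
    texts = []
    for i in range(len(examples["question"])):
        # Position of the correct label, e.g. "B" -> 1
        answer_index = examples["choices.labels"][i].index(examples["answerKey"][i])
        correct_answer = examples["choices.text"][i][answer_index]
        texts.append(examples["question"][i] + " " + correct_answer)
    return texts


# Hypothetical batch in the flattened, batched layout that `map` passes in.
toy_batch = {
    "question": ["Capital of Vietnam?", "2 + 2 = ?"],
    "choices.labels": [["A", "B"], ["A", "B", "C"]],
    "choices.text": [["Hue", "Hanoi"], ["3", "4", "5"]],
    "answerKey": ["B", "B"],
}
texts = build_texts(toy_batch)
print(texts)
```

Note that `list.index` raises a `ValueError` when `answerKey` is not among the labels, so a malformed example would stop the `map` call rather than be skipped silently.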

To apply this preprocessing function over the entire dataset, use the 🤗 Datasets [map](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map) method. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once, and increasing the number of processes with `num_proc`. Remove any columns you don't need:

In [None]:
tokenized_wiki = wiki.map(
    preprocess_function,
    batched=True,
    num_proc=4,
    remove_columns=wiki["train"].column_names,
)

This dataset contains the token sequences, but some of these are longer than the maximum input length for the model.

You can now use a second preprocessing function to
- concatenate all the sequences
- split the concatenated sequences into shorter chunks defined by `block_size`, which should be both shorter than the maximum input length and short enough for your GPU RAM.

In [None]:
block_size = 128


def group_texts(examples):
    # Concatenate all texts.
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    # Drop the small remainder so that every chunk has exactly block_size tokens. You could add
    # padding instead if the model supports it; customize this part to your needs.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    # Split by chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result
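To see what the chunking does, the sketch below applies the same logic to a toy batch with a small block size. `group_texts_demo` is a copy of the function above that takes `block_size` as a parameter; the token ids are invented.

```python
def group_texts_demo(examples, block_size):
    """Same logic as group_texts, with block_size passed in explicitly."""
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result


# Three toy sequences with 10 tokens in total.
toy = {
    "input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10]],
    "attention_mask": [[1, 1, 1], [1, 1, 1, 1], [1, 1, 1]],
}
chunks = group_texts_demo(toy, block_size=4)
print(chunks["input_ids"])
```

With 10 tokens and a block size of 4, two full chunks are kept and the last two tokens are dropped. Chunks also cross example boundaries, so one block can contain the end of one question-answer pair and the start of the next.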

Apply the `group_texts` function over the entire dataset:

In [None]:
lm_dataset = tokenized_wiki.map(group_texts, batched=True, num_proc=4)
# lm_dataset = tokenized_wiki

Now create a batch of examples using [DataCollatorForLanguageModeling](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling). It's more efficient to *dynamically pad* the
sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.

Use the end-of-sequence token as the padding token and set `mlm=False`. This will use the inputs as labels shifted to the right by one element:

In [None]:
from transformers import DataCollatorForLanguageModeling

tokenizer.pad_token = tokenizer.eos_token
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

In [None]:
print(data_collator)
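To make the collation step concrete, here is a pure-Python sketch of dynamic padding for a causal language modeling batch. It is an illustration under assumptions, not the collator's actual code: `pad_batch`, `next_token_pairs`, the pad id `0`, and the toy sequences are invented here. The value `-100` is the label index that the loss ignores. The shift by one position is written out explicitly for clarity, although in practice it happens inside the model's loss computation.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def pad_batch(sequences, pad_id):
    """Pad every sequence to the longest one in this batch only."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # Labels are a copy of the inputs; padded positions are ignored.
        labels.append(seq + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


def next_token_pairs(input_ids, labels):
    """Pair each position with the label one step ahead (the causal LM shift)."""
    return [(x, y) for x, y in zip(input_ids[:-1], labels[1:]) if y != IGNORE_INDEX]


batch = pad_batch([[5, 6, 7], [8, 9]], pad_id=0)
pairs = next_token_pairs(batch["input_ids"][1], batch["labels"][1])
print(batch["input_ids"])
print(pairs)
```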

## 5. Prepare model for training

In [None]:
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained("vlsp-2023-vllm/hoa-1b4")

## 6. LoRA configuration

Here is how each LoRA method is configured. We used 8 LoRA methods to test the performance (with PiSSA having two ways to implement it).

First, import the necessary libraries. To prevent errors, this cell imports every library and function used by any of the methods. From 6.1 to 6.8, run only the cells for one method. This notebook uses 6.5 (PiSSA), since it gave the best result.

In [None]:
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training, replace_lora_weights_loftq

### 6.1. Normal LoRA

In [None]:
# Normal LoRA config (library defaults for rank, alpha, and target modules)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
)

# Apply the LoRA config to the base model
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
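As a rough sanity check on the number printed by `print_trainable_parameters()`, the sketch below counts the parameters LoRA adds to a single linear layer. For a weight of shape `d_out × d_in` and rank `r`, the two low-rank factors hold `r * (d_in + d_out)` values. The layer size is made up for illustration; `r = 8` matches the `LoraConfig` default.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters in the two low-rank factors A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


d_in, d_out, r = 2048, 2048, 8  # hypothetical layer size; r=8 is the LoraConfig default
full = d_in * d_out
lora = lora_param_count(d_in, d_out, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

The total reported for a model is this count summed over every targeted layer, so it grows with both the rank and the number of target modules.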


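### 6.2. QLoRA: still uses a LoRA config, but on a quantized model instead of the normal one

In [None]:
import torch
from transformers import BitsAndBytesConfig

# Quantize the model: reload the base model in 4-bit.
# The settings below are assumed, commonly used values; adjust them to your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained("vlsp-2023-vllm/hoa-1b4", quantization_config=bnb_config)

# LoRA config
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules="all-linear",  # all linear layers work best according to the paper
)

# Prepare the quantized model for training, then apply the LoRA config
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()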
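The motivation for quantizing the base model is memory. The back-of-the-envelope sketch below compares the weight storage at 16 bits and at 4 bits per parameter. The parameter count of about 1.4 billion is an assumption read off the `1b4` in the model name, and the estimate ignores quantization constants, activations, gradients, and optimizer state.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


n_params = 1.4e9  # assumption: roughly what "1b4" in the model name suggests
fp16_gib = weight_memory_gib(n_params, 16)
int4_gib = weight_memory_gib(n_params, 4)
print(f"16-bit weights: {fp16_gib:.2f} GiB, 4-bit weights: {int4_gib:.2f} GiB")
```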

### 6.3. LoftQ: apply QLoRA, but replace some weights with LoftQ weights

In [None]:
# LoftQ replacement expects a base model quantized to 4-bit with bitsandbytes.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules="all-linear",
)

# Prepare the base model before wrapping it, so that the LoRA weights stay trainable.
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

replace_lora_weights_loftq(peft_model)  # can be applied more than once
peft_model.print_trainable_parameters()

### 6.4. PiSSA_SVD: initialize PiSSA weights with fast SVD. Faster, but with lower performance

In [None]:
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    init_lora_weights="pissa_niter_4"  # the number of iterations (4 here) can be changed
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

### 6.5. PiSSA: initialize PiSSA weights

In [None]:
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    init_lora_weights="pissa"
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
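For orientation, the idea behind PiSSA initialization can be written as follows. This is a sketch of the method as described in the PiSSA paper, with notation chosen here: $W$ is a pretrained weight matrix, $r$ is the LoRA rank, $B$ is the `d_out × r` factor, and $A$ is the `r × d_in` factor. It is not taken from this notebook or from the PEFT source code.

```latex
W = U \Sigma V^\top, \qquad
B = U_{[:, :r]} \, \Sigma_{[:r, :r]}^{1/2}, \qquad
A = \Sigma_{[:r, :r]}^{1/2} \, V_{[:, :r]}^\top, \qquad
W^{\mathrm{res}} = W - B A
```

The adapter factors $B$ and $A$ start from the top-$r$ singular directions and are trained, while the residual $W^{\mathrm{res}}$ stays frozen. Standard LoRA, by contrast, starts with $BA = 0$ and leaves $W$ itself frozen.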

### 6.6. OLoRA: initialize OLoRA weights

In [None]:
# OLoRA: initialize OLoRA weights
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    init_lora_weights="olora"
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

### 6.7. rsLoRA method

In [None]:
# rsLoRA: rank-stabilized LoRA scaling
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    use_rslora=True,
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()
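The only thing `use_rslora=True` changes is the factor that scales the adapter output: standard LoRA uses `lora_alpha / r`, while rank-stabilized LoRA uses `lora_alpha / sqrt(r)`. The sketch below compares the two; the `lora_scaling` helper and the chosen `lora_alpha` and rank values are illustrative, not settings from this notebook.

```python
import math


def lora_scaling(lora_alpha, r, use_rslora=False):
    """Scaling factor applied to the low-rank update."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)
    return lora_alpha / r


for r in (8, 64, 256):
    print(r, lora_scaling(16, r), lora_scaling(16, r, use_rslora=True))
```

With the standard factor the update shrinks quickly as the rank grows, which is the effect the rank-stabilized variant is meant to soften.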

### 6.8. DoRA method

In [None]:
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    use_dora=True,
)

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

## 7. Train the model on the dataset with the chosen configuration

### 7.1. Choose the final config

In [None]:
# final_peft_config = peft_config

### 7.2. Optional: Kaggle logs in to wandb automatically to track training and evaluation performance. To turn this off, run this cell


In [None]:
import os
os.environ["WANDB_DISABLED"] = "true"

### 7.3. Training

In [None]:
output_dir = "wiki_New_PiSSA_hoa1b4_no_para_changes"
training_args = TrainingArguments(
    output_dir=output_dir,
    eval_strategy="epoch",    # named `evaluation_strategy` in older transformers releases
    save_strategy="epoch",    # must match the evaluation strategy for load_best_model_at_end
    load_best_model_at_end=True,
    logging_steps=1,    # Log every step
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=lm_dataset["train"],
    eval_dataset=lm_dataset["test"],
    data_collator=data_collator,
)

trainer.train()

### 7.4. Check the result

Once training is completed, use the [evaluate()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.evaluate) method to evaluate your model and get its perplexity:

In [None]:
import math

eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")
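Perplexity is the exponential of the mean per-token cross-entropy, which is what `eval_loss` reports for a causal language model. The sketch below computes it from a few made-up per-token losses; the `perplexity` helper and the loss values are for illustration only.

```python
import math


def perplexity(token_losses):
    """Exponential of the mean negative log-likelihood (natural log) per token."""
    mean_nll = sum(token_losses) / len(token_losses)
    return math.exp(mean_nll)


toy_losses = [2.0, 1.0, 3.0]  # hypothetical per-token losses
print(f"Perplexity: {perplexity(toy_losses):.2f}")
```

A perplexity of 1 would mean the model assigns probability 1 to every correct next token, so lower is better.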

### 7.5. Push the model to Huggingface

You can share your model on the Hub with the [push_to_hub()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.push_to_hub) method so everyone can use it. Make sure to log in to your account first (see Section 2).

In [None]:
# trainer.push_to_hub()

<Tip>

For a more in-depth example of how to finetune a model for causal language modeling, take a look at the corresponding
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).

</Tip>

## 8. Inference

Great, now that you've finetuned a model, you can use it for inference!

Come up with a prompt you'd like to generate text from:

In [None]:
prompt = "Britney và Madonna đã song ca với nhau"

The simplest way to try out your finetuned model for inference is to use it in a [pipeline()](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.pipeline). Instantiate a `pipeline` for text generation with your model, and pass your text to it:

In [None]:
from transformers import pipeline

model_path_local = "/kaggle/working/wiki_New_PiSSA_hoa1b4_no_para_changes/checkpoint-24"
# If you are already logged in to Hugging Face, you can change the model path to the one
# you have on Hugging Face
# model_path_local = output_dir
generator = pipeline("text-generation", model=model_path_local, device="cuda")
generator(prompt)

Tokenize the text and return the `input_ids` as PyTorch tensors:

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vlsp-2023-vllm/hoa-1b4")
inputs = tokenizer(prompt, return_tensors="pt").input_ids

Use the [generate()](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate) method to generate text.
For more details about the different text generation strategies and parameters for controlling generation, check out the [Text generation strategies](https://huggingface.co/docs/transformers/main/en/tasks/../generation_strategies) page.

In [None]:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_path_local)
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
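To give a feel for what `top_k` and `top_p` do during sampling, here is a simplified pure-Python sketch on a toy next-token distribution. It is not the library's implementation: `top_k_top_p_filter` and the probabilities are invented, and it assumes the two filters are applied one after the other, top-k first.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return renormalized probabilities after top-k, then nucleus (top-p) filtering."""
    # Rank token ids by probability, highest first, and keep the top_k best.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id] / k_mass  # renormalized after top-k
        # Stop once the kept tokens cover at least top_p of the remaining mass.
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]  # hypothetical next-token distribution
filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
print(filtered)
```

Sampling then draws the next token from the filtered distribution only, which cuts off the unlikely tail while keeping some variety in the output.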

Decode the generated token ids back into text:

In [None]:
tokenizer.batch_decode(outputs, skip_special_tokens=True)