# Low-rank fine-tuning of FLAN-T5 on the ROSA dataset

Full fine-tuning, where every parameter of the model is updated, becomes expensive for very large models. For perspective, a 16 GB T4 GPU can barely fine-tune a 1B-parameter model, and large models like GPT-3 have 175B parameters or more. This notebook therefore explores a more efficient way of adapting a model.

In this notebook we will see how to use `peft`, `transformers`, and `bitsandbytes` to fine-tune FLAN-T5 (the code below loads a sharded `flan-t5-xl` checkpoint). The `peft` package lets us use low-rank adaptation (LoRA), which trains less than 1% of the original model's parameters. The [paper](https://arxiv.org/abs/2106.09685) reports that this is comparable to full fine-tuning.

More information about the implementation of `peft` can be found [here](https://github.com/huggingface/peft). This approach allows us to fine tune a 3B parameter model on a single T4 GPU.
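To make the low-rank idea concrete, the sketch below applies a LoRA-style update to a single weight matrix in plain Python. It is a toy illustration rather than the `peft` implementation: the matrix sizes, the helper names (`matvec`, `lora_forward`, `lora_param_counts`), and the values are all assumptions chosen for readability.

```python
# Toy LoRA forward pass: h = W x + (alpha / r) * B (A x).
# W stays frozen; only A (r x d_in) and B (d_out x r) would be trained.

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank correction."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_counts(d_in, d_out, r):
    """Return (frozen weights in W, trainable weights in A and B)."""
    return d_in * d_out, r * (d_in + d_out)


# Hypothetical 3x3 layer with rank 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, 0.5, 0.5]]        # r x d_in
B = [[0.0], [0.0], [0.0]]    # d_out x r, zero here so the output equals W x
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B, x, alpha=2, r=1))
print(lora_param_counts(2048, 2048, 16))
```

For a 2048 x 2048 projection at rank 16, the adapter adds 65,536 trainable weights next to 4,194,304 frozen ones, roughly 1.6% of that one matrix. Since only a few matrices in the model receive adapters, the share over the whole model is smaller still.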

As an experiment, we will try finetuning it on the ROSA QA dataset.

Inspired by sources [1](https://github.com/huggingface/peft/blob/main/examples/int8_training/Finetune_flan_t5_large_bnb_peft.ipynb) and [2](https://www.philschmid.de/fine-tune-flan-t5-peft).

# Install requirements

In [None]:
## Imports
import os
import torch
from datasets import load_dataset, Dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType
from peft import prepare_model_for_int8_training
from peft import PeftModel, PeftConfig
from transformers import TrainingArguments, Trainer
from transformers import DataCollatorForSeq2Seq
from datasets import concatenate_datasets
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import MarkdownTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
import pandas as pd
import numpy as np

In [None]:
# Select CUDA device index
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Utility functions
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

# Load model and ROSA dataset 

In [2]:
model_name = "ybelkada/flan-t5-xl-sharded-bf16"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [3]:
rosa_df = pd.read_csv("../data/processed/dataset-qa/final-ds.csv").rename(columns={'Unnamed: 0':'id'})
rosa_dt = Dataset.from_pandas(rosa_df)
rosa_dt

Dataset({
    features: ['id', 'Question', 'Context', 'Answer', 'Source'],
    num_rows: 7560
})

In [4]:
rosa_df.sample(5)

| | id | Question | Context | Answer | Source |
|---|---|---|---|---|---|
| 1698 | 4 | What privileges does the admin role need to li... | `iam:ListPolicyTags`: This permission allows t... | The admin role needs the `iam:ListPolicyTags` ... | ../data/processed/dataset-qa/rosa-docs-rosa_ar... |
| 4699 | 7 | How do I configure the monitoring stack? \n | Configuring the monitoring stack: This step in... | To configure the monitoring stack, follow the ... | ../data/processed/dataset-qa/rosa-docs-rosa_ge... |
| 5096 | 21 | How do I manually link an existing gateway to ... | Manually linking an existing gateway to a Gate... | To manually link an existing gateway to a Gate... | ../data/processed/dataset-qa/rosa-docs-service... |
| 5289 | 1 | What is the ROSA CLI used for?\n | ROSA CLI (`rosa`): This method allows users to... | The ROSA CLI is used to create Operator roles ... | ../data/processed/dataset-qa/rosa-docs-rosa_in... |
| 5294 | 6 | How do I deploy a ROSA with AWS Security Token... | Creating a Cluster Using Customizations: Users... | Deploy a ROSA with AWS Security Token Service ... | ../data/processed/dataset-qa/rosa-docs-rosa_in... |


In [22]:
validation_df = pd.read_csv("../../../data/processed/validation_data.csv").rename(columns={'Unnamed: 0':'id'})
validation_dt = Dataset.from_pandas(validation_df)
validation_dt

Dataset({
    features: ['id', 'Question', 'Answer'],
    num_rows: 65
})

In [6]:
validation_df.sample(5)

| | id | Question | Answer |
|---|---|---|---|
| 42 | 42 | What infrastructure is provisioned as part of ... | ROSA makes use of a number of different cloud ... |
| 19 | 19 | Which compliance certifications does ROSA have... | Red Hat OpenShift Service on AWS is currently ... |
| 14 | 14 | Is there a discount if I have unused OCP subsc... | Yes! Customers migrating non-ROSA OpenShift wo... |
| 41 | 41 | How is etcd encryption configured in a ROSA cl... | The same as in OCP. The aescbc cypher is used ... |
| 3 | 3 | What are the differences between Red Hat OpenS... | Everything you need to deploy and manage conta... |


# Preprocess

In [7]:
tokenized_inputs = rosa_dt.map(lambda x: tokenizer(x["Question"],
                                                   truncation=True),
                               batched=True,
                               remove_columns=["Question", "Answer", "Context", "Source"])

tokenized_targets = rosa_dt.map(lambda x: tokenizer(x["Answer"],
                                                   truncation=True),
                               batched=True,
                               remove_columns=["Question", "Answer"])

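# Source length: longest tokenized question; target length: 95th percentile of answer lengths.
# Note: these lengths are measured on the bare Question/Answer text, so the full prompt built
# later (prefix + question + context) may be truncated at max_source_length.
input_lengths = [len(x) for x in tokenized_inputs["input_ids"]]
max_source_length = int(np.percentile(input_lengths, 100))
target_lengths = [len(x) for x in tokenized_targets["input_ids"]]
max_target_length = int(np.percentile(target_lengths, 95))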
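
Capping the target length at the 95th percentile trades a small amount of truncation for much less padding. The sketch below illustrates that trade-off with made-up token lengths. The `lengths` list and both helper names are hypothetical, and the nearest-rank percentile is a simplification of `np.percentile`, which interpolates between values.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def truncation_stats(lengths, cap):
    """Return (share of sequences longer than cap, total pad tokens if all are padded to cap)."""
    truncated = sum(1 for n in lengths if n > cap) / len(lengths)
    padding = sum(max(cap - n, 0) for n in lengths)
    return truncated, padding


# Invented answer lengths with one extreme outlier.
lengths = [12, 15, 18, 20, 22, 25, 27, 30, 35, 40,
           42, 45, 48, 50, 55, 60, 64, 70, 80, 400]

for pct in (95, 100):
    cap = nearest_rank_percentile(lengths, pct)
    truncated, padding = truncation_stats(lengths, cap)
    print(f"pct={pct} cap={cap} truncated={truncated:.0%} padding_tokens={padding}")
```

With a single long outlier, the 100th-percentile cap forces every other sequence to be padded up to the outlier's length, while the 95th-percentile cap truncates only the outlier.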

In [8]:
def preprocess_function(sample, padding="max_length"):
    # add prefix to the input for t5
    inputs = [f"Quesion: ## {q} ##\n Context: ## {c} ##" for q, c in zip(sample["Question"], sample["Context"])]
    # tokenize inputs
    model_inputs = tokenizer(inputs,
                             max_length=max_source_length,
                             padding=padding,
                             truncation=True)
    
    outputs = ["Answer: " + item for item in sample["Answer"]]
    # Tokenize targets with the `text_target` keyword argument
    labels = tokenizer(text_target=outputs,
                       max_length=max_target_length,
                       padding=padding,
                       truncation=True)
    # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
    # padding in the loss.
    if padding == "max_length":
        labels["input_ids"] = [
            [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"]
        ]

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

rosa_tokenized_dataset = rosa_dt.map(preprocess_function,
                                     batched=True,
                                     remove_columns=["Question",
                                                     "Answer",
                                                     "id",
                                                     "Context",
                                                     "Source"])
print(f"Keys of tokenized dataset: {list(rosa_tokenized_dataset.features)}")

Keys of tokenized dataset: ['input_ids', 'attention_mask', 'labels']
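
The `-100` replacement in `preprocess_function` matters because PyTorch's cross-entropy loss skips positions carrying that label by default, so padding does not dilute the training signal. Below is a small standalone sketch of the masking and its effect on a mean loss. The pad id of 0 matches the T5 tokenizer; the token ids and per-token loss values are invented for illustration, and `mean_loss` is a hypothetical helper, not part of any library.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_pad_labels(label_ids, pad_token_id):
    """Replace padding ids with IGNORE_INDEX, mirroring preprocess_function."""
    return [[tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
            for seq in label_ids]


def mean_loss(token_losses, labels):
    """Average per-token losses over positions whose label is not ignored."""
    kept = [loss for loss, lab in zip(token_losses, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept)


labels = [[71, 1525, 1, 0, 0, 0]]      # three real tokens followed by three pads
masked = mask_pad_labels(labels, pad_token_id=0)
print(masked)

token_losses = [2.0, 1.0, 3.0, 0.5, 0.5, 0.5]   # made-up per-token losses
print(mean_loss(token_losses, masked[0]))        # only the first three positions count
```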


# Check model answers before fine tuning 

In [9]:
def gen_answers(question, context=""):
    inp = f"Quesion: ## {question} ##\n Context: ## {context} ##"
    input_ids = tokenizer(inp, return_tensors="pt", truncation=True).input_ids.cuda()
    outputs = model.generate(input_ids=input_ids, max_new_tokens=1000, do_sample=True, top_p=1)
    answer = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
    return (question, answer)
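
Because `gen_answers` calls `generate` with `do_sample=True`, its answers vary from run to run. The `top_p` argument controls nucleus sampling: only the most probable tokens whose combined probability reaches `top_p` stay eligible. The following is a simplified sketch of that filtering step, not the `transformers` implementation; the vocabulary and probabilities are invented.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose total mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        mass += p
        if mass >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1 again.
    return {token: p / mass for token, p in kept.items()}


probs = {"managed": 0.5, "cloud": 0.3, "wiki": 0.15, "banana": 0.05}
print(sorted(top_p_filter(probs, 0.9)))   # the least likely token is dropped
print(sorted(top_p_filter(probs, 1.0)))   # top_p=1 keeps every token
```

With `top_p=1`, as used in this notebook, nothing is filtered out, so even unlikely tokens can be sampled.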

In [10]:
questions = validation_df['Question'].tolist()
real_answers = validation_df['Answer'].tolist()
generated_answers = list()
for q in questions:
    question, answer = gen_answers(q)
    generated_answers.append(answer)  # keep for later comparison with real_answers
    print(f"Question: {question}\n{'---'* 20}")
    print(f"Answer:\n{answer}\n\n")

Question: What is Red Hat OpenShift Service on AWS (ROSA)?
------------------------------------------------------------
Answer:
Red Hat OpenShift Service on AWS (ROSA) is Red Hat Software's cloud-native infrastructure platform to help organizations easily develop, run, and manage applications in the cloud.


Question: Where can I go to get more information/details?
------------------------------------------------------------
Answer:
In order to access details about the location of the site, please access the official site and follow the links with your specific query about the article.


Question: What are the benefits of Red Hat OpenShift Service on AWS (Key Features)?
------------------------------------------------------------
Answer:
OpenShift on AWS has partnered with AWS, enabling it to run Red Hat OpenShift software on AWS.


Question: What are the differences between Red Hat OpenShift Service on AWS and Kubernetes?
------------------------------------------------------------
An

# Training
## Prepare model for training
An int8 model needs some preparation before it can be trained with `peft`. The utility function `prepare_model_for_int8_training` (imported above) does the following:
- Casts all non-`int8` modules to full precision (`fp32`) for stability
- Adds a `forward_hook` to the input embedding layer to enable gradient computation of the input hidden states
- Enables gradient checkpointing for more memory-efficient training

In [8]:
# Define LoRA Config (default parameters)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)
# prepare int-8 model for training
model = prepare_model_for_int8_training(model)

# add LoRA adaptor
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# we want to ignore tokenizer pad token in the loss
label_pad_token_id = -100

# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=label_pad_token_id,
    pad_to_multiple_of=8
)

output_dir="lora-flan-t5-xl"

# Define training args
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # The learning rate for the neural network
    num_train_epochs=5,
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=500,
    save_strategy="epoch",
    save_total_limit=1,
    report_to="tensorboard",
)

# Create Trainer instance
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=rosa_tokenized_dataset,
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!

trainable params: 9437184 || all params: 2859194368 || trainable%: 0.33006444422319176
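
The trainable-parameter count in the output above can be reproduced by hand. The sketch assumes the published FLAN-T5-XL configuration (`d_model=2048`, 32 attention heads of size 64, 24 encoder and 24 decoder blocks). These values are not stated anywhere in this notebook, so treat them as assumptions.

```python
# Assumed FLAN-T5-XL configuration values (not taken from this notebook).
d_model = 2048
inner_dim = 32 * 64          # num_heads * d_kv
num_encoder_layers = 24
num_decoder_layers = 24
r = 16                       # LoRA rank from lora_config

# Each encoder block has one attention module (self-attention);
# each decoder block has two (self-attention and cross-attention).
attention_modules = num_encoder_layers + 2 * num_decoder_layers

# target_modules=["q", "v"] adapts two projections per attention module.
adapted_matrices = attention_modules * 2

# Each adapted projection gains A (r x d_model) and B (inner_dim x r).
params_per_matrix = r * (d_model + inner_dim)

trainable = adapted_matrices * params_per_matrix
all_params = 2859194368      # "all params" value from the cell output
print(trainable, 100 * trainable / all_params)
```

Under these assumptions, 144 adapted matrices with 65,536 adapter weights each give the 9,437,184 trainable parameters reported in the output.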


In [None]:
# train model
trainer.train()

You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|---|---|
| 500 | 0.9481 |
| 1000 | 0.8311 |
| 1500 | 0.7453 |
| 2000 | 0.6976 |
| 2500 | 0.6228 |
| 3000 | 0.6006 |
| 3500 | 0.5468 |
| 4000 | 0.5059 |
| 4500 | 0.4694 |




TrainOutput(global_step=4725, training_loss=0.6541617434869998, metrics={'train_runtime': 12464.4148, 'train_samples_per_second': 3.033, 'train_steps_per_second': 0.379, 'total_flos': 4.56150095364096e+16, 'train_loss': 0.6541617434869998, 'epoch': 5.0})
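
Because `auto_find_batch_size=True` leaves the batch size implicit, it can be recovered from the logged step count. This back-of-the-envelope check uses only numbers from the outputs above and assumes a single GPU with no gradient accumulation.

```python
import math

num_rows = 7560              # training examples in rosa_dt
num_epochs = 5               # num_train_epochs
global_step = 4725           # from TrainOutput
train_runtime = 12464.4148   # seconds, from TrainOutput

steps_per_epoch = global_step // num_epochs
# Batch size consistent with that many optimizer steps per epoch.
batch_size = math.ceil(num_rows / steps_per_epoch)
samples_per_second = num_rows * num_epochs / train_runtime

print(steps_per_epoch, batch_size, round(samples_per_second, 3))
```

The implied batch size of 8 matches the default `per_device_train_batch_size` in `transformers`, which suggests the automatic search did not have to shrink the batch. The derived throughput also agrees with the logged `train_samples_per_second`.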

In [10]:
trainer.save_model('../models/rosa-flan-2')

# Validation

In [None]:
# Helper function for generating answers
def gen_answers(question, context=""):
    inp = f"Quesion: ## {question} ##\n Context: ## {context} ##"
    input_ids = tokenizer(inp, return_tensors="pt", truncation=True).input_ids.cuda()
    outputs = model.generate(input_ids=input_ids, max_new_tokens=1000, do_sample=True, top_p=1)
    answer = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
    return (question, answer)

## Without context
In this section, the input to the model is just the question without any context. 

In [None]:
# Load peft config for pre-trained checkpoint etc.
peft_model_id = "../../../models/rosa-flan-2"
config = PeftConfig.from_pretrained(peft_model_id)

# load base LLM model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, load_in_8bit=True, device_map={"": 0})
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()

print("Peft model loaded")

In [None]:
questions = validation_df['Question'].tolist()
real_answers = validation_df['Answer'].tolist()
generated_answers = list()
for q in questions:
    question, answer = gen_answers(q)
    generated_answers.append(answer)  # keep for later comparison with real_answers
    print(f"Question: {question}\n{'---'* 20}")
    print(f"Answer:\n{answer}\n\n")

Question: What is Red Hat OpenShift Service on AWS (ROSA)?
------------------------------------------------------------
Answer:
Answer: Red Hat OpenShift Service on AWS (ROSA) is a managed service that provides a managed cloud platform and service oriented approach to container application deployment on the Amazon Web Services (AWS) platform. ROSA provides a secure and reliable environment for container applications running on the Amazon Web Services (AWS) cloud platform.


Question: Where can I go to get more information/details?
------------------------------------------------------------
Answer:
Answer: To get more information/details, you can go to the wikiHow site.


Question: What are the benefits of Red Hat OpenShift Service on AWS (Key Features)?
------------------------------------------------------------
Answer:
Answer: Red Hat OpenShift Service on AWS offers a secure, scalable, and reliable environment for running applications on OpenShift. Its features include enhanced secu

## Validation with context from embeddings

In [11]:
# Get all documents and split them into chunks
loader = DirectoryLoader('../../../data/external', glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()
text_splitter = MarkdownTextSplitter(chunk_overlap=0, chunk_size=750)
texts = text_splitter.split_documents(documents)
print(f"{len(documents)} documents were loaded in {len(texts)} chunks")

75 documents were loaded in 7153 chunks
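
As a rough picture of what splitting with `chunk_size=750` and `chunk_overlap=0` means, the sketch below greedily packs paragraph-separated pieces into chunks up to a character budget. It is a simplified stand-in, not the `MarkdownTextSplitter` algorithm (which uses Markdown-aware separators), and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks without overlap.

    Simplification: a single piece longer than chunk_size is kept whole.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "# A\n\naaaa\n\n# B\n\nbbbb"
print(split_into_chunks(sample, chunk_size=10))
```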


Now we create (or load, if it already exists) a vector store from the documents:

In [18]:
# Convert chunks to vectors and store them in a database
embeddings = HuggingFaceEmbeddings()
db_dir = "../../../data/interim"
docsearch = None
if os.path.isdir(os.path.join(db_dir, "index/rosa-t5")):
    # Load the existing vector store
    docsearch = Chroma(persist_directory=db_dir, embedding_function=embeddings)
else:
    # Create a new vector store
    docsearch = Chroma.from_documents(texts, embeddings, persist_directory=db_dir)
    docsearch.persist()

Unable to connect optimized C data functions [No module named '_testbuffer'], falling back to pure Python
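
Conceptually, the vector store embeds every chunk once and, at query time, returns the chunk whose embedding lies closest to the question's embedding. The sketch below mimics that with word-count vectors and cosine similarity. It is an illustration only, not how `HuggingFaceEmbeddings` or `Chroma` work internally, and the sample chunks and helper names are invented.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=1):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "the rosa cli creates operator roles and clusters",
    "etcd encryption uses the aescbc cypher",
    "monitoring stack configuration for user workloads",
]
print(top_k("how is etcd encryption configured", chunks, k=1))
```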


In [34]:
questions = validation_df['Question'].tolist()
real_answers = validation_df['Answer'].tolist()
contexts = [docsearch.similarity_search(q, 1)[0].page_content for q in questions]
generated_answers = list()
for q, c in zip(questions, contexts):
    question, answer = gen_answers(q, c)
    generated_answers.append(answer)  # keep for later comparison with real_answers
    print(f"Question: {question}\n{'---'* 20}")
    print(f"Answer:\n{answer}\n\n")

Question: What is Red Hat OpenShift Service on AWS (ROSA)?
------------------------------------------------------------
Answer:
Answer: Red Hat OpenShift Service on AWS (ROSA) is a fully-managed turnkey application platform that allows you to focus on what matters most, delivering value to your customers by building and deploying applications. Red Hat SRE experts manage the underlying platform so you don’t have to worry about the complexity of infrastructure management.


Question: Where can I go to get more information/details?
------------------------------------------------------------
Answer:
Answer: You can look at the Red Hat OpenShift Cluster Guide on GitHub for more information and a reference page to the OpenShift APIs.


Question: What are the benefits of Red Hat OpenShift Service on AWS (Key Features)?
------------------------------------------------------------
Answer:
Answer: The benefits of Red Hat OpenShift Service on AWS include access and use of Red Hat OpenShift on de

# Conclusion
We fine-tuned FLAN-T5 on ROSA documentation data and evaluated it on the FAQ document. Judging from the sampled outputs, the fine-tuned model produced answers that were more relevant to the domain, which suggests that it picked up domain-specific information and tailored its responses accordingly.

Adding context retrieved through embedding search further refined the answers generated by the fine-tuned model. This suggests that supplying retrieved context at inference time can improve the relevance and accuracy of the answers.

Overall, combining fine-tuning on domain-specific data with embedding-search context is a promising approach to question answering in specialized domains such as ROSA documentation.
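
The comparison above is qualitative. One possible way to put a number on it would be a token-overlap F1 score between each generated answer and its reference answer. The sketch below is a suggestion, not something the notebook computed; `token_f1` is a hypothetical helper and the example strings are invented.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (whitespace tokens, case-insensitive)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("rosa is a managed service",
               "rosa is a fully managed service on aws"))
```

Averaging such a score over the validation questions for each setup (base model, fine-tuned model, fine-tuned model with retrieved context) would give a rough basis for comparing them.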