# Finetuning or RAG for external knowledge?

Let's start by demystifying the two methods:

**Finetuning:** This is the process of taking a pre-trained LLM and further training it on a smaller, specific dataset to adapt it for a particular task or to improve its performance. By finetuning, we are adjusting the model’s weights based on our data, making it more tailored to our application’s unique needs.

**RAG:** This approach integrates the power of retrieval (or searching) into LLM text generation. It combines a retriever system, which fetches relevant document snippets from a large corpus, and an LLM, which produces answers using the information from those snippets. In essence, RAG helps the model to “look up” external information to improve its responses.
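
To make the "look up, then generate" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not how NeuralChat implements retrieval: the bag-of-words retriever, the prompt layout, and all function names are assumptions chosen for illustration. A real system would use an embedding model and a vector store, and would pass the final prompt to an LLM.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=1):
    """Return the k snippets most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]

def build_prompt(query, snippets):
    """Place the retrieved snippets in front of the question (hypothetical layout)."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The cnvrg.io was founded by Yochay Ettun and Leah Forkosh Kolben.",
    "Cnvrg.io is a comprehensive ML/AI platform that provides tools and services "
    "to simplify the machine learning and artificial intelligence development process.",
]
query = "Who founded cnvrg.io?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # this augmented prompt is what the LLM would receive
```

Updating the knowledge in this setup only means editing `corpus`; no model weights change.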

Both methods can help an LLM answer domain-specific questions using external information. Several factors affect how well each one performs:

1. Dynamic vs. Static Data
    - RAG: RAG excels in dynamic data environments. It continuously queries external sources, ensuring that the information remains up-to-date without frequent model retraining.

    - Fine-Tuning: A fine-tuned model is a static snapshot of the data it saw during training and may quickly become outdated in dynamic data scenarios. Furthermore, fine-tuning does not guarantee that the model can recall this knowledge, which makes it unreliable as a knowledge store.
    
    - Conclusion: RAG offers agility and up-to-date responses in rapidly evolving data landscapes, making it ideal for projects with dynamic information needs.

2. External Knowledge
    - RAG: RAG is designed to augment LLM capabilities by retrieving relevant information from knowledge sources before generating a response. It's ideal for applications that query databases, documents, or other structured/unstructured data repositories. RAG excels at leveraging external sources to enhance responses.

    - Fine-Tuning: While it's possible to fine-tune an LLM to learn external knowledge, it may not be practical for frequently changing data sources. Training and evaluating models is often difficult and time-consuming.

    - Conclusion: RAG is likely the better option if your application heavily relies on external data sources due to its flexibility and ability to adapt to changing information.

3. Model Customization
    - RAG: RAG primarily focuses on information retrieval and may not inherently adapt its linguistic style or domain-specificity based on the retrieved information. It excels at incorporating external knowledge but may not fully customize the model's behavior or writing style.
    
    - Fine-Tuning: Fine-tuning allows you to adapt an LLM's behavior, writing style, or domain-specific knowledge to specific nuances, tones, or terminologies. It offers deep alignment with particular styles or expertise areas.

    - Conclusion: Fine-tuning offers a more direct route if your application demands specialized writing styles or deep alignment with domain-specific vocabulary and conventions.
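
The three conclusions above can be condensed into a rough heuristic. The sketch below is only an illustration of that reasoning: the function name and the boolean inputs are hypothetical, and a real decision would also weigh cost, latency, and data volume.

```python
def suggest_approaches(data_changes_often, relies_on_external_sources, needs_custom_style):
    """Return the approaches that the three factors above point to.

    The result is a list because the factors can point in different
    directions; how to resolve that is left to the reader.
    """
    suggestions = []
    # Factors 1 and 2: dynamic data and heavy reliance on external sources favor RAG.
    if data_changes_often or relies_on_external_sources:
        suggestions.append("RAG")
    # Factor 3: specialized style or domain vocabulary favors finetuning.
    if needs_custom_style:
        suggestions.append("finetuning")
    return suggestions

print(suggest_approaches(data_changes_often=True,
                         relies_on_external_sources=True,
                         needs_custom_style=False))
```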


Both approaches have their own strengths and weaknesses. By choosing the right one, you can unlock the full potential of your language model and build more reliable AI applications.


Below is an example of finetuning and RAG in a Q&A scenario.

## Prepare Environment

Python 3.9 or higher is recommended.
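
If you are unsure which interpreter the notebook kernel runs on, a quick check such as the following sketch can help. The helper name is hypothetical; only `sys.version_info` comes from the standard library.

```python
import sys

MIN_PYTHON = (3, 9)  # recommended minimum stated above

def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the recommended minimum."""
    return tuple(version_info[:2]) >= minimum

if not python_is_supported():
    print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
          f"is older than the recommended {MIN_PYTHON[0]}.{MIN_PYTHON[1]}")
```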

In [None]:
!pip install intel-extension-for-transformers

Install requirements:

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements.txt
%cd ../../../
!pip uninstall torch -y
!pip install torch

## Finetuning


##### 1. prepare dataset

Convert your dataset into chat format: a JSONL file in which each line is a JSON object with fields like the following (pretty-printed here for readability):

```json
{
    "messages": [
        {
            "from": "human",
            "content": "What is cnvrg.io?"
        },
        {
            "from": "chatbot",
            "content": "Cnvrg.io is a comprehensive ML/AI platform that provides tools and services to simplify the machine learning and artificial intelligence development process."
        }
    ]
}
```
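
One possible way to produce such a file from question/answer pairs is sketched below. The field names (`messages`, `from`, `content`) follow the record shown above; the helper names, the `train.jsonl` file name, and the temporary output directory are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

def to_chat_record(question, answer):
    """Wrap one Q&A pair in the chat format shown above."""
    return {
        "messages": [
            {"from": "human", "content": question},
            {"from": "chatbot", "content": answer},
        ]
    }

def write_chat_jsonl(pairs, path):
    """Write one JSON object per line (JSONL) and return the output path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for question, answer in pairs:
            f.write(json.dumps(to_chat_record(question, answer), ensure_ascii=False) + "\n")
    return path

qa_pairs = [
    ("What is cnvrg.io?",
     "Cnvrg.io is a comprehensive ML/AI platform that provides tools and services "
     "to simplify the machine learning and artificial intelligence development process."),
]

# Hypothetical location; replace it with your own dataset directory.
out = write_chat_jsonl(qa_pairs, Path(tempfile.mkdtemp()) / "train.jsonl")
print(out.read_text(encoding="utf-8"))
```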

##### 2. start finetuning

We employ the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to finetune the LLM efficiently.
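
LoRA keeps the pre-trained weight matrix frozen and learns a low-rank update scaled by `alpha / rank`, so only the two small factor matrices are trained. The sketch below estimates the savings for a single square projection matrix. The helper names are hypothetical, the hidden size of 4096 (that of Llama-2-7b) is used purely as an illustration, and the rank and alpha values mirror the `lora_rank=16` and `lora_alpha=64` settings used in the finetuning cell further down.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)

def lora_scaling(alpha, rank):
    """Factor applied to the low-rank update B @ A before it is added to the frozen weight."""
    return alpha / rank

d = 4096  # illustrative hidden size
full = d * d
lora = lora_trainable_params(d, d, rank=16)

print(f"full matrix: {full:,} params")
print(f"LoRA factors: {lora:,} params ({lora / full:.2%} of the full matrix)")
print(f"update scaling: {lora_scaling(alpha=64, rank=16)}")
```

Under these assumptions, the trainable factors amount to less than 1% of the matrix they adapt.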

Finetune the model on your chat-format dataset.

In [None]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)

data_path = "cnvrg_dataset"
llama2_model_name_or_path = "meta-llama/Llama-2-7b-hf"

model_args = ModelArguments(
    model_name_or_path=llama2_model_name_or_path,
    use_fast_tokenizer=False,
)

data_args = DataArguments(
    dataset_name=data_path,
    max_seq_length=1024,
    max_source_length=512,
    preprocessing_num_workers=4,
    validation_split_percentage=0,
)

training_args = TrainingArguments(
    output_dir="./llama_peft_finetuned_model",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=False,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    num_train_epochs=8,
    save_strategy="steps",
    save_steps=1000,
    log_level="info",
    logging_steps=10,
    save_total_limit=2,
    bf16=True,
)

finetune_args = FinetuningArguments(
    lora_alpha=64,
    lora_rank=16,
    lora_dropout=0.05,
    lora_all_linear=True,
    do_lm_eval=True,
    task="chat"
)

finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)

from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
finetune_model(finetune_cfg)
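
To reason about the batch-related settings above, it can help to estimate how many optimizer steps a run will take. The following is an approximation under stated assumptions: the helper is hypothetical, the dataset size of 1000 examples is made up, a single device is assumed, and the exact count reported by `transformers` may differ slightly because of rounding.

```python
import math

def estimate_optimizer_steps(num_examples, per_device_batch_size,
                             grad_accum_steps, epochs, num_devices=1):
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # Number of mini-batches the data loader yields per epoch.
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer step is taken every grad_accum_steps mini-batches.
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return effective_batch, steps_per_epoch * epochs

# Values taken from the TrainingArguments above; num_examples is hypothetical.
effective_batch, total_steps = estimate_optimizer_steps(
    num_examples=1000, per_device_batch_size=2, grad_accum_steps=2, epochs=8
)
print(f"effective batch size: {effective_batch}, total optimizer steps: {total_steps}")
```

With `save_steps=1000`, a much smaller dataset would never reach a step-based checkpoint during training, so it may be worth lowering `save_steps` for small datasets.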

##### 3. inference with the finetuned model 

In [None]:
from intel_extension_for_transformers.neural_chat.models.model_utils import load_model, predict_stream
from transformers import set_seed
set_seed(27)


base_model_path = "meta-llama/Llama-2-7b-hf"
peft_model_path = "./llama_peft_finetuned_model"

load_model(
    model_name=base_model_path,
    tokenizer_name=base_model_path,
    peft_path=peft_model_path,
)

template = """
### System:
- You are a helpful assistant chatbot trained by Intel.
- You answer questions.
- You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- You are more than just an information source, you are also able to write poetry, short stories, and make jokes.</s>
### User:
{}</s>
### Assistant:
"""

query = "What is cnvrg.io?"

params = {
    "prompt": template.format(query),
    "model_name": base_model_path,
    "use_cache": True,
    "repetition_penalty": 1.0,
    "temperature": 0.1,
    "top_k": 10,
    "top_p": 0.75,
    "num_beams": 0,
    "max_new_tokens": 128,
}

for new_text in predict_stream(**params):
    print(new_text, end="", flush=True)
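
The `temperature`, `top_k`, and `top_p` entries in `params` control how the next token is sampled. As a rough illustration of what such settings usually do, the sketch below applies temperature scaling, then top-k, then top-p filtering to a toy distribution. It is a simplified stand-in, not the code path of `predict_stream`; the function name and the toy logits are hypothetical.

```python
import math

def filter_next_token_probs(logits, temperature=0.1, top_k=10, top_p=0.75):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    # Temperature: dividing logits by a value below 1 sharpens the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # stable softmax numerators

    # Top-k: keep only the k most likely tokens, then renormalize.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(w for _, w in ranked)
    ranked = [(tok, w / total) for tok, w in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0}  # made-up values
sharp = filter_next_token_probs(toy_logits, temperature=0.1, top_k=10, top_p=0.75)
flat = filter_next_token_probs(toy_logits, temperature=1.0, top_k=10, top_p=0.75)
print("temperature=0.1 keeps:", list(sharp))
print("temperature=1.0 keeps:", list(flat))
```

In this toy case the low temperature used above leaves a single candidate token, which is why such settings tend to give near-deterministic answers.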

## RAG

##### 1. prepare dataset

For the dataset format used by RAG, refer to: https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.jsonl

For the example below, you can set the content of `doc` to "The cnvrg.io was founded by Yochay Ettun and Leah Forkosh Kolben."

##### 2. build RAG pipeline

In [None]:
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat import plugins
from intel_extension_for_transformers.neural_chat import build_chatbot
plugins.retrieval.enable = True
plugins.retrieval.args['embedding_model'] = "hkunlp/instructor-large"
plugins.retrieval.args['process'] = False

plugins.retrieval.args["input_path"] = './cnvrg_docs_rag/'
plugins.retrieval.args["persist_dir"] = "./test_dir"
plugins.retrieval.args["response_template"] = "check the result"
plugins.retrieval.args['search_type'] = "similarity_score_threshold"
plugins.retrieval.args['append'] = False
plugins.retrieval.args['search_kwargs'] = {"score_threshold": 0.8, "k": 1}
config = PipelineConfig(model_name_or_path="meta-llama/Llama-2-7b-hf", plugins=plugins)

chatbot = build_chatbot(config)

response = chatbot.predict("Who are the founders of cnvrg.io?")
print('response',response)
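
The `search_type` and `search_kwargs` settings above suggest that only documents whose similarity score reaches 0.8 are considered, and that at most one of them is returned. The sketch below shows that selection logic on its own, assuming semantics similar to score-threshold retrievers in common vector-store libraries. This is an assumption and not a description of the plugin's internals; the function name and the example scores are hypothetical.

```python
def select_by_score(scored_docs, score_threshold=0.8, k=1):
    """Keep documents scoring at least score_threshold, then return the best k."""
    passing = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return passing[:k]

# Made-up similarity scores for illustration.
candidates = [
    ("The cnvrg.io was founded by Yochay Ettun and Leah Forkosh Kolben.", 0.91),
    ("An unrelated snippet.", 0.40),
]
print(select_by_score(candidates))
print(select_by_score([("An unrelated snippet.", 0.40)]))  # nothing passes the threshold
```

A high threshold trades recall for precision: when no document passes, the model receives no retrieved context at all.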