# Building Efficient RAG Pipelines Using Open-Source LLMs

## Framework Used: [BeyondLLM](https://github.com/aiplanethub/beyondllm)

BeyondLLM is an all-in-one toolkit for experimenting with, evaluating, and deploying Retrieval-Augmented Generation (RAG) systems. It simplifies the process with automated integration, customizable evaluation metrics, and support for a range of Large Language Models (LLMs), with the aim of reducing hallucination risk and improving reliability.

## Quick install

```bash
pip install beyondllm
```

In [None]:
!pip install beyondllm

In [None]:
!pip install llama-index-embeddings-fastembed

> After installation, restart the session.

### Simple RAG

#### Data Source and Chunking

In [None]:
from beyondllm.source import fit

> Load your document. In this example, the data source is a YouTube video.

In [None]:
data = fit(
    path = "https://www.youtube.com/watch?v=qQwiAOQfILY",
    dtype = "youtube",
    chunk_size = 512,  # optional
    chunk_overlap = 0  # optional
)

The feature you're trying to use requires an additional library(s):youtube_transcript_api,llama-index-readers-youtube-transcript. Would you like to install it now? [y/N]: y
['https://www.youtube.com/watch?v=qQwiAOQfILY']
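To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not BeyondLLM's implementation: `chunk_tokens` is a hypothetical helper, and it treats whitespace-separated words as tokens, which is an assumption made only for illustration.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=0):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# Illustrative input: words stand in for tokens (assumption).
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=2):
    print(" ".join(chunk))
```

With `chunk_overlap = 0`, as in the cell above, consecutive chunks share no tokens. A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when an answer straddles a chunk boundary.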


> Load open source embeddings

In [None]:
from beyondllm.embeddings import FastEmbedEmbeddings

In [None]:
embed_model = FastEmbedEmbeddings(model_name="thenlper/gte-large")

> Index your document

In [None]:
from beyondllm.retrieve import auto_retriever

In [None]:
retriever = auto_retriever(
    data = data,
    embed_model = embed_model,
    type = "normal",
    top_k = 3
)
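Conceptually, a `top_k = 3` retriever returns the three chunks whose embeddings are closest to the query embedding. The sketch below shows that idea with cosine similarity on toy vectors. It assumes that the `"normal"` retriever behaves like a plain vector-similarity search, which this notebook does not confirm, and both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, top_k=3):
    """Return indices of the top_k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy 2-D "embeddings" purely for illustration; real embeddings have far more dimensions.
toy_chunks = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k_chunks([1.0, 0.0], toy_chunks, top_k=3))
```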

> Generator: define the Large Language Model

In [None]:
from beyondllm.llms import HuggingFaceHubModel

In [None]:
from getpass import getpass

HF_TOKEN = getpass("Access token:")

In [None]:
llm = HuggingFaceHubModel(
    token = HF_TOKEN,
    model = "mistralai/Mistral-7B-Instruct-v0.2",
    model_kwargs = {
        "max_new_tokens": 1024,
        "temperature": 0.1,
        "top_p": 0.95,
        "repetition_penalty": 1.1,
        "return_full_text": False
    }
)
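The `temperature` and `top_p` values in `model_kwargs` shape how the next token is sampled. The following is a rough, self-contained illustration of the two ideas (temperature scaling and nucleus filtering), not the code that runs on the Hugging Face side. The function names and the toy logits are assumptions made for this sketch.

```python
import math


def temperature_softmax(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print(top_p_filter(temperature_softmax(toy_logits, 0.1), 0.95))
print(top_p_filter(temperature_softmax(toy_logits, 1.0), 0.95))
```

At a temperature of 0.1 almost all probability mass lands on the highest-scoring token, so generation is close to deterministic. That is a reasonable choice for question answering over retrieved context, where varied phrasing matters less than consistency.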

> Create the pipeline

In [None]:
query = "which organization was Tarun part of in GSoC 2023"
prompt = f"<s>[INST] {query} [/INST]"
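If you plan to ask several questions, the instruction wrapper used above can be pulled into a small helper. This is only a convenience sketch: `build_mistral_prompt` is a hypothetical name, and it reproduces exactly the `<s>[INST] ... [/INST]` template from the cell above.

```python
def build_mistral_prompt(query: str) -> str:
    """Wrap a user query in the instruction template used in this notebook."""
    cleaned = query.strip()
    if not cleaned:
        raise ValueError("query must not be empty")
    return f"<s>[INST] {cleaned} [/INST]"


print(build_mistral_prompt("which organization was Tarun part of in GSoC 2023"))
```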

In [None]:
from beyondllm.generator import Generate

In [None]:
pipeline = Generate(question=prompt, llm=llm, retriever=retriever)

In [None]:
print(pipeline.call())

 based on the context provided, Tarun was part of an organization called CM Microscope during Google Summer of Code (GSoC) 2023.


> Evaluation

In [None]:
print(pipeline.get_rag_triad_evals())

Executing RAG Triad Evaluations...
Context relevancy Score: 1.0
This response does not meet the evaluation threshold. Consider refining the structure and content for better clarity and effectiveness.
Answer relevancy Score: 10.0
This response meets the evaluation threshold. It demonstrates strong comprehension and coherence.
Groundness score: 5.0
This response does not meet the evaluation threshold. Consider refining the structure and content for better clarity and effectiveness.
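To compare runs programmatically, the printed report can be turned into a dictionary of scores. The sketch below assumes the report is plain text in the shape shown above, with one `<name> Score: <value>` line per metric. `parse_triad_scores` is a hypothetical helper, not part of BeyondLLM.

```python
import re

# Matches lines such as "Context relevancy Score: 1.0" or "Groundness score: 5.0".
SCORE_LINE = re.compile(r"^(?P<name>.+?)\s+[Ss]core:\s*(?P<value>\d+(?:\.\d+)?)\s*$")


def parse_triad_scores(report: str) -> dict:
    """Extract {metric name: score} from a text report; non-score lines are ignored."""
    scores = {}
    for line in report.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            scores[match.group("name").strip()] = float(match.group("value"))
    return scores


# Report text copied from the output above (threshold messages omitted for brevity).
report = """Executing RAG Triad Evaluations...
Context relevancy Score: 1.0
Answer relevancy Score: 10.0
Groundness score: 5.0
"""
print(parse_triad_scores(report))
```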


## Re-ranking

In [None]:
re_ranker = auto_retriever(
    data = data,
    embed_model = embed_model,
    type = "cross-rerank",  # alternative: "flag-rerank"
    top_k = 3
)

The feature you're trying to use requires a additional libraries:sentence-transformers, torch. Would you like to install it now? [y/N]: y
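Re-ranking is typically a second pass: a first-stage retriever proposes candidates, and a more expensive scorer reorders them against the query. The sketch below shows only that control flow. It substitutes a simple word-overlap score for the cross-encoder model that `"cross-rerank"` presumably loads, so `rerank` and `word_overlap_score` are hypothetical stand-ins and the candidate strings are invented.

```python
def word_overlap_score(query, text):
    """Toy relevance score: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query, candidates, score_fn, top_k=3):
    """Reorder first-stage candidates by score_fn(query, candidate) and keep the best top_k."""
    ordered = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ordered[:top_k]


# Invented candidates standing in for first-stage retrieval results.
candidates = [
    "closed weights model",
    "open source embeddings for retrieval",
    "open data",
]
print(rerank("open source embeddings", candidates, word_overlap_score, top_k=2))
```

In a real pipeline, `score_fn` would be a model that reads the query and the chunk together, which is slower than comparing precomputed embeddings but usually more precise.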


In [None]:
pipeline = Generate(question=prompt, llm=llm, retriever=re_ranker)

In [None]:
print(pipeline.call())

 based on the context and chat history provided, Tarun was part of an organization called CM Microscope during Google Summer of Code (GSoC) 2023.


In [None]:
print(pipeline.get_rag_triad_evals())

Executing RAG Triad Evaluations...
Context relevancy Score: 1.0
This response does not meet the evaluation threshold. Consider refining the structure and content for better clarity and effectiveness.
Answer relevancy Score: 10.0
This response meets the evaluation threshold. It demonstrates strong comprehension and coherence.
Groundness score: 5.0
This response does not meet the evaluation threshold. Consider refining the structure and content for better clarity and effectiveness.


## Fine-Tune Embeddings: Advanced RAG

Notebook-2: [Fine Tune Embeddings](https://colab.research.google.com/drive/1a6f7l3pPtmjJRwOHSscHUHUn1qJ-PmCh?usp=sharing)

Thanks to [Muhammad Taha](https://www.linkedin.com/in/mtaha21/), a student at the University of Southampton and AI Intern at AI Planet, for helping prepare this notebook.

Don't forget to star the repository: [github.com/aiplanethub/beyondllm](https://github.com/aiplanethub/beyondllm/)