# RAG + LLM Assessment

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents - PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * latest scientific papers from arxiv.org,
   * fiction books released,
   * legal documents or,
   * social media posts.

   Make sure that the documents are newer then the training dataset of the applied LLM. (20 points)

2. Create three relevant prompts to the dataset, and one irrelevant prompt. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal should be that without the collected dataset your model is unable to answer the question. If it gives you a good answer, select another question to answer and maybe a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Provide your three relevant and one irrelevant prompts to your RAG system. For the relevant prompts, your RAG system should return relevant answers, and for the irrelevant prompt, an empty answer. (20 points)


# Documents and Prompts

The topic that I plan to serve as the knowledge base for my RAG system are latest scientific papers from arxiv. The model that I am using (Llama 2) was trained beteween Janurary 2023 and July 2023, so the selected documents that I have chosen have been created after then. The documents are all from 2024 and are April Fools documents so there are for sure no replicates of this in the original training set.

https://arxiv.org/pdf/2403.19993

https://arxiv.org/pdf/2403.19749

https://arxiv.org/pdf/2403.20219

https://arxiv.org/pdf/2403.20143

https://arxiv.org/pdf/2403.20314

The promps that I plan to use are:
1. Why is FLAMINGO is the perfect name for an array of Cherenkov telescopes? (REAL)
2. How do different messengers affect a person's personality? (REAL)
3. What is the Universe of the birthday achieved by solving the Einstein-Boltzmann equations using CLASS? (REAL)
4. What is Elon Musk's Goldfish name? (FAKE)

## Installing dependencies



In [38]:
!pip install transformers>=4.32.0 optimum>=1.12.0 > null
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ > null
!pip install langchain > null
!pip install chromadb > null
!pip install sentence_transformers > null # ==2.2.2
!pip install unstructured > null
!pip install pdf2image > null
!pip install pdfminer.six > null
!pip install unstructured-pytesseract > null
!pip install unstructured-inference > null
!pip install faiss-gpu > null
!pip install pikepdf > null
!pip install pypdf > null
!pip install accelerate > null
!pip install pillow_heif > null
!pip install -i https://pypi.org/simple/ bitsandbytes > null

Important
------
Restart the kernel after installing the packages.

In [None]:
import os
os.kill(os.getpid(), 9)

# Imports
Next, we import the necessary Python libraries.

In [1]:
from huggingface_hub import login
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline, BitsAndBytesConfig
from textwrap import fill
from langchain.prompts import PromptTemplate
import locale
from langchain.document_loaders import UnstructuredURLLoader
from langchain.vectorstores.utils import filter_complex_metadata # 'filter_complex_metadata' removes complex metadata that are not in str, int, float or bool format
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

locale.getpreferredencoding = lambda: "UTF-8"

# you need to define your private User Access Token from Huggingface
# to be able to access models with accepted licence
HUGGINGFACE_UAT="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"
login(HUGGINGFACE_UAT)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [2]:
model_name = "meta-llama/Llama-2-7b-hf"
model_name = "google/gemma-2b-it"

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.max_new_tokens=512
gen_cfg.temperature=0.0000001 # 0.0 # For RAG we would like to have determenistic answers
gen_cfg.return_full_text=True
gen_cfg.do_sample=True
gen_cfg.repetition_penalty=1.11

pipe=pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg
)

llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [3]:
template_gemma = """
user
{text}
model
"""

template_llama2 = """
user
{text}
model
"""

if "gemma" in model_name:
  template=template_llama2
else:
  template=template_gemma

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)


In [4]:
text = "Why is FLAMINGO is the perfect name for an array of Cherenkov telescopes?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))
print("\n")

text = "How do different messengers affect a person's personality?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))
print("\n")

text = "What is the Universe of the birthday achieved by solving the Einstein-Boltzmann equations using CLASS?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))
print("\n")

text = "What is Elon Musk's Goldfish name?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))
print("\n")

  warn_deprecated(


user Why is FLAMINGO is the perfect name for an array of Cherenkov telescopes? model The perfect
name for an array of Cherenkov telescopes is FLAMINGO.


user How do different messengers affect a person's personality? model The personality of a person is
affected by various messengers, including:  * ****Personal communication:** Personal communication
involves direct interactions between a person and others, such as through phone calls, emails, and
face-to-face conversations. * ****Media:** Media, such as television, radio, and social media,
provides a platform for people to share information and interact with each other. *
****Technology:** Technology, such as smartphones, laptops, and social media platforms, facilitates
communication and allows people to connect with others in various ways. * ****Social groups:**
Social groups, such as clubs, organizations, and communities, provide opportunities for people to
engage with others and participate in shared activities. * ****Cultural inf

## RAG on the web
In this section, we download content from the internet, vectorise it and store the vectors, then search these vectors and generate the answer using the associated text.

In [5]:
web_loader = UnstructuredURLLoader(
    urls=["https://arxiv.org/pdf/2403.19993", "https://arxiv.org/pdf/2403.19749",
          "https://arxiv.org/pdf/2403.20219", "https://arxiv.org/pdf/2403.20143",
          "https://arxiv.org/pdf/2403.20314"], mode="elements", strategy="fast"
    )
web_doc = web_loader.load()
updated_web_doc = filter_complex_metadata(web_doc)

In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=512)
chunked_web_doc = text_splitter.split_documents(updated_web_doc)
len(chunked_web_doc)

881

In [7]:
embeddings = HuggingFaceEmbeddings() # default model_name="sentence-transformers/all-mpnet-base-v2"




In [8]:
%%time

# Create the vectorized db with FAISS

db_web = FAISS.from_documents(chunked_web_doc, embeddings)

# Create the vectorized db with Chroma
# from langchain.vectorstores import Chroma
# db_web = Chroma.from_documents(chunked_web_doc, embeddings)

CPU times: user 2.73 s, sys: 39.7 ms, total: 2.77 s
Wall time: 2.99 s


In [9]:
%%time


prompt_template_gemma = """
user
Use the following context to Answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.

{context}

Question: {question}<end_of_turn>

<start_of_turn>model
"""

prompt_template_llama2 = """
system

Use the following context to answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.

{context}user

{question}assistant
"""

if "gemma" in model_name:
  prompt_template=prompt_template_llama2
else:
  prompt_template=prompt_template_gemma


CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 6.68 µs


In [10]:
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
Chain_web = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db_web.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 5, 'score_threshold': 0.2}),
    chain_type_kwargs={"prompt": prompt},
)


In [11]:
query = "Why is FLAMINGO is the perfect name for an array of Cherenkov telescopes?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))
print("\n \n")

query = "How do different messengers affect a person's personality?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))
print("\n \n")

query = "What is the Universe of the birthday achieved by solving the Einstein-Boltzmann equations using CLASS?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))
print("\n \n")

query = "What is Elon Musk's Goldfish name?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))

system  Use the following context to answer the question at the end. Do not use any other
information. If you can't find the relevant information in the context, just say you don't have
enough information to answer the question. Don't try to make up an answer.  Why FLAMINGO is the
perfect name for an array of Cherenkov telescopes  Overall, using the name FLAMINGO for an array of
Cherenkov telescopes can help to create a more inclu- sive and diverse community of astronomers and
astro- physicists, promoting a more equitable and representa- tive field.  Finally, the name
FLAMINGO can help to make ar- rays of Cherenkov telescopes more accessible and ap- proachable to
people who may not have a background in astrophysics. By using a name that is fun and easy to
remember, the project can help to break down barri-  In this paper, we have argued that FLAMINGO is
the perfect name for an array of Cherenkov telescopes for several reasons. Firstly, the color pink,
which is as- sociated with flaming



system  Use the following context to answer the question at the end. Do not use any other
information. If you can't find the relevant information in the context, just say you don't have
enough information to answer the question. Don't try to make up an answer.  user  What is Elon
Musk's Goldfish name?assistant The context does not provide information about Elon Musk's Goldfish
name, so I cannot answer this question from the context.
