<a href="https://colab.research.google.com/github/markoagnes/ait_assessment_rag/blob/main/Copy_of_AIT_RAG_Assessment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG + LLM Assessment

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents - PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * latest scientific papers from arxiv.org,
   * fiction books released,
   * legal documents, or
   * social media posts.

   Make sure that the documents are newer than the training dataset of the applied LLM. (20 points)

2. Create three relevant prompts to the dataset, and one irrelevant prompt. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal should be that without the collected dataset your model is unable to answer the question. If it gives you a good answer, select another question to answer and maybe a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Provide your three relevant and one irrelevant prompts to your RAG system. For the relevant prompts, your RAG system should return relevant answers, and for the irrelevant prompt, an empty answer. (20 points)


In [None]:
# Quote version specifiers so the shell does not treat ">=" as a redirect,
# and send quiet output to /dev/null instead of a file named "null".
!pip install "transformers>=4.32.0" "optimum>=1.12.0" > /dev/null
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ > /dev/null
!pip install -i https://pypi.org/simple/ bitsandbytes > /dev/null
# NOTE: spacy 3.7.4 requires typer<0.10.0, so pip rejects this pin set (see the output below)
!pip install spacy==3.7.4 typer==0.10.0 imageio==2.31.6 pillow==10.1.0
!pip install langchain > /dev/null
!pip install chromadb > /dev/null
!pip install sentence_transformers > /dev/null # ==2.2.2
!pip install unstructured > /dev/null
!pip install pdf2image > /dev/null
!pip install pdfminer.six > /dev/null
!pip install unstructured-pytesseract > /dev/null
!pip install unstructured-inference > /dev/null
!pip install faiss-gpu > /dev/null
!pip install pikepdf > /dev/null
!pip install pypdf > /dev/null
!pip install accelerate
!pip install pillow_heif > /dev/null

Collecting typer==0.10.0
  Downloading typer-0.10.0-py3-none-any.whl (46 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46.0/46.0 kB 849.1 kB/s eta 0:00:00
Collecting pillow==10.1.0
  Downloading Pillow-10.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 15.6 MB/s eta 0:00:00
INFO: pip is looking at multiple versions of spacy to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install spacy==3.7.4 and typer==0.10.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested typer==0.10.0
    spacy 3.7.4 depends on typer<0.10.0 and >=0.3.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: Resolut

In [None]:
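import os
# Kill the kernel process so that Colab restarts the runtime and picks up the newly installed packages
os.kill(os.getpid(), 9)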

In [None]:

import locale
from textwrap import fill

import accelerate
import bitsandbytes
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline, BitsAndBytesConfig
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.document_loaders import UnstructuredURLLoader
from langchain.vectorstores.utils import filter_complex_metadata # removes metadata values that are not str, int, float or bool
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

In [None]:
locale.getpreferredencoding = lambda: "UTF-8"

# You need to define your private Hugging Face User Access Token
# to be able to access models whose licence you have accepted.
# Read it from the environment instead of hardcoding it in the notebook.
import os
HUGGINGFACE_UAT = os.environ["HF_TOKEN"]
login(HUGGINGFACE_UAT)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
#model_name = "google/gemma-2b-it" # 2B language model from Google
model_name = "meta-llama/Meta-Llama-3-8B-Instruct" # 8B language model from Meta AI

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.max_new_tokens = 512
gen_cfg.temperature = 0.0000001  # near-zero temperature: for RAG we would like (near-)deterministic answers
gen_cfg.return_full_text = True
gen_cfg.do_sample = True
gen_cfg.repetition_penalty = 1.11

pipe=pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg
)

llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/654 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/51.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
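
As a rough sanity check on why 8-bit loading helps, the memory needed for the weights can be estimated from the parameter count. The sketch below is a back-of-the-envelope calculation, not a measurement: the parameter count is inferred from the shard sizes in the download log above under the assumption that the checkpoint stores 16-bit weights, and the estimate ignores activations, the KV cache and quantization overhead. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# Shard sizes reported in the download log (GB).
shard_sizes_gb = [4.98, 5.00, 4.92, 1.17]

# Assumption: the checkpoint stores 2 bytes (16 bits) per parameter.
n_params = sum(shard_sizes_gb) * 1e9 / 2

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: about {weight_memory_gb(n_params, bits):.1f} GB")
```

Under these assumptions, `load_in_8bit=True` roughly halves the weight footprint compared with 16-bit loading, which is what makes an 8B model practical on a single Colab GPU.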


In [None]:
template_llama3 = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
template=template_llama3

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)
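
The template above hand-writes the Llama 3 header tokens for a single user turn. As a minimal sketch of how the same layout can be assembled programmatically, with an optional system message, consider the helper below. `build_llama3_prompt` is a hypothetical name, not part of LangChain or transformers; in practice `tokenizer.apply_chat_template` is the library route for building chat prompts, and the tokenizer may add `<|begin_of_text|>` on its own.

```python
from typing import Optional

def build_llama3_prompt(user: str, system: Optional[str] = None) -> str:
    """Assemble a single-turn prompt using the Llama 3 header tokens."""
    parts = ["<|begin_of_text|>"]
    if system is not None:
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>")
    # Leave the assistant header open so that the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

print(build_llama3_prompt("What is the name of my dog?"))
```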

In [None]:
text = "What kind of datasets are Katerina Fragkiadaki, Wen-Hsuan Chu and  Lei Ke using to train their model named DreamScene4D introduced in the paper DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What kind of datasets are Katerina
Fragkiadaki, Wen-Hsuan Chu and  Lei Ke using to train their model named DreamScene4D introduced in
the paper DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular
Videos?<|eot_id|><|start_header_id|>assistant<|end_header_id|> According to the paper "DreamScene4D:
Dynamic Multi-Object Scene Generation from Monocular Videos" by Katerina Fragkiadaki, Wen-Hsuan Chu,
and Lei Ke, the authors use a dataset called "Things3D" for training their model.  The Things3D
dataset is a large-scale dataset containing 3D object instances with varying poses, shapes, and
textures. It consists of:  1. **Objects**: The dataset includes 10 categories of objects (e.g.,
chairs, tables, lamps, etc.) with multiple instances. 2. **Poses**: Each object instance has a
corresponding pose annotation, which describes its 6D pose (x, y, z, roll, pitch, yaw) in 3D space.
3. **Textures**: Objects have different textu

In [None]:

text = "What is the overall architecture of MemoryMamba?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is the overall architecture of
MemoryMamba?<|eot_id|><|start_header_id|>assistant<|end_header_id|> MemoryMamba is a Python library
for efficient and scalable memory allocation, which provides a simple and easy-to-use interface to
manage large amounts of data in memory. The overall architecture of MemoryMamba can be broken down
into several key components:  1. **Allocator**: This is the core component responsible for managing
memory allocation and deallocation. It uses a combination of techniques such as slab allocation,
caching, and compression to efficiently allocate and free memory. 2. **Cache**: MemoryMamba
maintains a cache of recently allocated blocks of memory, which allows it to quickly retrieve and
reuse previously allocated memory regions. This reduces the overhead of allocating new memory and
improves performance. 3. **Slab Manager**: The slab manager is responsible for managing the
allocation and deallocation of

In [None]:
text = "What is the Effectiveness of Different Fusion Methods?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is the Effectiveness of Different
Fusion Methods?<|eot_id|><|start_header_id|>assistant<|end_header_id|> The effectiveness of
different fusion methods in data fusion can be evaluated based on various factors such as accuracy,
precision, recall, F1-score, and computational complexity. Here's a brief overview of some common
fusion methods and their strengths and weaknesses:  1. **Simple Averaging (SA)**: This method
involves calculating the average value of multiple sources to produce a fused output.  Strengths:
Easy to implement, fast computation. Weaknesses: Can be sensitive to outliers, may not work well
with heterogeneous data types.  2. **Weighted Averaging (WA)**: This method assigns weights to each
source based on its reliability or importance and calculates the weighted average.  Strengths: More
robust than SA, allows for handling heterogeneous data types. Weaknesses: Requires careful selection
of weights, can still 

In [None]:
text = "What is the name of my dog?"
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is the name of my
dog?<|eot_id|><|start_header_id|>assistant<|end_header_id|> I apologize, but I'm a large language
model, I don't have any information about your dog's name. You haven't told me what it is! Can you
please tell me what your dog's name is? I'd be happy to chat with you about them!
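
All four outputs above echo the prompt before the answer, because the pipeline returns the full text. A minimal post-processing sketch that keeps only the assistant's reply is shown below; `extract_answer` is a hypothetical helper, and it assumes the output contains the Llama 3 assistant header exactly as in the template. Passing `return_full_text=False` to the text-generation pipeline is an alternative worth trying.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"

def extract_answer(generated: str) -> str:
    """Return the text after the last assistant header, without end-of-turn tokens."""
    _, found, tail = generated.rpartition(ASSISTANT_HEADER)
    # If the header is missing, fall back to the whole string.
    answer = tail if found else generated
    return answer.replace("<|eot_id|>", "").strip()

sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is the name of my dog?"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|> I don't have any information "
    "about your dog's name."
)
print(extract_answer(sample))
```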


In [None]:
!wget -O document1.pdf --no-check-certificate "https://arxiv.org/pdf/2405.02280.pdf"
!wget -O document2.pdf --no-check-certificate "https://arxiv.org/pdf/2405.02155.pdf"
!wget -O document3.pdf --no-check-certificate "https://arxiv.org/pdf/2405.03673.pdf"
!wget -O document4.pdf --no-check-certificate "https://arxiv.org/pdf/2405.03689.pdf"
!wget -O document5.pdf --no-check-certificate "https://arxiv.org/pdf/2405.03681.pdf"

--2024-05-07 15:50:28--  https://arxiv.org/pdf/2405.02280.pdf
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.67.42, 151.101.131.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://arxiv.org/pdf/2405.02280 [following]
--2024-05-07 15:50:28--  http://arxiv.org/pdf/2405.02280
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14620106 (14M) [application/pdf]
Saving to: ‘document1.pdf’


2024-05-07 15:50:28 (94.6 MB/s) - ‘document1.pdf’ saved [14620106/14620106]

--2024-05-07 15:50:28--  https://arxiv.org/pdf/2405.02155.pdf
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.67.42, 151.101.131.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://arxiv.org/pdf/2405.02155 [following]
--2024-05-07

In [None]:
from langchain.document_loaders import UnstructuredPDFLoader

# The five PDFs downloaded above: /content/document1.pdf ... /content/document5.pdf
pdf_paths = [f"/content/document{i}.pdf" for i in range(1, 6)]
loaders = [UnstructuredPDFLoader(fn) for fn in pdf_paths]

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# One splitter is enough: 1024-character chunks with a 256-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=256)
chunked_pdf_doc = []

for loader in loaders:
    print("Loading raw document..." + loader.file_path)
    pdf_doc = loader.load()
    # Drop metadata values that are not str, int, float or bool
    updated_pdf_doc = filter_complex_metadata(pdf_doc)
    print("Splitting text...")
    documents = text_splitter.split_documents(updated_pdf_doc)
    chunked_pdf_doc.extend(documents)

len(chunked_pdf_doc)

Loading raw document.../content/document1.pdf
Splitting text...
Loading raw document.../content/document2.pdf
Splitting text...
Loading raw document.../content/document3.pdf
Splitting text...
Loading raw document.../content/document4.pdf
Splitting text...
Loading raw document.../content/document5.pdf
Splitting text...


399
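
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window splitter. It is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, whereas the hypothetical `chunk_text` below cuts at fixed character offsets.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share text. With the notebook's settings (1024 and 256) a sentence cut at a chunk boundary is likely to appear whole in one of the two neighbouring chunks.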

In [None]:
%%time
embeddings = HuggingFaceEmbeddings() # default model_name="sentence-transformers/all-mpnet-base-v2"
db_pdf = FAISS.from_documents(chunked_pdf_doc, embeddings)

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

CPU times: user 10.3 s, sys: 1.15 s, total: 11.4 s
Wall time: 14.8 s


In [None]:
%%time
prompt_template_llama3 = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use the following context to Answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.

{context}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

prompt_template=prompt_template_llama3

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 6.68 µs


In [None]:
%%time
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
Chain_pdf = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db_pdf.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 4, 'score_threshold': 0.2}),
    chain_type_kwargs={"prompt": prompt},
)

CPU times: user 1.07 ms, sys: 668 µs, total: 1.74 ms
Wall time: 1.65 ms
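
The retriever above keeps at most `k` chunks whose relevance score reaches `score_threshold`. The sketch below shows that selection logic on toy vectors. It uses plain cosine similarity as the score, which is an assumption made for illustration: LangChain's FAISS store derives relevance scores from its own distance normalization, so the numbers are not directly comparable. `cosine` and `retrieve` are hypothetical helpers.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, k=4, score_threshold=0.2):
    """Return (index, score) pairs for the top-k documents scoring at least score_threshold."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    kept = [(i, s) for i, s in scored if s >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:k]

docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
print(retrieve([1.0, 0.0], docs))   # a query close to the first two documents
print(retrieve([0.0, -1.0], docs))  # a query that matches nothing above the threshold
```

The second call returns an empty list, which is the case the irrelevant prompt is meant to trigger: no chunk clears the threshold, so no context reaches the LLM.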


In [None]:
%%time
query = "What kind of datasets are Katerina Fragkiadaki, Wen-Hsuan Chu and  Lei Ke using to train their model named DreamScene4D introduced in the paper DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos?"
result = Chain_pdf.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to Answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  3 Approach  To generate dynamic 4D scenes of multiple objects from a monocular video
input, we propose DreamScene4D, which takes Gaussian Splatting [19, 50] as the 4D scene
representation and leverages powerful pre-trained models to generalize to diverse zero-shot
settings.  5 Conclusion  We propose DreamScene4D, the first video-to-4D scene generation work to
take complex multi- object videos as input and generate dynamic 3D scenes across occlusions, large
object motions, and unseen viewpoints with both temporal and spatial consistency. This is achieved
by decomposing the video scene into the background and individual object trajectories, along with a
motion factorization sche

In [None]:
%%time

query = "What is the Effectiveness of Different Fusion Methods?"
result = Chain_pdf.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to Answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  Table 3. Top1, Top3, Top5 results for two-method fusion and our three-method fusion.
Method  CIFAR10  CIFAR100  TinyImageNet  Top1  Top3  Top5  Top1  Top3  Top5  Top1  Top3  M1+M2
83.83% 94.85% 98.22% 48.00% 66.41% 73.27% 43.52% 63.97% 71.79%  M1+M3  92.51% 98.16% 99.35% 71.93%
86.59% 90.50% 73.29% 84.73% 88.41%  M2+M3  79.99% 93.22% 97.41% 64.19% 79.29% 84.63% 72.74% 84.33%
87.30%  Ours (M1+M2+M3)  92.96% 98.32% 99.33% 72.17% 86.55% 90.36% 73.52% 84.95% 88.55%  Table 4.
AUROC results for two-method fusion and our three-method fusion.  Method  CIFAR10 CIFAR100
TinyImageNet  M1+M2 M1+M3 M2+M3 Ours (M1+M2+M3)  97.94% 99.75% 97.26% 99.78%  92.99% 95.87% 94.70%
96.03%  87.03% 96.5

In [None]:
%%time

query = "What is the overall architecture of MemoryMamba?"
result = Chain_pdf.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to Answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  (Zi, ¯Mf )  (cid:34)  log  (cid:80)  esim(Zi, ¯Mf ) ˆMf ∈N esim(Zi, ˆMf )  (cid:35)  ,
where sim(·, ·) denotes a similarity metric (e.g., dot product), and N represents a set of negative
samples drawn from the memory that are not corresponding to Zi.  4.4 Overall Training Objective  The
overall training objective combines the classification loss with the memory-specific losses to train
the MemoryMamba model effectively:  Ltotal = Lcls + λcLcontrastive + λf LNCE,  where λc and λf are
weighting factors that balance the contribution of the contrastive and noise- contrastive estimation
losses, respectively.  5 Experiments  5.1 Dataset  4 MemoryMamba  In this section, we first
elab

In [None]:
%%time

query = "What is the name of my dog?"
result = Chain_pdf.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to Answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  <|eot_id|><|start_header_id|>user<|end_header_id|>  What is the name of my
dog?<|eot_id|><|start_header_id|>assistant<|end_header_id|> I don't have enough information to
answer that question. This conversation has just started and I haven't been told anything about your
dog's name.
CPU times: user 6.37 s, sys: 17.5 ms, total: 6.39 s
Wall time: 6.39 s
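
In the last run the context part of the echoed prompt is empty, which suggests the retriever found no chunk above the score threshold, yet the LLM was still called and produced a sentence. Since the task asks for an empty answer to the irrelevant prompt, one option is to short-circuit before generation. The sketch below shows the idea with toy stand-ins; `answer_or_empty`, `toy_retrieve` and `toy_generate` are hypothetical names, and wiring this into `RetrievalQA` is left as an assumption.

```python
from typing import Callable, Sequence

def answer_or_empty(
    question: str,
    retrieve: Callable[[str], Sequence[str]],
    generate: Callable[[str, Sequence[str]], str],
) -> str:
    """Return "" when retrieval finds nothing; otherwise call the generator."""
    contexts = retrieve(question)
    if not contexts:
        return ""  # irrelevant prompt: skip the LLM call entirely
    return generate(question, contexts)

# Toy stand-ins so that the sketch runs without a model or a vector index.
def toy_retrieve(question: str) -> Sequence[str]:
    return ["MemoryMamba section 4"] if "MemoryMamba" in question else []

def toy_generate(question: str, contexts: Sequence[str]) -> str:
    return f"Answer based on {len(contexts)} chunk(s)."

print(repr(answer_or_empty("What is the name of my dog?", toy_retrieve, toy_generate)))
print(repr(answer_or_empty("What is the overall architecture of MemoryMamba?", toy_retrieve, toy_generate)))
```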
