<a href="https://colab.research.google.com/github/lilasch/DeepLearning/blob/main/Schisgal_AIT_RAG_Assessment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG + LLM Assessment

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents - PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * latest scientific papers from arxiv.org,
   * fiction books released,
   * legal documents or,
   * social media posts.

   Make sure that the documents are newer than the training dataset of the applied LLM. (20 points)

2. Create three relevant prompts to the dataset, and one irrelevant prompt. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal should be that without the collected dataset your model is unable to answer the question. If it gives you a good answer, select another question to answer and maybe a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Provide your three relevant and one irrelevant prompts to your RAG system. For the relevant prompts, your RAG system should return relevant answers, and for the irrelevant prompt, an empty answer. (20 points)


First, I need to install and import everything.

In [1]:
## Installation
!pip install -U typer==0.9.0 pillow==10.1.0
# Quote the version specifiers so the shell does not treat ">" as a redirect
!pip install "transformers>=4.32.0" "optimum>=1.12.0" > /dev/null
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ > /dev/null
!pip install langchain > /dev/null
!pip install chromadb > /dev/null
!pip install sentence_transformers > /dev/null # ==2.2.2
!pip install unstructured > /dev/null
!pip install pdf2image > /dev/null
!pip install pdfminer.six > /dev/null
!pip install unstructured-pytesseract > /dev/null
!pip install unstructured-inference > /dev/null
!pip install faiss-gpu > /dev/null
!pip install pikepdf > /dev/null
!pip install pypdf > /dev/null
!pip install accelerate > /dev/null
!pip install pillow_heif > /dev/null
!pip install -i https://pypi.org/simple/ bitsandbytes > /dev/null

[0m[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.[0m[31m
[0m

In [None]:
# Kill the kernel so that it restarts and picks up the newly installed packages
import os
os.kill(os.getpid(), 9)

In [1]:
from huggingface_hub import login
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline, BitsAndBytesConfig
from textwrap import fill
from langchain.prompts import PromptTemplate
import locale
from langchain.document_loaders import UnstructuredURLLoader
from langchain.vectorstores.utils import filter_complex_metadata # 'filter_complex_metadata' removes complex metadata that are not in str, int, float or bool format
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
import accelerate
from langchain.chains import RetrievalQA
import bitsandbytes


locale.getpreferredencoding = lambda: "UTF-8"

# You need a private Hugging Face User Access Token to access models
# with an accepted licence. Read it from the HF_TOKEN environment variable
# (assumed to be set beforehand) instead of hardcoding it in the notebook.
import os
HUGGINGFACE_UAT = os.environ["HF_TOKEN"]
login(HUGGINGFACE_UAT)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


1. Suitable Documents: I chose fiction books from gutenberg.org

In [2]:
# "https://www.gutenberg.org/cache/epub/73540/pg73540-images.html",
# "https://www.gutenberg.org/cache/epub/73542/pg73542-images.html",
# "https://www.gutenberg.org/cache/epub/73547/pg73547-images.html",
# "https://www.gutenberg.org/cache/epub/73545/pg73545-images.html",
# "https://www.gutenberg.org/cache/epub/73549/pg73549-images.html"



2. Prompts

In [3]:
relevant1 = "What are the chapters in the Planet of Shame by Bruce Elliot?"
relevant2 = "Describe the protagonist of Out of the Blue by Bertrand W Sinclair."
relevant3 = "Who are all the characters in The Ranch of the Tombstones?"
irrelevant1 = "How do I make chocolate chip cookies?"

3. 5B LLM

In [4]:
model_name = "meta-llama/Meta-Llama-3-8B-Instruct" # 8B language model from Meta AI

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.max_new_tokens=512
gen_cfg.temperature=0.0000001 # 0.0 # For RAG we would like to have deterministic answers
gen_cfg.return_full_text=True
gen_cfg.do_sample=True
gen_cfg.repetition_penalty=1.11

pipe=pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg
)

llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
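
The generation settings above pair a near-zero `temperature` with `do_sample=True`, which in practice behaves almost like greedy decoding. A minimal sketch of why, using a hypothetical `softmax_with_temperature` helper and made-up logits (neither is part of the notebook): dividing the logits by a very small temperature pushes nearly all of the probability mass onto the top-scoring token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax, computed stably by subtracting the max."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

# At temperature 1.0 every token keeps a noticeable share of the probability.
print(softmax_with_temperature(logits, 1.0))

# At the notebook's temperature the distribution collapses onto the best token.
print(softmax_with_temperature(logits, 1e-7))
```

If strictly deterministic output is the goal, setting `do_sample=False` is likely the more direct route, since it skips sampling altogether.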




4. Testing prompts

In [5]:
template_gemma = """
<bos><start_of_turn>user
{text}<end_of_turn>
<start_of_turn>model
"""

template_llama3 = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

if "gemma" in model_name:
  template=template_gemma
else:
  template=template_llama3

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)


In [6]:
text = relevant1
result = llm.invoke(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What are the chapters in the Planet of
Shame by Bruce Elliot?<|eot_id|><|start_header_id|>assistant<|end_header_id|> I apologize, but I
couldn't find any information on a book called "The Planet of Shame" written by Bruce Elliot. It's
possible that the book doesn't exist or is not well-known.  If you could provide more context or
details about the book, such as when it was published or what it's about, I may be able to help you
better. Alternatively, you can also try searching online or checking book databases such as
Goodreads or LibraryThing to see if anyone has listed the book or its contents.


In [7]:
text = relevant2
result = llm.invoke(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  Describe the protagonist of Out of the
Blue by Bertrand W Sinclair.<|eot_id|><|start_header_id|>assistant<|end_header_id|> I apologize, but
I couldn't find any information on a book called "Out of the Blue" written by Bertrand W Sinclair.
It's possible that the book is not well-known or was published under a different title or author
name.  If you could provide more context or details about the book, such as when it was published or
what it's about, I may be able to help you better. Alternatively, if you're looking for information
on a specific character from a different book, feel free to ask and I'll do my best to assist you!


In [8]:
text = relevant3
result = llm.invoke(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  Who are all the characters in The Ranch
of the Tombstones?<|eot_id|><|start_header_id|>assistant<|end_header_id|> I think you meant "The
Searchers"!  The Searchers is a classic Western film directed by John Ford, released in 1956. Here's
a list of main characters:  1. Ethan Edwards (John Wayne): The protagonist, a Civil War veteran who
spends years searching for his niece after she goes missing. 2. Martin Pawley (Jeffrey Hunter): A
young man and Ethan's adopted brother-in-law, who joins him on the search. 3. Reverend Clayton
(Walter Brennan): A kind-hearted minister who helps the two men on their quest. 4. Charlie
McCrosslin (Verna Felton): A widow who lives near the ranch where the family's niece was taken. 5.
Quanah Parker (Henry Brandon): A Comanche chief whose tribe is suspected of taking the missing girl.
6. Laurie Jorgensen (Nancy Olson): The sister of the missing girl, Debbie. 7. Debbie Edwards (Lana
Turner): The missing

In [9]:
text = irrelevant1
result = llm.invoke(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  How do I make chocolate chip
cookies?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Making chocolate chip cookies is a
classic baking endeavor that's easy to accomplish with just a few simple ingredients and some basic
cooking skills. Here's a recipe for traditional chocolate chip cookies that yields soft, chewy, and
delicious results:  Ingredients:  * 2 1/4 cups all-purpose flour * 1 tsp baking soda * 1 tsp salt *
1 cup unsalted butter, at room temperature * 3/4 cup white granulated sugar * 3/4 cup brown sugar *
2 large eggs * 2 teaspoons vanilla extract * 2 cups semi-sweet chocolate chips  Instructions:  1.
Preheat your oven to 375°F (190°C). Line a baking sheet with parchment paper or a silicone mat. 2.
In a medium-sized bowl, whisk together the flour, baking soda, and salt. Set aside. 3. In a large
bowl, use an electric mixer to cream together the butter and sugars until light and fluffy, about
2-3 minutes. 4. Beat
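
Because `return_full_text=True`, every output above echoes the prompt before the model's reply. A minimal sketch of one way to keep only the reply, assuming the Llama 3 chat format used in `template_llama3`; the `extract_answer` helper and the `sample` string are illustrative and not part of the notebook.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"

def extract_answer(generated_text, marker=ASSISTANT_HEADER):
    """Return only the text after the last assistant header, if one is present."""
    _, found, tail = generated_text.rpartition(marker)
    # When the marker is missing, fall back to the whole (stripped) text.
    return tail.strip() if found else generated_text.strip()

sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "How do I make chocolate chip cookies?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n"
    "Making chocolate chip cookies is a classic baking endeavor."
)
print(extract_answer(sample))
```

The same helper could be applied to the `'result'` field of a RAG chain output, since that field also contains the full prompt.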

5. Create LangChain RAG

In [10]:
web_loader = UnstructuredURLLoader(
    urls=["https://www.gutenberg.org/cache/epub/73540/pg73540-images.html",
          "https://www.gutenberg.org/cache/epub/73542/pg73542-images.html",
          "https://www.gutenberg.org/cache/epub/73547/pg73547-images.html",
          "https://www.gutenberg.org/cache/epub/73545/pg73545-images.html",
          "https://www.gutenberg.org/cache/epub/73548/pg73548-images.html"
          ], mode="elements", strategy="fast",
    )
web_doc = web_loader.load()
updated_web_doc = filter_complex_metadata(web_doc)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=512)
chunked_web_doc = text_splitter.split_documents(updated_web_doc)
len(chunked_web_doc)

embeddings = HuggingFaceEmbeddings()  # uses the default embedding model
db_web = FAISS.from_documents(chunked_web_doc, embeddings)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
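
The splitter above is configured with `chunk_size=2048` and `chunk_overlap=512`, so neighbouring chunks share some text. The sketch below illustrates the overlap idea with a plain fixed-size character window. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, and the `chunk_with_overlap` helper is hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share an overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

demo = "abcdefghijklmnopqrstuvwxyz"
for piece in chunk_with_overlap(demo, chunk_size=10, chunk_overlap=4):
    print(piece)
```

Each printed piece starts with the last four characters of the previous one, which is what lets a sentence that straddles a chunk boundary still appear whole in at least one chunk.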



6. Testing RAG

In [11]:
prompt_template_gemma = """
<bos><start_of_turn>user
Use the following context to answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.

{context}

Question: {question}<end_of_turn>

<start_of_turn>model
"""

prompt_template_llama3 = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use the following context to answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.

{context}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

if "gemma" in model_name:
  prompt_template=prompt_template_gemma
else:
  prompt_template=prompt_template_llama3


In [12]:
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
Chain_web = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    # Similarity search is the default way to retrieve documents relevant to a query; MMR is available via search_type="mmr".
    # k sets how many documents are returned (default 4).
    # score_threshold sets the minimum relevance a document needs when search_type="similarity_score_threshold".
    # return_source_documents=True,  # optional: also return the source documents used to answer the question
    retriever=db_web.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 10, 'score_threshold': 0.1}),
    chain_type_kwargs={"prompt": prompt},
)
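
The retriever above combines a top-`k` limit with a minimum relevance score. A minimal sketch of that selection logic on toy 2-D vectors, assuming cosine similarity as the score; the exact relevance score LangChain computes for a FAISS store may differ, and the `cosine_similarity` and `retrieve` helpers are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, k, score_threshold):
    """Return up to k (index, score) pairs with score >= threshold, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]

docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]  # toy stand-ins for chunk embeddings

# A query close to the first two documents returns both, best match first.
print(retrieve([1.0, 0.0], docs, k=2, score_threshold=0.5))

# A query unlike every document passes no scores, so nothing is returned.
print(retrieve([0.0, -1.0], docs, k=2, score_threshold=0.1))
```

An empty result like the second one is what leaves the `{context}` slot of the prompt blank for an unrelated question.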


In [13]:
#relevant prompt 1
query = relevant1
result = Chain_web.invoke(query)
result

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'query': 'What are the chapters in the Planet of Shame by Bruce Elliot?',
 'result': '\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nUse the following context to answer the question at the end. Do not use any other information. If you can\'t find the relevant information in the context, just say you don\'t have enough information to answer the question. Don\'t try to make up an answer.\n\nThe Project Gutenberg eBook of The planet of shame\n\n*** END OF THE PROJECT GUTENBERG EBOOK THE PLANET OF SHAME ***\n\n*** START OF THE PROJECT GUTENBERG EBOOK THE PLANET OF SHAME ***\n\nCHAPTER I\n\nCHAPTER I\n\nCHAPTER I\n\nCHAPTER I\n\nCHAPTER II\n\nCHAPTER II\n\nCHAPTER II<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat are the chapters in the Planet of Shame by Bruce Elliot?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\nI don\'t have enough information to answer the question. The provided text only contains chapter headings, but it does not specify the auth

In [14]:
#relevant prompt 2
query = relevant2
result = Chain_web.invoke(query)
result

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'query': 'Describe the protagonist of Out of the Blue by Bertrand W Sinclair.',
 'result': "\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nUse the following context to answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.\n\nThe Project Gutenberg eBook of Out of the blue\n\nAuthor: Bertrand W. Sinclair\n\nTitle: Out of the blue\n\nJake Blue turned wearily away from the doctor and went toward the office. Spot\r\nEaston went to the livery-stable and in a few minutes he came out driving a\r\ntall, bay horse hitched to a top-buggy. He drove to the sheriff’s doorway,\r\nwhere Blue leaned dejectedly.\n\nJake Blue slammed the door behind him and went down the big room,\r\nhalf-grinning to himself. At least it was some satisfaction to goad Spot\r\nEaston, who was losing prestige about as fast as possible. Ea

In [15]:
#relevant prompt 3
query = relevant3
result = Chain_web.invoke(query)
result

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'query': 'Who are all the characters in The Ranch of the Tombstones?',
 'result': "\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nUse the following context to answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.\n\nThe Ranch of the Tombstones\n\nTitle: The ranch of the tombstones\n\nThe Project Gutenberg eBook of The ranch of the tombstones\n\n*** END OF THE PROJECT GUTENBERG EBOOK THE RANCH OF THE TOMBSTONES ***\n\n*** START OF THE PROJECT GUTENBERG EBOOK THE RANCH OF THE TOMBSTONES ***\n\n“Tombstone ranch.”\n\nThe Tombstone ranch-house door was closed, and there was no sign of life about\r\nthe place. Blue detailed twelve men to circle the place and stop any chance of\r\nescape, while the rest of them, confident in their might, rode straight to the\r\nporch. Nearly every man held a rifle in his ha

In [16]:
##irrelevant prompt 1
query = "How can I bake a good chocolate chip cookie?"
result = Chain_web.invoke(query)
result

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'query': 'How can I bake a good chocolate chip cookie?',
 'result': "\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nUse the following context to answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow can I bake a good chocolate chip cookie?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\nBaking a delicious chocolate chip cookie requires some key steps and ingredients. Here's a basic recipe to get you started:\n\nIngredients:\n\n* 2 1/4 cups all-purpose flour\n* 1 tsp baking soda\n* 1 tsp salt\n* 1 cup unsalted butter, softened\n* 3/4 cup white granulated sugar\n* 3/4 cup brown sugar\n* 2 large eggs\n* 2 teaspoons vanilla extract\n* 2 cups semi-sweet chocolate chips\n\nInstructions:\n\n1. Preheat your oven to 375°F (190°C). Line a bak
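
In the output above the context between the system instructions and `<|eot_id|>` is empty, yet the model still produced a recipe, while step 6 asks for an empty answer to the irrelevant prompt. One possible way to enforce that is to check retrieval before calling the chain. The sketch below uses stand-in functions so that it runs on its own; `answer_or_empty`, `fake_retrieve`, and `fake_chain` are hypothetical, and wiring them to the notebook's retriever and `Chain_web` is an assumption rather than the author's code.

```python
def answer_or_empty(query, retrieve_docs, run_chain):
    """Run the QA chain only when retrieval found at least one document."""
    docs = retrieve_docs(query)
    if not docs:
        return ""  # nothing relevant in the knowledge base: return an empty answer
    return run_chain(query)

# Stand-ins for the real retriever and chain (illustrative only).
def fake_retrieve(query):
    return ["chunk about Jake Blue"] if "Blue" in query else []

def fake_chain(query):
    return f"answer to: {query}"

relevant = "Describe the protagonist of Out of the Blue."
irrelevant = "How can I bake a good chocolate chip cookie?"
print(repr(answer_or_empty(relevant, fake_retrieve, fake_chain)))
print(repr(answer_or_empty(irrelevant, fake_retrieve, fake_chain)))
```

Guarding in code rather than relying on the system prompt alone means the empty answer does not depend on the model following the instruction.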