<a href="https://colab.research.google.com/github/phoebejeske/deep-learning/blob/main/assessment7revised.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG + LLM Assessment

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents - PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * latest scientific papers from arxiv.org,
   * fiction books released,
   * legal documents, or
   * social media posts.

   Make sure that the documents are newer than the training dataset of the applied LLM. (20 points)

2. Create three relevant prompts to the dataset, and one irrelevant prompt. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal should be that without the collected dataset your model is unable to answer the question. If it gives you a good answer, select another question to answer and maybe a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Provide your three relevant and one irrelevant prompts to your RAG system. For the relevant prompts, your RAG system should return relevant answers, and for the irrelevant prompt, an empty answer. (20 points)


## Installing dependencies



In [1]:
# Version specifiers are quoted so the shell does not treat ">" as a redirect;
# normal pip output is discarded, errors are still shown.
!pip install "transformers>=4.32.0" "optimum>=1.12.0" > /dev/null
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ > /dev/null
!pip install langchain > /dev/null
!pip install chromadb > /dev/null
!pip install sentence_transformers > /dev/null # ==2.2.2
!pip install unstructured > /dev/null
!pip install pdf2image > /dev/null
!pip install pdfminer.six > /dev/null
!pip install unstructured-pytesseract > /dev/null
!pip install unstructured-inference > /dev/null
!pip install faiss-gpu > /dev/null
!pip install pikepdf > /dev/null
!pip install pypdf > /dev/null
!pip install accelerate > /dev/null
!pip install pillow_heif > /dev/null
!pip install -i https://pypi.org/simple/ bitsandbytes > /dev/null

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imageio 2.31.6 requires pillow<10.1.0,>=8.3.2, but you have pillow 10.3.0 which is incompatible.

In [None]:
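import os
# Kill the current process so that Colab restarts the runtime and picks up the newly installed packages
os.kill(os.getpid(), 9)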
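
After the runtime restarts, it can be useful to confirm which package versions were actually installed, since the pip output above reports dependency conflicts. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the package list is simply a subset of the names from the install cell.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip commands in the install cell
for name in ["transformers", "optimum", "langchain", "faiss-gpu", "bitsandbytes"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A package reported as `not installed` here would explain an `ImportError` in the import cell that follows.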

## Imports


In [1]:
from huggingface_hub import login
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline, BitsAndBytesConfig
from textwrap import fill
from langchain.prompts import PromptTemplate
import locale
from langchain.document_loaders import UnstructuredURLLoader
from langchain.vectorstores.utils import filter_complex_metadata # 'filter_complex_metadata' removes complex metadata that are not in str, int, float or bool format
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.chains import RetrievalQA

locale.getpreferredencoding = lambda: "UTF-8"

# Provide your private Hugging Face User Access Token to access gated models
# whose licence you have accepted. Enter it at the prompt instead of hard-coding it.
from getpass import getpass
HUGGINGFACE_UAT = getpass("Hugging Face token: ")
login(HUGGINGFACE_UAT)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [2]:
model_name = "meta-llama/Meta-Llama-3-8B-Instruct" # 8B language model from Meta AI

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.max_new_tokens=512
gen_cfg.temperature=0.0000001 # near-zero instead of 0.0: for RAG we would like (almost) deterministic answers
gen_cfg.return_full_text=True
gen_cfg.do_sample=True
gen_cfg.repetition_penalty=1.11

pipe=pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg
)

llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
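
The generation settings above keep sampling enabled but set the temperature to a value very close to zero. The sketch below illustrates, with made-up logits for three candidate tokens, why this behaves almost like always picking the top-scoring token. It is a pure-Python illustration of temperature-scaled softmax, not the code path that `transformers` itself runs.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to sampling probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

print(softmax_with_temperature(logits, 1.0))   # probability spread over all three tokens
print(softmax_with_temperature(logits, 1e-7))  # essentially all mass on the first token
```

With a temperature this small, the lower-scoring tokens receive probabilities that underflow to zero, so repeated runs should produce the same text in practice.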


Testing the model with basic prompts.

In [3]:
template = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)
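
The template above hand-writes the Llama 3 instruct markers for a single user turn. As a hedged sketch, the same layout can be assembled by a small helper that also supports an optional system message, as the RAG prompt later in the notebook does. The function name `build_llama3_prompt` is hypothetical, and the exact whitespace (two newlines after each header) is an assumption based on the published Llama 3 chat format.

```python
def build_llama3_prompt(user_message, system_message=None):
    """Assemble a single-turn prompt using the Llama 3 instruct special tokens."""
    parts = ["<|begin_of_text|>"]
    if system_message is not None:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        )
    parts.append(
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
    )
    # Leave the assistant header open so the model writes the reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(build_llama3_prompt("What is the book The Other Half about?"))
```

Building the string in one place keeps the special tokens consistent between the plain test prompt and the RAG prompt.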


In [4]:
text = "What is the book The Other Half about?" # the LLM hallucinates an answer for this
result = llm.invoke(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is the book The Other Half
about?<|eot_id|><|start_header_id|>assistant<|end_header_id|> "The Other Half" is a novel by Elin
Hilderbrand, published in 2017. The story revolves around two families who are connected through
their children's marriage.  The story centers around Melissa and Riley Sorenson, a couple who have
been married for over a decade and have three kids together. However, their seemingly perfect life
takes an unexpected turn when they discover that their daughter, Lily, has been hiding a secret:
she's been having an affair with her husband's best friend, Keaton Connolly.  As the truth comes to
light, both families are forced to confront the consequences of their actions and the secrets
they've kept hidden. The novel explores themes of love, loyalty, trust, and the complexities of
relationships.  Throughout the book, Hilderbrand delves into the inner workings of each character's
mind, revealing their motivat
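
The printed result echoes the full prompt before the model's reply. A minimal sketch for keeping only the reply is shown below; it assumes the output always contains the assistant header used in the template, and the helper name `extract_answer` is hypothetical.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def extract_answer(generated_text):
    """Return only the text after the last assistant header, without the echoed prompt."""
    # rpartition returns the whole string as the last element when the header is missing
    _, _, answer = generated_text.rpartition(ASSISTANT_HEADER)
    return answer.strip()


sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "What is a whale?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "A whale is a marine mammal."
)
print(extract_answer(sample))
```

In this notebook the helper could be applied to `result` before calling `fill`, so that only the answer is wrapped and printed.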

In [14]:
# load 5 books that were only released online in the last week

book_ids = [73550, 73541, 73506, 73505, 73548]  # Project Gutenberg ebook IDs
urls = [f"https://www.gutenberg.org/cache/epub/{i}/pg{i}-images.html" for i in book_ids]
web_loader = UnstructuredURLLoader(urls=urls, mode="elements", strategy="fast")
web_doc = web_loader.load()
updated_web_doc = filter_complex_metadata(web_doc)

In [15]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=512)
chunked_web_doc = text_splitter.split_documents(updated_web_doc)
len(chunked_web_doc)

3229
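
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to split at separators such as paragraph breaks and spaces, so its chunk boundaries will differ. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks that overlap their neighbours."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last four characters of the previous one, which is how the overlap keeps sentences that straddle a boundary retrievable from at least one chunk.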

In [16]:
embeddings = HuggingFaceEmbeddings() # default model_name="sentence-transformers/all-mpnet-base-v2"




In [17]:
%%time

# Create the vectorized db with FAISS

db_web = FAISS.from_documents(chunked_web_doc, embeddings)

CPU times: user 12.9 s, sys: 22.9 ms, total: 12.9 s
Wall time: 13.5 s
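
Conceptually, the vector store embeds every chunk and, at query time, returns the chunks whose vectors lie closest to the query vector. The sketch below shows that idea with made-up two-dimensional vectors and cosine similarity. It is a stand-in for illustration only: FAISS relies on optimised index structures and may use a different distance metric, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return (index, score) pairs for the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d "embeddings"
hits = top_k([1.0, 0.1], doc_vecs, k=2)
print(hits)
```

The real embeddings produced by `HuggingFaceEmbeddings` have hundreds of dimensions, but the ranking principle is the same.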


In [63]:
%%time

prompt_template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use the following context to answer the question at the end. Do not use any other knowledge or information that you may have.
If you can't find the relevant information in the following context, your answer should use no words. Do not try to make up an answer.
For example, if the context is "" and the question is "What is a whale?", you should output
CONTEXT:

{context}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""


CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 8.34 µs


In [64]:
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
Chain_web = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    # Similarity search is the default retrieval mode; set search_type="mmr" to use MMR instead.
    # k is the number of documents returned (default 4).
    # score_threshold sets the minimum relevance score a document needs in order to be returned;
    # it only applies with the "similarity_score_threshold" search type.
    # return_source_documents=True, # optional: also return the source documents used to answer the question
    retriever=db_web.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 10, 'score_threshold': 0.1}),
    chain_type_kwargs={"prompt": prompt},
)
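
The retriever above combines two limits: at most `k` documents, and only those whose relevance score reaches `score_threshold`. The following sketch mimics that selection on made-up scores, to show how the threshold decides whether an off-topic query gets any context at all. The function name and the scores are hypothetical; the actual scores depend on the embedding model and the vector store.

```python
def filter_by_threshold(scored_docs, k, score_threshold):
    """Keep at most k documents whose relevance score reaches the threshold."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Made-up relevance scores for an off-topic query
scored_docs = [("chunk A", 0.18), ("chunk B", 0.12), ("chunk C", 0.05)]

print(filter_by_threshold(scored_docs, k=10, score_threshold=0.1))  # weak matches still pass
print(filter_by_threshold(scored_docs, k=10, score_threshold=0.5))  # nothing passes
```

A threshold as low as 0.1 lets weakly related chunks through, whereas a stricter value makes it more likely that an irrelevant prompt retrieves no context.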


In [11]:
query = "What is Edwin L. Sabin's book The Other Half about?"
result = Chain_web.invoke(query)
result

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'query': "What is Edwin L. Sabin's book The Other Half about?",
 'result': '\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nUse the following context to answer the question at the end. Do not use any other information. If you can\'t find the relevant information in the context, just say you don\'t have enough information to answer the question. Don\'t try to make up an answer.\n\nAuthor: Edwin L. Sabin\n\nTitle: The other half\n\nThe Project Gutenberg eBook of The other half\n\nBy Edwin L. Sabin\n\n*** END OF THE PROJECT GUTENBERG EBOOK THE OTHER HALF ***\n\n*** START OF THE PROJECT GUTENBERG EBOOK THE OTHER HALF ***\n\nThe Other Half\n\n“What? Supposed! Supposing I say there was such a woman—my own wife, sir—my bar sinister—my cross that has ruined my life and made me doubt God and man and woman for half a century. And this half coin! I vowed I’d have it back. When at old Fort Bridger I got word that she had deserted me—deserted me for a scoundrelly half-breed—I swor

In [19]:
print(fill(result['result'].strip(), width=100))

<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  Author: Edwin L. Sabin  Title: The other half  The Project Gutenberg eBook of The other
half  By Edwin L. Sabin  *** END OF THE PROJECT GUTENBERG EBOOK THE OTHER HALF ***  *** START OF THE
PROJECT GUTENBERG EBOOK THE OTHER HALF ***  The Other Half  “What? Supposed! Supposing I say there
was such a woman—my own wife, sir—my bar sinister—my cross that has ruined my life and made me doubt
God and man and woman for half a century. And this half coin! I vowed I’d have it back. When at old
Fort Bridger I got word that she had deserted me—deserted me for a scoundrelly half-breed—I swore
that I’d trail her down till I got back the only bond between us. It’s been my passion; it’s been


In [20]:
%%time

query = "Who are the major characters in Into the Blue?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  Title: Into the blue  Into the Blue  The Project Gutenberg eBook of Into the blue
Prelude 63 -xx- CHAPTER I. Siegfried and Mime 67 II. Hate Hole 79 III. The Mountain Pass 88 IV. The
Walküres’ Rock 95  *** END OF THE PROJECT GUTENBERG EBOOK INTO THE BLUE ***  Clayton, John, 7;
character of   his acting, 174. See Calthrop   Clemenceau, M., 162  Suddenly the light seemed to die
out from the world. All grew dark.  From a black chasm in-28- the rocks rose a woman’s figure in a
strange  halo of blue light. Her face was pale, with a look of deepest mystery  upon it. Lifting her
hand, she spoke in low, solemn tones to Wotan:  It was his face, though, that made him  prominent in
any co

In [23]:
%%time

query = "What is the Rhinegold?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  Rhinegold, Rhinegold,  STORY OF THE RHINEGOLD  THE RHINEGOLD, OR DAS RHEINGOLD  THE
RHINEGOLD, OR DAS RHEINGOLD  In the olden days they had a lovely legend of the formation of the
Rhinegold. They said that the sun’s rays poured down into the Rhine  so brilliantly every day that,
through some magic—no one knew exactly  how—the glowing reflection became bright and beautiful gold,
filled  with great mystic powers because of its glorious origin—the sunshine.  And that was the
beginning of the Rhinegold.  So the enchanted Rhinegold came back to the hands of its first
guardians—the maidens of the river; and, after great sorrow and  turmoil, there was at last peace.
“O Rhinegold! Rhi

In [65]:
%%time

query = "What is artificial intelligence?" # should return a blank output
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other knowledge or information that you may have.  If you can't
find the relevant information in the following context, your answer should use no words. Do not try
to make up an answer. For example, if the context is "" and the question is "What is a whale?", you
should output   CONTEXT:  <|eot_id|><|start_header_id|>user<|end_header_id|>  What is artificial
intelligence?<|eot_id|><|start_header_id|>assistant<|end_header_id|> CONTEXT:
CPU times: user 808 ms, sys: 2.19 ms, total: 811 ms
Wall time: 810 ms
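
In the run above, no context was retrieved for the irrelevant prompt, yet the model still produced the text `CONTEXT:` rather than an empty answer. One possible way to guarantee an empty answer is to skip generation whenever retrieval returns nothing. The sketch below shows that control flow with stub functions; `answer_or_empty`, `retrieve` and `generate` are hypothetical names, and in the notebook they would have to wrap the retriever and the LLM call configured earlier.

```python
def answer_or_empty(question, retrieve, generate):
    """Return an empty string when retrieval finds nothing, else the generated answer."""
    docs = retrieve(question)
    if not docs:
        return ""  # no supporting context: do not call the LLM at all
    context = "\n\n".join(docs)
    return generate(context, question)


# Stub components standing in for the real retriever and LLM
def fake_retrieve(question):
    knowledge = {"rhinegold": ["The Rhinegold is enchanted gold guarded by the river maidens."]}
    return [text for key, texts in knowledge.items() if key in question.lower() for text in texts]


def fake_generate(context, question):
    return f"Based on the context: {context}"


print(repr(answer_or_empty("What is the Rhinegold?", fake_retrieve, fake_generate)))
print(repr(answer_or_empty("What is artificial intelligence?", fake_retrieve, fake_generate)))
```

This moves the "empty answer" requirement out of the prompt wording and into ordinary code, where it does not depend on the model following instructions.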
