# RAG + LLM Assessment

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents, PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * recent scientific papers from arxiv.org,
   * recently released fiction books,
   * legal documents, or
   * social media posts.

   Make sure that the documents are newer than the training data of the LLM you use. (20 points)

2. Create three prompts that are relevant to the dataset and one that is irrelevant. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal is that, without the collected dataset, the model cannot answer the questions. If it gives a good answer, choose a different question and possibly a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Run your three relevant prompts and one irrelevant prompt through your RAG system. For the relevant prompts, the system should return relevant answers; for the irrelevant prompt, it should return an empty answer. (20 points)


In [1]:
# !pip install "transformers>=4.32.0" "optimum>=1.12.0"  # quoted so the shell does not treat >= as a redirect
# !pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
# !pip install langchain
# !pip install chromadb
# !pip install sentence_transformers  # ==2.2.2
# !pip install unstructured
# !pip install pdf2image
# !pip install pdfminer.six
# !pip install unstructured-pytesseract
# !pip install unstructured-inference
# !pip install faiss-gpu
# !pip install pikepdf
# !pip install pypdf
# !pip install accelerate
# !pip install pillow_heif
# !pip install -i https://pypi.org/simple/ bitsandbytes

In [1]:
from langchain.document_loaders import UnstructuredURLLoader
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.vectorstores.utils import filter_complex_metadata
from langchain_core.embeddings import Embeddings


In [4]:
from ollama import Client

client = Client(host="http://localhost:11434")
model = "dolphin-llama3:latest"

In [5]:
text = "What did Russia and Russia's president, Vladimir Putin, react to comments from the French president, Emmanuel Macron, on western troops fighting in Ukraine and from the British foreign secretary, David Cameron, on using British-supplied weapons against Russia?"
system = """
The assistant is named Dolphin. A helpful and friendly AI assistant,
Dolphin avoids discussing the system message unless directly asked about it.
Use the following context to answer the question at the end. Do not use any other information.
If you can't find the relevant information in the context, just say you don't have enough information to answer the question.
Don't try to make up an answer.
"""
response = client.chat(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system,
        },
        {
            "role": "user",
            "content": text,
        },
    ],
)
response

{'model': 'llama3:instruct',
 'created_at': '2024-05-07T07:35:11.299926279Z',
 'message': {'role': 'assistant',
  'content': "I don't have enough information to answer the question."},
 'done': True,
 'total_duration': 6362812873,
 'load_duration': 3520031665,
 'prompt_eval_count': 147,
 'prompt_eval_duration': 1142343000,
 'eval_count': 12,
 'eval_duration': 1560178000}
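
Step 4 asks for every prompt to be tried without the knowledge base. A minimal sketch of how that check could be repeated for all four prompts is shown below. The names `run_baseline`, `chat_fn`, and `REFUSAL_MARKER` are hypothetical and not part of the notebook; `chat_fn` is assumed to be any callable that takes a question and returns the model's reply as a string.

```python
from typing import Callable

# Phrase the system prompt above asks the model to use when it cannot answer.
REFUSAL_MARKER = "don't have enough information"


def run_baseline(chat_fn: Callable[[str], str], prompts: list[str]) -> dict[str, bool]:
    """Return, for each prompt, whether the model declined to answer."""
    results = {}
    for prompt in prompts:
        answer = chat_fn(prompt)
        # Case-insensitive substring match on the refusal phrase.
        results[prompt] = REFUSAL_MARKER in answer.lower()
    return results
```

With the client from the cell above, a wrapper such as `lambda q: client.chat(model=model, messages=[{"role": "system", "content": system}, {"role": "user", "content": q}])["message"]["content"]` could serve as `chat_fn`. Because the system prompt already tells the model to decline when no context is supplied, a refusal here may reflect that instruction rather than missing knowledge.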

In [2]:
web_loader = UnstructuredURLLoader(
    urls=[
        "https://naomicfisher.substack.com/p/pointing-out-a-problem?utm_source=substack&publication_id=1062989&post_id=139279736&utm_medium=email&utm_content=share&utm_campaign=email-share&triggerShare=true&isFreemail=true&r=36rch9&triedRedirect=true"
    ],
    mode="elements",
    strategy="fast",
)
web_doc = web_loader.load()
updated_web_doc = filter_complex_metadata(web_doc)

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=512)
chunked_web_doc = text_splitter.split_documents(updated_web_doc)
len(chunked_web_doc)

68

In [4]:
[chunk.page_content for chunk in chunked_web_doc]

['Think Again',
 'Share this post',
 'Pointing Out A Problem',
 'naomicfisher.substack.com',
 'Copy link',
 'Facebook',
 'Email',
 'Note',
 'Other',
 'Pointing Out A Problem',
 'Dr Naomi Fisher',
 'May 06, 2024',
 '52',
 'Share this post',
 'Pointing Out A Problem',
 'naomicfisher.substack.com',
 'Copy link',
 'Facebook',
 'Email',
 'Note',
 'Other',
 'Share',
 'I was listening to a podcast about something unrelated to parenting, or school, or any of the things that I usually talk about.\xa0 The guest was talking about his experiences leaving a religion where the leaders were abusive. He raised the alarm as a teenager, thinking that others would act to protect young people.',
 'What happened was that he was called rebellious. The youth leaders told him that he needed to pray more, to be more forgiving, and to stop stirring. \xa0His parents were informed that they should discipline him better.',
 'Thanks for reading Think Again! Subscribe for free to receive new posts and support my wor
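
The output above shows that many chunks are page chrome ('Share this post', 'Copy link', 'Facebook') and that several repeat verbatim. One possible cleanup before embedding is sketched below. The helper `drop_page_chrome` and the `min_chars=40` cutoff are assumptions for illustration, not part of the notebook, and the cutoff would need tuning per site.

```python
def drop_page_chrome(texts: list[str], min_chars: int = 40) -> list[str]:
    """Keep chunks of at least min_chars characters and drop exact duplicates."""
    seen = set()
    kept = []
    for text in texts:
        # Collapse runs of whitespace, including non-breaking spaces (\xa0).
        normalized = " ".join(text.split())
        if len(normalized) < min_chars or normalized in seen:
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept
```

Applied to `[chunk.page_content for chunk in chunked_web_doc]`, this would keep the narrative paragraphs and discard the short navigation fragments. A length cutoff can also discard short but meaningful lines such as the publication date, so it is a trade-off rather than a safe default.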

In [56]:
%%time

# Create the vectorized db with FAISS


class CEmbeddings(Embeddings):
    """LangChain embeddings adapter backed by the Ollama embeddings endpoint."""

    embed_model = "mxbai-embed-large:latest"

    def embed_documents(self, texts):
        # One request per chunk; returns one vector per input text.
        return [self.embed_query(text) for text in texts]

    def embed_query(self, text):
        # The response holds the vector under the "embedding" key.
        response = client.embeddings(model=self.embed_model, prompt=text)
        return response["embedding"]


embeddings = CEmbeddings()
db_web = FAISS.from_documents(chunked_web_doc, embeddings)

# Create the vectorized db with Chroma
# from langchain.vectorstores import Chroma
# db_web = Chroma.from_documents(chunked_web_doc, embeddings)

CPU times: user 103 ms, sys: 0 ns, total: 103 ms
Wall time: 7.51 s


In [77]:
r = db_web.similarity_search_with_score(
    query="What did Russia and Russia's president, Vladimir Putin, react to comments from the French president, Emmanuel Macron, on western troops fighting in Ukraine and from the British foreign secretary, David Cameron, on using British-supplied weapons against Russia?",
    k=10
)
r

[(Document(page_content='Russia has threatened to strike British military facilities and ordered its military to hold battlefield nuclear weapons drills in a move the Kremlin described as a response to comments from the French president, Emmanuel Macron, on western troops fighting in Ukraine and from the British foreign secretary, David Cameron, on using British-supplied weapons against Russia.', metadata={'page_number': 1, 'parent_id': '50f066053d55891fd234dbd71f0b4178', 'filetype': 'text/html', 'url': 'https://www.theguardian.com/world/article/2024/may/06/russia-to-hold-battlefield-nuclear-drills-after-macron-and-cameron-comments', 'category': 'NarrativeText'}),
  154.67958),
 (Document(page_content='The Russian foreign ministry on Monday also said that Russia would develop new intermediate and short-range missiles, claiming that the decision was spurred by reports that the US was moving similar missile systems to Europe and the Asia-Pacific region.', metadata={'page_number': 1, 'par
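
Step 6 requires an empty answer for the irrelevant prompt. One way to approach this is to gate generation on the retrieval scores, as sketched below. The sketch assumes the scores are distances, where lower means closer, which is how LangChain's FAISS store reports them by default. The helpers `select_context` and `answer_or_empty` and the `max_distance` value are hypothetical. The threshold would have to be calibrated by comparing scores for the relevant and irrelevant prompts.

```python
from typing import Callable


def select_context(scored_chunks: list[tuple[str, float]], max_distance: float) -> list[str]:
    """Keep chunk texts whose distance is at most max_distance (lower means closer)."""
    return [text for text, distance in scored_chunks if distance <= max_distance]


def answer_or_empty(
    scored_chunks: list[tuple[str, float]],
    max_distance: float,
    generate: Callable[[list[str]], str],
) -> str:
    """Return "" when no chunk is close enough; otherwise call generate(context)."""
    context = select_context(scored_chunks, max_distance)
    if not context:
        return ""
    return generate(context)
```

The `(text, distance)` pairs could be built from the search result with `[(doc.page_content, score) for doc, score in r]`, and `generate` would wrap the `client.chat` call.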

In [76]:
text = "What did Russia and Russia's president, Vladimir Putin, react to comments from the French president, Emmanuel Macron, on western troops fighting in Ukraine and from the British foreign secretary, David Cameron, on using British-supplied weapons against Russia?"
sim_res = db_web.similarity_search_with_score(
    query=text,
    k=10
)
embed_res = [x[0].page_content for x in sim_res]

system = f"""
The assistant is named Dolphin. A helpful and friendly AI assistant,
Dolphin avoids discussing the system message unless directly asked about it.
Use the following context to answer the question at the end. Do not use any other information.
If you can't find the relevant information in the context, just say you don't have enough information to answer the question.
Don't try to make up an answer.
{" ".join(embed_res)}
"""

response = client.chat(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system,
        },
        {
            "role": "user",
            "content": text,
        },
    ],
)
response

{'model': 'llama3:instruct',
 'created_at': '2024-05-07T08:07:02.525869831Z',
 'message': {'role': 'assistant',
  'content': 'Russia and President Vladimir Putin reacted by threatening to strike British military facilities, ordering military drills that would practice the use of battlefield nuclear weapons, and announcing plans to develop new intermediate and short-range missiles. This was in response to comments from French President Emmanuel Macron on western troops fighting in Ukraine and from British Foreign Secretary David Cameron on using British-supplied weapons against Russia.'},
 'done': True,
 'total_duration': 20890327371,
 'load_duration': 3440602084,
 'prompt_eval_count': 625,
 'prompt_eval_duration': 3639187000,
 'eval_count': 73,
 'eval_duration': 13665546000}
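
The cell above joins the retrieved chunks with single spaces, so chunk boundaries are not visible to the model. A possible alternative, sketched here under the assumption that numbered and delimited chunks are easier for the model to separate, is a small prompt builder. `build_system_prompt` is a hypothetical helper, not part of the notebook.

```python
def build_system_prompt(instructions: str, chunks: list[str]) -> str:
    """Append numbered, clearly delimited context chunks to the instructions."""
    numbered = [f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)]
    context = "\n\n".join(numbered)
    return f"{instructions.strip()}\n\nContext:\n{context}\n"
```

It could replace the f-string above with `system = build_system_prompt(instructions, embed_res)`, where `instructions` holds the fixed part of the system message.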