### Install necessary dependencies
If you need to uninstall the dependencies, you can run e.g. `!pip uninstall -r ../requirements.txt -y` in the cell below.

In [1]:
!python -m pip install -r ../requirements.txt --quiet

### Make Python scripts accessible
There are a couple of Python scripts in the `../src` directory. We can make them importable by adding that directory to `sys.path`, the list of locations Python searches for modules. We insert the path at position 1, right after the entry at position 0, so that it is scanned early and our scripts are not confused with same-named scripts in unrelated locations. The change is temporary: it lasts only for the current Python session.

In [2]:
import os
import sys

path_to_src = os.path.abspath('../src')

if path_to_src not in sys.path:
    sys.path.insert(1, path_to_src)

### Import dependencies

These are common dependencies, except for `aws_cli`, which is our own script for accessing the Amazon Bedrock service. Bedrock hosts the models we are going to use for text embedding and text generation.

In [2]:
import numpy as np
import pandas as pd
import re
import textwrap

from src.aws_cli import Client

### Naive RAG QA
The solution from the previous notebook has been implemented in the form of a class. That's the only difference.

In [3]:
class NaiveRAGQA:
    def __init__(self):
        self.client = Client()

    def load_knowledge_base(self, dir_path):
        # read all documents
        documents_list = []
        for file_name in os.listdir(dir_path):
            if file_name.endswith('.json'):
                documents_list.append(pd.read_json(os.path.join(dir_path, file_name)))

        # build database
        documents = pd.concat(documents_list, ignore_index=True)

        # build index
        embeddings = documents['content'].apply(self.client.embed_text)

        # store knowledge base
        self.knowledge_base = {'documents': documents, 'index': np.stack(embeddings, axis=0)}

    def _retriever(self, query):
        # embed user query
        query_embedding = self.client.embed_text(query)  # use the same embedding that was used for the knowledge base

        # retrieve most similar document
        similarities = np.dot(self.knowledge_base['index'], query_embedding)
        most_similar_idx = np.argmax(similarities)
        document = self.knowledge_base['documents'].iloc[most_similar_idx]

        return document, similarities[most_similar_idx]

    def _construct_prompt(self, query_text, document_text):
        prompt = textwrap.dedent(
            f'''\
            <s>[INST]Use only the below-given KNOWLEDGE and not prior knowledge to provide an accurate, helpful, concise, and clear answer to the QUERY below.
            Avoid copying word-for-word from the KNOWLEDGE and try to use your own words when possible.

            KNOWLEDGE:
            "{document_text}"

            Answer the QUERY using the provided KNOWLEDGE. Do not provide notes, comments, or explanations.

            QUERY: "{query_text}"
            ANSWER:[/INST]
            '''
        )

        return prompt

    def process_query(self, query):
        document, similarity = self._retriever(query)
        prompt = self._construct_prompt(query, document.content)
        answer = self.client.execute_prompt(prompt)

        return answer, document.url
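The retriever above ranks documents by the raw dot product between the query embedding and each stored embedding. The following is a minimal sketch with made-up 2-D vectors (not real Bedrock embeddings) showing how that ranking can differ from cosine similarity when the vectors are not unit-length. Whether the embeddings returned by `Client.embed_text` are normalized is not stated in this notebook, so treat this as an illustration only.

```python
import numpy as np

# Toy 2-D "embeddings" (made-up numbers, not model output).
index = np.array([
    [3.0, 0.0],  # doc 0: long vector pointing in a different direction than the query
    [0.6, 0.8],  # doc 1: unit vector pointing exactly in the query direction
])
query = np.array([0.6, 0.8])

# Raw dot product, as in _retriever: vector length influences the score.
dot_scores = index @ query

# Cosine similarity: scale every vector to unit length first.
index_unit = index / np.linalg.norm(index, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)
cos_scores = index_unit @ query_unit

print('dot product picks doc', int(np.argmax(dot_scores)))
print('cosine picks doc', int(np.argmax(cos_scores)))
```

With the dot product, the longer vector wins even though its direction matches the query less well. If the embeddings are already normalized, both rankings coincide.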

Create the naive RAG, load the Wikipedia knowledge base, and test it on a couple of examples from the previous notebook.

In [4]:
naive_rag = NaiveRAGQA()
naive_rag.load_knowledge_base('../data/wikipedia_kb/')

query = 'How many employees did Socialbakers have in 2016?'
answer, reference = naive_rag.process_query(query)

print(f'RAG: "{answer}"\n     [ref: {reference}]')

RAG: "In 2016, Socialbakers had 350 employees."
     [ref: https://en.wikipedia.org/wiki/Emplifi]


Let's check how well it works when the question is about something very specific, such as a name. We added *Jan Rus* to the Emplifi Wikipedia page, but it's just one mention in a document that is mostly about the Emplifi company, and there are a couple of other documents whose content contains a lot of *rus* tokens. Let's see if the naive RAG can find him.

In [5]:
query1 = 'Who is Jan Rus?'
query2 = 'Where is Jan Rus working now?'

answer1, reference1 = naive_rag.process_query(query1)
answer2, reference2 = naive_rag.process_query(query2)

print(f'RAG: "{answer1}"\n     [ref: {reference1}]')
print(f'RAG: "{answer2}"\n     [ref: {reference2}]')

RAG: "Based on the provided KNOWLEDGE, there is no information about a person named "Jan Rus." It is possible that there may be a misunderstanding or typo in the query, as "Rus" refers to the first East Slavic state, Kievan Rus', which arose in the 9th century and adopted Orthodox Christianity from the Byzantine Empire in 988. However, there is no mention of any individual named "Jan" in relation to Kievan Rus' or any other part of Russian history in the given KNOWLEDGE."
     [ref: https://en.wikipedia.org/wiki/Russia]
RAG: "Based on the provided KNOWLEDGE, there is no information about Jan Rus or their current employment status. Therefore, I cannot provide an answer to this QUERY."
     [ref: https://en.wikipedia.org/wiki/Belarus]


### Fixed-size chunking
The single mention of the name Jan Rus got lost in a long document about a seemingly unrelated topic: Emplifi.
Let's try to improve the RAG performance with chunking! Chunking may help because the resulting documents, the chunks,
will be much shorter, so the name Jan Rus will carry more weight in the resulting embeddings. Simply splitting the text into fixed-size chunks could be sufficient.

In [7]:
# some text, just for a quick experiment
chunking_test_doc = '''\
Emplifi is an American private company headquartered in Columbus, Ohio. It develops and markets\
customer experience systems. The business was founded in 2020 after social media analytics company Socialbakers was acquired\
by customer experience systems business Astute. The combined entity changed its name to Emplifi.\
'''

print(chunking_test_doc)

# we can start with chunks having 70 characters
# and 15 characters overlap between neighboring chunks
chunk_size = 70
chunk_overlap = 15

def str2chunks(text, chunk_size, chunk_overlap):
    # ensure the chunking settings make sense
    chunk_size = abs(chunk_size)
    chunk_overlap = abs(chunk_overlap)

    if (chunk_size - chunk_overlap) < 1:
        raise Exception('The chunk_size needs to be larger than chunk_overlap')

    return [text[a:a + chunk_size] for a in range(0, len(text), chunk_size - chunk_overlap)]

print(str2chunks(chunking_test_doc, chunk_size, chunk_overlap))

Emplifi is an American private company headquartered in Columbus, Ohio. It develops and marketscustomer experience systems. The business was founded in 2020 after social media analytics company Socialbakers was acquiredby customer experience systems business Astute. The combined entity changed its name to Emplifi.
['Emplifi is an American private company headquartered in Columbus, Ohio', ' Columbus, Ohio. It develops and marketscustomer experience systems. T', 'ence systems. The business was founded in 2020 after social media anal', 'cial media analytics company Socialbakers was acquiredby customer expe', 'y customer experience systems business Astute. The combined entity cha', 'ined entity changed its name to Emplifi.']
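To make the sliding window easier to follow, here is a small sketch that re-implements the same slicing on a short alphabet string. The window starts every `chunk_size - chunk_overlap` characters, so the number of chunks is the text length divided by that step, rounded up. The string and the sizes are chosen only for illustration.

```python
import math


def str2chunks(text, chunk_size, chunk_overlap):
    # distance between the start positions of two neighboring chunks
    step = chunk_size - chunk_overlap
    if step < 1:
        raise ValueError('The chunk_size needs to be larger than chunk_overlap')
    return [text[a:a + chunk_size] for a in range(0, len(text), step)]


text = 'abcdefghijklmnopqrstuvwxyz'  # 26 characters
chunks = str2chunks(text, chunk_size=10, chunk_overlap=4)

print(chunks)
print(len(chunks), math.ceil(len(text) / (10 - 4)))
```

Note that the last chunk can be fully contained in the previous one, because a new window is started at every step even if the previous window has already reached the end of the text.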


Rewrite the `load_knowledge_base` function so that loaded documents are split into chunks before they are vectorized and stored in the knowledge base. Rebuild the knowledge base and ensure its content looks as intended.

In [8]:
def load_knowledge_base(self, dir_path):
    # read all documents
    documents_list = []
    for file_name in os.listdir(dir_path):
        if file_name.endswith('.json'):
            documents_list.append(pd.read_json(os.path.join(dir_path, file_name)))

    # build database
    documents = pd.concat(documents_list, ignore_index=True)

    # split documents into chunks
    documents['chunks'] = documents['content'].apply(str2chunks, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    documents = documents.explode('chunks').reset_index(drop=True)
    documents = documents.drop(columns=['content'])
    documents = documents.rename(columns={'chunks': 'content'})

    # build index
    embeddings = documents['content'].apply(self.client.embed_text)

    # store knowledge base
    self.knowledge_base = {'documents': documents, 'index': np.stack(embeddings, axis=0)}

naive_rag.load_knowledge_base = load_knowledge_base
naive_rag.load_knowledge_base(naive_rag, '../data/wikipedia_kb/')

display(naive_rag.knowledge_base['documents'].head(10))

Unnamed: 0,url,title,content
0,https://en.wikipedia.org/wiki/Russia,Russia,"Russia, or the Russian Federation, is a countr..."
1,https://en.wikipedia.org/wiki/Russia,Russia,g Eastern Europe and North Asia. Russia is the...
2,https://en.wikipedia.org/wiki/Russia,Russia,"country in the world by area, extending across..."
3,https://en.wikipedia.org/wiki/Russia,Russia,ime zones and sharing land borders with fourte...
4,https://en.wikipedia.org/wiki/Russia,Russia,ies. It is the world's ninth-most populous cou...
5,https://en.wikipedia.org/wiki/Russia,Russia,Europe's most populous country. The country's ...
6,https://en.wikipedia.org/wiki/Russia,Russia,s well as its largest city is Moscow. Saint Pe...
7,https://en.wikipedia.org/wiki/Russia,Russia,is Russia's second-largest city and cultural c...
8,https://en.wikipedia.org/wiki/Russia,Russia,ther major cities in the country include Novos...
9,https://en.wikipedia.org/wiki/Russia,Russia,"ekaterinburg, Nizhny Novgorod, Chelyabinsk, Kr..."
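The `explode` step is what turns one row per document into one row per chunk while repeating the `url` and `title` columns. A minimal sketch on a toy DataFrame (the URLs, titles, and texts are placeholders, not data from the knowledge base):

```python
import pandas as pd

# Two placeholder documents; the second one is short enough to fit in a single chunk.
docs = pd.DataFrame({
    'url': ['u1', 'u2'],
    'title': ['A', 'B'],
    'content': ['aaaabbbb', 'cccc'],
})

# Split each text into a list of 4-character chunks (no overlap, to keep the toy readable).
docs['chunks'] = docs['content'].apply(lambda t: [t[i:i + 4] for i in range(0, len(t), 4)])

# One row per list element; the other columns are copied to every new row.
chunked = (
    docs.explode('chunks')
        .reset_index(drop=True)
        .drop(columns=['content'])
        .rename(columns={'chunks': 'content'})
)

print(chunked)
```

The `reset_index(drop=True)` call matters: after `explode`, all chunks of one document share that document's original index label, and the retriever later relies on positions that line up with the rows of the embedding matrix.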


In [9]:
query1 = 'Who is Jan Rus?'
query2 = 'Where is Jan Rus working now?'

answer1, reference1 = naive_rag.process_query(query1)
answer2, reference2 = naive_rag.process_query(query2)

print(f'RAG: "{answer1}"\n     [ref: {reference1}]')
print(f'RAG: "{answer2}"\n     [ref: {reference2}]')

RAG: "Jan Rus is a Research Team Lead.

(Note: The provided KNOWLEDGE was incomplete and cut off, so I used the available information to answer the QUERY concisely and clearly.)"
     [ref: https://en.wikipedia.org/wiki/Emplifi]
RAG: "Based on the provided knowledge, Jan Rus is currently working as a Research Team Lead. However, the specific location or organization where he is working is not mentioned in the given text."
     [ref: https://en.wikipedia.org/wiki/Emplifi]


Better! The RAG can find the name now but not the organization where he works. Why? Let's peek into the knowledge base and look for Jan Rus ...

In [10]:
all_chunks = naive_rag.knowledge_base['documents'][naive_rag.knowledge_base['documents']['content'].str.contains('Jan Rus')]
display(all_chunks)
print(f'"{all_chunks.iloc[0].content}"')

Unnamed: 0,url,title,content
84,https://en.wikipedia.org/wiki/Emplifi,Emplifi,nding over the years. Jan Rus is a Research Te...


"nding over the years. Jan Rus is a Research Team Lead, he is working a"


It seems the chunks are too short, so the information in them can lack context. Let's make them bigger: change `chunk_size` to 250 and `chunk_overlap` to 50 and re-run the experiment.

...

We expected a better result. Let's see what is in the top 6 chunks most relevant to the query. Maybe the right chunk is quite similar, just not the most similar.

In [11]:
# embedding of "Where is Jan Rus working now?"
query_embedding = naive_rag.client.embed_text(query2)

# retrieve 6 most similar documents
similarities = np.dot(naive_rag.knowledge_base['index'], query_embedding)
most_similar_idxs = pd.Series(similarities).sort_values(ascending=False).index.values[0:6]
documents = pd.DataFrame(naive_rag.knowledge_base['documents'].iloc[most_similar_idxs])
documents['similarity'] = similarities[most_similar_idxs]

display(documents)

Unnamed: 0,url,title,content,similarity
84,https://en.wikipedia.org/wiki/Emplifi,Emplifi,nding over the years. Jan Rus is a Research Te...,139.141512
9,https://en.wikipedia.org/wiki/Russia,Russia,"ekaterinburg, Nizhny Novgorod, Chelyabinsk, Kr...",119.20293
0,https://en.wikipedia.org/wiki/Russia,Russia,"Russia, or the Russian Federation, is a countr...",110.334886
122,https://en.wikipedia.org/wiki/Belarus,Belarus,"ia to the east and northeast, Ukraine to the s...",107.426976
153,https://en.wikipedia.org/wiki/Belarus,Belarus,"itics until well into the 1970s, overseeing Be...",105.321153
10,https://en.wikipedia.org/wiki/Russia,Russia,", Kazan, Krasnodar and Rostov-on-Don.\n ...",104.819989


So, the chunking is an improvement and we are quite close to retrieving the right chunk/knowledge for the second query. Let's create a copy of the RAG class and add the chunking because it helps. Then, we can try another approach ...

In [12]:
class BasicRAGQA:
    def __init__(self, chunk_size=250, chunk_overlap=50):
        self.client = Client()

        # ensure the chunking settings make sense
        chunk_size = abs(chunk_size)
        chunk_overlap = abs(chunk_overlap)

        if (chunk_size - chunk_overlap) < 1:
            raise Exception('The chunk_size needs to be larger than chunk_overlap')

        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap


    def load_knowledge_base(self, dir_path):
        # read all documents
        documents_list = []
        for file_name in os.listdir(dir_path):
            if file_name.endswith('.json'):
                documents_list.append(pd.read_json(os.path.join(dir_path, file_name)))

        # build database
        documents = pd.concat(documents_list, ignore_index=True)

        # split documents into chunks
        documents['chunks'] = documents['content'].apply(self._str2chunks)
        documents = documents.explode('chunks').reset_index(drop=True)
        documents = documents.drop(columns=['content'])
        documents = documents.rename(columns={'chunks': 'content'})

        # build index
        embeddings = documents['content'].apply(self.client.embed_text)

        # store knowledge base
        self.knowledge_base = {'documents': documents, 'index': np.stack(embeddings, axis=0)}

    def _str2chunks(self, text):
        return [text[a:a + self.chunk_size] for a in range(0, len(text), self.chunk_size - self.chunk_overlap)]

    def _retriever(self, query):
        # embed user query
        query_embedding = self.client.embed_text(query)  # use the same embedding that was used for the knowledge base

        # retrieve most similar document
        similarities = np.dot(self.knowledge_base['index'], query_embedding)
        most_similar_idx = np.argmax(similarities)
        document = self.knowledge_base['documents'].iloc[most_similar_idx]

        return document, similarities[most_similar_idx]

    def _construct_prompt(self, query_text, document_text):
        prompt = textwrap.dedent(
            f'''\
            <s>[INST]Use only the below-given KNOWLEDGE and not prior knowledge to provide an accurate, helpful, concise, and clear answer to the QUERY below.
            Avoid copying word-for-word from the KNOWLEDGE and try to use your own words when possible.

            KNOWLEDGE:
            "{document_text}"

            Answer the QUERY using the provided KNOWLEDGE. Do not provide notes, comments, or explanations.

            QUERY: "{query_text}"
            ANSWER:[/INST]
            '''
        )

        return prompt

    def process_query(self, query):
        document, similarity = self._retriever(query)
        prompt = self._construct_prompt(query, document.content)
        answer = self.client.execute_prompt(prompt)

        return answer, document.url

### Retrieve more chunks
We can try to use KNN (k nearest neighbors) instead of the current NN (nearest neighbor) to retrieve not 1 but k chunks. We can use all the chunks to augment the prompt sent to the LLM.

In [13]:
class BasicRAGQA:
    def __init__(self, chunk_size=250, chunk_overlap=50, k=8):
        self.client = Client()

        # ensure the chunking settings make sense
        chunk_size = abs(chunk_size)
        chunk_overlap = abs(chunk_overlap)

        if (chunk_size - chunk_overlap) < 1:
            raise Exception('The chunk_size needs to be larger than chunk_overlap')

        k = abs(k)

        if not 0 < k <= 20:
            raise Exception('The k needs to be between 1 and 20')

        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.k = k


    def load_knowledge_base(self, dir_path):
        # read all documents
        documents_list = []
        for file_name in os.listdir(dir_path):
            if file_name.endswith('.json'):
                documents_list.append(pd.read_json(os.path.join(dir_path, file_name)))

        # build database
        documents = pd.concat(documents_list, ignore_index=True)

        # split documents into chunks
        documents['chunks'] = documents['content'].apply(self._str2chunks)
        documents = documents.explode('chunks').reset_index(drop=True)
        documents = documents.drop(columns=['content'])
        documents = documents.rename(columns={'chunks': 'content'})

        # build index
        embeddings = documents['content'].apply(self.client.embed_text)

        # store knowledge base
        self.knowledge_base = {'documents': documents, 'index': np.stack(embeddings, axis=0)}

    def _str2chunks(self, text):
        return [text[a:a + self.chunk_size] for a in range(0, len(text), self.chunk_size - self.chunk_overlap)]

    def _retriever(self, query):
        # embed user query
        query_embedding = self.client.embed_text(query)  # use the same embedding that was used for the knowledge base

        # retrieve the k most similar chunks, ordered from most to least similar
        similarities = np.dot(self.knowledge_base['index'], query_embedding)
        top_k_idxs = np.argpartition(similarities, -self.k)[-self.k:]
        top_k_sorted_idxs = top_k_idxs[np.argsort(similarities[top_k_idxs])][::-1]
        top_k_documents = self.knowledge_base['documents'].iloc[top_k_sorted_idxs]

        return top_k_documents, similarities[top_k_sorted_idxs]

    def _construct_prompt(self, query_text, documents_text):
        prompt = textwrap.dedent(
            '''\
            <s>[INST]Use only the below-given KNOWLEDGE and not prior knowledge to provide an accurate, helpful, concise, and clear answer to the QUERY below.
            Avoid copying word-for-word from the KNOWLEDGE and try to use your own words when possible.

            KNOWLEDGE:
            {texts}

            Answer the QUERY using the provided KNOWLEDGE. Do not provide notes, comments, or explanations.
            After the answer, write a paragraph starting with "References: " followed by the [id] of each reference article containing information needed to answer the query.


            QUERY: "{query_text}"
            ANSWER:[/INST]
            '''
        ).format(
            query_text = query_text,
            texts = '\n\n'.join(['Article [{}]: """\n{}\n"""'.format(idx, text) for idx, text in documents_text.reset_index(drop=True).items()])
        )

        return prompt

    def process_query(self, query):
        documents, similarities = self._retriever(query)
        prompt = self._construct_prompt(query, documents.content)
        answer = self.client.execute_prompt(prompt)

        answer_and_refs = answer.split('References:')
        answer = answer_and_refs[0].strip()

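        llm_references = []
        if len(answer_and_refs) > 1:
            for ref_id in re.findall(r'\[(\d+)\]', answer_and_refs[1]):
                # skip ids that do not match any retrieved chunk
                if int(ref_id) >= len(documents):
                    continue
                doc = documents.iloc[int(ref_id)]
                llm_references.append((doc.title, doc.url, similarities[int(ref_id)]))

            # x[1] is the URL, so references to the same article end up next to each other
            llm_references = sorted(llm_references, key=lambda x: x[1], reverse=True)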

        return answer, llm_references
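The top-k selection in `_retriever` can be tried in isolation. The sketch below uses made-up similarity scores: `np.argpartition` first picks the k largest scores in arbitrary order, and a second, small `np.argsort` orders just those k from most to least similar. The `min` guard is an addition for this sketch, since `np.argpartition` raises an error when k exceeds the number of items.

```python
import numpy as np

# Made-up similarity scores for five chunks.
similarities = np.array([0.2, 0.9, 0.1, 0.7, 0.5])
k = 3

# Guard added for this sketch: never ask for more items than there are.
k = min(k, len(similarities))

# Step 1: indices of the k largest scores, in no particular order.
top_k_idxs = np.argpartition(similarities, -k)[-k:]

# Step 2: order only those k indices from highest to lowest score.
top_k_sorted_idxs = top_k_idxs[np.argsort(similarities[top_k_idxs])][::-1]

print(top_k_sorted_idxs)
print(similarities[top_k_sorted_idxs])
```

A full `np.argsort(similarities)[::-1][:k]` gives the same result here; the two-step version avoids sorting every score, which may matter for a large index.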

Test the effect of retrieving more chunks on the easy example, just to see it works.

In [14]:
basic_rag = BasicRAGQA()
basic_rag.load_knowledge_base('../data/wikipedia_kb/')

query = 'How many employees did Socialbakers have in 2016?'
answer, references = basic_rag.process_query(query)

print(f'RAG: "{answer}"\n     [refs: {references}]')

RAG: "In 2016, Socialbakers had 350 employees."
     [refs: [('Emplifi', 'https://en.wikipedia.org/wiki/Emplifi', 126.11330250122865), ('Emplifi', 'https://en.wikipedia.org/wiki/Emplifi', 82.78630937431362)]]
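The references above are recovered from the `[id]` numbers that the model writes after "References:". Those ids only map back to the right chunks because `_construct_prompt` renumbers the retrieved chunks from 0 with `reset_index(drop=True)`. A small sketch with placeholder chunk texts and made-up index labels shows the effect:

```python
import pandas as pd

# Retrieved chunks keep their knowledge-base index labels (made-up labels here).
retrieved = pd.Series(['chunk about A', 'chunk about B'], index=[84, 9], name='content')

# Same formatting as in _construct_prompt: ids are positions 0..k-1, not the original labels.
texts = '\n\n'.join(
    'Article [{}]: """\n{}\n"""'.format(idx, text)
    for idx, text in retrieved.reset_index(drop=True).items()
)

print(texts)
```

Because the prompt shows positions rather than the original labels, `documents.iloc[int(ref_id)]` in `process_query` picks the chunk the model actually cited.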


And it works on the broken example too!

In [15]:
query1 = 'Who is Jan Rus?'
query2 = 'Where is Jan Rus working now?'

answer1, references1 = basic_rag.process_query(query1)
answer2, references2 = basic_rag.process_query(query2)

print(f'RAG: "{answer1}"\n     [refs: {references1}]')
print(f'RAG: "{answer2}"\n     [refs: {references2}]')

RAG: "Jan Rus is a Research Team Lead who is currently working at Emplifi."
     [refs: [('Emplifi', 'https://en.wikipedia.org/wiki/Emplifi', 99.09258361662546)]]
RAG: "Jan Rus is currently working at Emplifi."
     [refs: [('Emplifi', 'https://en.wikipedia.org/wiki/Emplifi', 86.93297576438992), ('Emplifi', 'https://en.wikipedia.org/wiki/Emplifi', 86.93297576438992)]]


But will this version of RAG QA work with questions that fall outside the KB domain? The first query completely misses the KB: there are no relevant documents. The second query is harder, because it is possible to find a document that is relevant to the "social media" part of the question, and that can confuse the model. The third input is not a question at all.

In [16]:
query1 = 'What is a dog?'
query2 = 'Do dogs love social media?'
query3 = 'Hi!'

answer1, references1 = basic_rag.process_query(query1)
answer2, references2 = basic_rag.process_query(query2)
answer3, references3 = basic_rag.process_query(query3)

print(f'RAG: "{answer1}"\n     [refs: {references1}]')
print(f'RAG: "{answer2}"\n     [refs: {references2}]')
print(f'RAG: "{answer3}"\n     [refs: {references3}]')

RAG: "Query not answerable using provided knowledge."
     [refs: [('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 35.417392358065236), ('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 22.147229189567007)]]
RAG: "Based on the provided knowledge, there is no information regarding dogs and their love for social media."
     [refs: []]
RAG: "Hello! How can I assist you today?"
     [refs: [('Yuval Ben-Itzhak', 'https://en.wikipedia.org/wiki/Yuval_Ben-Itzhak', 20.971611589100025), ('Yuval Ben-Itzhak', 'https://en.wikipedia.org/wiki/Yuval_Ben-Itzhak', 19.230715491606723), ('Yuval Ben-Itzhak', 'https://en.wikipedia.org/wiki/Yuval_Ben-Itzhak', 19.100102402148362), ('Yuval Ben-Itzhak', 'https://en.wikipedia.org/wiki/Yuval_Ben-Itzhak', 18.50406482318339), ('Yuval Ben-Itzhak', 'https://en.wikipedia.org/wiki/Yuval_Ben-Itzhak', 15.78845807208484), ('Emplifi', 'https://en.wikipedia.org/wiki/Emplifi', 16.984539112920615), ('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 20.8585656158881

### "I don't know" state
There are cases when the RAG system doesn't have enough relevant context to respond to the user query reliably. In these cases, the RAG needs to have an option to say "I don't know" to further suppress potential hallucinations. Let's do some more prompt engineering ...

In [17]:
class BasicRAGQA:
    def __init__(self, chunk_size=250, chunk_overlap=50, k=8):
        self.client = Client()

        # ensure the chunking settings make sense
        chunk_size = abs(chunk_size)
        chunk_overlap = abs(chunk_overlap)

        if (chunk_size - chunk_overlap) < 1:
            raise Exception('The chunk_size needs to be larger than chunk_overlap')

        k = abs(k)

        if not 0 < k <= 20:
            raise Exception('The k needs to be between 1 and 20')

        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.k = k


    def load_knowledge_base(self, dir_path):
        # read all documents
        documents_list = []
        for file_name in os.listdir(dir_path):
            if file_name.endswith('.json'):
                documents_list.append(pd.read_json(os.path.join(dir_path, file_name)))

        # build database
        documents = pd.concat(documents_list, ignore_index=True)

        # split documents into chunks
        documents['chunks'] = documents['content'].apply(self._str2chunks)
        documents = documents.explode('chunks').reset_index(drop=True)
        documents = documents.drop(columns=['content'])
        documents = documents.rename(columns={'chunks': 'content'})

        # build index
        embeddings = documents['content'].apply(self.client.embed_text)

        # store knowledge base
        self.knowledge_base = {'documents': documents, 'index': np.stack(embeddings, axis=0)}

    def _str2chunks(self, text):
        return [text[a:a + self.chunk_size] for a in range(0, len(text), self.chunk_size - self.chunk_overlap)]

    def _retriever(self, query):
        # embed user query
        query_embedding = self.client.embed_text(query)  # use the same embedding that was used for the knowledge base

        # retrieve the k most similar chunks, ordered from most to least similar
        similarities = np.dot(self.knowledge_base['index'], query_embedding)
        top_k_idxs = np.argpartition(similarities, -self.k)[-self.k:]
        top_k_sorted_idxs = top_k_idxs[np.argsort(similarities[top_k_idxs])][::-1]
        top_k_documents = self.knowledge_base['documents'].iloc[top_k_sorted_idxs]

        return top_k_documents, similarities[top_k_sorted_idxs]

    def _construct_prompt(self, query_text, documents_text):
        prompt = textwrap.dedent(
            '''\
            <s>[INST]Use only the below-given KNOWLEDGE and not prior knowledge to provide an accurate, helpful, concise, and clear answer to the QUERY below.
            Avoid copying word-for-word from the KNOWLEDGE and try to use your own words when possible.

            KNOWLEDGE:
            {texts}

            Answer the QUERY using only the provided KNOWLEDGE. Don't provide notes, comments, or explanations.
            After the answer, write a paragraph starting with "References: " followed by the [id] of each reference article containing information needed to answer the query.
            If none of the articles from KNOWLEDGE contains information needed to provide a precise answer, or if you are not 100 % sure, reply with string: "DONT_KNOW".

            QUERY: "{query_text}"
            ANSWER:[/INST]
            '''
        ).format(
            query_text = query_text,
            texts = '\n\n'.join(['Article [{}]: """\n{}\n"""'.format(idx, text) for idx, text in documents_text.reset_index(drop=True).items()])
        )

        return prompt

    def process_query(self, query):
        documents, similarities = self._retriever(query)
        prompt = self._construct_prompt(query, documents.content)
        answer = self.client.execute_prompt(prompt)

        answer_and_refs = answer.split('References:')
        answer = answer_and_refs[0].strip()

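        llm_references = []
        if len(answer_and_refs) > 1:
            for ref_id in re.findall(r'\[(\d+)\]', answer_and_refs[1]):
                # skip ids that do not match any retrieved chunk
                if int(ref_id) >= len(documents):
                    continue
                doc = documents.iloc[int(ref_id)]
                llm_references.append((doc.title, doc.url, similarities[int(ref_id)]))

            # x[1] is the URL, so references to the same article end up next to each other
            llm_references = sorted(llm_references, key=lambda x: x[1], reverse=True)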

        if ('DONT_KNOW' in answer) or (len(llm_references) == 0):
            answer = 'I\'m sorry, I don\'t know answer to your query.'
            llm_references = []

        return answer, llm_references
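The post-processing at the end of `process_query` can be tested without calling the model. The sketch below pulls that logic into a standalone helper and feeds it hand-written response strings. The function name `parse_llm_answer` and the sample responses are made up for this illustration; the fallback message is copied from the class above.

```python
import re

FALLBACK = "I'm sorry, I don't know answer to your query."


def parse_llm_answer(raw_answer, n_documents):
    # split the response into the answer text and the part after "References:"
    text, _, refs_part = raw_answer.partition('References:')
    text = text.strip()

    # collect cited ids and drop any that do not match a retrieved chunk
    ref_ids = [int(m) for m in re.findall(r'\[(\d+)\]', refs_part)]
    ref_ids = [i for i in ref_ids if i < n_documents]

    # an explicit DONT_KNOW or an answer without usable references both count as "I don't know"
    if 'DONT_KNOW' in text or not ref_ids:
        return FALLBACK, []

    return text, ref_ids


print(parse_llm_answer('Jan Rus works at Emplifi.\nReferences: [0], [3]', 8))
print(parse_llm_answer('DONT_KNOW', 8))
print(parse_llm_answer('Hello! How can I assist you today?', 8))
```

Treating an answer without references as "I don't know" is what turns the friendly reply to "Hi!" into the fallback message: the model answered, but it could not point to any supporting chunk.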

In [18]:
basic_rag = BasicRAGQA()
basic_rag.load_knowledge_base('../data/wikipedia_kb/')

query1 = 'What is a dog?'
query2 = 'Do dogs love social media?'
query3 = 'Hi!'

answer1, references1 = basic_rag.process_query(query1)
answer2, references2 = basic_rag.process_query(query2)
answer3, references3 = basic_rag.process_query(query3)

print(f'RAG: "{answer1}"\n     [refs: {references1}]')
print(f'RAG: "{answer2}"\n     [refs: {references2}]')
print(f'RAG: "{answer3}"\n     [refs: {references3}]')

RAG: "I'm sorry, I don't know answer to your query."
     [refs: []]
RAG: "I'm sorry, I don't know answer to your query."
     [refs: []]
RAG: "I'm sorry, I don't know answer to your query."
     [refs: []]


Here is one last example of input that can cause issues. Notice that the RAG replied correctly but without any reference to Russia. Why?

In [19]:
query1 = 'Is Russia bigger than Belarus?'  # Elaborate
query2 = 'Which country is bigger, Russia or Belarus?'  # Elaborate

answer1, references1 = basic_rag.process_query(query1)
answer2, references2 = basic_rag.process_query(query2)

print(f'RAG: "{answer1}"\n     [refs: {references1}]')
print(f'RAG: "{answer2}"\n     [refs: {references2}]')

RAG: "Based on the provided knowledge, Russia is bigger than Belarus."
     [refs: [('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 205.95386592863792), ('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 198.6775665542217)]]
RAG: "Based on the provided knowledge, Russia is bigger than Belarus."
     [refs: [('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 187.64516225245112), ('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 185.5182271467039), ('Belarus', 'https://en.wikipedia.org/wiki/Belarus', 161.03730269652505)]]


This example is very confusing for the RAG because we can not only find relevant documents, but they even contain part of the needed information. Unfortunately, the retrieval is not good in this case: the chunk about Russia that could help to answer these questions correctly, with proper reasoning and references, is not retrieved. Can you fix it?
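One possible direction, offered only as a starting point and not verified against this knowledge base, is to limit how many chunks a single article may contribute to the top k, so that chunks from a second article get a chance to appear. The sketch below uses made-up scores and placeholder source names; `top_k_per_source` and `max_per_source` are hypothetical names introduced for this illustration.

```python
import pandas as pd


def top_k_per_source(documents, similarities, k, max_per_source):
    # rank all chunks from most to least similar
    ranked = documents.assign(similarity=similarities).sort_values('similarity', ascending=False)
    # keep at most max_per_source chunks per article, preserving the ranking order
    capped = ranked.groupby('url', sort=False).head(max_per_source)
    return capped.head(k)


# Placeholder chunks: three from one article, one each from two others.
documents = pd.DataFrame({
    'url': ['Belarus', 'Belarus', 'Belarus', 'Russia', 'Emplifi'],
    'content': ['b1', 'b2', 'b3', 'r1', 'e1'],
})
similarities = [0.9, 0.8, 0.7, 0.6, 0.1]  # made-up scores

result = top_k_per_source(documents, similarities, k=3, max_per_source=2)
print(result)
```

Without the cap, the top 3 would all come from the same article. Other options worth trying include splitting the query into one sub-query per entity or raising `k`. Each of them changes what ends up in the prompt, so the earlier examples should be re-run after any change.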