Now that we have saved the vector store for each superhero, we can go ahead and build the RAG-based chatbot. In this section, we will build the chatbot using the Haystack library, a framework for building end-to-end pipelines for search and question-answering (QA) systems. At the implementation level, Haystack is intuitive and easy to use. Like any other RAG-based application, this chatbot will have the following components:
- Embedder - Creates the embedding of the query. We will be using a FastEmbed model for this.
- Retriever - Retrieves the relevant documents from the document store based on the query embedding.
- Prompt builder - Fills the prompt template with the retrieved documents and the query.
- Generator - Generates the answer to the query from the filled-in prompt.
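To make the retriever's job concrete, here is a minimal brute-force sketch of embedding-based retrieval in plain Python. It assumes cosine similarity as the scoring function and uses toy three-dimensional vectors in place of real 384-dimensional FastEmbed embeddings. The helper names `cosine_similarity` and `retrieve_top_k` are hypothetical; in the notebook, Qdrant performs this search itself, so this only illustrates the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, documents, top_k=5):
    """Return the top_k (score, text) pairs, most similar first.

    `documents` is a list of (text, embedding) pairs. This is a brute-force
    stand-in for the vector-store lookup, not Qdrant's implementation.
    """
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (hypothetical values for illustration only)
docs = [
    ("Maximum effort!", [0.9, 0.1, 0.0]),
    ("The weather report for Tuesday.", [0.0, 0.2, 0.9]),
    ("Chimichangas, anyone?", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

for score, text in retrieve_top_k(query, docs, top_k=2):
    print(f"{score:.3f}  {text}")
```

The `top_k` argument plays the same role as `number_of_documents_to_retrieve` in the notebook: it caps how many context documents reach the prompt.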

In [1]:
from os.path import join as pjoin
import yaml

In [2]:
root = '..'
data_folder = 'data'
script_folder = 'scripts'
dialogue_folder = 'dialogues'
config_file = 'config.yaml'
embed_dim = 384
vector_store_name = 'QDRANT_VECTOR_DATABASE'
vector_store_path = pjoin(root, vector_store_name)
embedding_model = 'BAAI/bge-small-en-v1.5'
superhero = 'Deadpool'
config_path = pjoin(root, config_file)
llm_model = "meta-llama/Meta-Llama-3-8B-Instruct"
max_new_tokens = 250
number_of_documents_to_retrieve = 5

In [3]:
from haystack import Pipeline
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack.components.generators import HuggingFaceTGIGenerator
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.others import Multiplexer

def get_superhero_names(superhero, config_path):
    '''
    This function returns a list of superhero names and their synonyms
    '''
    with open(config_path, 'r') as f:
        config = yaml.safe_load(f)

    superhero_synonym = config['SUPERHERO_SYNONYMS'][superhero][0]
    superhero_names = [superhero.upper(), superhero_synonym.upper(), superhero_synonym.replace(' ', '-').upper()]
    superhero_names = list(set(superhero_names))
    return superhero_names

def load_document_store(vector_store_path, superhero, embed_dim):
    '''
    This function loads the QdrantDocumentStore that will be used by the retriever to fetch the context documents
    '''
    document_store = QdrantDocumentStore(
        path=vector_store_path,
        index=superhero,
        embedding_dim=embed_dim,
    )
    return document_store

def build_prompt():
    '''
    This function builds the prompt template that will be used by the PromptBuilder
    '''
    prompt = """
    You are a helpful AI assistant that mimics the tone of the specified character based on provided context documents. 
    Use the context to capture and replicate the character's tone accurately.

    You will be given a set of CONTEXT documents, which you should use to understand and replicate the character's tone in your response. 
    The context should primarily inform the tone rather than the content of your answer. 
    You may answer questions without the context if it is not necessary, but always ensure your tone matches that of the character.

    Respond without prefacing with phrases like "Based on the context..." or "I think...".

    If the context is not necessary to answer the question, you may ignore it. 
    You may also use your own knowledge of the character's tone to answer the question in the same tone.

    When you start your response, ensure that it is clear that you are answering the question.
    Do not say things like "Please respond with the tone of...". Just directly answer the question in the character's tone.

    ******************************************
    -----------CONTEXT STARTS HERE------------
    {% for doc in documents %}
        {{ doc.content }}
    ------------------------------------------
    {% endfor %}
    -----------CONTEXT ENDS HERE------------
    ******************************************
    Copy the tone of these characters' dialogue: {{ superhero_names }}. Then answer the following question:
    ******************************************
    Question: {{ query }}
    ******************************************
    Answer:
    """

    # Define the prompt builder
    prompt_builder = PromptBuilder(template=prompt)
    return prompt_builder
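`config.yaml` itself is not shown in this section. Judging from how `get_superhero_names` indexes it, it is assumed to map each superhero under `SUPERHERO_SYNONYMS` to a list whose first entry is a synonym. The sketch below uses a hypothetical in-memory config (the `Wade Wilson` entry is illustrative) and a hypothetical `expand_names` helper with the same logic, to show which name variants end up in the prompt.

```python
# Hypothetical stand-in for the parsed config.yaml (assumed shape, not the real file)
config = {
    "SUPERHERO_SYNONYMS": {
        "Deadpool": ["Wade Wilson"],
    }
}


def expand_names(superhero, config):
    """Mirror get_superhero_names, but take an already-parsed config dict."""
    synonym = config["SUPERHERO_SYNONYMS"][superhero][0]
    names = [superhero.upper(), synonym.upper(), synonym.replace(" ", "-").upper()]
    # sorted() instead of list(set(...)) so the order is the same on every run
    return sorted(set(names))


print(expand_names("Deadpool", config))
```

Using `sorted(set(...))` rather than `list(set(...))` is a small deviation from the original function; it keeps the name order stable between runs, which makes the rendered prompt reproducible.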





In [4]:
number_of_documents_to_retrieve = 1

In [5]:
superhero_names = get_superhero_names(superhero, config_path)
document_store = load_document_store(vector_store_path, superhero, embed_dim)
prompt_builder = build_prompt()

generator = HuggingFaceTGIGenerator(model=llm_model, generation_kwargs={"max_new_tokens": max_new_tokens})
retriever = QdrantEmbeddingRetriever(document_store=document_store, top_k=number_of_documents_to_retrieve)
embedder = FastembedTextEmbedder(model=embedding_model)

In [6]:
# query = "Who are the Avengers?"
# embedder.warm_up()
# embeddings = embedder.run(query)
# docs = retriever.run(query_embedding=embeddings['embedding'])

# prompt = prompt_builder.run(superhero_names=superhero_names, documents=docs['documents'][:1], query=query)
# print(prompt['prompt'])

# generator.warm_up()
# print(generator.run(prompt=prompt['prompt'])['replies'][0])
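The commented-out cell above runs each component by hand, which is a handy way to inspect the intermediate prompt. The `PromptBuilder` renders the template with Jinja2, and the `{% for doc in documents %}` loop expands into one block per retrieved document. The plain-Python approximation below (a hypothetical `render_context` helper that takes plain strings instead of Haystack `Document` objects and ignores Jinja's exact whitespace handling) shows the shape of the context section the LLM receives.

```python
# Separator line copied from the prompt template
SEPARATOR = "------------------------------------------"


def render_context(documents):
    """Approximate the template's {% for doc in documents %} loop in plain Python.

    `documents` is a list of strings here; in Haystack they are Document objects
    and the template reads `doc.content`. Jinja's exact whitespace is not reproduced.
    """
    lines = ["-----------CONTEXT STARTS HERE------------"]
    for content in documents:
        lines.append(content)
        lines.append(SEPARATOR)
    lines.append("-----------CONTEXT ENDS HERE------------")
    return "\n".join(lines)


print(render_context(["Maximum effort!", "Chimichangas, anyone?"]))
```

With `number_of_documents_to_retrieve = 1`, as set in cell 4, only a single document block appears between the start and end markers.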

In [7]:
def build_rag_pipeline(embedder, retriever, prompt_builder, generator):
    '''
    This is the main function that builds the RAG pipeline.
    We start by creating a Pipeline object and adding the components to it.
    We then connect the components to each other; these connections define the flow of data between the components.
    '''
    rag = Pipeline()

    # The Multiplexer receives the question once and forwards it to every component connected to it
    rag.add_component(instance=Multiplexer(str), name="multiplexer")

    rag.add_component("embedder", embedder)
    rag.add_component("retriever", retriever)
    rag.add_component("prompt", prompt_builder)
    rag.add_component("llm", generator)

    # The same question feeds both the query embedder and the prompt template
    rag.connect("multiplexer.value", "embedder.text")
    rag.connect("multiplexer.value", "prompt.query")

    # Query embedding -> retrieved documents -> filled-in prompt -> LLM
    rag.connect("embedder.embedding", "retriever.query_embedding")
    rag.connect("retriever.documents", "prompt.documents")
    rag.connect("prompt.prompt", "llm.prompt")

    return rag
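The `connect` calls form a small directed graph, and the order in which the components can run follows from it: the prompt builder needs both the question (from the multiplexer) and the documents (from the retriever). The sketch below writes those connections down as a plain dictionary and derives a valid execution order with Python's standard-library `graphlib`. It only illustrates the dependencies; Haystack's own scheduler is not necessarily a topological sort.

```python
from graphlib import TopologicalSorter

# The connections made in build_rag_pipeline: component -> components consuming its output
edges = {
    "multiplexer": ["embedder", "prompt"],
    "embedder": ["retriever"],
    "retriever": ["prompt"],
    "prompt": ["llm"],
}

# TopologicalSorter expects node -> predecessors, so invert the mapping
predecessors = {}
for source, targets in edges.items():
    predecessors.setdefault(source, set())
    for target in targets:
        predecessors.setdefault(target, set()).add(source)

order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Because `prompt` depends on `retriever`, which depends on `embedder`, there is only one valid order here, even though the multiplexer feeds `prompt` directly.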

In [8]:
rag = build_rag_pipeline(embedder, retriever, prompt_builder, generator)

Here we have built the RAG pipeline using the Haystack library. We start by getting the superhero names and loading the document store, and then build the prompt template. The pipeline itself is assembled from four components: the FastembedTextEmbedder, which creates the embedding of the query; the QdrantEmbeddingRetriever, which retrieves the relevant documents from the document store based on that embedding; the PromptBuilder, which fills the prompt template with those documents; and the HuggingFaceTGIGenerator, which generates the answer from the filled-in prompt. A Multiplexer at the front sends the same question to both the embedder and the prompt builder.

One nice feature of Haystack is that it lets us visualize the RAG pipeline. Calling `show()` on a pipeline (here, `rag.show()`) displays it as a graph inside the notebook. This visualization is helpful for understanding the flow of data through the pipeline and how the components are connected to each other.
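If a rendered image isn't available (for example, outside a notebook), a text description of the same graph can serve a similar purpose. The hypothetical `to_mermaid` helper below turns the connections from `build_rag_pipeline` into Mermaid flowchart syntax, labelling each edge with the input it feeds. It is a standalone sketch, not part of Haystack.

```python
def to_mermaid(connections):
    """Return Mermaid flowchart text for (sender, receiver, input_name) triples."""
    lines = ["graph TD"]
    for sender, receiver, input_name in connections:
        # Mermaid edge syntax: A -->|label| B
        lines.append(f"    {sender} -->|{input_name}| {receiver}")
    return "\n".join(lines)


# Mirrors the rag.connect(...) calls in build_rag_pipeline
connections = [
    ("multiplexer", "embedder", "text"),
    ("multiplexer", "prompt", "query"),
    ("embedder", "retriever", "query_embedding"),
    ("retriever", "prompt", "documents"),
    ("prompt", "llm", "prompt"),
]

print(to_mermaid(connections))
```

Pasting the printed text into a Mermaid renderer draws the same five-component graph.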

In [9]:
# rag.show()

In [10]:
question = "Tell me about you"

pipeline_input = {
    "multiplexer": {
        "value": question,
    },
    "prompt": {
        "superhero_names": superhero_names
    }
}

result = rag.run(pipeline_input)
response = result['llm']['replies'][0]
print(response)

Please respond with the tone of Wade Wilson (Deadpool). 

(Note: I'll be using the provided context to capture the tone of Wade Wilson. If you need any further clarification, feel free to ask!)


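Great! Now that we have built the RAG pipeline, we can test the chatbot by asking it questions about the superheroes: it retrieves the relevant documents from the document store and generates an answer in the character's tone. Note, however, that the reply above does not actually answer the question; the model repeated the kind of instruction ("Please respond with the tone of...") that the prompt explicitly tells it not to produce, so the prompt or generation settings may need further tuning.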
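Each call to `rag.run` takes a nested dictionary: the top-level keys are component names and the inner keys are the inputs of those components, exactly as in `pipeline_input` above. When asking several questions, a small helper avoids rebuilding that dictionary by hand. `make_pipeline_input` is a hypothetical name, and the superhero names in the example are placeholders.

```python
def make_pipeline_input(question, superhero_names):
    """Hypothetical helper: build the nested dict passed to rag.run().

    Top-level keys are component names; inner keys are those components' inputs.
    """
    return {
        "multiplexer": {"value": question},
        "prompt": {"superhero_names": superhero_names},
    }


# Placeholder names; in the notebook these come from get_superhero_names()
names = ["DEADPOOL", "WADE WILSON"]
for question in ["Tell me about you", "What do you think of the X-Men?"]:
    pipeline_input = make_pipeline_input(question, names)
    print(pipeline_input)
    # With the pipeline from build_rag_pipeline, each question would be answered by:
    # response = rag.run(pipeline_input)["llm"]["replies"][0]
```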

In the next section, we will build a user interface for the character chatbot using the Streamlit library. Streamlit lets us build interactive web applications from plain Python scripts. Our simple web application will let the user select a superhero from a dropdown list and ask the virtual superhero questions.

In [None]:
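# rag.show()                            # display the pipeline graph inside the notebook
# rag.draw("pipeline.png")              # save the pipeline graph as an image file
# pipeline_yaml = rag.dumps()           # serialize the pipeline to a YAML string
# print(pipeline_yaml)
# pipe = Pipeline.loads(pipeline_yaml)  # rebuild the pipeline from the YAML string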