# How to Build a RAG-Powered Chatbot with Chat, Embed, and Rerank

*Read the accompanying [blog post here](https://txt.cohere.com/rag-chatbot).*

![Feature](https://github.com/cohere-ai/notebooks/blob/main/notebooks/images/rag-chatbot.png?raw=1)

In this notebook, you’ll learn how to build a chatbot that has RAG capabilities, enabling it to connect to external documents, ground its responses on these documents, and produce document citations in its responses.

Below is a diagram that provides an overview of what we’ll build, followed by a list of the key steps involved.

![Overview](https://github.com/cohere-ai/notebooks/blob/main/notebooks/images/rag-chatbot-flow.png?raw=1)

Setup phase:
- Step 0: Ingest the documents – get documents, chunk, embed, and index.

For each user-chatbot interaction:
- Step 1: Get the user message
- Step 2: Call the Chat endpoint in query-generation mode
- If at least one query is generated
    - Step 3: Retrieve and rerank relevant documents
    - Step 4: Call the Chat endpoint in document mode to generate a grounded response with citations
- If no query is generated
    - Step 4: Call the Chat endpoint in normal mode to generate a response

Throughout the conversation:
- Append the user-chatbot interaction to the conversation thread
- Repeat with every interaction
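
The per-interaction steps above can be sketched as a single routing function. This is a minimal illustration with stand-in callables, not the classes defined later in this notebook: `handle_turn`, `fake_generate_queries`, `fake_retrieve`, and `fake_respond` are hypothetical names, and no API call is made.

```python
from typing import Callable, Dict, List


def handle_turn(
    message: str,
    generate_queries: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Dict[str, str]]],
    respond: Callable[[str, List[Dict[str, str]]], str],
) -> str:
    """Route one user message through Steps 2-4."""
    # Step 2: ask for search queries; the list may be empty
    queries = generate_queries(message)

    # Step 3: retrieve (and rerank) documents for every generated query
    documents: List[Dict[str, str]] = []
    for query in queries:
        documents.extend(retrieve(query))

    # Step 4: respond, grounded on documents when any were retrieved
    return respond(message, documents)


# Stand-ins for the Chat, Embed, and Rerank calls (assumptions for illustration)
def fake_generate_queries(message: str) -> List[str]:
    return [] if message.lower() in {"hi", "hello"} else [message]


def fake_retrieve(query: str) -> List[Dict[str, str]]:
    return [{"title": "stub", "text": f"text about {query}", "url": "https://example.com"}]


def fake_respond(message: str, documents: List[Dict[str, str]]) -> str:
    return f"grounded on {len(documents)} doc(s)" if documents else "direct reply"


greeting = handle_turn("Hi", fake_generate_queries, fake_retrieve, fake_respond)
question = handle_turn("what is segmentation?", fake_generate_queries, fake_retrieve, fake_respond)
print(greeting)
print(question)
```

Keeping the three stages as separate callables mirrors the split between query generation, retrieval, and response generation in the diagram.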

In [6]:
! pip install cohere hnswlib unstructured -q

In [7]:
! pip install openai tiktoken



In [8]:
import cohere
import os
import hnswlib
import json
import uuid
from typing import List, Dict
from unstructured.partition.html import partition_html
from unstructured.chunking.title import chunk_by_title

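# Read the API key from the environment, or prompt for it; never hard-code secrets in a notebook
from getpass import getpass

if "COHERE_API_KEY" not in os.environ:
    os.environ["COHERE_API_KEY"] = getpass("Cohere API key: ")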

co = cohere.Client(os.environ["COHERE_API_KEY"])

In [9]:
#@title Enable text wrapping in Google Colab

from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

### Documents component

In [10]:
class Documents:
    """
    A class representing a collection of documents.

    Parameters:
    sources (list): A list of dictionaries representing the sources of the documents. Each dictionary should have 'title' and 'url' keys.

    Attributes:
    sources (list): A list of dictionaries representing the sources of the documents.
    docs (list): A list of dictionaries representing the document chunks, with 'title', 'text', and 'url' keys.
    docs_embs (list): A list of the associated embeddings for the documents.
    retrieve_top_k (int): The number of documents to retrieve during search.
    rerank_top_k (int): The number of documents to keep after reranking.
    docs_len (int): The number of documents in the collection.
    idx (hnswlib.Index): The index used for document retrieval.

    Methods:
    load(): Loads the data from the sources and partitions the HTML content into chunks.
    embed(): Embeds the documents using the Cohere API.
    index(): Indexes the documents for efficient retrieval.
    retrieve(query): Retrieves documents based on the given query.

    """

    def __init__(self, sources: List[Dict[str, str]]):
        self.sources = sources
        self.docs = []
        self.docs_embs = []
        self.retrieve_top_k = 10
        self.rerank_top_k = 3
        self.load()
        self.embed()
        self.index()

    def load(self) -> None:
        """
        Loads the documents from the sources and chunks the HTML content.
        """
        print("Loading documents...")

        for source in self.sources:
            elements = partition_html(url=source["url"])
            chunks = chunk_by_title(elements)
            for chunk in chunks:
                self.docs.append(
                    {
                        "title": source["title"],
                        "text": str(chunk),
                        "url": source["url"],
                    }
                )

    def embed(self) -> None:
        """
        Embeds the documents using the Cohere API.
        """
        print("Embedding documents...")

        batch_size = 90
        self.docs_len = len(self.docs)

        for i in range(0, self.docs_len, batch_size):
            batch = self.docs[i : min(i + batch_size, self.docs_len)]
            texts = [item["text"] for item in batch]
            docs_embs_batch = co.embed(
                texts=texts, model="embed-english-v3.0", input_type="search_document"
            ).embeddings
            self.docs_embs.extend(docs_embs_batch)

    def index(self) -> None:
        """
        Indexes the documents for efficient retrieval.
        """
        print("Indexing documents...")

        self.idx = hnswlib.Index(space="ip", dim=1024)
        self.idx.init_index(max_elements=self.docs_len, ef_construction=512, M=64)
        self.idx.add_items(self.docs_embs, list(range(len(self.docs_embs))))

        print(f"Indexing complete with {self.idx.get_current_count()} documents.")

    def retrieve(self, query: str) -> List[Dict[str, str]]:
        """
        Retrieves documents based on the given query.

        Parameters:
        query (str): The query to retrieve documents for.

        Returns:
        List[Dict[str, str]]: A list of dictionaries representing the retrieved documents, with 'title', 'text', and 'url' keys.
        """
        docs_retrieved = []
        query_emb = co.embed(
            texts=[query], model="embed-english-v3.0", input_type="search_query"
        ).embeddings

        doc_ids = self.idx.knn_query(query_emb, k=self.retrieve_top_k)[0][0]

        docs_to_rerank = []
        for doc_id in doc_ids:
            docs_to_rerank.append(self.docs[doc_id]["text"])

        rerank_results = co.rerank(
            query=query,
            documents=docs_to_rerank,
            top_n=self.rerank_top_k,
            model="rerank-english-v2.0",
        )

        doc_ids_reranked = []
        for result in rerank_results:
            doc_ids_reranked.append(doc_ids[result.index])

        for doc_id in doc_ids_reranked:
            docs_retrieved.append(
                {
                    "title": self.docs[doc_id]["title"],
                    "text": self.docs[doc_id]["text"],
                    "url": self.docs[doc_id]["url"],
                }
            )

        return docs_retrieved
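
The `retrieve` method above is a two-stage search: a fast nearest-neighbour lookup by inner product, followed by a reranking pass over the shortlisted candidates. The sketch below shows the same shape in pure Python on toy data. The vectors, the texts, and the word-overlap scorer are assumptions made for illustration; the scorer is only a stand-in and does not reflect how the Rerank endpoint scores documents.

```python
from typing import Callable, List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(
    query_emb: Sequence[float],
    doc_embs: List[Sequence[float]],
    doc_texts: List[str],
    rerank_score: Callable[[str], float],
    retrieve_top_k: int,
    rerank_top_k: int,
) -> List[int]:
    """Return document ids after dense retrieval and reranking."""
    # Stage 1: shortlist the retrieve_top_k documents with the highest inner product
    by_similarity = sorted(
        range(len(doc_embs)),
        key=lambda i: inner_product(query_emb, doc_embs[i]),
        reverse=True,
    )
    candidates = by_similarity[:retrieve_top_k]

    # Stage 2: re-score only the shortlist and keep the best rerank_top_k
    reranked = sorted(candidates, key=lambda i: rerank_score(doc_texts[i]), reverse=True)
    return reranked[:rerank_top_k]


# Toy data (hypothetical): 2-dimensional "embeddings" and short texts
doc_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
doc_texts = [
    "image segmentation survey",
    "face recognition",
    "kernel estimation",
    "segmentation of radar images",
]
query_words = {"radar", "segmentation"}


def overlap_score(text: str) -> float:
    # Stand-in relevance score: number of query words present in the text
    return float(len(query_words & set(text.split())))


top_ids = retrieve_then_rerank(
    [1.0, 0.0], doc_embs, doc_texts, overlap_score, retrieve_top_k=3, rerank_top_k=2
)
print(top_ids)
```

Document 3 is only the third-closest vector, yet it ends up first after reranking, which is why a reranking pass follows the vector search.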

### Chatbot component

In [11]:
class Chatbot:
    """
    A class representing a chatbot.

    Parameters:
    docs (Documents): An instance of the Documents class representing the collection of documents.

    Attributes:
    conversation_id (str): The unique ID for the conversation.
    docs (Documents): An instance of the Documents class representing the collection of documents.

    Methods:
    generate_response(message): Generates a response to the user's message.
    retrieve_docs(response): Retrieves documents based on the search queries in the response.

    """

    def __init__(self, docs: Documents):
        self.docs = docs
        self.conversation_id = str(uuid.uuid4())

    def generate_response(self, message: str):
        """
        Generates a response to the user's message.

        Parameters:
        message (str): The user's message.

        Yields:
        Event: A streamed response event generated by the chatbot.

        """
        # Generate search queries (if any)
        response = co.chat(message=message, search_queries_only=True)

        # If there are search queries, retrieve documents and respond
        if response.search_queries:
            print("Retrieving information...")

            documents = self.retrieve_docs(response)

            response = co.chat(
                message=message,
                documents=documents,
                conversation_id=self.conversation_id,
                stream=True,
            )
            for event in response:
                yield event

        # If there is no search query, directly respond
        else:
            response = co.chat(
                message=message,
                conversation_id=self.conversation_id,
                stream=True
            )
            for event in response:
                yield event

    def retrieve_docs(self, response) -> List[Dict[str, str]]:
        """
        Retrieves documents based on the search queries in the response.

        Parameters:
        response: The response object containing search queries.

        Returns:
        List[Dict[str, str]]: A list of dictionaries representing the retrieved documents.

        """
        # Collect the text of each generated search query
        queries = []
        for search_query in response.search_queries:
            queries.append(search_query["text"])

        # Retrieve documents for each query
        retrieved_docs = []
        for query in queries:
            retrieved_docs.extend(self.docs.retrieve(query))

        # # Uncomment this code block to display the chatbot's retrieved documents
        # print("DOCUMENTS RETRIEVED:")
        # for idx, doc in enumerate(retrieved_docs):
        #     print(f"doc_{idx}: {doc}")
        # print("\n")

        return retrieved_docs
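
Because `retrieve_docs` extends one list with the results of every query, the same chunk can appear more than once when two queries retrieve it. One possible way to handle this is to drop the duplicates before the list reaches the Chat endpoint. This is a minimal sketch; `dedupe_docs` is a hypothetical helper and not part of the notebook.

```python
from typing import Dict, List


def dedupe_docs(docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated chunks while preserving the original order."""
    seen = set()
    unique = []
    for doc in docs:
        # A chunk is identified by its source URL and its text
        key = (doc["url"], doc["text"])
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Two queries that both retrieved the first chunk
retrieved = [
    {"title": "A", "text": "chunk one", "url": "https://example.com/a"},
    {"title": "B", "text": "chunk two", "url": "https://example.com/b"},
    {"title": "A", "text": "chunk one", "url": "https://example.com/a"},
]
unique_docs = dedupe_docs(retrieved)
print(len(unique_docs))
```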

### App component

In [12]:
class App:
    def __init__(self, chatbot: Chatbot):
        """
        Initializes an instance of the App class.

        Parameters:
        chatbot (Chatbot): An instance of the Chatbot class.

        """
        self.chatbot = chatbot

    def run(self):
        """
        Runs the chatbot application.

        """
        while True:
            # Get the user message
            message = input("User: ")

            # Typing "quit" ends the conversation
            if message.lower() == "quit":
                print("Ending chat.")
                break
            else:
                print(f"User: {message}")

            # Get the chatbot response
            response = self.chatbot.generate_response(message)

            # Print the chatbot response
            print("Chatbot:")
            flag = False
            for event in response:
                # Text
                if event.event_type == "text-generation":
                    print(event.text, end="")

                # Citations
                if event.event_type == "citation-generation":
                    if not flag:
                        print("\n\nCITATIONS:")
                        flag = True
                    print(event.citations)

            print(f"\n{'-'*100}\n")
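
The loop in `run` handles two event types from the stream: `text-generation` events carry pieces of the reply, and `citation-generation` events carry citation entries. The sketch below applies the same dispatch to fake events built with `SimpleNamespace`, so it runs without an API call. `collect_stream` is a hypothetical helper, and the fake events are assumptions that imitate only the fields used in this notebook.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Tuple


def collect_stream(events: Iterable[Any]) -> Tuple[str, List[Dict[str, Any]]]:
    """Accumulate streamed text and citations instead of printing them."""
    text_parts: List[str] = []
    citations: List[Dict[str, Any]] = []
    for event in events:
        if event.event_type == "text-generation":
            text_parts.append(event.text)
        elif event.event_type == "citation-generation":
            citations.extend(event.citations)
        # Any other event type is ignored, as in App.run
    return "".join(text_parts), citations


# Fake stream for illustration only
fake_events = [
    SimpleNamespace(event_type="stream-start"),
    SimpleNamespace(event_type="text-generation", text="Hello"),
    SimpleNamespace(event_type="text-generation", text=" world"),
    SimpleNamespace(
        event_type="citation-generation",
        citations=[{"start": 0, "end": 5, "text": "Hello", "document_ids": ["doc_0"]}],
    ),
    SimpleNamespace(event_type="stream-end"),
]
reply_text, reply_citations = collect_stream(fake_events)
print(reply_text)
print(reply_citations)
```

Collecting the events into values rather than printing them makes the reply easier to post-process, for example to render citations inline.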

### Define the documents

In [13]:
# Define the sources for the documents
# As an example, we'll use a set of arXiv abstract pages
# (mostly image processing and computer vision papers)


sources = [
    {
        "title": "Text Line Segmentation of Historical Documents: a Survey",
        "url": "https://arxiv.org/abs/0704.1267"
    },
    {
        "title": "Riemannian level-set methods for tensor-valued data",
        "url": "https://arxiv.org/abs/0705.0214"
    },
    {
        "title": "Multiresolution Approximation of Polygonal Curves in Linear Complexity",
        "url": "https://arxiv.org/abs/0705.0449"
    },
    {
        "title": "Medical Image Segmentation and Localization using Deformable Templates",
        "url": "https://arxiv.org/abs/0705.0781"
    },
    {
        "title": "Enhancement of Noisy Planar Nuclear Medicine Images using Mean Field\n  Annealing",
        "url": "https://arxiv.org/abs/0705.0828"
    },
    {
        "title": "An Independent Evaluation of Subspace Face Recognition Algorithms",
        "url": "https://arxiv.org/abs/0705.0952"
    },
    {
        "title": "MI image registration using prior knowledge",
        "url": "https://arxiv.org/abs/0705.3593"
    },
    {
        "title": "Automatic Detection of Pulmonary Embolism using Computational\n  Intelligence",
        "url": "https://arxiv.org/abs/0706.0300"
    },
    {
        "title": "Variational local structure estimation for image super-resolution",
        "url": "https://arxiv.org/abs/0709.1771"
    },
    {
        "title": "Bandwidth selection for kernel estimation in mixed multi-dimensional\n  spaces",
        "url": "https://arxiv.org/abs/0709.1920"
    },
    {
        "title": "Supervised learning on graphs of spatio-temporal similarity in satellite\n  image sequences",
        "url": "https://arxiv.org/abs/0709.3013"
    },
    {
        "title": "Graph rigidity, Cyclic Belief Propagation and Point Pattern Matching",
        "url": "https://arxiv.org/abs/0710.0043"
    },
    {
        "title": "High-Order Nonparametric Belief-Propagation for Fast Image Inpainting",
        "url": "https://arxiv.org/abs/0710.0243"
    },
    {
        "title": "An Affinity Propagation Based method for Vector Quantization Codebook\n  Design",
        "url": "https://arxiv.org/abs/0710.2037"
    },
    {
        "title": "Comparison and Combination of State-of-the-art Techniques for\n  Handwritten Character Recognition: Topping the MNIST Benchmark",
        "url": "https://arxiv.org/abs/0710.2231"
    },
    {
        "title": "Learning Similarity for Character Recognition and 3D Object Recognition",
        "url": "https://arxiv.org/abs/0712.0131"
    },
    {
        "title": "Learning View Generalization Functions",
        "url": "https://arxiv.org/abs/0712.0136"
    },
    {
        "title": "View Based Methods can achieve Bayes-Optimal 3D Recognition",
        "url": "https://arxiv.org/abs/0712.0137"
    },
    {
        "title": "Hierarchy construction schemes within the Scale set framework",
        "url": "https://arxiv.org/abs/0712.1878"
    },
    {
        "title": "A Class of LULU Operators on Multi-Dimensional Arrays",
        "url": "https://arxiv.org/abs/0712.2923"
    },
    {
        "title": "A Fast Hierarchical Multilevel Image Segmentation Method using Unbiased\n  Estimators",
        "url": "https://arxiv.org/abs/0712.4015"
    },
    {
        "title": "Automatic Text Area Segmentation in Natural Images",
        "url": "https://arxiv.org/abs/0801.4807"
    },
    {
        "title": "Wavelet and Curvelet Moments for Image Classification: Application to\n  Aggregate Mixture Grading",
        "url": "https://arxiv.org/abs/0802.3528"
    },
    {
        "title": "Spatio-activity based object detection",
        "url": "https://arxiv.org/abs/0803.1586"
    },
    {
        "title": "Using Spatially Varying Pixels Exposures and Bayer-covered Photosensors\n  for High Dynamic Range Imaging",
        "url": "https://arxiv.org/abs/0803.2812"
    },
    {
        "title": "Linear Time Recognition Algorithms for Topological Invariants in 3D",
        "url": "https://arxiv.org/abs/0804.1982"
    },
    {
        "title": "A New Algorithm for Interactive Structural Image Segmentation",
        "url": "https://arxiv.org/abs/0805.1854"
    },
    {
        "title": "A multilateral filtering method applied to airplane runway image",
        "url": "https://arxiv.org/abs/0805.2324"
    },
    {
        "title": "Increasing Linear Dynamic Range of Commercial Digital Photocamera Used\n  in Imaging Systems with Optical Coding",
        "url": "https://arxiv.org/abs/0805.2690"
    },
    {
        "title": "Statistical region-based active contours with exponential family\n  observations",
        "url": "https://arxiv.org/abs/0805.3217"
    },
    {
        "title": "Region-based active contour with noise and shape priors",
        "url": "https://arxiv.org/abs/0805.3218"
    },
    {
        "title": "DimReduction - Interactive Graphic Environment for Dimensionality\n  Reduction",
        "url": "https://arxiv.org/abs/0805.3964"
    },
    {
        "title": "Directional Cross Diamond Search Algorithm for Fast Block Motion\n  Estimation",
        "url": "https://arxiv.org/abs/0806.0689"
    },
    {
        "title": "Fast Wavelet-Based Visual Classification",
        "url": "https://arxiv.org/abs/0806.1446"
    },
    {
        "title": "Classification of curves in 2D and 3D via affine integral signatures",
        "url": "https://arxiv.org/abs/0806.1984"
    },
    {
        "title": "Conceptualization of seeded region growing by pixels aggregation. Part\n  1: the framework",
        "url": "https://arxiv.org/abs/0806.3885"
    },
    {
        "title": "Conceptualization of seeded region growing by pixels aggregation. Part\n  2: how to localize a final partition invariant about the seeded region\n  initialisation order",
        "url": "https://arxiv.org/abs/0806.3887"
    },
    {
        "title": "Conceptualization of seeded region growing by pixels aggregation. Part\n  3: a wide range of algorithms",
        "url": "https://arxiv.org/abs/0806.3928"
    },
    {
        "title": "Conceptualization of seeded region growing by pixels aggregation. Part\n  4: Simple, generic and robust extraction of grains in granular materials\n  obtained by X-ray tomography",
        "url": "https://arxiv.org/abs/0806.3939"
    },
    {
        "title": "The Five Points Pose Problem : A New and Accurate Solution Adapted to\n  any Geometric Configuration",
        "url": "https://arxiv.org/abs/0807.2047"
    },
    {
        "title": "An image processing analysis of skin textures",
        "url": "https://arxiv.org/abs/0807.4701"
    },
    {
        "title": "Higher Order Moments Generation by Mellin Transform for Compound Models\n  of Clutter",
        "url": "https://arxiv.org/abs/0808.2227"
    },
    {
        "title": "Automatic Identification and Data Extraction from 2-Dimensional Plots in\n  Digital Documents",
        "url": "https://arxiv.org/abs/0809.1802"
    },
    {
        "title": "Supervised Dictionary Learning",
        "url": "https://arxiv.org/abs/0809.3083"
    },
    {
        "title": "Modeling and Control with Local Linearizing Nadaraya Watson Regression",
        "url": "https://arxiv.org/abs/0809.3690"
    },
    {
        "title": "Hierarchical Bag of Paths for Kernel Based Shape Classification",
        "url": "https://arxiv.org/abs/0810.3579"
    },
    {
        "title": "Camera distortion self-calibration using the plumb-line constraint and\n  minimal Hough entropy",
        "url": "https://arxiv.org/abs/0810.4426"
    },
    {
        "title": "Graph-based classification of multiple observation sets",
        "url": "https://arxiv.org/abs/0810.4617"
    },
    {
        "title": "3D Face Recognition with Sparse Spherical Representations",
        "url": "https://arxiv.org/abs/0810.5325"
    },
    {
        "title": "Mapping Images with the Coherence Length Diagrams",
        "url": "https://arxiv.org/abs/0811.4699"
    },
    {
        "title": "Obtaining Depth Maps From Color Images By Region Based Stereo Matching\n  Algorithms",
        "url": "https://arxiv.org/abs/0812.1340"
    },
    {
        "title": "Sparse Component Analysis (SCA) in Random-valued and Salt and Pepper\n  Noise Removal",
        "url": "https://arxiv.org/abs/0812.2892"
    },
    {
        "title": "A Keygraph Classification Framework for Real-Time Object Detection",
        "url": "https://arxiv.org/abs/0901.4953"
    },
    {
        "title": "Using SLP Neural Network to Persian Handwritten Digits Recognition",
        "url": "https://arxiv.org/abs/0902.2788"
    },
    {
        "title": "Dipole and Quadrupole Moments in Image Processing",
        "url": "https://arxiv.org/abs/0902.4073"
    },
    {
        "title": "Dipole Vectors in Images Processing",
        "url": "https://arxiv.org/abs/0902.4663"
    },
    {
        "title": "Recognition of Regular Shapes in Satelite Images",
        "url": "https://arxiv.org/abs/0903.0134"
    },
    {
        "title": "Real-time Texture Error Detection",
        "url": "https://arxiv.org/abs/0903.0538"
    },
    {
        "title": "Digital Restoration of Ancient Papyri",
        "url": "https://arxiv.org/abs/0903.5045"
    },
    {
        "title": "Color Dipole Moments for Edge Detection",
        "url": "https://arxiv.org/abs/0904.0962"
    },
    {
        "title": "On the closed-form solution of the rotation matrix arising in computer\n  vision problems",
        "url": "https://arxiv.org/abs/0904.1613"
    },
    {
        "title": "Point-Set Registration: Coherent Point Drift",
        "url": "https://arxiv.org/abs/0905.2635"
    },
    {
        "title": "Colorization of Natural Images via L1 Optimization",
        "url": "https://arxiv.org/abs/0905.2924"
    },
    {
        "title": "A statistical learning approach to color demosaicing",
        "url": "https://arxiv.org/abs/0905.2958"
    },
    {
        "title": "A New Solution to the Relative Orientation Problem using only 3 Points\n  and the Vertical Direction",
        "url": "https://arxiv.org/abs/0905.3964"
    },
    {
        "title": "Segmentation of Facial Expressions Using Semi-Definite Programming and\n  Generalized Principal Component Analysis",
        "url": "https://arxiv.org/abs/0906.1763"
    },
    {
        "title": "Combinatorial pyramids and discrete geometry for energy-minimizing\n  segmentation",
        "url": "https://arxiv.org/abs/0906.2770"
    },
    {
        "title": "Deformable Model with a Complexity Independent from Image Resolution",
        "url": "https://arxiv.org/abs/0906.3068"
    },
    {
        "title": "Adaptive Regularization of Ill-Posed Problems: Application to Non-rigid\n  Image Registration",
        "url": "https://arxiv.org/abs/0906.3323"
    },
    {
        "title": "Automatic Defect Detection and Classification Technique from Image: A\n  Special Case Using Ceramic Tiles",
        "url": "https://arxiv.org/abs/0906.3770"
    },
    {
        "title": "Automatic Spatially-Adaptive Balancing of Energy Terms for Image\n  Segmentation",
        "url": "https://arxiv.org/abs/0906.4131"
    },
    {
        "title": "Efficient IRIS Recognition through Improvement of Feature Extraction and\n  subset Selection",
        "url": "https://arxiv.org/abs/0906.4789"
    },
    {
        "title": "A new approach for digit recognition based on hand gesture analysis",
        "url": "https://arxiv.org/abs/0906.5039"
    },
    {
        "title": "Multi-Label MRF Optimization via Least Squares s-t Cuts",
        "url": "https://arxiv.org/abs/0907.0204"
    },
    {
        "title": "An Iterative Fingerprint Enhancement Algorithm Based on Accurate\n  Determination of Orientation Flow",
        "url": "https://arxiv.org/abs/0907.0288"
    },
    {
        "title": "Bounding the Probability of Error for High Precision Recognition",
        "url": "https://arxiv.org/abs/0907.0418"
    },
    {
        "title": "Augmenting Light Field to model Wave Optics effects",
        "url": "https://arxiv.org/abs/0907.1545"
    },
    {
        "title": "Multiresolution Elastic Medical Image Registration in Standard Intensity\n  Scale",
        "url": "https://arxiv.org/abs/0907.2075"
    },
    {
        "title": "Registration of Standardized Histological Images in Feature Space",
        "url": "https://arxiv.org/abs/0907.3209"
    },
    {
        "title": "Fully Automatic 3D Reconstruction of Histological Images",
        "url": "https://arxiv.org/abs/0907.3215"
    },
    {
        "title": "Parallel AdaBoost Algorithm for Gabor Wavelet Selection in Face\n  Recognition",
        "url": "https://arxiv.org/abs/0907.3218"
    },
    {
        "title": "Learning Object Location Predictors with Boosting and Grammar-Guided\n  Feature Extraction",
        "url": "https://arxiv.org/abs/0907.4354"
    },
    {
        "title": "Automatic local Gabor Features extraction for face recognition",
        "url": "https://arxiv.org/abs/0907.4984"
    },
    {
        "title": "Multiple pattern classification by sparse subspace decomposition",
        "url": "https://arxiv.org/abs/0907.5321"
    },
    {
        "title": "Segmentation for radar images based on active contour",
        "url": "https://arxiv.org/abs/0908.1369"
    },
    {
        "title": "A dyadic solution of relative pose problems",
        "url": "https://arxiv.org/abs/0908.1919"
    },
    {
        "title": "Handwritten Farsi Character Recognition using Artificial Neural Network",
        "url": "https://arxiv.org/abs/0908.4386"
    },
    {
        "title": "Scale-Based Gaussian Coverings: Combining Intra and Inter Mixture Models\n  in Image Segmentation",
        "url": "https://arxiv.org/abs/0909.0481"
    },
    {
        "title": "Kernel Spectral Curvature Clustering (KSCC)",
        "url": "https://arxiv.org/abs/0909.1605"
    },
    {
        "title": "Motion Segmentation by SCC on the Hopkins 155 Database",
        "url": "https://arxiv.org/abs/0909.1608"
    },
    {
        "title": "A Method for Extraction and Recognition of Isolated License Plate\n  Characters",
        "url": "https://arxiv.org/abs/0909.3911"
    },
    {
        "title": "Information tracking approach to segmentation of ultrasound imagery of\n  prostate",
        "url": "https://arxiv.org/abs/0909.5458"
    },
    {
        "title": "Iterative Shrinkage Approach to Restoration of Optical Imagery",
        "url": "https://arxiv.org/abs/0909.5460"
    },
    {
        "title": "Modular Traffic Sign Recognition applied to on-vehicle real-time visual\n  detection of American and European speed limit signs",
        "url": "https://arxiv.org/abs/0910.1295"
    },
    {
        "title": "3D/2D Registration of Mapping Catheter Images for Arrhythmia\n  Interventional Assistance",
        "url": "https://arxiv.org/abs/0910.1844"
    },
    {
        "title": "Color Image Clustering using Block Truncation Algorithm",
        "url": "https://arxiv.org/abs/0910.1849"
    },
    {
        "title": "Fractional differentiation based image processing",
        "url": "https://arxiv.org/abs/0910.2381"
    },
    {
        "title": "Behavior Subtraction",
        "url": "https://arxiv.org/abs/0910.2917"
    },
    {
        "title": "A $p$-adic RanSaC algorithm for stereo vision using Hensel lifting",
        "url": "https://arxiv.org/abs/0910.4839"
    },
    {
        "title": "An Iterative Shrinkage Approach to Total-Variation Image Restoration",
        "url": "https://arxiv.org/abs/0910.5002"
    }
]
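
Several titles in the list above contain an embedded line break followed by two spaces (`\n  `), which looks like wrapping carried over from the source metadata. Since each title is passed along with its chunks, it may be worth collapsing that whitespace before building the `Documents` instance. This is a minimal sketch; `normalize_titles` is a hypothetical helper and not part of the notebook.

```python
from typing import Dict, List


def normalize_titles(sources: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the sources with whitespace runs in each title collapsed."""
    return [
        {**source, "title": " ".join(source["title"].split())}
        for source in sources
    ]


example_sources = [
    {
        "title": "Enhancement of Noisy Planar Nuclear Medicine Images using Mean Field\n  Annealing",
        "url": "https://arxiv.org/abs/0705.0828",
    }
]
cleaned = normalize_titles(example_sources)
print(cleaned[0]["title"])
```

The helper returns new dictionaries, so the original `sources` list is left unchanged.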

### Process the documents

In [14]:
# Create an instance of the Documents class with the given sources
documents = Documents(sources)

Loading documents...
Embedding documents...
Indexing documents...
Indexing complete with 1102 documents.


### Run the chatbot

In [None]:
# Create an instance of the Chatbot class with the Documents instance
chatbot = Chatbot(documents)

# Create an instance of the App class with the Chatbot instance
app = App(chatbot)

# Run the chatbot
app.run()

User:  Hi


User: Hi
Chatbot:
Hi there, how can I help you ? Today's date is Sunday, November 26th, 2023, would you like me to help you with something in relation to this date? 

Alternatively, you can ask me anything and I will try my best to assist you!
----------------------------------------------------------------------------------------------------



User:  give me some datasets of Arrhythmia


User: give me some datasets of Arrhythmia
Chatbot:
Retrieving information...
I couldn't find any specific datasets of Arrhythmia, but I did find some research papers that may be of interest to you. Would you like me to summarise some of these papers for you? 

Alternatively, you can ask me for datasets of Arrhythmia specifically and I will notify you as soon as I find some.
----------------------------------------------------------------------------------------------------



User:  what is segmentation?


User: what is segmentation?
Chatbot:
Retrieving information...
Segmentation is a process of dividing an image or a graph into distinct regions, which are meaningful and easier to analyse. It is a typical preliminary step for many computer vision tasks. Various energy minimisation schemes such as the deformation graph and schemes based on combinatorial pyramids are utilised for the segmentation process. Would you like me to go into more detail about any of the segmentation techniques?

CITATIONS:
[{'start': 29, 'end': 79, 'text': 'dividing an image or a graph into distinct regions', 'document_ids': ['doc_0', 'doc_1', 'doc_2']}]
[{'start': 238, 'end': 255, 'text': 'deformation graph', 'document_ids': ['doc_0']}]
[{'start': 277, 'end': 299, 'text': 'combinatorial pyramids', 'document_ids': ['doc_2']}]

----------------------------------------------------------------------------------------------------
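
The `start` and `end` values in the citations printed above are character offsets into the response text, and `document_ids` names the retrieved documents that support that span. One possible way to surface them is to insert the document ids right after each cited span. This is a minimal sketch; `annotate_with_citations` is a hypothetical helper, and the example uses the first sentence and the first citation of the response above.

```python
from typing import Any, Dict, List


def annotate_with_citations(text: str, citations: List[Dict[str, Any]]) -> str:
    """Insert ' [doc ids]' after each cited span."""
    annotated = text
    # Work from the last span to the first so earlier offsets stay valid
    for citation in sorted(citations, key=lambda c: c["start"], reverse=True):
        marker = " [" + ", ".join(citation["document_ids"]) + "]"
        annotated = annotated[: citation["end"]] + marker + annotated[citation["end"] :]
    return annotated


response_text = (
    "Segmentation is a process of dividing an image or a graph into distinct "
    "regions, which are meaningful and easier to analyse."
)
citations = [
    {
        "start": 29,
        "end": 79,
        "text": "dividing an image or a graph into distinct regions",
        "document_ids": ["doc_0", "doc_1", "doc_2"],
    }
]
annotated_text = annotate_with_citations(response_text, citations)
print(annotated_text)
```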



User:  AdaBoost Algorithm


User: AdaBoost Algorithm
Chatbot:
Retrieving information...
The AdaBoost algorithm is a boosting technique used to learn a classifier on data and optimise the method of converting the pixel classifications into high-quality sets of (x, y) locations. The original AdaBoost takes into account the spatially correlated nature of the data.

There is also a variant called Parallel AdaBoost, which incorporates the use of Gabor wavelets and mutual information to select effective image features for use in face recognition. This variant uses Parallel Boosting methods to optimise the selection process based not only on classification accuracy but also on efficiency. 

Would you like me to go into more detail about any of the AdaBoost variants?

CITATIONS:
[{'start': 28, 'end': 46, 'text': 'boosting technique', 'document_ids': ['doc_1', 'doc_5']}]
[{'start': 55, 'end': 81, 'text': 'learn a classifier on data', 'document_ids': ['doc_1', 'doc_5']}]
[{'start': 86, 'end': 145, 'text': 'optimise the met