# Building a RAG Chatbot Using LlamaIndex

LlamaIndex provides a simple way to build a chatbot that combines an LLM with data from a database. This combination is called Retrieval-Augmented Generation (RAG), and it lets LLMs answer queries using data they were not trained on. This notebook covers each step needed to create a RAG chatbot using the Python SDK for Azure Cosmos DB for NoSQL. At the end, we build a UI with Gradio so users can type questions and see the responses in a chatbot-style interface.
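
To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python. It is not LlamaIndex code: the sample documents, `tokenize`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration, and simple word overlap stands in for the vector search used later in this notebook.

```python
# Toy illustration of the retrieve-then-generate flow; not LlamaIndex code.
# All names here (DOCS, tokenize, retrieve, build_prompt) are hypothetical.

DOCS = [
    "Azure Cosmos DB for NoSQL supports vector search.",
    "Gradio builds simple web interfaces for Python functions.",
    "LlamaIndex connects LLMs to external data sources.",
]


def tokenize(text):
    """Lowercase the text, drop basic punctuation, and return its set of words."""
    return set(text.lower().replace(".", "").replace("?", "").split())


def retrieve(question, docs, top_k=1):
    """Rank documents by word overlap with the question (a stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context):
    """Combine the retrieved context and the question into one prompt for the LLM."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


question = "Does Cosmos DB support vector search?"
context = retrieve(question, DOCS)
prompt = build_prompt(question, context)
print(prompt)
```

In a real RAG pipeline the prompt would then be sent to the LLM, which answers using the retrieved context instead of relying only on its training data.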

Important note: This sample requires Azure Cosmos DB for NoSQL and Azure OpenAI accounts that are already set up. To get started, visit:
-  [Azure Cosmos DB for NoSQL Python Quickstart](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-python?pivots=devcontainer-codespace)
-  [Azure Cosmos DB for NoSQL Vector Search](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/vector-search)
-  [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/)

In [None]:
%pip install llama-index-llms-azure-openai
%pip install llama-index-embeddings-azure-openai
%pip install llama-index-vector-stores-azurecosmosnosql
%pip install python-dotenv

In [None]:
%pip install llama-index

## Setup Azure OpenAI
Before we begin, we need to set up the LLM and the embedding model that the RAG chatbot will use.

In [None]:
import os
from dotenv import load_dotenv
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

# Load variables from a local .env file, if present, so os.getenv can read them
load_dotenv()

In [None]:
llm = AzureOpenAI(
    model = "gpt-35-turbo",
    deployment_name = "gpt-35-turbo",
    azure_endpoint = os.getenv('AZURE_OPENAI_API_ENDPOINT'),
    api_key = os.getenv('AZURE_OPENAI_API_KEY'),
    api_version = "2023-05-15"
)

embed_model = AzureOpenAIEmbedding(
    model = "text-embedding-3-large",
    deployment_name = "text-embedding-3-large",
    azure_endpoint = os.getenv('AZURE_OPENAI_API_ENDPOINT'),
    api_key = os.getenv('AZURE_OPENAI_API_KEY'),
    api_version = "2023-05-15"
)
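
Because `os.getenv` returns `None` for an unset variable, a missing setting is easy to overlook. The sketch below checks for the two variable names used in the cell above before any client is created. The helper `missing_env_vars` is a hypothetical name, not part of LlamaIndex or the Azure SDK.

```python
import os

# Variable names taken from the cell above; the helper itself is hypothetical.
REQUIRED_VARS = ["AZURE_OPENAI_API_ENDPOINT", "AZURE_OPENAI_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All Azure OpenAI settings are present.")
```

The same check could be extended with `COSMOS_DB_URI` and `COSMOS_DB_API_KEY`, which are read later in this notebook.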

## Loading the data
The first step is to load the data with LlamaIndex's `SimpleDirectoryReader` class.

In [None]:
import time  # used later to time each query
from llama_index.core import SimpleDirectoryReader

In [None]:
# abstracts_pdf is assumed to be a folder; input_files expects individual file paths
documents = SimpleDirectoryReader(input_dir=r"DataSet/CVPR2019/abstracts_pdf").load_data()

## Create the Index
The next step is to index the loaded data, which is done by converting it into vector embeddings. Before indexing, we initialize an Azure Cosmos DB for NoSQL vector store where the embeddings will be stored.

In [None]:
from azure.cosmos import CosmosClient, PartitionKey
from llama_index.vector_stores.azurecosmosnosql import AzureCosmosDBNoSqlVectorSearch
from llama_index.core import StorageContext, VectorStoreIndex

In [None]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model

In [None]:
#create cosmos client
URI = os.getenv('COSMOS_DB_URI')
KEY = os.getenv('COSMOS_DB_API_KEY')
client = CosmosClient(URI, credential=KEY)

#specify vector store properties
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": '/"_etag"/?'}],
    "vectorIndexes": [{"path": "/embedding", "type": "quantizedFlat"}],
}

vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "distanceFunction": "cosine",
            "dimensions": 3072,
        }
    ]
}

partition_key = PartitionKey(path="/id")
cosmos_container_properties_test = {"partition_key": partition_key}
cosmos_database_properties_test = {}

# Create the vector store (the container is created if it does not exist)
store = AzureCosmosDBNoSqlVectorSearch(
    cosmos_client=client,
    vector_embedding_policy=vector_embedding_policy,
    indexing_policy=indexing_policy,
    cosmos_container_properties=cosmos_container_properties_test,
    cosmos_database_properties=cosmos_database_properties_test,
    create_container=True,
    database_name="rag_chatbot_example",
)

storage_context = StorageContext.from_defaults(vector_store=store)

#index the data
index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context
)
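
The embedding policy above sets `"distanceFunction": "cosine"`. As a rough illustration of what that measure computes, the following sketch implements cosine similarity on toy 3-dimensional vectors (the real embeddings here have 3072 dimensions). This is a plain-Python illustration, not how Azure Cosmos DB implements vector search, and the sample vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors: chunk_a points in the same direction as the query,
# chunk_b is orthogonal to it.
query = [1.0, 0.0, 1.0]
chunk_a = [2.0, 0.0, 2.0]
chunk_b = [0.0, 1.0, 0.0]

print(cosine_similarity(query, chunk_a))  # close to 1: same direction
print(cosine_similarity(query, chunk_b))  # 0: unrelated direction
```

Cosine similarity ignores vector length and compares direction only, which is why `chunk_a` scores as a near-perfect match even though it is twice as long as the query vector.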

## Query the data

In [None]:
%pip install gradio

In [None]:
import gradio as gr

In [None]:
query_engine = index.as_query_engine()
def user_query(user_prompt, chat_history):
    # Create a timer to measure the time it takes to complete the request
    start_time = time.time()
    # Get LLM completion
    response = query_engine.query(user_prompt)    
    # Stop the timer
    end_time = time.time()
    elapsed_time = round((end_time - start_time) * 1000, 2)
    print(response)
    # Append user message and response to chat history
    details = f"\n (Time: {elapsed_time}ms)"
    chat_history.append([user_prompt, str(response) + details])
        
    return gr.update(value=""), chat_history
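
The timing and history logic in `user_query` can be tried out without Azure credentials or a running Gradio app by swapping in a stand-in engine. The sketch below assumes a hypothetical `StubEngine` class and a hypothetical `answer_with_timing` function that mirrors the handler above, minus the Gradio textbox update.

```python
import time


class StubEngine:
    """Hypothetical stand-in for the LlamaIndex query engine."""

    def query(self, prompt):
        return f"(stub answer to: {prompt})"


def answer_with_timing(engine, user_prompt, chat_history):
    """Mirror user_query's logic, minus the Gradio textbox update."""
    start = time.perf_counter()
    response = engine.query(user_prompt)
    elapsed_ms = round((time.perf_counter() - start) * 1000, 2)
    # Each history entry is a [user message, bot message] pair
    chat_history.append([user_prompt, f"{response}\n (Time: {elapsed_ms}ms)"])
    return chat_history


history = answer_with_timing(StubEngine(), "What is RAG?", [])
print(history[0][1])
```

Passing the engine in as an argument, rather than reading a global, is a design choice made for this sketch so that the real query engine and the stub are interchangeable.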

In [None]:
chat_history = []
with gr.Blocks() as demo:
    chatbot = gr.Chatbot(label="RAG Chatbot")
    
    msg = gr.Textbox(label="Ask me anything about the document!")
    clear = gr.Button("Clear")
    
    msg.submit(user_query, [msg, chatbot], [msg, chatbot], queue=False)

    clear.click(lambda: None, None, chatbot, queue=False)

# Launch the Gradio interface
demo.launch(debug=True)

In [None]:
demo.close()