### Chatbots with LlamaIndex  
LlamaIndex serves as a bridge between your data and Large Language Models (LLMs), providing a toolkit that enables you to establish a query interface around your data for a variety of tasks, such as question-answering and summarization.

In this tutorial, we'll walk you through building a context-augmented chatbot.

#### Installing Packages

In [1]:
!pip install -q openai
!pip install -q llama-index
!pip install -q llama-index-experimental
!pip install -q pypdf
!pip install -q docx2txt

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index-postprocessor-cohere-rerank 0.2.1 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.12.37 which is incompatible.
llama-index-llms-mistralai 0.2.7 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.12.37 which is incompatible.
llama-index-llms-azure-openai 0.2.2 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.12.37 which is incompatible.
llama-index-llms-azure-openai 0.2.2 requires llama-index-llms-openai<0.3.0,>=0.2.1, but you have llama-index-llms-openai 0.3.42 which is incompatible.
llama-index-finetuning 0.2.0 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.12.37 which is incompatible.
llama-index-finetuning 0.2.0 requires llama-index-llms-openai<0.3.0,>=0.2.0, but you have llama-index-llms-openai 

#### Importing Packages

In [None]:
import os
import openai

os.environ["OPENAI_API_KEY"] = "<the key>"
openai.api_key = os.environ["OPENAI_API_KEY"]

import sys
import shutil
import glob
import logging
from pathlib import Path

import warnings
warnings.filterwarnings('ignore')

import pandas as pd

import llama_index

## Llamaindex readers
from llama_index.core import SimpleDirectoryReader

## LlamaIndex Index Types
from llama_index.core import ListIndex
from llama_index.core import VectorStoreIndex
from llama_index.core import TreeIndex
from llama_index.core import KeywordTableIndex
from llama_index.core import SimpleKeywordTableIndex
from llama_index.core import DocumentSummaryIndex
from llama_index.core import KnowledgeGraphIndex
from llama_index.experimental.query_engine import PandasQueryEngine


## LlamaIndex Context Managers
from llama_index.core import StorageContext
from llama_index.core import load_index_from_storage
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.schema import Node

## LlamaIndex Templates
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.base.llms.types import ChatMessage, MessageRole

## LlamaIndex Callbacks
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks import LlamaDebugHandler

In [5]:
print("LLamaIndex:", llama_index.core.__version__)
print("OpenAI:", openai.__version__)

LLamaIndex: 0.11.23
OpenAI: 1.78.1


In [6]:
import logging

#logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
#logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

#### Defining Models

In [7]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.ollama import OllamaEmbedding
#from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama


#model="gpt-4o"
model="gpt-4o-mini"

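Settings.llm = OpenAI(temperature=0,
                      model=model,
                      #max_tokens=512,
                      # Extra OpenAI request parameters use lowercase names
                      # and are passed through additional_kwargs.
                      additional_kwargs={"presence_penalty": -2, "top_p": 1},
                     )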

Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
#Settings.llm = Ollama(model="llama3.2", request_timeout=300.0)
#Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
#Settings.llm = Ollama(model="llama3.2:latest", request_timeout=300.0)
#Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

### Using the vector index created in the notebook `RAG_LLamaIndex.ipynb`

#### Defining Folders

In [12]:
PERSIST_DIR = "../../Index/VectorStoreIndex/"

#### Retrieving Vector Store Index  

In [13]:
vectorstoreindex = load_index_from_storage(
    storage_context=StorageContext.from_defaults(persist_dir=PERSIST_DIR)
)

INFO:llama_index.core.indices.loading:Loading all indices.
Loading all indices.
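
Loading only works if `PERSIST_DIR` points at a folder where an index was previously persisted. A minimal pre-flight check could look like the sketch below. It assumes the index was saved with the default local stores, which write their state as `.json` files; the helper name `persist_dir_looks_valid` is hypothetical and not part of LlamaIndex.

```python
from pathlib import Path


def persist_dir_looks_valid(persist_dir: str) -> bool:
    """Return True if the folder exists and holds at least one JSON file.

    Assumption: the index was persisted with the default local stores,
    which write their state as .json files.
    """
    path = Path(persist_dir)
    return path.is_dir() and any(path.glob("*.json"))
```

Calling `persist_dir_looks_valid(PERSIST_DIR)` before `load_index_from_storage` lets the notebook fail early with a clearer message when the relative path is wrong.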


## Retrieving your data  

#### Retrieving from [Vector Store Index](https://docs.llamaindex.ai/en/stable/api_reference/query/retrievers/vector_store.html)  


In [21]:
query_engine = vectorstoreindex.as_query_engine(similarity_top_k=3,
                                                retriever_mode="embedding",
                                                #response_mode="accumulate",
                                                response_mode="compact",
                                                #response_mode="tree_summarize",
                                                verbose=False)
response = query_engine.query("Who is Gregor?")
print(response)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Gregor is a character who has undergone a transformation, leading to a significant change in his life and circumstances. He is depicted as being confined to his room, where he struggles with his new condition and the impact it has on his family. Gregor is also shown to have a strong connection with his sister, Grete, and has previously been the primary financial provider for his family. His situation has led to feelings of isolation and a longing for connection, particularly through music, which he appreciates despite his current state.
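
`similarity_top_k=3` asks the retriever for the three chunks whose embeddings are closest to the query embedding. The toy sketch below illustrates the idea with hand-made 2-D vectors and cosine similarity. The vectors, the chunk texts, and the `top_k` helper are invented for illustration and are not LlamaIndex internals.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, chunks, k):
    """Return the k chunk texts most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up embeddings: (chunk text, vector)
chunks = [
    ("Gregor woke up transformed", [0.9, 0.1]),
    ("Grete played the violin", [0.6, 0.4]),
    ("The lodgers asked for dinner", [0.1, 0.9]),
    ("The chief clerk arrived", [0.3, 0.7]),
]

print(top_k([1.0, 0.0], chunks, k=2))
```

A larger `k` gives the LLM more context to work with, at the cost of longer prompts.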


In [22]:
query_engine = vectorstoreindex.as_query_engine(similarity_top_k=3,
                                                retriever_mode="embedding",
                                                #response_mode="accumulate",
                                                #response_mode="compact",
                                                response_mode="tree_summarize",
                                                verbose=True)

response = query_engine.query("Who is Paul McCartney?")
print(response)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The provided information does not contain any details about Paul McCartney.


In [23]:
query_engine = vectorstoreindex.as_query_engine(similarity_top_k=3,
                                                retriever_mode="embedding",
                                                response_mode="accumulate",
                                                #response_mode="compact",
                                                #response_mode="tree_summarize",
                                                verbose=True)

response = query_engine.query("how does the insect is described?")
print(response)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Response 1: The insect is described through Gregor's perspective, highlighting its physical struggles and the pain it experiences. It is depicted as having a smooth chest, a lower body that causes serious pain, and a lack of proper teeth, which complicates its attempts to turn a key. The insect's movements are awkward and chal

In [25]:
query_engine = vectorstoreindex.as_query_engine(similarity_top_k=5,
                                                retriever_mode="embedding",
                                                #response_mode="accumulate",
                                                response_mode="compact",
                                                #response_mode="tree_summarize",
                                                verbose=True)

response = query_engine.query("how many stars in the galaxy of auron-b?")
print(response)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The provided context does not contain any information regarding the number of stars in the galaxy of Auron-B. Therefore, I cannot provide an answer to that query.
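
The three `response_mode` values tried above differ in how the retrieved chunks reach the LLM. The sketch below is a simplified illustration, not the LlamaIndex implementation: the LLM is passed in as a plain function, the budget is counted in characters rather than tokens, and all function names are hypothetical.

```python
from typing import Callable, List

LLM = Callable[[str], str]


def pack(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily merge consecutive chunks while they fit in max_chars."""
    packed: List[str] = []
    for chunk in chunks:
        if packed and len(packed[-1]) + 1 + len(chunk) <= max_chars:
            packed[-1] = packed[-1] + "\n" + chunk
        else:
            packed.append(chunk)
    return packed


def compact(llm: LLM, query: str, chunks: List[str], max_chars: int) -> str:
    """Pack chunks first, then refine one answer across the packed prompts."""
    answer = ""
    for text in pack(chunks, max_chars):
        answer = llm(f"Context: {text}\nPrevious answer: {answer}\nQuery: {query}")
    return answer


def accumulate(llm: LLM, query: str, chunks: List[str]) -> str:
    """Ask the query against every chunk separately and list all answers."""
    answers = [llm(f"Context: {c}\nQuery: {query}") for c in chunks]
    return "\n".join(f"Response {i}: {a}" for i, a in enumerate(answers, 1))


def tree_summarize(llm: LLM, query: str, chunks: List[str], fanout: int = 2) -> str:
    """Answer over groups of chunks, then over those answers, until one remains."""
    level = list(chunks)
    while True:
        level = [
            llm(f"Context: {' '.join(level[i:i + fanout])}\nQuery: {query}")
            for i in range(0, len(level), fanout)
        ]
        if len(level) <= 1:
            break
    return level[0] if level else ""


def counting_llm():
    """Build a fake LLM that numbers its answers and records every prompt."""
    calls: List[str] = []

    def llm(prompt: str) -> str:
        calls.append(prompt)
        return f"answer {len(calls)}"

    return llm, calls
```

In this sketch `accumulate` makes one call per chunk, which matches the three chat-completion requests and the `Response 1:` prefix in the output above, while `compact` makes fewer calls whenever several chunks fit into one prompt.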


## Creating a Simple Interactive Chatbot for our Index

Each chat mode uses its own [retrieval mode](https://www.actalyst.ai/blog/optimizing-enterprise-chatbots-with-llamaindex-chat-modes-in-a-rag-system):

![](https://assets-global.website-files.com/652165b2d773cd686ae910a6/652e2b0154e3af353e2d9422_actalyst-llalaindex-chatmodes.png)

In [27]:
chat_engine = vectorstoreindex.as_chat_engine(chat_mode="context", 
                                              verbose=False)
chat_engine.reset()
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Assistant: The document is an excerpt from Franz Kafka's novella "Metamorphosis." It tells the story of Gregor Samsa, a traveling salesman who wakes up one morning to find himself transformed into a giant insect. The narrative explores themes of alienation, family dynamics, and the struggle for identity as Gregor and his family cope with the drastic changes in their lives following his transformation. The text delves into the emotional and psychological impact of Gregor's condition on both him and his family, highlighting their reactions and the resulting tensions.

INFO:httpx:HTTP Request: POST https://api.op

In [28]:
chat_engine = vectorstoreindex.as_chat_engine(chat_mode="condense_question", verbose=False)
chat_engine.reset()
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

INFO:llama_index.core.chat_engine.condense_question:Querying with: WHo is Gregor?
Querying with: WHo is Gregor?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Assistant: Gregor is a character who has undergone a transformation, which has significantly affected his relationship with his family and his living situation. He is depicted as being in a state of distress and isolation, struggling with his new condition while observing the dynamics of his family and their interactions with others.
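
The two chat modes tried above differ in what is sent to the index. In `context` mode the latest user message is used for retrieval and the retrieved text is placed in the system prompt. In `condense_question` mode the chat history and the latest message are first rewritten into a standalone question, which is what the `Querying with:` log lines show. The sketch below mimics both flows with stub functions. The `retrieve` and `condense` callables and both helper names are hypothetical stand-ins, not LlamaIndex APIs.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)


def context_mode_messages(
    retrieve: Callable[[str], List[str]],
    history: List[Message],
    user_msg: str,
) -> List[Message]:
    """Retrieve with the raw user message and put the result in a system message."""
    context = "\n".join(retrieve(user_msg))
    system = f"Use this context to answer:\n{context}"
    return [("system", system), *history, ("user", user_msg)]


def condense_question_query(
    condense: Callable[[str, str], str],
    history: List[Message],
    user_msg: str,
) -> str:
    """Rewrite history plus follow-up into one standalone question."""
    if not history:
        # Nothing to condense on the first turn.
        return user_msg
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    # In a real engine this step is an LLM call.
    return condense(transcript, user_msg)
```

Because `condense_question` answers through a query engine, it always consults the index, whereas `context` mode leaves the LLM free to mix the retrieved text with the rest of the conversation.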



---
### Apparently, the chat modes are not consistent in how they combine the retrieved context with the model's external knowledge. These pages explain the prompts involved:

[LlamaIndex Prompts](https://docs.llamaindex.ai/en/stable/module_guides/models/prompts/), [LlamaIndex Chat Prompts Customization](https://docs.llamaindex.ai/en/stable/examples/customization/prompts/chat_prompts/), and [default_prompts.py](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/prompts/default_prompts.py)

### Creating a [Customized Prompt](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/prompts/chat_prompts.py) Chatbot

In [29]:
from llama_index.core.prompts.base import PromptTemplate
from llama_index.core.prompts.prompt_type import PromptType

In [30]:
CUSTOM_PROMPT = [
    ChatMessage(
        content=(
            "You are an expert Q&A system that is trusted around the world.\n"
            "Always answer the query using only the provided context information. "
            "Do not use prior knowledge.\n"
            "Some rules to follow:\n"
            "1. Never directly reference the given context in your answer.\n"
            "2. Avoid statements like 'Based on the context, ...' or "
            "'The context information ...' or anything along "
            "those lines."
        ),
        role=MessageRole.SYSTEM,
    ),
    ChatMessage(
        content=(
            "Context information from multiple sources is below.\n"
            "---------------------\n"
            "{context_str}\n"
            "---------------------\n"
            "Given the information from multiple sources and not prior knowledge, "
            "answer the query.\n"
            "Query: {query_str}\n"
            "Answer: "
        ),
        role=MessageRole.USER,
    ),
]


CHAT_PROMPT = ChatPromptTemplate(message_templates=CUSTOM_PROMPT)
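
The `{context_str}` and `{query_str}` placeholders are filled in at query time with the retrieved text and the user's question. The sketch below shows that substitution with plain `(role, template)` pairs and `str.format` instead of the LlamaIndex classes. The `fill_messages` helper and the sample strings are hypothetical.

```python
from typing import List, Tuple

# (role, template) pairs standing in for ChatMessage objects
TEMPLATES: List[Tuple[str, str]] = [
    ("system", "Answer using only the provided context."),
    ("user", "Context:\n{context_str}\nQuery: {query_str}\nAnswer: "),
]


def fill_messages(templates, **values):
    """Substitute the placeholder values into every message template."""
    return [(role, text.format(**values)) for role, text in templates]


messages = fill_messages(
    TEMPLATES,
    context_str="Gregor woke up as an insect.",
    query_str="Who is Gregor?",
)
for role, text in messages:
    print(f"[{role}] {text}")
```

With the real template, `CHAT_PROMPT.format_messages(context_str=..., query_str=...)` should give the analogous list of chat messages, which is a handy way to inspect a prompt before wiring it into an engine.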

In [31]:
chat_engine = vectorstoreindex.as_chat_engine(chat_mode="condense_question",
                                              verbose=True,
                                              text_qa_template=CHAT_PROMPT)
chat_engine.reset()
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

INFO:llama_index.core.chat_engine.condense_question:Querying with: Who is the main character?
Querying with: Who is the main character?
Querying with: Who is the main character?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Assistant: The main character is Gregor.

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llama_index.core.chat_engine.condense_question:Querying with: Who is the insect in the story where Gregor is the main character?
Querying with: Who is the insect in the story where Gregor is the main character?
Querying w

### Adding Memory and Custom System Prompt

In [32]:
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=80000)
chat_engine = vectorstoreindex.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
            # Adjacent string literals are joined as-is, so every fragment
            # needs its own trailing space or newline.
            "Context information from multiple sources is below.\n"
            "---------------------\n"
            "{context_str}\n"
            "---------------------\n"
            "Given the information from multiple sources, "
            "answer the query.\n"
            "If the query is unrelated to the context, just answer: I don't know.\n"
            "Always start your answer with 'Dear Student'.\n"
            "Query: {query_str}\n"
            "Answer: "
    ),
)

chat_engine.reset()
chat_engine.chat_repl()

===== Entering Chat REPL =====
Type "exit" to exit.

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Assistant: Dear Student, 

Gregor Samsa is the main character in Franz Kafka's novella "The Metamorphosis." He is a traveling salesman who wakes up one morning to find himself transformed into a giant insect. This transformation leads to significant changes in his life and the dynamics of his family, as they struggle to cope with his new condition and the implications it has on their lives.

Best regards.



In [33]:
memory.to_dict()

{'chat_store': {'store': {'chat_history': [{'role': <MessageRole.USER: 'user'>,
     'content': 'Who is Gregor?',
     'additional_kwargs': {}},
    {'role': <MessageRole.ASSISTANT: 'assistant'>,
     'content': 'Dear Student, \n\nGregor Samsa is the main character in Franz Kafka\'s novella "The Metamorphosis." He is a traveling salesman who wakes up one morning to find himself transformed into a giant insect. This transformation leads to significant changes in his life and the dynamics of his family, as they struggle to cope with his new condition and the implications it has on their lives.\n\nBest regards.',
     'additional_kwargs': {}}]},
  'class_name': 'SimpleChatStore'},
 'chat_store_key': 'chat_history',
 'token_limit': 80000,
 'class_name': 'ChatMemoryBuffer'}
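
The `token_limit` shown in the dump caps how much history the buffer hands back to the LLM on each turn. The sketch below approximates the idea: it counts whitespace-separated words instead of real tokens (an assumption made to keep the example dependency-free) and returns the most recent messages that fit. The class name `TinyMemoryBuffer` is hypothetical.

```python
from typing import List, Tuple


class TinyMemoryBuffer:
    """Toy chat memory that returns only the newest messages within a budget."""

    def __init__(self, token_limit: int):
        self.token_limit = token_limit
        self.messages: List[Tuple[str, str]] = []

    def put(self, role: str, content: str) -> None:
        """Store a message; nothing is ever deleted from the full history."""
        self.messages.append((role, content))

    def get(self) -> List[Tuple[str, str]]:
        """Return the newest messages that fit the limit, oldest first."""
        kept: List[Tuple[str, str]] = []
        used = 0
        for role, content in reversed(self.messages):
            cost = len(content.split())  # crude stand-in for a tokenizer
            if used + cost > self.token_limit:
                break
            kept.append((role, content))
            used += cost
        return list(reversed(kept))
```

With a limit as large as 80000, short conversations like the one above fit entirely, so trimming only matters in long sessions.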