# Chatbot v3
With LLMs such as GPT-4 or Llama, we can finally create a chatbot that answers questions about a set of documents. When questions must be answered in a particular context, basic vector search is an effective approach. It entails loading the documents, embedding them as vectors, and storing those vectors in a vector database so an LLM can answer questions about the data.


## Streamlit Review
Below are three Streamlit features used to capture, display, and manage messages.

* `st.chat_message` displays a container for each user or bot message
* `st.chat_input` is an input element where users enter their questions
* `st.session_state` stores the chat history as a list of messages, each with the keys `role` and `content`
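
The shape of the chat history can be sketched without Streamlit. In the snippet below, a plain list stands in for `st.session_state.messages`; the helper name `add_message` and the placeholder answer text are assumptions made for illustration only.

```python
# A plain list standing in for st.session_state.messages (no Streamlit needed).
messages = []


def add_message(history, role, content):
    """Append one chat turn in the role/content shape used by the app."""
    history.append({"role": role, "content": content})
    return history


add_message(messages, "user", "How much oil does the mower use?")
add_message(messages, "assistant", "(answer generated from the manuals)")

# Replaying the history mirrors the display loop in the chatbot script.
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Because every turn has the same two keys, the display loop can pass `message['role']` straight to `st.chat_message` and render `message['content']` inside it.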

## Environment and Set Up
Load the environment variables and create an OpenAI client.

In [1]:
from dotenv import load_dotenv
import os
from openai import OpenAI

# Load the api key and other variables
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
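
`os.environ.get` returns `None` when a variable is not set. One way to get a clearer error message is to check for the key up front. The sketch below is a suggestion rather than part of the original setup; `require_env` and the `DEMO_API_KEY` variable are hypothetical names introduced for illustration.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Hypothetical variable set in-process so the example runs anywhere.
os.environ["DEMO_API_KEY"] = "placeholder"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called as `require_env("OPENAI_API_KEY")` after `load_dotenv()`.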


# Three Steps for Direct Vector Search


## Loading Documents
Documents are loaded into objects for preprocessing with `SimpleDirectoryReader`, which is part of LlamaIndex. LlamaIndex provides many other features that make it a valuable toolkit for working with LLMs; here it is used to convert PDFs into document objects.

In [6]:
from llama_index.core import SimpleDirectoryReader
# Load the documents in the directory "../manuals"
documents = SimpleDirectoryReader("../manuals").load_data()


## Embedding the Data and Creating the Vector Database
When text is converted into numerical vectors, a form that computers can analyze more readily, it is said to be embedded. `VectorStoreIndex` splits the data into chunks, then generates a vector representation for each chunk using an embedding model.
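
To make chunking and embedding concrete, here is a toy sketch in plain Python. It is not how LlamaIndex works internally: the word-window splitter, the word-count "embedding", the function names, and the sample sentence are all simplified assumptions. A real embedding model produces dense vectors that capture meaning rather than word counts.

```python
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def toy_embed(chunk, vocabulary):
    """Toy embedding: count how often each vocabulary word appears in the chunk."""
    counts = Counter(chunk.lower().split())
    return [counts[word] for word in vocabulary]


# Made-up sample text and vocabulary, for illustration only.
text = "check the oil level before each use and add oil if the level is low"
vocabulary = ["oil", "level", "blade"]

for chunk in split_into_chunks(text):
    print(toy_embed(chunk, vocabulary), chunk)
```

The overlap keeps a sentence that straddles a chunk boundary from being cut off from its context in both chunks.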

Once the data has been embedded, it needs to be stored in a data structure so information can be retrieved quickly. Conveniently, `VectorStoreIndex` both generates the vector embeddings and organizes them in a searchable index. Both steps happen in the `from_documents()` method.

Although LlamaIndex provides a full pipeline, individual stages could be customized in a future version to improve performance.


In [7]:
from llama_index.core import VectorStoreIndex

# Create a vector index for all the manuals
index = VectorStoreIndex.from_documents(documents)

2026-01-07 21:08:31,386 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2026-01-07 21:08:33,174 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2026-01-07 21:08:34,965 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2026-01-07 21:08:36,706 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


# Tying It All Together
All that is left to do is convert the indexed data to a query engine so our chatbot can start generating responses. The vector search integration is completed with just a couple of lines of code in the chatbot script.

`NOTE` The OpenAI client is not defined below because LlamaIndex manages the connection internally using the environment variables.

## Query Engine
The `.as_query_engine()` method converts an index object into a query engine object that can fetch relevant nodes of data. In the final code below, the query engine is the return value of the `load_index()` function.

When a question is entered into the chatbot, it is passed to the `.query()` method, which embeds it. The query engine then finds the most similar vectors in its index and retrieves the top 5 chunks (set by `similarity_top_k`). Below is a quick test with a simple question to see how it works.


In [8]:
# Example
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("How much oil does the mower use?")
print(response)


2026-01-07 21:08:47,371 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2026-01-07 21:08:48,793 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The mower uses between 18 to 20 ounces of oil.
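
The retrieval step behind that answer can be sketched in plain Python. This is a simplified illustration that assumes cosine similarity as the comparison measure; the function names and the 3-dimensional vectors are made up, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k=5):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [
        (i, cosine_similarity(query_vector, vector))
        for i, vector in enumerate(chunk_vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for embedded chunks and an embedded question.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [1.0, 0.2, 0.0]

for index, score in top_k(query_vector, chunk_vectors, k=2):
    print(index, round(score, 3))
```

The chunks selected this way are what the LLM receives as context when it writes the final answer.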


## Performance Boost
The Streamlit decorator `@st.cache_resource` caches the return value of a function that produces a global resource; in our case, that is the query engine built on the index of our documents. This prevents the index from being re-created on every app rerun, making the app faster.
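
The effect of caching can be illustrated with the standard library. The sketch below uses `functools.lru_cache` as an analogy only; it is not how Streamlit implements `@st.cache_resource`, and `load_resource` and `build_count` are hypothetical names.

```python
from functools import lru_cache

build_count = 0  # counts how many times the expensive step actually runs


@lru_cache(maxsize=None)
def load_resource():
    """Stand-in for an expensive build step such as indexing documents."""
    global build_count
    build_count += 1
    return {"name": "demo-index"}


first = load_resource()
second = load_resource()  # served from the cache; the body does not run again

print(build_count)
print(first is second)
```

Both calls return the same object, so the expensive work happens once. Streamlit reruns the whole script on each interaction, which is why caching `load_index()` matters in the chatbot.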


In [9]:
%%writefile v3-chatbot-doc-qa.py

import streamlit as st
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load the documents and cache the resulting query engine across reruns
@st.cache_resource
def load_index():
    documents = SimpleDirectoryReader("./manuals").load_data()
    index = VectorStoreIndex.from_documents(documents)
    return index.as_query_engine(similarity_top_k=5)

query_engine = load_index()

# Streamlit Chat
st.title('Instruction Manuals and Reference')

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

# User input captured: The chatbot interface captures the user's question
if prompt := st.chat_input('Enter your question...'):
    st.session_state.messages.append({'role':'user', 'content':prompt})
    with st.chat_message('user'):
        st.markdown(prompt)
    
    # Response generated: The LLM synthesizes an answer using the retrieved context
    response = query_engine.query(prompt)
    
    st.session_state.messages.append({'role':'assistant','content':str(response)})   
    # Response displayed: The chatbot displays the response to the user
    with st.chat_message('assistant'):
        st.markdown(str(response))
        

Writing v3-chatbot-doc-qa.py
