# Chatbot Project v2
In this iteration of the project, we build a chatbot that can answer questions about a set of documents. Basic vector search is the method used to find the passages relevant to a question. It entails loading the documents, splitting them into chunks, embedding each chunk as a vector, and storing the vectors in an index so that an LLM can answer questions about the data.

Chatbots facilitate interaction with large language models (LLMs) such as GPT-4 or Llama through API calls and a user interface.


## Container Review

Below are the three Streamlit features used to capture, display, and manage messages.

* `st.chat_message` displays a container holding the user's input or the bot's response.
* `st.chat_input` is a widget that lets the user enter input.
* `st.session_state` is a dictionary-like object that persists across reruns. The chat history is stored in it as a list so it can be redisplayed in the containers. In the example below, each message in that list is a dictionary with the keys `role` (the author of the message) and `content` (the message itself).
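
The history structure described above can be sketched without Streamlit. This is a minimal plain-Python stand-in, not Streamlit code: the `messages` list plays the role of the list kept in `st.session_state`, and the helper names `add_message` and `render_history` are hypothetical, introduced only for illustration.

```python
# Plain-Python stand-in for the chat history kept in st.session_state.
# add_message and render_history are hypothetical helpers, not Streamlit APIs.

def add_message(messages, role, content):
    """Append one chat turn as a dict with 'role' and 'content' keys."""
    messages.append({"role": role, "content": content})
    return messages


def render_history(messages):
    """Return the history as text lines, one per message, oldest first."""
    return [f"{m['role']}: {m['content']}" for m in messages]


messages = []  # in the app, this list lives at st.session_state.messages
add_message(messages, "user", "How much oil does the mower use?")
add_message(messages, "assistant", "(made-up reply for illustration)")

for line in render_history(messages):
    print(line)
```

In the real app, the rendering step is the loop over `st.session_state.messages` that calls `st.chat_message` for each entry.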

## Environment and Set Up
Load the environment variables and create an OpenAI client.

In [12]:
from dotenv import load_dotenv
import os
from openai import OpenAI

load_dotenv()
# Create the client (initializes API connection)
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Call the model as a quick connectivity check; the reply text is in completion.output_text
completion = client.responses.create(
    model="gpt-4o",
    input="Write a single sentence about LLMs.",
)


# Three Steps for Direct Vector Search and Retrieval
This section shows how documents are ingested and queried in three steps: they are loaded into `Document` objects with `SimpleDirectoryReader`, embedded and stored in an index with `VectorStoreIndex`, and then searched through a query engine.


## Loading Documents
LlamaIndex is a toolkit for connecting LLMs to external data. It is used here to convert the PDFs into `Document` objects.

In [13]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

In [14]:
# Load the documents in the directory "./manuals"
documents = SimpleDirectoryReader("./manuals").load_data()


## Embedding the Data
An index is a data structure made of vectors that can be queried quickly for information. `VectorStoreIndex` generates a vector representation for each chunk using an embedding model. It keeps these vectors in a vector store (held in memory by default) that enables fast search and retrieval.
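
To make the idea of a vector index concrete, here is a toy sketch in plain Python. It is not how `VectorStoreIndex` is implemented: a bag-of-words count vector stands in for a real embedding model, a Python list stands in for the vector store, and the names `embed`, `cosine_similarity`, and `ToyVectorIndex` as well as the sample chunks are made up for illustration.

```python
import math
from collections import Counter

# Toy illustration only: word counts stand in for a real embedding model,
# and a plain list stands in for the vector store.

def embed(text):
    """Map text to a sparse word-count vector (hypothetical stand-in embedding)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Stores (vector, chunk) pairs and returns the chunks closest to a query."""

    def __init__(self, chunks):
        self.entries = [(embed(chunk), chunk) for chunk in chunks]

    def search(self, query, top_k=2):
        query_vec = embed(query)
        scored = [(cosine_similarity(query_vec, vec), chunk) for vec, chunk in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Made-up chunks standing in for text extracted from the manuals
chunks = [
    "Fill the engine with oil before the first use.",
    "Sharpen the blade at the start of every season.",
    "Store the mower in a dry place over winter.",
]
toy_index = ToyVectorIndex(chunks)
results = toy_index.search("How much oil does the engine need?", top_k=1)
print(results[0][1])
```

A real embedding model captures meaning rather than exact word overlap, but the search step follows the same pattern: embed the query, score it against every stored vector, and keep the best matches.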

`VectorStoreIndex` can handle the parsing, chunking and embedding in one wrapper. If direct control of the process is desired, each step can be done separately.
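
As an illustration of what the chunking step does when it is handled separately, the sketch below splits text into overlapping word windows. It is a simplified stand-in and not LlamaIndex's own splitter: the function name `chunk_words` and the parameters `chunk_size` and `overlap` are assumptions chosen for this example, and the sample sentence is made up.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into windows of chunk_size words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Made-up sentence standing in for text extracted from a manual
text = "Check the oil level before each use and add oil if the level is low"
for chunk in chunk_words(text, chunk_size=6, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, which is the usual reason splitters expose such a setting.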


In [15]:
from llama_index.core import VectorStoreIndex

# Create a vector index for all the manuals
index = VectorStoreIndex.from_documents(documents)

2025-12-16 10:11:12,411 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-16 10:11:24,334 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-16 10:11:27,964 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-16 10:11:29,578 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


## Create Query Engine

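The `as_query_engine()` method returns a query engine. For each question, the engine converts the question into an embedding, searches the index, retrieves the most relevant chunks (here the top 5, set by `similarity_top_k`), and passes them to the LLM to generate the answer.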
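
The retrieve-then-answer flow can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LlamaIndex's implementation: the helper names `retrieve_top_k` and `build_prompt`, the prompt wording, and the scored chunks are all made up, and the final call to the LLM is left out.

```python
def retrieve_top_k(scored_chunks, k=5):
    """Return the k chunk texts with the highest similarity scores."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up (score, chunk) pairs standing in for the result of a vector search
scored_chunks = [
    (0.21, "Sharpen the blade at the start of every season."),
    (0.83, "Check the oil level before each use."),
    (0.47, "Use fresh fuel and store it in an approved container."),
]
top_chunks = retrieve_top_k(scored_chunks, k=2)
prompt = build_prompt("How do I check the oil?", top_chunks)
print(prompt)
```

Only the highest-scoring chunks reach the prompt, which is why `similarity_top_k` trades off between giving the LLM enough context and keeping the prompt short.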


In [16]:
query_engine = index.as_query_engine(similarity_top_k=5)


In [17]:
response = query_engine.query("How much oil does the mower use?")
print(response)


2025-12-16 10:16:09,461 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-16 10:16:10,860 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The mower uses between 18 to 20 ounces of oil.


# Chatbot Integration
Now that we can retrieve information from the documents, we connect the query engine to the chatbot so that it generates the responses. The vector search plugs into the chatbot's message loop with only a couple of lines of code.

`NOTE` The OpenAI client is not created explicitly because LlamaIndex manages the connection internally using the `OPENAI_API_KEY` environment variable, which must be available to the process that runs the script.

In [18]:
%%writefile chatbot_v2.py

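import streamlit as st
from dotenv import load_dotenv
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load OPENAI_API_KEY from .env so LlamaIndex can reach the OpenAI API
load_dotenv()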

# st.cache_resource caches the return value of a function that produces a global
# resource, such as a database connection or a machine learning model. This prevents
# the resource from being re-created on every app rerun, making the app more performant.

# Load the documents, build the index, and cache the resulting query engine
@st.cache_resource
def load_index():
    documents = SimpleDirectoryReader("./manuals").load_data()
    index = VectorStoreIndex.from_documents(documents)
    return index.as_query_engine(similarity_top_k=5)

query_engine = load_index()

# Streamlit Chat
st.title('Instruction Manuals and Reference')

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

# User input captured: The chatbot interface captures the user's question
if prompt := st.chat_input('Enter your question...'):
    st.session_state.messages.append({'role':'user', 'content':prompt})
    with st.chat_message('user'):
        st.markdown(prompt)
    
    # Response generated: The LLM synthesizes an answer using the retrieved context
    response = query_engine.query(prompt)
    
    st.session_state.messages.append({'role':'assistant','content':str(response)})   
    # Response displayed: The chatbot displays the response to the user
    with st.chat_message('assistant'):
        st.markdown(str(response))
        

Writing chatbot_v2.py
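
Streamlit reruns the whole script on every user interaction, which is why the index is built inside a cached function. The effect can be illustrated with the standard library's `functools.lru_cache` as a rough analogy. This is not how `st.cache_resource` is implemented, and `load_query_engine` with its placeholder return value is hypothetical.

```python
from functools import lru_cache

build_count = 0  # counts how many times the expensive build actually runs


@lru_cache(maxsize=None)
def load_query_engine(folder):
    """Pretend to build an expensive resource; runs once per distinct folder."""
    global build_count
    build_count += 1
    return {"folder": folder, "ready": True}  # placeholder for a real query engine


# Simulate three script reruns, one per user interaction
engines = [load_query_engine("./manuals") for _ in range(3)]
print(build_count)
```

The build runs only on the first call and later calls return the same object, so in the chatbot the documents are embedded once rather than after every message.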
