![A car dashboard with lots of new technical features.](dashboard.jpg)

You're working for a well-known car manufacturer that is looking at integrating LLMs into vehicles to provide guidance to drivers. You've been asked to experiment with connecting car manuals to an LLM to create a context-aware chatbot. The manufacturer hopes this context-aware LLM can be connected to text-to-speech software that reads the model's responses aloud.

As a proof of concept, you'll integrate several pages from a car manual that list car warning messages along with their meanings and recommended actions. This manual, stored as the HTML file `mg-zs-warning-messages.html`, is for the MG ZS, a compact SUV. Armed with your newfound knowledge of LLMs and LangChain, you'll implement Retrieval Augmented Generation (RAG) to create the context-aware chatbot.

### **Project Instructions**

The car manual HTML document has been loaded for you as `car_docs`. Use Retrieval Augmented Generation (RAG) to make an LLM of your choice (OpenAI's `gpt-4o-mini` is recommended) aware of the contents of `car_docs`, then answer the following user query:

```python
"The Gasoline Particular Filter Full warning has appeared. What does this mean and what should I do about it?"
```

- Store the answer to the user query in the variable `answer`.

**How to approach the project**

1. Split the document
   - Split the HTML document into chunks:
     - Initializing a splitter:<br>
       Use the `RecursiveCharacterTextSplitter` class from `langchain_text_splitters` to split documents. Its `chunk_size` argument sets the maximum length of each chunk (measured in characters by default), and `chunk_overlap` sets how many characters consecutive chunks can share.
     - Splitting the text: <br>
       Use the `.split_documents()` method to split the loaded documents into chunks.
2. Store embeddings
   - Embed and store the document chunks for retrieval:
     - Where to store the embeddings: <br>
       There are a few options for vector storage: you can create a local vector database using _FAISS_ or _ChromaDB_, use a cloud-based vector database like _Pinecone_, or even save the vectors in an easily retrievable file format such as JSON.
     - Storing embeddings in a Chroma vector database: <br>
       Use the `Chroma.from_documents()` method to store the document chunks, passing the chunks to store with the `documents` argument and the embedding function with the `embedding` argument.
3. Create a retriever
   - Create a retriever to retrieve relevant documents from the vector store: <br>
     - Create a Chroma retriever: <br>
       Use the `.as_retriever()` method on the vector store you created.
4. Initialize the LLM and prompt template
   - Define an LLM and create a prompt template to set up the RAG workflow: <br>
     - Initialize the LLM: <br>
       To define the LLM, use `ChatOpenAI()` with the `model` argument set to `gpt-4o-mini`. Set the `temperature` argument to zero explicitly to make the answers as consistent as possible; higher temperatures give more varied, creative outputs.
     - Define a chat prompt template: <br>
       - Use the `ChatPromptTemplate.from_template()` class method to convert a string into a chat prompt template.
       - Mark the variables to insert at run time with curly braces, e.g., `{context}` for the retrieved context and `{question}` for the user query.
5. Define RAG chain
   - Define a RAG chain that connects the retriever, question, prompt, and LLM.
     - Defining the RAG chain using LangChain Expression Language (LCEL): <br>
       To define your RAG chain, you can use the following syntax:
        ```python
        rag_chain = (
            {"context": retriever, "question": RunnablePassthrough()}
            | prompt
            | llm
        )
        ```
6. Invoke RAG chain
   - Invoke your chain with the user query to get the answer.
     - Invoking the RAG: <br>
       You can use the `.invoke()` method on the chain you created, passing in the query as its only argument.
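The interplay between `chunk_size` and `chunk_overlap` from step 1 can be illustrated with a deliberately simplified splitter. This is a hypothetical sliding-window function, not `RecursiveCharacterTextSplitter`'s algorithm: the real class tries each separator in turn (paragraph breaks, line breaks, spaces) so chunks tend to end at natural boundaries, whereas this sketch cuts at fixed character positions.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical fixed-width splitter: each window starts
    chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window slides each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so text that straddles a boundary still appears whole in at least one chunk. With the project's settings (`chunk_size=300`, `chunk_overlap=50`), this naive window would slide forward 250 characters at a time.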

```python
import os

# Import the required packages
from langchain_chroma import Chroma
from langchain_community.document_loaders import UnstructuredHTMLLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Read the OpenAI API key from the environment
openai_api_key = os.environ["OPENAI_API_KEY"]

# Load the HTML manual as a list of LangChain Document objects
loader = UnstructuredHTMLLoader(file_path="./datasets/mg-zs-warning-messages.html")
car_docs = loader.load()

# Configure the splitter: try paragraph breaks first, then line breaks,
# then spaces, and finally individual characters
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=300,
    chunk_overlap=50,
)

# Split the document into chunks of at most 300 characters that overlap by up to 50
car_docs_split = splitter.split_documents(car_docs)

# Create the embedding model
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key, model="text-embedding-3-small")

# Embed the chunks and store them in a Chroma vector database persisted to disk
# (re-running this cell adds the same chunks to the persisted collection again)
vector_store = Chroma.from_documents(
    documents=car_docs_split,
    embedding=embeddings,
    persist_directory="./datasets/",
)

# Create a retriever that returns the 2 most similar chunks for each query
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2},
)

# Initialize the LLM; temperature=0 keeps the answers as consistent as possible
llm = ChatOpenAI(
    model="gpt-4o-mini",
    api_key=openai_api_key,
    temperature=0,
)

# Define a prompt template with {context} and {question} placeholders
prompt_template = """
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}

Question: {question}

Answer:
"""

# Create a chat prompt from the template string
prompt = ChatPromptTemplate.from_template(prompt_template)

# Create a chain using LCEL: the query goes to the retriever (as context)
# and is passed through unchanged (as the question)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# Invoke the chain and keep only the text of the model's reply
query = "The Gasoline Particular Filter Full warning has appeared. What does this mean and what should I do about it?"
answer = chain.invoke(query).content

# Print the result
print(answer)
```

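In the chain above, the retriever's output (a list of `Document` objects) is inserted directly into `{context}`, so the prompt receives the list's string representation, metadata included. A common refinement is to join only the chunk text first. The helper below is a minimal sketch of that idea; `format_docs` and `FakeDocument` are hypothetical names introduced here, and `FakeDocument` merely mimics the `page_content` attribute of LangChain's `Document` so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the same `page_content` attribute as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join retrieved chunks into one plain-text context string, dropping metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("First chunk."),
    FakeDocument("Second chunk.", {"source": "manual.html"}),
]
print(format_docs(docs))
```

To use it with real documents, the context entry of the chain could become `{"context": retriever | format_docs, "question": RunnablePassthrough()}`; when a plain function is piped into a runnable, LCEL wraps it in a `RunnableLambda` automatically.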
