# Q&A with RAG using a Local Model

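### Prerequisites

Download the following GGUF files and store them in the same folder as this notebook:
1. Meta-Llama-3.1-8B-Instruct-Q3_K_L.gguf
1. nomic-embed-text-v1.5.Q2_K.gguf
1. all-MiniLM-L6-v2.gguf2.f16.gguf (the embedding model loaded by `GPT4AllEmbeddings` in the vector-store step)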
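Before running the rest of the notebook, it can help to confirm that the model files are actually in place. The following is a minimal sketch that uses only the standard library. `REQUIRED_MODELS` and `missing_models` are illustrative names and are not part of any library. The list holds the GGUF file names that appear in this notebook, so adjust it if you use different models.

```python
from pathlib import Path

# Hypothetical helper list: GGUF file names that appear in this notebook
REQUIRED_MODELS = [
    "Meta-Llama-3.1-8B-Instruct-Q3_K_L.gguf",
    "nomic-embed-text-v1.5.Q2_K.gguf",
    "all-MiniLM-L6-v2.gguf2.f16.gguf",
]


def missing_models(folder=".", names=REQUIRED_MODELS):
    """Return the names from `names` that are not files inside `folder`."""
    base = Path(folder)
    return [name for name in names if not (base / name).is_file()]


missing = missing_models()
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```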

### Document Loading

First, install the packages needed for PDF loading, local embeddings, and vector storage.

In [None]:
%pip install --upgrade --quiet langchain langchain-community langchainhub gpt4all langchain-chroma pypdf nomic

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

### Indexing: Load

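Load the owner's manual with `PyPDFLoader`, as described in the [LangChain PDF loader documentation](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf/). The loader returns one `Document` per PDF page, so the cell below prints the page count.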

In [None]:
from langchain_community.document_loaders import PyPDFLoader

file_path = "../data/Owner-Manual-KICKS-e-POWER-EN.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

print(len(docs))

### Indexing: Split

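The manual is too long to pass to the model in a single prompt, so we split the loaded pages into overlapping chunks for embedding and vector storage. This lets us retrieve only the most relevant parts of the manual at run time. Each chunk holds up to 1,000 characters, consecutive chunks overlap by 200 characters, and `add_start_index=True` records each chunk's character offset in its metadata.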

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)
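`RecursiveCharacterTextSplitter` is smarter than a fixed window. It first tries to break on paragraphs, then on lines, then on spaces, and only falls back to individual characters when it has to. The sketch below ignores separators and uses a plain sliding window. It only shows what `chunk_size` and `chunk_overlap` control, and it is not LangChain's algorithm. The helper name `split_with_overlap` is hypothetical. Each returned pair carries the chunk's start offset, which is similar to what `add_start_index=True` stores in `metadata["start_index"]`.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so text cut at one
    boundary still appears whole in the neighbouring chunk.
    Returns a list of (start_index, chunk) pairs.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Start a new window every `step` characters until the end is covered
    return [
        (start, text[start:start + chunk_size])
        for start in range(0, max(len(text) - chunk_overlap, 1), step)
    ]


sample = "x" * 2500
for start, chunk in split_with_overlap(sample):
    print(start, len(chunk))  # prints 0 1000, then 800 1000, then 1600 900
```

With the notebook's settings, each new chunk starts 800 characters after the previous one. The 200-character overlap costs some extra storage, and in return a sentence that straddles a boundary is less likely to be lost.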

In [None]:
type(all_splits[0].page_content)

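Next, embed the chunks locally with the `nomic` client. With `inference_mode="local"`, the Nomic embedding model is downloaded on first use (if you don't already have it) and then runs on this machine. This step is a sanity check on the chunks. The vector store below computes its own embeddings with `GPT4AllEmbeddings`.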

In [None]:
from nomic import embed

# Embed every chunk on this machine; the model is downloaded on first use
texts = [doc.page_content for doc in all_splits]
embeddings = embed.text(texts, inference_mode="local")["embeddings"]
print("Number of embeddings created:", len(embeddings))
print("Number of dimensions per embedding:", len(embeddings[0]))
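Every embedding has the same number of dimensions, and that shared length is what makes the vectors comparable. Retrieval scores each chunk by how close its vector is to the question's vector. Cosine similarity is one common score. The sketch below computes it in pure Python on made-up three-dimensional vectors. These vectors are hypothetical stand-ins: real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat all-zero vectors as unrelated


# Hypothetical toy vectors, not real model output
question = [0.9, 0.1, 0.0]
brake_chunk = [0.8, 0.2, 0.1]
radio_chunk = [0.0, 0.1, 0.9]

print(cosine_similarity(question, brake_chunk))  # close to 1: similar direction
print(cosine_similarity(question, radio_chunk))  # close to 0: unrelated
```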

### Indexing: Store

Embed the chunks with `GPT4AllEmbeddings` and store the vectors in a Chroma vector store.

In [None]:
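from langchain_chroma import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings

# Embedding model file; GPT4All looks for it in the folder given by model_path
model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
# allow_download must be a boolean: the string 'false' is truthy and would not disable downloads
gpt4all_kwargs = {"model_path": ".", "allow_download": False}
embeddings = GPT4AllEmbeddings(
    model_name=model_name,
    gpt4all_kwargs=gpt4all_kwargs,
)

# Embed every chunk and index the vectors in a Chroma collection (not persisted to disk)
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)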
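Conceptually, `Chroma.from_documents` embeds every chunk and indexes the vectors. A similarity search then returns the chunks whose vectors lie nearest to the embedded question. The toy store below mimics that flow in pure Python, with two loudly hypothetical pieces. First, `toy_embed` is a bag-of-words counter over a tiny vocabulary, used in place of a real embedding model. Second, cosine similarity is the ranking metric, whereas Chroma's distance metric is configurable and may default to something else. The store also scans every item, while Chroma uses an approximate nearest-neighbour index.

```python
import math
import re

VOCAB = ["brake", "hold", "pedal", "radio", "volume", "seat", "belt"]


def toy_embed(text):
    """Hypothetical embedding: word counts over a tiny fixed vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat all-zero vectors as unrelated


class ToyVectorStore:
    """Brute-force stand-in for a vector store (no index, no persistence)."""

    def __init__(self, embed):
        self.embed = embed
        self.items = []  # (text, vector) pairs

    def add_texts(self, texts):
        for text in texts:
            self.items.append((text, self.embed(text)))

    def similarity_search(self, query, k=6):
        """Return the k stored texts whose vectors are closest to the query's."""
        query_vec = self.embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vec, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore(toy_embed)
store.add_texts([
    "Press the brake pedal to release the hold.",
    "Turn the radio volume knob.",
    "Fasten the seat belt before driving.",
])
print(store.similarity_search("What is auto brake hold?", k=1))
```

Here `similarity_search(query, k=6)` plays the same role as `as_retriever(search_type="similarity", search_kwargs={"k": 6})` in the retrieval step.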

### Retrieval and Generation: Retrieve

Now let’s write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

In [None]:
# Return the 6 chunks whose embeddings are most similar to the question's
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke("What is auto brake hold?")
len(retrieved_docs)

In [None]:
print(retrieved_docs[0].page_content)

### Retrieval and Generation: Generate
Let’s put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.
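Conceptually, the chain is function composition. In LangChain, the leading dict becomes a parallel step that feeds the same question to both branches. The retriever branch turns the question into formatted context, while `RunnablePassthrough()` passes the question through unchanged. The result then flows through the prompt, the model, and the output parser. The plain-Python sketch below mirrors that data flow, running the branches one after another. `make_rag_chain`, `fake_retrieve`, and `fake_llm` are hypothetical stand-ins and are not LangChain APIs.

```python
def make_rag_chain(retrieve, format_docs, render_prompt, llm):
    """Plain-Python analogue of
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
    """
    def chain(question):
        # Both keys are computed from the same input question
        inputs = {
            "context": format_docs(retrieve(question)),
            "question": question,  # passthrough: input left unchanged
        }
        prompt = render_prompt(inputs)
        return str(llm(prompt))  # parser step: hand back the model output as a string
    return chain


# Hypothetical stand-ins for the retriever and the local model
def fake_retrieve(question):
    return ["Auto brake hold keeps the brakes applied while stopped."]


def fake_llm(prompt):
    return "Answer based on: " + prompt.splitlines()[0]


chain = make_rag_chain(
    fake_retrieve,
    lambda docs: "\n\n".join(docs),
    lambda d: "{context}\n\nQuestion: {question}".format(**d),
    fake_llm,
)
print(chain("What is auto brake hold?"))
```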

Load the local Llama 3.1 model through LangChain's `GPT4All` wrapper.

In [None]:
from langchain_community.llms.gpt4all import GPT4All

# Path to the Llama 3.1 GGUF file downloaded in the prerequisites
llm = GPT4All(model="./Meta-Llama-3.1-8B-Instruct-Q3_K_L.gguf")


In [None]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)
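`PromptTemplate.from_template` uses f-string-style `{name}` placeholders by default, so plain `str.format` renders the same text. The following is a sketch for inspecting exactly what the model receives. It uses two hypothetical chunk strings in place of retrieved documents. In the notebook itself, `custom_rag_prompt.format(context=..., question=...)` should produce the equivalent string.

```python
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Helpful Answer:"""


def format_docs(texts):
    # Same joining rule as the notebook's format_docs, applied to plain strings
    return "\n\n".join(texts)


# Hypothetical retrieved chunks
chunks = ["Chunk one text.", "Chunk two text."]
prompt = template.format(context=format_docs(chunks), question="What is auto brake hold?")
print(prompt)
```

Printing the rendered prompt is a quick way to check how much retrieved context is being sent. With `k=6` chunks of up to 1,000 characters each, that context can make up most of a small local model's context window.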

In [None]:
res = rag_chain.invoke("What is auto brake hold and how does it work?")
print(res)

Example output (truncated mid-sentence):

 > The automatic brake hold (ABH) system maintains braking force without driver having to depress the brake pedal when vehicle stops at a traffic light or intersection. It can be activated by pressing the ABH switch while driving, but it will reset off every time power is switched from "OFF" position to "ON". To use the function, driver's seatbelt must be fastened and electronic parking brake released. The system helps maintain steering control and minimizes swerving and spinning on slippery surfaces.
>
> The automatic brake hold indicator light illuminates when ABH becomes standby (white) or active (green). When activated, it maintains braking force until accelerator pedal is pressed again to deactivate the function. If a malfunction occurs in the ABH function, an error message may appear in the vehicle information display. The system also has warnings and indicators for steep slope conditions.
>
> The automatic brake hold can be deactivated by pressing the brake pedal firmly while pushing the ABH switch or when driver's seatbelt is unfastened, door opened, power switch placed in "OFF" position, parking brake applied manually, or an error occurs in the ABH function. The system prevents wheels from locking during hard braking on slippery surfaces and helps maintain steering control.
>
> The automatic brake hold indicator light may appear in the vehicle

### References
- LangChain documentation
- GPT4All Python SDK documentation