### RAG (Retrieval-Augmented Generation)
This notebook reads a PDF file, splits it into chunks, and stores the chunks in a vector database.
When asked a question, it retrieves the relevant chunks from the stored database and combines them with the model's own knowledge to produce an answer.

LLM Model = ggml-gpt4all-j-v1.3-groovy

In [133]:
import textwrap

import chromadb
from chromadb.config import Settings

from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

Download the model from https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin (Date: 10/17/2023).
The model file is 3 GB in size.

In [134]:
# Chroma client settings: persist to ../db/, disable telemetry, allow resets
CHROMA_SETTINGS = Settings(
    persist_directory="../db/",
    anonymized_telemetry=False,
    allow_reset=True,
)
EMBEDDINGS_MODEL_NAME = "all-MiniLM-L6-v2"
model_path = "../models/ggml-gpt4all-j-v1.3-groovy.bin"

# Wrap printed output at 100 characters
text_wrapper = textwrap.TextWrapper(width=100)

Initialize GPT4All, the embeddings, the Chroma client and database, the PDF loader, and the text splitter

In [135]:
# Local LLM and the embedding model used to vectorize chunks and questions
llm = GPT4All(model=model_path, max_tokens=800, verbose=False)
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDINGS_MODEL_NAME)

# Persistent Chroma client plus the LangChain vector store that wraps it
chroma_client = chromadb.PersistentClient(path="../db/", settings=CHROMA_SETTINGS)
chroma_db = Chroma(
    embedding_function=embeddings,
    persist_directory="../db/",
    client_settings=CHROMA_SETTINGS,
    client=chroma_client,
)

# PDF loader and a splitter using 500-character chunks with a 10-character overlap
pdf_loader = PyPDFLoader("../data/advanced_programmingforprint.pdf")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=10)


Found model file at  ../models/ggml-gpt4all-j-v1.3-groovy.bin


Load the PDF and split it into chunks

In [136]:
docs = pdf_loader.load_and_split(text_splitter=text_splitter)
len(docs)

32
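To make the `chunk_size` and `chunk_overlap` settings concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break on separators such as paragraphs and newlines first); the function name `sliding_chunks` and the sample text are hypothetical and only illustrate what the two parameters control.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 10) -> list:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 120  # 1200 characters of stand-in text
pieces = sliding_chunks(sample, chunk_size=500, chunk_overlap=10)
print([len(p) for p in pieces])  # [500, 500, 220]
```

The last 10 characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary keeps a little shared context on both sides.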

Batch the chunked documents and add them to the Chroma db, then set up retrieval over the stored vectors
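The batching step described above can be sketched as follows. This is a minimal illustration rather than the notebook's own cell: the helpers `batched` and `add_in_batches`, the batch size of 8, and the `RecordingStore` stand-in are all hypothetical. The only call assumed on the real store is `add_documents`, which LangChain vector stores such as `Chroma` provide.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def add_in_batches(store, documents: Sequence, batch_size: int = 8) -> int:
    """Add documents to a vector store one batch at a time; return the batch count."""
    count = 0
    for batch in batched(documents, batch_size):
        store.add_documents(batch)  # assumed: LangChain VectorStore.add_documents
        count += 1
    return count


class RecordingStore:
    """Hypothetical stand-in for chroma_db so the sketch runs without a database."""

    def __init__(self):
        self.batches = []

    def add_documents(self, documents):
        self.batches.append(list(documents))


fake_store = RecordingStore()
fake_docs = [f"chunk-{i}" for i in range(32)]  # stand-in for the 32 split documents
n_batches = add_in_batches(fake_store, fake_docs, batch_size=8)
print(n_batches, [len(b) for b in fake_store.batches])  # 4 [8, 8, 8, 8]
```

In the notebook itself, the equivalent call would presumably be `add_in_batches(chroma_db, docs)`, which embeds and stores each batch in turn.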

In [138]:
chunk = 4  # number of chunks to retrieve per question (k)
model_n_ctx = 1000
model_n_batch = 8

question = 'What is switch statement'

# Retrieve the top-k chunks from the vector db and pass them to the LLM
source_vector_store_retriever = chroma_db.as_retriever(search_kwargs={"k": chunk})
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # put all retrieved chunks into a single prompt
    retriever=source_vector_store_retriever,
    return_source_documents=True,
)
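Conceptually, the retriever picks the `k` stored chunks closest to the question, and the "stuff" chain type places all of them into one prompt for the LLM. The sketch below illustrates that idea with toy two-dimensional vectors. Everything in it is an assumption for illustration: the function names, the prompt wording (not LangChain's actual template), and the use of cosine similarity (the metric Chroma actually applies depends on how the collection is configured).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=4):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def stuff_prompt(question, chunks):
    """Concatenate every retrieved chunk into a single prompt (the 'stuff' idea)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy index of (chunk text, embedding) pairs; real embeddings have hundreds of dimensions
toy_index = [
    ("switch statement notes", [1.0, 0.0]),
    ("motor wiring", [0.0, 1.0]),
    ("switch options", [0.9, 0.1]),
]
retrieved = top_k([1.0, 0.0], toy_index, k=2)
prompt = stuff_prompt("What is switch statement", retrieved)
print(retrieved)
print(prompt)
```

Because every retrieved chunk goes into one prompt, `k` multiplied by the chunk size has to fit inside the model's context window, which is one reason to keep both values small for a local model.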



In [139]:

# Run the chain; the result holds the answer and the retrieved source documents
result = retrieval_qa(question)
answer, source = result['result'], result['source_documents']

print(text_wrapper.fill(f'Answer : {answer}'))
print('\n')
print(text_wrapper.fill(f'Source : {source}'))


Answer :  The switch statement is a programming language feature that allows you to make decisions
based on whether or not certain conditions are met. It can be used for different purposes such as
controlling loops, checking if variables meet specific criteria, making comparisons between values,
etc. In the context of this code snippet, it appears to control an external device (motor) using a
temperature sensor and switch statement logic.


Source : [Document(page_content='\uf097The Switch has been defined with multiple text messages to
compare this value to (forward, back, left, right, and stop)\n\uf097In order to define multiple
switch options you MUST deselect the flat view , then use the + or – options to add or delete
options\n\uf097You should select one of the values as default (no new value), in this case we
selected stop', metadata={'page': 14, 'source': '../data/advanced_programmingforprint.pdf'}),
Document(page_content='\uf097The Switch has been defined with multiple text mes