### RAG (Retrieval-Augmented Generation)
This notebook reads a PDF file, splits it into chunks, and stores the chunks in a vector database.
When asked a question, it answers using the stored chunks together with the model's own knowledge.

LLM Model = ggml-gpt4all-j-v1.3-groovy

In [1]:
import textwrap
import chromadb

from langchain.llms import GPT4All
from langchain.embeddings import HuggingFaceEmbeddings

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain import PromptTemplate, LLMChain


from langchain.vectorstores import Chroma
from chromadb.config import Settings

Download the model from https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin (link as of 10/17/2023).
The model file is roughly 3 GB.

In [8]:
CHROMA_SETTINGS = Settings(
    persist_directory="../db/",
    anonymized_telemetry=False,
    allow_reset=True
)
EMBEDDINGS_MODEL_NAME="all-MiniLM-L6-v2"
model_path = '../models/ggml-gpt4all-j-v1.3-groovy.bin'
vector_db_path='../db/'

text_wrapper = textwrap.TextWrapper(width=100)

Initialize the GPT4All LLM and the embedding model. The Chroma DB and the text splitter are set up in later cells.

In [21]:
# allow_download=False: use the local model file only; do not download it if it is missing
llm = GPT4All(model=model_path, max_tokens=1800, verbose=True, allow_download=False, repeat_last_n=0)
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDINGS_MODEL_NAME)

Found model file at  ../models/ggml-gpt4all-j-v1.3-groovy.bin


Load and split the PDF. The source document covers NXT advanced programming.

In [10]:
pdf_loader = PyPDFLoader("../data/advanced_programmingforprint.pdf")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=10)
docs = pdf_loader.load_and_split(text_splitter=text_splitter)
len(docs)

32
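To make `chunk_size=500` and `chunk_overlap=10` concrete, here is a minimal stdlib sketch of fixed-window chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` prefers to cut at separators such as paragraph and line breaks, so its real chunk boundaries will differ. The function name `split_with_overlap` and the placeholder text are assumptions made for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 10) -> List[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 120  # 1200 characters of placeholder text (not from the PDF)
chunks = split_with_overlap(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap keeps a little shared text between neighbouring chunks, so a sentence cut at a boundary is less likely to lose its meaning in both halves.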

In [11]:
# Embed the chunks and persist them in the Chroma DB
chroma_db = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    persist_directory=vector_db_path,
    client_settings=CHROMA_SETTINGS,
)
chroma_db.persist()


### Similarity Search

In [None]:
question = "What is data Wires  ? "
matched_docs = chroma_db.similarity_search(question)

for match_doc in matched_docs:
    print(match_doc.page_content)
    print()
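Conceptually, a similarity search embeds the question and ranks the stored chunks by how close their vectors are to it. The sketch below shows that ranking with hand-made 3-dimensional vectors and cosine similarity. The vectors, the texts, and the helper names are assumptions for illustration; the real embeddings come from `all-MiniLM-L6-v2`, and Chroma's own index and distance metric are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector store": (chunk text, made-up embedding)
toy_store = [
    ("Data wires pass values between blocks", [0.9, 0.1, 0.0]),
    ("Variables hold values for later use", [0.6, 0.4, 0.1]),
    ("The Sound block plays a tone", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.0, 0.0]  # made-up embedding of the question

ranked = top_k(toy_query, toy_store, k=2)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```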


### Retrieve from Database
- Open DB
- Similarity search from persisted DB
- Create context

In [20]:
# Reopen the persisted vector DB
persisted_db = Chroma(
    persist_directory=CHROMA_SETTINGS.persist_directory,
    embedding_function=embeddings,
    client_settings=CHROMA_SETTINGS,
)

# Retriever with a relevance-score threshold (defined here, but not used in this cell)
source_vector_store_retriever = persisted_db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

matched_docs = persisted_db.similarity_search(question)

context = ''
for doc in matched_docs:
    context = context + doc.page_content + "\n\n"

print(text_wrapper.fill(context))

9/13/2011 7Data Wires We use Data Wires to pass  information around inside of a program.  This is
easier than using variables and accomplishes much of the same function Data wires can go between
blocks and are connected at the Data T erminals (normally hidden) Shown here are the same move
blocks with the Data T erminal hidden and shown Press here to  openPress here to  close Linking the
Loop to the Motors Lets run a loop from 0 to 100 And lets make the Motor Power level equal the
Loop  9/13/2011 15Variables in the Program In this program we see  TWO strands running at the same
time The lower strand is looping and assigning the value of the LS to the variable “LightValue” and
The upper strand is looping and displaying the value of the variable in the NXT window Switches
with multiple discrete options In this example, the NXT will receive  a Bluetooth text message and
feed this as a value to a Switch  9/13/2011 17Action Palette The five action palette  blocks:
Motor Sound Di

Create the prompt template. Input variables:
- context
- question

Observation: the LLMChain does not return the full response, whereas RetrievalQA.from_chain_type returns the full expected response.

In [23]:
template = """
Please use the following context to answer questions. If you don't know the answer, say I am sorry I don't know.
Context: {context}
 
Question: {question}
Answer: Here you go"""

prompt = PromptTemplate(input_variables=['context','question'], template=template).partial(context=context)

llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)
response = llm_chain.run(question)
print(text_wrapper.fill(f'Response {response}' ))



> Entering new LLMChain chain...
Prompt after formatting:
Please use the following context to answer questions. If you don't know the answer, say I am sorry I don't know.
Context: 9/13/2011
7Data Wires
We use Data Wires to pass 
information around inside of a program.  This is easier than using variables and accomplishes much of the same function
Data wires can go between blocks and are connected at the Data T erminals (normally hidden)
Shown here are the same move blocks with the Data T erminal hidden and shown
Press here to 
openPress here to 
close
Linking the Loop to the Motors
Lets run a loop from 0 to 100
And lets make the Motor Power level equal the Loop

9/13/2011
15Variables in the Program
In this program we see 
TWO strands running at the same time
The lower strand is looping and assigning the value of the LS to the variable “LightValue” and
The upper strand is looping and displaying the value of the variable in the NXT window
Switches with mul
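The prompt cell binds `context` once with `.partial(...)` and leaves `question` to be filled when the chain runs. The same idea can be sketched with plain string formatting and `functools.partial`. This is only an illustration of the two-stage fill, not LangChain's implementation; the `render` helper and the sample strings are assumptions.

```python
from functools import partial

TEMPLATE = (
    "Please use the following context to answer questions. "
    "If you don't know the answer, say I am sorry I don't know.\n"
    "Context: {context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def render(template, context, question):
    """Fill both placeholders of the template."""
    return template.format(context=context, question=question)


# Stage 1: bind the template and the context once
render_with_context = partial(
    render, TEMPLATE, "Data wires pass information around inside a program."
)

# Stage 2: supply only the question at call time
prompt_text = render_with_context(question="What are data wires?")
print(prompt_text)
```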

Response from the LLM through RetrievalQA, without a custom prompt template

In [None]:
chunk = 4  # number of chunks to retrieve per question
retriever = persisted_db.as_retriever(search_kwargs={"k": chunk})
retrieval_qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)
result = retrieval_qa(question) 

answer, source = result['result'], result['source_documents']

print(text_wrapper.fill(f'Answer : {answer}'))
print('\n')
print(text_wrapper.fill(f'Source : {source}' ))
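The `"stuff"` chain type places all retrieved chunks into a single prompt and makes one LLM call. A minimal sketch of that flow follows, with stand-in functions in place of the real retriever and model. `fake_retrieve`, `fake_generate`, and the prompt wording are hypothetical and are not LangChain's defaults; only the shape of the returned dictionary mirrors the cell above.

```python
def stuff_answer(question, retrieve, generate, k=4):
    """Retrieve k chunks, stuff them into one prompt, and call the model once."""
    docs = retrieve(question, k)
    context = "\n\n".join(docs)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return {"result": generate(prompt), "source_documents": docs}


def fake_retrieve(question, k):
    """Stand-in retriever: returns the first k chunks of a tiny fixed corpus."""
    corpus = [
        "Data wires pass information around inside a program.",
        "Data wires connect blocks at the data terminals.",
        "The loop counter can drive the motor power level.",
    ]
    return corpus[:k]


def fake_generate(prompt):
    """Stand-in model: reports the prompt size instead of generating text."""
    return f"(model output for a prompt of {len(prompt)} characters)"


sketch = stuff_answer("What are data wires?", fake_retrieve, fake_generate, k=2)
print(sketch["result"])
print(sketch["source_documents"])
```

Because everything goes into one prompt, the number of chunks (`k`) and the chunk size together have to fit within the model's context window.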


### Custom Prompt (WIP)

### Prompt types
- Leading words

Leading words such as the phrase below push the model to break its solution into smaller, more manageable steps instead of jumping straight to a guess.
```
“think step by step”
```
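One way to apply leading words is to end the prompt with them, so that the model continues from that phrase. The earlier template does the same thing with "Here you go" after `Answer:`. The sketch below only builds the prompt string; the function name and the sample inputs are assumptions for illustration.

```python
def with_leading_words(question, context, leading_words="Let's think step by step."):
    """Build a prompt whose answer section starts with the leading words."""
    return (
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        f"Answer: {leading_words}"
    )


leading_prompt = with_leading_words(
    question="What are data wires?",
    context="Data wires pass information around inside a program.",
)
print(leading_prompt)
```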

In [28]:
question = 'what is the first animal that evolved on earth ?'
result = retrieval_qa(question)
answer, source = result['result'], result['source_documents']

print(text_wrapper.fill(f'Answer : {answer}'))
print('\n')
print(text_wrapper.fill(f'Source : {source}' ))

Answer :  The answer depends on which time period or geological era we are referring to. However,
some of the earliest known animals include trilobites (around 520 million years ago), conodonts
(about 542-541 million years ago) and jawless fish such as agnathans (which evolved around 540
million years ago).


Source : [Document(page_content='counter (0 to 100) which would match the range of the motor power
level\n\uf097You need to go to the LOOP block and click the “show counter” option at the
bottom\n\uf097Now open up the motor Data T erminal and use the cursor to click between the LOOPS
counter button and the motor speed control\n\uf097Set the motor duration to 0.1 second\n\uf097Now
the robot will start at a stand still (count = power = 0)', metadata={'page': 6, 'source':
'../data/advanced_programmingforprint.pdf'}), Document(page_content='9/13/2011\n4The Switch,
terrarium control\n(switch nested in loop)\n\uf097Using a temperature sensor insi de a terrarium and
a 9 volt \nmotor (LEG