# RAG - Retrieval-Augmented Generation

RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.

## How RAG works

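* Create external data: split the source documents into chunks, embed each chunk, and store the vectors in an index
* Retrieve relevant information: embed the user's query and look up the most similar chunks in the index
* Augment the LLM prompt: add the retrieved chunks to the prompt as context for the answer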
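
The three steps above can be sketched without any libraries. This is only a toy illustration: the `embed` function is a bag-of-words stand-in for a real embeddings model, the three sentences are made-up sample data, and the names `embed`, `cosine`, `retrieve` and `build_prompt` are hypothetical helpers, not part of LangChain.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embeddings model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Create external data: embed every chunk of the knowledge base.
chunks = [
    "The Nautilus is a submarine commanded by Captain Nemo.",
    "Cape Horn lies at the southern tip of South America.",
    "Ned Land is a Canadian harpooner.",
]
index = [(embed(chunk), chunk) for chunk in chunks]

# 2. Retrieve relevant information: rank chunks by similarity to the question.
def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# 3. Augment the LLM prompt: prepend the retrieved chunks as context.
def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

print(build_prompt("What is the Nautilus?"))
```

The rest of the notebook follows the same steps with a real embeddings model, a FAISS index, and an LLM.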

In [9]:
import urllib.request
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.chat_models import ChatOllama
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain import hub

### Configure the LLM and the embeddings model

In [10]:
HOST = "http://localhost:11434"
LLM_MODEL = "gemma:7b"
EMBEDDINGS_MODEL = "nomic-embed-text:latest"
llm = ChatOllama(base_url=HOST, model=LLM_MODEL, temperature=0)
embeddings_model = OllamaEmbeddings(base_url=HOST, model=EMBEDDINGS_MODEL)

### Get the book Twenty Thousand Leagues Under the Sea

In [3]:
url = 'https://www.gutenberg.org/cache/epub/164/pg164.txt'
filename = '../data/twenty-thousand-leagues-under-the-sea.txt'

In [None]:
urllib.request.urlretrieve(url, filename)

In [4]:
loader = TextLoader(filename)
documents = loader.load()

In [7]:
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(documents)

Created a chunk of size 517, which is longer than the specified 500
Created a chunk of size 733, which is longer than the specified 500
Created a chunk of size 778, which is longer than the specified 500
Created a chunk of size 521, which is longer than the specified 500
Created a chunk of size 1076, which is longer than the specified 500
Created a chunk of size 719, which is longer than the specified 500
Created a chunk of size 613, which is longer than the specified 500
Created a chunk of size 556, which is longer than the specified 500
Created a chunk of size 999, which is longer than the specified 500
Created a chunk of size 1042, which is longer than the specified 500
Created a chunk of size 529, which is longer than the specified 500
Created a chunk of size 634, which is longer than the specified 500
Created a chunk of size 1378, which is longer than the specified 500
Created a chunk of size 579, which is longer than the specified 500
Created a chunk of size 532, which is longer 
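
The warnings appear because `CharacterTextSplitter` only cuts at its separator (by default a blank line, `"\n\n"`), so a paragraph longer than `chunk_size` is kept whole. The sketch below is a simplified illustration of that behaviour, not LangChain's actual implementation; `split_on_separator` is a hypothetical helper.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of up to chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single piece is never cut further, so it may exceed chunk_size.
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = "short one\n\n" + "x" * 30 + "\n\nshort two\n\nshort three"
chunks = split_on_separator(text, chunk_size=25)
print([len(c) for c in chunks])  # [9, 30, 22]
```

If strict size limits matter, `RecursiveCharacterTextSplitter` is one option, since it falls back to finer separators when a piece is still too long.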

### Create embedding vectors 

We store all vectors in a FAISS index named `twenty_thousand_leagues_under_the_sea`.

In [8]:
vector_db = FAISS.from_documents(
    documents=all_splits, 
    embedding=embeddings_model
)

In [21]:
vector_db.save_local(folder_path="../.", index_name="twenty_thousand_leagues_under_the_sea")

### Load existing index

In [11]:
vector_db = FAISS.load_local(
    folder_path="../.", 
    index_name="twenty_thousand_leagues_under_the_sea", 
    embeddings=embeddings_model,
    allow_dangerous_deserialization=True)

### Query the database

In [12]:
query = "What is the Nautilus?"
docs = vector_db.similarity_search(query)

In [13]:
for doc in docs:
    print(doc.page_content)

“Ah, Commander! your _Nautilus_ is certainly a marvellous boat.”
“But how could you construct this wonderful _Nautilus_ in secret?”
I could not answer that question, and I feared that Captain Nemo would
rather take us to the vast ocean that touches the coasts of Asia and
America at the same time. He would thus complete the tour round the
submarine world, and return to those waters in which the _Nautilus_
could sail freely. We ought, before long, to settle this important
point. The _Nautilus_ went at a rapid pace. The polar circle was soon
passed, and the course shaped for Cape Horn. We were off the American
point, March 31st, at seven o’clock in the evening. Then all our past
sufferings were forgotten. The remembrance of that imprisonment in the
ice was effaced from our minds. We only thought of the future. Captain
Nemo did not appear again either in the drawing-room or on the
platform. The point shown each day on the planisphere, and, marked by
the lieutenant, showed me the exact dire
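
Conceptually, `similarity_search` embeds the query and returns the chunks whose vectors lie closest to it (four by default). Here is a minimal brute-force sketch of that lookup on made-up 2-D vectors. We assume Euclidean distance, which to our understanding is the default for an index built with `FAISS.from_documents`; `top_k` is a hypothetical helper.

```python
import math

def top_k(query_vec, vectors, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    distances = [(math.dist(query_vec, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(distances)[:k]]

# Made-up vectors standing in for chunk embeddings.
vectors = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (-1.0, 0.0)]
query_vec = (1.0, 0.2)

print(top_k(query_vec, vectors, k=2))  # [2, 1]
```

Note that the nearest chunks are only the ones most similar to the query text; they are not guaranteed to contain the answer.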

### Basic retrieval

In [14]:
retriever = vector_db.as_retriever()

In [15]:
# invoke() is the current entry point; get_relevant_documents() is deprecated in recent LangChain releases
docs = retriever.invoke(query)

In [16]:
for doc in docs:
    print(doc.page_content)

“Ah, Commander! your _Nautilus_ is certainly a marvellous boat.”
“But how could you construct this wonderful _Nautilus_ in secret?”
I could not answer that question, and I feared that Captain Nemo would
rather take us to the vast ocean that touches the coasts of Asia and
America at the same time. He would thus complete the tour round the
submarine world, and return to those waters in which the _Nautilus_
could sail freely. We ought, before long, to settle this important
point. The _Nautilus_ went at a rapid pace. The polar circle was soon
passed, and the course shaped for Cape Horn. We were off the American
point, March 31st, at seven o’clock in the evening. Then all our past
sufferings were forgotten. The remembrance of that imprisonment in the
ice was effaced from our minds. We only thought of the future. Captain
Nemo did not appear again either in the drawing-room or on the
platform. The point shown each day on the planisphere, and, marked by
the lieutenant, showed me the exact dire

In [17]:
prompt = hub.pull("rlm/rag-prompt-llama")

In [18]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt-llama', 'lc_hub_commit_hash': '693a2db5447e3b58c060a6ac02758dc7f1aaaaa4ee6214d127bf70b443158630'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="[INST]<<SYS>> You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.<</SYS>> \nQuestion: {question} \nContext: {context} \nAnswer: [/INST]"))])
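
The template has two placeholders, `{context}` and `{question}`. The sketch below shows roughly how they get filled: the retrieved passages are joined into one context string and substituted together with the question. We assume a blank-line separator between passages; `fill_prompt` is a hypothetical helper, and the two passages are taken from the search output above.

```python
TEMPLATE = (
    "[INST]<<SYS>> You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.<</SYS>> \n"
    "Question: {question} \nContext: {context} \nAnswer: [/INST]"
)

def fill_prompt(question, passages):
    """Join the retrieved passages and substitute them into the template."""
    context = "\n\n".join(passages)  # assumed separator between passages
    return TEMPLATE.format(question=question, context=context)

passages = [
    "“Ah, Commander! your _Nautilus_ is certainly a marvellous boat.”",
    "“But how could you construct this wonderful _Nautilus_ in secret?”",
]
print(fill_prompt("What is the Nautilus?", passages))
```

Printing the filled prompt is a quick way to see exactly what the LLM receives, which helps when an answer turns out weaker than expected.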

In [19]:
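# RetrievalQA: retrieve context for the query, fill the prompt, and call the LLM
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)
# invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": query})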

In [20]:
result

{'query': 'What is the Nautilus?',
 'result': 'Sure, here is the answer to the question:\n\nThe Nautilus is a boat that is described in the text. It is a marvelous boat that is owned by Commander. The text does not describe the Nautilus in detail, therefore I cannot answer the question.'}