# RAG - Retrieval-Augmented Generation

Retrieval-augmented generation (RAG) extends an LLM to a specific domain or an organization's internal knowledge base without retraining the model. Relevant documents are retrieved at query time and added to the prompt, which is a cost-effective way to keep the LLM output relevant and accurate.

## How RAG works

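* Create external data: split the source documents into chunks, embed them, and store the vectors
* Retrieve relevant information: embed the question and look up the most similar chunks
* Augment the LLM prompt: add the retrieved chunks to the prompt as context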
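
The three steps can be sketched end to end without any external library. This is only a toy illustration: the bag-of-words `embed` function stands in for a real embeddings model, and the names `toy_documents`, `toy_index`, `retrieve`, and `build_prompt` are hypothetical, not part of LangChain.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Create external data: embed every document once and keep the vectors.
toy_documents = [
    "The Nautilus is a submarine commanded by Captain Nemo.",
    "FAISS is a library for similarity search over dense vectors.",
    "Ollama serves language models on a local machine.",
]
toy_index = [(doc, embed(doc)) for doc in toy_documents]


# 2. Retrieve relevant information: rank the documents by similarity to the question.
def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(toy_index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# 3. Augment the LLM prompt: place the retrieved text in front of the question.
def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Context: {context}\nQuestion: {question}\nYour answer:"


prompt_text = build_prompt("Who commands the Nautilus?")
print(prompt_text)
```

The rest of the notebook follows the same shape, with a real embeddings model, a vector store, and an LLM in place of the toy parts.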

In [2]:
import os
import urllib.request
from langchain_community.document_loaders import TextLoader, WebBaseLoader
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain.schema.output_parser import StrOutputParser
from langchain.globals import set_llm_cache
from langchain.cache import SQLiteCache
from langchain_community.vectorstores import SQLiteVSS
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.chat_models import ChatOllama
from langchain_community.vectorstores import FAISS
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from IPython.display import display, Markdown, JSON
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_core.messages import HumanMessage, SystemMessage
from dotenv import load_dotenv

### Configure the LLM and the embeddings model

In [None]:
HOST = "http://localhost:11434"
LLM_MODEL = "gemma:7b"
EMBEDDINGS_MODEL = "nomic-embed-text:latest"
llm = ChatOllama(base_url=HOST, model=LLM_MODEL, temperature=0)
embeddings_model = OllamaEmbeddings(base_url=HOST, model=EMBEDDINGS_MODEL)

In [6]:
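# Alternative to the local Ollama setup: Azure OpenAI, configured through environment variables in .env.
# Running this cell replaces the llm and embeddings_model defined above.
load_dotenv()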
llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_deployment="gpt-4",
)
embeddings_model = AzureOpenAIEmbeddings()

### Get the content from website https://en.wikipedia.org/wiki/GPT-4

In [7]:
loader = WebBaseLoader("https://en.wikipedia.org/wiki/GPT-4")
data = loader.load()

### Split the content into chunks for indexing

In [8]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(data)
texts = [doc.page_content for doc in docs]

In [9]:
len(texts)

62
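
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch using a fixed-width sliding window. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break at separators such as paragraphs and line breaks); the helper `sliding_window_chunks` is hypothetical and only shows how consecutive chunks share characters.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.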

In [10]:
texts[2]

'Print/export\n\t\n\n\nDownload as PDFPrintable version\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFrom Wikipedia, the free encyclopedia\n\n\n2023 text-generating language model\n\n\nGenerative Pre-trained Transformer 4 (GPT-4)Developer(s)OpenAIInitial releaseMarch\xa014, 2023; 12 months ago\xa0(2023-03-14)PredecessorGPT-3.5Type\nMultimodal\nLarge language model\nGenerative pre-trained transformer\nFoundation model\nLicenseProprietaryWebsiteopenai.com/gpt-4\xa0\nPart of a series onMachine learningand data mining\nParadigms\nSupervised learning\nUnsupervised learning\nOnline learning\nBatch learning\nMeta-learning\nSemi-supervised learning\nSelf-supervised learning\nReinforcement learning\nCurriculum learning\nRule-based learning\nQuantum machine learning'

In [11]:
vector_db = FAISS.from_documents(
    docs, 
    embeddings_model
)

In [12]:
vector_db.save_local("faiss_rag_index")

In [13]:
print(vector_db.index.ntotal)

62


In [14]:
query = "How many parameters has GPT-4?"
docs_and_scores = vector_db.similarity_search_with_score(query)

In [15]:
for doc in docs_and_scores:
    print(doc)

(Document(page_content='Sam Altman stated that the cost of training GPT-4 was more than $100 million.[45] News website Semafor claimed that they had spoken with "eight people familiar with the inside story" and found that GPT-4 had 1 trillion parameters.[46]', metadata={'source': 'https://en.wikipedia.org/wiki/GPT-4', 'title': 'GPT-4 - Wikipedia', 'language': 'en'}), 0.27346978)
(Document(page_content='Background[edit]\nFurther information: GPT-3 §\xa0Background, and GPT-2 §\xa0Background\nOpenAI introduced the first GPT model (GPT-1) in 2018, publishing a paper called "Improving Language Understanding by Generative Pre-Training."[8] It was based on the transformer architecture and trained on a large corpus of books.[9] The next year, they introduced GPT-2, a larger model that could generate coherent text.[10] In 2020, they introduced GPT-3, a model with 100 times as many parameters as GPT-2, that could perform various tasks with few examples.[11] GPT-3 was further improved into GPT-3.
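
The number printed after each document is a distance, not a probability. With the default FAISS setup in LangChain this is, to my knowledge, an L2 (Euclidean) distance, so a lower score means a closer match. The sketch below shows the idea with a brute-force search over a few hand-made vectors; the vectors, labels, and function names are invented for illustration, and squared Euclidean distance is used because it gives the same ranking.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_score(query_vec, vectors, k=2):
    """Return the k (label, distance) pairs closest to query_vec, nearest first."""
    scored = [(label, squared_l2(query_vec, vec)) for label, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 3-dimensional "embeddings" for three chunks.
toy_vectors = {
    "chunk about parameters": [0.9, 0.1, 0.0],
    "chunk about history": [0.5, 0.5, 0.1],
    "chunk about licensing": [0.0, 0.2, 0.9],
}

hits = search_with_score([1.0, 0.0, 0.0], toy_vectors)
for label, distance in hits:
    print(label, distance)
```

A real index avoids comparing the query against every stored vector one by one in Python, but the meaning of the score is the same.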

In [16]:
retriever = vector_db.as_retriever()

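# Version 1: the Runnable interface (preferred)
docs = retriever.invoke(query)

# Version 2: the older method, deprecated in newer LangChain releases in favor of invoke()
docs = retriever.get_relevant_documents(query)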

In [17]:
docs

[Document(page_content='Sam Altman stated that the cost of training GPT-4 was more than $100 million.[45] News website Semafor claimed that they had spoken with "eight people familiar with the inside story" and found that GPT-4 had 1 trillion parameters.[46]', metadata={'source': 'https://en.wikipedia.org/wiki/GPT-4', 'title': 'GPT-4 - Wikipedia', 'language': 'en'}),
 Document(page_content='Background[edit]\nFurther information: GPT-3 §\xa0Background, and GPT-2 §\xa0Background\nOpenAI introduced the first GPT model (GPT-1) in 2018, publishing a paper called "Improving Language Understanding by Generative Pre-Training."[8] It was based on the transformer architecture and trained on a large corpus of books.[9] The next year, they introduced GPT-2, a larger model that could generate coherent text.[10] In 2020, they introduced GPT-3, a model with 100 times as many parameters as GPT-2, that could perform various tasks with few examples.[11] GPT-3 was further improved into GPT-3.5, which was

In [18]:
prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)

In [19]:
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [20]:
result = rag_chain.invoke("How many parameters has GPT-4")
display(Markdown(result))

GPT-4 has 1 trillion parameters according to some sources, while rumors claim it has 1.76 trillion parameters. The precise number has not been officially confirmed by OpenAI.
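
In the chain above, the retriever's output (a list of `Document` objects) is inserted into `{context}` as is, so the prompt also carries the metadata and the list formatting. A common refinement is to join only the page contents before they reach the prompt. The sketch below uses a stand-in `Doc` dataclass instead of LangChain's `Document`, and `format_docs` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (assumed shape: page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of the retrieved chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Doc("First retrieved chunk.", {"source": "example"}),
    Doc("Second retrieved chunk.", {"source": "example"}),
]
context = format_docs(retrieved)
print(context)
```

In the notebook, a helper like this could be placed behind the retriever, for example `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, assuming the installed LangChain version accepts plain functions in a pipe.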

### Get the book Twenty Thousand Leagues Under the Sea

In [None]:
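url = 'https://www.gutenberg.org/cache/epub/164/pg164.txt'
filename = '../data/twenty-thousand-leagues-under-the-sea.txt'
# urlretrieve does not create missing directories, so make sure ../data exists
os.makedirs(os.path.dirname(filename), exist_ok=True)
urllib.request.urlretrieve(url, filename)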

In [None]:
loader = TextLoader(filename)
documents = loader.load()

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)
texts = [doc.page_content for doc in docs]

### Create embedding vectors 

We store all vectors with sqlite-vss in a table named `twenty_thousand_leagues_under_the_sea`.
The `db_file` parameter is the path of the file to use as the SQLite database.

In [None]:
vector_db = SQLiteVSS.from_texts(
    texts=texts,
    embedding=embeddings_model,
    table="twenty_thousand_leagues_under_the_sea",
    db_file="../vss.db",
)
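
To make the idea of "vectors in a SQLite table" concrete, here is a minimal sketch using only the standard library. It is not how sqlite-vss works internally (sqlite-vss adds vector search to SQLite as an extension); the table layout, the JSON encoding of the vectors, and the `nearest` helper are all assumptions made for illustration.

```python
import json
import sqlite3

# An in-memory database; passing a file path instead would persist the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")

# Hypothetical chunks with 2-dimensional "embeddings", stored as JSON text.
rows = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.0, 1.0]),
    ("chunk C", [0.7, 0.7]),
]
conn.executemany(
    "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
    [(text, json.dumps(vec)) for text, vec in rows],
)


def nearest(query_vec, k=1):
    """Brute-force search: compare the query with every stored vector."""
    scored = []
    for text, emb in conn.execute("SELECT text, embedding FROM chunks"):
        vec = json.loads(emb)
        dist = sum((q - v) ** 2 for q, v in zip(query_vec, vec))
        scored.append((dist, text))
    scored.sort()
    return [text for _, text in scored[:k]]


closest = nearest([0.9, 0.1], k=2)
print(closest)
```

The `SQLiteVSS.from_texts` call above takes care of both steps, embedding the texts and writing them to the table, in one go.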

### Query the database

In [None]:
query = "What is the Nautilus?"
docs = vector_db.similarity_search(query)

In [None]:
for doc in docs:
    print(doc.page_content)

### Basic retrieval

In [None]:
retriever = vector_db.as_retriever()

In [None]:
docs = retriever.get_relevant_documents("What is Ned's last name")

In [None]:
for doc in docs:
    print(doc.page_content)

In [None]:
print(len(docs))

In [None]:
prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)

In [None]:
# The model name is defined as LLM_MODEL in the configuration cell above
llm = ChatOllama(base_url=HOST, model=LLM_MODEL, temperature=0)
#set_llm_cache(SQLiteCache(database_path=".langchain.db"))

In [None]:
chain = (
    prompt
    | llm
    | StrOutputParser()
)

In [None]:
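question = "What is Professor Aronnax's name?"
# Pass the retrieved documents as context, not the retriever object itself
result = chain.invoke({"context": retriever.invoke(question), "question": question})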

In [None]:
display(Markdown(result))