# Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) combines information retrieval with language models. It first searches for relevant facts in external sources, then feeds those facts to the language model alongside the user's prompt. This helps the model generate more accurate and factual responses, even on topics beyond its initial training data.
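Before installing anything, the retrieve-then-generate idea can be sketched in a few lines of plain Python. This is only a toy, not the pipeline built in this notebook: the `retrieve` and `build_prompt` helpers and the three passages are made up for illustration, and word overlap stands in for the embedding search we use later.

```python
import re

def words(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, passages, k=1):
    """Rank passages by shared words with the question (a crude stand-in for vector search)."""
    q_words = words(question)
    ranked = sorted(passages, key=lambda p: len(q_words & words(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(context_passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical mini knowledge base
passages = [
    "The Queen of Hearts shouts 'Off with their heads!'",
    "The Hatter and the March Hare are having tea.",
    "Alice follows the White Rabbit down a rabbit hole.",
]

question = "Who is having tea?"
top = retrieve(question, passages, k=1)
toy_prompt = build_prompt(question, top)
print(toy_prompt)
```

The string in `toy_prompt` is what would be sent to a language model: the retrieved passage first, then the question.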


---
## 1.&nbsp; Installations and Settings 🛠️
Now, we'll need to install two additional libraries to build our RAG model. These libraries will help us create and store numerical representations of our text, which are essential for this task.

1. **sentence_transformers:** This library will generate embeddings, which are like numerical summaries of our text. These embeddings will allow us to compare different sentences and identify relationships between them.

2. **faiss-gpu:** This library provides fast similarity search over our embeddings, so we can store them and retrieve the ones closest to a query.

In [None]:
!pip3 install -qqq langchain --progress-bar off
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install -qqq llama-cpp-python --progress-bar off

!pip3 install -qqq sentence_transformers --progress-bar off
!pip3 install -qqq faiss-gpu --progress-bar off

!huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF mistral-7b-instruct-v0.1.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
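If you'd like to confirm that the installs worked before moving on, a quick check like the following may help. It is a small sketch using only the standard library; the `missing_modules` helper is made up for this notebook, and the import names in `REQUIRED_MODULES` are the names these packages are usually imported under (`faiss` for faiss-gpu, `llama_cpp` for llama-cpp-python).

```python
import importlib.util

# Import names, which can differ from the pip package names
REQUIRED_MODULES = ["langchain", "llama_cpp", "sentence_transformers", "faiss"]

def missing_modules(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
print("All packages found." if not missing else f"Missing: {missing}")
```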


---
## 2.&nbsp; Setting up the LLM 🧠
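The only change from the previous code is the parameter `n_ctx`, which sets the size of the context window in tokens. That window has to hold the prompt, including any retrieved text, as well as the generated answer.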

In [None]:
from langchain.llms import LlamaCpp

llm = LlamaCpp(model_path = "/content/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
               max_tokens = 2000,
               temperature = 0.1,
               top_p = 1,
               n_gpu_layers = -1,
               n_ctx = 1024)

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'mistralai_mistral-7b-instruct-v0.1', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '15'}
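Because the prompt and the generated answer share the same context window, it can help to think of `n_ctx` as a budget. The sketch below is plain arithmetic rather than anything from llama-cpp-python; the helper names and the prompt length of 450 tokens are made-up values for illustration. Note that with the settings above, `max_tokens = 2000` is larger than `n_ctx = 1024`, so the context window, not `max_tokens`, is likely to be the effective limit.

```python
def room_for_generation(prompt_tokens, n_ctx):
    """Tokens left for the answer once the prompt is in the context window."""
    return max(n_ctx - prompt_tokens, 0)

def fits_in_context(prompt_tokens, max_new_tokens, n_ctx):
    """True if the prompt plus the requested number of new tokens fits in the window."""
    return prompt_tokens + max_new_tokens <= n_ctx

context_size = 1024          # same value as n_ctx above
example_prompt_tokens = 450  # hypothetical prompt length

print(room_for_generation(example_prompt_tokens, context_size))    # 574
print(fits_in_context(example_prompt_tokens, 2000, context_size))  # False
```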



### 2.1.&nbsp;  Test the LLM

In [None]:
answer_1 = llm.invoke("Write a poem about data science.")
print(answer_1)


Data Science is the future, it's where we're all headed,
With algorithms and models, predictions are guaranteed.
We take in information from all around,
And use it to create insights that astound.

From machine learning to deep learning too,
There's nothing that we can't do.
We can analyze data with ease,
And find patterns that were once hidden from our sight.

Data Science is the key to unlocking the unknown,
It helps us make decisions that are smart and sound.
With statistics and visualization, we can communicate our findings,
And help businesses make data-driven decisions.

So if you're interested in the future of technology,
Data Science is the place to be.
It's an ever-evolving field, with new techniques emerging all the time,
So there's always something new to learn and explore.


---
## 3.&nbsp; Retrieval Augmented Generation 🔃

### 3.1.&nbsp; Find our data
Our model needs some information to work its magic! Let's download a copy of Alice's Adventures in Wonderland from Project Gutenberg.

In [None]:
!wget -O /content/alice_in_wonderland.txt https://www.gutenberg.org/cache/epub/11/pg11.txt

--2024-03-13 13:50:35--  https://www.gutenberg.org/cache/epub/11/pg11.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174385 (170K) [text/plain]
Saving to: ‘/content/alice_in_wonderland.txt’


2024-03-13 13:50:35 (3.63 MB/s) - ‘/content/alice_in_wonderland.txt’ saved [174385/174385]



### 3.2.&nbsp; Load the data
Now that we have the data, we have to load it in a format LangChain can understand.


In [None]:
from langchain.document_loaders import TextLoader

loader = TextLoader("/content/alice_in_wonderland.txt")
documents = loader.load()

### 3.3.&nbsp; Splitting the document
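Let's make it easier to digest by [splitting](https://python.langchain.com/docs/modules/data_connection/document_transformers/) the document into chunks. Here each chunk is at most 800 characters long, and neighbouring chunks may share up to 150 characters, so text near a chunk boundary can appear in both.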


In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800,
                                               chunk_overlap=150)

docs = text_splitter.split_documents(documents)
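To build some intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs and lines); the `split_with_overlap` function is a made-up illustration of the two parameters only.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in toy_chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the one before it, which is the role `chunk_overlap=150` plays for our 800-character chunks.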

### 3.4.&nbsp; Creating vectors with embeddings

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = "sentence-transformers/all-MiniLM-l6-v2"
embeddings_folder = "/content/"

embeddings = HuggingFaceEmbeddings(model_name=embedding_model,
                                   cache_folder=embeddings_folder)


To see how an embedding model turns a sentence into a vector, let's look at an example:

In [None]:
test_text = "Why do data scientists make great comedians? They're always trying to make ANOVA pun"
query_result = embeddings.embed_query(test_text)
query_result

[0.009409704245626926,
 -0.023806296288967133,
 -0.012127567082643509,
 0.03612375259399414,
 -0.03382448852062225,
 -0.0797419548034668,
 0.07004597783088684,
 0.07465541362762451,
 0.040141817182302475,
 0.044190652668476105,
 -0.007505988236516714,
 -0.06001218780875206,
 -0.10028242319822311,
 0.03230978921055794,
 -0.039546310901641846,
 0.016906650736927986,
 -0.030313871800899506,
 -0.12780101597309113,
 -0.03218210116028786,
 -0.07546593248844147,
 7.657839159946889e-05,
 0.05085941031575203,
 0.12591636180877686,
 -0.04004547744989395,
 0.040401335805654526,
 -0.022957738488912582,
 -0.07265669852495193,
 -0.025434954091906548,
 -0.019824886694550514,
 0.011819693259894848,
 -0.03572343662381172,
 0.03657734766602516,
 0.07559998333454132,
 0.03425063565373421,
 -0.05330543592572212,
 -0.030826330184936523,
 0.021478520706295967,
 0.12243164330720901,
 -0.005445715039968491,
 0.04834014177322388,
 -0.004316528327763081,
 -0.043691303580999374,
 0.009050613269209862,
 0.0271101

In [None]:
characters = len(test_text)
dimensions = len(query_result)
print(f"The {characters} character sentence was transformed into a {dimensions} dimension vector")

The 84 character sentence was transformed into a 384 dimension vector


Embedding vectors have a fixed length: every vector produced by this embedding model has 384 dimensions, regardless of how long the input text is. Choosing an embedding size is a trade-off between accuracy and computational cost. Larger embeddings can capture more semantic information, but they require more memory and processing power. Start with the MiniLM model used here as a baseline, then try models with other embedding sizes to find the right balance for your needs.
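Because all vectors from one model have the same length, they can be compared numerically. A common measure is cosine similarity, sketched below in plain Python. The three-dimensional vectors are made-up toy values, not real model output; real embeddings from this model would have 384 numbers each.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings
tea = [0.9, 0.1, 0.0]
coffee = [0.8, 0.2, 0.1]
croquet = [0.0, 0.2, 0.9]

print(round(cosine_similarity(tea, coffee), 3))
print(round(cosine_similarity(tea, croquet), 3))
```

The first score comes out much higher than the second, reflecting that the toy vectors for "tea" and "coffee" point in nearly the same direction.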

### 3.5.&nbsp; Creating a vector database
Imagine a library where books aren't just filed alphabetically, but also by their themes, characters, and emotions. That's the idea behind vector databases: they store embeddings and retrieve entries by similarity of meaning rather than by exact keywords.

In [None]:
from langchain.vectorstores import FAISS

vector_db = FAISS.from_documents(docs, embeddings)

Let's save the db for later use.

In [None]:
vector_db.save_local("/content/faiss_index")

Here's the code to load it again.

In [None]:
# new_db = FAISS.load_local("/content/faiss_index", embeddings)

You can also search your database to see which vectors are close to your input.

In [None]:
vector_db.similarity_search("What does the Mad Hatter drink?")

[Document(page_content='CHAPTER VII.\nA Mad Tea-Party\n\n\nThere was a table set out under a tree in front of the house, and the\nMarch Hare and the Hatter were having tea at it: a Dormouse was sitting\nbetween them, fast asleep, and the other two were using it as a\ncushion, resting their elbows on it, and talking over its head. “Very\nuncomfortable for the Dormouse,” thought Alice; “only, as it’s asleep,\nI suppose it doesn’t mind.”\n\nThe table was a large one, but the three were all crowded together at\none corner of it: “No room! No room!” they cried out when they saw\nAlice coming. “There’s _plenty_ of room!” said Alice indignantly, and\nshe sat down in a large arm-chair at one end of the table.\n\n“Have some wine,” the March Hare said in an encouraging tone.', metadata={'source': '/content/alice_in_wonderland.txt'}),
 Document(page_content='“I’d rather finish my tea,” said the Hatter, with an anxious look at\nthe Queen, who was reading the list of singers.\n\n“You may go,” said 
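Conceptually, a similarity search compares the query vector with every stored vector and returns the closest entries. The sketch below does exactly that by brute force in plain Python. It is only a stand-in to show the idea: FAISS is far more efficient, and the `nearest` helper, the labels, and the two-dimensional vectors are all made up for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query_vector, index, k=2):
    """Return the texts of the k entries whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda entry: l2_distance(query_vector, entry[1]))
    return [text for text, _ in ranked[:k]]

# Hypothetical toy index of (text, vector) pairs
toy_index = [
    ("tea party", [0.9, 0.1]),
    ("croquet game", [0.1, 0.9]),
    ("trial", [0.5, 0.5]),
]

print(nearest([0.8, 0.2], toy_index, k=2))
# ['tea party', 'trial']
```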

### 3.6.&nbsp; Adding a prompt
Let's guide the model's behavior with a prompt.

In [None]:
from langchain.prompts.prompt import PromptTemplate

input_template = """
<s>
[INST] Answer the question based only on the following context: [/INST]
{context}
</s>
[INST] Question: {question} [/INST]
"""

prompt = PromptTemplate(template=input_template,
                        input_variables=["context", "question"])
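To see what the model will actually receive, we can fill the same template by hand with Python's `str.format`, which behaves much like the template substitution for simple placeholders such as these. The context and question below are made-up examples; in the real chain the context comes from the vector database.

```python
input_template = """
<s>
[INST] Answer the question based only on the following context: [/INST]
{context}
</s>
[INST] Question: {question} [/INST]
"""

# Hypothetical values standing in for retrieved text and a user question
filled_prompt = input_template.format(
    context="The March Hare and the Hatter were having tea.",
    question="What does the Hatter drink?",
)
print(filled_prompt)
```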

### 3.7.&nbsp; RAG - chaining it all together
This is the final piece of the puzzle: we now bring everything together in a chain. Our vector database, our prompt, and our LLM combine to give us retrieval augmented generation.

In [None]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_db.as_retriever(search_kwargs={"k": 2}), # top 2 results only, speed things up
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)
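What the chain does can be sketched roughly in plain Python: retrieve chunks, join them into one context string, fill the prompt, and call the model. This is a simplified illustration under stated assumptions, not LangChain's implementation. `toy_retriever`, `toy_llm`, `run_chain`, and `TOY_TEMPLATE` are made-up stand-ins, and joining the chunks with blank lines mirrors what we understand the default "stuff" chain type to do.

```python
TOY_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"  # simplified, hypothetical template

def toy_retriever(question, k=2):
    """Hypothetical stand-in for a vector-store retriever: returns the first k canned chunks."""
    chunks = [
        "'Off with their heads!' said the Queen.",
        "'Are their heads off?' shouted the Queen.",
        "The Hatter was having tea.",
    ]
    return chunks[:k]

def toy_llm(prompt):
    """Hypothetical stand-in for the LLM: reports the prompt size instead of generating text."""
    return f"(answer generated from a {len(prompt)}-character prompt)"

def run_chain(question, retriever, llm, template, k=2):
    source_documents = retriever(question, k=k)                   # 1. retrieve
    context = "\n\n".join(source_documents)                       # 2. combine chunks
    prompt = template.format(context=context, question=question)  # 3. fill the prompt
    result = llm(prompt)                                          # 4. generate
    return {"query": question, "result": result, "source_documents": source_documents}

toy_answer = run_chain("Who likes to chop off heads?", toy_retriever, toy_llm, TOY_TEMPLATE)
print(list(toy_answer.keys()))
# ['query', 'result', 'source_documents']
```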

In [None]:
answer = qa_chain.invoke("Who likes to chop off heads?")

answer

Llama.generate: prefix-match hit


{'query': 'Who likes to chop off heads?',
 'result': 'The Queen in "Alice\'s Adventures in Wonderland" likes to chop off heads.',
 'source_documents': [Document(page_content='“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.', metadata={'source': '/content/alice_in_wonderland.

#### 3.7.1.&nbsp; Exploring the returned dictionary

In [None]:
answer.keys()

dict_keys(['query', 'result', 'source_documents'])

##### `query`

The question that we asked.

In [None]:
answer['query']

'Who likes to chop off heads?'

##### `result`

The response.

In [None]:
answer['result']

'The Queen in "Alice\'s Adventures in Wonderland" likes to chop off heads.'

In [None]:
print(answer['result'])

The Queen in "Alice's Adventures in Wonderland" likes to chop off heads.


##### `source_documents`

The retrieved chunks that were passed to the model as context for the response.

In [None]:
answer['source_documents']

[Document(page_content='“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.', metadata={'source': '/content/alice_in_wonderland.txt'}),
 Document(page_content='“In my youth,” said the sage, as he shook his grey locks,\n    “I kept all my limbs very supple\nBy the use of this oin

In [None]:
answer['source_documents'][0]

Document(page_content='“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.', metadata={'source': '/content/alice_in_wonderland.txt'})

In [None]:
answer['source_documents'][0].page_content

'“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.'

In [None]:
print(answer['source_documents'][0].page_content)

“Leave off that!” screamed the Queen. “You make me giddy.” And then,
turning to the rose-tree, she went on, “What _have_ you been doing
here?”

“May it please your Majesty,” said Two, in a very humble tone, going
down on one knee as he spoke, “we were trying—”

“_I_ see!” said the Queen, who had meanwhile been examining the roses.
“Off with their heads!” and the procession moved on, three of the
soldiers remaining behind to execute the unfortunate gardeners, who ran
to Alice for protection.

“You shan’t be beheaded!” said Alice, and she put them into a large
flower-pot that stood near. The three soldiers wandered about for a
minute or two, looking for them, and then quietly marched off after the
others.

“Are their heads off?” shouted the Queen.


The document's name is also returned, in the `metadata` of each source document.

In [None]:
answer['source_documents'][0].metadata

{'source': '/content/alice_in_wonderland.txt'}

In [None]:
answer['source_documents'][0].metadata["source"]

'/content/alice_in_wonderland.txt'
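When several chunks come from the same file, it can be handy to list each source only once, for example to show citations under an answer. Here is a small sketch of that idea. The `Doc` class is a made-up stand-in with the same two attributes we used above (`page_content` and `metadata`), so the example runs without LangChain, and `unique_sources` is a hypothetical helper.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_sources(source_documents):
    """Collect each document's 'source' once, in order of first appearance."""
    seen = []
    for doc in source_documents:
        source = doc.metadata.get("source", "unknown")
        if source not in seen:
            seen.append(source)
    return seen

# Two toy chunks from the same file
toy_documents = [
    Doc("Off with their heads!", {"source": "/content/alice_in_wonderland.txt"}),
    Doc("In my youth, said the sage", {"source": "/content/alice_in_wonderland.txt"}),
]

print(unique_sources(toy_documents))
# ['/content/alice_in_wonderland.txt']
```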