<a href="https://colab.research.google.com/github/nathan-young1/RAG-WITH-GEMMA/blob/main/Retrieval_Argumented_Generation_(RAG)_with_Google_Gemma_%26_ChromaDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Retrieval Augmented Generation (RAG)

Large language models (LLMs) like Gemma are great at understanding language and generating fluent text. However, sometimes they struggle with factual accuracy or keeping information up to date. Retrieval augmented generation (RAG) solves this by adding a "research assistant" step:

1. **Retrieval**: When you give the LLM a prompt or question, RAG first searches through a database of texts 📜 - like having access to a giant virtual library! It retrieves relevant snippets of information that could be useful for composing its response.

2. **Augmentation**: Those retrieved context passages are then incorporated into the prompt to the LLM 📝, giving it an information source to base the answer on. It is just like reading research notes from a database and integrating them into your understanding before writing on a topic.

3. **Generation**: Finally, the LLM draws on the augmented context, together with its own knowledge and language capabilities, to generate a response. This makes the text produced not just fluent but also more likely to be accurate, since it is based on relevant reference material.

In essence, RAG reduces the LLM's chance of hallucinating because now it gets to consult a knowledge base before responding. This makes responses more reliable and trustworthy, especially for topics requiring specific up-to-date facts.

<img src="https://python.langchain.com/assets/images/vector_stores-125d1675d58cfb46ce9054c9019fea72.jpg" height=400 width=800/>

⭐ Photo credits: [Langchain](https://python.langchain.com/docs/modules/data_connection/vectorstores/)

### **Retrieval**

To use RAG, we need a database of documents that can provide relevant information for our queries. In this tutorial, we will build that database from the Google Gemini privacy policy and support pages as of 18 April 2024 (note: I have already converted the web pages to a PDF for you). We will use LangChain, ChromaDB, and Hugging Face to perform RAG on this document.

The process of creating a database involves the following steps:

- **Chunking**: We divide the document (a PDF in this case) into smaller pieces, such as paragraphs or sentences, that can be easily indexed and retrieved.

- **Embedding**: We use a pre-trained model from Hugging Face to convert each chunk into a vector representation, also known as a sentence embedding. This captures the semantic meaning of the chunk and allows us to compare it with other chunks or queries.

💡: For more information on vector embeddings check out the word embeddings section in my last lesson at [Notebook Link](https://www.kaggle.com/code/nathanyoung1/transformer-based-language-translation-in-pytorch). The word embeddings are combined to form sentence embeddings which we will refer to as vector embeddings throughout this tutorial.

- **Indexing**: We store the vector embeddings in a vector database, such as ChromaDB, that can efficiently perform similarity search. This means that given a query vector, we can find the most similar vectors in the database, and retrieve the corresponding chunks.

When we want to use RAG to generate a response for a query, we first embed the query using the same model as before. Then, as shown in the image above 👆, we use the vector database to find the embeddings most similar to the query embedding. These similar embeddings are linked to particular chunks of our document. We then feed these chunks as context to the LLM, enabling it to generate a coherent and informative answer.
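
To make the similarity search step concrete, here is a minimal, standard-library-only sketch. It is not what ChromaDB or the Hugging Face model do internally: the `embed` function is a toy bag-of-words stand-in for a real embedding model, cosine similarity is only one common choice of metric, and the three `toy_chunks` strings are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, chunks, k=1):
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]

# Invented example chunks, not text from the real PDF.
toy_chunks = [
    "gemini apps use google lens to understand images in prompts",
    "you can delete your gemini apps activity at any time",
    "google collects your location to improve its services",
]
top = retrieve("what happens to images in my prompts", toy_chunks, k=1)
print(top[0])
```

A real embedding model captures meaning rather than exact word overlap, so it can match a query to a chunk even when they share no words. The ranking idea, however, is the same: score every stored vector against the query vector and keep the top `k`.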

In [None]:
# install the vector database, langchain, hugging face sentence_transformers
!pip install --quiet chromadb langchain sentence_transformers

#### Download Gemini privacy policy & Support from GDrive Link

In [None]:
!pip install --quiet gdown

In [None]:
import gdown
url_tofile = "https://drive.google.com/uc?export=download&id=1MW2boTI9PEsobl5Kn0yDKbIXBaTCBOTg"

# download
gdown.download(url_tofile)

Downloading...
From: https://drive.google.com/uc?export=download&id=1MW2boTI9PEsobl5Kn0yDKbIXBaTCBOTg
To: /content/google gemini privacy policy and support @ 18 April 2024.pdf
100%|██████████| 1.31M/1.31M [00:00<00:00, 46.7MB/s]


'google gemini privacy policy and support @ 18 April 2024.pdf'

#### Retrieval Cont'd

In [None]:
!pip install --quiet pdfminer.six


In [None]:
from langchain_community.document_loaders import PDFMinerLoader

# Load the entire pdf file as one document.
loader = PDFMinerLoader("/content/google gemini privacy policy and support @ 18 April 2024.pdf")
entire_pdf_asdocument = loader.load()[0] # the loader returns a single document, so take the first item.

In [None]:
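from langchain_community.embeddings import HuggingFaceEmbeddings

# Load an embedding model from Hugging Face.
# HuggingFaceEmbeddings is the generic sentence-transformers wrapper; the BGE-specific
# class adds a BGE query instruction that all-MiniLM-L6-v2 was not trained with.
embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")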

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Explanation: The RecursiveCharacterTextSplitter breaks down a large text into smaller chunks.
# It uses a predefined set of characters (like spaces, newlines, etc.) to split the text.
# The goal is to find the best split that balances size and meaningful content.

# Create an instance of the RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to demonstrate.
    chunk_size=512,  # Given that an average word is 5 letters... approximately 102 words
    chunk_overlap=102,  # Approximately 20% overlap (around 20 words).
    is_separator_regex=False,  # Indicates that separators are not regular expressions.
)

# use the text splitter to split into chunks
chunks = text_splitter.create_documents([entire_pdf_asdocument.page_content])
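
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally: that class prefers to break at separators such as paragraph breaks, newlines, and spaces, and treats `chunk_size` as an upper bound. The function `split_with_overlap` is a hypothetical helper written only for this illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    windows = []
    start = 0
    while start < len(text):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return windows

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
for piece in pieces:
    print(piece)
```

Each window starts with the last two characters of the previous one. That shared overlap is what keeps a sentence that straddles a boundary from being lost to both chunks.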

In [None]:
from langchain_community.vectorstores import Chroma

# embed and insert all chunks of the documents into the vector database
vector_db = Chroma.from_documents(
    chunks,
    embed_model, # model to use for embedding the document chunks before storing.
    persist_directory='vector_db', # directory on disk where the database is persisted.
    collection_name='gemini_privacy_book' # name of the collection to store the chunks in.
)

In [None]:
# perform a vector similarity search on a query.
query = "what do you do with my images ?"

# return the chunk with the single most similar embedding in the db (k=1)
chunks_retrieved = vector_db.similarity_search(query, k=1)

print(chunks_retrieved[0].page_content)

.

How does Google work with my uploaded images?

How images in prompts work

When you add an image to your prompt, Gemini Apps use Google Lens technology
to understand what's in the image. For example, Google Lens might interpret an
image's pixels as a cat jumping. Gemini Apps add this information to your prompt
to understand your request better. Google uses this information just like any other
prompt, as explained in the Gemini Apps Privacy Notice.

How we limit use of your actual images


## **Augmentation** ➕

Now that we have set up a vector database and can retrieve the chunks most similar to our query, we are going to combine these chunks to form a context. This context is then passed together with our query as the prompt to our LLM.

In [None]:
# util function to join all retrieved documents chunks together to form a context.
def join_chunks(chunks_retrieved):
    return "\n\n".join([chunk.page_content for chunk in chunks_retrieved])
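
As a quick check of what the helper produces, the sketch below runs the same joining logic on two made-up chunks. `FakeChunk` is a hypothetical stand-in for a LangChain document object; it models only the `page_content` attribute that the helper reads.

```python
from dataclasses import dataclass

@dataclass
class FakeChunk:
    """Hypothetical stand-in for a retrieved document; only page_content is modelled."""
    page_content: str

def join_chunks(chunks_retrieved):
    # Same logic as the helper above: chunk texts separated by a blank line.
    return "\n\n".join(chunk.page_content for chunk in chunks_retrieved)

demo_context = join_chunks([FakeChunk("First passage."), FakeChunk("Second passage.")])
print(demo_context)
```

The blank line between passages gives the LLM a visible boundary between chunks that may come from different parts of the document.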

## **Generation** ✍️

In [None]:
# prompt to condition the LLM's behavior.
system_prompt = """
You are an AI assistant specializing in Google Gemini's privacy policy. Your role is to help users understand and navigate the privacy policy by answering their questions based on the provided context.

When given a context and a question related to the privacy policy:

Carefully read and comprehend the context.
1. Reason over the information in the context to infer an answer to the user's question, even if the answer is not stated verbatim.
2. If the context / your system prompt contains enough relevant information to reasonably infer an answer, provide your best answer interpolated from the context / your system prompt.
3. However, if the context / your system prompt does not provide sufficient or relevant information to answer the question, even with inference, politely respond: "Sorry, I don't have enough or relevant information to answer your question."

Stay focused on the topic of Google Gemini's privacy policy and do not engage in discussions outside of this topic. Instead, tell the user what you can talk about and ask them to stay on topic.

"""

In [None]:
from langchain_core.prompts import PromptTemplate

# Template so we can attach our context and query as prompt to the LLM on the fly.
template = '''

Context: {context}

Question: {question}

Your Answer:
'''

prompt = PromptTemplate.from_template(template)
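
To see what the filled-in prompt looks like, the sketch below reproduces the substitution with Python's built-in `str.format`. This is an approximation for illustration, assuming the template only contains simple `{name}` placeholders as it does here. The example context sentence is taken from the PDF text quoted in the tests further down, and the question is made up.

```python
template_text = """
Context: {context}

Question: {question}

Your Answer:
"""

# Fill both placeholders; every other character of the template is kept as-is.
filled = template_text.format(
    context="Gemini Apps are provided by Google LLC.",
    question="Which company provides Gemini Apps?",
)
print(filled)
```

Ending the prompt with `Your Answer:` nudges the model to continue with the answer itself rather than restating the question.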

In [None]:
# We will be using Groq, an LLM inference provider.
!pip install --quiet groq

In [None]:
from groq import Groq

# Establish a connection to Groq. Get your API key by signing up at wow.groq.com
client = Groq(api_key='<Your Groq API key Here>')
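
Pasting a key directly into a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, is to read the key from an environment variable. The helper `load_api_key` and the variable name `GROQ_API_KEY` are assumptions made for this illustration, not part of the original notebook.

```python
import os

def load_api_key(env_var="GROQ_API_KEY"):
    """Read the API key from an environment variable and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before creating the client")
    return key

# Hypothetical usage with the client from the cell above:
# client = Groq(api_key=load_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, in the middle of a request.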

In [None]:
def ask_question(user_question):

    context = vector_db.similarity_search(user_question, k=7)

    response = client.chat.completions.create(
        model="gemma-7b-it",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt.format(context=join_chunks(context), question=user_question)}
        ],

        temperature=0.1) # lower values for more precise generation

    # Return the text of the response generated by the language model
    return response.choices[0].message.content, context


### **Testing**

In [None]:
question = 'when i chat with gemini, what do you do with my chat, do you store it or do you sell it to others?'

answer, context = ask_question(question)
display(answer)

'Google retains your conversations with Gemini Apps even after you turn off Gemini Apps Activity. This data is used to provide, improve, and develop Google products and services, as well as to provide you with a safer and better quality experience. Google does not sell your conversations to others.'

In [None]:
question2 = 'which company developed Gemini ?'

answer2, context2 = ask_question(question2)
display(answer2)

'The provided text indicates that **Gemini Apps were developed by Google LLC**. The context explicitly states that "Gemini Apps are provided by Google LLC".'

In [None]:
question3 = 'Turkey and Chicken which is better for peppersoup ?'

answer3, context3 = ask_question(question3)
display(answer3)

"I am unable to provide an answer to this question as it is not related to the provided context regarding Google Gemini's privacy policy."

👆👆 As you can see, in the first and second examples the LLM used our context as the source for its answer. In the third example, instead of hallucinating, the LLM simply replied that it could not answer because the question is unrelated to the provided context.

### **Final Words**
This tutorial introduced you to basic RAG techniques and their implementation. In production, more advanced RAG techniques such as sentence-window retrieval and auto-merging retrieval are used to improve context relevance. We also use tools 🏹 like TruEra for LLM response evaluation.

**Congratulations** 🎉🎉
You can now use fundamental RAG techniques. 😊

Follow me on:

* **[LinkedIn Profile](https://www.linkedin.com/in/jonathan-okorie-843126216/)** for questions, deep learning projects, chat, etc.

* **[Twitter Profile](https://twitter.com/Nathan_Young_1)** for bite-sized knowledge & (questionable) puns.

* **[Kaggle Profile](https://www.kaggle.com/nathanyoung1)** to be notified when I create a new detailed notebook explanation.