# Gen AI Essentials
Module-3 Retrieval Augmented Generation (RAG)

## *Learning Naive Retrieval Augmented Generation (RAG) with Python*
This notebook introduces developers to Retrieval Augmented Generation (RAG), explaining its concepts, use cases, and implementation. We'll use Google Gemini for the LLM and FAISS for managing vector embeddings.


## **1. What is Retrieval Augmented Generation (RAG)?**
Retrieval Augmented Generation (RAG) is a hybrid architecture that combines a retrieval system with a large language model (LLM). Instead of relying solely on the LLM's pre-trained knowledge, RAG fetches domain-specific context or documents from an external knowledge base at query time, which helps keep responses accurate and up to date.

**RAG Workflow:**
1. **Retrieve**: Fetch relevant information (e.g., documents or context) from a knowledge base based on a query.
2. **Augment**: Build a prompt that includes the retrieved information as context.
3. **Generate**: Pass the augmented prompt to the LLM so that it generates an answer grounded in that context.
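
The three steps above can be sketched in plain Python before bringing in any external services. This is a toy illustration only: retrieval here is simple word overlap rather than vector search, `generate` is a placeholder instead of a real LLM call, and the `DOCUMENTS` list and all function names are assumptions made up for this example.

```python
import re
from collections import Counter

# Toy knowledge base (hypothetical sentences, for illustration only).
DOCUMENTS = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "Gemini is a family of large language models from Google.",
    "Chunking splits long documents into smaller segments.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, documents, top_k=1):
    """Step 1 (Retrieve): rank documents by word overlap with the query."""
    query_words = Counter(tokenize(query))

    def overlap(doc):
        return sum((query_words & Counter(tokenize(doc))).values())

    return sorted(documents, key=overlap, reverse=True)[:top_k]

def augment(query, retrieved):
    """Step 2 (Augment): place the retrieved text into the prompt as context."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion:\n{query}\n\nAnswer:"

def generate(prompt):
    """Step 3 (Generate): placeholder standing in for a real LLM call."""
    return f"[LLM would answer here, given a prompt of {len(prompt)} characters]"

query = "What is FAISS used for?"
retrieved = retrieve(query, DOCUMENTS)
prompt = augment(query, retrieved)
print(prompt)
print(generate(prompt))
```

The rest of this notebook follows the same shape, replacing word overlap with embeddings stored in FAISS and the placeholder with Gemini.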

## **2. Why is RAG Required?**
### Use Cases and Problems Solved:
- **Limitations of LLM Knowledge**: Overcomes the issue of outdated or incomplete training data in LLMs.
- **Enterprise-specific Queries**: Helps answer highly specialized questions using proprietary data.
- **Applications**:
  - Customer Support (FAQs with latest data)
  - Personalized Content Recommendations
  - Complex Research Assistance
  - Legal/Financial Document Analysis

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/mandar_gdrive')

Mounted at /content/mandar_gdrive


In [5]:
# Install LangChain and related dependencies (uncomment on first run)
# !pip install langchain
# !pip install langchain_community
# !pip install langchain-google-genai -q
# !pip install PyMuPDF

Collecting PyMuPDF
  Downloading pymupdf-1.25.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Downloading pymupdf-1.25.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.25.1


In [24]:
import yaml
with open('/content/mandar_gdrive/MyDrive/Trainings/GenAI Essentials/config.yml', 'r') as f:
    config = yaml.safe_load(f)
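
The cells in this notebook read the API key from `config` under two different names (`gemini_api_key` and `GOOGLE_API_KEY`). As a hedged sketch, a small helper could look the key up under either name and fall back to an environment variable. The helper name `get_api_key`, the lookup order, and the dummy key are assumptions for illustration and are not part of the original notebook.

```python
import os

def get_api_key(config, names=("GOOGLE_API_KEY", "gemini_api_key")):
    """Return the first API key found in `config`, then in the environment."""
    # Prefer values from the loaded config file.
    for name in names:
        if config.get(name):
            return config[name]
    # Fall back to environment variables with the same names.
    for name in names:
        if os.environ.get(name):
            return os.environ[name]
    raise KeyError(f"No API key found under any of: {', '.join(names)}")

# Example with a dummy value; a real key should never be hard-coded.
demo_config = {"gemini_api_key": "dummy-key-for-illustration"}
print(get_api_key(demo_config))
```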

In [4]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Create a Gemini LLM instance (this notebook uses Gemini 1.5 Pro)
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", google_api_key=config['gemini_api_key'])



In [7]:
import fitz  # PyMuPDF

# Reading a sample PDF
pdf_path = "/content/mandar_gdrive/MyDrive/Trainings/GenAI Essentials/Spotify-Q1-24-Earnings-Call-Prepared-Remarks.pdf"
doc = fitz.open(pdf_path)

content = ""
for page in doc:
    content += page.get_text()

doc.close()
print(content[:500])  # Display first 500 characters


Q1 2024 Earnings Call Prepared Remarks
April 2024
Bryan Goldberg, Head of Investor Relations
Thanks operator and welcome to Spotify’s First Quarter 2024 Earnings Conference Call.
Joining us today will be Daniel Ek, our CEO and Ben Kung our interim CFO and VP of Financial
Planning & Analysis. We’ll start with opening comments from Daniel and Ben and afterwards
we’ll be happy to answer your questions.
Questions can be submitted by going to slido.com (S L I D O.com) and using the code
#SpotifyEarni


## **5. What is Chunking?**
Chunking splits large documents into smaller, manageable segments (chunks) for better processing and retrieval. Chunks should be small enough to fit within the token limits of the embedding model or LLM.
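
To make chunk size and overlap concrete, here is a minimal pure-Python sketch of fixed-size character chunking. It is an illustration under simple assumptions, not how LangChain's `CharacterTextSplitter` works internally (that class splits on a separator, `"\n\n"` by default, and then merges the pieces). The function name `chunk_by_characters` is hypothetical.

```python
def chunk_by_characters(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_by_characters(sample, chunk_size=10, chunk_overlap=3)
for i, piece in enumerate(pieces, start=1):
    print(f"Chunk {i}: {piece}")
# Chunk 1: abcdefghij
# Chunk 2: hijklmnopq
# Chunk 3: opqrstuvwx
# Chunk 4: vwxyz
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary is less likely to lose its context.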

In [16]:
from langchain.text_splitter import CharacterTextSplitter

def chunk_text(text, chunk_size, chunk_overlap):
  """
  Splits the given text into smaller chunks.

  Args:
    text: The input text to be chunked.
    chunk_size: The desired size of each chunk in characters.
    chunk_overlap: The number of characters to overlap between chunks.

  Returns:
    A list of text chunks.
  """

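  # CharacterTextSplitter splits on a separator ("\n\n" by default) and then
  # merges the pieces into chunks of up to chunk_size characters.
  text_splitter = CharacterTextSplitter(
      chunk_size=chunk_size,
      chunk_overlap=chunk_overlap
  )
  chunks = text_splitter.split_text(text)
  return chunks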

# Example usage:
my_text = """Q1 2024 Earnings Call Prepared Remarks
April 2024
Bryan Goldberg, Head of Investor Relations
Thanks operator and welcome to Spotify’s First Quarter 2024 Earnings Conference Call.
Joining us today will be Daniel Ek, our CEO and Ben Kung our interim CFO and VP of Financial
Planning & Analysis. We’ll start with opening comments from Daniel and Ben and afterwards
we’ll be happy to answer your questions.
Questions can be submitted by going to slido.com (S L I D O.com) and using the code
#SpotifyEarni

Before we begin, let me quickly cover the Safe Harbor. During this call, we’ll be making certain
forward-looking statements including projections or estimates about the future performance of
the Company. These statements are based on current expectations and assumptions that are
subject to risks and uncertainties. Actual results could materially differ because of factors
discussed on today’s call, in our Shareholder Deck and in filings with the Securities and
Exchange Commission.
During this call, we’ll also refer to certain non-IFRS financial measures. Reconciliations between
our IFRS and non-IFRS financial measures can be found in our Shareholder Deck, in the
financial section of our Investor Relations website and also furnished today on Form 6-K.
And with that, I’ll turn it over to Daniel.
"""
#my_text = "This is a very long piece of text that needs to be split into smaller chunks for processing."
chunks = chunk_text(my_text, chunk_size=600, chunk_overlap=25)
print(len(chunks))
for i, chunk in enumerate(chunks):
  print(f"Chunk {i+1}: {chunk}")

2
Chunk 1: Q1 2024 Earnings Call Prepared Remarks
April 2024
Bryan Goldberg, Head of Investor Relations
Thanks operator and welcome to Spotify’s First Quarter 2024 Earnings Conference Call.
Joining us today will be Daniel Ek, our CEO and Ben Kung our interim CFO and VP of Financial
Planning & Analysis. We’ll start with opening comments from Daniel and Ben and afterwards
we’ll be happy to answer your questions.
Questions can be submitted by going to slido.com (S L I D O.com) and using the code
#SpotifyEarni
Chunk 2: Before we begin, let me quickly cover the Safe Harbor. During this call, we’ll be making certain
forward-looking statements including projections or estimates about the future performance of
the Company. These statements are based on current expectations and assumptions that are
subject to risks and uncertainties. Actual results could materially differ because of factors
discussed on today’s call, in our Shareholder Deck and in filings with the Securities and
Exchange Commis

## **6. Code: Creating and Using a FAISS Vector Database**

In [17]:
!pip install faiss-cpu -q

In [28]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Quick sanity check: embed a short string and inspect the first five values
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=config['GOOGLE_API_KEY'])
vector = embeddings.embed_query("Hello world")
vector[:5]

[0.04703257977962494,
 -0.04019005596637726,
 -0.02902696467936039,
 -0.026809632778167725,
 0.01892058178782463]

In [30]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
# Get an instance of the embedding model
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=config['GOOGLE_API_KEY'])

def generate_embeddings(embedding_model, text):
  """
  Generates an embedding for the given text using Google Generative AI.

  Args:
    embedding_model: The embedding model instance to use.
    text: The string for which an embedding is to be generated.

  Returns:
    The embedding vector as a list of floats.
  """
  vector = embedding_model.embed_query(text)
  return vector

# Example usage:
my_texts = "This is the first text."
text_embeddings = generate_embeddings(embedding_model, my_texts)

print(text_embeddings)

[0.018139317631721497, -0.05055618658661842, -0.011706100776791573, -0.04811829701066017, 0.05193569138646126, 0.05051935836672783, -0.0015297045465558767, -0.011294047348201275, 0.0013437260640785098, 0.03954709321260452, 0.03463238477706909, 0.04430802911520004, 0.033306948840618134, -0.021963708102703094, 0.010860363021492958, -0.007806161884218454, 0.0177259910851717, 0.02515266276896, -0.029688432812690735, -0.0587134025990963, 0.004387598019093275, 0.023885665461421013, 0.0009509326773695648, 0.005754344630986452, -0.0024500684812664986, -0.0033366556745022535, 0.010939100757241249, -0.07028411328792572, -0.034387193620204926, 0.03559841960668564, -0.049960702657699585, -0.0009579098550602794, -0.07453469932079315, 0.006237538065761328, 0.033382631838321686, -0.04787854105234146, 0.009801723062992096, 0.03495361655950546, -0.03657948970794678, 0.02209707908332348, -0.02719740755856037, -0.00731572974473238, 0.01511992048472166, 0.01277232076972723, 0.04550834372639656, 0.00305383

In [39]:
import faiss


# Initialize FAISS index (exact, brute-force search using L2 distance)
dimension = 768  # Embedding vector size for models/embedding-001
index = faiss.IndexFlatL2(dimension)



In [40]:
import numpy as np
# Insert embeddings into FAISS (FAISS stores vectors as float32)
for chunk in chunks:
    embedding = generate_embeddings(embedding_model, chunk)
    index.add(np.array([embedding], dtype="float32"))
print(f"Number of vectors in index: {index.ntotal}")

Number of vectors in index: 2
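
To build intuition for what a flat L2 index does at search time, the following pure-Python sketch compares a query against every stored vector and returns the closest ones. `IndexFlatL2` performs this kind of exhaustive comparison and reports squared L2 distances; the function names and the tiny 3-dimensional vectors below are made up for illustration and are not FAISS APIs.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(stored_vectors, query, top_k=1):
    """Exhaustively compare the query with every stored vector."""
    scored = [(squared_l2(vec, query), idx) for idx, vec in enumerate(stored_vectors)]
    scored.sort()  # smallest distance first
    nearest = scored[:top_k]
    distances = [dist for dist, _ in nearest]
    indices = [idx for _, idx in nearest]
    return distances, indices

# Toy 3-dimensional "embeddings" (real ones in this notebook have 768 dimensions).
toy_vectors = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
]
toy_query = [1.0, 0.0, 0.0]

toy_distances, toy_indices = flat_l2_search(toy_vectors, toy_query, top_k=2)
print(toy_indices)    # positions of the two nearest stored vectors
print(toy_distances)  # their squared distances to the query
```

The returned positions play the same role as the indices FAISS returns: they are used to look up the original text chunks.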


In [41]:
PROMPT_TEMPLATE = """
You are a helpful assistant for enterprise users. Use the following context to answer the question.

Context:
{context}

Question:
{question}

Answer:
"""
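
When more than one chunk is retrieved, it can help to label each passage before filling the `{context}` placeholder. The sketch below is one possible approach, not the notebook's method: `build_prompt` and `demo_template` are hypothetical names, and the template here is a shortened stand-in for `PROMPT_TEMPLATE`.

```python
def build_prompt(template, retrieved_chunks, question):
    """Number each retrieved chunk and substitute it into the template."""
    context = "\n\n".join(
        f"[{n}] {chunk.strip()}"
        for n, chunk in enumerate(retrieved_chunks, start=1)
    )
    return template.format(context=context, question=question)

# Shortened stand-in template with the same two placeholders.
demo_template = "Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:\n"
demo_chunks = ["Daniel Ek is the CEO. ", " Ben Kung is the interim CFO."]

demo_prompt = build_prompt(demo_template, demo_chunks, "Who is the CEO of Spotify?")
print(demo_prompt)
```

Numbered passages make it easier to see which chunk an answer came from when inspecting prompts during debugging.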


## **8. Code: Retrieve Context from FAISS**

In [42]:
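def retrieve_context(question, top_k=1):
    # Embed the question with the same model used for the chunks
    query_embedding = generate_embeddings(embedding_model, question)
    # FAISS returns (distances, indices), each of shape (1, top_k)
    distances, indices = index.search(np.array([query_embedding], dtype="float32"), top_k)
    # FAISS pads with -1 when fewer than top_k vectors are available
    retrieved_chunks = [chunks[i] for i in indices[0] if i != -1]
    return " ".join(retrieved_chunks)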

question = "Who is the CEO of Spotify?"
context = retrieve_context(question)
print(f"Retrieved Context:\n{context}")


Retrieved Context:
Q1 2024 Earnings Call Prepared Remarks
April 2024
Bryan Goldberg, Head of Investor Relations
Thanks operator and welcome to Spotify’s First Quarter 2024 Earnings Conference Call.
Joining us today will be Daniel Ek, our CEO and Ben Kung our interim CFO and VP of Financial
Planning & Analysis. We’ll start with opening comments from Daniel and Ben and afterwards
we’ll be happy to answer your questions.
Questions can be submitted by going to slido.com (S L I D O.com) and using the code
#SpotifyEarni


## **9. Code: Generate Answer using LLM**

In [43]:
def generate_answer(question, context):
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    response = llm.invoke(prompt)
    return response  # an AIMessage; the answer text is in response.content

answer = generate_answer(question, context)
print(f"Answer:\n{answer.content}")


Answer:
Daniel Ek


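## **Conclusion**
By following this notebook, developers can implement a basic Retrieval Augmented Generation (RAG) workflow: chunk a document, embed the chunks, index them in FAISS, retrieve the most relevant chunk for a question, and pass it to Gemini as context. The same pattern can be applied to enterprise-specific question-answering use cases with generative AI.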