## Advanced RAG Pipeline with a Vector Database

### 1 Document Loaders

In [1]:
# pdf loader
from langchain_community.document_loaders import PyPDFLoader # type: ignore
pdf = PyPDFLoader("GenerativeAI.pdf")
pdf_doc = pdf.load()
pdf_doc

[Document(metadata={'source': 'GenerativeAI.pdf', 'page': 0}, page_content='Certified\nCloud\nApplied\nGenerative\nAI\nEngineer\n(GenEng)\nVersion:\n6.0\n(Implementation\nand\nadoption\nstarting\nfrom\nFeb\n1,\n2024)\nExecutive\nSummary\nThe\nCloud\nApplied\nGenerative\nAI\nEngineering\n(GenEng)\ncertification\nprogram\naims\nto\nprepare\nindividuals\nfor\nthe\nrevolutionary\nera\nof\nGenerative\nAI,\nwhich\npromises\nto\ntransform\nindustries\nand\ngenerate\nsignificant\neconomic\nbenefits.\nThe\none-year\nprogram\ncombines\nonsite\nand\nonline\ninstruction,\ncovering\ntopics\nsuch\nas\nGenAI\napplication\ndevelopment,\ncloud\ncomputing,\nand\nDevOps.\nParticipants\nlearn\npractical\nskills\nin\nTypeScript,\nPython,\nfront-end\ndevelopment\nusing\nNext.js,\nand\nGenAI-related\ntechnologies.\nThe\nprogram\nemphasises\nreal-world\napplication,\nenabling\nparticipants\nto\nstart\nearning\nthrough\nfreelancing\nor\nother\nopportunities\nafter\nthe\nsecond\nquarter .\nAt\nthe\nend\nof\nthe
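The loader output above has one word per line, which appears to be an artifact of how this PDF's text layer is extracted. A minimal sketch of collapsing those newlines before splitting follows. `normalize_whitespace` is a hypothetical helper, not part of LangChain, and it assumes paragraph breaks do not need to be preserved.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (newlines included) into one space."""
    return re.sub(r"\s+", " ", text).strip()


# Sample taken from the start of the page_content shown above.
sample = "Certified\nCloud\nApplied\nGenerative\nAI\nEngineer\n(GenEng)"
cleaned = normalize_whitespace(sample)
print(cleaned)
```

The same function could be applied to each `doc.page_content` in `pdf_doc` before the splitting step, so that chunk sizes reflect readable text rather than newline-separated words.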

### 2 Text Splitter

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter # type: ignore
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = text_splitter.split_documents(pdf_doc)
documents


[Document(metadata={'source': 'GenerativeAI.pdf', 'page': 0}, page_content='Certified\nCloud\nApplied\nGenerative\nAI\nEngineer\n(GenEng)\nVersion:\n6.0\n(Implementation\nand\nadoption\nstarting\nfrom\nFeb\n1,\n2024)\nExecutive\nSummary\nThe\nCloud\nApplied\nGenerative\nAI\nEngineering\n(GenEng)\ncertification\nprogram\naims\nto\nprepare\nindividuals\nfor\nthe\nrevolutionary\nera\nof\nGenerative\nAI,\nwhich\npromises\nto\ntransform\nindustries\nand\ngenerate\nsignificant\neconomic\nbenefits.\nThe\none-year\nprogram\ncombines\nonsite\nand\nonline\ninstruction,\ncovering\ntopics\nsuch\nas\nGenAI\napplication\ndevelopment,\ncloud\ncomputing,\nand\nDevOps.\nParticipants\nlearn\npractical\nskills\nin\nTypeScript,\nPython,\nfront-end\ndevelopment\nusing\nNext.js,\nand\nGenAI-related\ntechnologies.\nThe\nprogram\nemphasises\nreal-world\napplication,\nenabling\nparticipants\nto\nstart\nearning\nthrough\nfreelancing\nor\nother\nopportunities\nafter\nthe\nsecond\nquarter .\nAt\nthe\nend\nof\nthe
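To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter tries a list of separators recursively); `sliding_window_chunks` is a hypothetical helper that only illustrates how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last three characters of the previous one, which is the role `chunk_overlap=100` plays in the cell above: context that straddles a chunk boundary is still present in at least one chunk.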

### 3 Embedding

In [6]:
from langchain_google_genai import  GoogleGenerativeAIEmbeddings # type: ignore
embedding = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
embedding


GoogleGenerativeAIEmbeddings(client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x000001C890705910>, model='models/embedding-001', task_type=None, google_api_key=SecretStr('**********'), credentials=None, client_options=None, transport=None, request_options=None)
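An embedding model maps a piece of text to a list of floats (for example through `embedding.embed_query(text)`), and texts with similar meaning should land close together. The sketch below shows one common way to compare two such vectors, cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```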

### 4 Vector Store

In [7]:
from langchain_community.vectorstores import FAISS # type: ignore
vector = FAISS.from_documents(documents, embedding=embedding)
vector

<langchain_community.vectorstores.faiss.FAISS at 0x1c89bb8da90>

In [8]:
# perform a similarity search
query = "Generative AI is poised to completely transform our lives and work "
retriever_result = vector.similarity_search(query)
print(retriever_result[0].page_content)

The Details Generative AI is poised to completely transform our lives and work landscape. McKinsey & Company estimates that generative AI could add $2.6 trillion to $4.4 trillion in economic benefits annually across various industries. This will be achieved through increased automation, improved decision-making, and personalised experiences. It is transformative for tech and jobs. It’s critical, must-know knowledge across industries and businesses and if you don’t keep up, you become obsolete in tech cycles that are moving fast. As new Gen AI-powered technologies keep coming and demand for skills are changing rapidly, workforce and professional training is exploding and is having difficulty in keeping up. Our one-year GenEng certification program teaches you to get ready for this new revolutionary era. It consists of four quarters of three months each. At the completion of the program the students may choose to specialise in a specific area. It is a hybrid program consisting of onsite
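Conceptually, the similarity search above embeds the query and returns the stored chunks whose vectors are nearest to it. The brute-force sketch below shows that idea with Euclidean (L2) distance. The distance metric and the default of `k=4` reflect my understanding of the LangChain FAISS wrapper and should be treated as assumptions to check against your installed version; the chunk labels and two-dimensional vectors are invented.

```python
import math


def top_k_by_l2(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the texts of the k vectors closest to query_vec (smallest L2 distance first)."""
    scored = [(math.dist(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored[:k]]


# Toy index: (chunk label, made-up embedding).
toy_index = [
    ("chunk about economic impact", [0.9, 0.1]),
    ("chunk about Quarter 1", [0.1, 0.9]),
    ("chunk about APIs", [0.5, 0.5]),
]

nearest = top_k_by_l2([1.0, 0.0], toy_index, k=2)
print(nearest)
```

A real FAISS index avoids scanning every vector in Python, but the ranking it returns answers the same question: which stored chunks are closest to the query.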

### 5 Retriever

In [9]:
retriever = vector.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001C89BB8DA90>, search_kwargs={})

### LLM

In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI # type: ignore
from dotenv import load_dotenv # type: ignore
import os
load_dotenv()

# load google api key
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")


# Initialize an instance of the ChatGoogleGenerativeAI with specific parameters
llm: ChatGoogleGenerativeAI = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.2,
)
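The cell above reads `GOOGLE_API_KEY` with `os.getenv`, which silently returns `None` when the variable is missing. A small guard can surface that problem before the first API call. `require_env` is a hypothetical helper, and `DEMO_API_KEY` is a dummy variable set only so the example runs without real credentials.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value


# Dummy variable for demonstration only (not a real key).
os.environ["DEMO_API_KEY"] = "dummy-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same call with `"GOOGLE_API_KEY"` after `load_dotenv()` would fail fast instead of producing an authentication error later.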


In [11]:
# invoking with LLM
response = llm.invoke("what is Generative AI")
response.content

"## Generative AI: The AI that Creates\n\nGenerative AI is a type of artificial intelligence that focuses on **creating new content**, rather than just analyzing existing data. It learns patterns and structures from existing data and then uses that knowledge to generate new, original outputs. \n\n**Think of it like this:**\n\n* **Traditional AI:**  Analyzes a picture of a cat and tells you it's a cat.\n* **Generative AI:** Creates a new, realistic picture of a cat that doesn't exist in the real world.\n\n**Here's a breakdown of key aspects:**\n\n**What it does:**\n\n* **Generates new content:**  This can be anything from text, images, music, code, and even videos.\n* **Learns from existing data:** It analyzes vast amounts of data to understand patterns and relationships.\n* **Creates original outputs:** It doesn't simply copy or modify existing content; it generates something entirely new.\n\n**Examples of Generative AI:**\n\n* **Text generation:**  ChatGPT, Bard, and other language mo

### Prompt template

In [13]:
from langchain.prompts import ChatPromptTemplate # type: ignore
prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided content. 
Think step by step before providing a detailed answer. 
I will tip you $1000 if the user finds the answer helpful.
<context> 
{context}                                       
</context>   

Question:{input}                                     
""")

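The template above has two placeholders, `{context}` and `{input}`, that are filled at run time. The sketch below imitates that substitution with plain `str.format` on an abbreviated copy of the prompt. It illustrates the idea only and is not how `ChatPromptTemplate` is implemented; `fill_template` is a hypothetical helper.

```python
# Abbreviated copy of the notebook's prompt, for illustration.
TEMPLATE = """Answer the following question based only on the provided context.
<context>
{context}
</context>

Question: {input}"""


def fill_template(template: str, context: str, question: str) -> str:
    """Substitute the retrieved context and the user question into the template."""
    return template.format(context=context, input=question)


filled = fill_template(
    TEMPLATE,
    context="Quarter 1: TypeScript and Python Programming",
    question="what we learn in Quarter 1",
)
print(filled)
```
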
### Chains

In [17]:
# create a chain that combines the prompt and the LLM
from langchain.chains.combine_documents import create_stuff_documents_chain
prompt_llm_chain = create_stuff_documents_chain(llm, prompt)
prompt_llm_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\nAnswer the following question based only on the provided content. \nThink step by step before providing a detailed answer. \nI will tip you $1000 if the user finds the the answer helpful.\n<context> \n{context}                                       \n</context>   \n\nQuestion:{input}                                     \n'), additional_kwargs={})])
| ChatGoogleGenerativeAI(model='models/gemini-1.5-flash', google_api_key=SecretStr('**********'), temperature=0.2, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object
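The repr above shows a `format_docs` step that turns the list of documents into the single `context` string the prompt expects. A minimal sketch of that "stuffing" step follows. The `Doc` dataclass stands in for LangChain's `Document`, and the `"\n\n"` separator is my understanding of the default, so treat it as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is needed here."""
    page_content: str


def stuff_documents(docs: list[Doc], separator: str = "\n\n") -> str:
    """Join every document's text into one string to place in the prompt."""
    return separator.join(doc.page_content for doc in docs)


docs = [
    Doc("Quarter 1: TypeScript and Python"),
    Doc("Quarter 3: API design"),
]
stuffed = stuff_documents(docs)
print(stuffed)
```

Because every retrieved chunk is pasted into one prompt, the total size of the chunks has to fit within the model's context window.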

In [19]:
# create the retrieval chain by combining the retriever with prompt_llm_chain
from langchain.chains import create_retrieval_chain # type: ignore
retriever_chain = create_retrieval_chain(retriever, prompt_llm_chain)

response = retriever_chain.invoke({"input":"what we learn in Quarter 1"})
response

{'input': 'what we learn in Quarter 1',
 'context': [Document(metadata={'source': 'GenerativeAI.pdf', 'page': 2}, page_content="Headless\nCMSs\nwill\nalso\nbe\ncovered.\nWe\nwill\nalso\nlearn\nto\nuse\nVercel\nAI\nSDK,\nan\nopen-source\nlibrary\nfor\nbuilding\nAI-powered\nuser\ninterfaces.\nThe\nquarter\nwill\nend\nwith\nyou\nlearning\nto\ndeploy\nthese\nUI\napps\non\nVercel\nCloud\nand\nCDN.\nQuarter\n3:\nAPI\nDesign,\nDevelopment,\nand\nDeployment\nusing\nFastAPI,\nContainers,\nand\nOpenAPI\nSpecifications\nAn\nAPI-as-a-Product\nis\na\ntype\nof\nSoftware-as-a-Service\nthat\nmonetizes\nniche\nfunctionality ,\ntypically\nserved\nover\nHTTP .\nOpenAI\nAPIs\nare\nthemselves\nthis\nkind\nof\nservice.\nAn\napplication\nprogramming\ninterface\neconomy ,\nor\nAPI\neconomy ,\nrefers\nto\nthe\nbusiness\nstructure\nwhere\nAPIs\nare\nthe\ndistribution\nchannel\nfor\nproducts\nand\nservices.\nIn\nthis\nquarter\nwe\nwill\nlearn\nto\ndevelop\nAPIs\nnot\njust\nas\na\nbackend\nfor\nour\nfrontend\nbut
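The response above is a dictionary with `input`, `context`, and `answer` keys. The sketch below reproduces that flow with stand-in components: retrieve chunks for the question, then hand them to an answering step. `run_retrieval_chain`, `keyword_retrieve`, and `echo_answer` are all hypothetical; the real chain uses the vector retriever and the LLM instead.

```python
from typing import Callable


def run_retrieval_chain(
    question: str,
    retrieve: Callable[[str], list[str]],
    answer_from: Callable[[list[str], str], str],
) -> dict:
    """Retrieve context for the question, then produce an answer from it."""
    docs = retrieve(question)
    answer = answer_from(docs, question)
    return {"input": question, "context": docs, "answer": answer}


corpus = ["Quarter 1: TypeScript and Python", "Quarter 3: API design with FastAPI"]


def keyword_retrieve(question: str) -> list[str]:
    """Toy retriever: keep chunks that share at least one word with the question."""
    words = set(question.lower().split())
    return [c for c in corpus if words & set(c.lower().replace(":", "").split())]


def echo_answer(docs: list[str], question: str) -> str:
    """Toy 'LLM': report how many chunks were retrieved."""
    return f"{len(docs)} chunk(s) retrieved for: {question}"


result = run_retrieval_chain("what we learn in Quarter 1", keyword_retrieve, echo_answer)
print(result["answer"])
```

Note that the toy retriever also returns the Quarter 3 chunk because it shares the word "quarter", much as the real output above includes a Quarter 3 passage in `context` for a Quarter 1 question. The answering step has to pick out the relevant part.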

In [16]:
print(response["answer"])

Here's a breakdown of what we learn in Quarter 1, based on the provided text:

**Quarter 1: TypeScript and Python Programming**

* **Focus:**  The two most used programming languages in GenAI Application Development.
* **Languages:**
    * **TypeScript:** For User Interfaces (UIs)
    * **Python:** For Application Programming Interfaces (APIs)
* **Programming Paradigms:** Both functional and object-oriented paradigms will be covered.
* **Delivery:**
    * **TypeScript:** Taught onsite
    * **Python:** Taught online 
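
Since the response also carries the retrieved documents under `context`, it can be useful to list where an answer came from. The sketch below is a hypothetical helper, `list_sources`, run against a hand-built response shaped like the one shown earlier. The `Doc` dataclass stands in for LangChain's `Document`, and the page numbers are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def list_sources(response: dict) -> list[str]:
    """Return unique 'source (page N)' labels in retrieval order."""
    seen: list[str] = []
    for doc in response["context"]:
        source = doc.metadata.get("source", "?")
        page = doc.metadata.get("page", "?")  # loader page numbers start at 0
        label = f"{source} (page {page})"
        if label not in seen:
            seen.append(label)
    return seen


fake_response = {
    "input": "what we learn in Quarter 1",
    "context": [
        Doc("...", {"source": "GenerativeAI.pdf", "page": 2}),
        Doc("...", {"source": "GenerativeAI.pdf", "page": 1}),
        Doc("...", {"source": "GenerativeAI.pdf", "page": 2}),
    ],
    "answer": "...",
}
sources = list_sources(fake_response)
print(sources)
```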

