<a href="https://colab.research.google.com/github/edquestofficial/Gen-AI-Cohort/blob/main/2024/april/Level_2/LangChain/RAG_with_LangChain_Using_GeminiPro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install --upgrade --quiet  langchain langchain-community langchainhub chromadb bs4 langchain-google-genai

## Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


## Set `Gemini` API Key

In [None]:
import os

base_path = "/content/drive/MyDrive/Gen AI Course/RAG_For_HDFC_Policy"
filepath = f"{base_path}/gemini_api_key.txt"
with open(filepath, "r") as f:
  # Read the whole file and strip whitespace so no trailing newline ends up in the key.
  api_key = f.read().strip()
  os.environ["GOOGLE_API_KEY"] = api_key
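
A key file often carries a trailing newline or stray blank lines. As a minimal sketch, the helper below (hypothetical name `read_api_key`) returns the first non-empty line, stripped of whitespace. A temporary file with a dummy value stands in for `gemini_api_key.txt`.

```python
import tempfile
from pathlib import Path


def read_api_key(path):
    """Return the first non-empty line of the file, stripped of whitespace."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key = line.strip()
        if key:
            return key
    raise ValueError(f"No API key found in {path}")


# Demo with a temporary file standing in for gemini_api_key.txt (dummy value).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "gemini_api_key.txt"
    demo_path.write_text("\n  dummy-key-123  \n", encoding="utf-8")
    demo_key = read_api_key(demo_path)

# In the notebook this would become:
# os.environ["GOOGLE_API_KEY"] = read_api_key(filepath)
```

Raising an error on an empty file surfaces a missing key at load time rather than as an authentication failure later.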

In [None]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter

## LangChain Document loaders

*   PDF
*   CSV
*   JSON
*   HTML etc...

[Reference](https://python.langchain.com/docs/modules/data_connection/document_loaders/)

In [None]:
%pip install --quiet pypdf

In [None]:
from langchain_community.document_loaders import PyPDFLoader

pdf_file_path = f"{base_path}/data/HDFC-Life-Easy-Health-101N110V03-Policy-Bond-Single-Pay.pdf"
loader = PyPDFLoader(pdf_file_path)
pages = loader.load_and_split()

In [None]:
len(pages)

39

In [59]:
pages[36]

Document(page_content='Annexure IV  \n \nSection 38 - Assignment or Transfer of Insurance Policies \nAssignment or transfer of a Policy should be in accordance with Section 38 of the Insurance Act, 1938 as \namended by Insurance Laws (Amendment) Act, 2015 dated 23.03.2015. The extant provisions in this regard are \nas follows: \n \n(1) This Policy may be transferred/ assigned, wholly or in part, with or without consideration. \n(2) An Assignment may be effected in a Policy by an endorsement upon the Policy itself or by a separate instrument \nunder notice to the Insurer. \n(3) The instrument of assignment should indicate the fact of transfer or assignment and the reasons for the \nassignment or transfer, antecedents of the assignee and terms on which assignment is made. \n(4) The assignment must be signed by the transferor or assignor or duly authorized agent and attested by at least one \nwitness. \n(5) The transfer or assignment shall not be operative as against an insurer until a no

In [61]:
pages[37]

Document(page_content='liabilities and equities to which the transferor or assignor was subject to at the date of transfer or assignment and \nb. may institute any proceedings in relation to the Policy c. obtain loan under the Policy or surrender the Policy \nwithout obtaining the consent of the transferor or assignor or making him a party to the proceedings. \n(15) Any rights and remedies of an assignee or transferee of a life insurance Policy under an assignment or transfer \neffected before commencement of the Insurance Laws (Amendment) Act, 2015 shall not be affected by this \nsection.', metadata={'source': '/content/drive/MyDrive/Gen AI Course/RAG_For_HDFC_Policy/data/HDFC-Life-Easy-Health-101N110V03-Policy-Bond-Single-Pay.pdf', 'page': 31})
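
The chunk at index 37 reports `'page': 31` in its metadata, which suggests that the list holds text chunks rather than one entry per PDF page: `load_and_split()` passes the loaded pages through a text splitter. The sketch below shows the general idea with a simplified fixed-size splitter with overlap. It is not LangChain's recursive splitting algorithm, and `split_with_overlap`, `chunk_size`, and `overlap` are illustrative names.

```python
def split_with_overlap(text, chunk_size, overlap):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, overlap=3)
# chunks == ["abcdefghij", "hijklmnopq", "opqrstuvwx", "vwxyz"]
```

Because one long page can yield several chunks, the chunk index and the `page` value in the metadata drift apart as the document goes on.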

## Generate Embedding Function

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
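
An embedding function maps text to a fixed-length vector of floats. The toy sketch below uses a hashed bag of words instead of a neural model, so it only illustrates the interface (text in, vector out) and says nothing about what `models/embedding-001` actually computes. `toy_embed` and `dim` are hypothetical names.

```python
import hashlib
import math


def toy_embed(text, dim=16):
    """Map text to a unit-length vector of `dim` floats via hashed word counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # Normalise so vectors are comparable regardless of text length.
    return [x / norm for x in vec] if norm else vec


v1 = toy_embed("daily hospital cash benefit")
v2 = toy_embed("Daily Hospital Cash Benefit")
```

Both calls return the same 16-dimensional vector because the toy function lowercases its input. A real embedding model places texts with similar meaning close together, which this sketch does not attempt.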

## Store Embeddings in `VectorDB`

In [None]:
vectorstore = Chroma.from_documents(documents=pages, embedding=embeddings)

## Semantic Search

In [None]:
# Build a retriever that returns the 3 most similar chunks of the policy PDF.
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
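
Conceptually, a retriever with `k=3` embeds the query, scores it against every stored chunk vector, and keeps the three best matches. The sketch below does this with cosine similarity over made-up 2-D vectors. It is an illustration only: Chroma uses an approximate index, and its default distance metric may differ from cosine. `cosine_similarity` and `top_k` are illustrative names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 2-D vectors standing in for chunk embeddings.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_vec = [1.0, 0.0]
nearest = top_k(query_vec, doc_vecs, k=3)
# nearest == [0, 1, 3]
```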

In [None]:
question = "what are Daily Hospital Cash Benefit Option"
# `invoke` replaces the deprecated `get_relevant_documents` in recent LangChain releases.
docs = retriever.invoke(question)

In [None]:
len(docs)

3

In [None]:
docs[0]

Document(page_content='Illness or Injury, correction of deformities and defects, diagnosis and cure of diseases, relief of suffering or \nprolongation of life, performed in a Hospital or Day Care Centre by a Medical Practitioner; \n(32) Sum Insured  - Sum Insured is the face value of the Policy contracted between you and us. All the \nmorbidity benefits applicable under the product have been expressed as a proportion of this amount. \n(33) Surrender  - means complete withdrawal/ termination of the entire Policy;  \nPart C \n \n \n1. Benefit Description \nThis product offers the Life Assured an option to choose any 1, 2 or all 3 of the following benefit option(s): \n• Daily Hospital Cash Benefit Option (DHCB) \n• Surgical Benefit Option (SB) \n• Critical Illness Benefit Option (CIB) \n \nThus, product offers 7 Plan Options as mentioned below:', metadata={'page': 8, 'source': '/content/drive/MyDrive/Gen AI Course/RAG_For_HDFC_Policy/data/HDFC-Life-Easy-Health-101N110V03-Policy-Bond-Singl
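
Raw `Document` output is hard to scan. As a sketch, a small helper can print one line per retrieved chunk with its page number and the start of its text. `FakeDocument` is a stand-in exposing the same two attributes seen above (`page_content` and `metadata`), and `preview` and `width` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview(docs, width=60):
    """Return one summary line per document: page number and leading text."""
    lines = []
    for i, doc in enumerate(docs):
        page = doc.metadata.get("page", "?")
        # Collapse newlines and runs of spaces left over from PDF extraction.
        text = " ".join(doc.page_content.split())
        if len(text) > width:
            text = text[:width] + "..."
        lines.append(f"[{i}] page {page}: {text}")
    return lines


# Sample data for the demo; the second document deliberately has no page metadata.
sample_docs = [
    FakeDocument("Part C \n \n1. Benefit Description", {"page": 8}),
    FakeDocument("Surrender  - means complete withdrawal"),
]
summary = preview(sample_docs)
```

With the real retriever results, `print("\n".join(preview(docs)))` would give a quick overview of which pages the retrieved chunks came from.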

## Generative Search

* Prompt hub [reference](https://docs.smith.langchain.com/hub/quickstart)
* Output Parser [reference](https://python.langchain.com/docs/modules/model_io/output_parsers/)

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

prompt = hub.pull("rlm/rag-prompt")

llm = ChatGoogleGenerativeAI(model="gemini-pro")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
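
The `|` operator composes steps so that each step's output becomes the next step's input. Roughly, the chain behaves like the plain-Python sketch below. Everything in it is a stand-in: `fake_retriever`, `build_prompt`, `fake_llm`, and `parse_output` are hypothetical names, and the template is not the actual `rlm/rag-prompt` text.

```python
def format_docs(texts):
    """Join retrieved chunk texts with blank lines between them."""
    return "\n\n".join(texts)


def fake_retriever(question):
    """Stand-in retriever: returns fixed chunk texts regardless of the question."""
    return ["Chunk one.", "Chunk two."]


def build_prompt(inputs):
    """Stand-in prompt template filled from a dict with 'context' and 'question'."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt_text):
    """Stand-in model: reports the prompt length instead of generating text."""
    return {"content": f"(model saw {len(prompt_text)} characters)"}


def parse_output(message):
    """Stand-in for an output parser: extract the text from the model message."""
    return message["content"]


def run_chain(question):
    # Step 1: the dict step feeds the same question to both branches.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # Steps 2-4: prompt, model, parser, each consuming the previous output.
    return parse_output(fake_llm(build_prompt(inputs)))


prompt_text = build_prompt({"context": format_docs(fake_retriever("q")), "question": "q"})
answer = run_chain("q")
```

The dict at the head of the chain is what lets a single input string fill two prompt variables: one branch retrieves and formats context, while the other passes the question through unchanged.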

In [None]:
rag_chain.invoke(question)

'Daily Hospital Cash Benefit Option (DHCB) is a benefit option that provides a daily cash benefit to the life assured in the event of hospitalization due to injury, sickness, or disease. The benefit is payable for a maximum period of 20 days in a policy year for non-ICU rooms and 10 days in a policy year for ICU rooms, subject to a maximum limit of 60 days and 30 days during the entire policy term, respectively. The benefit is payable after the completion of each continuous hospitalization for more than 24 hours as a result of injury, sickness, or disease.'