<a href="https://colab.research.google.com/github/mohitha15/RAG_Application_Agentic_AI/blob/main/agenticAI_Rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q langchain langchain-community langchain-text-splitters langchain-google-genai langchain-chroma pypdf python-dotenv



In [2]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_chroma import Chroma  # installed above; replaces the deprecated langchain_community Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
import os
from dotenv import load_dotenv

# Load variables from a .env file if present; returns False when nothing was loaded
load_dotenv()


False

In [3]:
loader = PyPDFLoader("/content/file.pdf")
docs = loader.load()
print(docs[0].metadata)

{'producer': 'Microsoft® Word 2016', 'creator': 'Microsoft® Word 2016', 'creationdate': '2025-08-26T08:38:45+05:30', 'author': 'Windows User', 'moddate': '2025-08-26T08:38:45+05:30', 'source': '/content/file.pdf', 'total_pages': 7, 'page': 0, 'page_label': '1'}


In [4]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(docs)
print(len(chunks))
print(chunks[0].page_content)


12
1. Geography and Demographics 
 
India is located in South Asia, sharing borders with Pakistan, China, Nepal, Bhutan, Bangladesh, and 
Myanmar, while surrounded by the Indian Ocean, Arabian Sea, and Bay of Bengal. With an area of about 
3.28 million square kilometers, it is the seventh-largest country in the world. 
 
Climate: Ranges from tropical monsoon in the south to temperate and alpine in the north. 
 
Rivers: Ganga, Yamuna, Brahmaputra, Godavari, Narmada, and Krishna form the backbone of Indian 
agriculture and culture. 
 
Mountains: The Himalayas in the north act as natural barriers and are home to some of the world’s 
tallest peaks. 
 
India has a population of more than 1.4 billion people (as of 2023–25), making it the most populous 
country on Earth. This population is incredibly diverse, with thousands of ethnic groups, over 22 
officially recognized languages, and hundreds of dialects. 
 
2. Historical Background 
Ancient India
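`RecursiveCharacterTextSplitter` tries a list of separators (by default paragraph breaks, then line breaks, then spaces, then single characters) so chunks break at natural boundaries, while keeping each chunk under `chunk_size` characters and repeating up to `chunk_overlap` characters between neighbors. The pure-Python sketch below deliberately ignores separators and only shows how those two numbers interact: each window advances by `chunk_size - chunk_overlap` characters, so text near a boundary appears in both neighboring chunks. The function name `sliding_window_chunks` is hypothetical and not part of LangChain.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 5  # 50 characters of toy text
parts = sliding_window_chunks(sample, chunk_size=20, chunk_overlap=5)
print([len(p) for p in parts])  # → [20, 20, 20]
```

With the notebook's settings (1000 and 200) each window would advance 800 characters; the real splitter's boundaries differ because it prefers to cut at separators.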


In [5]:
import os
from google.colab import userdata

# Read the key from Colab's Secrets panel and expose it as an environment variable
os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

# Gemini embedding model used to vectorize each chunk
embeddings = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-001",
    google_api_key=os.environ["GOOGLE_API_KEY"],
)
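`load_dotenv()` printed `False` in cell [2] because it loaded no variables, most likely because no `.env` file exists in the Colab runtime, so the key comes from Colab's Secrets panel via `userdata.get` instead. When running outside Colab, a small guard can make a missing key fail early with a readable message. This is a minimal sketch; `require_env` and the `DEMO_API_KEY` variable are hypothetical names used only for illustration.

```python
import os


def require_env(name: str = "GOOGLE_API_KEY") -> str:
    """Return an environment variable's value, or raise a clear error if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it as a Colab secret or to a .env file")
    return value


# Demo with a throwaway variable so no real key is needed
os.environ["DEMO_API_KEY"] = "not-a-real-key"
print(require_env("DEMO_API_KEY"))
```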

In [6]:
vector_store = Chroma.from_documents(documents=chunks, embedding=embeddings)


In [7]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})
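With `search_type="similarity"` and `k=5`, the retriever embeds the question and returns the 5 stored chunks whose vectors are closest to it; since the PDF produced only 12 chunks, that is close to half the document per question. The stdlib sketch below shows the ranking idea with toy 3-dimensional vectors and cosine similarity. It is an illustration, not Chroma's implementation: Chroma's distance function is configurable per collection (its default is L2 distance), so real scores differ. The names `cosine`, `top_k`, and `store` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query (fewer if the store is small)."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


# Toy "embeddings" standing in for real Gemini vectors
store = {
    "rivers": [0.9, 0.1, 0.0],
    "mountains": [0.2, 0.9, 0.1],
    "history": [0.0, 0.2, 0.95],
}
print(top_k([1.0, 0.0, 0.0], store, k=2))  # → ['rivers', 'mountains']
```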

In [8]:
chat_model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.1,
    max_output_tokens=500
)
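`temperature=0.1` keeps generation close to deterministic, which suits answering strictly from retrieved context, and `max_output_tokens=500` caps the length of each answer. Conceptually, temperature divides the model's raw token scores (logits) before the softmax, so values below 1 concentrate probability on the top candidate. The sketch below uses made-up logits for three candidate tokens; it illustrates the standard formula and is not a call to the Gemini API. `softmax_with_temperature` is a hypothetical helper name.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print([round(p, 3) for p in softmax_with_temperature(logits, 1.0)])
print([round(p, 3) for p in softmax_with_temperature(logits, 0.1)])
```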

In [9]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided document context.
      If the context is insufficient, just say you don't know.

      Context:
      {context}

      Question: {question}
    """,
    input_variables=['context', 'question']
)


In [10]:
question = 'Which major rivers of India are considered the backbone of its agriculture and culture?'
retrieved_docs = retriever.invoke(question)
context = "\n\n".join([doc.page_content for doc in retrieved_docs])

In [11]:
final_prompt = prompt.invoke({
    "context": context,
    "question": question
})
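`prompt.invoke(...)` fills the `{context}` and `{question}` placeholders and returns a `StringPromptValue`, which the chat model accepts directly. Because `PromptTemplate` uses f-string-style templates by default, the substitution behaves like Python's `str.format`, as this stdlib-only sketch with a shortened, hypothetical template shows. One practical consequence: literal braces in a template must be doubled (`{{` and `}}`), or they are read as placeholders.

```python
# Shortened, hypothetical template; the notebook's real template has more instructions
template = "Answer ONLY from the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Toy stand-ins for the page_content of two retrieved chunks
texts = [
    "Rivers: Ganga, Yamuna, Brahmaputra, Godavari, Narmada, and Krishna.",
    "Climate: Ranges from tropical monsoon in the south to temperate and alpine in the north.",
]

toy_context = "\n\n".join(texts)  # same joining rule as cell [10]
filled = template.format(context=toy_context, question="Which rivers are listed?")
print(filled)
```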


In [12]:
parser = StrOutputParser()

# Generate the answer (an AIMessage)
response = chat_model.invoke(final_prompt)

# Extract the plain-text answer from the message
parser.invoke(response)

'The Ganga, Yamuna, Brahmaputra, Godavari, Narmada, and Krishna rivers are considered the backbone of Indian agriculture and culture.'

**Let's combine these steps into a chain**

In [13]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

In [14]:
def format_docs(retrieved_docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

In [15]:
parallel_chain = RunnableParallel(
    context=retriever | RunnableLambda(format_docs),  # question -> matching chunks -> one string
    question=RunnablePassthrough(),                   # forward the question unchanged
)
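`RunnableParallel` passes the same input (the question string) to every branch and returns a dict keyed by branch name, which is exactly the `{context, question}` shape the prompt template expects. LangChain can run the branches concurrently; the stdlib sketch below runs them one after another and only illustrates the data flow. `run_parallel` and `fake_retriever` are hypothetical stand-ins, not LangChain APIs.

```python
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Feed the same input to every branch and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for retriever.invoke: returns two fixed chunk texts."""
    return ["Rivers: Ganga and Yamuna.", "Mountains: The Himalayas."]


branches = {
    # plays the role of retriever | RunnableLambda(format_docs)
    "context": lambda q: "\n\n".join(fake_retriever(q)),
    # plays the role of RunnablePassthrough(): the input goes through unchanged
    "question": lambda q: q,
}

result = run_parallel(branches, "Which rivers are listed?")
print(result)
```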

In [16]:
parallel_chain.invoke('Which major rivers of India are considered the backbone of its agriculture and culture?')

{'context': '1. Geography and Demographics \n \nIndia is located in South Asia, sharing borders with Pakistan, China, Nepal, Bhutan, Bangladesh, and \nMyanmar, while surrounded by the Indian Ocean, Arabian Sea, and Bay of Bengal. With an area of about \n3.28 million square kilometers, it is the seventh-largest country in the world. \n \nClimate: Ranges from tropical monsoon in the south to temperate and alpine in the north. \n \nRivers: Ganga, Yamuna, Brahmaputra, Godavari, Narmada, and Krishna form the backbone of Indian \nagriculture and culture. \n \nMountains: The Himalayas in the north act as natural barriers and are home to some of the world’s \ntallest peaks. \n \nIndia has a population of more than 1.4 billion people (as of 2023–25), making it the most populous \ncountry on Earth. This population is incredibly diverse, with thousands of ethnic groups, over 22 \nofficially recognized languages, and hundreds of dialects. \n \n2. Historical Background \nAncient India\n\nofficially

In [17]:
parser = StrOutputParser()

In [18]:
main_chain = parallel_chain | prompt | chat_model | parser
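The `|` operator in LangChain Expression Language composes runnables into a `RunnableSequence`: calling `invoke` runs each step and hands its output to the next. In `main_chain`, the dict from `parallel_chain` fills the prompt, the prompt value goes to the chat model, the model returns an `AIMessage`, and `StrOutputParser` turns that into a plain string. The sketch below imitates only the piping mechanics with a hypothetical `Step` class that overloads `__or__`; it is not LangChain's implementation, and its fake model simply upper-cases the prompt.

```python
from typing import Any, Callable


class Step:
    """Tiny stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Hypothetical stand-ins for the four stages of main_chain
gather = Step(lambda q: {"context": "Rivers: Ganga and Yamuna.", "question": q})
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_model = Step(lambda prompt: {"content": prompt.upper()})
to_text = Step(lambda message: message["content"])

chain = gather | fill_prompt | fake_model | to_text
print(chain.invoke("which rivers?"))
```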

In [19]:
main_chain.invoke('Which ancient civilization of India is known for its planned cities like Mohenjo-Daro and Harappa?')

'The Indus Valley Civilization (3300–1300 BCE) is known for its planned cities like Mohenjo-Daro and Harappa.'