## Simple RAG Pipeline for Text Editor

In [1]:
## Import getpass to prompt for any environment variables that are not already set

import getpass
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
## Import text loader 
from langchain_community.document_loaders import TextLoader

# Raw string keeps the Windows backslash literal; explicit UTF-8 avoids mojibake in the loaded text
loader = TextLoader(r'docs\Maharana_PratapJi.txt', encoding='utf-8')

text = loader.load()

text

[Document(metadata={'source': 'docs\\Maharana_PratapJi.txt'}, page_content="Source : https://www.geeksforgeeks.org/maharana-pratap-biography/\n\n\nMaharana Pratap was a brave Rajput king in India who stood up against the powerful Mughal Empire. He is famous for his courage and refusal to surrender, making him a symbol of resistance and pride for many.\n\nMaharana Pratap was a Hindu Rajput king who ruled the Mewar region for 35 years. He was the 13th Maharana of Mewar and a member of the Sisodia dynasty.\n\nIn this article, we have covered the complete Maharana Pratap Biography, Why he is Famous, His Family tree, How did he died. His battles and many more in Detail.\n\nLet’s dive right in.\n\nMaharana Pratap Life Overview\nThe following is the Life Overview of Maharana Pratap:\n\nAspect\t-->  Details\nFull Name\t--> Maharana Pratap Singh\nBirth Date\t--> May 9, 1540\nBirthplace\t--> Kumbhalgarh, Mewar (present-day Rajasthan)\nReign\t-->   1572–1597(25 years)\nDynasty\t--> Sisodia Ra

In [3]:
## RecursiveCharacterTextSplitter splits the text into small chunks, because the LLM has a limited context window.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunk_text = text_splitter.split_documents(text)
chunk_text[:5]

[Document(metadata={'source': 'docs\\Maharana_PratapJi.txt'}, page_content='Source : https://www.geeksforgeeks.org/maharana-pratap-biography/\n\n\nMaharana Pratap was a brave Rajput king in India who stood up against the powerful Mughal Empire. He is famous for his courage and refusal to surrender, making him a symbol of resistance and pride for many.\n\nMaharana Pratap was a Hindu Rajput king who ruled the Mewar region for 35 years. He was the 13th Maharana of Mewar and a member of the Sisodia dynasty.'),
 Document(metadata={'source': 'docs\\Maharana_PratapJi.txt'}, page_content='In this article, we have covered the complete Maharana Pratap Biography, Why he is Famous, His Family tree, How did he died. His battles and many more in Detail.\n\nLet’s dive right in.\n\nMaharana Pratap Life Overview\nThe following is the Life Overview of Maharana Pratap:'),
 Document(metadata={'source': 'docs\\Maharana_PratapJi.txt'}, page_content='Aspect\t-->  Details\nFull Name\t--> Maharana Pratap Sin

In [4]:
len(chunk_text)

33
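
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window over characters. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks, so its chunks are usually shorter than `chunk_size`). The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Tiny example: windows of 5 characters, 2 of which repeat in the next window.
print(sliding_chunks("abcdefghij", chunk_size=5, chunk_overlap=2))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap, at the cost of storing some text twice.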

In [5]:
## We now have the chunks. To add them to a vector database we first need an embedding model; then we can insert the records into ChromaDB.

if not os.getenv("COHERE_API_KEY"):
    os.environ["COHERE_API_KEY"] = getpass.getpass()

In [6]:
from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(
    model="embed-english-v3.0",
)

In [7]:
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="FirstCromaCollection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

In [8]:
from uuid import uuid4

uuids = [str(uuid4()) for _ in range(len(chunk_text))]

vector_store.add_documents(documents=chunk_text, ids=uuids)



['ad20c321-8042-41d9-8328-da4817f60a22',
 '3e2a0deb-5658-409d-860b-0b59d6c37fa9',
 '3f68bd2b-1bd1-4d27-bec3-b4d2fab24d0c',
 '06e5b006-c409-4fde-99fa-035fe6827e10',
 '3a981f05-3f79-4627-a931-4398d9a1e3b5',
 '8c725279-866d-4b3f-8c45-afb7af21d2c4',
 'b2ca4bef-b360-42e0-8552-f9db5775c854',
 '47308312-9171-4ebc-a566-203a3f0e2d83',
 '537e269f-1597-4fc5-8ef3-f61a313a788c',
 '4d8977ac-620c-4d56-bfbc-15c9c520690b',
 '05124a24-c366-4371-a125-e8fbd4f3cdc9',
 '4bee9a7b-27f4-425d-af55-872bf815bb63',
 '4196fb37-6aa7-44a2-afa9-6954e032f2c8',
 'ccea8ae8-b7d8-4079-a671-a545104ff1b5',
 '2c100c2c-b12c-46fc-9b5e-081072875b91',
 'ecaa935a-8825-4641-a6a7-195ae73e3657',
 'a62ecc45-a76b-46a5-95e9-1388fd7a5602',
 'c273b5c1-62ca-40c0-ad68-1d9f41036cc1',
 'f7271758-b81f-4866-8ffe-c91bb1728a69',
 '53114caa-fa10-49da-8d63-77ee6301fcbe',
 '3f3e562c-96db-447a-9b13-751e94fbde97',
 '776a333c-8134-41ec-8800-63e4ec00a41a',
 'c2a3cee9-5905-45ad-8ee5-8db8dc18653c',
 'cc7ad6ac-c04b-447a-9596-1add8ce6ad98',
 '0baf4457-1054-
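
Because `uuid4()` produces new random IDs on every run, re-running the cell above against the persisted collection would likely store the same chunks a second time. One possible alternative, sketched below under stated assumptions, is to derive each ID from the chunk itself with `uuid5`, so the same chunk always maps to the same ID. The helper name `stable_id` is hypothetical, and whether the store overwrites or rejects an existing ID depends on the installed `langchain_chroma` version, so verify that behaviour before relying on it.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(source: str, content: str) -> str:
    """Return a deterministic UUID string for a chunk, based on its source and text."""
    return str(uuid5(NAMESPACE_URL, f"{source}\n{content}"))


# The same input always yields the same ID; different text yields a different ID.
first = stable_id("docs/example.txt", "Battle of Haldighati")
second = stable_id("docs/example.txt", "Battle of Haldighati")
other = stable_id("docs/example.txt", "Death of Maharana Pratap")
print(first == second, first == other)
```

In the notebook this could be used as `ids = [stable_id(d.metadata["source"], d.page_content) for d in chunk_text]` in place of the random `uuids` list.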

In [9]:
results = vector_store.similarity_search(
    "What was the last war of Maharana Pratap?",
    k=2
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* While other Rajput clans accepted the supremacy of the Mughal Empire, he was the only person who stood alone against the Akbar, for this Maharana Pratap was so powerful.

Amar Singh, Shakti Singh, Kunwar Vijay Singh, Kunwar Pratap Singh were the Maharana Pratap’s son.

Battle of Haldighati was the last war of Maharana Pratap.

Maharana Pratap died in Chavand, Mewar, which is located in present-day Rajasthan, India. [{'source': 'docs\\Maharana_PratapJi.txt'}]
* Death of Maharana Pratap
Rana Pratap faced many challenges after the Battle of Haldighati. He was in exile from his kingdom and lived in the rugged Aravalli Hills. He continued his struggle against the Mughals and refused to surrender his principles. Maharana Pratap died on 19th January 1597 at the age of 56. His eldest son Amar Singh I beacame the new king. [{'source': 'docs\\Maharana_PratapJi.txt'}]
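
Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below illustrates that idea with hand-made three-dimensional vectors and cosine similarity. The vectors, labels, and function names are invented for illustration; real embeddings have far more dimensions, and the distance metric and index the vector store actually uses may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the labels of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]


# Toy "embeddings" standing in for three chunks.
toy_vectors = {
    "battle": [0.9, 0.1, 0.0],
    "death": [0.7, 0.6, 0.1],
    "family": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
print(top_k(toy_query, toy_vectors, k=2))
```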


In [10]:
## Example: the same retrieval using an in-memory vector store exposed as a retriever

from langchain_core.vectorstores import InMemoryVectorStore


vectorstore = InMemoryVectorStore.from_documents(
    chunk_text,
    embedding=embeddings,
)

# Use the vectorstore as a retriever
retriever = vectorstore.as_retriever()

# Retrieve the most similar text
retrieved_documents = retriever.invoke("What was the last war of Maharana Pratap?")

# show the retrieved document's content
retrieved_documents[0].page_content

'While other Rajput clans accepted the supremacy of the Mughal Empire, he was the only person who stood alone against the Akbar, for this Maharana Pratap was so powerful.\n\nAmar Singh, Shakti Singh, Kunwar Vijay Singh, Kunwar Pratap Singh were the Maharana Pratap’s son.\n\nBattle of Haldighati was the last war of Maharana Pratap.\n\nMaharana Pratap died in Chavand, Mewar, which is located in present-day Rajasthan, India.'
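
The notebook stops at retrieval. As a rough sketch of what the next step of a RAG pipeline might look like, the retrieved chunks can be placed into a prompt together with the question before it is sent to an LLM. The function `build_prompt` and the prompt wording are assumptions made for illustration, not part of the notebook or of LangChain, and the two context strings are sentences copied from the retrieval output above.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt string."""
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Stand-ins for the page_content of the retrieved documents.
example_contexts = [
    "Battle of Haldighati was the last war of Maharana Pratap.",
    "Maharana Pratap died on 19th January 1597 at the age of 56.",
]
prompt = build_prompt("What was the last war of Maharana Pratap?", example_contexts)
print(prompt)
```

In the notebook, `example_contexts` would be replaced by `[d.page_content for d in retrieved_documents]`.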