In [1]:
import dotenv
dotenv.load_dotenv('../.env')

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import MetadataMode

from projectgurukul.readers import CSVReader, RamayanaCSVReader, MahabharataCSVReader
import pandas as pd
import re

# Gita

In [None]:
def preprocess(row):
    row['chapter'], row['verse'] = row.verse_number.split(', ')
    return row


reader = CSVReader(
    text_columns=['verse', 'verse_in_sanskrit', 'translation_in_english', 'meaning_in_english'],
    metadata_columns=['chapter'],
    preprocess=preprocess,
)

documents = reader.load_data('../data/gita/data/bhagavad_gita.csv', extra_info={'source': 'Bhagavad Gita'})

In [None]:
len(documents)

701

In [None]:
documents[0].metadata_template = "> {key}: {value}"
print(documents[0].get_content(metadata_mode=MetadataMode.ALL))

> chapter: Chapter 1
> source: Bhagavad Gita

Verse 1

धृतराष्ट्र उवाच |धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः |मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ||1||

Dhritarashtra said: O Sanjay, after gathering on the holy field of Kurukshetra, and desiring to fight, what did my sons and the sons of Pandu do?

The two armies had gathered on the battlefield of Kurukshetra, well prepared to fight a war that was inevitable. Still, in this verse, King Dhritarashtra asked Sanjay, what his sons and his brother Pandu’s sons were doing on the battlefield? It was apparent that they would fight, then why did he ask such a question?The blind King Dhritarashtra’s fondness for his own sons had clouded his spiritual wisdom and deviated him from the path of virtue. He had usurped the kingdom of Hastinapur from the rightful heirs; the Pandavas, sons of his brother Pandu. Feeling guilty of the injustice he had done towards his nephews, his conscience worried him about the outcome of this battle.The words dhar

In [None]:
documents[1].get_metadata_str()

'chapter: Chapter 1\nsource: Bhagavad Gita'

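The `preprocess` hook above assumes that every `verse_number` value splits into exactly two parts on `', '`; a value in any other shape would raise a `ValueError` during tuple unpacking. A minimal, more defensive sketch is shown below. The helper name `split_verse_number` and the sample value `'Chapter 1, Verse 1'` are assumptions inferred from the printed metadata and are not part of `projectgurukul`.

```python
def split_verse_number(verse_number: str) -> tuple[str, str]:
    """Split a label such as 'Chapter 1, Verse 1' into (chapter, verse)."""
    # partition() splits on the first comma only and reports whether it was found.
    chapter, separator, verse = verse_number.partition(",")
    if not separator or not verse.strip():
        # Fail with a descriptive message instead of an unpacking error.
        raise ValueError(f"unexpected verse_number format: {verse_number!r}")
    return chapter.strip(), verse.strip()


# Hypothetical sample value; the real CSV column may differ.
print(split_verse_number("Chapter 1, Verse 1"))
```

Inside `preprocess`, the two returned values could be assigned to `row['chapter']` and `row['verse']` exactly as the original tuple unpacking does.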
In [None]:
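# WikipediaReader is not imported in the first cell.
# Requires the llama-index-readers-wikipedia package.
from llama_index.readers.wikipedia import WikipediaReader

wikireader = WikipediaReader()
wikidocs = wikireader.load_data(['ayodhya ram temple'])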


In [None]:
print(wikidocs[0].get_content()[:1500])

The Ram Mandir is a Hindu temple that is under construction in Ayodhya, Uttar Pradesh, India. It is located at the site of Ram Janmabhoomi, the hypothesized birthplace of Rama, a principal deity of Hinduism. The site is the former location of the Babri Masjid which was built after the demolition an existing non-Islamic structure. The worship of Hindu god Ram and Sita at the disputed site started when their idols were installed in 1949. In 2019, the Supreme Court of India delivered the verdict to give the disputed land to Hindus for a temple of Ram, while Muslims would be given land elsewhere to construct a mosque. The court referenced a report from the Archaeological Survey of India (ASI) as evidence suggesting the presence of a structure beneath the demolished Babri Masjid, that was found to be non-Islamic.The bhumi pujan (transl. ground breaking ceremony) for the commencement of the construction of Ram Mandir was performed on 5 August 2020, by Prime Minister Narendra Modi. The temple

# Ramayana

In [None]:
df = pd.read_csv("../data/ramayana/data/yuddhakanda.csv").dropna(how = 'all')
df.tail()

Unnamed: 0,content,explanation
5204,एवमेतत्पुरावृत्तमाख्यानंभद्रमस्तुवः ।।6.131.12...,This way Ramayana is the ancient narration. Re...
5205,देवाश्चसर्वेतुष्यन्तिग्रहणाच्छ्रवणात्तथा ।।6.1...,"By grasping and listening to Ramayana, all god..."
5206,भक्त्यारामस्ययेचेमांसंसितामृषिणाकृताम् ।।6.131...,Residence in heaven is assured for those who t...
5207,कुटुम्बवृद्धिंधनधान्यवृद्धिंस्त्रियश्चमुख्यास्...,Listening to this great auspicious epic will r...
5208,आयुष्यमारोग्यकरंयशस्यंसौभ्रातृकंबुद्धिकरंशुभं ...,If the narrative of Ramayana is regularly hear...


In [None]:
docs = RamayanaCSVReader().load_data("../data/ramayana/data/balakanda.csv")

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
 73 74 75 76 77]


In [None]:
print(docs[-2].get_content(MetadataMode.LLM))

sarga: 76
summary: Rama bends the bow of Visnu--Parasurama returns to Mahendra mountain.

Hearing the words of the son of Jamadagni (Parasurama), Rama, the son of Dasaratha, avoiding further conversation out of respect for his father intercepted Parasurama saying:  ।।1.76.1।।

 "O Descendant of Bhrigu I have listened to the (marvellous) acts you have performed. O Brahman I commend you for discharging your duty in repaying the debt to your father. ।।1.76.2।।

You underrate me O Bhargava as though I am devoid of valour and incompetent to perform the duties of a Kshatriya. Now witness my energy and valour. ।।1.76.3।।

 Having spoken thus, the enraged Rama, gifted with quick vigour, seized the bow and arrow from Parasurama's hands. ।।1.76.4।।

Infuriated Rama bent the bow stretched it, fixed the arrow and addressed Parasurama, the son of Jamadagni: ।।1.76.5।।

"You are a brahmin, O Parasurama. You are also related to Viswamitra. Hence you are worthy of homage. I cannot, therefore, release 

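The verse markers in the output above (for example `।।1.76.1।।`, and `।।6.131.12...` in the Yuddhakanda table) appear to encode kanda, sarga, and shloka numbers. The sketch below pulls those triples out of a text with a regular expression. The function name `verse_ids` and the assumption that every marker follows this three-part pattern are mine; the actual `RamayanaCSVReader` may handle this differently.

```python
import re

# Matches markers such as ।।1.76.1।। (kanda.sarga.shloka between danda signs).
VERSE_MARKER = re.compile(r"[।॥]+\s*(\d+)\.(\d+)\.(\d+)\s*[।॥]+")


def verse_ids(text: str) -> list[tuple[int, int, int]]:
    """Return every (kanda, sarga, shloka) triple found in the text."""
    return [
        tuple(int(part) for part in match.groups())
        for match in VERSE_MARKER.finditer(text)
    ]


# Sample assembled from two translated verses printed above.
sample = (
    "Now witness my energy and valour. ।।1.76.3।। "
    "Having spoken thus, the enraged Rama seized the bow. ।।1.76.4।।"
)
print(verse_ids(sample))
```

Triples like these could serve as stable per-verse identifiers in document metadata, alongside the `sarga` field the reader already emits.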
# Mahabharata

In [38]:
import csv
import re
import pandas as pd
from text_to_num import text2num
import random
MAHABHARATA_OUT_FILE = "../data/mahabharata/data/mahabharata_parsed.csv"

In [7]:
# Use a distinct name so the CSVReader instance called `reader` is not shadowed.
with open('../data/mahabharata/raw/1-18 books combined.txt', 'r') as source_file:
    all_lines = source_file.readlines()


In [8]:

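# newline="" is what the csv module expects for files it writes.
with open(MAHABHARATA_OUT_FILE, "w", newline="") as file:
    wr = csv.writer(file)
    wr.writerow(["parva", "chapter", "chapter title", "content"])
    current_parva = 0
    current_chapter = 0
    # Defined up front so content before the first chapter cannot raise NameError.
    current_chapter_title = ""
    capture_current_chapter_title = False
    for line in all_lines:
        # Skip blank lines.
        if not line.strip():
            continue
        # The line right after a "Chapter ..." heading is the chapter title.
        if capture_current_chapter_title:
            current_chapter_title = line.strip()
            capture_current_chapter_title = False
            continue
        # A line such as "Adi Parva" starts a new parva ($ matches before the trailing newline).
        if re.match(r'^\w+ Parva$', line):
            current_parva = line.strip()
            continue
        # "Chapter Commentary" is body text, not a chapter heading.
        if line.startswith('Chapter') and line != 'Chapter Commentary\n':
            current_chapter = line.strip()
            capture_current_chapter_title = True
            continue
        # Anything else is a content paragraph of the current chapter.
        wr.writerow([current_parva, current_chapter, current_chapter_title, line.strip()])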

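The parsing cell above is a small state machine: it remembers the current parva, chapter, and chapter title, and emits one row per content paragraph. The sketch below restates that logic as a generator that can be tried on in-memory lines without touching the raw file. The function name `parse_lines` and the sample lines are illustrative assumptions, and unlike the cell above it matches headings against the stripped line.

```python
import re

PARVA_RE = re.compile(r"\w+ Parva")


def parse_lines(lines):
    """Yield (parva, chapter, chapter_title, paragraph) for each content line."""
    parva = chapter = title = ""
    expect_title = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if expect_title:
            # The line right after a chapter heading is the chapter title.
            title, expect_title = line, False
        elif PARVA_RE.fullmatch(line):
            parva = line
        elif line.startswith("Chapter") and line != "Chapter Commentary":
            chapter, expect_title = line, True
        else:
            yield parva, chapter, title, line


# Hypothetical excerpt shaped like the raw text file.
sample = [
    "Adi Parva\n",
    "Chapter One\n",
    "Maharaja Shantanu Marries the Celestial Ganga\n",
    "\n",
    "According to the historical records of this earth, there once lived a King.\n",
    "Chapter Commentary\n",
    "A hypothetical commentary paragraph.\n",
]
for record in parse_lines(sample):
    print(record)
```

As in the original cell, a `Chapter Commentary` line is treated as ordinary content, so it ends up as its own row under the current chapter.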

In [5]:

df = pd.read_csv(MAHABHARATA_OUT_FILE).dropna(how="all").fillna("")
# Number the parvas 1..N in order of first appearance.
print({parva: i for i, parva in enumerate(df['parva'].unique(), start=1)})
    

{'Adi Parva': 1, 'Sabha Parva': 2, 'Vana Parva': 3, 'Virata Parva': 4, 'Udyoga Parva': 5, 'Bhisma Parva': 6, 'Drona Parva': 7, 'Karna Parva': 8, 'Salya Parva': 9, 'Sauptika Parva': 10, 'Stree Parva': 11, 'Shanti Parva': 12, 'Anushasana Parva': 13, 'Ashvamedha Parva': 14, 'Ashramvasika Parva': 15, 'Mausala Parva': 16, 'Mahaprasthanika Parva': 17}


In [31]:
df.head(3)

Unnamed: 0,parva,chapter,chapter title,content
0,Adi Parva,Chapter One,Maharaja Shantanu Marries the Celestial Ganga,According to the historical records of this ea...
1,Adi Parva,Chapter One,Maharaja Shantanu Marries the Celestial Ganga,"Once when Maharaja Shantanu, that bull among m..."
2,Adi Parva,Chapter One,Maharaja Shantanu Marries the Celestial Ganga,The beautiful apsara (celestial maiden) then s...


In [32]:
df_grouped = df.groupby(['parva', 'chapter', 'chapter title'],as_index=True).agg(lambda lst: "\n\n".join(lst)).reset_index()
df_grouped.head(3)

Unnamed: 0,parva,chapter,chapter title,content
0,Adi Parva,Chapter Eight,The Preceptor Drona,"Seeing the princes enter adolescence, Maharaja..."
1,Adi Parva,Chapter Eighteen,Arjuna Goes on Pilgrimage,After leaving Indraprastha in the dress of a m...
2,Adi Parva,Chapter Eleven,Tuition for Drona,Drona saw that all his students were now adept...

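Because `groupby` sorts its keys, the grouped chapters above come out in alphabetical order (`Chapter Eight`, `Chapter Eighteen`, `Chapter Eleven`) rather than reading order. One way to restore reading order is to convert each chapter label to a number and sort on that. The notebook imports `text2num` but does not use it in the cells shown, so the sketch below uses a small standard-library stand-in instead. The function name `chapter_number` and the 1 to 99 range are assumptions.

```python
# Number words needed for labels from "Chapter One" to "Chapter Ninety-Nine".
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def chapter_number(label: str) -> int:
    """Convert a label such as 'Chapter Twenty-One' to 21 (supports 1 to 99)."""
    words = label.lower().replace("chapter", "", 1).replace("-", " ").split()
    if not words:
        raise ValueError(f"no number words in {label!r}")
    total = 0
    for word in words:
        if word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        else:
            raise ValueError(f"unsupported number word {word!r} in {label!r}")
    return total


labels = ["Chapter Eight", "Chapter Eighteen", "Chapter Eleven", "Chapter One"]
print(sorted(labels, key=chapter_number))
```

The same key function could populate a numeric column on `df_grouped` so that it can be sorted by parva and chapter number together.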

In [34]:
# Number the parvas 1..N in order of first appearance.
parva_id_map = {parva: i for i, parva in enumerate(df['parva'].unique(), start=1)}
print(parva_id_map)

{'Adi Parva': 1, 'Sabha Parva': 2, 'Vana Parva': 3, 'Virata Parva': 4, 'Udyoga Parva': 5, 'Bhisma Parva': 6, 'Drona Parva': 7, 'Karna Parva': 8, 'Salya Parva': 9, 'Sauptika Parva': 10, 'Stree Parva': 11, 'Shanti Parva': 12, 'Anushasana Parva': 13, 'Ashvamedha Parva': 14, 'Ashramvasika Parva': 15, 'Mausala Parva': 16, 'Mahaprasthanika Parva': 17}


In [44]:

documents = []
for i, row in df.iterrows():
    parva = row['parva']
    parva_id = parva_id_map[parva]
    chapter_id = row['chapter']
    chapter_title = row['chapter title']
    content = row['content']
    page_link = 118367 + i

    metadata = {
        'parva_id': parva_id,
        'parva': parva,
        'chapter_id': chapter_id,
        'chapter_title': chapter_title,
        'page_link': page_link,
    }

    documents.append((metadata, content))
print(documents[0])

({'parva_id': 1, 'parva': 'Adi Parva', 'chapter_id': 'Chapter One', 'chapter_title': 'Maharaja Shantanu Marries the Celestial Ganga', 'page_link': 118367}, 'According to the historical records of this earth, there once lived a King named Maharaja Shantanu, the son of Pratipa, who took his birth in the solar dynasty and was considered naradeva, the manifest representative of the Supreme Lord on earth. His fame and rule extended to all parts of the world. The qualities of self-control, liberality, forgiveness, intelligence, modesty, patience and power always resided this exalted emperor. His neck was marked with three lines like a conchshell, and his shoulders were broad. In prowess He resembled a maddened elephant. Above all these qualities, he was a devoted servant of Lord Vishnu, and therefore he was given the title, "King of kings".')


In [45]:

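# random.randint includes both endpoints, so randint(0, len(documents)) can index
# one past the end; random.choice always picks a valid element.
metadata = random.choice(documents)[0]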
print(metadata)

{'parva_id': 2, 'parva': 'Sabha Parva', 'chapter_id': 'Chapter Nine', 'chapter_title': 'The Gambling Match', 'page_link': 119076}


In [48]:
print(f"https://en.wikisource.org/wiki/The_Mahabharata/Book_{parva_id}:_{parva.replace(' ','_')}")

https://en.wikisource.org/wiki/The_Mahabharata/Book_17:_Mahaprasthanika_Parva

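The cell above reads `parva_id` and `parva` from variables left over from the document-building loop, so it prints the URL for the last row processed rather than for the randomly chosen `metadata`. A minimal sketch that takes the metadata dictionary explicitly is shown below. The helper name `wikisource_url` is hypothetical, the URL pattern is copied from the cell above, and whether each generated page actually exists on Wikisource is not checked here.

```python
def wikisource_url(metadata: dict) -> str:
    """Build the Wikisource book URL from a document's metadata."""
    parva_slug = metadata["parva"].replace(" ", "_")
    return (
        "https://en.wikisource.org/wiki/The_Mahabharata/"
        f"Book_{metadata['parva_id']}:_{parva_slug}"
    )


# Metadata shaped like the dictionaries built in the loop above.
sample_metadata = {"parva_id": 2, "parva": "Sabha Parva"}
print(wikisource_url(sample_metadata))
```

Passing the dictionary in keeps the link tied to the document being inspected instead of to whatever the loop variables last held.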

In [10]:
# Import here so this cell also runs in a fresh kernel.
from projectgurukul.readers import MahabharataCSVReader

documents = MahabharataCSVReader().load_data(MAHABHARATA_OUT_FILE)
documents[0]

In [9]:
documents[0].metadata