
# Bhagavad Gita

### Introduction
Bhagavad-gītā is the most important part of the Mahābhārata, the great Sanskrit epic of ancient India. It is one of the books of Vedic literature and one of the most significant literary works in human history. Bhagavad-gītā contains a discussion between God (Lord Kṛṣṇa) and Arjuna, in which Kṛṣṇa teaches Arjuna about the most important questions of human life.

In this project, I perform an exploratory analysis of the data provided on [Kaggle](https://www.kaggle.com/schcsaba/bhagavadgita/data) and plot a few visualizations to make the patterns easy to see.

In [56]:
# Import modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud,STOPWORDS

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

In [57]:
%matplotlib inline

In [58]:
# Read the csv file into DataFrame
df = pd.read_csv('../input/bhagavad-gita.csv')
# df = pd.read_csv('bhagavad-gita.csv')

In [59]:
# Drop the first column
df.drop(df.columns[[0]], axis=1, inplace=True)

In [60]:
# Check first 5 rows of the DataFrame
df.head()

In [61]:
# Checking the data type of every column.
df.dtypes

### Exploratory Data Analysis
Let's compute some basic counts like:
* Number of chapters
* Number of verses in each chapter
* and total number of verses

In [62]:
# Describe the DataFrame to get some stats.
df.describe()

In [63]:
# Finding common pattern.

# Storing the title numbers to count the number
# of verses in a chapter
title_no = df['title'].astype(int)
list_counts = title_no.value_counts()

In `list_counts`, the *chapter* numbers are the index and the numbers of *verses* are the values of the Series.
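As a small illustration of that index/value layout, here is a sketch on made-up data. The `toy_chapters` Series is a hypothetical stand-in for `df['title']`, not the Kaggle file. Because `value_counts()` orders its result by frequency, `sort_index()` is added to read the counts in chapter order.

```python
import pandas as pd

# Hypothetical stand-in for df['title']: one entry per verse, holding its chapter number
toy_chapters = pd.Series([1, 1, 1, 2, 2, 3])

# value_counts() returns a Series: index = chapter number, value = number of verses
toy_counts = toy_chapters.value_counts()

# Sort by chapter number instead of by frequency
toy_counts_by_chapter = toy_counts.sort_index()
print(toy_counts_by_chapter)
```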

In [64]:
# Checking the type of the list_counts variable
type(list_counts)

In [65]:
# To find the total number of verses
list_counts.sum()

In [66]:
# Describing the Series to look at some stats
list_counts.describe()

###### In Bhagavad Gita (from the above analysis):
* It has a total of **18** chapters.
* There are a total of **700** verses in it.
* On average there are about **39** verses per chapter (700 / 18 ≈ 38.9).
* The shortest chapter has **20** verses.
* The longest chapter has **78** verses.

## Visualizing common patterns.

In [67]:
# Setting the dimensions of the figure
plt.figure(figsize=(15,5), frameon=False)
plt.tick_params(labelsize=11, length=6, width=2)

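# Passing the data to plot. The Series is passed as x explicitly because
# newer seaborn versions treat the first positional argument as `data`.
sns.countplot(x=list_counts)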
plt.xlabel("Number of verses", fontsize=18)
plt.ylabel("Counts (Chapter(s))", fontsize=18)

# Displaying the plot
plt.show()

There are two chapters with 20 verses each, another two with 28 verses each, and two more with 42 verses each. Every other chapter has a distinct number of verses.
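The same observation can be checked without reading it off the plot. The sketch below uses hypothetical counts (`toy_counts` is invented for illustration, not the real chapter data) and relies on `Series.duplicated(keep=False)`, which flags every value that occurs more than once.

```python
import pandas as pd

# Hypothetical verse counts per chapter (index = chapter, value = verses); not the real data
toy_counts = pd.Series({1: 20, 2: 35, 3: 20, 4: 28, 5: 28, 6: 41})

# Keep every chapter whose verse count also occurs in another chapter
shared = toy_counts[toy_counts.duplicated(keep=False)]

# Map each shared verse count to the chapters that have it
chapters_by_count = {
    int(count): group.index.tolist() for count, group in shared.groupby(shared)
}
print(chapters_by_count)
```

Applied to the notebook's own `list_counts`, the same two steps should list the chapters behind each repeated bar.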

## Number of verses in various chapters

In [68]:
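# Setting the plot style first, so it applies to the figure created below
sns.set(style="darkgrid")

# Setting plot dimensions
plt.figure(figsize=(15,5), frameon=False)
plt.tick_params(labelsize=11, colors='k', length=6, width=2)

# Passing the data to plot (as x, see the note on the previous plot)
sns.countplot(x=title_no, color='c')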
plt.xlabel("Chapter", fontsize=18)
plt.ylabel("Number of verses", fontsize=18)

# Displaying the plot
plt.show()

The highest number of verses is in the last chapter, i.e. chapter 18. The lowest count, 20 verses, is shared by two chapters, one of which is chapter 12.
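When looking up the longest and shortest chapters in code, ties deserve a little care. The sketch below, on hypothetical counts (`toy_counts` is invented for illustration), contrasts `idxmin()`, which reports only the first matching label, with a boolean mask that returns every chapter at the extreme.

```python
import pandas as pd

# Hypothetical verse counts per chapter (index = chapter, value = verses); not the real data
toy_counts = pd.Series({1: 20, 2: 35, 3: 20, 4: 78})

# idxmin() returns only the first label with the minimum value, so a tie is hidden
first_shortest = toy_counts.idxmin()

# Boolean masks return every chapter that reaches the extreme
longest = toy_counts[toy_counts == toy_counts.max()].index.tolist()
shortest = toy_counts[toy_counts == toy_counts.min()].index.tolist()

print(first_shortest, longest, shortest)
```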

## Grouping words

With a word cloud, we can look at the most frequently used words in the Gita.

In [69]:
stopwords = set(STOPWORDS)

In [70]:
# Collecting the words for wordcloud
data = df['verse_text']

In [71]:
# Inspecting the collected words
data.head()

In [None]:
fig = plt.figure(figsize=(20,10), facecolor='k')
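# Join all verses into one string; str(data) would only give the truncated Series preview
text = ' '.join(data.astype(str))
wordcloud = WordCloud(width=1300, height=600, stopwords=stopwords).generate(text)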
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()

The visualization above shows some of the most used words in the Gita; the size of each word indicates how often it appears relative to the others.
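Text size gives only a rough impression, so it can help to look at the raw counts as well. The following is a minimal sketch using the standard library's `collections.Counter`. The two sentences in `toy_verses` are made up for illustration (they are not quotations from the dataset), and `toy_stopwords` is a tiny hypothetical list rather than wordcloud's `STOPWORDS`.

```python
import re
from collections import Counter

# Made-up sentences standing in for df['verse_text']; not taken from the dataset
toy_verses = [
    "The soul is never born and the soul never dies.",
    "The wise see the same soul in all beings.",
]

# Tiny illustrative stop-word list; the notebook itself uses wordcloud's STOPWORDS
toy_stopwords = {"the", "is", "and", "in", "all"}

# Lowercase the text, split it into words, and count everything that is not a stop word
words = re.findall(r"[a-z']+", " ".join(toy_verses).lower())
word_counts = Counter(w for w in words if w not in toy_stopwords)

print(word_counts.most_common(2))
```

In the notebook, `' '.join(df['verse_text'].astype(str))` could take the place of the joined toy sentences.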