# MANDATE - 4 : (MT2021069)

## In this module I have tried to apply the knowledge gained from the experiments carried out in the previous mandates.

# The main aim of this mandate is to develop an unsupervised QA model on 'The Bhagwad Gita' that predicts, from the 700 verses of the book, the shloka most relevant to a given question, i.e. the one that answers it or refers to it.

## I decided to implement this QA model in an unsupervised manner because the questions are general in nature: they have no direct one-word answer, but rather a relevant shloka (verse) that describes the phenomenon asked about or sometimes answers the question directly. Also, as I handcrafted the dataset myself, I initially didn't know how the model would react to general questions, and so found it difficult to make a target list of verses that should be the answers to the questions.

## So in this mandate, I first computed embeddings for the verses and questions datasets and then applied Cosine similarity, Euclidean distance and Root match techniques to them to develop the model. Among these techniques, Cosine similarity turned out to be the most effective.

### Here it begins...

## Importing necessary libraries and packages :

In [1]:
import warnings
warnings.filterwarnings('ignore')
import pickle
import numpy as np
import pandas as pd
import json
from textblob import TextBlob
import nltk
from scipy import spatial
import torch
import spacy
en_nlp = spacy.load('en_core_web_sm')
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\admin\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## This is my self-created dataset consisting of verse numbers and verses (in English) from the 'Bhagwad Gita' :

In [2]:
df = pd.read_csv(r'C:\Users\admin\Documents\BG_Verses.csv',  encoding= 'unicode_escape')

In [3]:
df

Unnamed: 0,verse_no,verse
0,1.10000,"Asked Dhritarashtra, In the field of righteou..."
1,1.20000,"Spoke thus Sanjaya, Having seen the numerous b..."
2,1.30000,"O great teacher, look at the military might of..."
3,1.40000,Here in this army are great heroes and archers...
4,1.50000,"And there are great fighters like Dhristaketu,..."
...,...,...
713,18.75000,"By the grace of sage Vyasa, I have heard this ..."
714,18.76000,"O King, remembering again and again this wonde..."
715,18.77000,"O great king, remembering again and again the ..."
716,18.78000,"Where there are Lord Krishna, the Master of al..."


### Creating a list of all verses :

In [4]:
verses = df['verse'].tolist()

In [5]:
len(verses)

718

In [6]:
verses

['Asked Dhritarashtra,  In the field of righteousness called Kurukshetra, O Sanjaya, doing what are my sons and Pandavas, assembled and excited to fight?',
 "Spoke thus Sanjaya, Having seen the numerous battle formations of the Pandava's army, king Duryodhana approached his teacher and uttered the following words.",
 "O great teacher, look at the military might of the army of Pandu's sons, strategically arranged by your intelligent disciple and son of Drupada.",
 'Here in this army are great heroes and archers, equal to Arjuna and Bhima in fighting, like Yuyudhna, Virata and also the great charioteer, Drupada.',
 'And there are great fighters like Dhristaketu, Chekitanu, king of Kasi, the powerful Purujit, Kuntibhoja and Saibya, the notable among men.',
 'There are also the mighty Yudhamanyu, powerful Uttamauja, the son of Subhadra and the sons of Draupadi who are all great charioteers.',
 'O Superior among the twice born, let me tell you about the most distinguished leaders of my own 

### Importing a few more libraries and packages :

In [109]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

from random import randint

import numpy as np
import torch


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Importing the InferSent (sentence-to-vector) embedding model as used in mandate-2 :

In [110]:
from models import InferSent
model_version = 2
MODEL_PATH = r"C:\Users\admin\encoder\infersent%s.pkl" % model_version
params_model = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
                'pool_type': 'max', 'dpout_model': 0.0, 'version': model_version}
infersent = InferSent(params_model)
infersent.load_state_dict(torch.load(MODEL_PATH))

<All keys matched successfully>

In [111]:
W2V_PATH = r'C:\Users\admin\fastText\crawl-300d-2M.vec'
infersent.set_w2v_path(W2V_PATH)

### Building vocabulary using the training dataset of verses :

In [112]:
infersent.build_vocab(verses, tokenize=True)

Found 2894(/2975) words with w2v vectors
Vocab size : 2894


## Performing Embedding on Verses :

In [113]:
dict_embeddings = {}
for i in range(len(verses)):
    print(i)
    dict_embeddings[verses[i]] = infersent.encode([verses[i]], tokenize=True)

0
1
2
...
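Encoding one verse per call works, but the encoder also accepts a list of sentences, so the same dictionary could be built from a single batched call. The sketch below assumes only that the encoder is a callable that takes a list of strings and returns one row per sentence in the same order; `build_embedding_dict` and `toy_encode` are hypothetical names used for illustration, not part of InferSent.

```python
import numpy as np

def build_embedding_dict(sentences, encode_fn):
    """Map each sentence to a (1, dim) embedding using one batched call.

    encode_fn is assumed to take a list of strings and return an array
    with one row per sentence, in the same order.
    """
    matrix = np.asarray(encode_fn(sentences))
    # Keep the (1, dim) shape per entry so the result matches the
    # dictionary produced by the per-verse loop.
    return {s: matrix[i:i + 1] for i, s in enumerate(sentences)}

# Toy stand-in for a real encoder (hypothetical): two numbers per sentence.
def toy_encode(sentences):
    return np.array([[len(s), s.count(" ")] for s in sentences], dtype=np.float32)

emb = build_embedding_dict(["How to be a yogi?", "What should one eat?"], toy_encode)
print(emb["How to be a yogi?"].shape)  # (1, 2)
```

With the real model, the call would presumably look like `build_embedding_dict(verses, lambda s: infersent.encode(s, tokenize=True))`.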

In [114]:
dict_embeddings

{'Asked Dhritarashtra,  In the field of righteousness called Kurukshetra, O Sanjaya, doing what are my sons and Pandavas, assembled and excited to fight?': array([[ 0.00746889,  0.21682782,  0.12622435, ..., -0.02128535,
          0.00876765,  0.0138954 ]], dtype=float32),
 "Spoke thus Sanjaya, Having seen the numerous battle formations of the Pandava's army, king Duryodhana approached his teacher and uttered the following words.": array([[0.00746889, 0.03127939, 0.13939922, ..., 0.04373034, 0.01442943,
         0.03523598]], dtype=float32),
 "O great teacher, look at the military might of the army of Pandu's sons, strategically arranged by your intelligent disciple and son of Drupada.": array([[0.00746889, 0.19573927, 0.17814165, ..., 0.02141844, 0.01050768,
         0.00409276]], dtype=float32),
 'Here in this army are great heroes and archers, equal to Arjuna and Bhima in fighting, like Yuyudhna, Virata and also the great charioteer, Drupada.': array([[ 0.00746889, -0.00291813,  0.1

Storing the Embedding :

In [115]:
with open(r'C:\Users\admin\dataN\dict_embeddings3.pickle', 'wb') as handle:
    pickle.dump(dict_embeddings, handle)
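
Saving the dictionary means the slow encoding step does not have to be repeated in a later session. A minimal sketch of the round trip, using a temporary file and a tiny stand-in dictionary rather than the real path and embeddings:

```python
import os
import pickle
import tempfile

# Stand-in for dict_embeddings (the real values are NumPy arrays).
toy_embeddings = {"verse one": [0.1, 0.2], "verse two": [0.3, 0.4]}

path = os.path.join(tempfile.mkdtemp(), "toy_embeddings.pickle")

# Write the dictionary to disk ...
with open(path, "wb") as handle:
    pickle.dump(toy_embeddings, handle)

# ... and read it back, e.g. at the start of a new session.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == toy_embeddings)  # True
```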

### Same procedure with Questions Dataset :

In [116]:
df2 = pd.read_csv(r'C:\Users\admin\Documents\BG_Questions.csv',  encoding= 'unicode_escape')

In [117]:
df2

Unnamed: 0,questions
0,What can one attain through the practice of yo...
1,Who performs sacrifices?
2,What if we are not competent to practice yoga?
3,How to be a yogi?
4,How was the splender of the great Lord?
5,How should we control our mind?
6,What should one eat?
7,How to find happiness?


In [118]:
questions = df2['questions'].tolist()

In [119]:
len(questions)

8

In [120]:
questions

['What can one attain through the practice of yoga and meditation?',
 'Who performs sacrifices?',
 'What if we are not competent to practice yoga?',
 'How to be a yogi?',
 'How was the splender of the great Lord?',
 'How should we control our mind?',
 'What should one eat?',
 'How to find happiness?']

In [121]:
infersent.build_vocab(questions, tokenize=True)

Found 38(/38) words with w2v vectors
Vocab size : 38


In [122]:
for i in range(len(questions)):
    print(i)
    dict_embeddings[questions[i]] = infersent.encode([questions[i]], tokenize=True)

0
1
2
3
4
5
6
7


Storing the new embeddings (verses and questions together) :

In [123]:
with open(r'C:\Users\admin\dataN\dict_embeddings4.pickle', 'wb') as handle:
    pickle.dump(dict_embeddings, handle)

### Adding new column 'verses'  where each element is a list of all 718 verses, into the df2 dataframe :

First initializing the column with the following placeholder list :

In [131]:
df2["verses"] = ['tiger' , 'tiger', 'tiger', 'tiger', 'tiger', 'tiger', 'tiger', 'tiger']

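Now assigning the full list of verses to each element in the 'verses' column :

In [132]:
# Assign the column in one step; writing through df2["verses"][i] is chained
# indexing, which pandas may apply to a temporary copy.
df2["verses"] = [verses] * len(df2)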

Adding Sentence Embedding and Question Embedding to each row i.e. two more columns :

In [133]:
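def process_data(df, df2):
    # df is not used here; it is kept so the existing call process_data(df, df2) still works.
    print("step 1")
    # One 4096-dimensional vector per verse; zeros if a verse has no stored embedding.
    df2['sent_emb'] = df2['verses'].apply(
        lambda x: [dict_embeddings[item][0] if item in dict_embeddings else np.zeros(4096)
                   for item in x])
    print("step 2")
    # Question embedding, kept in its original (1, 4096) shape.
    df2['quest_emb'] = df2['questions'].apply(
        lambda x: dict_embeddings[x] if x in dict_embeddings else np.zeros(4096))

    return df2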

In [134]:
df2 = process_data(df,df2)

step 1
step 2


### New Dataframe :

In [135]:
df2

Unnamed: 0,questions,verses,sent_emb,quest_emb
0,What can one attain through the practice of yo...,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.05823731, 0.14651679, 0.005..."
1,Who performs sacrifices?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.028091112, 0.10969122, -0.0..."
2,What if we are not competent to practice yoga?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.058098245, 0.095136724, -0...."
3,How to be a yogi?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.06241584, 0.09848077, 0.034..."
4,How was the splender of the great Lord?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, 0.016090054, 0.16666088, -0.00..."
5,How should we control our mind?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.04101248, 0.02947673, -0.00..."
6,What should one eat?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.05823731, -0.021776596, 0.0..."
7,How to find happiness?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.06241584, 0.14958163, -0.01..."


## Calculating Cosine Similarity and Euclidean distance for each question-verse pair :

In [136]:
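def cosine_sim(x):
    # Despite the name, this returns cosine *distances* (1 - cosine similarity),
    # one per verse, so smaller values mean a closer match.
    li = []
    for item in x["sent_emb"]:
        li.append(spatial.distance.cosine(item, x["quest_emb"][0]))
    return li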

In [137]:
def pred_idx(distances):
    return np.argmin(distances)   
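
`spatial.distance.cosine` returns the cosine distance, that is, 1 minus the cosine similarity, which is why the best match is the minimum. The same computation can be written for all verses at once with NumPy. This is a sketch on toy vectors; `cosine_distances` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def cosine_distances(question_vec, verse_matrix):
    """Cosine distance (1 - cosine similarity) from one question to every verse."""
    q = np.asarray(question_vec, dtype=float)
    V = np.asarray(verse_matrix, dtype=float)
    sims = (V @ q) / (np.linalg.norm(V, axis=1) * np.linalg.norm(q))
    return 1.0 - sims

verse_matrix = np.array([[1.0, 0.0],    # verse 0
                         [0.0, 1.0],    # verse 1
                         [1.0, 1.0]])   # verse 2
question_vec = np.array([2.0, 2.1])

dists = cosine_distances(question_vec, verse_matrix)
print(int(np.argmin(dists)))  # 2
```

An all-zero vector would make the similarity undefined (division by zero), so rows without a stored embedding would need separate handling.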

## Predicting the verse having the lowest cosine distance (i.e. the highest cosine similarity) and the lowest Euclidean distance to the corresponding question :

In [138]:
def predictions(train):
    print(0)
    train["cosine_sim"] = train.apply(cosine_sim, axis = 1)
    print(1)
    train["diff"] = (train["quest_emb"] - train["sent_emb"])**2
    train["euclidean_dis"] = train["diff"].apply(lambda x: list(np.sum(x, axis = 1)))
    del train["diff"]
    
    print("cosine start")
    
    train["pred_idx_cos"] = train["cosine_sim"].apply(lambda x: pred_idx(x))
    train["pred_idx_euc"] = train["euclidean_dis"].apply(lambda x: pred_idx(x))
    
    return train
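
The `diff` step relies on NumPy broadcasting: each question embedding has shape (1, 4096) and the verse embeddings in each row stack to (718, 4096), so the subtraction yields one difference vector per verse, and summing the squares along axis 1 gives the squared Euclidean distance. Because the square root is monotonic, taking the argmin of the squared values picks the same verse. A small sketch with toy shapes:

```python
import numpy as np

question = np.array([[1.0, 2.0, 3.0]])           # shape (1, 3)
verses_emb = np.array([[1.0, 2.0, 5.0],          # shape (4, 3)
                       [0.0, 0.0, 0.0],
                       [1.0, 2.5, 3.0],
                       [4.0, 6.0, 3.0]])

# (1, 3) - (4, 3) broadcasts to (4, 3): one difference vector per verse.
squared = np.sum((question - verses_emb) ** 2, axis=1)
euclidean = np.sqrt(squared)

print(squared.tolist())            # [4.0, 14.0, 0.25, 25.0]
print(int(np.argmin(squared)))     # 2
print(int(np.argmin(euclidean)))   # 2
```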

In [139]:
df2 = predictions(df2)

0
1
cosine start


### Modified Dataframe :
pred_idx_cos - the index, in the verses list, of the verse with the lowest cosine distance to the given question.

pred_idx_euc - the index, in the verses list, of the verse with the lowest Euclidean distance to the given question.

In [140]:
df2

Unnamed: 0,questions,verses,sent_emb,quest_emb,cosine_sim,euclidean_dis,pred_idx_cos,pred_idx_euc
0,What can one attain through the practice of yo...,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.05823731, 0.14651679, 0.005...","[0.6767719388008118, 0.6493039429187775, 0.685...","[8.369364, 7.557782, 7.8473005, 7.754485, 7.76...",239,239
1,Who performs sacrifices?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.028091112, 0.10969122, -0.0...","[0.9020327925682068, 0.9161750078201294, 0.921...","[11.400515, 10.944119, 10.852612, 10.803724, 8...",529,529
2,What if we are not competent to practice yoga?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.058098245, 0.095136724, -0....","[0.6870888471603394, 0.6938521862030029, 0.721...","[9.45543, 9.065798, 9.300252, 8.9939, 9.001206...",488,488
3,How to be a yogi?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.06241584, 0.09848077, 0.034...","[0.8940677344799042, 0.8946469873189926, 0.864...","[10.87681, 10.261024, 9.765745, 10.015877, 8.2...",254,503
4,How was the splender of the great Lord?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, 0.016090054, 0.16666088, -0.00...","[0.7114264965057373, 0.7419969439506531, 0.653...","[8.482611, 8.289263, 7.2062626, 7.3825135, 5.2...",434,161
5,How should we control our mind?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.04101248, 0.02947673, -0.00...","[0.8801421895623207, 0.9650369696319103, 0.898...","[10.304851, 10.596598, 9.723666, 10.13454, 7.7...",180,180
6,What should one eat?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.05823731, -0.021776596, 0.0...","[1.019883992150426, 1.0518014542758465, 1.0383...","[12.731559, 12.411493, 12.077648, 12.316008, 9...",371,180
7,How to find happiness?,"[Asked Dhritarashtra, In the field of righteo...","[[0.0074688885, 0.21682782, 0.12622435, -0.021...","[[0.0074688885, -0.06241584, 0.14958163, -0.01...","[0.8686690032482147, 0.9633977748453617, 0.888...","[11.1259, 11.6635685, 10.60957, 10.685482, 8.6...",389,389
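
The two prediction columns can be compared directly. Using the indices from the table above, a short sketch counts how often cosine distance and Euclidean distance select the same verse:

```python
# Predicted verse indices copied from the table above.
pred_idx_cos = [239, 529, 488, 254, 434, 180, 371, 389]
pred_idx_euc = [239, 529, 488, 503, 161, 180, 180, 389]

agree = [c == e for c, e in zip(pred_idx_cos, pred_idx_euc)]
print(sum(agree), "of", len(agree))                 # 5 of 8
print([i for i, a in enumerate(agree) if not a])    # [3, 4, 6]
```

The two measures pick different verses for questions 3, 4 and 6.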


In [141]:
df2.to_csv(r'C:\Users\Admin\Downloads\BG_emb.csv', index = None)

In [142]:
df2 = pd.read_csv(r'C:\Users\admin\Downloads\BG_emb.csv').reset_index(drop=True)
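
Writing the dataframe to CSV and reading it back changes the column types: a cell that held a Python list comes back as the string form of that list. This is why `match_roots` calls `ast.literal_eval` on `x["verses"]`. The sketch uses the standard `csv` module, which also stringifies non-string cells, as a stand-in for the pandas round trip; the toy row is made up for illustration.

```python
import ast
import csv
import io

row = ["How to be a yogi?", ["verse a", "verse b"]]

# Write one row; the list cell is stored as its string form.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)

# Read it back: every cell is now a plain string.
buffer.seek(0)
question, verses_cell = next(csv.reader(buffer))
print(type(verses_cell).__name__)        # str

# literal_eval turns the string back into a real list.
restored = ast.literal_eval(verses_cell)
print(restored == row[1])                # True
```

The embedding columns are probably not recoverable this way, since the printed form of a large NumPy array is abbreviated with `...`; the pickled dictionary remains the reliable source for the embeddings.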

## Now trying to predict the relevant verse/verses using Root Match :

### Importing Necessary libraries and packages :

In [143]:
import numpy as np, pandas as pd
import json
import ast 
from textblob import TextBlob
import nltk
import torch
import pickle
from scipy import spatial
import warnings
warnings.filterwarnings('ignore')
import spacy
from nltk import Tree
nlp = spacy.load("en_core_web_sm")
from nltk.stem.lancaster import LancasterStemmer
st = LancasterStemmer()
from sklearn.feature_extraction.text import TfidfVectorizer, TfidfTransformer

In [144]:
len(dict_embeddings)

726

### Function for generating and matching the roots of verses and question :

In [145]:
def match_roots(x):
    question = x["questions"].lower()
    # After the CSV round trip, x["verses"] is the string form of the list,
    # so parse it once here instead of once per matching sentence.
    verse_list = [v.lower() for v in ast.literal_eval(x["verses"])]
    verses1 = en_nlp(x["verses"].lower()).sents
    print(question)

    # Finding Question roots
    question_root = st.stem(str([sent.root for sent in en_nlp(question).sents][0]))

    li = []
    for i, sent in enumerate(verses1):
        # Finding roots of verses
        roots = [st.stem(chunk.root.head.text.lower()) for chunk in sent.noun_chunks]

        # Finding the verses whose roots match with the question roots
        if question_root in roots:
            for k, item in enumerate(verse_list):
                if str(sent) in item:
                    li.append(k)

    return li

In [146]:
df2["root_match_idx"] = df2.apply(match_roots, axis = 1)

what can one attain through the practice of yoga and meditation?
who performs sacrifices?
what if we are not competent to practice yoga?
how to be a yogi?
how was the splender of the great lord?
how should we control our mind?
what should one eat?
how to find happiness?


Storing the first verse whose roots match the corresponding question (0 is stored when there is no match) :

In [147]:
df2["root_match_idx_first"] = df2["root_match_idx"].apply(lambda x: x[0] if len(x)>0 else 0)
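
Note that a fallback of 0 is indistinguishable from a genuine match on the first verse (index 0). One possible alternative, sketched here with a hypothetical helper `first_match`, keeps the two cases apart by returning `None` when nothing matched:

```python
def first_match(indices):
    """Return the first matched verse index, or None if there was no match."""
    return indices[0] if indices else None

# Match lists as they appear in the root-match results for three questions.
print(first_match([152, 154, 194]))  # 152
print(first_match([94, 703]))        # 94
print(first_match([]))               # None
```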

## This dataframe now includes predicted verses (indices of verses) using Cosine similarity, Euclidean distance and Root match :

In [148]:
df2

Unnamed: 0,questions,verses,sent_emb,quest_emb,cosine_sim,euclidean_dis,pred_idx_cos,pred_idx_euc,root_match_idx,root_match_idx_first
0,What can one attain through the practice of yo...,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 -0.05823731 0.14651679 ... 0.0...,"[0.6767719388008118, 0.6493039429187775, 0.685...","[8.369364, 7.557782, 7.8473005, 7.754485, 7.76...",239,239,[263],263
1,Who performs sacrifices?,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 -0.02809111 0.10969122 ... 0.0...,"[0.9020327925682068, 0.9161750078201294, 0.921...","[11.400515, 10.944119, 10.852612, 10.803724, 8...",529,529,"[152, 154, 194]",152
2,What if we are not competent to practice yoga?,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 -0.05809825 0.09513672 ... -0.0...,"[0.6870888471603394, 0.6938521862030029, 0.721...","[9.45543, 9.065798, 9.300252, 8.9939, 9.001206...",488,488,[],0
3,How to be a yogi?,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 -0.06241584 0.09848077 ... -0.0...,"[0.8940677344799042, 0.8946469873189926, 0.864...","[10.87681, 10.261024, 9.765745, 10.015877, 8.2...",254,503,"[94, 703]",94
4,How was the splender of the great Lord?,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 0.01609005 0.16666088 ... -0.0...,"[0.7114264965057373, 0.7419969439506531, 0.653...","[8.482611, 8.289263, 7.2062626, 7.3825135, 5.2...",434,161,[],0
5,How should we control our mind?,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 -0.04101248 0.02947673 ... -0.0...,"[0.8801421895623207, 0.9650369696319103, 0.898...","[10.304851, 10.596598, 9.723666, 10.13454, 7.7...",180,180,[],0
6,What should one eat?,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 -0.05823731 -0.0217766 ... 0.0...,"[1.019883992150426, 1.0518014542758465, 1.0383...","[12.731559, 12.411493, 12.077648, 12.316008, 9...",371,180,[],0
7,How to find happiness?,"['Asked Dhritarashtra, In the field of righte...","[array([ 0.00746889, 0.21682782, 0.12622435,...",[[ 0.00746889 -0.06241584 0.14958163 ... -0.0...,"[0.8686690032482147, 0.9633977748453617, 0.888...","[11.1259, 11.6635685, 10.60957, 10.685482, 8.6...",389,389,[],0


In [149]:
df2.to_csv(r'C:\Users\Admin\Downloads\BG_cosinepred.csv', index = None)

### Among the three techniques used above, Root Match doesn't necessarily provide an answer every time, because the root of the question may not match the roots of any of the verses. Cosine similarity and Euclidean distance, on the other hand, always provide an answer. Of the two, cosine similarity worked better here; it compares only the direction (angle) of the embedding vectors and is not affected by their magnitude.
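
The difference between the two measures can be seen on a toy example: scaling a vector changes its Euclidean distance to another vector but not its cosine distance, which depends only on the angle between them. The vectors below are made up for illustration.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

question = [1.0, 2.0]
same_direction = [3.0, 6.0]    # 3x the question vector
nearby_point = [2.0, 1.5]      # closer in space, different direction

# Cosine distance prefers the vector pointing the same way ...
print(cosine_distance(question, same_direction)
      < cosine_distance(question, nearby_point))        # True
# ... while Euclidean distance prefers the nearby point.
print(euclidean_distance(question, nearby_point)
      < euclidean_distance(question, same_direction))   # True
```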

### Function for printing Question and Predicted Verse for it using Cosine Similarity :

In [152]:
def showqa(train):
    for i in range(len(train)):
        print("Question : " + train["questions"][i])
        print("Predicted Verse : " + verses[train["pred_idx_cos"][i]])
        print("\n")
        

# Final Result :

In [153]:
showqa(df2)

Question : What can one attain through the practice of yoga and meditation?
Predicted Verse : For the sage who has just begun the yoga, work is said to be the means, after attaining yoga even mindedness in doing actions is to be the means.


Question : Who performs sacrifices?
Predicted Verse : He who sees Prakriti alone performing all the actions and the Self as the Non-doer does actually see.


Question : What if we are not competent to practice yoga?
Predicted Verse : If you are not competent to practice Yoga, then do My work dedicating it to Me. By doing work for My sake you will achieve (spiritual ) perfection.


Question : How to be a yogi?
Predicted Verse : When the disciplined mind is established in the self, and when one becomes impervious to all the desires, he is said to be established in Yoga.


Question : How was the splender of the great Lord?
Predicted Verse : The splendor of the great Lord was like many thousands of sun ablaze in the sky at the same time.


Question : H

## The above results show a good amount of relevance between the question and the predicted verse. The model seems to be working quite well.

## As there is no target verse that should be predicted for any question, computing accuracy is not applicable. I showed these results to many peers and they judged them to be satisfactory.

## With this I conclude my Final Mandate Submission !!!

## Thank You ! 

### Kunal Patil - MT2021069