# NLP using Deep Learning in Python - Quora Duplicate Questions

## Problem Statement:

Over 100 million people visit Quora every month, so it's no surprise that **many people ask similarly worded questions**. Multiple questions with the same intent **can cause seekers to spend more time finding the best answer to their question**, and **make writers feel they need to answer multiple versions of the same question**. Quora values canonical questions because they **provide a better experience to active seekers and writers**, and offer more value to both of these groups in the long term.

**Reference:** https://www.kaggle.com/c/quora-question-pairs

In [322]:
import pandas as pd
import numpy as np
import sklearn

In [323]:
question_pairs = pd.read_csv("../../raw_data/questions.csv")
question_pairs.head()

Unnamed: 0,id,qid1,qid2,question1,question2,is_duplicate
0,0,1,2,What is the step by step guide to invest in sh...,What is the step by step guide to invest in sh...,0
1,1,3,4,What is the story of Kohinoor (Koh-i-Noor) Dia...,What would happen if the Indian government sto...,0
2,2,5,6,How can I increase the speed of my internet co...,How can Internet speed be increased by hacking...,0
3,3,7,8,Why am I mentally very lonely? How can I solve...,Find the remainder when [math]23^{24}[/math] i...,0
4,4,9,10,"Which one dissolve in water quikly sugar, salt...",Which fish would survive in salt water?,0


In [325]:
question_pairs_1 = question_pairs[['qid1', 'question1']]
question_pairs_1.columns = ['id', 'question']
question_pairs_2 = question_pairs[['qid2', 'question2']]
question_pairs_2.columns = ['id', 'question']
questions_list = pd.concat([question_pairs_1,question_pairs_2]).sort_values('id')
questions_list.shape

(808702, 2)

In [326]:
corpus = questions_list['question'].tolist()
corpus[:10]

['What is the step by step guide to invest in share market in india?',
 'What is the step by step guide to invest in share market?',
 'What is the story of Kohinoor (Koh-i-Noor) Diamond?',
 'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?',
 'How can I increase the speed of my internet connection while using a VPN?',
 'How can Internet speed be increased by hacking through DNS?',
 'Why am I mentally very lonely? How can I solve it?',
 'Find the remainder when [math]23^{24}[/math] is divided by 24,23?',
 'Which one dissolve in water quikly sugar, salt, methane and carbon di oxide?',
 'Which fish would survive in salt water?']

In [327]:
corpus = list(np.unique(corpus))

-------
## Feature Extraction

### Count Vectorizer

In [79]:
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()

X_train_counts = count_vect.fit_transform(corpus[:10])
X_train_counts = pd.DataFrame(X_train_counts.toarray())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
X_train_counts.columns = count_vect.get_feature_names_out()
X_train_counts

Unnamed: 0,1000,2000,500,about,add,all,am,and,any,are,...,was,we,what,who,will,win,with,world,you,your
0,0,0,0,0,0,0,0,0,1,0,...,1,0,1,0,0,0,1,0,0,0
1,1,0,1,0,0,0,0,2,0,0,...,0,0,1,0,1,0,0,0,0,0
2,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,1,0,0,0,0,...,0,0,0,1,1,1,0,0,1,0
4,1,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,1,1,0,0,0,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
6,0,1,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,1,0,0,0
7,0,0,0,1,0,0,0,0,0,1,...,0,0,0,1,0,0,1,0,1,1
8,0,0,0,0,0,0,0,0,0,1,...,0,1,0,0,0,0,0,1,0,0
9,0,0,0,0,0,0,0,0,0,1,...,0,1,0,0,0,0,0,1,0,0


In [80]:
corpus[0]

'"The question was marked as needing improvement" how to deal with this, what ever I do still this error pops up? Is it Quora bot or any user?'

In [81]:
X_train_counts.loc[0]

1000             0
2000             0
500              0
about            0
add              0
all              0
am               0
and              0
any              1
are              0
as               1
aside            0
at               0
banned           0
based            0
be               0
been             0
biases           0
big              0
bot              1
can              0
chip             0
closer           0
currency         0
deal             1
distance         0
do               1
election         0
embedded         0
error            1
                ..
really           0
relationship     0
relationships    0
rs               0
see              0
short            0
starting         0
still            1
successful       0
tell             0
term             0
the              1
there            0
think            0
this             2
time             0
to               1
up               1
user             1
war              0
was              1
we          

### Tf-Idf (Term Frequency - Inverse Document Frequency)

In [82]:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()

X_train_tfidf = vectorizer.fit_transform(corpus[:10])
X_train_tfidf = pd.DataFrame(X_train_tfidf.toarray())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
X_train_tfidf.columns = vectorizer.get_feature_names_out()
X_train_tfidf

Unnamed: 0,1000,2000,500,about,add,all,am,and,any,are,...,was,we,what,who,will,win,with,world,you,your
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.20204,0.0,...,0.20204,0.0,0.171753,0.0,0.0,0.0,0.150263,0.0,0.0,0.0
1,0.205776,0.0,0.205776,0.0,0.0,0.0,0.0,0.411553,0.0,0.0,...,0.0,0.0,0.205776,0.0,0.205776,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.518291,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.263025,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.223595,0.223595,0.263025,0.0,0.0,0.223595,0.0
4,0.266593,0.0,0.0,0.0,0.0,0.0,0.313605,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.251549,0.251549,0.0,0.0,0.0,0.0,0.251549,0.0,0.175717,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.31334,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.21888,...,0.0,0.0,0.0,0.0,0.0,0.0,0.274136,0.0,0.0,0.0
7,0.0,0.0,0.0,0.204276,0.0,0.0,0.0,0.0,0.0,0.121303,...,0.0,0.0,0.0,0.173653,0.0,0.0,0.151926,0.0,0.173653,0.204276
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.263628,...,0.0,0.3774,0.0,0.0,0.0,0.0,0.0,0.3774,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.23543,...,0.0,0.337033,0.0,0.0,0.0,0.0,0.0,0.337033,0.0,0.0


$$Tf(w, d) = \text{number of times word } w \text{ appears in document } d$$

$$IDF(w) = \log \frac{\text{total number of documents}}{\text{number of documents containing word } w}$$

$$Tfidf(w, d) = Tf(w, d) \times IDF(w)$$
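
The three formulas above can be checked by hand on a toy corpus. The sketch below is a minimal, plain-Python version; the helper names (`tf`, `idf`, `tfidf`) and the three example sentences are illustrative assumptions, not part of the notebook. Note that `TfidfVectorizer` with its default settings uses a smoothed IDF and L2-normalizes each row, so the values in the table above will not match this plain formula exactly.

```python
import math

def tf(word, doc_tokens):
    # Raw term frequency: how many times `word` occurs in the document
    return doc_tokens.count(word)

def idf(word, corpus_tokens):
    # log(total documents / documents containing the word)
    n_docs_with_word = sum(1 for doc in corpus_tokens if word in doc)
    return math.log(len(corpus_tokens) / n_docs_with_word)

def tfidf(word, doc_tokens, corpus_tokens):
    return tf(word, doc_tokens) * idf(word, corpus_tokens)

# Hypothetical toy corpus, tokenized by whitespace
tfidf_toy_corpus = [
    "what is the purpose of life".split(),
    "what is the meaning of life".split(),
    "which fish would survive in salt water".split(),
]

# "what" occurs in 2 of 3 documents, "purpose" in only 1 of 3
score_common = tfidf("what", tfidf_toy_corpus[0], tfidf_toy_corpus)
score_rare = tfidf("purpose", tfidf_toy_corpus[0], tfidf_toy_corpus)
print(score_common, score_rare)
```

The rarer word gets the higher weight, which is the point of the IDF term: words that occur in most documents carry little information about any one of them.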

### Word Vectors

Word vectors, also called *word embeddings*, are numerical representations of individual words, learned so that words which appear in similar contexts end up with similar vectors. In this way we can capture *context* mathematically.

**There are two possible approaches:**

<img src="img/w2v_image.png" alt="Drawing" style="width: 600px;"/>

**CBOW (Continuous Bag Of Words):** Predicts a word, given the surrounding context words as input.

**Skip-gram:** Predicts the surrounding context words, given a word as input.
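
To make the difference concrete, the sketch below builds the training examples each approach would see for one short sentence. It only generates (input, target) pairs and does not train a model; the function names and the `window` parameter are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    # Skip-gram: one (center word -> single context word) pair per neighbor
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    # CBOW: one (all context words -> center word) pair per position
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs

example_tokens = "the quick brown fox".split()
sg = skipgram_pairs(example_tokens, window=1)
cb = cbow_pairs(example_tokens, window=1)
print(sg)
print(cb)
```

With `window=1`, skip-gram produces pairs such as `('quick', 'the')` and `('quick', 'brown')`, while CBOW produces `(['the', 'brown'], 'quick')` for the same position.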

In [83]:
import spacy
nlp = spacy.load('en_core_web_md')

In [84]:
len(nlp('dog').vector)

300

In [85]:
def most_similar(word, topn=5):
    word = nlp.vocab[str(word)]
    queries = [
      w for w in word.vocab 
      if w.is_lower == word.is_lower and w.prob >= -15 and np.count_nonzero(w.vector)
    ]

    by_similarity = sorted(queries, key=lambda w: word.similarity(w), reverse=True)
    return [(w.lower_,w.similarity(word)) for w in by_similarity[:topn+1] if w.lower_ != word.lower_]

In [86]:
most_similar("king", topn=10)

[('princes', 0.7876614),
 ('kings', 0.7876614),
 ('prince', 0.73377377),
 ('queen', 0.72526103),
 ('scepter', 0.6726005),
 ('throne', 0.6726005),
 ('kingdoms', 0.6604046),
 ('kingdom', 0.6604046),
 ('lord', 0.6439695),
 ('royal', 0.6168811)]

In [87]:
most_similar("lion", topn=10)

[('cheetah', 0.9999999),
 ('lions', 0.7758893),
 ('tiger', 0.7359829),
 ('panther', 0.7359829),
 ('leopard', 0.7359829),
 ('elephant', 0.71239567),
 ('hippo', 0.71239567),
 ('zebra', 0.71239567),
 ('rhino', 0.71239567),
 ('giraffe', 0.71239567)]

Sentence (or document) objects also have vectors, computed as the average of the individual token vectors. This makes it possible to compare whole documents for similarity.

In [88]:
doc = nlp('The quick brown fox jumped over the lazy dogs.')
len(doc.vector)

300
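
The averaging idea can be sketched with plain NumPy. The 3-dimensional vectors below are made-up toy values (real spaCy vectors here have 300 dimensions), and `mean_vector` and `cosine_similarity` are hypothetical helper names used only for illustration.

```python
import numpy as np

def mean_vector(token_vectors):
    # Document vector = element-wise average of its token vectors
    return np.mean(token_vectors, axis=0)

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up toy embeddings, for illustration only
toy_vectors = {
    "quick": np.array([1.0, 0.0, 0.0]),
    "fast": np.array([0.9, 0.1, 0.0]),
    "fox": np.array([0.0, 1.0, 0.0]),
    "dog": np.array([0.0, 0.9, 0.1]),
    "stock": np.array([0.0, 0.0, 1.0]),
    "market": np.array([0.1, 0.0, 0.9]),
}

doc_a = mean_vector([toy_vectors[w] for w in ["quick", "fox"]])
doc_b = mean_vector([toy_vectors[w] for w in ["fast", "dog"]])
doc_c = mean_vector([toy_vectors[w] for w in ["stock", "market"]])

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(sim_ab, sim_ac)
```

Documents built from similar words (`doc_a` and `doc_b`) end up close together, while the unrelated `doc_c` scores much lower.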

### BERT Sentence Transformer

In [89]:
from sentence_transformers import SentenceTransformer
import scipy.spatial
embedder = SentenceTransformer('bert-base-nli-mean-tokens')

In [90]:
%%time
corpus_embeddings = embedder.encode(corpus)

CPU times: user 1min 54s, sys: 3.71 s, total: 1min 58s
Wall time: 34.9 s


----
## Candidate Generation using Faiss vector similarity search library

Faiss is a library developed by Facebook AI Research for efficient similarity search and clustering of dense vectors.

**References:**

1. [Tutorial](https://github.com/facebookresearch/faiss/wiki/Getting-started)
2. [facebookresearch/faiss](https://github.com/facebookresearch/faiss)

In [91]:
import faiss
d = 768  # dimension of the sentence embeddings
index = faiss.IndexFlatL2(d)
print(index.is_trained)
index.add(np.stack(corpus_embeddings, axis=0))
print(index.ntotal)

True
1362
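
`IndexFlatL2` performs an exhaustive (brute-force) search and reports squared Euclidean distances. The sketch below reproduces that behaviour in plain NumPy on random toy data, as a way to see what the index computes; `flat_l2_search` is a hypothetical helper, not a Faiss function, and it makes no attempt to match Faiss's speed.

```python
import numpy as np

def flat_l2_search(database, queries, k):
    # Squared Euclidean distance between every query and every database vector
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k smallest distances per query, nearest first
    nearest = np.argsort(sq_dists, axis=1)[:, :k]
    return np.take_along_axis(sq_dists, nearest, axis=1), nearest

# Random toy vectors standing in for sentence embeddings
rng = np.random.default_rng(0)
toy_database = rng.random((100, 8)).astype("float32")
toy_queries = toy_database[:3]  # query with vectors that are already indexed

D_toy, I_toy = flat_l2_search(toy_database, toy_queries, k=5)
print(I_toy)
print(D_toy)
```

Because each toy query is itself in the database, its nearest neighbor is itself at distance 0, the same pattern as the `Distance: 0.0000` rows in the search output further down.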


In [92]:
# queries = ['What is the step by step guide to invest in share market in india?', 'How can Internet speed be increased by hacking through DNS?']
queries = question_pairs['question1'][:3].tolist()
query_embeddings = embedder.encode(queries)

In [93]:
k = 5                          # number of nearest neighbors to return per query
D, I = index.search(np.stack(query_embeddings, axis=0), k)     # actual search
print(I)                   # neighbor indices for each of the 3 queries

[[ 858  998 1025  983   73]
 [ 771 1015  775 1014 1133]
 [ 436  455  457  463  462]]


In [94]:
for query, query_embedding in zip(queries, query_embeddings):
    distances, indices = index.search(np.asarray(query_embedding).reshape(1,768),k)
    print("\n======================\n")
    print("Query:", query)
    print("\nTop 5 most similar sentences in corpus:")
    for idx in range(0,5):
        print(corpus[indices[0,idx]], "(Distance: %.4f)" % distances[0,idx])



Query: What is purpose of life?

Top 5 most similar sentences in corpus:
What is purpose of life? (Distance: 0.0000)
What is the purpose of life? (Distance: 7.9868)
What is your purpose of life? (Distance: 12.3884)
What is the meaning or purpose of life? (Distance: 13.6231)
From your perspective, what is the purpose of life? (Distance: 17.5448)


Query: What are your New Year's resolutions for 2017?

Top 5 most similar sentences in corpus:
What are your New Year's resolutions for 2017? (Distance: 0.0000)
What is your New Year's resolutions for 2017? (Distance: 0.6093)
What are your new year resolutions for 2017? (Distance: 5.8446)
What is your New Year's resolution for 2017? (Distance: 6.6011)
What's your New Year's resolution for 2017? (Distance: 8.1350)


Query: How will Indian GDP be affected from banning 500 and 1000 rupees notes?

Top 5 most similar sentences in corpus:
How will Indian GDP be affected from banning 500 and 1000 rupees notes? (Distance: 0.0000)
How will the ban of

In [122]:
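query = "How will Indian GDP be affected from banning 500 and 1000 rupees notes?"
# Encode as a one-element list and take the first (only) embedding
query_embed = embedder.encode([query])[0]
# Search with the embedding of this query (not the leftover loop variable `query_embedding`)
distances, indices = index.search(np.asarray(query_embed).reshape(1, 768), 50)
relevant_docs = [corpus[indices[0, idx]] for idx in range(50)]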

----
## Reranking using Bidirectional LSTM model

<img src="img/bi_lstm.png" alt="Drawing" style="width: 600px;"/>

**Reference:** https://mlwhiz.com/blog/2019/03/09/deeplearning_architectures_text_classification/

In [344]:
import re
import nltk
from nltk.tokenize.toktok import ToktokTokenizer
from nltk.stem import WordNetLemmatizer, SnowballStemmer
toko_tokenizer = ToktokTokenizer()
wordnet_lemmatizer = WordNetLemmatizer()

def normalize_text(text):
        puncts = ['/', ',', '.', '"', ':', ')', '(', '-', '!', '?', '|', ';', '$', '&', '/', '[', ']', '>', '%', '=', '#', '*', '+', '\\', '•',  '~', '@', '£', 
         '·', '_', '{', '}', '©', '^', '®', '`',  '<', '→', '°', '€', '™', '›',  '♥', '←', '×', '§', '″', '′', 'Â', '█', '½', 'à', '…', 
         '“', '★', '”', '–', '●', 'â', '►', '−', '¢', '²', '¬', '░', '¶', '↑', '±', '¿', '▾', '═', '¦', '║', '―', '¥', '▓', '—', '‹', '─', 
         '▒', '：', '¼', '⊕', '▼', '▪', '†', '■', '’', '▀', '¨', '▄', '♫', '☆', 'é', '¯', '♦', '¤', '▲', 'è', '¸', '¾', 'Ã', '⋅', '‘', '∞', 
         '∙', '）', '↓', '、', '│', '（', '»', '，', '♪', '╩', '╚', '³', '・', '╦', '╣', '╔', '╗', '▬', '❤', 'ï', 'Ø', '¹', '≤', '‡', '√', ]

        def clean_text(text):
            text = str(text)
            text = text.replace('\n', '')
            text = text.replace('\r', '')
            for punct in puncts:
                if punct in text:
                    text = text.replace(punct, '')
            return text.lower()

        def clean_numbers(text):
            if bool(re.search(r'\d', text)):
                text = re.sub('[0-9]{5,}', '#####', text)
                text = re.sub('[0-9]{4}', '####', text)
                text = re.sub('[0-9]{3}', '###', text)
                text = re.sub('[0-9]{2}', '##', text)
            return text

        contraction_dict = {"ain't": "is not", "aren't": "are not","can't": "cannot", "'cause": "because", "could've": "could have", "couldn't": "could not", "didn't": "did not",  "doesn't": "does not", "don't": "do not", "hadn't": "had not", "hasn't": "has not", "haven't": "have not", "he'd": "he would","he'll": "he will", "he's": "he is", "how'd": "how did", "how'd'y": "how do you", "how'll": "how will", "how's": "how is",  "I'd": "I would", "I'd've": "I would have", "I'll": "I will", "I'll've": "I will have","I'm": "I am", "I've": "I have", "i'd": "i would", "i'd've": "i would have", "i'll": "i will",  "i'll've": "i will have","i'm": "i am", "i've": "i have", "isn't": "is not", "it'd": "it would", "it'd've": "it would have", "it'll": "it will", "it'll've": "it will have","it's": "it is", "let's": "let us", "ma'am": "madam", "mayn't": "may not", "might've": "might have","mightn't": "might not","mightn't've": "might not have", "must've": "must have", "mustn't": "must not", "mustn't've": "must not have", "needn't": "need not", "needn't've": "need not have","o'clock": "of the clock", "oughtn't": "ought not", "oughtn't've": "ought not have", "shan't": "shall not", "sha'n't": "shall not", "shan't've": "shall not have", "she'd": "she would", "she'd've": "she would have", "she'll": "she will", "she'll've": "she will have", "she's": "she is", "should've": "should have", "shouldn't": "should not", "shouldn't've": "should not have", "so've": "so have","so's": "so as", "this's": "this is","that'd": "that would", "that'd've": "that would have", "that's": "that is", "there'd": "there would", "there'd've": "there would have", "there's": "there is", "here's": "here is","they'd": "they would", "they'd've": "they would have", "they'll": "they will", "they'll've": "they will have", "they're": "they are", "they've": "they have", "to've": "to have", "wasn't": "was not", "we'd": "we would", "we'd've": "we would have", "we'll": "we will", "we'll've": "we will have", "we're": "we are", 
"we've": "we have", "weren't": "were not", "what'll": "what will", "what'll've": "what will have", "what're": "what are",  "what's": "what is", "what've": "what have", "when's": "when is", "when've": "when have", "where'd": "where did", "where's": "where is", "where've": "where have", "who'll": "who will", "who'll've": "who will have", "who's": "who is", "who've": "who have", "why's": "why is", "why've": "why have", "will've": "will have", "won't": "will not", "won't've": "will not have", "would've": "would have", "wouldn't": "would not", "wouldn't've": "would not have", "y'all": "you all", "y'all'd": "you all would","y'all'd've": "you all would have","y'all're": "you all are","y'all've": "you all have","you'd": "you would", "you'd've": "you would have", "you'll": "you will", "you'll've": "you will have", "you're": "you are", "you've": "you have"}

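        def _get_contractions(contraction_dict):
            # Try longer keys first so that "i'd've" is not cut short by "i'd"
            keys = sorted(contraction_dict.keys(), key=len, reverse=True)
            contraction_re = re.compile('(%s)' % '|'.join(re.escape(key) for key in keys))
            return contraction_dict, contraction_re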

        contractions, contractions_re = _get_contractions(contraction_dict)

        def replace_contractions(text):
            def replace(match):
                return contractions[match.group(0)]
            return contractions_re.sub(replace, text)

        stopword_list = nltk.corpus.stopwords.words('english')

        def remove_stopwords(text, is_lower_case=True):
            tokens = toko_tokenizer.tokenize(text)
            tokens = [token.strip() for token in tokens]
            if is_lower_case:
                filtered_tokens = [token for token in tokens if token not in stopword_list]
            else:
                filtered_tokens = [token for token in tokens if token.lower() not in stopword_list]
            filtered_text = ' '.join(filtered_tokens)    
            return filtered_text

        def lemmatizer(text):
            tokens = toko_tokenizer.tokenize(text)
            tokens = [token.strip() for token in tokens]
            tokens = [wordnet_lemmatizer.lemmatize(token) for token in tokens]
            return ' '.join(tokens)

        def trim_text(text):
            tokens = toko_tokenizer.tokenize(text)
            tokens = [token.strip() for token in tokens]
            return ' '.join(tokens)
        
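        def remove_non_english(text):
            # NOTE: `d` is assumed to be a spell-check dictionary object exposing
            # .check(word) (for example a pyenchant dictionary). It is not defined
            # in this notebook, and this helper is not called in the pipeline below.
            tokens = toko_tokenizer.tokenize(text)
            tokens = [token.strip() for token in tokens]
            tokens = [token for token in tokens if d.check(token)]
            eng_text = ' '.join(tokens)
            return eng_text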

        text_norm = clean_text(text)
        text_norm = clean_numbers(text_norm)
        text_norm = replace_contractions(text_norm)
#         text_norm = remove_stopwords(text_norm)
#         text_norm = remove_non_english(text_norm)
        text_norm = lemmatizer(text_norm)
        text_norm = trim_text(text_norm)
        return text_norm

In [328]:
question_pairs.head()

Unnamed: 0,id,qid1,qid2,question1,question2,is_duplicate
0,0,1,2,What is the step by step guide to invest in sh...,What is the step by step guide to invest in sh...,0
1,1,3,4,What is the story of Kohinoor (Koh-i-Noor) Dia...,What would happen if the Indian government sto...,0
2,2,5,6,How can I increase the speed of my internet co...,How can Internet speed be increased by hacking...,0
3,3,7,8,Why am I mentally very lonely? How can I solve...,Find the remainder when [math]23^{24}[/math] i...,0
4,4,9,10,"Which one dissolve in water quikly sugar, salt...",Which fish would survive in salt water?,0


In [329]:
question_pairs.shape

(404351, 6)

In [293]:
embedding_path = "./../../Embeddings/glove.twitter.27B/glove.twitter.27B.200d.txt"
def get_word2vec(file_path):
    """Load GloVe vectors from a text file into a {word: vector} dict."""
    word2vec = dict()
    try:
        # Stream the file line by line instead of reading it all into memory
        with open(file_path, "r", encoding="utf-8") as file:
            for line in file:
                values = line.rstrip().split(' ')
                word2vec[values[0]] = np.array(values[1:], dtype=float)
    except FileNotFoundError:
        print("invalid file path")
    return word2vec
w2v = get_word2vec(embedding_path)

In [352]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Fit the tokenizer on the normalized text of both question columns
total_text = pd.concat([question_pairs['question1'], question_pairs['question2']]).reset_index(drop=True)
total_text = total_text.apply(lambda x: str(x))
total_text = total_text.apply(lambda x: normalize_text(x))
max_features = 6000
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(total_text)
# Convert each question to a sequence of word indices
question_1_sequenced = tokenizer.texts_to_sequences(question_pairs['question1'].apply(lambda x: normalize_text(x)))
question_2_sequenced = tokenizer.texts_to_sequences(question_pairs['question2'].apply(lambda x: normalize_text(x)))
vocab_size = len(tokenizer.word_index) + 1

In [353]:
vocab_size

92423

In [354]:
maxlen = 100
question_1_padded = pad_sequences(question_1_sequenced, maxlen=maxlen)
question_2_padded = pad_sequences(question_2_sequenced, maxlen=maxlen)

In [359]:
y = question_pairs['is_duplicate']

In [381]:
from tqdm import tqdm

In [384]:
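from numpy import zeros
# One row per vocabulary word; 200 columns to match the 200d GloVe vectors
embedding_matrix = zeros((vocab_size, 200))
for word, i in tokenizer.word_index.items():
    embedding_vector = w2v.get(word)
    if embedding_vector is not None:
        # Copy the full vector (not just its first component) into row i
        embedding_matrix[i] = embedding_vector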

In [357]:
from keras.layers import Input, Embedding, Bidirectional, LSTM, GlobalMaxPool1D, Dense, Dropout, dot
from keras.models import Model

embedding_size = 200   # must match the GloVe vector dimension
max_len = 100

# Two inputs, one per question in the pair
inp1 = Input(shape=(max_len,))
inp2 = Input(shape=(max_len,))

x1 = Embedding(vocab_size, embedding_size, weights=[embedding_matrix], input_length=max_len)(inp1)
x2 = Embedding(vocab_size, embedding_size, weights=[embedding_matrix], input_length=max_len)(inp2)

x3 = Bidirectional(LSTM(32, return_sequences = True))(x1)
x4 = Bidirectional(LSTM(32, return_sequences = True))(x2)

x5 = GlobalMaxPool1D()(x3)
x6 = GlobalMaxPool1D()(x4)

# Dot product of the two pooled question encodings (one value per pair)
x7 =  dot([x5, x6], axes=1)

x8 = Dense(40, activation='relu')(x7)
x9 = Dropout(0.05)(x8)
x10 = Dense(10, activation='relu')(x9)
output = Dense(1, activation="sigmoid")(x10)

model = Model(inputs=[inp1, inp2], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
batch_size = 256
epochs = 4

In [360]:
model.fit([question_1_padded, question_2_padded], y, batch_size=batch_size, epochs=epochs, validation_split=0.2)

Train on 323480 samples, validate on 80871 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1bd352da0>

--------
## Combining candidate generation and reranking

In [361]:
query

'How will Indian GDP be affected from banning 500 and 1000 rupees notes?'

In [363]:
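# Apply the same normalization that was used on the training text before tokenizing
query_copy = [normalize_text(query)] * len(relevant_docs)
question_1_sequenced_final = tokenizer.texts_to_sequences(query_copy)
question_2_sequenced_final = tokenizer.texts_to_sequences([normalize_text(doc) for doc in relevant_docs])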

In [364]:
maxlen = 100
question_1_padded_final = pad_sequences(question_1_sequenced_final, maxlen=maxlen)
question_2_padded_final = pad_sequences(question_2_sequenced_final, maxlen=maxlen)

In [365]:
preds_test = model.predict([question_1_padded_final, question_2_padded_final])
preds_test = np.array([x[0] for x in preds_test])

In [390]:
[relevant_docs[x] for x in preds_test.argsort()[::-1]][:10]

['What do you think about banning 500 and 1000 rupee notes in India?',
 'What will be the implications of banning 500 and 1000 rupees currency notes on Indian economy?',
 'What will be the consequences of 500 and 1000 rupee notes banning?',
 'What will be the effects after banning on 500 and 1000 rupee notes?',
 'What will be the impact on real estate by banning 500 and 1000 rupee notes from India?',
 'How is banning 500 and 1000 INR going to help Indian economy?',
 'What are your views on India banning 500 and 1000 notes? In what way it will affect Indian economy?',
 'How is discontinuing 500 and 1000 rupee note going to put a hold on black money in India?',
 'What will be the result of banning 500 and 1000 rupees note in India?',
 'What are the economic implications of banning 500 and 1000 rupee notes?']
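
The two-stage flow used in this section (retrieve a shortlist cheaply, then reorder it with a more expensive pairwise model) can be summarized as one small function. This is only a sketch of the pattern: `retrieve_then_rerank`, `toy_retrieve` and `toy_score` are hypothetical names, and the toy word-overlap scorer merely stands in for the Faiss search and the BiLSTM model used above.

```python
def retrieve_then_rerank(query, retrieve, score, n_candidates=50, top_n=10):
    # Stage 1: cheap retrieval of a candidate shortlist
    candidates = retrieve(query, n_candidates)
    # Stage 2: score every (query, candidate) pair with the reranking model
    scored = [(score(query, candidate), candidate) for candidate in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:top_n]]

# Toy stand-ins so the sketch runs without Faiss or Keras
rerank_toy_corpus = [
    "which fish would survive in salt water",
    "what is the purpose of life",
    "what is the meaning of life",
]

def toy_retrieve(query, n):
    # Stand-in for the vector index search: return the first n documents
    return rerank_toy_corpus[:n]

def toy_score(query, candidate):
    # Stand-in for the pairwise model: Jaccard overlap of word sets
    q, c = set(query.split()), set(candidate.split())
    return len(q & c) / len(q | c)

reranked = retrieve_then_rerank(
    "what is the purpose of life", toy_retrieve, toy_score, n_candidates=3, top_n=2
)
print(reranked)
```

Keeping retrieval and scoring as separate callables mirrors the design above: the index only has to be good enough to put the true duplicates in the shortlist, and the pairwise model only has to score a few dozen pairs per query.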