# Comparative question answering demo

Project by:  
Nikita Borovkov  
Filipp Furaev

External resourses required:

In [None]:
!python3 -m nltk.downloader stopwords
!python3 -m nltk.downloader universal_tagset
!python3 -m spacy download en

In [3]:
from utils.sentence_clearer import clear_sentences, remove_questions
from ml_approach.sentence_preparation_ML import prepare_sentence_DF
from ml_approach.classify import classify_sentences
from utils.es_requester import extract_sentences
from utils.objects import Argument

#import pke
#from pke.unsupervised import MultipartiteRank
import requests
from requests.auth import HTTPBasicAuth
import gensim
import gensim.downloader as api
import numpy as np
import pandas as pd
import pickle
from bert_comp_pred import get_BERT_prediction



### Input two objects to compare

In [4]:
obj_a = "python"
obj_b = "java"

In [5]:
obj_a = Argument(obj_a.lower().strip())
obj_b = Argument(obj_b.lower().strip())

## Look for sentences, containing the requested objects in Elasticsearch

Fill in user and password

In [16]:
def request_elasticsearch(obj_a, obj_b, user, password):
    url = 'http://ltdemos.informatik.uni-hamburg.de/depcc-index/_search?q='
    url += 'text:\"{}\"%20AND%20\"{}\"'.format(obj_a.name, obj_b.name)
    proxies = {"http": "http://185.46.212.97:10015/","https": "https://185.46.212.98:10015/",}
    size = 10000
    
    url += '&from=0&size={}'.format(size)
    response = requests.get(url, auth=HTTPBasicAuth(user, password), proxies=proxies)
    return response

In [9]:
import requests

URL = 'http://ltdemos.informatik.uni-hamburg.de/cam-api'
proxies = {"http": "http://185.46.212.97:10015/","https": "https://185.46.212.98:10015/",}
params = {
            'objectA': 'Moscow',
            'objectB': 'London',
            'fs': str(True).lower()}
response = requests.get(url=URL, params=params, proxies=proxies)

In [7]:
#Write down the name and password for elasticSearch
name = ""
password = ""

In [17]:
json_compl = request_elasticsearch(obj_a, obj_b, name, password)

## Preparing sentences for classificator

In [18]:
all_sentences = extract_sentences(json_compl)
remove_questions(all_sentences)
prepared_sentences = prepare_sentence_DF(all_sentences, obj_a, obj_b)

In [19]:
prepared_sentences.head()

Unnamed: 0,object_a,object_b,sentence


## Using classificator of comparative sentences
The classificator is used in CAM system:  
Paper: https://arxiv.org/abs/1901.05041  
Github: https://github.com/uhh-lt/cam/  

The classifier takes 2 compared objects and a sentence as an input.
The output is one of 3 classes:
- NONE - the sentence does not have comparison in it
- BETTER - the first object in a sentence is better than the second
- WORSE - the first object in a sentence is worse than the second  

Paper: https://arxiv.org/abs/1809.06152


The second option is to use pretrained BERT (BERT training is in another notebook (BERT_classifier.ipynb). It's preffered to run it in google colab).

To run BERT you should download it from https://drive.google.com/file/d/1Hu4XC-N_pt4f10-2Nk8k15jN1HyZK8DX/view?usp=sharing and put it into the "model" folder

In [8]:
# model = "BERT"
model = "bow"

In [9]:
if model == "BERT":
    classification_results = get_BERT_prediction(prepared_sentences)
if model == "bow":
    classification_results = classify_sentences(prepared_sentences, 'bow')

Preparing to convert 5457 examples..
Spawning 11 processes..


HBox(children=(IntProgress(value=0, max=5457), HTML(value='')))




HBox(children=(IntProgress(value=0, description='Prediction', max=171, style=ProgressStyle(description_width='…




We don't need the sentences without comparison

In [10]:
classification_results[classification_results['max'] != 'NONE']

Unnamed: 0,max
464,BETTER
481,BETTER
537,BETTER
538,BETTER
550,BETTER
...,...
5274,BETTER
5309,BETTER
5330,BETTER
5399,BETTER


In [11]:
prepared_sentences[classification_results['max'] != 'NONE']

Unnamed: 0,object_a,object_b,sentence
464,java,python,Java 8X Faster than Python
481,java,python,Java 5.3X Faster than Python
537,python,java,"When I use Python, I write Python, not Java."
538,python,java,I write Python code in Python not Java code us...
550,java,python,Java 8X Faster than Python .
...,...,...,...
5274,python,java,throw a Python RuntimeError instead of a Java ...
5309,java,python,I'm writing this project in Java instead of Py...
5330,java,python,Java is a whole lot more predictible than Python.
5399,python,java,"Python has classical OOP, kinda like C++ or Java."


Uniting the comparative sentences and results of classification into one dataframe

In [12]:
comparative_sentences = prepared_sentences[classification_results['max'] != 'NONE']

In [13]:
comparative_sentences['max'] = classification_results[classification_results['max'] != 'NONE']['max']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [14]:
comparative_sentences

Unnamed: 0,object_a,object_b,sentence,max
464,java,python,Java 8X Faster than Python,BETTER
481,java,python,Java 5.3X Faster than Python,BETTER
537,python,java,"When I use Python, I write Python, not Java.",BETTER
538,python,java,I write Python code in Python not Java code us...,BETTER
550,java,python,Java 8X Faster than Python .,BETTER
...,...,...,...,...
5274,python,java,throw a Python RuntimeError instead of a Java ...,BETTER
5309,java,python,I'm writing this project in Java instead of Py...,BETTER
5330,java,python,Java is a whole lot more predictible than Python.,BETTER
5399,python,java,"Python has classical OOP, kinda like C++ or Java.",BETTER


## Getting aspects from gathered sentences

### Keywords approach
We unite the sentences into a single document and look for keywords in that document using PKE

In [15]:
text = prepared_sentences[classification_results['max'] != 'NONE']['sentence'].str.cat(sep=' ')

In [16]:
extractor = MultipartiteRank()
extractor.load_document(input=text, language="en", normalization='stemming')

extractor.candidate_selection(pos={'NOUN', 'PROPN', 'ADJ'})

extractor.candidate_weighting()

keyphrases = extractor.get_n_best(n=-1, stemming=False)

Here are our keyphrases

In [17]:
keyphrases

[('java 8x', 0.24749249572409165),
 ('python', 0.19022007379986342),
 ('java', 0.07702735192956407),
 ('python syntax', 0.034760281504254295),
 ('python code', 0.03403017697337611),
 ('java code', 0.01556818580647329),
 ('faster', 0.008886322455805515),
 ('slower', 0.008357065934817541),
 ('easier', 0.0068932419619071985),
 ('java programs', 0.006539830762890901),
 ('startup time', 0.006483490826817549),
 ('better programmers', 0.006381821755624784),
 ('popular', 0.005426591557860954),
 ('strict indentation rules', 0.0053477982634904855),
 ('javascript', 0.0046697540033665596),
 ('intro programming classes', 0.004669132067498419),
 ('language', 0.004635580728139692),
 ('bytecode vm', 0.004605635466155685),
 ('ruby', 0.0042366545050094165),
 ('times', 0.003844499948726177),
 ('simple', 0.0035921294615379124),
 ('duck typing', 0.003586835262880295),
 ('java first', 0.003558688689640822),
 ('much', 0.003500010064004049),
 ('optimizations', 0.0034299613598155363),
 ('general language prefe

Most of the keyphrases don't look like aspects we need. To extract the needed aspects we use a classifier which is trained to find good aspects.

## Aspect classifier

Loading and preprocessing sentences for training the classifier

In [18]:
names = ["OBJECT A", "OBJECT B", "ASPECT", "MOST FREQUENT RATING", "SENTENCE"]
df_train = pd.read_csv("classification_fine_grained/train_clf_fine_grained.csv", header=None, names=names)
df_test = pd.read_csv("classification_fine_grained/test_clf_fine_grained.csv", header=None, names=names)
df_dev = pd.read_csv("classification_fine_grained/dev_clf_fine_grained.csv", header=None, names=names)

import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer


def get_list_of_tokens(df_texts):
    stop_words=set(stopwords.words('english'))
    wordnet_lemmatizer = WordNetLemmatizer()
    tokens = []
    texts = df_texts["SENTENCE"].values
    for i in range(len(texts)):
        row = texts[i]
        # remove punctuation
        for ch in string.punctuation:
            row = row.replace(ch, " ")
        row = row.replace("   ", " ")
        row = row.replace("  ", " ")
        temp_line = []
        # remove stop words
        for word in row.split():
            if word not in stop_words:
                temp_line.append(word)
        row = ' '.join(temp_line)
        # lemmatization
        temp_line = []
        for word in row.split():
            temp_line.append(wordnet_lemmatizer.lemmatize(word))
        tokens.append(temp_line)
    return tokens

tokens_test = get_list_of_tokens(df_test)
tokens_train = get_list_of_tokens(df_train)
tokens_dev = get_list_of_tokens(df_dev)

df_train['TOKENS'] = pd.Series(tokens_train)
df_dev['TOKENS'] = pd.Series(tokens_dev)
df_test['TOKENS'] = pd.Series(tokens_test)

To vectorise the sentences we use Word2Vec:  
Input of the classifier is a concatenation of 4 embeddings:
- object a embedding
- object b embedding
- aspect embedding
- sentence embedding  

For sentence embedding we use mean of embeddings of its words.  
So, considering w2v dimensionality, we have vecctors of size 1200 as an input.

In [19]:
# w2v_model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
w2v_model = api.load('word2vec-google-news-300')

In [20]:
def create_sentence_embeddings(model, words_list):
    sentence_embedding = []
    for word in words_list:
        try:
            sentence_embedding.append(model[word])
        except KeyError:
            continue
#             print(word + " is not in the vocabulary, skipping...")
    if len(sentence_embedding) == 0:
        sentence_embedding.append(np.zeros(300))
    return np.array(sentence_embedding)

def to_w2v_matrix(df_data, model):
    sent_embs = np.zeros([df_data.shape[0], 300 * 4], dtype='float32')
    for i in range(df_data.shape[0]):
        object_a_embedding = create_sentence_embeddings(model, df_data["OBJECT A"][i].split()).mean(axis=0)
        object_b_embedding = create_sentence_embeddings(model, df_data["OBJECT B"][i].split()).mean(axis=0)
        aspect_embedding = create_sentence_embeddings(model, df_data["ASPECT"][i].split()).mean(axis=0)
        sentence_embedding = create_sentence_embeddings(model, df_data["TOKENS"][i]).mean(axis=0)
        sent_embs[i, :] = np.concatenate((object_a_embedding, object_b_embedding, aspect_embedding, sentence_embedding), axis=0)
    return sent_embs

X_train = to_w2v_matrix(df_train, w2v_model)
X_dev = to_w2v_matrix(df_dev, w2v_model)
X_test = to_w2v_matrix(df_test, w2v_model)

In [21]:
def get_output_for_binary(data):
    return (data['MOST FREQUENT RATING'] != 'BAD').astype('float32').to_numpy()

y_train = get_output_for_binary(df_train)
y_dev = get_output_for_binary(df_dev)
y_test = get_output_for_binary(df_test)

In [22]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report

def report_scores(model, X, y):
    y_pred = model.predict(X)
    acc = accuracy_score(y, y_pred)
    pr = precision_score(y, y_pred, average='weighted')
    re = recall_score(y, y_pred, average='weighted')
    f1 = f1_score(y, y_pred, average='weighted')
    f1_bad, f1_good = f1_score(y, y_pred, average=None)
    print("Accuracy: {:.2f}".format(acc * 100))
    print("Precision: {:.2f}".format(pr * 100))
    print("Recall: {:.2f}".format(re * 100))
    print("F1: {:.2f}".format(f1 * 100))
    print("F1 GOOD: {:.2f}".format(f1_good * 100))
    print("F1 BAD: {:.2f}".format(f1_bad * 100))

Training the classifier: Support Vector Classifier with a linear kernel.

In [23]:
from sklearn.svm import SVC

model = SVC(kernel='linear', gamma='auto')

print("start of fit")
model.fit(X_train, y_train)

# Evaluation
print("Train")
report_scores(model, X_train, y_train)

print("Dev")
report_scores(model, X_dev, y_dev)

start of fit
Train
Accuracy: 96.41
Precision: 96.44
Recall: 96.41
F1: 96.41
F1 GOOD: 96.21
F1 BAD: 96.59
Dev
Accuracy: 78.74
Precision: 78.72
Recall: 78.74
F1: 78.72
F1 GOOD: 80.48
F1 BAD: 76.67


In [24]:
# Evaluation
print("Test")
report_scores(model, X_test, y_test)

Test
Accuracy: 81.91
Precision: 81.99
Recall: 81.91
F1: 81.89
F1 GOOD: 82.43
F1 BAD: 81.36


In [25]:
# save the model
filename = 'asp_clf.pkl'
pickle.dump(model, open(filename, 'wb'))

In [26]:
loaded_model = pickle.load(open(filename, 'rb'))

In [27]:
# Evaluation
print("Test")
report_scores(loaded_model, X_test, y_test)

Test
Accuracy: 81.91
Precision: 81.99
Recall: 81.91
F1: 81.89
F1 GOOD: 82.43
F1 BAD: 81.36


Now we have trained a classifier and are going to process our keyphrases.

To process the keyphrases we need a separate dataframe with sentences for each aspect (keyphrase).

In [28]:
asp_df = pd.DataFrame(columns=['OBJECT A', 'OBJECT B', 'ASPECT', 'SENTENCE', 'max'])
forbidden_phrases = [obj_a.name, obj_b.name, 'better', 'worse']

for index, row in comparative_sentences.iterrows():
    sentence = row['sentence']
    for (keyphrase, score) in keyphrases:
        skip_keyphrase = False
        for phrase in forbidden_phrases:
            if keyphrase == phrase:
                skip_keyphrase = True
                break
        if not skip_keyphrase:
            if keyphrase in sentence:
                asp_df = asp_df.append(
                    {'OBJECT A': row['object_a'],
                     'OBJECT B': row['object_b'],
                     'ASPECT': keyphrase,
                     'SENTENCE': row['sentence'],
                     'max': row['max'],
                    }, ignore_index=True)

In [29]:
asp_df['TOKENS'] = pd.Series(get_list_of_tokens(asp_df))

In [30]:
X_asp = to_w2v_matrix(asp_df, w2v_model)

Applying classifier

In [31]:
y_pred = model.predict(X_asp)

The aspects left after classifier

In [32]:
aspects = asp_df.iloc[np.nonzero(y_pred)[0].tolist()]['ASPECT'].unique()

In [33]:
aspects

array(['syntax', 'faster', 'indentation', 'better language', 'quicker',
       'easier', 'instincts', 'simpler'], dtype=object)

Top 10 keyphrases for comparison

In [34]:
keyphrases[:10]

[('java 8x', 0.24749249572409165),
 ('python', 0.19022007379986342),
 ('java', 0.07702735192956407),
 ('python syntax', 0.034760281504254295),
 ('python code', 0.03403017697337611),
 ('java code', 0.01556818580647329),
 ('faster', 0.008886322455805515),
 ('slower', 0.008357065934817541),
 ('easier', 0.0068932419619071985),
 ('java programs', 0.006539830762890901)]

## Determining winner

First, we need to specify which aspects belong to which object.

In [35]:
obj_a_aspects = []
obj_b_aspects = []
for aspect in aspects:
    rows = asp_df[asp_df['ASPECT']==aspect]
    if obj_a.name == rows.iloc[0]['OBJECT A']:
        obj_a_aspects.append(aspect)
    else:
        obj_b_aspects.append(aspect)

In [36]:
obj_a_aspects

['syntax', 'better language', 'quicker', 'easier', 'simpler']

In [37]:
obj_b_aspects

['faster', 'indentation', 'instincts']

The winner of comparison is the object which has more aspects.

In [38]:
comparing_pair = {}

In [39]:
if len(obj_a_aspects) > len(obj_b_aspects):
    comparing_pair['winner_aspects'] = obj_a_aspects
    comparing_pair['loser_aspects'] = obj_b_aspects
    comparing_pair['winner'] = obj_a.name
    comparing_pair['loser'] = obj_b.name
else:
    comparing_pair['winner_aspects'] = obj_b_aspects
    comparing_pair['loser_aspects'] = obj_a_aspects
    comparing_pair['winner'] = obj_b.name
    comparing_pair['loser'] = obj_a.name

## Generating response

Using templates

In [40]:
from template_generation.template_generation import generate_template

In [41]:
generate_template(comparing_pair, mode="extended")

'I would prefer to use python because it is: first, syntax, second, better language, third, quicker, fourth, easier, fifth, simpler, but java is: first, faster, second, indentation, third, instincts'

Getting a brief summary using text rank.

In [42]:
from gensim.summarization.textcleaner import split_sentences
from gensim.summarization.summarizer import summarize

In [43]:
rows = asp_df[asp_df.ASPECT.isin(aspects)]

In [44]:
sentences = ""
for row in range (rows.shape[0]):
    sentence = asp_df.iloc[row]['SENTENCE'] + " "
    if sentence not in sentences:
        sentences += sentence

In [45]:
if len(split_sentences(sentences)) > 10:
    summary = str(summarize(sentences, split=False, word_count=30))

In [46]:
print(summary)

Why is Java so popular than Python.
Simple: Java is faster than Python.
Python grew six times faster than Java, but Java still has twice the market share of Python.


In [47]:
from Demo import one_liner
import gensim.downloader as api

You can test what the demo by using the oneliner below (it only requires the w2v model)  
response - sentence containing aspects of products generated using templates  
summary - brief summary of sentences gathered from Elasticsearch

In [2]:
# w2v_model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
# w2v_model = api.load('word2vec-google-news-300')

In [48]:
obj_a = "play station"
obj_b = "xbox"
user = "" # username in Elasticsearch
password = "" # password in Elasticsearch

response, summary = one_liner(obj_a, obj_b, user, password, w2v_model)

Requesting Elasticsearch
Preparing sentences
Classifying comparative sentences


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  comparative_sentences['max'] = classification_results[classification_results['max'] != 'NONE']['max']


Looking for keyphrases
Preparing keyphrases for classification
Classifying keyphrases
Determining the winner
Generating response
Generating summary


In [49]:
response

'i came to the conclusion that play station is better, because: much, ill, fun abilities, useful, fun, smart design, better target, free, reliable, much video games, bumpers, rubbish, price tag, price, investors, current market dominance, apprehension, order, powerful consoles, candy, stocking, solder toys, bit bigger, x2 inches, numbers, works, cheaper, free games, better deal, better graphics, touch screen, coarse flipping, resistance bs, liberation, bad system, overall. But it will be useful for you to know that xbox is: play, control, graphics, comparison, better situation, greater sales, form, sale, last, bluray drive, disk, trading games, powerful, console sales, size, cheap, best, hard, secure'

In [50]:
summary

"One great feature on this then, is the free play station network, which is also much more reliable than Xbox LIVE.\nPersonally I prefer the Xbox controller and I've heard that live is a more complete online experience than what play station offers."

## Conclusion

- A demo version of a comparative question answering system was developed
- Using machine learning techniques it allows to get aspects of the compared objects and receive an answer containing them
- Among tested comparative sentences classifier (BETTER, WORSE, NONE) the classifier using bow + xgboost seems to be the most suitable for the system because it is a lot faster than others (infersent and BERT) and its accuracy is not much lower
- As for the aspect classifiers, w2v + SVC was chosen due to the same reasons (the full table with comparison of aspect classifiers is available)
- The aspects received from the system are not always reasonable, so for some pairs of objects the system may return strange results, but in these cases the summary generated with the help of TextRank gives an understanding of the answer
- One of the future work directions is improving aspect extraction (trying sequence labelling and other methods)