# Google QUEST Q&A Labeling
Improving automated understanding of complex question-answer content

Computers are really good at answering questions with single, verifiable answers. But humans are often still better at answering questions about opinions, recommendations, or personal experiences.

Humans are better at addressing subjective questions that require a deeper, multidimensional understanding of context - something computers aren't trained to do well… yet. Questions can take many forms - some have multi-sentence elaborations, others may be simple curiosity or a fully developed problem. They can have multiple intents, or seek advice and opinions. Some may be helpful and others interesting. Some are simply right or wrong.

Unfortunately, it’s hard to build better subjective question-answering algorithms because of a lack of data and predictive models. That’s why the CrowdSource team at Google Research, a group dedicated to advancing NLP and other types of ML science via crowdsourcing, has collected data on a number of these quality scoring aspects.

In this competition, you’re challenged to use this new dataset to build predictive algorithms for different subjective aspects of question-answering. The question-answer pairs were gathered from nearly 70 different websites, in a "common-sense" fashion. Our raters received minimal guidance and training, and relied largely on their subjective interpretation of the prompts. As such, each prompt was crafted in the most intuitive fashion so that raters could simply use their common sense to complete the task. By lessening our dependency on complicated and opaque rating guidelines, we hope to increase the re-use value of this data set. What you see is what you get!

Demonstrating these subjective labels can be predicted reliably can shine a new light on this research area. Results from this competition will inform the way future intelligent Q&A systems will get built, hopefully contributing to them becoming more human-like.

# $\color{blue}{\text{Summary of main results}}$

# - Each question_title, question_body and answer_body is fed into a pretrained BERT model

# - The last-layer BERT outputs for all three inputs are concatenated along the sequence dimension and fed into a two-layer LSTM, which is followed by a dense layer with 30 outputs; only the outputs at the final time step are kept

# - The BERT model below is used only in evaluation mode (bert_model.eval()); its weights are not fine-tuned

# - We get a Spearman's correlation coefficient of up to 0.35 using this method

# - The results improve by using some tricks along the lines of 
https://github.com/rapat82/ReOrNot/blob/master/realornot-bertbase.ipynb

# - Obtaining sentence embeddings by just using the [CLS] token output from the last BERT layer typically does not constitute a good sentence representation

# - Training many such models and combining their predictions using a simple average increases performance

# - To run this notebook locally, change the paths of the data files and of the pretrained BERT model files accordingly
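
The simple-average ensembling mentioned in the summary can be sketched as follows. This is a minimal, dependency-free illustration and not code from this notebook: the function name `average_predictions` and the toy prediction matrices are hypothetical, and it assumes every model produces a matrix of the same shape (samples by target columns).

```python
def average_predictions(model_preds):
    """Element-wise mean of several models' prediction matrices.

    model_preds: list with one matrix per model; each matrix is a list of
    rows (samples) holding one float per target column.
    """
    n_models = len(model_preds)
    n_rows = len(model_preds[0])
    n_cols = len(model_preds[0][0])
    return [
        [sum(p[r][c] for p in model_preds) / n_models for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical predictions from three models for 2 samples and 2 targets.
preds_a = [[0.2, 0.9], [0.6, 0.3]]
preds_b = [[0.4, 0.7], [0.8, 0.1]]
preds_c = [[0.3, 0.8], [0.7, 0.2]]
blended = average_predictions([preds_a, preds_b, preds_c])
```

In the notebook's setting, each matrix would be the `(476, 30)` test prediction array of one trained model.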

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

/kaggle/input/google-quest-challenge/train.csv
/kaggle/input/google-quest-challenge/test.csv
/kaggle/input/google-quest-challenge/sample_submission.csv
/kaggle/input/pretrained-bert-models-for-pytorch/bert-large-cased-vocab.txt
/kaggle/input/pretrained-bert-models-for-pytorch/bert-base-uncased-vocab.txt
/kaggle/input/pretrained-bert-models-for-pytorch/bert-base-chinese-vocab.txt
/kaggle/input/pretrained-bert-models-for-pytorch/bert-base-multilingual-cased-vocab.txt
/kaggle/input/pretrained-bert-models-for-pytorch/bert-base-cased-vocab.txt
/kaggle/input/pretrained-bert-models-for-pytorch/bert-base-multilingual-uncased-vocab.txt
/kaggle/input/pretrained-bert-models-for-pytorch/bert-large-uncased-vocab.txt
/kaggle/input/pretrained-bert-models-for-pytorch/bert-base-uncased/pytorch_model.bin
/kaggle/input/pretrained-bert-models-for-pytorch/bert-base-uncased/bert_config.json
/kaggle/input/pretrained-bert-models-for-pytorch/bert-large-uncased/pytorch_model.bin
/kaggle/input/pretrained-bert-mo

In [2]:
from transformers import BertTokenizer, BertConfig, BertModel

In [3]:
qa_data = pd.read_csv('/kaggle/input/google-quest-challenge/train.csv').fillna('')
qa_testdata = pd.read_csv('/kaggle/input/google-quest-challenge/test.csv').fillna('')

In [4]:
q_title = qa_data['question_title'].values
q_body = qa_data['question_body'].values
a_body = qa_data['answer'].values
q_test_title = qa_testdata['question_title'].values
q_test_body = qa_testdata['question_body'].values
a_test_body = qa_testdata['answer'].values

In [5]:
import torch

In [6]:
tokenizer = BertTokenizer.from_pretrained('../input/pretrained-bert-models-for-pytorch/bert-base-uncased-vocab.txt')

In [7]:
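def numericalize(input_list, m_len):
    """Encode each text as BERT token ids, truncated or padded to exactly m_len."""
    output_list = []
    for row in input_list:
        # With return_tensors='pt', encode() returns a tensor of shape (1, m_len)
        output_list.append(tokenizer.encode(row, max_length=m_len, 
                                            truncation_strategy='longest_first', 
                                            pad_to_max_length=True, return_tensors='pt'))
    # Stack to (N, 1, m_len), then drop only the singleton dimension to get (N, m_len)
    output_tensor = torch.stack(output_list).squeeze(1)
    return output_tensor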
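
To make the effect of `max_length` and `pad_to_max_length` concrete, here is a rough, tokenizer-free sketch of what happens to a list of token ids. It is an approximation under stated assumptions, not the library's implementation: the helper `pad_or_truncate` is hypothetical, the input ids are made up, and the special-token ids (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`) are assumed to match the bert-base-uncased vocabulary.

```python
# Assumed special-token ids for the bert-base-uncased vocabulary.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_or_truncate(token_ids, max_len):
    """Wrap ids in [CLS] ... [SEP], cut to max_len, then right-pad with PAD_ID."""
    body = token_ids[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    return ids + [PAD_ID] * (max_len - len(ids))


# Hypothetical token ids: one short input and one that exceeds max_len.
short_ids = pad_or_truncate([7592, 2088], max_len=6)
long_ids = pad_or_truncate(list(range(1000, 1010)), max_len=6)
```

Every row ends up with the same length, which is what lets `torch.stack` build a single `(N, m_len)` tensor per field.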


In [8]:
import time
print(time.ctime())
q_title_num = numericalize(q_title, 30)
q_body_num = numericalize(q_body, 120)
a_body_num = numericalize(a_body, 150)
q_test_title_num = numericalize(q_test_title, 30)
q_test_body_num = numericalize(q_test_body, 120)
a_test_body_num = numericalize(a_test_body, 150)
print(time.ctime())

Sun Feb  9 18:38:19 2020
Sun Feb  9 18:39:46 2020


In [9]:
from torch.utils.data import TensorDataset, DataLoader

In [10]:
sample_submission = pd.read_csv('/kaggle/input/google-quest-challenge/sample_submission.csv')
y = qa_data[sample_submission.columns[1:]].values

In [11]:
len(q_title_num)

6079

In [12]:
perm = np.random.permutation(q_title_num.shape[0])
qtnum_shuffled = np.zeros_like(q_title_num)
qbnum_shuffled = np.zeros_like(q_body_num)
abnum_shuffled = np.zeros_like(a_body_num)
labels_shuffled = np.zeros_like(y)
np.take(q_title_num,perm,axis=0,out=qtnum_shuffled)
np.take(q_body_num,perm,axis=0,out=qbnum_shuffled)
np.take(a_body_num,perm,axis=0,out=abnum_shuffled)
np.take(y,perm,axis=0,out=labels_shuffled)

array([[1.        , 0.55555556, 0.        , ..., 0.        , 0.        ,
        0.88888889],
       [0.66666667, 0.33333333, 0.        , ..., 0.33333333, 1.        ,
        1.        ],
       [0.88888889, 1.        , 0.33333333, ..., 0.        , 0.66666667,
        0.88888889],
       ...,
       [0.77777778, 0.44444444, 0.        , ..., 0.        , 0.        ,
        0.77777778],
       [1.        , 0.66666667, 0.        , ..., 0.        , 0.33333333,
        0.77777778],
       [1.        , 0.55555556, 0.        , ..., 0.        , 1.        ,
        0.88888889]])

In [13]:
split_frac=0.9
iindex = int(len(q_title_num)*split_frac)
qttrain_x, qbtrain_x, abtrain_x, qtval_x, qbval_x, abval_x = qtnum_shuffled[:iindex], qbnum_shuffled[:iindex], abnum_shuffled[:iindex], qtnum_shuffled[iindex:], qbnum_shuffled[iindex:], abnum_shuffled[iindex:]
train_y, val_y = labels_shuffled[:iindex], labels_shuffled[iindex:] 
test_y = np.zeros((len(q_test_title_num), 30))
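
The shuffle-then-split logic of the two cells above relies on applying one shared permutation to every array so that inputs and labels stay aligned. A minimal sketch of that idea in plain Python follows; the helper `shuffle_and_split`, the fixed seed, and the toy data are assumptions for illustration only.

```python
import random


def shuffle_and_split(columns, split_frac, seed=0):
    """Apply one shared permutation to parallel sequences, then split each one."""
    n = len(columns[0])
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # the same order is reused for every column
    cut = int(n * split_frac)
    train = [[col[i] for i in perm[:cut]] for col in columns]
    val = [[col[i] for i in perm[cut:]] for col in columns]
    return train, val


# Toy data: title "t3" belongs with label 3, and so on.
titles = ["t{}".format(i) for i in range(10)]
labels = list(range(10))
(train_t, train_y), (val_t, val_y) = shuffle_and_split([titles, labels], 0.9)
```

With NumPy arrays the same effect can be had by indexing each array with the permutation, for example `q_title_num[perm]`.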

In [14]:
print(type(qttrain_x))
print(type(q_test_title_num))

<class 'numpy.ndarray'>
<class 'torch.Tensor'>


In [15]:
train_bs = 128
test_bs = 128

In [16]:
train_data = TensorDataset(torch.from_numpy(qttrain_x), torch.from_numpy(qbtrain_x), torch.from_numpy(abtrain_x), torch.from_numpy(train_y))
val_data = TensorDataset(torch.from_numpy(qtval_x), torch.from_numpy(qbval_x), torch.from_numpy(abval_x), torch.from_numpy(val_y))
test_data = TensorDataset(q_test_title_num, q_test_body_num, a_test_body_num, torch.from_numpy(test_y))

In [17]:
train_loader = DataLoader(train_data, shuffle = True, batch_size=train_bs)
valid_loader = DataLoader(val_data, shuffle = True, batch_size=train_bs)
test_loader = DataLoader(test_data, shuffle = False, batch_size=test_bs, drop_last=False)

In [18]:
dataiter = iter(train_loader)
qtx, qbx, abx, label = next(dataiter)
print(qtx.shape)
print(qbx.shape)
print(abx.shape)
print(label.shape)

torch.Size([128, 30])
torch.Size([128, 120])
torch.Size([128, 150])
torch.Size([128, 30])


In [19]:
dataiter = iter(test_loader)
qtx, qbx, abx, label = next(dataiter)
print(qtx.shape)
print(qbx.shape)
print(abx.shape)
print(label.shape)

torch.Size([128, 30])
torch.Size([128, 120])
torch.Size([128, 150])
torch.Size([128, 30])


In [20]:
train_on_gpu=torch.cuda.is_available()

In [21]:
bert_model_config = '../input/pretrained-bert-models-for-pytorch/bert-base-uncased/bert_config.json'
bert_config = BertConfig.from_json_file(bert_model_config)
bert_model = BertModel.from_pretrained('../input/pretrained-bert-models-for-pytorch/bert-base-uncased/', config = bert_config)
bert_model.eval()

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0): BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          

In [22]:
import torch.nn as nn

class QALSTM(nn.Module):
    def __init__(self, seq_len, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        
        super(QALSTM, self).__init__()
        
        self.seq_len = seq_len
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.embedding_dim =embedding_dim
        self.drop_prob = drop_prob
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, 
                           n_layers, dropout = drop_prob,
                           batch_first = True)
        self.dropout = nn.Dropout(drop_prob)
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sigmoid = nn.Sigmoid()
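    def forward(self, x1, x2, hidden):
        batch_size = x1.size(0)
        # Concatenate question and answer token embeddings along the sequence axis
        x = torch.cat((x1, x2), dim=1)
        lstm_output, hidden = self.lstm(x, hidden)
        out = self.dropout(lstm_output)
        out = self.fc(out)
        sigmoid_out = self.sigmoid(out)
        # Flatten to (batch, seq_len * output_size) and keep only the last time step
        sigmoid_out = sigmoid_out.view(batch_size, -1)
        sigmoid_out = sigmoid_out[:, -self.output_size:]
        return sigmoid_out, hidden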

    def init_hidden(self, batch_size):
        
        weight = next(self.parameters()).data
        
        if train_on_gpu:
            hidden=(weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                   weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden=(weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                   weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
            
        return hidden
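
It may not be obvious why `forward` flattens the per-step outputs and then keeps the trailing 30 values. The toy sketch below, using nested lists instead of tensors and made-up sizes, suggests that this slice is simply the dense layer's output at the last time step. The names `flatten`, `per_step`, `flat`, and `tail` are hypothetical.

```python
def flatten(rows):
    """Row-major flattening, like .view(batch_size, -1) applied to one sample."""
    return [value for row in rows for value in row]


seq_len, output_size = 4, 3  # toy sizes; the notebook uses 300 and 30

# per_step[t][k]: hypothetical output k at time step t for a single sample.
per_step = [[10 * t + k for k in range(output_size)] for t in range(seq_len)]

flat = flatten(per_step)     # length seq_len * output_size
tail = flat[-output_size:]   # mirrors sigmoid_out[:, -30:]
```

Under this reading, an equivalent and cheaper formulation would apply the dense layer only to `lstm_output[:, -1, :]`, although that is a possible refactoring rather than what the notebook does.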

In [23]:
output_size = 30
embedding_dim = 768
hidden_dim = 1024
n_layers = 2
seq_len = 300
net = QALSTM(seq_len, output_size, embedding_dim, hidden_dim, n_layers)

In [24]:
lr = 0.001
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
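
The targets in this competition are fractions in [0, 1] rather than hard 0/1 labels, so binary cross-entropy is used here with soft targets. The sketch below spells out the per-element formula in plain Python. The function names are hypothetical, and the `eps` clamp is an assumption for numerical safety that differs from how `nn.BCELoss` guards its logarithms internally.

```python
import math


def bce(pred, target, eps=1e-12):
    """Binary cross-entropy for one prediction against a target in [0, 1]."""
    pred = min(max(pred, eps), 1 - eps)  # avoid log(0)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))


def mean_bce(preds, targets):
    """Average over every element, as nn.BCELoss does with its default reduction."""
    pairs = [(p, t) for p_row, t_row in zip(preds, targets)
             for p, t in zip(p_row, t_row)]
    return sum(bce(p, t) for p, t in pairs) / len(pairs)


perfect = bce(0.3, 0.3)   # prediction equals the soft target
off = bce(0.5, 0.3)       # prediction misses the soft target
batch_loss = mean_bce([[0.5, 0.5]], [[0.0, 1.0]])
```

Note that with a fractional target the loss stays above zero even when the prediction matches the target exactly, so the absolute loss values printed during training are not directly comparable to those of a hard-label classification task.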

In [25]:
from scipy.stats import spearmanr
epochs = 19
clip = 5 # Gradient clipping

# If GPU is available, train on GPU
if (train_on_gpu):
    net.cuda()

net.train()
# The network is now in training mode; train for the configured number of epochs
loss_vs_epoch = []
valloss_vs_epoch = []
score_old = 0.0
valid_loss_old = 100.0
bert_model.to('cuda')
bert_bs = 128
for e in range(0,epochs):
    t1=time.ctime()
    for qtx, qbx, abx, labels in train_loader:
        bert_qt_loader = DataLoader(qtx, shuffle = False, batch_size=bert_bs, drop_last=False)
        bert_qb_loader = DataLoader(qbx, shuffle = False, batch_size=bert_bs, drop_last=False)
        bert_ab_loader = DataLoader(abx, shuffle = False, batch_size=bert_bs, drop_last=False)
        with torch.no_grad():
            qt_embed = []
            qb_embed = []
            ab_embed = []
            for qt in bert_qt_loader:
                qt = qt.to('cuda')
                outputs_qt = bert_model(qt)[0]
                qt_embed.append(outputs_qt)
            for qb in bert_qb_loader:
                qb = qb.to('cuda')
                outputs_qb = bert_model(qb)[0] 
                qb_embed.append(outputs_qb)
            for ab in bert_ab_loader:
                ab = ab.to('cuda')
                outputs_ab = bert_model(ab)[0]
                ab_embed.append(outputs_ab)
            qt_tensor=torch.cat(qt_embed)
            qb_tensor=torch.cat(qb_embed)
            a_feat=torch.cat(ab_embed)
            q_feat = torch.cat((qt_tensor, qb_tensor), dim =1)
        x1=q_feat
        x2=a_feat
        batch_size=qtx.shape[0]
        h = net.init_hidden(batch_size)

        if (train_on_gpu):
            x1, x2, labels = x1.cuda(), x2.cuda(), labels.cuda()
        h = tuple([each.data for each in h])
        net.zero_grad()
        output, h = net(x1, x2, h)
        loss = criterion(output, labels.float())
        loss.backward()
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
    loss_vs_epoch.append([e+1, loss.item()])
    val_losses = []
    net.eval()
    preds = []
    original = []
    for qtvx, qbvx, abvx, labels in valid_loader:
        bert_qt_loader = DataLoader(qtvx, shuffle = False, batch_size=bert_bs, drop_last=False)
        bert_qb_loader = DataLoader(qbvx, shuffle = False, batch_size=bert_bs, drop_last=False)
        bert_ab_loader = DataLoader(abvx, shuffle = False, batch_size=bert_bs, drop_last=False)
        with torch.no_grad():
            qt_embed = []
            qb_embed = []
            ab_embed = []
            for qt in bert_qt_loader:
                qt = qt.to('cuda')
                outputs_qt = bert_model(qt)[0]
                qt_embed.append(outputs_qt)
            for qb in bert_qb_loader:
                qb = qb.to('cuda')
                outputs_qb = bert_model(qb)[0] 
                qb_embed.append(outputs_qb)
            for ab in bert_ab_loader:
                ab = ab.to('cuda')
                outputs_ab = bert_model(ab)[0]
                ab_embed.append(outputs_ab)
            qt_tensor=torch.cat(qt_embed)
            qb_tensor=torch.cat(qb_embed)
            a_feat=torch.cat(ab_embed)
            q_feat = torch.cat((qt_tensor, qb_tensor), dim =1)
        val_x1=q_feat
        val_x2=a_feat
        batch_size = qtvx.shape[0]
        val_h = net.init_hidden(batch_size)
        
        val_h = tuple([each.data for each in val_h])
        if (train_on_gpu):
            val_x1, val_x2, labels = val_x1.cuda(), val_x2.cuda(), labels.cuda()
        output, val_h = net(val_x1, val_x2, val_h)
        val_loss = criterion(output, labels.float())                
        val_losses.append(val_loss.item())
        preds.append(output.cpu().detach().numpy())
        original.append(labels.float().cpu().detach().numpy())
    score = 0
    for i in range(30):
        score += np.nan_to_num(
                spearmanr(np.concatenate(original)[:, i], 
                          np.concatenate(preds)[:, i]).correlation / 30)
    valid_loss = np.mean(val_losses)
    valloss_vs_epoch.append([e+1, np.mean(val_losses)])
    if score > score_old:
        score_old = score
        model_name = 'best_model.net'
        checkpoint = {'output_size': net.output_size,
                     'embedding_dim': net.embedding_dim,
                     'hidden_dim': net.hidden_dim,
                     'n_layers':net.n_layers,
                     'state_dict': net.state_dict()}
        with open(model_name, 'wb') as f:
            torch.save(checkpoint, f)    
    net.train()
    t2=time.ctime()
    #etime = t2 - t1
    print( "Epoch: {}/{}---finished---score:{}---time:{}, {}".format(e+1,epochs, score, t2, t1))

loss_vs_epoch = np.array(loss_vs_epoch)
valloss_vs_epoch = np.array(valloss_vs_epoch)



Epoch: 1/19---finished---score:0.12706756350622106---time:Sun Feb  9 18:41:14 2020, Sun Feb  9 18:39:58 2020
Epoch: 2/19---finished---score:0.21214785115791956---time:Sun Feb  9 18:42:30 2020, Sun Feb  9 18:41:14 2020
Epoch: 3/19---finished---score:0.22457610824814958---time:Sun Feb  9 18:43:46 2020, Sun Feb  9 18:42:30 2020
Epoch: 4/19---finished---score:0.23196185373849876---time:Sun Feb  9 18:45:02 2020, Sun Feb  9 18:43:46 2020
Epoch: 5/19---finished---score:0.24493890957571007---time:Sun Feb  9 18:46:19 2020, Sun Feb  9 18:45:02 2020
Epoch: 6/19---finished---score:0.2594900885056537---time:Sun Feb  9 18:47:35 2020, Sun Feb  9 18:46:19 2020
Epoch: 7/19---finished---score:0.2706110362480822---time:Sun Feb  9 18:48:51 2020, Sun Feb  9 18:47:35 2020
Epoch: 8/19---finished---score:0.27509257309712876---time:Sun Feb  9 18:50:07 2020, Sun Feb  9 18:48:51 2020
Epoch: 9/19---finished---score:0.2862974360141058---time:Sun Feb  9 18:51:23 2020, Sun Feb  9 18:50:07 2020
Epoch: 10/19---finishe
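
The validation score printed above is the mean of the column-wise Spearman correlations over the 30 targets, with undefined correlations counted as zero. As a rough illustration of what that computes, here is a dependency-free sketch; it is not SciPy's implementation, the function names are hypothetical, and returning 0.0 for a constant column is an assumption meant to mirror the `np.nan_to_num` call in the training loop.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the ranks; 0.0 when either column is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # stands in for nan_to_num applied to an undefined correlation
    return cov / (vx * vy) ** 0.5


def mean_columnwise_spearman(targets, preds):
    """Average the per-column Spearman correlation over all target columns."""
    n_cols = len(targets[0])
    total = 0.0
    for c in range(n_cols):
        total += spearman([row[c] for row in targets], [row[c] for row in preds])
    return total / n_cols


# Toy example: column 0 is perfectly ranked, column 1 has constant targets.
toy_targets = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
toy_preds = [[0.1, 0.2], [0.4, 0.9], [0.8, 0.5]]
toy_score = mean_columnwise_spearman(toy_targets, toy_preds)
```

Because only ranks matter, any monotonic rescaling of a prediction column leaves its contribution to the score unchanged.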

In [26]:
print(valloss_vs_epoch)

[[ 1.          0.42199213]
 [ 2.          0.41335413]
 [ 3.          0.41024854]
 [ 4.          0.40622871]
 [ 5.          0.40145983]
 [ 6.          0.4034024 ]
 [ 7.          0.39705639]
 [ 8.          0.39863102]
 [ 9.          0.39372438]
 [10.          0.39002605]
 [11.          0.38912213]
 [12.          0.38817714]
 [13.          0.38669078]
 [14.          0.38744558]
 [15.          0.38238826]
 [16.          0.38358501]
 [17.          0.38155003]
 [18.          0.38250473]
 [19.          0.3796507 ]]


In [27]:
with open('best_model.net', 'rb') as f:
    checkpoint = torch.load(f)
loaded = QALSTM(seq_len, checkpoint['output_size'], checkpoint['embedding_dim'],
                     checkpoint['hidden_dim'],checkpoint['n_layers'])
loaded.load_state_dict(checkpoint['state_dict'])

<All keys matched successfully>

In [28]:
loaded.cuda()
loaded.eval()
preds = []
for qttx, qbtx, abtx, labels in test_loader:    
    bert_qt_loader = DataLoader(qttx, shuffle = False, batch_size=bert_bs, drop_last=False)
    bert_qb_loader = DataLoader(qbtx, shuffle = False, batch_size=bert_bs, drop_last=False)
    bert_ab_loader = DataLoader(abtx, shuffle = False, batch_size=bert_bs, drop_last=False)
    with torch.no_grad():
        qt_embed = []
        qb_embed = []
        ab_embed = []
        for qt in bert_qt_loader:
            qt = qt.to('cuda')
            outputs_qt = bert_model(qt)[0]
            qt_embed.append(outputs_qt)
        for qb in bert_qb_loader:
            qb = qb.to('cuda')
            outputs_qb = bert_model(qb)[0] 
            qb_embed.append(outputs_qb)
        for ab in bert_ab_loader:
            ab = ab.to('cuda')
            outputs_ab = bert_model(ab)[0]
            ab_embed.append(outputs_ab)
        qt_tensor=torch.cat(qt_embed)
        qb_tensor=torch.cat(qb_embed)
        a_feat=torch.cat(ab_embed)
        q_feat = torch.cat((qt_tensor, qb_tensor), dim =1)
    test_x1=q_feat
    test_x2=a_feat
    batch_size=qttx.shape[0]
    test_h = loaded.init_hidden(batch_size)
    if (train_on_gpu):
        test_x1, test_x2, labels = test_x1.cuda(), test_x2.cuda(), labels.cuda()
    test_h = tuple([each.data for each in test_h])
    output, test_h = loaded(test_x1, test_x2, test_h)
    # Detach from the graph and move to the CPU before collecting
    preds.append(output.detach().cpu())

In [29]:
submit_list = []
for i in range(len(test_loader)):
    submit_list.append(preds[i].numpy().squeeze())

In [30]:
nsub = np.concatenate(submit_list)
print(nsub.shape)

(476, 30)


In [31]:
TARGET = ['question_asker_intent_understanding',
          'question_body_critical',
          'question_conversational',
          'question_expect_short_answer',
          'question_fact_seeking',
          'question_has_commonly_accepted_answer',
          'question_interestingness_others',
          'question_interestingness_self',
          'question_multi_intent',
          'question_not_really_a_question',
          'question_opinion_seeking',
          'question_type_choice',
          'question_type_compare',
          'question_type_consequence',
          'question_type_definition',
          'question_type_entity',
          'question_type_instructions',
          'question_type_procedure',
          'question_type_reason_explanation',
          'question_type_spelling',
          'question_well_written',
          'answer_helpful',
          'answer_level_of_information',
          'answer_plausible',
          'answer_relevance',
          'answer_satisfaction',
          'answer_type_instructions',
          'answer_type_procedure',
          'answer_type_reason_explanation',
          'answer_well_written']

In [32]:
submission = pd.DataFrame(nsub, columns=TARGET).clip(0.00001, 0.999999)
submission.insert(0,'qa_id',qa_testdata['qa_id'].values)
submission.to_csv("submission.csv", index = False)

In [33]:
for column in submission.columns:
    print(column,":",sum(submission[column].between(0.00001,0.999999)))

qa_id : 0
question_asker_intent_understanding : 476
question_body_critical : 476
question_conversational : 476
question_expect_short_answer : 476
question_fact_seeking : 476
question_has_commonly_accepted_answer : 476
question_interestingness_others : 476
question_interestingness_self : 476
question_multi_intent : 476
question_not_really_a_question : 476
question_opinion_seeking : 476
question_type_choice : 476
question_type_compare : 476
question_type_consequence : 476
question_type_definition : 476
question_type_entity : 476
question_type_instructions : 476
question_type_procedure : 476
question_type_reason_explanation : 476
question_type_spelling : 476
question_well_written : 476
answer_helpful : 476
answer_level_of_information : 476
answer_plausible : 476
answer_relevance : 476
answer_satisfaction : 476
answer_type_instructions : 476
answer_type_procedure : 476
answer_type_reason_explanation : 476
answer_well_written : 476
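
The check above confirms that every prediction lies inside the clipping range. Since the score is a rank correlation, a column whose predictions are all identical has no defined correlation, so an additional sanity check for constant columns may be worthwhile. The sketch below is an assumption-laden illustration and not part of the notebook: the helper names and the two toy column names are hypothetical.

```python
def clip(value, low=0.00001, high=0.999999):
    """Clamp a prediction into the same range used for the submission."""
    return min(max(value, low), high)


def constant_columns(rows, names):
    """Names of columns whose values are all identical across rows."""
    return [name for c, name in enumerate(names)
            if len({row[c] for row in rows}) == 1]


names = ["target_a", "target_b"]  # hypothetical column names
raw = [[0.0, 0.7], [1.0, 0.7], [0.4, 0.7]]
clipped = [[clip(v) for v in row] for row in raw]
flagged = constant_columns(clipped, names)
```

With a pandas DataFrame, a comparable check could use `submission.nunique()` on the target columns.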
