# Answer classification for boolean questions

In this notebook, we look at answer (evidence) classification, the component of the TyDiQA pipeline that decides whether a boolean question should be answered `yes` or `no`, based on a passage selected by the machine reading comprehension component.

## Preliminaries
We assume that the machine reading comprehension and question type classifier components of the TyDiQA pipeline have already been run, either through the integrated command line or through the step-by-step process, both described [here](../../primeqa/boolqa/README.md), and that the output directory was `base`.

First, some setup. The classifier obtains its input from the `qtc/predictions.json` file produced by the question type classifier.
Most of this setup is very similar to the setup for [mrc](../mrc/mrc.ipynb).

In [2]:
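output_dir="out"
base="base"  # output directory of the upstream MRC and question type classifier runs; adjust to your setup
input_file=f"{base}/qtc/predictions.json"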

from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    HfArgumentParser,
    Trainer,
    TrainingArguments)
from transformers.trainer_utils import set_seed
from primeqa.text_classification.processors.postprocessors.text_classifier import TextClassifierPostProcessor
from primeqa.text_classification.processors.preprocessors.text_classifier import TextClassifierPreProcessor
from primeqa.boolqa.processors.dataset.mrc2dataset import create_dataset_from_run_mrc_output, create_dataset_from_json_str
import pandas as pd

seed = 42
set_seed(seed)

training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    do_train=False,
    do_eval=True,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=1,
    num_train_epochs=1,
    evaluation_strategy='no',
    learning_rate=4e-05,
    warmup_ratio=0.1,
    weight_decay=0.1,
    save_steps=50000,
    seed=seed,
)

## Set up the auxiliary classes

These are the same types of classes that are used in the MRC system. The `sentence1_key` and `sentence2_key` arguments to the preprocessor specify that the evidence classifier will predict `yes` or `no` based on the question and the (long) passage answer produced by the upstream MRC system. In general, the minimal (short) answers are too short to support reasonable predictions.

In [3]:
# The config, tokenizer, and model are all loaded from the same checkpoint.
config = AutoConfig.from_pretrained('PrimeQA/tydi-boolean_answer_classifier-xlmr_large-20221117', num_labels=2, use_auth_token=True)

tokenizer=AutoTokenizer.from_pretrained('PrimeQA/tydi-boolean_answer_classifier-xlmr_large-20221117', use_fast=True, use_auth_token=True)

model = AutoModelForSequenceClassification.from_pretrained('PrimeQA/tydi-boolean_answer_classifier-xlmr_large-20221117', config=config, use_auth_token=True)

label_list=['no', 'yes']

postprocessor_class = TextClassifierPostProcessor
postprocessor = postprocessor_class(
    k=10,       
    drop_label=None,
    label_list = label_list,
    id_key='example_id',
    output_label_prefix='boolean_answer'
)

preprocessor_class = TextClassifierPreProcessor
preprocessor = preprocessor_class(
    example_id_key='example_id',
    sentence1_key='question',
    sentence2_key='passage_answer_text',
    tokenizer=tokenizer,
    load_from_cache_file=False,
    max_seq_len=500,
    padding=False,
    language_key='en',
    label_list=label_list
)

## Inputs
Here we create a dataset from the input file, which is the output file of the question type classifier. For illustrative purposes, we filter it to focus on the English questions that have been predicted to be boolean.

In [4]:
examples=create_dataset_from_run_mrc_output(input_file, unpack=False)
examples=examples.filter(lambda x:x['language']=='english' and x['question_type_pred']=='boolean')
eval_examples, eval_dataset = preprocessor.process_eval(examples)
eval_examples

Dataset({
    features: ['example_id', 'cls_score', 'start_logit', 'end_logit', 'span_answer', 'span_answer_score', 'start_index', 'end_index', 'passage_index', 'target_type_logits', 'span_answer_text', 'yes_no_answer', 'start_stdev', 'end_stdev', 'query_passage_similarity', 'normalized_span_answer_score', 'confidence_score', 'question', 'language', 'passage_answer_text', 'order', 'rank', 'question_type_pred', 'question_type_scores', 'question_type_conf'],
    num_rows: 113
})

## Run the predictions
As in MRC, a `Trainer` instance runs the predictions.

In [5]:
trainer = Trainer( 
    model=model,
    args=training_args,
    train_dataset=None,
    eval_dataset=eval_dataset,
    compute_metrics=None, #compute_metrics,
    tokenizer=tokenizer,
    data_collator=None,
)
predictions = trainer.predict(eval_dataset, metric_key_prefix="predict").predictions



## Predictions

Each row holds the raw scores (logits) for one example. Column 0 corresponds to `no` and column 1 to `yes`, following the order of the `label_list` variable.

In [6]:
pd.DataFrame.from_records(predictions[0:5,:])

|   | 0 | 1 |
|---|---|---|
| 0 | -5.019741 | 4.272789 |
| 1 | 4.270126 | -3.877274 |
| 2 | 4.284977 | -3.864507 |
| 3 | -2.675221 | 2.365314 |
| 4 | -2.605473 | 2.228158 |
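
To read these scores as probabilities, one option is to apply a softmax across each row and take the larger entry as the predicted label. The sketch below does this in plain Python for the five rows shown above. The helper names `softmax` and `predict_label` are illustrative and are not part of PrimeQA, whose postprocessor may compute its confidence values differently.

```python
import math

label_list = ['no', 'yes']  # same order as the score columns

# The five rows of raw scores (logits) displayed above.
logits = [
    [-5.019741, 4.272789],
    [4.270126, -3.877274],
    [4.284977, -3.864507],
    [-2.675221, 2.365314],
    [-2.605473, 2.228158],
]

def softmax(row):
    """Convert one row of logits into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(row):
    """Return the label whose logit is largest (hypothetical helper)."""
    return label_list[row.index(max(row))]

probs = [softmax(row) for row in logits]
labels = [predict_label(row) for row in logits]

for label, row_probs in zip(labels, probs):
    print(label, [round(p, 4) for p in row_probs])
```

Because softmax preserves the ordering of the scores, the predicted label is the same whether it is taken from the logits or from the probabilities.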


In [7]:
eval_preds = postprocessor.process(eval_examples, eval_dataset, predictions)
eval_preds_ds = create_dataset_from_json_str(eval_preds.predictions, False)
print(eval_preds_ds)

in process
Dataset({
    features: ['example_id', 'cls_score', 'start_logit', 'end_logit', 'span_answer', 'span_answer_score', 'start_index', 'end_index', 'passage_index', 'target_type_logits', 'span_answer_text', 'yes_no_answer', 'start_stdev', 'end_stdev', 'query_passage_similarity', 'normalized_span_answer_score', 'confidence_score', 'question', 'language', 'passage_answer_text', 'order', 'rank', 'question_type_pred', 'question_type_scores', 'question_type_conf', 'boolean_answer_pred', 'boolean_answer_scores', 'boolean_answer_conf'],
    num_rows: 113
})


## Questions and answers

Here we display some questions that have been identified as boolean, along with their predicted answers, based on the output of the upstream MRC system. A weakness of the TyDiQA dataset is that most (85%) of the boolean questions have an answer of `yes`; apparently the question writers wrote questions seeking confirmation of what they already knew or suspected. We display the `passage_answer_text` that was automatically extracted by the upstream MRC system.
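
Because of this imbalance, a system that always answers `yes` already scores well on accuracy, so classifier results are best read against that baseline. The sketch below computes the majority-class baseline on a made-up label list with the same 85/15 split. The `gold_labels` data and the `majority_baseline_accuracy` helper are illustrative assumptions, not TyDiQA data or PrimeQA code.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)

# Hypothetical gold labels: 85 `yes` and 15 `no`, mirroring the split described above.
gold_labels = ['yes'] * 85 + ['no'] * 15

baseline = majority_baseline_accuracy(gold_labels)
print(f"majority baseline accuracy: {baseline:.2f}")
```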


In [8]:
from numpy.random import permutation
import pandas as pd
from IPython.display import display, HTML

# Based on https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb
def show_balanced_examples(dataset, perm, groups, nrows, maxchars, cols):
    """Display up to `nrows` shuffled rows for each value of the `groups` column."""
    df = pd.DataFrame(dataset)
    dfp = df.iloc[perm]  # shuffle the rows using the given permutation
    dfg = dfp.groupby(groups)
    # head() on a groupby keeps the first `nrows` rows of every group;
    # copy so that the truncation below does not write into a view
    df_todisplay = dfg.head(nrows)[cols].copy()
    if 'passage_answer_text' in cols:
        # truncate long passages for display
        df_todisplay['passage_answer_text'] = df_todisplay['passage_answer_text'].str.slice(0, maxchars) + '...'
    display(HTML(df_todisplay.to_html()))
    
    

english_boolean_eval_examples = eval_preds_ds.filter(lambda x:x['language']=='english' and x['question_type_pred']=='boolean')
random_idxs = permutation(len(english_boolean_eval_examples))
cols=['example_id','question','passage_answer_text', 'boolean_answer_pred', 'boolean_answer_scores']
show_balanced_examples(english_boolean_eval_examples, random_idxs, 'boolean_answer_pred', 5, 300, cols)


|   | example_id | question | passage_answer_text | boolean_answer_pred | boolean_answer_scores |
|---|---|---|---|---|---|
| 80 | e021bf05-5d39-4781-96a3-e4aded51f940 | Is popcorn lung linked to vaping? | Bronchiolitis obliterans (BO), also known as popcorn lung and constrictive bronchiolitis, is a disease that results in obstruction of the smallest airways of the lungs (bronchioles) due to inflammation.[1] Symptoms include a dry cough, shortness of breath, wheezing and feeling tired.[1] These sympto... | no | {'no': 4.307797431945801, 'yes': -4.1906046867370605} |
| 4 | 55facded-804e-4728-a3d4-c5d098daf5ab | Does the Magellanic Cloud system have a super massive black hole? | The Large Magellanic Cloud and its neighbour and relative, the Small Magellanic Cloud, are conspicuous objects in the southern hemisphere, looking like separated pieces of the Milky Way to the naked eye. Roughly 21° apart in the night sky, the true distance between them is roughly 75,000 light-years... | yes | {'no': -2.6054728031158447, 'yes': 2.2281577587127686} |
| 40 | ecd427f1-ef0b-41e2-b259-d0dd2c48f4e6 | How do you tell if you have an addictive personality? | An addictive personality refers to a particular set of personality traits that make an individual predisposed to developing addictions.[1] This hypothesis states that there may be common personality traits observable in people suffering from addiction. Alan R. Lang of Florida State University, autho... | yes | {'no': -4.374283790588379, 'yes': 3.593200922012329} |
| 69 | 58dfd53c-26cb-4148-baf0-395d177626bc | Do the Aborigines have a verbal culture? | The Yugambeh (Yugambeh: Miban) are a group of Australian Aboriginal clans whose ancestors all spoke one or more dialects of the Yugambeh language. Their traditional lands are located in south-east Queensland and north-east New South Wales, now within the Logan City, Gold Coast, Scenic Rim, and Tweed... | yes | {'no': -5.008730888366699, 'yes': 4.264303207397461} |
| 10 | 9eef8d9b-536c-482b-bc0c-49051373cb59 | Is Cantonese written the same as Mandarin? | Written Cantonese is the written form of Cantonese, the most complete written form of Chinese after that for Mandarin Chinese and Classical Chinese. Written Chinese was originally developed for Classical Chinese, and was the main literary language of China until the 19th century. Written vernacular... | no | {'no': 4.223395824432373, 'yes': -3.887373924255371} |
| 45 | 375ed26c-a5d5-413e-b672-148010d3eb23 | Does the KGB still exist? | The agency was a military service governed by army laws and regulations, in the same fashion as the Soviet Army or MVD Internal Troops. While most of the KGB archives remain classified, two online documentary sources are available.[1][2] Its main functions were foreign intelligence, counter-intellig... | yes | {'no': -4.376760482788086, 'yes': 3.681140661239624} |
| 70 | 44091813-f673-47b1-902f-6a96557221a7 | Can the central nervous system heal itself? | Nervous system injuries affect over 90,000 people every year.[2] It is estimated that spinal cord injuries alone affect 10,000 each year.[3] As a result of this high incidence of neurological injuries, nerve regeneration and repair, a subfield of neural tissue engineering, is becoming a rapidly grow... | yes | {'no': -3.8311636447906494, 'yes': 3.0920374393463135} |
| 11 | 36af5968-9d7a-4139-a678-531f205db4d3 | Is Hungarian a romance language? | Additionally, the letter pairs ⟨ny⟩, ⟨ty⟩, and ⟨gy⟩ represent the palatal consonants /ɲ/, /c/, and /ɟ/ (a little like the "d+y" sounds in British "duke" or American "would you")—a bit like saying "d" with the tongue pointing to the palate.... | no | {'no': 4.077722549438477, 'yes': -3.616605043411255} |
| 98 | 319b8e52-33bc-4f00-8846-06d6657aba96 | Are any BBS servers still running today? | The introduction of inexpensive dial-up internet service and the Mosaic web browser offered ease of use and global access that BBS and online systems did not provide, and led to a rapid crash in the market starting in 1994. Over the next year, many of the leading BBS software providers went bankrupt... | no | {'no': 3.8056693077087402, 'yes': -3.3668763637542725} |
| 36 | 0e86e43c-9892-4a3a-aef0-bf75f539fe30 | Are Urdu and Hindu the same? | The Hindi–Urdu controversy is an ongoing dispute—dating back to the 19th century—regarding the status of Hindi and Urdu as a single language, Hindustani (lit "of Hindustan"), or as two dialects of a single language, and the establishment of a single standard language in certain areas of North India.... | no | {'no': 4.417458534240723, 'yes': -3.884164571762085} |
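
The balanced sampling used by `show_balanced_examples` (shuffle the rows, then keep the first few rows for each predicted label) can be sketched without pandas as follows. The records and the `balanced_sample` helper are hypothetical and only illustrate the idea; they do not reproduce the exact rows shown above.

```python
import random
from collections import Counter, defaultdict

def balanced_sample(records, group_key, nrows, seed=42):
    """Shuffle the records, then keep at most `nrows` per value of `group_key`."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    kept = []
    seen = defaultdict(int)  # number of rows kept so far for each group
    for rec in shuffled:
        group = rec[group_key]
        if seen[group] < nrows:
            seen[group] += 1
            kept.append(rec)
    return kept

# Hypothetical predictions: 30 `yes` and 10 `no`.
records = [
    {'example_id': i, 'boolean_answer_pred': 'no' if i % 4 == 0 else 'yes'}
    for i in range(40)
]

sample = balanced_sample(records, 'boolean_answer_pred', nrows=5)
counts = Counter(rec['boolean_answer_pred'] for rec in sample)
print(counts)
```

Sampling per group rather than from the whole set keeps the minority `no` predictions visible even when `yes` predictions dominate.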
