# TydiQA - support for boolean questions

Here we assume that you have used `run_mrc.py` with the `--do_boolean` option to decode the TydiQA dataset with full support for boolean questions.  See top-level README.md. There are four stages in the process:

1. MRC (machine reading comprehension) - given a question and a passage, find a representative span that may contain a short answer.  This is analyzed in detail in `tydiqa.ipynb`.
2. QTC (question type classifier) - given the question, decide whether it is `boolean` or `short_answer`.
3. EVC (evidence classifier) - given a question and a short answer span, decide whether the span supports `yes` or `no`.  This is analyzed in more detail in `evc.ipynb`.
4. Score normalization - span scores may have different dynamic ranges depending on whether the question is `boolean` or `short_answer`.  Normalize them uniformly to $[0,1]$.

In this notebook, we show what happens internally at each step by looking at intermediate files from an experiment.

# Intermediate files

We will load some output/intermediate files from a recent experiment.

In [1]:
base='/dccstor/jsmc-nmt-01/bool/expts/toolkit/b/b21'
mrc_file=f'{base}/mrc/eval_predictions.json'
qtc_file=f'{base}/qtc/eval_predictions.json'
evc_file=f'{base}/evc/eval_predictions.json'
out_file=f'{base}/eval_predictions_merge.json'

# Display helper

These files have many fields - to display them better we use a helper routine that converts them to dataframes.

In [2]:
from examples.boolqa.mrc2dataset  import create_dataset_from_run_mrc_output

from datasets import ClassLabel, Sequence
from numpy.random import permutation
import pandas as pd
from IPython.display import display, HTML

# Based on https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb
def show_balanced_examples(dataset, perm, groups, nrows, maxchars, cols):
    df = pd.DataFrame(dataset)
    dfp = df.iloc[perm] # shuffle
    dfg = dfp.groupby(groups)
    df_todisplay = dfg.head(nrows)[cols].copy()  # copy so the truncation below does not write into a view of df
    if 'passage_answer_text' in cols:
        df_todisplay['passage_answer_text'] = df_todisplay['passage_answer_text'].str.slice(0,maxchars) + '...'
    display(HTML(df_todisplay.to_html()))

# Samples of MRC output

Here we show the `question` and the predicted answer `span_answer_text` for one random example per language.  This is the initial stage of question answering - a purely extractive system.  The confidence in the span answer is given by `span_answer_score`, which is a function of `start_logit`, `end_logit`, and `target_type_logits`.

In [3]:
eval_examples=create_dataset_from_run_mrc_output(mrc_file, unpack=False)
random_idxs = permutation(len(eval_examples))

cols=['example_id','question','span_answer_text','language', 'span_answer_score', 'start_logit', 'end_logit', 'target_type_logits']
show_balanced_examples(eval_examples, random_idxs, 'language', 1, 100, cols)

Unnamed: 0,example_id,question,span_answer_text,language,span_answer_score,start_logit,end_logit,target_type_logits
17565,2663bd8c-f3f3-42c7-824e-f7b29c1ac899,ఆర్మేనియా దేశంలో అతిపెద్ద ఆర్ట్ మ్యూజియం ఏది?,అర్మేనియన్ ఎస్ఎస్ఆర్ యొక్క మ్యూజియం ఆఫ్ ఆర్ట్,telugu,0.780762,1.149414,1.539062,"[3.9140625, 3.685546875, 0.86474609375, -5.7109375, -6.61328125]"
16185,35ff8439-ae6c-4d78-8eb0-fc056a259818,How many people work for the British East India Company?,90,english,3.193359,2.351562,2.716797,"[4.21484375, 5.390625, 0.1146240234375, -6.12890625, -7.0234375]"
2911,30c1c548-8b09-41ff-8f6c-b55d4c312f8a,Mihin kieleen hollannin kieli pohjautuu?,Saksa,finnish,4.063965,2.615234,2.996094,"[2.328125, 4.67578125, 2.021484375, -5.2109375, -6.18359375]"
6524,b494e3d0-8e9a-494f-a66c-4fbff541244d,Apa faktor utama Misogini terjadi?,"laki-laki sebagai pemegang kekuasaan utama dan mendominasi dalam peran kepemimpinan politik, otoritas moral, hak sosial dan penguasaan properti",indonesian,-3.078247,-1.200195,-0.358643,"[6.15234375, 3.23828125, -1.236328125, -6.36328125, -6.9296875]"
530,1c7400e5-03a8-438a-b9dc-b69d753b0d8d,"Je,Johann Bayer alikuwa na mke?",Mjerumani Johann Bayer alibuni mfumo wa kutaja nyota ambao kimsingi unaendelea kutumiwa hadi leo[1].,swahili,-5.178467,-5.015625,-0.54834,"[5.97265625, 2.16015625, -1.9482421875, -5.546875, -5.8515625]"
4373,aca5c5e9-8ec7-4c5c-a2bd-7c31a045a5d2,อักษรย่อของ โรงเรียนราชวิทยาลัย คืออะไร?,ภ.ป.ร,thai,3.794922,2.5625,3.171875,"[2.892578125, 4.70703125, 1.771484375, -5.54296875, -6.59765625]"
17166,28daee78-2ded-4b5d-8ea4-163828d2e72b,ডালিয়া গ্রিবাউস্কাইটে কোন রাজনৈতিক দলের নেতা ছিলেন ?,কনজারভেটিভ পার্টি,bengali,4.260254,2.689453,2.818359,"[2.80078125, 4.9921875, 1.8212890625, -5.578125, -6.59375]"
14858,4ec268c2-b81a-4a44-a149-8eacf1358dd2,Где находится Центральный аппарат Министе́рства энергетики и электрификации СССР?,"Москва, 25 Октября, д. 17",russian,11.62793,4.953125,4.945312,"[0.1588134765625, 6.8984375, -0.376708984375, -2.958984375, -3.59765625]"
1866,4c259304-c44a-4607-8f1e-19e653bd8ae4,ما هي البطالة الإحتكاكية؟,نوع من أنواع البطالة المؤقتة (لفترة زمنية قصيرة).,arabic,11.148438,4.453125,3.945312,"[0.1854248046875, 6.91015625, -0.320556640625, -3.013671875, -3.595703125]"
7464,5b24d244-aec1-46fd-a9c5-8e93cdd71cb6,2019년 까지 노벨상을 받은 과학자는 몇 명인가?,2018년 시점에서 수상자가 없음,korean,-0.830688,0.426514,1.8125,"[4.91796875, 3.201171875, -0.4189453125, -5.875, -6.5703125]"


# Samples of QTC output

Three fields have been added.  The first, `question_type_pred`, is `boolean` if the question is a boolean question and `short_answer` if it is not - typically a factoid question in this dataset.
The second field, `question_type_scores`, contains the classifier scores (logits) for each class.  The final field, `question_type_conf`, contains the score of the selected class.

In [4]:
eval_examples=create_dataset_from_run_mrc_output(qtc_file, unpack=False)
english_eval_examples = eval_examples.filter(lambda x:x['language']=='english')
random_idxs = permutation(len(english_eval_examples))
cols=['example_id','question','question_type_pred', 'question_type_scores']
show_balanced_examples(english_eval_examples, random_idxs, 'question_type_pred', 5, 100, cols)


Unnamed: 0,example_id,question,question_type_pred,question_type_scores
986,dc4db7c8-9df6-4ab3-b154-bad9be194507,What is the largest recorded tsunami?,short_answer,"{'boolean': -2.997452974319458, 'short_answer': 3.781531810760498}"
733,233a6264-edd4-4e76-a497-1be970da22bb,Is Pippin the Hunchback dead?,boolean,"{'boolean': 3.4151370525360107, 'short_answer': -4.333227634429932}"
939,d2ae656b-dcda-49dd-986c-429a65bee44d,When was the surrender of Japan signed?,short_answer,"{'boolean': -2.997558116912842, 'short_answer': 3.780714511871338}"
592,3c915c7a-0d44-4cdd-b39e-ff9060af1734,How much room does it take to keep a horse?,short_answer,"{'boolean': -2.9940197467803955, 'short_answer': 3.7767441272735596}"
372,806890de-5814-4e2c-98c2-df8c37a47482,How many copies of Blood Omen: Legacy of Kain were sold?,short_answer,"{'boolean': -2.995069742202759, 'short_answer': 3.7774362564086914}"
404,9db3a975-84ec-4c1b-8c56-c8941fcf5c10,What is the density of a diamond?,short_answer,"{'boolean': -2.997520923614502, 'short_answer': 3.7809488773345947}"
570,c08a47e2-f943-41e3-ba27-9a274e4268d0,Did Austria ever become part of Prussia?,boolean,"{'boolean': 3.4198384284973145, 'short_answer': -4.328428268432617}"
175,3fb65db7-c0b3-47be-87be-112df1b32cb7,Is there tomato in mutter paneer?,boolean,"{'boolean': 3.4148740768432617, 'short_answer': -4.333624362945557}"
65,8b9b5c68-4e76-43ef-bd7c-e2e34b051178,Is there a championship for stock car racing?,boolean,"{'boolean': 3.4153878688812256, 'short_answer': -4.333537578582764}"
480,70b2ef92-450b-4074-978f-01204487ea27,Does the KGB still exist?,boolean,"{'boolean': 3.4159293174743652, 'short_answer': -4.333240509033203}"
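
The relationship between the three QTC fields can be sketched as follows.  This is a minimal illustration, not the toolkit's code: we assume that the selected class is simply the highest-scoring one (which is consistent with every row shown above), and the helper name `pick_question_type` is hypothetical.  The scores are copied from the "Is Pippin the Hunchback dead?" row.

```python
from typing import Dict, Tuple

def pick_question_type(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring class and its score (a logit).

    Assumption: the classifier selects the argmax class; the helper itself
    is hypothetical and not part of the toolkit.
    """
    label = max(scores, key=scores.get)
    return label, scores[label]

# question_type_scores copied from one row of the table above
row_scores = {'boolean': 3.4151370525360107, 'short_answer': -4.333227634429932}

question_type_pred, question_type_conf = pick_question_type(row_scores)
print(question_type_pred, question_type_conf)
```

Under this assumption `question_type_conf` is a raw logit rather than a probability, so it is only comparable across examples scored by the same classifier.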


# Samples of EVC output 
As above, this classifier adds three new fields.  `boolean_answer_pred` is `yes` if the predicted answer to a boolean question is positive, `no` if it is negative, and `no_answer` if the context supports neither answer.  `boolean_answer_scores` provides the scores (logits) for each class, and `boolean_answer_conf` is the score of the selected class.
We select the English questions from the dev set, which are not scored by `tydi_eval.py` and which have a higher fraction of boolean questions.  The boolean questions in the TydiQA dataset are overwhelmingly biased towards having `yes` rather than `no` as the answer.  We suspect that the question writers were attempting to confirm existing knowledge.
Note that the answer classifier runs even on the short answer questions in order to simplify merging; a real deployed system would run the answer classifier only on questions that are predicted to be boolean.

In [5]:
eval_examples=create_dataset_from_run_mrc_output(evc_file, unpack=False)
english_boolean_eval_examples = eval_examples.filter(lambda x:x['language']=='english' and x['question_type_pred']=='boolean')
random_idxs = permutation(len(english_boolean_eval_examples))
cols=['example_id','question','passage_answer_text', 'boolean_answer_pred', 'boolean_answer_scores']
show_balanced_examples(english_boolean_eval_examples, random_idxs, 'boolean_answer_pred', 5, 300, cols)


Unnamed: 0,example_id,question,passage_answer_text,boolean_answer_pred,boolean_answer_scores
57,66c22597-8287-4fe5-8358-767bc4d936d4,Does Donna Troy have any superpowers?,"In her pre-Crisis origin, Donna was granted those powers by the Amazon's Purple Ray, and these powers increased as she grew older. She also wielded a lasso of her own, but it apparently had no magical properties like Diana's Lasso of Truth, aside from being infinite in length and virtually indestruc...",yes,"{'no': -5.461025714874268, 'no_answer': -1.1184296607971191, 'yes': 6.664615631103516}"
10,41471b45-8b0f-4a83-a91d-19f388a544cb,Is Cantonese written the same as Mandarin?,"Written Cantonese is the written form of Cantonese, the most complete written form of Chinese after that for Mandarin Chinese and Classical Chinese. Written Chinese was originally developed for Classical Chinese, and was the main literary language of China until the 19th century. Written vernacular...",no,"{'no': 4.523262023925781, 'no_answer': -5.28380012512207, 'yes': 0.39295539259910583}"
14,0323472e-56b8-4385-ad74-2c2786ca87f5,Has Romania been an any major war?,The Romanian United Principalities did not participate in any wars....,no,"{'no': 6.055264949798584, 'no_answer': -3.06253981590271, 'yes': -2.9737067222595215}"
23,c98ff20e-4e9f-4cc0-ae0e-4414cce3742a,Can you eat a black wildebeest?,"Wildebeest provide several useful animal products. The hide makes good-quality leather and the flesh is coarse, dry and rather hard.[13] Wildebeest are killed for food, especially to make biltong in Southern Africa. This dried game meat is a delicacy and an important food item in Africa.[25] The mea...",yes,"{'no': -5.193129539489746, 'no_answer': -1.4364780187606812, 'yes': 6.304743766784668}"
85,2030a582-ebd2-45bf-9690-fe46007c71ee,Does Frankfurt have a regional dish?,"Handkäse (pronounced[ˈhantkɛːzə]; literally: ""hand cheese"") is a German regional sour milk cheese (similar to Harzer) and is a culinary speciality of Frankfurt am Main, Offenbach am Main, Darmstadt, Langen, and other parts of southern Hesse. It gets its name from the traditional way of producing it:...",yes,"{'no': -5.1389312744140625, 'no_answer': -1.3864237070083618, 'yes': 6.278087615966797}"
12,bf0d744a-caf5-4df1-83ee-9de58ccdb071,Does Japan get snow?,"The most recent record snows were brought by the blizzards of December 2005–February 2006, when well over 3m (4.5m in one part of Aomori Prefecture) of snow accumulated in many rural areas, and anywhere from 46cm (Tottori) to nearly 1.5m (Aomori) piled up even in several major cities....",yes,"{'no': -5.7697601318359375, 'no_answer': -0.8454253077507019, 'yes': 5.994254112243652}"
71,3010ea68-96d7-4893-95ab-5f05c566071c,Did the Tudor family participate in the War of Roses?,"The Wars of the Roses were a series of English civil wars for control of the throne of England fought between supporters of two rival branches of the royal House of Plantagenet: the House of Lancaster, associated with a red rose, and the House of York, whose symbol was a white rose. Eventually, the ...",yes,"{'no': -4.764363765716553, 'no_answer': 4.752935409545898, 'yes': 0.5020203590393066}"
27,652fd953-2522-45de-af19-7ec8f2a98183,Is St Joseph's a private school?,The Toronto Catholic District School Board bought the College School property from the Sisters of St. Joseph in December 2007....,no,"{'no': 4.562924861907959, 'no_answer': -0.22298750281333923, 'yes': -4.307789325714111}"
53,94276bb1-d708-4abc-8da8-616877e9996a,Is there a cure for juvenile rheumatoid arthritis?,"JIA is best treated by a multidisciplinary team. The major emphasis of treatment for JIA is to help the child regain normal level of physical and social activities. This is accomplished with the use of physical therapy, pain management strategies, and social support.[28] Another emphasis of treatmen...",no,"{'no': 5.612574577331543, 'no_answer': -0.8928354382514954, 'yes': -3.8825314044952393}"
50,282e7d03-4fcc-4a26-b8cf-6a0def4c10e4,Was Spain part of the Allied Forces?,"The Spanish State under Francisco Franco did not officially join the Axis Powers during World War II, although Franco wrote to Hitler offering to join the war on 19 June 1940. Franco's regime supplied Germany with the Blue Division to fight specifically on the Eastern Front against the Soviet Union,...",no,"{'no': 5.607361793518066, 'no_answer': -1.9422791004180908, 'yes': -3.5038347244262695}"
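
The note above about a deployed system can be sketched as a simple gate: the evidence classifier is called only when the question type classifier predicts `boolean`.  This is an illustration under stated assumptions, not the toolkit's implementation; `boolean_answer_if_needed`, `toy_qtc` and `toy_evc` are hypothetical names, and the toy classifiers are trivial stand-ins for the real models.

```python
from typing import Callable, Optional

def boolean_answer_if_needed(
    question: str,
    passage: str,
    classify_question: Callable[[str], str],
    classify_evidence: Callable[[str, str], str],
) -> Optional[str]:
    """Call the evidence classifier only when the question is predicted boolean."""
    if classify_question(question) != "boolean":
        return None  # short-answer question: the extracted span is the answer
    return classify_evidence(question, passage)

# Toy stand-ins for the real QTC and EVC models (assumptions for illustration).
evc_calls = []

def toy_qtc(question: str) -> str:
    first_word = question.split()[0].lower()
    boolean_starts = {"is", "does", "did", "can", "was", "has"}
    return "boolean" if first_word in boolean_starts else "short_answer"

def toy_evc(question: str, passage: str) -> str:
    evc_calls.append(question)  # record that the classifier actually ran
    return "yes"

results = [
    boolean_answer_if_needed(q, "some passage text", toy_qtc, toy_evc)
    for q in ["Does Japan get snow?", "What is the density of a diamond?"]
]
print(results, len(evc_calls))
```

In this sketch the evidence classifier runs once for the two questions, at the cost of a merge step that must handle a missing boolean answer for short-answer questions.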


# Final output

The final output file is in a format suitable for the TydiQA evaluation script and contains no textual information.  The `confidence_score` is normalized to `[0,1]` by the score normalizer, based on the confidence score of the original MRC output and the prediction of the question type classifier.

In [6]:
pd.read_json(out_file)

Unnamed: 0,example_id,start_position,end_position,passage_index,yes_no_answer,confidence_score
0,b9eba742-f264-4fec-92d0-9b8de5ad0bd7,986,1020,2,0,0.653616
1,a71fe9c2-e518-4923-bced-4fe99cc7ff44,371,388,1,0,0.020314
2,2dcec21a-60bb-4ee9-85bc-6066f1ac6de3,14805,14807,27,0,0.038392
3,84603ec2-1637-4b13-96f5-5ef701c74a12,5993,6010,12,0,0.308387
4,a32fd17a-736a-4299-a543-bc26d68fa8dc,1,22,0,0,0.197467
...,...,...,...,...,...,...
18665,2634d4e7-99a7-40ad-aafa-829dc1270e3d,2563,2586,5,0,0.659538
18666,36227a7f-f387-4d98-99b4-0ba522db3746,183,192,0,0,0.001075
18667,17c7494a-c30e-4c0b-a0ba-b4c0487edf05,1483,1585,4,0,0.193370
18668,1e37c005-6d6b-436f-a97c-8dabc2c67704,723,731,3,0,0.004098
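
To make the idea of type-dependent normalization concrete, here is a minimal sketch.  The normalizer actually used to produce `confidence_score` is not described in this notebook, so we substitute a plain per-question-type min-max rescaling as an illustration; the function name `minmax_by_question_type` and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple

def minmax_by_question_type(records: List[Tuple[str, float]]) -> List[float]:
    """Rescale raw span scores to [0, 1] separately for each question type.

    `records` holds (question_type_pred, span_answer_score) pairs.  Min-max
    scaling is an assumption made for illustration, not the toolkit's method.
    """
    # First pass: find the score range observed for each question type.
    ranges: Dict[str, Tuple[float, float]] = {}
    for qtype, score in records:
        lo, hi = ranges.get(qtype, (score, score))
        ranges[qtype] = (min(lo, score), max(hi, score))

    # Second pass: map each score into [0, 1] using its own type's range.
    normalized = []
    for qtype, score in records:
        lo, hi = ranges[qtype]
        normalized.append(0.0 if hi == lo else (score - lo) / (hi - lo))
    return normalized

# Made-up scores: the two question types occupy different raw ranges.
toy_records = [
    ("short_answer", -3.0), ("short_answer", 1.0), ("short_answer", 5.0),
    ("boolean", -6.0), ("boolean", -5.0), ("boolean", -4.0),
]
toy_normalized = minmax_by_question_type(toy_records)
print(toy_normalized)  # [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
```

After rescaling, the middle score of each type maps to the same value even though the raw ranges differ, which is the property a single confidence threshold across question types relies on.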
