# Question Answering with PyTorch Transformers: Part 1

Article for this notebook: https://medium.com/@patonw/question-answering-with-pytorch-transformers-part-1-8736196bf20e

> In the first part of this series we’ll look at the problem of question answering and the SQuAD datasets. Then we’ll see how the Transformers pipeline API allows us to easily use pre-trained models to answer questions.
>
> In later parts we’ll explore how to build systems around it that are generally useful to the average person. Many academic articles have been written on the topics we’ll explore. However, I want to focus on the engineering aspects and demonstrate how simple it is to build useful systems by leveraging a handful of high-quality open-source libraries.

Skip ahead to Parts 2 and 3 for the meat.

This notebook is just an exploration of the SQuAD 2.0 training set.

In [1]:
!/bin/bash -c "[[ -f train-v2.0.json ]] || wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json"

In [2]:
import os
import random
import pandas as pd
import json

In [3]:
with open("train-v2.0.json") as f:
    doc = json.load(f)
doc.keys(), type(doc["data"]), len(doc["data"])

(dict_keys(['version', 'data']), list, 442)

In [8]:
doc["data"][0].keys(), doc["data"][0]["title"]

(dict_keys(['title', 'paragraphs']), 'Beyoncé')

In [14]:
len(doc["data"][0]["paragraphs"]), doc["data"][0]["paragraphs"][0].keys(), len(doc["data"][0]["paragraphs"][0]["qas"])

(66, dict_keys(['qas', 'context']), 15)

In [11]:
doc["data"][0]["paragraphs"][0]["context"]

'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".'

In [15]:
doc["data"][0]["paragraphs"][0]["qas"][0]

{'question': 'When did Beyonce start becoming popular?',
 'id': '56be85543aeaaa14008c9063',
 'answers': [{'text': 'in the late 1990s', 'answer_start': 269}],
 'is_impossible': False}
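
The `answer_start` field is a character offset into the paragraph's `context`, so slicing the context from that offset for the length of `text` should give back the answer string. A minimal sketch of that check follows. The helper name `answer_span` and the toy record are assumptions made up for illustration; they are not part of the dataset or of any library.

```python
def answer_span(context, answer):
    """Return the slice of `context` that a SQuAD-style answer dict points at."""
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return context[start:end]


# Hypothetical record laid out like the SQuAD 2.0 entries shown above.
toy_context = "She rose to fame in the late 1990s as a lead singer."
toy_answer = {
    "text": "in the late 1990s",
    # Compute the offset instead of hard-coding it, so the toy data stays consistent.
    "answer_start": toy_context.index("in the late 1990s"),
}

span = answer_span(toy_context, toy_answer)
# True when the offset and the answer text agree.
print(span == toy_answer["text"])
```

Running the same helper on `doc["data"][0]["paragraphs"][0]` and its first answer is a quick way to confirm that the offsets in the real file line up with the text.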

In [21]:
paragraphs = []  # every context paragraph, one entry each
questions = []   # (question, context) pairs that have an answer in the context
impossible = []  # (question, context) pairs flagged as unanswerable
for topic in doc["data"]:
    for pgraph in topic["paragraphs"]:
        paragraphs.append(pgraph["context"])
        for qa in pgraph["qas"]:
            if not qa["is_impossible"]:
                questions.append((qa["question"], pgraph["context"]))
            else:
                impossible.append((qa["question"], pgraph["context"]))

len(paragraphs), len(questions), len(impossible)

(19035, 86821, 43498)
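
The traversal above can also be wrapped in a function so it is easy to rerun on another file with the same layout. This is only a sketch: the name `split_questions` and the toy document are hypothetical, and the use of `.get("is_impossible", False)` assumes that a file without that key should be treated as fully answerable.

```python
def split_questions(doc):
    """Split a SQuAD-style document into paragraphs, answerable and impossible questions."""
    paragraphs, answerable, impossible = [], [], []
    for topic in doc["data"]:
        for pgraph in topic["paragraphs"]:
            context = pgraph["context"]
            paragraphs.append(context)
            for qa in pgraph["qas"]:
                # Assumption: a missing flag means the question is answerable.
                target = impossible if qa.get("is_impossible", False) else answerable
                target.append((qa["question"], context))
    return paragraphs, answerable, impossible


# Hypothetical two-paragraph document in the same shape as train-v2.0.json.
toy_doc = {
    "version": "toy",
    "data": [
        {
            "title": "Toy topic",
            "paragraphs": [
                {
                    "context": "The sky is blue.",
                    "qas": [
                        {"question": "What color is the sky?", "is_impossible": False},
                        {"question": "What color is the sea?", "is_impossible": True},
                    ],
                },
                {
                    "context": "Grass is green.",
                    "qas": [{"question": "What color is grass?", "is_impossible": False}],
                },
            ],
        }
    ],
}

toy_paragraphs, toy_answerable, toy_impossible = split_questions(toy_doc)
print(len(toy_paragraphs), len(toy_answerable), len(toy_impossible))
```

The two question counts for the real file sum to 130,319, so roughly a third of the training questions are unanswerable from their paragraph.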

In [22]:
random.sample(paragraphs, 2)

["Baptists are individuals who comprise a group of Christian denominations and churches that subscribe to a doctrine that baptism should be performed only for professing believers (believer's baptism, as opposed to infant baptism), and that it must be done by complete immersion (as opposed to affusion or sprinkling). Other tenets of Baptist churches include soul competency (liberty), salvation through faith alone, Scripture alone as the rule of faith and practice, and the autonomy of the local congregation. Baptists recognize two ministerial offices, elders and deacons. Baptist churches are widely considered to be Protestant churches, though some Baptists disavow this identity.",
 'In 1913, Elmer McCollum discovered the first vitamins, fat-soluble vitamin A, and water-soluble vitamin B (in 1915; now known to be a complex of several water-soluble vitamins) and named vitamin C as the then-unknown substance preventing scurvy. Lafayette Mendel and Thomas Osborne also performed pioneering w

In [23]:
random.sample(questions, 5)

[('What social group grew as a result of the industrial revolution? ',
  'The origins of the department store lay in the growth of the conspicuous consumer society at the turn of the 19th century. As the Industrial Revolution accelerated economy expansion, the affluent middle-class grew in size and wealth. This urbanized social group, sharing a culture of consumption and changing fashion, was the catalyst for the retail revolution. As rising prosperity and social mobility increased the number of people, especially women (who found they could shop unaccompanied at department stores without damaging their reputation), with disposable income in the late Georgian period, window shopping was transformed into a leisure activity and entrepreneurs, like the potter Josiah Wedgwood, pioneered the use of marketing techniques to influence the prevailing tastes and preferences of society.'),
 ('Around when did the United States and United Kingdom began to develop different versions of cultural stud

In [24]:
random.sample(impossible, 5)

[('Who fought against the strength and awareness of the Orthodox Jewry?',
  'In reaction to the emergence of Reform Judaism, a group of traditionalist German Jews emerged in support of some of the values of the Haskalah, but also wanted to defend the classic, traditional interpretation of Jewish law and tradition. This group was led by those who opposed the establishment of a new temple in Hamburg , as reflected in the booklet "Ele Divrei HaBerit". As a group of Reform Rabbis convened in Braunschweig, Rabbi Jacob Ettlinger of Altona published a manifesto entitled "Shlomei Emunei Yisrael" in German and Hebrew, having 177 Rabbis sign on. At this time the first Orthodox Jewish periodical, "Der Treue Zions Waechter", was launched with the Hebrew supplement "Shomer Zion HaNe\'eman" [1845 - 1855]. In later years it was Rav Ettlinger\'s students Rabbi Samson Raphael Hirsch and Rabbi Azriel Hildesheimer of Berlin who deepened the awareness and strength of Orthodox Jewry. Rabbi Samson Raphael H