# Question Answering with PyTorch Transformers: Part 1

Article for this notebook: https://medium.com/@patonw/question-answering-with-pytorch-transformers-part-1-8736196bf20e

> In the first part of this series we’ll look at the problem of question answering and the SQUAD datasets. Then we’ll see how the Transformers’ pipeline API allows us to easily use pre-trained models to answer questions.
>
> In later parts we’ll explore how to build systems around it that are generally useful to the average person. There have been many academic articles written on the topics we’ll explore. However, I want to focus on the engineering aspects and demonstrate how simple it is to build useful systems by leveraging a handful of high quality open-source libraries.

Skip ahead to Parts 2 and 3 for the meat.

This notebook is just an exploration of the SQuAD 2.0 dataset.

In [None]:
# Prepare to run on Paperspace. On your own machine, manage dependencies with pipenv or conda instead.
# To debug, run init_container from a terminal window;
# I'd rather not have its output filling up the screen here.
%run init_container.py

In [None]:
# Project constants; SQUAD_TRAIN (used below) is presumably defined here.
from qa.constants import *

In [None]:
import os
import random
import pandas as pd
import json

In [None]:
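# Load the training set; the explicit encoding avoids platform-dependent defaults.
with open(SQUAD_TRAIN, encoding="utf-8") as f:
    doc = json.load(f)
# Top-level keys, the type of the "data" entry, and the number of topics
doc.keys(), type(doc["data"]), len(doc["data"])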

In [None]:
doc["data"][0].keys(), doc["data"][0]["title"]

In [None]:
first_paragraphs = doc["data"][0]["paragraphs"]
len(first_paragraphs), first_paragraphs[0].keys(), len(first_paragraphs[0]["qas"])

In [None]:
doc["data"][0]["paragraphs"][0]["context"]

In [None]:
doc["data"][0]["paragraphs"][0]["qas"][0]
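
Each answer in a question entry is stored as a text span plus a character offset into the paragraph's context. The sketch below shows how that offset recovers the answer from the context. It runs on a tiny hand-made record rather than the real file: `sample_paragraph`, its contents, and the helper `answer_spans` are all made up for illustration, and only the field names (`context`, `qas`, `question`, `is_impossible`, `answers`, `text`, `answer_start`) follow the dataset.

```python
def answer_spans(paragraph):
    """Yield (question, text sliced from the context) for each answerable question."""
    context = paragraph["context"]
    for qa in paragraph["qas"]:
        if qa.get("is_impossible", False):
            continue  # unanswerable questions have no gold answer span
        for answer in qa["answers"]:
            start = answer["answer_start"]
            end = start + len(answer["text"])
            yield qa["question"], context[start:end]


# Hypothetical miniature record shaped like one SQuAD 2.0 paragraph.
context = "The Eiffel Tower is located in Paris. It was completed in 1889."
sample_paragraph = {
    "context": context,
    "qas": [
        {
            "id": "example-0001",  # made-up id
            "question": "Where is the Eiffel Tower located?",
            "is_impossible": False,
            "answers": [{"text": "Paris", "answer_start": context.index("Paris")}],
        },
        {
            "id": "example-0002",  # made-up id
            "question": "Who painted the Eiffel Tower green?",
            "is_impossible": True,
            "answers": [],
        },
    ],
}

recovered = list(answer_spans(sample_paragraph))
print(recovered)
```

Slicing `context[start:start + len(text)]` should reproduce the stored answer text exactly, which makes it a handy sanity check when loading the real data.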

In [None]:
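# Flatten the nested topic -> paragraph -> question structure into flat lists.
paragraphs = []  # every context paragraph
questions = []   # answerable (question, context) pairs
impossible = []  # unanswerable (question, context) pairs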
for topic in doc["data"]:
    for pgraph in topic["paragraphs"]:
        paragraphs.append(pgraph["context"])
        for qa in pgraph["qas"]:
            if not qa["is_impossible"]:
                questions.append((qa["question"], pgraph["context"]))
            else:
                impossible.append((qa["question"], pgraph["context"]))
        
len(paragraphs), len(questions), len(impossible)
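
The totals above hide how questions are spread across topics. As a rough sketch, the same traversal can tally answerable and impossible questions per topic title. `toy_doc` is a made-up stand-in for the parsed JSON and `count_questions_by_topic` is a hypothetical helper, not something from the `qa` package.

```python
from collections import Counter


def count_questions_by_topic(squad_doc):
    """Map each topic title to a Counter of answerable/impossible questions."""
    counts = {}
    for topic in squad_doc["data"]:
        tally = Counter(answerable=0, impossible=0)
        for pgraph in topic["paragraphs"]:
            for qa in pgraph["qas"]:
                key = "impossible" if qa.get("is_impossible", False) else "answerable"
                tally[key] += 1
        counts[topic["title"]] = tally
    return counts


# Hypothetical stand-in with the same nesting as the real file.
toy_doc = {
    "version": "v2.0",
    "data": [
        {"title": "Topic_A", "paragraphs": [
            {"context": "Some context.", "qas": [
                {"question": "Q1?", "is_impossible": False},
                {"question": "Q2?", "is_impossible": True},
            ]},
            {"context": "More context.", "qas": [
                {"question": "Q3?", "is_impossible": False},
            ]},
        ]},
        {"title": "Topic_B", "paragraphs": [
            {"context": "Other context.", "qas": [
                {"question": "Q4?", "is_impossible": True},
            ]},
        ]},
    ],
}

topic_counts = count_questions_by_topic(toy_doc)
for title, tally in topic_counts.items():
    print(title, dict(tally))
```

On the real data you would pass the loaded `doc` instead of `toy_doc`.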

In [None]:
random.sample(paragraphs, 2)

In [None]:
random.sample(questions, 5)

In [None]:
random.sample(impossible, 5)
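
As far as I can tell, unanswerable entries in SQuAD 2.0 keep an empty `answers` list and carry a `plausible_answers` list instead: spans that look like an answer but do not actually answer the question. Treat that field name as an assumption and check it against your copy of the file. The sketch below pulls those spans out of a made-up record; `toy_paragraph` and `plausible_spans` are hypothetical names used only here.

```python
def plausible_spans(paragraph):
    """Return (question, [plausible answer texts]) for each unanswerable question."""
    results = []
    for qa in paragraph["qas"]:
        if qa.get("is_impossible", False):
            # Assumed field name; .get() falls back to an empty list if it is absent.
            texts = [a["text"] for a in qa.get("plausible_answers", [])]
            results.append((qa["question"], texts))
    return results


# Hypothetical record; the contents are invented for illustration.
toy_context = "The Eiffel Tower was completed in 1889 for the World's Fair."
toy_paragraph = {
    "context": toy_context,
    "qas": [
        {
            "question": "When was the Eiffel Tower demolished?",
            "is_impossible": True,
            "answers": [],
            "plausible_answers": [
                {"text": "1889", "answer_start": toy_context.index("1889")}
            ],
        },
        {
            "question": "When was the Eiffel Tower completed?",
            "is_impossible": False,
            "answers": [
                {"text": "1889", "answer_start": toy_context.index("1889")}
            ],
        },
    ],
}

distractors = plausible_spans(toy_paragraph)
print(distractors)
```

Looking at these distractor spans next to the sampled impossible questions gives a feel for why a model might be tempted to answer anyway.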