# **Question Answering**

Extractive question answering is the task of extracting the answer to a given question from a text. In question answering, text summarization methods are used to find answers to user questions in documents [[1]](#scrollTo=5aLTXh5Sa1bC).

This notebook shows an example of extractive question answering with the SQuAD dataset and a DistilBERT transformer model.

## **Question answering with SQuAD dataset**

The Stanford Question Answering Dataset (SQuAD) is a question answering dataset consisting of 100,000+ questions about a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage [[2]](https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf). The dataset is freely available at [[3]](https://stanford-qa.com).
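
Because every answer is a segment of its reading passage, a SQuAD-style record can store the answer as text plus a character offset into the passage. The sketch below assumes the flattened field layout (`context`, `question`, `answers` with `text` and `answer_start`) used by the Hugging Face `datasets` copy of SQuAD. The record itself is a made-up illustration, not an actual dataset entry, and `answer_spans` is a hypothetical helper.

```python
# Illustrative record in a SQuAD-like layout (NOT a real SQuAD entry).
example = {
    "context": "Turing (1950) describes the foundation of what was later called the Turing test.",
    "question": "When did Turing describe the Turing test?",
    "answers": {"text": ["1950"], "answer_start": [8]},
}


def answer_spans(record):
    """Return (start, end, text) for each answer, sliced out of the context."""
    context = record["context"]
    spans = []
    for text, start in zip(record["answers"]["text"], record["answers"]["answer_start"]):
        end = start + len(text)  # end offset is exclusive
        spans.append((start, end, context[start:end]))
    return spans


spans = answer_spans(example)
for start, end, text in spans:
    print(f"characters {start}-{end}: {text!r}")
```

Slicing the context with the stored offset should reproduce the answer text exactly, which is what makes the task extractive rather than generative.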

As our transformer model, we use DistilBERT, which is smaller and faster than BERT [[4]](https://huggingface.co/distilbert-base-uncased).

For question answering, we will apply the following steps:
* Install the ``transformers`` library
* Import the ``pipeline`` function from the ``transformers`` library
* Create a question answering pipeline with the ``distilbert-base-uncased-distilled-squad`` transformer model
* Create a sample text
* Apply the question answering model on the given text

### Install ``transformers``

To use a transformer model in our task, we have to install the ``transformers`` library first.

In [1]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstal

### Import ``pipeline``

The ``pipeline()`` function is a factory that creates any of the available transformer pipelines. We import it to create a question answering pipeline.


In [2]:
# Import pipeline from transformers library
from transformers import pipeline

### Create question answering pipeline

We create a question answering pipeline by calling the ``pipeline()`` function. There are two ways to do this:


1.   Define the pipeline type: ``nlp = pipeline("question-answering")``
2.   Define the model name: ``nlp = pipeline(model="distilbert-base-uncased-distilled-squad")``

If we choose the first option, we do not specify a model name. Consequently, the ``pipeline()`` function will use a default transformer model for the selected NLP task.

However, as stated in the list of steps, we want to use the specific transformer model ``distilbert-base-uncased-distilled-squad``, which was fine-tuned on SQuAD v1.1. Therefore, we apply option 2 and define the model name.

In [3]:
# Create a question answering pipeline with the "distilbert-base-uncased-distilled-squad" transformer model
nlp = pipeline(model="distilbert-base-uncased-distilled-squad")

### Create a sample text
Now we create a sample text for the question answering task.

In [4]:
# Create a sample text
context = r"""
Alan Mathison Turing, a British mathematician and computer scientist, was one of the
early pioneers of artificial intelligence. Turing (1950) describes the foundation of what
was later called the Turing test. The experimental setup of the Turing test is as follows.
A human interrogator uses a chat program to talk to two conversation partners: a chatbot
and another human being. Both of them try to convince the interrogator that they
are the human. If the interrogator is not able to identify the human through intense
questioning, the machine is considered to have passed the Turing test. According to
Turing, passing the test can lead to the conclusion that the machine’s intellectual
power is on a level comparable to the human brain. While the Turing test has often
been criticized because of its focus on functionality, the question of whether the
machine is conscious about its answers remains open. Several attempts have been
made to pass the Turing test, but it still remains an unresolved challenge.
"""

### Apply question answering model on sample text
Our model is ready for the question answering test. Below we ask two questions and print each answer along with the probability score associated with it. The closer the score is to 1, the more confident the model is that the answer is correct.
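
To give an intuition for where such a score can come from, here is a minimal sketch of one common scheme for extractive models: the model emits a start logit and an end logit per token, both are turned into probabilities with a softmax, and the span with the highest product of start and end probability is chosen. The function names `softmax` and `best_span`, the `max_answer_len` parameter, and the toy tokens and logits are illustrative assumptions; this is not the internal code of the ``transformers`` pipeline.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) of the most probable span with start <= end."""
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        # Only consider end positions at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Toy inputs (made up for illustration, not real model outputs).
tokens = ["turing", "(", "1950", ")", "describes", "the", "foundation"]
start_logits = [0.5, 0.1, 4.0, 0.0, -1.0, -1.0, -1.0]
end_logits = [0.0, 0.0, 4.5, 0.5, -1.0, -1.0, -1.0]

start, end, score = best_span(start_logits, end_logits)
answer_tokens = tokens[start:end + 1]
print(answer_tokens, round(score, 4))
```

Under this scheme the score lies between 0 and 1 and only approaches 1 when the model is confident about both the start and the end of the span.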

#### Question 1
Print the answer to the following question:
* Who is Alan Turing?

In [5]:
# Ask the question against the sample text
result = nlp(question="Who is Alan Turing?", context=context)

# Print the answer
print(f"Answer: '{result['answer']}'\nProbability score: {round(result['score'], 4)}")

Answer: 'British mathematician and computer scientist'
Probability score: 0.576


#### Question 2
Print the answer to the following question:
* When did Alan Turing describe the Turing test?

In [6]:
# Ask the question against the sample text
result = nlp(question="When did Alan Turing describe the Turing test?", context=context)

# Print the answer
print(f"Answer: '{result['answer']}'\nProbability score: {round(result['score'], 4)}")

Answer: '1950'
Probability score: 0.8964
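
Both cells above repeat the same formatting code, so it could be factored into a small helper. The sketch below is one possible way to do that. `format_answer` is a hypothetical helper, and the `results` dictionary holds stand-in values copied from the printed outputs above (already rounded) so the block runs without loading the model; in the notebook, each value would instead come from a call such as `nlp(question=..., context=context)`.

```python
def format_answer(result, digits=4):
    """Format a result dict with 'answer' and 'score' keys for printing."""
    return (
        f"Answer: '{result['answer']}'\n"
        f"Probability score: {round(result['score'], digits)}"
    )


# Stand-in results copied from the outputs shown in this notebook.
results = {
    "Who is Alan Turing?": {
        "answer": "British mathematician and computer scientist",
        "score": 0.576,
    },
    "When did Alan Turing describe the Turing test?": {
        "answer": "1950",
        "score": 0.8964,
    },
}

for question, result in results.items():
    print(f"Question: {question}")
    print(format_answer(result))
    print()
```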


# **References**

- [1] NLP and Computer Vision (DLMAINLPCV01) Course Book
- [2] https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf
- [3] https://stanford-qa.com
- [4] https://huggingface.co/distilbert-base-uncased

Copyright © 2022 IU International University of Applied Sciences