# Extractive Question Answering

In a [previous blog post](https://stefanbschneider.github.io/blog/posts/question-answering-huggingface/), I showed how to answer document-related questions with [HuggingFace](https://huggingface.co/) LLMs in just a few lines of Python code and present the answers in a simple [Gradio app](https://www.gradio.app/).

In that blog post, I used the standard question-answering pipeline from HuggingFace.
This pipeline defaults to a DistilBERT model fine-tuned on the Stanford Question Answering Dataset (SQuAD).
[This model](https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad) does *extractive* question answering as illustrated in the following example:

In [None]:
%%capture --no-display
%pip install -U pypdf torch transformers gradio

In [7]:
from transformers import pipeline

extractive_qa = pipeline(task="question-answering")

# Abstract from "Attention is all you need" by Vaswani et al.: https://arxiv.org/abs/1706.03762
abstract = """The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task...
"""

extractive_qa("What's a transformer'?", abstract)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


{'score': 0.4559027850627899,
 'start': 287,
 'end': 302,
 'answer': 'the Transformer'}

The pipeline takes a text as input, here part of the "Attention is all you need" abstract (see [arXiv](https://arxiv.org/abs/1706.03762)),
and a question to be answered based on that text (the context).

Rather than answering in natural language, the model outputs an excerpt extracted from the original context, identified by its start and end index within that context.
While this allows concise answers with a clear reference to the original source, the answers often do not read naturally and may not actually answer the question.
The model has no way of combining information from different parts of the original text.

In the example above, I asked what a transformer is and the model simply answered "the Transformer". Not very helpful!
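
To make the role of the start and end index concrete, here is a minimal sketch in plain Python. It assumes a result dict shaped like the pipeline output above. The short `context` string and the `result` values are constructed by hand for illustration (the indices come from `str.find`, not from a model), and the score is omitted.

```python
# Hand-made context (one sentence from the abstract), for illustration only.
context = (
    "We propose a new simple network architecture, the Transformer, "
    "based solely on attention mechanisms."
)

# Hypothetical result in the same shape as the pipeline output.
# The indices are computed with str.find here, not predicted by a model.
answer = "the Transformer"
start = context.find(answer)
result = {"start": start, "end": start + len(answer), "answer": answer}

# An extractive answer is exactly the slice context[start:end].
extracted = context[result["start"]:result["end"]]
print(extracted)
```

Because the answer is always such a slice, the model can only point at text that already exists in the context. It cannot rephrase it or stitch two passages together.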


# Extractive vs. Generative/Abstractive Question Answering

# Question Answering with HuggingFace

We can read the text of a PDF document with `pypdf`. As an example, I'm using the author version of a [paper](https://ieeexplore.ieee.org/document/9789886) I wrote on [`mobile-env`](https://github.com/stefanbschneider/mobile-env).

Now we can create a question answering pipeline using HuggingFace, loading a pre-trained model. Then we can ask some questions, providing the PDF text as context.

In [7]:
from transformers import pipeline

question_answerer = pipeline(task="question-answering", model="deepset/tinyroberta-squad2")

Device set to use mps:0


In [8]:
question_answerer("What is mobile-env?", pdf_text)



{'score': 0.9887111186981201,
 'start': 16488,
 'end': 16505,
 'answer': 'GitHub repository'}

In [9]:
question_answerer("What programming language is mobile-env written in?", pdf_text)

{'score': 0.9665615558624268, 'start': 3552, 'end': 3558, 'answer': 'Python'}

In [10]:
question_answerer("What is the main difference between mobile-env and other simulators?", pdf_text)

{'score': 0.6506955027580261,
 'start': 12539,
 'end': 12570,
 'answer': 'more ﬂexible, better documented'}

The pipeline returns a dict, where the answer is a quote from the given context, here the PDF document. This is called *extractive* question answering.

It also provides a score indicating the model's confidence in the answer, as well as the start and end index of the quoted answer within the context.
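
Since the answer is only a short quote, it can help to look at the text around it. The following is a small sketch of a hypothetical helper, `answer_in_context`, which is not part of the pipeline API. It takes the context and a result dict shaped like the outputs above and returns the score together with the answer and a few surrounding characters. The window size and the `>>`/`<<` markers are arbitrary choices, and the example context and score are made up for illustration.

```python
def answer_in_context(context: str, result: dict, window: int = 40) -> str:
    """Return the score and the answer with up to `window` characters on each side."""
    start, end = result["start"], result["end"]
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    return f"[{result['score']:.2f}] ...{left}>>{context[start:end]}<<{right}..."


# Hand-made example, not actual model output.
example_context = "mobile-env is written in Python and is available on GitHub."
example_start = example_context.find("Python")
example_result = {
    "score": 0.9,  # made-up confidence value
    "start": example_start,
    "end": example_start + len("Python"),
    "answer": "Python",
}
print(answer_in_context(example_context, example_result, window=12))
```

Applied to a real result, for example `answer_in_context(pdf_text, question_answerer("What is mobile-env?", pdf_text))`, this makes it easier to judge whether a short quote such as 'GitHub repository' is actually a sensible answer.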

That's it! Let's see how we can build a simple app on top of this.

# Building an App with Gradio

[Gradio](https://www.gradio.app/) allows building simple apps tailored for machine learning use cases.
You can define the inputs, a function to pass these inputs to, and how to display the function's outputs.

Here, our inputs are the PDF document and the question.
The function loads the document and passes the question and text to the pre-trained model.
It then outputs the model's answer to the user.

In [13]:
import gradio as gr

def answer_doc_question(pdf_file, question):
    pdf_text = get_text_from_pdf(pdf_file)
    answer = question_answerer(question, pdf_text)
    return answer["answer"]

# Add a default file and question, so it's easy to try out the app.
pdf_input = gr.File(
    value="https://ris.uni-paderborn.de/download/30236/30237/author_version.pdf",
    file_types=[".pdf"],
    label="Upload a PDF document and ask a question about it.",
)
question = gr.Textbox(
    value="What is mobile-env?",
    label="Type a question regarding the uploaded document here.",
)
gr.Interface(
    fn=answer_doc_question, inputs=[pdf_input, question], outputs="text"
).launch()

Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.






If you run this locally, you should see a rendered app based on the question answering pipeline we built above!

# Deploying the app in HuggingFace Spaces

You can easily host the app on [HuggingFace Spaces](https://huggingface.co/spaces), which offers free but slow hosting as well as faster paid hosting.

Simply create a new Space under your account and add an `app.py` containing all the code above. The requirements go into a `requirements.txt`. That's it!

This is the app we built here: [https://huggingface.co/spaces/stefanbschneider/pdf-question-answering](https://huggingface.co/spaces/stefanbschneider/pdf-question-answering)

<script
	type="module"
	src="https://gradio.s3-us-west-2.amazonaws.com/4.17.0/gradio.js"
></script>

<gradio-app src="https://stefanbschneider-pdf-question-answering.hf.space"></gradio-app>


# What's Next?

* [Read about the underlying transformer architecture powering most LLMs](https://stefanbschneider.github.io/blog/posts/understanding-transformers-attention/)
* Improve the quality of the question answering app. Some ideas:
    * Fine-tune the pre-trained model on a domain dataset, e.g., [arXiv Q&A](https://huggingface.co/datasets/taesiri/arxiv_qa)
    * [Adapt the model to the domain by fine-tuning a masked model directly on the document](https://huggingface.co/learn/nlp-course/en/chapter7/3)
    * Use the [document-question-answering pipeline on HuggingFace](https://huggingface.co/tasks/document-question-answering)
    * Try a model that supports *generative* question answering

