
## Question Answering

We will focus on question answering in an open environment, where the questions were not prepared beforehand. In this setting, transformer models need help from other NLP tasks and from classical programs.

We will explore three methods that show how tasks can be combined to reach a project's goal:

* Method 0 takes a trial-and-error approach, asking questions at random.
* Method 1 introduces named entity recognition (NER) to help prepare the question-answering tasks.
* Method 2 tries to help the default transformer with an ELECTRA transformer model. It also introduces semantic role labeling (SRL) to help the transformer prepare questions.

Together, these three methods show that a single question-answering method will not be enough for high-profile corporate projects. Adding NER and SRL improves the linguistic intelligence of a transformer agent solution.

## Setup

In [None]:
!pip -q install "transformers[sentencepiece]"

In [2]:
from transformers import pipeline

## Method 0: Trial and error

In [None]:
nlp_qa = pipeline("question-answering")

In [10]:
sequence = """The traffic began to slow down on Pioneer Boulevard in Los Angeles, making it difficult to get out of the city. However, WBGO was
playing some cool jazz, and the weather was cool, making it rather pleasant to be making it out of the city on this Friday afternoon. 
Nat King Cole was singing as Jo, and Maria slowly made their way out of LA and drove toward Barstow. They planned to get to Las Vegas early enough in the
evening to have a nice dinner and go see a show."""

nlp_qa(context=sequence, question="Where is Pioneer Boulevard?")

{'answer': 'Los Angeles', 'end': 66, 'score': 0.9882006645202637, 'start': 55}
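The question-answering pipeline returns a dictionary whose `start` and `end` fields are character offsets into `context`, so `answer` should equal the slice `context[start:end]`. The sketch below checks this against the output printed above, using only the first sentence of `sequence` (a prefix, so the offsets do not change). `check_span` is a hypothetical helper written for this illustration, not part of the `transformers` API.

```python
def check_span(context: str, result: dict) -> bool:
    """Return True if result['answer'] is exactly the slice context[start:end]."""
    return context[result["start"]:result["end"]] == result["answer"]


# First sentence of `sequence`; being a prefix, it keeps the same character offsets.
context = ("The traffic began to slow down on Pioneer Boulevard in Los Angeles, "
           "making it difficult to get out of the city.")

# Dictionary printed by nlp_qa above for "Where is Pioneer Boulevard?".
result = {"answer": "Los Angeles", "end": 66, "score": 0.9882006645202637, "start": 55}

print(check_span(context, result))  # True
print(context[34:51])               # Pioneer Boulevard
```

Because the offsets point back into the original text, an answer can be highlighted in place or compared with the spans other tools (such as NER) report for the same passage.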

## Method 1: NER first

In [None]:
# Using NER to find questions
nlp_ner = pipeline("ner")

In [6]:
print(nlp_ner(sequence))

[{'entity': 'I-LOC', 'score': 0.97324556, 'index': 8, 'word': 'Pioneer', 'start': 34, 'end': 41}, {'entity': 'I-LOC', 'score': 0.99442816, 'index': 9, 'word': 'Boulevard', 'start': 42, 'end': 51}, {'entity': 'I-LOC', 'score': 0.9995722, 'index': 11, 'word': 'Los', 'start': 55, 'end': 58}, {'entity': 'I-LOC', 'score': 0.99956805, 'index': 12, 'word': 'Angeles', 'start': 59, 'end': 66}, {'entity': 'I-ORG', 'score': 0.991907, 'index': 26, 'word': 'W', 'start': 121, 'end': 122}, {'entity': 'I-ORG', 'score': 0.9905408, 'index': 27, 'word': '##B', 'start': 122, 'end': 123}, {'entity': 'I-ORG', 'score': 0.988502, 'index': 28, 'word': '##G', 'start': 123, 'end': 124}, {'entity': 'I-ORG', 'score': 0.9714335, 'index': 29, 'word': '##O', 'start': 124, 'end': 125}, {'entity': 'I-PER', 'score': 0.9980508, 'index': 59, 'word': 'Nat', 'start': 264, 'end': 267}, {'entity': 'I-PER', 'score': 0.9984724, 'index': 60, 'word': 'King', 'start': 268, 'end': 272}, {'entity': 'I-PER', 'score': 0.99907494, 'ind
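The raw NER output is token-level: "Pioneer" and "Boulevard" are separate entries, and WBGO is split into the subword pieces `W`, `##B`, `##G`, `##O`. Before turning entities into questions, it helps to merge them back into whole names. The sketch below does this by hand for the first eight entries printed above (scores omitted), assuming that tokens with the same entity type and consecutive `index` values belong to one name. `group_entities` is a hypothetical helper; in recent `transformers` versions, passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` performs a comparable grouping inside the library.

```python
def group_entities(text, tokens):
    """Merge consecutive tokens that share an entity type into one span.

    Tokens count as adjacent when their 'index' values are consecutive;
    the merged word is read back from `text` using the character offsets.
    """
    groups = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # "I-LOC" -> "LOC"
        last = groups[-1] if groups else None
        if last and last["entity_group"] == label and tok["index"] == last["_last_index"] + 1:
            # Same type, next token: extend the current span.
            last["end"] = tok["end"]
            last["_last_index"] = tok["index"]
        else:
            groups.append({"entity_group": label, "start": tok["start"],
                           "end": tok["end"], "_last_index": tok["index"]})
    # Drop the bookkeeping key and recover each name from the text itself.
    return [{"entity_group": g["entity_group"], "word": text[g["start"]:g["end"]],
             "start": g["start"], "end": g["end"]} for g in groups]


# First line of `sequence` (a prefix, so offsets are unchanged).
text = ("The traffic began to slow down on Pioneer Boulevard in Los Angeles, "
        "making it difficult to get out of the city. However, WBGO was")

# First eight entries of the NER output printed above, without scores.
tokens = [
    {"entity": "I-LOC", "index": 8, "word": "Pioneer", "start": 34, "end": 41},
    {"entity": "I-LOC", "index": 9, "word": "Boulevard", "start": 42, "end": 51},
    {"entity": "I-LOC", "index": 11, "word": "Los", "start": 55, "end": 58},
    {"entity": "I-LOC", "index": 12, "word": "Angeles", "start": 59, "end": 66},
    {"entity": "I-ORG", "index": 26, "word": "W", "start": 121, "end": 122},
    {"entity": "I-ORG", "index": 27, "word": "##B", "start": 122, "end": 123},
    {"entity": "I-ORG", "index": 28, "word": "##G", "start": 123, "end": 124},
    {"entity": "I-ORG", "index": 29, "word": "##O", "start": 124, "end": 125},
]

for g in group_entities(text, tokens):
    print(g["entity_group"], g["word"])
# LOC Pioneer Boulevard
# LOC Los Angeles
# ORG WBGO
```

One limitation of this rule: because the model only emits `I-` tags here, two different names of the same type that sit next to each other would be merged into one.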

Based on the entities NER found, let’s ask our transformer two types of questions:
* Questions related to locations
* Questions related to persons
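One way to derive these questions from NER is to map each entity type to a question template. The sketch below assumes a hand-written list of grouped entities from `sequence` and hypothetical templates (`"Where is {}?"` for locations, `"Who is {}?"` for persons); entity types without a template, such as the `ORG` WBGO, are skipped. `questions_from_entities` and `TEMPLATES` are illustrative names, not part of any library.

```python
# Hypothetical question templates, one per entity type.
TEMPLATES = {
    "LOC": "Where is {}?",
    "PER": "Who is {}?",
}


def questions_from_entities(entities, templates=TEMPLATES):
    """Build one question per entity, skipping unknown types and duplicates."""
    questions = []
    for ent in entities:
        template = templates.get(ent["entity_group"])
        if template is None:
            continue  # no template for this type (e.g. ORG)
        question = template.format(ent["word"])
        if question not in questions:
            questions.append(question)
    return questions


# Hand-written entities from `sequence`, in the shape of grouped NER output.
entities = [
    {"entity_group": "LOC", "word": "Pioneer Boulevard"},
    {"entity_group": "LOC", "word": "Los Angeles"},
    {"entity_group": "ORG", "word": "WBGO"},
    {"entity_group": "PER", "word": "Nat King Cole"},
    {"entity_group": "LOC", "word": "LA"},
    {"entity_group": "LOC", "word": "Barstow"},
    {"entity_group": "LOC", "word": "Las Vegas"},
]

for q in questions_from_entities(entities):
    print(q)
```

Each generated string can then be passed as `question=` to `nlp_qa` together with `context=sequence`. Templates keep the questions predictable, at the cost of asking only what the template allows.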

### Location entity questions

In [None]:
nlp_qa = pipeline("question-answering")

In [16]:
print("Question 1.", nlp_qa(context=sequence, question="Where is Pioneer Boulevard?"))
print("Question 2.", nlp_qa(context=sequence, question="Where is Los Angeles located?"))
print("Question 3.", nlp_qa(context=sequence, question="Where is LA?"))
print("Question 4.", nlp_qa(context=sequence, question="Where is Barstow?"))
print("Question 5.", nlp_qa(context=sequence, question="Where is Las Vegas located?"))

Question 1. {'score': 0.9882006645202637, 'start': 55, 'end': 66, 'answer': 'Los Angeles'}
Question 2. {'score': 0.9880373477935791, 'start': 34, 'end': 51, 'answer': 'Pioneer Boulevard'}
Question 3. {'score': 0.3341643810272217, 'start': 55, 'end': 66, 'answer': 'Los Angeles'}
Question 4. {'score': 0.25206997990608215, 'start': 389, 'end': 398, 'answer': 'Las Vegas'}
Question 5. {'score': 0.17297087609767914, 'start': 55, 'end': 66, 'answer': 'Los Angeles'}
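The scores vary widely across these five questions, from about 0.99 down to about 0.17. A simple way to act on them is to accept answers above a cut-off and route the rest to review. The sketch below applies this to the results printed above; the threshold of 0.5 is an arbitrary assumption, and `split_by_score` is a hypothetical helper.

```python
# Results printed above: question -> (score, answer).
results = {
    "Where is Pioneer Boulevard?": {"score": 0.9882006645202637, "answer": "Los Angeles"},
    "Where is Los Angeles located?": {"score": 0.9880373477935791, "answer": "Pioneer Boulevard"},
    "Where is LA?": {"score": 0.3341643810272217, "answer": "Los Angeles"},
    "Where is Barstow?": {"score": 0.25206997990608215, "answer": "Las Vegas"},
    "Where is Las Vegas located?": {"score": 0.17297087609767914, "answer": "Los Angeles"},
}

THRESHOLD = 0.5  # hypothetical cut-off; tune it on your own data


def split_by_score(results, threshold=THRESHOLD):
    """Separate answers at or above the threshold from those needing review."""
    accepted = {q: r["answer"] for q, r in results.items() if r["score"] >= threshold}
    review = {q: r["answer"] for q, r in results.items() if r["score"] < threshold}
    return accepted, review


accepted, review = split_by_score(results)
print("Accepted:", accepted)
print("Needs review:", review)
```

The cut-off sends Questions 3 to 5 to review, but it keeps Question 2, whose high score (about 0.988) accompanies "Pioneer Boulevard", an answer that does not say where Los Angeles is. A confidence score can filter out weak answers, but it does not prove that the remaining ones are correct.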


### Person entity questions

In [None]:
nlp_qa = pipeline("question-answering")

In [14]:
nlp_qa(context=sequence, question="Who was singing?")

{'answer': 'Nat King Cole',
 'end': 278,
 'score': 0.9848748445510864,
 'start': 265}

In [18]:
nlp_qa(context=sequence, question="Who was going to Las Vegas ?")

{'answer': 'Maria', 'end': 307, 'score': 0.5128909945487976, 'start': 302}

In [19]:
nlp_qa(context=sequence, question="Who are they?")

{'answer': 'Nat King Cole',
 'end': 278,
 'score': 0.612321674823761,
 'start': 265}

In [20]:
nlp_qa(context=sequence, question="Who drove to Las Vegas?")

{'answer': 'Maria', 'end': 307, 'score': 0.9794563055038452, 'start': 302}

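We can see that the transformer faced a semantic labeling problem: it does not reliably know who did what. Asked "Who are they?", it answered "Nat King Cole", the singer on the radio, rather than the travelers. Asked "Who was going to Las Vegas ?", it returned only "Maria", with a modest score of about 0.51, although Jo was traveling too.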
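To see why role labels help, it can be useful to picture the passage as predicate-argument frames, where each verb is linked to its agent. The frames below are written by hand for illustration (they are assumptions, not the output of any SRL model), with role names borrowed from PropBank conventions: `V` for the predicate, `ARG0` for the agent, `ARG1` for what is acted on, and `ARGM-DIR` for direction. `who` is a hypothetical helper.

```python
# Hand-written, illustrative frames for `sequence` (not produced by a model).
FRAMES = [
    {"V": "singing", "ARG0": "Nat King Cole"},
    {"V": "drove", "ARG0": "Jo, and Maria", "ARGM-DIR": "toward Barstow"},
    {"V": "planned", "ARG0": "They", "ARG1": "to get to Las Vegas early enough in the evening"},
]


def who(frames, verb):
    """Return the agent (ARG0) of every frame whose predicate is `verb`."""
    return [f["ARG0"] for f in frames if f["V"] == verb and "ARG0" in f]


print(who(FRAMES, "singing"))  # ['Nat King Cole']
print(who(FRAMES, "drove"))    # ['Jo, and Maria']
```

With frames like these, a "who" question becomes a lookup of the agent of a predicate, and the driving frame keeps both travelers instead of returning only "Maria".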

Let’s try to do better on person entity questions by applying an SRL-first method.

## Method 2: SRL first