### Install the transformers library

In [None]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/88/b1/41130a228dd656a1a31ba281598a968320283f48d42782845f6ba567f00b/transformers-4.2.2-py3-none-any.whl (1.8MB)
Collecting tokenizers==0.9.4
  Downloading https://files.pythonhosted.org/packages/0f/1c/e789a8b12e28be5bc1ce2156cf87cb522b379be9cadc7ad8091a4cc107c4/tokenizers-0.9.4-cp36-cp36m-manylinux2010_x86_64.whl (2.9MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.43-cp36-none-any.whl size=893261 sha256=fe7669e14ae00

### Import essential libraries

In [None]:
import torch
from transformers import AutoTokenizer,BertTokenizerFast
from google.colab import drive
drive.mount("/content/drive") 

Mounted at /content/drive


### Select the device (CUDA GPU if available, otherwise CPU)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device available for running: ")
print(device)

Device available for running: 
cpu


### Define the tokenizer and load our fine-tuned model

In [None]:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
# tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Load the model that we fine-tuned, placing its weights on the selected device
# (the inputs are moved to the same device in get_prediction)
model = torch.load("/content/drive/MyDrive/bert/bert_6", map_location=device)
model.eval()

BertForQuestionAnswering(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_

### Define helper functions, adapted from the official SQuAD 2.0 evaluation script, to evaluate the model's answers

In [None]:
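def get_prediction(context, question):
  # Encode the question/context pair and move the tensors to the selected device
  inputs = tokenizer.encode_plus(question, context, return_tensors='pt').to(device)

  # Inference only, so gradients are not needed
  with torch.no_grad():
    outputs = model(**inputs)

  # outputs[0] holds the start logits, outputs[1] the end logits
  answer_start = torch.argmax(outputs[0])
  answer_end = torch.argmax(outputs[1]) + 1  # +1 so that the slice includes the end token

  # Map the selected token ids back to a string
  answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][answer_start:answer_end]))

  return answer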

def normalize_text(s):
  """Lowercase the text, remove punctuation and articles, and collapse whitespace."""
  import string, re

  def remove_articles(text):
    regex = re.compile(r"\b(a|an|the)\b", re.UNICODE)
    return re.sub(regex, " ", text)

  def white_space_fix(text):
    return " ".join(text.split())

  def remove_punc(text):
    exclude = set(string.punctuation)
    return "".join(ch for ch in text if ch not in exclude)

  def lower(text):
    return text.lower()

  return white_space_fix(remove_articles(remove_punc(lower(s))))

def compute_exact_match(prediction, truth):
    return int(normalize_text(prediction) == normalize_text(truth))

def compute_f1(prediction, truth):
  pred_tokens = normalize_text(prediction).split()
  truth_tokens = normalize_text(truth).split()
  
  # if either the prediction or the truth is no-answer then f1 = 1 if they agree, 0 otherwise
  if len(pred_tokens) == 0 or len(truth_tokens) == 0:
    return int(pred_tokens == truth_tokens)
  
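  # count shared tokens as a multiset (as the official script does),
  # so that a repeated word is not collapsed into a single match
  from collections import Counter
  common_tokens = Counter(pred_tokens) & Counter(truth_tokens)
  num_same = sum(common_tokens.values())
  
  # if there are no common tokens then f1 = 0
  if num_same == 0:
    return 0
  
  prec = num_same / len(pred_tokens)
  rec = num_same / len(truth_tokens)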
  
  return 2 * (prec * rec) / (prec + rec)
  
def query_answer(context,question,answer):

  prediction = get_prediction(context,question)
  em_score = compute_exact_match(prediction, answer)
  f1_score = compute_f1(prediction, answer)

  print(f"Question: {question}")
  print(f"Prediction: {prediction}")
  print(f"True Answer: {answer}")
  print(f"EM: {em_score} \t F1: {f1_score}\n")

> In this cell we define a few helper functions, based on the official SQuAD v2 evaluation script. `query_answer` takes a paragraph (the context, which in most cases contains the answer), a specific question and its true answer, and prints the model's prediction together with its EM and F1 scores.

The [Question Answering on SQuAD 2.0 Dataset](https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/reports/default/15848195.pdf) paper describes the evaluation metrics as follows:
```
Two evaluation metrics are employed: Exact Match (EM) score and F1 score. EM is a binary measurement of whether the 
percentage of output from a system exactly matches the ground truth answer (the proportion of questions that are 
answered in exact same words as the ground truth). F1 score is a harmonic mean of precision and recall. For each 
question, precision is calculated as the number of correctly predicted words divided by the total words in the predicted 
answer. Recall is the number of correctly predicted words divided by the number of words in the ground truth answer.
The F1 score is averaged among questions.
```
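
To make the precision and recall arithmetic concrete, here is a minimal, self-contained sketch of the token-level F1 computation. The helper name `token_f1` and the two sample answers are illustrative assumptions, and the inputs are taken to be already normalized (lowercased, without articles or punctuation).

```python
from collections import Counter

def token_f1(pred_tokens, truth_tokens):
    """Token-level F1 between two already-normalized token lists."""
    # If either side is empty, F1 is 1 when both are empty and 0 otherwise.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    # Multiset intersection: a repeated word counts only as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

# Illustrative, already-normalized answers.
pred = "northwest part".split()
truth = "in northwest part of greece".split()

em = int(pred == truth)
f1 = token_f1(pred, truth)
# precision = 2/2, recall = 2/5, so F1 = 2 * 1 * 0.4 / 1.4
print(f"EM: {em}  F1: {f1:.4f}")  # EM: 0  F1: 0.5714
```

Both predicted words appear in the true answer, so precision is perfect; the score is pulled down only by recall, because three of the five true words are missing.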

### First example with Carles Puyol

In [None]:
context = """Carles Puyol Saforcada, born 13 April 1978, is a Spanish retired professional footballer
             who played his entire career for Barcelona. Mainly a central defender, he could also play in either full-back position, mostly as a right-back, and 
             is regarded as one of the best defenders of his generation and all time. He was Barcelona's captain from August 2004 until his retirement in 2014, 
             and appeared in 593 competitive matches for the club. He won 18 major club titles, including six La Liga trophies and three Champions Leagues.
             Puyol won 100 caps for Spain, and was part of the squads that won Euro 2008 and the 2010 World Cup. In the 2010 World Cup semi-final, he scored 
             the only goal of the game against Germany.
          """

questions = [
             "How many caps did Puyol win for Spain?",
             "How many matches Puyol appeared?",
             "How many matches Puyol played?",
             "When did Puyol retire?"]
answers = ["100","593","593","2014"]

for q,a in zip(questions,answers):
  query_answer(context,q,a)

Question: How many caps did Puyol win for Spain?
Prediction: 100
True Answer: 100
EM: 1 	 F1: 1.0

Question: How many matches Puyol appeared?
Prediction: 593
True Answer: 593
EM: 1 	 F1: 1.0

Question: How many matches Puyol played?
Prediction: 593
True Answer: 593
EM: 1 	 F1: 1.0

Question: When did Puyol retire?
Prediction: 2014,
True Answer: 2014
EM: 1 	 F1: 1.0



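> On short, numeric answers the model was flawless: 4/4 correct answers! Note that the second and third questions are the same apart from the verb ('played' instead of 'appeared'), and the model still got both. In the last question the prediction carries a trailing comma ('2014,'), but normalization strips punctuation before scoring, so EM is still 1.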
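
The prediction `2014,` scores EM 1 against `2014` because both strings are normalized before they are compared. The following standalone sketch illustrates that step; the function name `normalize` is an assumption, and it condenses the same operations as `normalize_text` above into one function.

```python
import re
import string

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

# A trailing comma disappears, so the two strings compare equal.
print(repr(normalize("2014,")))                  # '2014'
print(normalize("2014,") == normalize("2014"))   # True

# Hyphens are punctuation too: removing them joins a hyphenated word,
# while a spaced-out hyphen leaves two separate words behind.
print(repr(normalize("on-demand")))              # 'ondemand'
print(repr(normalize("on - demand")))            # 'on demand'
```

The last two lines show a limit of this normalization: a prediction whose hyphen was detokenized with surrounding spaces no longer matches the hyphenated original exactly.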

### Second example with Ioannina city

In [None]:
context = """Greece, officially the Hellenic Republic and also known as Hellas, is a country located in Southeast Europe.
             Its population is approximately 10.7 million as of 2018; Athens is its largest and capital city, followed by Thessaloniki.
             Situated on the southern tip of the Balkans, Greece is located at the crossroads of Europe, Asia, and Africa. 
             Another beautiful city which is located in the northwest part of Greece is Ioannina with population 100 thousand people. 
          """

questions = ["What's population of Ioannina?","Where is Ioannina located?","What is Ioannina?"]
answers = ["100 thousand","in the northwest part of Greece","city"]

for q,a in zip(questions,answers):
  query_answer(context,q,a)

Question: What's population of Ioannina?
Prediction: 100 thousand
True Answer: 100 thousand
EM: 1 	 F1: 1.0

Question: Where is Ioannina located?
Prediction: northwest part
True Answer: in the northwest part of Greece
EM: 0 	 F1: 0.5714285714285715

Question: What is Ioannina?
Prediction: beautiful city
True Answer: city
EM: 0 	 F1: 0.6666666666666666



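> The model located the right information in all three cases, although only the first prediction is an exact match. Notice that we were pretty strict with our true answers: for the third question we demanded 'city', while the model responded 'beautiful city', which is acceptable but scores EM 0 and F1 ≈ 0.67. Likewise, 'northwest part' is a reasonable answer to the second question, yet it leaves out 'of Greece' and scores F1 ≈ 0.57.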
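
Since the F1 score is averaged among questions, the per-question scores printed above can be summarized in a single pair of numbers. This is a small sketch; the `scores` list simply copies the (EM, F1) values from the output of this example.

```python
from statistics import mean

# (EM, F1) pairs copied from the three Ioannina questions above.
scores = [
    (1, 1.0),
    (0, 0.5714285714285715),
    (0, 0.6666666666666666),
]

mean_em = mean(em for em, _ in scores)
mean_f1 = mean(f1 for _, f1 in scores)
print(f"EM: {mean_em:.3f}  F1: {mean_f1:.3f}")  # EM: 0.333  F1: 0.746
```

The gap between the two averages reflects how strict EM is: a partially overlapping answer contributes nothing to EM but still earns partial credit in F1.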

### Third example with Cloud Computing


In [None]:
context = """Cloud computing is the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct 
             active management by the user. The term is generally used to describe data centers available to many users over the Internet.Large clouds, predominant today, 
             often have functions distributed over multiple locations from central servers. If the connection to the user is relatively close, it may be designated an edge
             server. Clouds may be limited to a single organization (enterprise clouds), or be available to multiple organizations (public cloud).Cloud computing relies 
             on sharing of resources to achieve coherence and economies of scale.Advocates of public and hybrid clouds note that cloud computing allows companies to avoid or 
             minimize up-front IT infrastructure costs. Proponents also claim that cloud computing allows enterprises to get their applications up and running faster, with 
             improved manageability and less maintenance, and that it enables IT teams to more rapidly adjust resources to meet fluctuating and unpredictable demand, providing 
             the burst computing capability: high computing power at certain periods of peak demand.Cloud providers typically use a "pay-as-you-go" model, which can lead
             to unexpected operating expenses if administrators are not familiarized with cloud-pricing models.The availability of high-capacity networks, low-cost computers
             and storage devices as well as the widespread adoption of hardware virtualization, service-oriented architecture and autonomic and utility computing has led to
             growth in cloud computing.By 2019, Linux was the most widely used operating system, including in Microsoft's offerings and is thus described as dominant.
          """

questions = ["What is cloud computing?",
             "What does cloud computing allow?",
             "What has led to growth in cloud computing?",
             ]

answers =   ["on-demand availability of computer system resources",
             "enterprises to get their applications up and running faster",
             """The availability of high-capacity networks, low-cost computers and storage devices as well as the widespread adoption of hardware virtualization, 
             service-oriented architecture and autonomic and utility computing""",
             ]

for q,a in zip(questions,answers):
  query_answer(context,q,a)

Question: What is cloud computing?
Prediction: on - demand availability of computer system resources,
True Answer: on-demand availability of computer system resources
EM: 0 	 F1: 0.7692307692307692

Question: What does cloud computing allow?
Prediction: sharing of resources to achieve coherence and economies of scale.
True Answer: enterprises to get their applications up and running faster
EM: 0 	 F1: 0.2105263157894737

Question: What has led to growth in cloud computing?
Prediction: 
True Answer: The availability of high-capacity networks, low-cost computers and storage devices as well as the widespread adoption of hardware virtualization, 
             service-oriented architecture and autonomic and utility computing
EM: 0 	 F1: 0



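> As we can observe, the model found the answer to our first question (EM is 0 only because the prediction is detokenized as 'on - demand' rather than 'on-demand'), but the last two answers were wrong. So when we ask more complicated questions (ones whose answer is not stated in a simple sentence structure), our model doesn't perform so well. Note that for the last question the model returned no answer at all. With `get_prediction` as written, an empty string typically arises when the predicted end position falls before the predicted start position, so the slice of tokens is empty.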
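
Picking the start and end positions with two independent argmax calls can yield an end that lies before the start, which produces an empty answer. One common remedy is to score every valid (start, end) pair jointly and keep the best one. The sketch below illustrates the idea on plain Python lists; the function name `best_valid_span`, the `max_answer_len` cap of 30 tokens and the logits are all made-up assumptions, and it is not the method used in this notebook.

```python
def best_valid_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with the highest summed logit and start <= end."""
    best_score, best_span = float("-inf"), (0, 0)
    for start, start_logit in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logit + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)
    return best_span

# Made-up logits: the independent argmaxes are start=4 and end=2,
# which would give an empty slice tokens[4:3].
start_logits = [0.1, 0.2, 0.3, 1.5, 2.0]
end_logits = [0.0, 0.4, 2.5, 1.0, 0.9]

print(best_valid_span(start_logits, end_logits))  # (4, 4)
```

With a real model, the two lists would come from the start and end logits of a single example, and the returned end index is inclusive, so the token slice would be `[start:end + 1]`.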

### Conclusions


> Although our model overfitted and we did not manage to prevent it, it still performed decently. 
More specifically, across our three examples we noticed that the model did very well when
the answers were numbers or the sentence structure was simple. In our last example we tried to make its job harder,
and it produced some wrong results. Note that a publicly available fine-tuned model (a fine-tuned BERT, for example) 
will generally behave better on your examples, because it has been fine-tuned by 
experts with more data and more computing power at their disposal. You can check the behavior of the 
'bert-large-uncased-whole-word-masking-finetuned-squad' model in the next notebook, called 'CompareWithBertFineTunedSQuAD'.

> These observations should be read with in mind that question answering is one of the most challenging
tasks in Natural Language Processing: the machine has to comprehend a given passage of text and correctly 
answer questions about it.

> Last but not least, note that our contexts and questions are unrelated to the examples of the SQuAD dataset on which we fine-tuned our model, which makes the correct answers all the more remarkable!
