<a href="https://colab.research.google.com/github/rickqiu/deep-learning/blob/master/BERT%20Question%20Answering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://upload.wikimedia.org/wikipedia/en/6/6d/Nvidia_image_logo.svg" style="width: 90px; float: right;">

# BERT Question Answering in TensorFlow with Mixed Precision

Copyright 2019 NVIDIA Corporation. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

In [12]:
!nvidia-smi

Tue Jun 30 01:08:11 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## 1. Overview

Bidirectional Encoder Representations from Transformers (BERT) is a method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

The original paper can be found here: https://arxiv.org/abs/1810.04805.

NVIDIA's BERT is an optimized version of Google's official implementation, leveraging mixed precision arithmetic and Tensor Cores on V100 GPUs for faster training times while maintaining target accuracy.

### Learning objectives

This notebook demonstrates:
- Inference on a Question Answering (QA) task with the BERT Large model
- Downloading and using fine-tuned NVIDIA BERT models from [NGC](https://ngc.nvidia.com)
- Using mixed precision models for inference

## 2. Setup

### Pre-Trained NVIDIA BERT TensorFlow Models on NGC

<img src="https://blogs.nvidia.com/wp-content/uploads/2019/03/18-ngc-software-stack-447x500.png" style="width: 360px;">

We will be using the following configuration of BERT in this example:

| **Model** | **Hidden layers** | **Hidden unit size** | **Attention heads** | **Feedforward filter size** | **Max sequence length** | **Parameters** |
|:---------:|:----------:|:----:|:---:|:--------:|:---:|:----:|
|BERT Large|24 encoder|1024|16|4 x 1024|512|330M|
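
As a rough cross-check on the parameter count in the table, the sketch below tallies the weights of a standard BERT encoder from its hyperparameters. The vocabulary size (30522) and segment-type count (2) come from the `bert_config.json` written later in this notebook. The function name `count_bert_parameters` is hypothetical, and the tally covers only the embeddings, encoder layers, and pooler (no task-specific head), so it approximates the quoted figure rather than reproducing it exactly.

```python
def count_bert_parameters(vocab_size=30522, hidden=1024, layers=24,
                          intermediate=4096, max_positions=512, type_vocab=2):
    """Approximate weight count for a standard BERT encoder (illustrative only)."""
    layer_norm = 2 * hidden  # one scale and one bias vector
    # Token, position and segment embeddings, followed by a LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections, each with a bias
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> intermediate -> hidden
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    per_layer = attention + layer_norm + feed_forward + layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler

total = count_bert_parameters()
print("Approximate parameters: {:.1f}M".format(total / 1e6))
```

The result lands in the same range as the figure in the table; small differences depend on which components are included in the count.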

**To do so, we will take advantage of the pre-trained models available on the [NGC Model Registry](https://ngc.nvidia.com/catalog/models).**

Among the many configurations available we will download one of these two:

 - **bert_tf_v2_large_fp32_384**

 - **bert_tf_v2_large_fp16_384**

which are trained on the [SQuAD 2.0 Dataset](https://rajpurkar.github.io/SQuAD-explorer/).

We can choose the mixed precision model, which takes much less time to train than the FP32 version without losing accuracy, with the following flag:

In [13]:
use_mixed_precision_model = True
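
To make the precision trade-off concrete, here is a small standard-library sketch (not part of the NVIDIA scripts) that round-trips a value through IEEE 754 half precision. It only illustrates that an FP16 value occupies half the storage of an FP32 value and is slightly less exact; it does not model Tensor Core arithmetic or the training procedure. The helper name `to_fp16` is hypothetical.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', value))[0]

fp16_bytes = struct.calcsize('e')  # bytes used by one half-precision value
fp32_bytes = struct.calcsize('f')  # bytes used by one single-precision value
rounded = to_fp16(0.1)

print("bytes per value: fp16={}, fp32={}".format(fp16_bytes, fp32_bytes))
print("0.1 stored as fp16: {!r} (error {:.2e})".format(rounded, abs(rounded - 0.1)))
```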

In [14]:
if use_mixed_precision_model:
    # bert_tf_v2_large_fp16_384
    !mkdir -p /workspace/bert/data/finetuned_model_fp16
    !wget -nc -q --show-progress -O /workspace/bert/data/finetuned_model_fp16/bert_tf_v2_large_fp16_384.zip \
    https://api.ngc.nvidia.com/v2/models/nvidia/bert_tf_v2_large_fp16_384/versions/1/zip
    !unzip -n -d /workspace/bert/data/finetuned_model_fp16/ /workspace/bert/data/finetuned_model_fp16/bert_tf_v2_large_fp16_384.zip 
else:
    # bert_tf_v2_large_fp32_384
    !mkdir -p /workspace/bert/data/finetuned_model_fp32
    !wget -nc -q --show-progress -O /workspace/bert/data/finetuned_model_fp32/bert_tf_v2_large_fp32_384.zip \
    https://api.ngc.nvidia.com/v2/models/nvidia/bert_tf_v2_large_fp32_384/versions/1/zip
    !unzip -n -d /workspace/bert/data/finetuned_model_fp32/ /workspace/bert/data/finetuned_model_fp32/bert_tf_v2_large_fp32_384.zip 

Archive:  /workspace/bert/data/finetuned_model_fp16/bert_tf_v2_large_fp16_384.zip
  inflating: /workspace/bert/data/finetuned_model_fp16/model.ckpt-8144.data-00000-of-00001  
  inflating: /workspace/bert/data/finetuned_model_fp16/model.ckpt-8144.index  
  inflating: /workspace/bert/data/finetuned_model_fp16/model.ckpt-8144.meta  
  inflating: /workspace/bert/data/finetuned_model_fp16/tf_bert_squad_1n_fp16_gbs32.190523090758.log  


### NGC Model Scripts

While we're at it, we'll also pull down some BERT helper scripts from the [NGC Model Scripts Registry](https://ngc.nvidia.com/catalog/model-scripts/nvidia:bert_for_tensorflow).

In [15]:
# Download BERT helper scripts
!wget -nc --show-progress -O bert_scripts.zip \
     https://api.ngc.nvidia.com/v2/recipes/nvidia/bert_for_tensorflow/versions/1/zip
!mkdir -p /workspace/bert
!unzip -n -d /workspace/bert bert_scripts.zip

--2020-06-30 01:12:01--  https://api.ngc.nvidia.com/v2/recipes/nvidia/bert_for_tensorflow/versions/1/zip
Resolving api.ngc.nvidia.com (api.ngc.nvidia.com)... 54.241.185.37, 54.153.26.224
Connecting to api.ngc.nvidia.com (api.ngc.nvidia.com)|54.241.185.37|:443... connected.
HTTP request sent, awaiting response... 302 
Location: https://s3.us-west-2.amazonaws.com/prod-model-registry-ngc-bucket/org/nvidia/recipes/bert_for_tensorflow/versions/1/files.zip?response-content-disposition=attachment%3B%20filename%3D%22files.zip%22&response-content-type=application%2Fzip&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEIj%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMSJHMEUCIQDBbyY8hKhSF1c6n8clKbokPz5BnvV4dkaHRLOMFl18ZAIgRpkcCWcmPO3Td9iLw5tYIkrduNQ1I6GP9W63NuhsyccqtAMIIRACGgw3ODkzNjMxMzUwMjciDDVPGhN3pyweaaM54CqRA3OCUjj4%2FW%2BOj%2F7qtg6kxHPaJna%2FHtTJXrvPgGYbw7C5M33OtZgbjFzdBCuUL3%2FmexWIrJ3gjzVnFAuA%2BdrOuCS23i1Epygj0Jil9RmrJ%2BNb5NyURz1UDVqiBD0ihqcw8DKckkZeuzbGck%2BXH4PGkAZuTyFALgkWSEzt%2BK1gZWtx3sm45z9uje

### BERT Config

In [16]:
# Download BERT vocab file
!mkdir -p /workspace/bert/config.qa
!wget -nc https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt \
    -O /workspace/bert/config.qa/vocab.txt

File ‘/workspace/bert/config.qa/vocab.txt’ already there; not retrieving.


In [17]:
%%writefile /workspace/bert/config.qa/bert_config.json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "max_position_embeddings": 512,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

Overwriting /workspace/bert/config.qa/bert_config.json
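
A quick sanity check on these values can catch typos before the model is loaded. The sketch below is an illustration only: it parses an inline copy of the relevant fields (rather than reading the file) and derives the per-head width and the feed-forward expansion factor. The variable names are hypothetical.

```python
import json

# Inline copy of the fields used below, so the check runs without the file on disk
config_text = """
{
  "hidden_size": 1024,
  "intermediate_size": 4096,
  "num_attention_heads": 16,
  "num_hidden_layers": 24
}
"""
config = json.loads(config_text)

# The hidden size must split evenly across the attention heads
head_size, remainder = divmod(config["hidden_size"], config["num_attention_heads"])
if remainder != 0:
    raise ValueError("hidden_size must be divisible by num_attention_heads")

# Ratio between the feed-forward width and the hidden size
ffn_ratio = config["intermediate_size"] // config["hidden_size"]

print("per-head size: {}, feed-forward ratio: {}x".format(head_size, ffn_ratio))
```

The ratio matches the "4 x 1024" feed-forward filter size listed in the configuration table.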


### Helper Functions

In [18]:
# Create dynamic JSON files based on user inputs
def write_input_file(context, qinputs, predict_file):
    import json

    # Drop double quotes and replace newlines with spaces so that words on
    # adjacent lines are not fused together in the single-line context
    context = context.replace('"', '').replace('\n', ' ')
    # Create JSON dict to write (SQuAD-style layout)
    json_dict = {
      "data": [
        {
          "title": "BERT QA",
          "paragraphs": [
            {
              "context": context,
              "qas": qinputs
            }
          ]
        }
      ]
    }
    # Write JSON to input file
    with open(predict_file, 'w') as json_file:
        json.dump(json_dict, json_file, indent=2)
    
# Display Inference Results as HTML Table
def display_results(predict_file, output_prediction_file):
    import json
    from IPython.display import display, HTML

    # Here we show only the prediction results, nbest prediction is also available in the output directory
    results = ""
    with open(predict_file, 'r') as query_file:
        queries = json.load(query_file)
        input_data = queries["data"]
        with open(output_prediction_file, 'r') as result_file:
            data = json.load(result_file)
            for entry in input_data:
                for paragraph in entry["paragraphs"]:
                    for qa in paragraph["qas"]:
                        results += "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(qa["id"], qa["question"], data[qa["id"]])

    display(HTML("<table><tr><th>Id</th><th>Question</th><th>Answer</th></tr>{}</table>".format(results)))

## 3. BERT Inference: Question Answering

We can run inference on a fine-tuned BERT model for tasks like Question Answering.

Here we use a BERT model fine-tuned on the [SQuAD 2.0 Dataset](https://rajpurkar.github.io/SQuAD-explorer/), which contains 100,000+ question-answer pairs on 500+ articles combined with over 50,000 new, unanswerable questions.

### Paragraph and Queries

In this example we will ask our BERT model questions related to the following paragraph:

**The Apollo Program**
_"The Apollo program, also known as Project Apollo, was the third United States human spaceflight program carried out by the National Aeronautics and Space Administration (NASA), which accomplished landing the first humans on the Moon from 1969 to 1972. First conceived during Dwight D. Eisenhower's administration as a three-man spacecraft to follow the one-man Project Mercury which put the first Americans in space, Apollo was later dedicated to President John F. Kennedy's national goal of landing a man on the Moon and returning him safely to the Earth by the end of the 1960s, which he proposed in a May 25, 1961, address to Congress. Project Mercury was followed by the two-man Project Gemini. The first manned flight of Apollo was in 1968. Apollo ran from 1961 to 1972, and was supported by the two-man Gemini program which ran concurrently with it from 1962 to 1966. Gemini missions developed some of the space travel techniques that were necessary for the success of the Apollo missions. Apollo used Saturn family rockets as launch vehicles. Apollo/Saturn vehicles were also used for an Apollo Applications Program, which consisted of Skylab, a space station that supported three manned missions in 1973-74, and the Apollo-Soyuz Test Project, a joint Earth orbit mission with the Soviet Union in 1975."_

  
---

The paragraph and the questions can be easily customized by changing the code below:

---

In [19]:
# Create BERT input file with (1) context and (2) questions to be answered based on that context
predict_file = '/workspace/bert/config.qa/input.json'

In [20]:
%%writefile $predict_file
{"data": 
 [
     {"title": "Project Apollo",
      "paragraphs": [
          {"context":"The Apollo program, also known as Project Apollo, was the third United States human spaceflight program carried out by the National Aeronautics and Space Administration (NASA), which accomplished landing the first humans on the Moon from 1969 to 1972. First conceived during Dwight D. Eisenhower's administration as a three-man spacecraft to follow the one-man Project Mercury which put the first Americans in space, Apollo was later dedicated to President John F. Kennedy's national goal of landing a man on the Moon and returning him safely to the Earth by the end of the 1960s, which he proposed in a May 25, 1961, address to Congress. Project Mercury was followed by the two-man Project Gemini. The first manned flight of Apollo was in 1968. Apollo ran from 1961 to 1972, and was supported by the two man Gemini program which ran concurrently with it from 1962 to 1966. Gemini missions developed some of the space travel techniques that were necessary for the success of the Apollo missions. Apollo used Saturn family rockets as launch vehicles. Apollo/Saturn vehicles were also used for an Apollo Applications Program, which consisted of Skylab, a space station that supported three manned missions in 1973-74, and the Apollo-Soyuz Test Project, a joint Earth orbit mission with the Soviet Union in 1975.", 
           "qas": [
               { "question": "What project put the first Americans into space?", 
                 "id": "Q1"
               },
               { "question": "What program was created to carry out these projects and missions?",
                 "id": "Q2"
               },
               { "question": "What year did the first manned Apollo flight occur?",
                 "id": "Q3"
               },                
               { "question": "What President is credited with the notion of putting Americans on the moon?",
                 "id": "Q4"
               },
               { "question": "Who did the U.S. collaborate with on an Earth orbit mission in 1975?",
                 "id": "Q5"
               },
               { "question": "How long did Project Apollo run?",
                 "id": "Q6"
               },               
               { "question": "What program helped develop space travel techniques that Project Apollo used?",
                 "id": "Q7"
               },                
               {"question": "What space station supported three manned missions in 1973-1974?",
                 "id": "Q8"
               }
]}]}]}

Overwriting /workspace/bert/config.qa/input.json
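
Because `display_results` looks up each prediction by its question `id`, duplicate or missing ids in a hand-edited input file would silently mix up answers. The following sketch, with the hypothetical helper `validate_squad_input`, shows one way to check a SQuAD-style dictionary before running inference; it is not part of the NGC scripts.

```python
def validate_squad_input(squad_dict):
    """Return all question ids, raising ValueError on malformed or duplicate entries."""
    seen = []
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            if not paragraph.get("context"):
                raise ValueError("paragraph has no context")
            for qa in paragraph["qas"]:
                if not qa.get("question") or not qa.get("id"):
                    raise ValueError("each qas entry needs a question and an id")
                if qa["id"] in seen:
                    raise ValueError("duplicate question id: {}".format(qa["id"]))
                seen.append(qa["id"])
    return seen

# Tiny stand-in for the input file above
sample = {
    "data": [{
        "title": "Project Apollo",
        "paragraphs": [{
            "context": "The first manned flight of Apollo was in 1968.",
            "qas": [
                {"question": "What year did the first manned Apollo flight occur?", "id": "Q1"},
                {"question": "Which program had a manned flight in 1968?", "id": "Q2"},
            ],
        }],
    }]
}

ids = validate_squad_input(sample)
print(ids)
```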


To evaluate the speedup of mixed precision effectively, try a bigger workload by uncommenting the following lines:

**TODO: Waiting on model scripts repo to fix Windows newlines in squad_download.sh**


In [6]:
#!bash /workspace/bert/data/squad/squad_download.sh
#predict_file = '/workspace/bert/data/squad/v2.0/dev-v2.0.json'

## 4. Running Question/Answer Inference

To run QA inference we will launch the script run_squad.py with the following parameters:

In [21]:
import os

# This specifies the model architecture.
bert_config_file = '/workspace/bert/config.qa/bert_config.json'

# The vocabulary file that the BERT model was trained on.
vocab_file = '/workspace/bert/config.qa/vocab.txt'

# Depending on the mixed precision flag we use different fine-tuned model
if use_mixed_precision_model:
    init_checkpoint = '/workspace/bert/data/finetuned_model_fp16/model.ckpt-8144'
else:
    init_checkpoint = '/workspace/bert/data/finetuned_model_fp32/model.ckpt-8144'

# Create the output directory where all the results are saved.
output_dir = '/workspace/bert/results'
output_prediction_file = os.path.join(output_dir,'predictions.json')
    
# Whether to lower case the input - True for uncased models / False for cased models.
do_lower_case = True
  
# Total batch size for predictions
predict_batch_size = 8

# Whether to run eval on the dev set.
do_predict = True

# When splitting up a long document into chunks, how much stride to take between chunks.
doc_stride = 128

# The maximum total input sequence length after WordPiece tokenization.
# Sequences longer than this will be truncated, and sequences shorter than this will be padded.
max_seq_length = 384
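
To illustrate how `doc_stride` and `max_seq_length` interact, here is a simplified sketch of sliding-window chunking, modeled on the approach in Google's reference `run_squad.py`. It works on token counts only, and the names `DocSpan` and `split_into_spans` are chosen for this example; the actual script in `/workspace/bert` may differ in detail.

```python
from collections import namedtuple

DocSpan = namedtuple("DocSpan", ["start", "length"])

def split_into_spans(num_doc_tokens, num_query_tokens, max_seq_length=384, doc_stride=128):
    """Return overlapping (start, length) windows covering a tokenized document."""
    # Reserve room for the query plus three special tokens: [CLS], [SEP], [SEP]
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("query leaves no room for document tokens")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append(DocSpan(start, length))
        if start + length == num_doc_tokens:
            break  # the last window reached the end of the document
        start += min(length, doc_stride)
    return spans

# A 1000-token document with a 10-token question
spans = split_into_spans(num_doc_tokens=1000, num_query_tokens=10)
for span in spans:
    print(span)
```

A smaller `doc_stride` produces more overlap between windows, and therefore more windows to run through the model, while a document that fits within the limit yields a single window.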

In [24]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


### 4a. Run Inference

In [30]:
!pip install horovod[tensorflow]

Collecting horovod[tensorflow]
  Using cached https://files.pythonhosted.org/packages/25/3a/289d100467ae33bce717daa3b285c72e0c82c761c5de37cc61940982c83c/horovod-0.19.5.tar.gz
Collecting gast==0.2.2
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Building wheels for collected packages: horovod, gast
  Building wheel for horovod (setup.py) ... done
  Created wheel for horovod: filename=horovod-0.19.5-cp36-cp36m-linux_x86_64.whl size=16814712 sha256=b095f9ed7ca617c0e26a5dbb31b7034876f3c1caa09091ec9077a976af2b74e1
  Stored in directory: /root/.cache/pip/wheels/c1/de/55/40364395c40c35292366a21572320a9b89029df9fb518b7668
  Building wheel for gast (setup.py) ... done
  Created wheel for gast: filename=gast-0.2.2-cp36-none-any.whl size=7540 sha256=2ab5a1b69db2afd8e1866f4c54ad74328492fe51ecfe459ca8a313d2e392ecd4
  Stored in directory: /root/.cache/pip/wheels/5c/2e/7e/a1d4d4fcebe6c3

In [31]:
import horovod.tensorflow as hvd
# Ask BERT questions
!python /workspace/bert/run_squad.py \
  --bert_config_file=$bert_config_file \
  --vocab_file=$vocab_file \
  --init_checkpoint=$init_checkpoint \
  --output_dir=$output_dir \
  --do_predict=$do_predict \
  --predict_file=$predict_file \
  --predict_batch_size=$predict_batch_size \
  --doc_stride=$doc_stride \
  --max_seq_length=$max_seq_length


W0630 01:26:15.456149 140605881927552 module_wrapper.py:139] From /workspace/bert/run_squad.py:1174: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0630 01:26:15.456306 140605881927552 module_wrapper.py:139] From /workspace/bert/run_squad.py:1174: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0630 01:26:15.456432 140605881927552 module_wrapper.py:139] From /workspace/bert/modeling.py:94: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0630 01:26:15.457169 140605881927552 module_wrapper.py:139] From /workspace/bert/run_squad.py:1183: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0630 01:26:15.530535 140605881927552 module_wrapper.py:139] From /workspace/bert/run_squad.py:1199: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

The TensorFlow contrib module will not be inclu

### 4b. Display Results:

In [32]:
display_results(predict_file, output_prediction_file)

Id,Question,Answer
Q1,What project put the first Americans into space?,Project Mercury
Q2,What program was created to carry out these projects and missions?,The Apollo program
Q3,What year did the first manned Apollo flight occur?,1968
Q4,What President is credited with the notion of putting Americans on the moon?,John F. Kennedy
Q5,Who did the U.S. collaborate with on an Earth orbit mission in 1975?,Soviet Union
Q6,How long did Project Apollo run?,1961 to 1972
Q7,What program helped develop space travel techniques that Project Apollo used?,Gemini missions
Q8,What space station supported three manned missions in 1973-1974?,Skylab


<details>
  <summary><b>Click to reveal expected answers to the questions above</b></summary>
  
| Id | Question | Answer |
|----|----------|--------|
| Q1 | What project put the first Americans into space? | Project Mercury |
| Q2 | What program was created to carry out these projects and missions? | The Apollo program |
| Q3 | What year did the first manned Apollo flight occur? | 1968 |
| Q4 | What President is credited with the notion of putting Americans on the moon? | John F. Kennedy |
| Q5 | Who did the U.S. collaborate with on an Earth orbit mission in 1975? | Soviet Union |
| Q6 | How long did Project Apollo run? | 1961 to 1972 |
| Q7 | What program helped develop space travel techniques that Project Apollo used? | Gemini missions |
| Q8 | What space station supported three manned missions in 1973-1974? | Skylab |

</details>
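
Comparing predictions with expected answers by eye works for eight questions; for larger sets, an exact-match score is more practical. The sketch below is modeled on the answer normalization used by the official SQuAD evaluation script (lowercasing, removing punctuation and articles, collapsing whitespace), simplified to a single reference answer per question. The function names and the sample dictionaries are illustrative, not output from this notebook's run.

```python
import re
import string

def normalize_answer(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match_score(predictions, expected):
    """Fraction of question ids whose normalized prediction equals the reference."""
    matches = sum(
        normalize_answer(predictions.get(qid, "")) == normalize_answer(answer)
        for qid, answer in expected.items()
    )
    return matches / len(expected)

# Illustrative data in the same id -> answer shape as predictions.json
predictions = {"Q1": "Project Mercury", "Q2": "The Apollo program", "Q3": "1969"}
expected = {"Q1": "project mercury.", "Q2": "Apollo program", "Q3": "1968"}

score = exact_match_score(predictions, expected)
print("Exact match: {:.2f}".format(score))
```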

## 5. Custom Inputs

Now that you are familiar with running QA Inference on BERT, you may want to try
your own paragraphs and queries.


1. Copy and paste your context from Wikipedia, news articles, etc. when prompted below.
2. Enter questions based on the context when prompted below.
3. Run the inference script.
4. Display the inference results.
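
The first two steps can also be scripted instead of typed at a prompt. The sketch below builds the same SQuAD-style structure from a context string and a list of questions, writes it to a temporary file, and reads it back. The helper `build_squad_input`, the sample text, and the temporary path are assumptions for illustration; steps 3 and 4 still require the inference script and are not covered here.

```python
import json
import os
import tempfile

def build_squad_input(context, questions, title="BERT QA"):
    """Wrap a context and its questions in a SQuAD-style dictionary with ids Q1, Q2, ..."""
    qas = [{"question": q, "id": "Q{}".format(i + 1)} for i, q in enumerate(questions)]
    return {"data": [{"title": title, "paragraphs": [{"context": context, "qas": qas}]}]}

context = "Skylab was a space station that supported three manned missions in 1973-74."
questions = [
    "What was Skylab?",
    "How many manned missions did Skylab support?",
]

payload = build_squad_input(context, questions)

# Write to a throwaway location rather than the notebook's working directory
demo_path = os.path.join(tempfile.mkdtemp(), "custom_input.json")
with open(demo_path, "w") as json_file:
    json.dump(payload, json_file, indent=2)

# Read the file back to confirm it round-trips
with open(demo_path) as json_file:
    loaded = json.load(json_file)

print([qa["id"] for qa in loaded["data"][0]["paragraphs"][0]["qas"]])
```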

In [33]:
predict_file = '/workspace/bert/config.qa/custom_input.json'
num_questions = 3           # You can configure this number

In [36]:
# Create your own context to ask questions about.
context = input("Paste your context here: ")

Paste your context here: Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English[6] mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.[7] Turing was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation with the Turing machine, which can be considered a model of a general-purpose computer.[8][9][10] Turing is widely considered to be the father of theoretical computer science and artificial intelligence.[11] Despite these accomplishments, he was not fully recognised in his home country during his lifetime, due to his homosexuality, and because much of his work was covered by the Official Secrets Act.  During the Second World War, Turing worked for the Government Code and Cypher School (GC&CS) at Bletchley Park, Britain's codebreaking centre that produced Ultra intelligence. For a time he led Hut 8, the section that was respon

In [37]:
# Get questions from user input
questions = [input("Question {}/{}: ".format(i+1, num_questions)) for i in range(num_questions)]
# Format questions and write to JSON input file
qinputs = [{ "question":q, "id":"Q{}".format(i+1)} for i,q in enumerate(questions)]
write_input_file(context, qinputs, predict_file)

Question 1/3: Who is Alan Turing?
Question 2/3: What did he do?
Question 3/3: Where did he work?


In [38]:
# Ask BERT questions
!python /workspace/bert/run_squad.py \
  --bert_config_file=$bert_config_file \
  --vocab_file=$vocab_file \
  --init_checkpoint=$init_checkpoint \
  --output_dir=$output_dir \
  --do_predict=$do_predict \
  --predict_file=$predict_file \
  --predict_batch_size=$predict_batch_size \
  --doc_stride=$doc_stride \
  --max_seq_length=$max_seq_length


W0630 01:33:02.349607 140258253739904 module_wrapper.py:139] From /workspace/bert/run_squad.py:1174: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0630 01:33:02.349749 140258253739904 module_wrapper.py:139] From /workspace/bert/run_squad.py:1174: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0630 01:33:02.349860 140258253739904 module_wrapper.py:139] From /workspace/bert/modeling.py:94: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0630 01:33:02.350565 140258253739904 module_wrapper.py:139] From /workspace/bert/run_squad.py:1183: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0630 01:33:02.423591 140258253739904 module_wrapper.py:139] From /workspace/bert/run_squad.py:1199: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

The TensorFlow contrib module will not be include

In [39]:
display_results(predict_file, output_prediction_file)

Id,Question,Answer
Q1,Who is Alan Turing?,father of theoretical computer science and artificial intelligence
Q2,What did he do?,designed the Automatic Computing Engine
Q3,Where did he work?,Government Code and Cypher School
