# BioBERT Pre-trained Language Model for Biomedical Question Answering

BioBERT is a biomedical language representation model designed for biomedical text mining tasks \[[1](https://github.com/dmis-lab/bioasq-biobert)\]. 

In this notebook, we demonstrate how BioBERT can be applied to biomedical question answering using the BioASQ dataset, and how we can leverage OpenVINO's [Deep Learning Inference Engine](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_inference_engine_intro.html) to run high-performance inference on many different types of Intel® hardware, including CPU, GPU, FPGA, and VPU.

The code in this example is adapted from the BioASQ BioBERT repository, which is based on the BERT repository. Links to both can be found in the references below.

### References

\[[1](https://github.com/dmis-lab/bioasq-biobert)\] BioASQ BioBERT Github Repository

\[[2](https://arxiv.org/ftp/arxiv/papers/1901/1901.08746.pdf)\] Lee, Jinhyuk, et al. "BioBERT: a pre-trained biomedical language representation model for biomedical text mining." Bioinformatics 36.4 (2020): 1234-1240.

\[[3](https://github.com/google-research/bert)\] BERT Github Repository

\[[4](https://arxiv.org/pdf/1706.03762.pdf)\] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.

\[[5](https://arxiv.org/pdf/1810.04805.pdf)\] Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

\[[6](https://arxiv.org/pdf/1909.08229.pdf)\] Yoon, Wonjin, et al. "Pre-trained Language Model for Biomedical Question Answering." arXiv preprint arXiv:1909.08229 (2019).

\[[7](https://arxiv.org/pdf/1609.08144.pdf)\] Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).

\[[8](http://www.bioasq.org/)\] BioASQ Website

\[[9](https://github.com/BioASQ/Evaluation-Measures)\] BioASQ Evaluation Measures Github Repository


## Import dependencies

Run the following cell to import the Python dependencies for running the Tensorflow and OpenVINO examples.

In [None]:
import collections
import json
import os
import sys

import tokenization
import modeling
import numpy as np
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

from tqdm.auto import tqdm
from openvino.inference_engine import IECore
from multiprocessing import Process
from run_factoid import write_predictions, read_squad_examples, convert_examples_to_features, model_fn_builder
from tensorflow.contrib import predictor

from qarpo.demoutils import *
from time import time

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

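def run_background(function):
    """Decorator that runs `function` once, immediately, in a separate process.

    The parent waits for the child to finish, so any memory the function
    allocates is released when the child process exits.
    """
    p = Process(target=function)
    p.start()
    p.join()
    return function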

## Overview of the BERT model

<br>
<figure>
<img src="BioBERT_training.png"/>
<figcaption style="text-align:center">BioBERT Training Methodology <a href="https://arxiv.org/ftp/arxiv/papers/1901/1901.08746.pdf">[2]</a></figcaption>
</figure>

BioBERT is initialized with the pre-trained weights of the original BERT model \[[3](https://github.com/google-research/bert)\], which was trained on BookCorpus and English Wikipedia data. Starting from that base, it is further pre-trained on PubMed abstracts for 1M steps (this is BioBERT 1.1).

After that step, it is fine-tuned for named entity recognition, relation extraction, and question answering. BioBERT demonstrates that additional training on domain-specific text can improve performance on NLP tasks in that domain.

The BERT model uses attention mechanisms \[[4](https://arxiv.org/pdf/1706.03762.pdf)\] to represent the relationships between a token and all other tokens in the input. BERT is powerful not only because it achieves state-of-the-art results, but also because it generalizes to a wide array of tasks such as machine translation, search, and question answering \[[5](https://arxiv.org/pdf/1810.04805.pdf)\].
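
To make the idea of attention concrete, the following is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration only: the toy vectors are made up, and it leaves out the learned projection matrices and the multiple heads that the real model uses.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this token's query with every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy 2-dimensional token vectors, used as queries, keys and values alike.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, and each row sums to 1.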


## Export Tensorflow model from checkpoint file

Before getting started, we need to first load the original Tensorflow checkpoint and export a saved model that we will use for inference and for converting to an OpenVINO intermediate representation (IR) later.

The export process will display multiple warnings caused by a specific package version. They can be resolved by installing gast==0.2.2, but they should not affect the export. More information about the warnings can be found [here](https://github.com/tensorflow/tensorflow/issues/32949).

In [None]:
@run_background
def export():
    tf.logging.set_verbosity(tf.logging.INFO)

    bert_config = modeling.BertConfig.from_json_file('/data/BioBert/BERT-pubmed-1000000-SQuAD/bert_config.json')

    run_config = tf.contrib.tpu.RunConfig(
        model_dir='/data/BioBert/BERT-pubmed-1000000-SQuAD/',
        tpu_config=None)

    model_fn = model_fn_builder(
        bert_config=bert_config,
        init_checkpoint='/data/BioBert/BERT-pubmed-1000000-SQuAD/model.ckpt-14599',
        learning_rate=0,
        num_train_steps=0,
        num_warmup_steps=0,
        use_tpu=False,
        use_one_hot_embeddings=False)

    # If TPU is not available, this will fall back to normal Estimator on CPU
    # or GPU.
    estimator = tf.contrib.tpu.TPUEstimator(
        use_tpu=False,
        model_fn=model_fn,
        config=run_config,
        predict_batch_size=8)

    features = {
        "input_ids": tf.placeholder(shape=[None, 384], dtype=tf.int32, name='input_ids'),
        "input_mask": tf.placeholder(shape=[None, 384], dtype=tf.int32, name='input_mask'),
        "segment_ids": tf.placeholder(shape=[None, 384], dtype=tf.int32, name='segment_ids'),
        "unique_ids": tf.placeholder(shape=[None], dtype=tf.int32, name='unique_ids'),
        }
    serving_input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(features)
    estimator._export_to_tpu = False  
    estimator.export_saved_model(
        export_dir_base='./tf_saved_model',
        serving_input_receiver_fn=serving_input_fn)
        #as_text=True)

## BioBERT Walkthrough

The following section shows how input data is prepared and passed to the model, and how the outputs are interpreted. A diagram of the BioBERT model is shown below for reference.

<br>
<figure>
<img src="BioBERT_model.png"/>
<figcaption style="text-align:center">BioBERT Model for Question Answering <a href="https://arxiv.org/pdf/1909.08229.pdf">[6]</a></figcaption>
</figure>

BioBERT question answering relies on a context paragraph that contains the answer to the query that is passed in. An example is given in the cell below.

In [None]:
# Context text is taken from https://www.who.int/health-topics/coronavirus
example_context = "Coronaviruses (CoV) are a large family of viruses that cause illness ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). A novel coronavirus (nCoV) is a new strain that has not been previously identified in humans. Coronaviruses are zoonotic, meaning they are transmitted between animals and people.  Detailed investigations found that SARS-CoV was transmitted from civet cats to humans and MERS-CoV from dromedary camels to humans. Several known coronaviruses are circulating in animals that have not yet infected humans. Common signs of infection include respiratory symptoms, fever, cough, shortness of breath and breathing difficulties. In more severe cases, infection can cause pneumonia, severe acute respiratory syndrome, kidney failure and even death. Standard recommendations to prevent infection spread include regular hand washing, covering mouth and nose when coughing and sneezing, thoroughly cooking meat and eggs. Avoid close contact with anyone showing symptoms of respiratory illness such as coughing and sneezing."

questions =  ["What is MERS-CoV?",
              "What are coronaviruses?", 
              "What are the common signs of coronavirus?",
              "How are coronaviruses spread?",
              "How do you prevent coronavirus infection?"]

The first step in the process is tokenizing the question and the context so that they can be converted into a single input. In the cell below, we load the vocabulary file and tokenize the question and context using a tokenizer built from that file.

Only a single question can be attached to a context at a time, so if there are multiple questions for a particular context paragraph, the process will need to be repeated for each question.

The tokenizer uses WordPiece tokenization, meaning it checks each word and breaks any word not found in the vocabulary into multiple tokens that are part of the vocabulary. The tokenizer uses "##" to mark tokens that continue a word that has been split apart in this way. This allows words that are not in the vocabulary to be constructed from tokens, and is more efficient than a purely character-based approach. More information about WordPiece encodings can be found in the paper by Wu et al. \[[7](https://arxiv.org/pdf/1609.08144.pdf)\].

In our example, we can see that a word like "large" is included as a single token, while "zoonotic" is broken up into "zoo", "##not", and "##ic".
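
The splitting behaviour can be illustrated with a small sketch of greedy longest-match-first WordPiece splitting. This is not the tokenizer used in the notebook: `wordpiece` and `toy_vocab` are hypothetical names, and the miniature vocabulary is made up so that the "zoonotic" example can be reproduced.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Split one word into vocabulary pieces, longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of a word
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Hypothetical miniature vocabulary, for illustration only.
toy_vocab = {"large", "zoo", "##not", "##ic", "##s"}

print(wordpiece("large", toy_vocab))
print(wordpiece("zoonotic", toy_vocab))
print(wordpiece("camel", toy_vocab))
```

With this toy vocabulary, "large" stays whole, "zoonotic" is split into three pieces, and "camel" falls back to the unknown token because none of its prefixes are in the vocabulary.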

In [None]:
vocab_file = os.path.join("/data/BioBert/BERT-pubmed-1000000-SQuAD", "vocab.txt")
tokenizer = tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=False)

question = questions[0]
def tokenize(question, context):
    return tokenizer.tokenize(question), tokenizer.tokenize(context)

query_tokens, context_tokens = tokenize(question, example_context)

print("Query tokens:\n", query_tokens)
print("\nContext tokens:\n", context_tokens)

Now that we have the question and context converted into tokens, we need to pass it to the model for inference. The model requires three distinct inputs:

- `input_ids` - a list of token numbers (each corresponding to a line number in the vocab.txt file) for the input, which consists of the question and the context concatenated together
- `input_mask` - indicates which positions hold actual token values, since the inputs are padded with 0's to reach a fixed length
- `segment_ids` - distinguishes the question from the context: question and padding positions are 0's, context positions are 1's

Note: The original Tensorflow model technically has a fourth input, `unique_ids`, but it is not used in the inference process and is passed through the model without any modification, so it can be safely left out or replaced with a placeholder value.

For the BioBERT model, each of the inputs needs to have a length of 384, with shorter inputs being padded and longer inputs being broken into several context sections. As an additional requirement, the question section of the input ids needs to begin with the \[CLS\] token and end with the \[SEP\] token, so an example input might look like "\[CLS\] ... question ... \[SEP\] ... context ... \[PAD\]".

Running the cell below converts the tokens into the three inputs expected by the model.

In [None]:
def convert_inputs(query_tokens, context_tokens):
    question_ids = tokenizer.convert_tokens_to_ids(query_tokens)
    
    # Add [CLS] and [SEP] tokens to question
    question_ids.insert(0,101)
    question_ids.append(102)
    
    context_ids = tokenizer.convert_tokens_to_ids(context_tokens)

    question_len = len(question_ids)
    context_len = len(context_ids)
    
    segment_ids = np.zeros((1,384), dtype=np.int32)
    segment_ids[0,question_len:question_len+context_len] = 1
    input_mask = np.zeros((1,384), dtype=np.int32)
    input_mask[0,:question_len+context_len] = 1

    # Concatenate the question and context, and 0 pad the result.
    input_ids = np.expand_dims(np.concatenate((question_ids, context_ids, np.zeros((384-(question_len+context_len)), dtype=np.int32))),0)
    
    return input_ids, segment_ids, input_mask

input_ids, segment_ids, input_mask = convert_inputs(query_tokens, context_tokens)

print("Input ids:\n", input_ids)
print("\nInput mask:\n", input_mask)
print("\nSegment ids:\n", segment_ids)


Now that we have prepared all of the necessary inputs, we can load the model and run the inference. For question answering, the outputs we expect from the model are predictions for where in the context paragraph the answer starts and where it ends. Since the model output indicates the starting and ending token of the answer, the final step in the process is converting the tokens back into words.
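
As a small illustration of that final step, the sketch below joins a slice of WordPiece tokens back into readable text. The token list and the predicted positions are made up, and `tokens_to_text` is a hypothetical helper name.

```python
def tokens_to_text(tokens):
    """Join WordPiece tokens, gluing "##" continuations onto the previous piece."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)

# Hypothetical token sequence and predicted positions, for illustration only.
tokens = ["Corona", "##virus", "##es", "are", "zoo", "##not", "##ic", "."]
start_index, end_index = 4, 6
print(tokens_to_text(tokens[start_index:end_index + 1]))
```

The slice uses `end_index + 1` because the predicted end position is inclusive.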

Note: For maximum accuracy we would also want to: 
- ensure that the end point is not before the start point
- limit the length of the answer
- ensure that the answer does not contain the question itself
- list alternative answers and their likelihood
- clean up whitespace and unknown tokens in the output

However, for the purposes of this demonstration, we just use the maximum value of the start and end outputs to determine where the answer is. The code that we will use later with the BioASQ dataset will do all of the above checks.
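
To show why these checks matter, here is a minimal sketch of a span search that enforces a valid start and end order, a maximum answer length, and an answer that lies outside the question, and that reports a relative likelihood for each alternative. It is not the implementation used by `write_predictions`; the function name, the logits, and the `context_start` parameter are assumptions made for illustration.

```python
import math

def best_spans(start_logits, end_logits, context_start, n_best=3, max_answer_length=30):
    """Rank (start, end) spans by summed logits, skipping invalid candidates."""
    candidates = []
    for s, s_logit in enumerate(start_logits):
        if s < context_start:
            continue  # the answer may not begin inside the question
        for e in range(s, min(s + max_answer_length, len(end_logits))):
            # e >= s guarantees the end never precedes the start, and the
            # range bound caps the answer at max_answer_length tokens.
            candidates.append((s_logit + end_logits[e], s, e))
    candidates.sort(reverse=True)
    top = candidates[:n_best]
    if not top:
        return []
    # Softmax over the kept scores gives a relative likelihood per answer.
    peak = top[0][0]
    exps = [math.exp(score - peak) for score, _, _ in top]
    total = sum(exps)
    return [(s, e, x / total) for (_, s, e), x in zip(top, exps)]

# Made-up logits for an 8-position input whose context starts at index 3.
start_logits = [0.1, 0.0, 0.2, 0.5, 4.0, 1.0, 0.3, 0.2]
end_logits = [0.0, 0.1, 5.0, 0.2, 1.5, 3.5, 0.4, 0.1]
spans = best_spans(start_logits, end_logits, context_start=3)
for start, end, prob in spans:
    print(start, end, round(prob, 3))
```

In this toy input, taking the two maxima independently would pick start 4 and end 2, which is not a valid span; the constrained search returns the span from 4 to 5 as its top candidate instead.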

In [None]:
@run_background
def tensorflow_inference():
    start_time = time()
    predict_fn = predictor.from_saved_model('./tf_saved_model/' + os.listdir('./tf_saved_model/')[0])
    print("Model loaded in {} seconds".format(time()-start_time))
    start_time = time()
    result = predict_fn({"input_ids": input_ids, "segment_ids": segment_ids, "input_mask": input_mask, "unique_ids": [1]})
    print("Inference took {} seconds".format(time()-start_time))
    sl = result["start_logits"][0,:]
    el = result["end_logits"][0,:]

    answer = tokenizer.convert_ids_to_tokens(input_ids[0][np.argmax(sl):np.argmax(el)+1])
    
    print("\n" + question)
    print("Answer tokens: ", answer)
    print(" ".join(answer).replace(" ##", ""))


### Convert the Tensorflow .pb file to IR

The next step in the process is running the OpenVINO Model Optimizer to generate an OpenVINO Intermediate Representation (IR) that uses FP16 precision. Due to the size of the model, we need to send the model conversion process to one of the edge nodes to run. 

The conversion process should take about 1-2 minutes. After the conversion is done, we can run the same inference as above using the OpenVINO model to ensure that the conversion was successful.
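
The contents of `convert_tf_to_ov.sh` are not shown in this notebook. As a rough sketch only, the snippet below assembles the kind of Model Optimizer command such a script might run. The install path, the export folder name, and the exact set of options are assumptions; the real script may use different or additional options.

```python
import os

def build_mo_command(saved_model_dir, output_dir, seq_len=384):
    """Assemble a Model Optimizer command line as a list of arguments."""
    input_names = ["input_ids", "input_mask", "segment_ids"]
    shapes = ",".join("[1,{}]".format(seq_len) for _ in input_names)
    return [
        "python3",
        # Assumed default install location of the OpenVINO 2020.x Model Optimizer.
        "/opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py",
        "--saved_model_dir", saved_model_dir,
        "--input", ",".join(input_names),
        "--input_shape", shapes,
        "--data_type", "FP16",
        "--output_dir", output_dir,
    ]

# Hypothetical export folder name; the real one is a timestamp chosen by Tensorflow.
cmd = build_mo_command(os.path.join("tf_saved_model", "1234567890"), "ov")
print(" ".join(cmd))
```

The snippet only prints the command; it does not run the conversion.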

In [None]:
!qsub convert_tf_to_ov.sh -e logs/ -o logs/

Run the utility below to view the progress of the conversion. The outputs of the conversion process will be saved into the `logs/` folder. 
##### Wait for the conversion to complete before proceeding

In [None]:
liveQstat()

### Run the inference using OpenVINO

We can now run the same inference we just ran using the newly exported OpenVINO IR.


In [None]:
@run_background
def openvino_inference():
    start_time = time()
    ie = IECore()
    net = ie.read_network(model = './ov/saved_model.xml', weights = './ov/saved_model.bin')
    exec_net = ie.load_network(network=net, device_name='CPU')
    del net
    print("Model loaded in {} seconds".format(time()-start_time))
    
    start_time = time()
    result = exec_net.infer(inputs={"input_ids": input_ids, "segment_ids": segment_ids, "input_mask": input_mask})
    print("Inference took {} seconds".format(time()-start_time))
    sl = result["unstack/Squeeze_"][0,:]
    el = result["unstack/Split.1"][0,0,:]

    answer = tokenizer.convert_ids_to_tokens(input_ids[0][np.argmax(sl):np.argmax(el)+1])
    
    print("\n" + question)
    print("Answer tokens: ", answer)
    print(" ".join(answer).replace(" ##", ""))


## Run the OpenVINO inference on the BioASQ dataset

Now that we have converted the model into an OpenVINO IR, we can run it on the BioASQ task V dataset \[[8](http://www.bioasq.org/)\] and evaluate its results. For the first step in the process, we load our saved model IR and set up an executable network that will run on the CPU.

In [None]:
# Loading the OpenVINO IR
ie = IECore()
net = ie.read_network(model = './ov/saved_model.xml', weights = './ov/saved_model.bin')
exec_net = ie.load_network(network=net, device_name='CPU')
del net

### Load the BioASQ dataset

The BioASQ dataset contains excerpts from medical documents with questions and reference answers provided by a team of biomedical experts. The authors of the BioBERT model have converted the BioASQ data into the same format as the Stanford Question Answering Dataset (SQuAD). We can see an example of the data by running the cell below.
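
For orientation, the sketch below walks a SQuAD-style structure using only the fields that this notebook reads later (`data`, `paragraphs`, `context`, `qas`, `id`, `question`). The record itself is a made-up miniature example, not taken from the BioASQ files.

```python
import json

# Hypothetical miniature record in SQuAD layout; the ids and text are made up.
raw = """
{
  "version": "toy",
  "data": [
    {
      "title": "toy",
      "paragraphs": [
        {
          "context": "Coronaviruses are zoonotic.",
          "qas": [
            {"id": "toy_001", "question": "What are coronaviruses?"}
          ]
        }
      ]
    }
  ]
}
"""

def iter_questions(dataset):
    """Yield (id, question, context) for every question in a SQuAD-style dict."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield qa["id"], qa["question"], paragraph["context"]

records = list(iter_questions(json.loads(raw)))
for qid, question_text, context_text in records:
    print(qid, "|", question_text, "|", context_text)
```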

In [None]:
!head -16 /data/BioBert/data-release/BioASQ-6b/test/Full-Abstract/BioASQ-test-factoid-6b-3.json

Running the following cell will load one of the BioASQ test datasets. The `convert_examples_to_features` function processes all of the data using the same method as the example shown before. If you wish to run the inference again with a different dataset, you can modify the `input_file` line.

In [None]:
# Change this line if you want to test out the model with a different dataset
input_file = "/data/BioBert/data-release/BioASQ-6b/test/Full-Abstract/BioASQ-test-factoid-6b-3.json"

eval_examples = read_squad_examples(input_file=input_file,is_training=False)

# These are the parameters specified from the BioBERT model
max_seq_length = 384
doc_stride = 128
max_query_length = 64
batch_size = 2
n_best_size = 20
max_answer_length = 30

data_features = []

def append_features(feature):
    data_features.append(feature)

# Use convert_examples_to_features method from run_factoid to convert the data
convert_examples_to_features(eval_examples, tokenizer,
                             max_seq_length, doc_stride,
                             max_query_length, False,
                             append_features)

### Running the inference

The outputs of the inference will be written to `predictions/predictions.json`.

In [None]:
bs = 1
n = len(data_features)
all_results = []

RawResult = collections.namedtuple("RawResult", ["unique_id", "start_logits", "end_logits"])

# Step by bs so that consecutive batches do not overlap if bs is changed
for idx in tqdm(range(0, n, bs)):
    data = {"input_ids": list(map(lambda x: x.input_ids, data_features[idx:idx+bs])),
            "input_mask": list(map(lambda x: x.input_mask, data_features[idx:idx+bs])),
            "segment_ids": list(map(lambda x: x.segment_ids, data_features[idx:idx+bs]))}

    result = exec_net.infer(inputs=data)

    in_batch = result["unstack/Squeeze_"].shape[0]

    for i in range(in_batch):
        unique_id = 1000000000 + len(all_results)
        sl = result["unstack/Squeeze_"][i,:]
        el = result["unstack/Split.1"][0,i,:]
        start_logits = [float(x) for x in sl.flat]
        end_logits = [float(x) for x in el.flat]
        all_results.append(RawResult(unique_id=unique_id, start_logits=start_logits, end_logits=end_logits))

output_dir = "predictions"
os.makedirs(output_dir, exist_ok=True)
output_prediction_file = os.path.join(output_dir, "predictions.json")
output_nbest_file = os.path.join(output_dir, "nbest_predictions.json")
output_null_log_odds_file = os.path.join(output_dir, "null_odds.json")

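# Jupyter launches the kernel with a -f argument; defining a matching flag
# keeps Tensorflow's flag parsing in run_factoid from rejecting it.
tf.app.flags.DEFINE_string('f', '', 'kernel')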

write_predictions(eval_examples, data_features, all_results,
                  n_best_size, max_answer_length,
                  True, output_prediction_file, output_nbest_file,
                  output_null_log_odds_file)

### Show the output of the inference

Run the following cell to view the results of the inference along with the original questions. 

It is also possible to run the official BioASQ evaluation measures to check the accuracy of the predictions \[[9](https://github.com/BioASQ/Evaluation-Measures)\]. However, we do not include this as part of the notebook since the official evaluation measures require Java, which is not included on the DevCloud. If you wish to run the evaluation locally, instructions for doing so can be found in the BioBERT repository \[[1](https://github.com/dmis-lab/bioasq-biobert)\] and the required golden answer datasets can be found at the BioASQ website ([6b](http://participants-area.bioasq.org/Tasks/6b/goldenDataset/), [7b](http://participants-area.bioasq.org/Tasks/7b/goldenDataset/)).

In [None]:
predictions_filename = "./predictions/predictions.json"

with open(input_file) as json_file2:
    contexts = json.load(json_file2)

with open(predictions_filename) as json_file:
    answers = json.load(json_file)

for qid, answer in answers.items():
    print("ID: {}".format(qid))
    print("="*32)

    # Find the paragraph whose question id matches this prediction
    for d in contexts["data"][0]["paragraphs"]:
        if d["qas"][0]["id"] == qid:
            question = d["qas"][0]["question"]
            context = d["context"]
            print("Context:\n********\n{}\n\nQuestion:\n*********\n{}\n\nPrediction:\n***********\n{}".format(context, question, answer))

    print("\n\n")

### Enter your own context and questions

You can try the model out for yourself by filling in your own context and questions below. Note that the answer to each question must be present in the context, and that the question and context together, including the \[CLS\] and \[SEP\] tokens, must fit within the model's 384-token input length.

In [None]:
# Enter in your own context and list of questions below and uncomment the lines
# example_context = "CONTEXT GOES HERE"
# questions = ["QUESTIONS GO HERE", ]


for item in questions:
    query_tokens, context_tokens = tokenize(item, example_context)
    input_ids, segment_ids, input_mask = convert_inputs(query_tokens, context_tokens)
    start_time = time()
    result = exec_net.infer(inputs={"input_ids": input_ids, "segment_ids": segment_ids, "input_mask": input_mask})

    sl = result["unstack/Squeeze_"][0,:]
    el = result["unstack/Split.1"][0,0,:]

    answer = tokenizer.convert_ids_to_tokens(input_ids[0][np.argmax(sl):np.argmax(el)+1])
    
    print("\nInference took {} seconds".format(time()-start_time))
    print(item)
    print(" ".join(answer).replace(" ##", ""))

## Inference on the Edge

All the code up to this point has been run within the Jupyter Notebook instance running on a development node based on an Intel Xeon Scalable processor. We will now run the workload on other edge compute nodes available in the IoT DevCloud. We will send work to the edge compute nodes by submitting the corresponding non-interactive jobs into a queue. For each job, we will specify the type of the edge compute server that must be allocated for the job.

The job file is written in Bash and is executed directly on the edge compute node. For this example, we have written the job file for you in the cell below. The job file, `inference.sh`, sets up the environment for the target device and then runs the inference with the script `inference.py`.

In [None]:
%%writefile inference.sh

cd $PBS_O_WORKDIR

mkdir -p $1
OUTPUT_DIR=$1
DEVICE=$2

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
    # Environment variables and compilation for edge compute nodes with FPGAs - Updated for OpenVINO 2020.1
    source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/lib
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/2019R4_PL1_FP16_MobileNet_Clamp.aocx
    export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3
fi

python3 inference.py -d ${DEVICE} -o ${OUTPUT_DIR}

### How jobs are submitted to the queue

Now that we have the job script, we can submit the jobs to edge compute nodes. In the IoT DevCloud, you can do this using the `qsub` command.
We can submit the job to 5 different types of edge compute nodes simultaneously or to just one node at a time.

We use five options of the `qsub` command for this:
- `-l` : this option lets us select the number and the type of nodes using `nodes={node_count}:{property}`. 
- `-F` : this option lets us send arguments to the bash script. 
- `-N` : this option lets us name the job so that it is easier to distinguish between them.
- `-o` : this option lets us determine the path to be used for the standard output stream.
- `-e` : this option lets us determine the path to be used for the standard error stream.


The job scheduler passes the contents of the `-F` flag to the job script as its arguments.
The [inference.sh](inference.sh) script takes 2 arguments:
1. the path to the directory for the output predictions and performance stats
2. the target device (e.g. CPU, GPU, MYRIAD, HDDL or HETERO:FPGA,CPU)
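
The way these options combine can be sketched with a small helper that composes the command line. `build_qsub_command` is a hypothetical function written for illustration; the notebook itself calls `qsub` directly in the cells that follow.

```python
import shlex

def build_qsub_command(script, node, output_dir, device, job_name):
    """Compose a qsub command line from the five options described above."""
    script_args = "{} {}".format(output_dir, device)
    parts = [
        "qsub", script,
        "-l", "nodes=1:{}".format(node),   # one node with the given properties
        "-F", script_args,                 # arguments handed to the job script
        "-N", job_name,                    # job name
        "-e", output_dir,                  # standard error path
        "-o", output_dir,                  # standard output path
    ]
    # Quote each part so that the space inside the -F value survives the shell.
    return " ".join(shlex.quote(p) for p in parts)

command = build_qsub_command("inference.sh", "idc001skl:i5-6500te",
                             "results/core/", "CPU", "BioBERT_core")
print(command)
```

The two script arguments travel as a single quoted `-F` value, which is why the output directory and the device appear together in one string.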

If you are curious to see the available types of nodes on the IoT DevCloud, run the following cell.

In [None]:
!pbsnodes | grep compnode | sort | uniq -c

Here, the properties describe the node, and the number on the left is the number of available nodes of that architecture.

### Job queue submission

The output of each cell below is the `JobID` of your job, which you can use to track the job's progress.

**Note:** You can submit all the jobs at once or one at a time.

After submission, they will go into a queue and run as soon as the requested compute resources become available. 
(tip: **shift+enter** will run the cell and automatically move you to the next cell. So you can hit **shift+enter** multiple times to quickly run multiple cells).


#### Submitting to an edge compute node with an Intel Core CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel 
    Core i5-6500TE</a>. The inference workload will run on the CPU.

In [None]:
job_id_core = !qsub inference.sh -l nodes=1:idc001skl:i5-6500te -F "results/core/ CPU" -N BioBERT_core -e results/core/ -o results/core/   
print(job_id_core[0]) 
#Progress indicators
if job_id_core:
    progressIndicator('./logs', job_id_core[0]+'.txt', "Inference", 0, 100)

#### Submitting to an edge compute node with an 8th Generation Intel Core CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/8th-gen-core-dev-kit">UP Xtreme Edge Compute Enabling Kit
    </a> edge node with a low power <a 
    href="https://ark.intel.com/content/www/us/en/ark/products/193554/intel-core-i7-8665ue-processor-8m-cache-up-to-4-40-ghz.html">Intel 
    Core i7-8665UE</a>. The inference workload will run on the CPU.


In [None]:
job_id_core2 = !qsub inference.sh -l nodes=1:idc014upxa10fx1 -F "results/core2/ CPU" -N BioBERT_core2 -e results/core2/ -o results/core2/
print(job_id_core2[0]) 
#Progress indicators
if job_id_core2:
    progressIndicator('./logs', job_id_core2[0]+'.txt', "Inference", 0, 100)    

#### Submitting to an edge compute node with an Intel Xeon CPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88178/Intel-Xeon-Processor-E3-1268L-v5-8M-Cache-2-40-GHz-">Intel 
    Xeon Processor E3-1268L v5</a>. The inference workload will run on the CPU.

In [None]:
job_id_xeon = !qsub inference.sh -l nodes=1:idc007xv5:intel-xeon -F "results/xeon/ CPU" -N BioBERT_xeon -e results/xeon/ -o results/xeon/
print(job_id_xeon[0]) 
#Progress indicators
if job_id_xeon:
    progressIndicator('./logs', job_id_xeon[0]+'.txt', "Inference", 0, 100)        

#### Submitting to an edge compute node with Intel® Core CPU and using the onboard Intel® GPU
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel® Core i5-6500TE</a>. The inference workload will run on the Intel® HD Graphics 530 GPU integrated into the CPU.

In [None]:
job_id_gpu = !qsub inference.sh -l nodes=1:tank-870:i5-6500te:intel-hd-530 -F "results/gpu/ GPU" -N BioBERT_gpu -e results/gpu/ -o results/gpu/
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('./logs', job_id_gpu[0]+'.txt', "Inference", 0, 100)    

#### Submitting to an edge compute node with an IEI Mustang-F100-A10 (Intel® Arria® 10 FPGA)
In the cell below, we submit a job to an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel Core i5-6500TE CPU</a>. The inference workload will run on the <a href="https://www.ieiworld.com/mustang-f100/en/">IEI Mustang-F100-A10</a> card installed in this node.

In [None]:
job_id_fpga = !qsub inference.sh -l nodes=1:idc003a10:iei-mustang-f100-a10 -F "results/fpga/ HETERO:FPGA,CPU" -N BioBERT_fpga -e results/fpga/ -o results/fpga/
print(job_id_fpga[0]) 
#Progress indicators
if job_id_fpga:
    progressIndicator('./logs', job_id_fpga[0]+'.txt', "Inference", 0, 100)    

#### Original Tensorflow model for Reference

In the cell below we run the original Tensorflow model on an <a 
    href="https://software.intel.com/en-us/iot/hardware/iei-tank-dev-kit-core">IEI 
    Tank 870-Q170</a> edge node with an <a 
    href="https://ark.intel.com/products/88186/Intel-Core-i5-6500TE-Processor-6M-Cache-up-to-3-30-GHz-">Intel 
    Core i5-6500TE</a>. The inference workload will run on the CPU.

In [None]:
job_id_tf = !qsub inference.sh -l nodes=1:idc001skl:i5-6500te -F "results/tf/ TF" -N BioBERT_tf -e results/tf/ -o results/tf/   
print(job_id_tf[0]) 
#Progress indicators
if job_id_tf:
    progressIndicator('./logs', job_id_tf[0]+'.txt', "Inference", 0, 100)

### Check if the job is done

In [None]:
liveQstat()

You should see the jobs you have submitted (referenced by the `Job ID` that is displayed right after you submit each job).
There should also be an extra job in the queue "jupyterhub": this job runs your current Jupyter Notebook session.

The 'S' column shows the current status. 
- If it is in Q state, it is in the queue waiting for available resources. 
- If it is in R state, it is running. 
- If the job is no longer listed, it means it is completed.

**Note**: Time spent in the queue depends on the number of users accessing the edge nodes. Once these jobs begin to run, they should take from 1 to 5 minutes to complete. 

***Wait!***

Please wait for the inference jobs to complete before proceeding to the next step.

### View Results

The predicted answers and the performance data for each architecture are saved in the `results/` folder.

The results of each of the job submissions should be identical, and can be verified by comparing the results of the predictions.json file in each folder. The script below takes a small excerpt from each of the prediction files for comparison.
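
The same comparison can also be sketched in Python. The helper below is hypothetical (it is not what `compare.sh` does internally) and assumes that each `predictions.json` maps a question id to an answer string; the demonstration uses two made-up prediction files written to a temporary folder.

```python
import json
import os
import tempfile

def compare_predictions(paths):
    """Return {question_id: {path: answer}} for ids whose answers differ."""
    loaded = {}
    for path in paths:
        with open(path) as handle:
            loaded[path] = json.load(handle)
    all_ids = set()
    for answers in loaded.values():
        all_ids.update(answers)
    mismatches = {}
    for qid in sorted(all_ids):
        per_file = {path: answers.get(qid) for path, answers in loaded.items()}
        if len(set(per_file.values())) > 1:
            mismatches[qid] = per_file
    return mismatches

# Self-contained demonstration with two made-up prediction files.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"core": {"q1": "civet cats", "q2": "fever"},
            "gpu": {"q1": "civet cats", "q2": "cough"}}
    paths = []
    for arch, answers in demo.items():
        path = os.path.join(tmp, arch + "_predictions.json")
        with open(path, "w") as handle:
            json.dump(answers, handle)
        paths.append(path)
    differences = compare_predictions(paths)
    print(sorted(differences))
```

To check real results, the list of paths would point at the `predictions.json` file in each `results/` subfolder; an empty result means that all files agree.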

In [None]:
!bash compare.sh

### Assess Performance

The total average time of each inference task is recorded in `results/{ARCH}/stats_{job_id}.txt`, where the subdirectory name corresponds to the architecture of the target edge compute node. Run the cell below to plot the results of all jobs side-by-side. Lower values mean better performance. Keep in mind that some architectures are optimized for the highest performance, others for low power or other metrics.

In [None]:
arch_list = [('core', 'Intel Core\ni5-6500TE\nCPU'),
             ('core2', 'Intel Core\ni7-8665UE\nCPU'),
             ('xeon', 'Intel Xeon\nE3-1268L v5\nCPU'),
             ('gpu', ' Intel Core\ni5-6500TE\nGPU'),
             ('fpga', ' IEI Mustang\nF100-A10\nFPGA'),
             ('tf', 'Original\nTF Model on\nIntel Core\ni5-6500TE\nCPU')]

stats_list = []
for arch, a_name in arch_list:
    if 'job_id_'+arch in vars():
        stats_list.append(('results/' + arch + '/stats_'+vars()['job_id_'+arch][0]+'.txt', a_name))
    else:
        stats_list.append(('placeholder'+arch, a_name))

plt.ion()
summaryPlot(stats_list, 'Architecture', 'Time, milliseconds', 'Processing Time Per Question', 'time' )