# Interactive Text Prediction with OpenVINO

This notebook shows interactive text prediction with OpenVINO. We use the [GPT-2](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/gpt-2) model, which is a part of the Generative Pre-trained Transformer (GPT) family. GPT-2 is pre-trained on a large corpus of English text using unsupervised training. The model is available from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/), which we will use to download and convert the model to OpenVINO IR.

## Imports

In [None]:
import sys
import numpy as np
from openvino.runtime import Core
from IPython.display import Markdown, display
import json
from pathlib import Path

from transformers import GPT2Tokenizer
sys.path.append("../utils")

## The model


In [None]:
# directory where the model will be downloaded.
base_model_dir = "model"

# name of the model
model_name = 'gpt-2'

# desired precision
precision = "FP16"

model_path = f"{base_model_dir}/public/{model_name}/{precision}/{model_name}.xml"
model_weights_path = f"{base_model_dir}/public/{model_name}/{precision}/{model_name}.bin"
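
The two paths above follow a fixed layout: output directory, source (`public`), model name, precision, then the file name. As a minimal sketch, the same paths can be assembled with `pathlib`. The helper name `ir_paths` is hypothetical and is not part of OpenVINO or Open Model Zoo; it only mirrors the f-strings used in this notebook.

```python
from pathlib import Path


def ir_paths(base_dir, model_name, precision, source="public"):
    """Return the (.xml, .bin) paths of a converted model.

    Hypothetical helper: it only mirrors the f-strings used in this notebook.
    """
    model_dir = Path(base_dir) / source / model_name / precision
    return model_dir / f"{model_name}.xml", model_dir / f"{model_name}.bin"


xml_path, bin_path = ir_paths("model", "gpt-2", "FP16")
print(xml_path.as_posix())  # model/public/gpt-2/FP16/gpt-2.xml
print(bin_path.as_posix())  # model/public/gpt-2/FP16/gpt-2.bin
```

Building the directory once and deriving both file names from it keeps the `.xml` and `.bin` paths from drifting apart if the model name or precision changes.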

### Download GPT-2 from Open Model Zoo

We use `omz_downloader`, a command-line tool from the `openvino-dev` package. `omz_downloader` automatically creates a directory structure and downloads the selected model. Skip this step if the model is already downloaded. For this demo, we download and use the `gpt-2` model.

In [None]:
download_command = f"omz_downloader " \
                   f"--name {model_name} " \
                   f"--output_dir {base_model_dir} " \
                   f"--cache_dir {base_model_dir}"

display(Markdown(f"Download command: `{download_command}`"))
display(Markdown(f"Downloading {model_name}... (This may take a few minutes depending on your connection.)"))

! $download_command
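
The `!` syntax above only works inside IPython or Jupyter. As a sketch of how the same command could be run from a plain Python script, the arguments can be passed as a list to `subprocess.run`. The flags are the ones used in the cell above; the `RUN_DOWNLOAD` switch is a hypothetical guard added here so that the example runs without `openvino-dev` installed.

```python
import shlex
import subprocess

base_model_dir = "model"
model_name = "gpt-2"

# Same arguments as the notebook cell, as a list (avoids shell quoting issues).
download_args = [
    "omz_downloader",
    "--name", model_name,
    "--output_dir", base_model_dir,
    "--cache_dir", base_model_dir,
]
print(shlex.join(download_args))

RUN_DOWNLOAD = False  # hypothetical switch: set to True to actually download
if RUN_DOWNLOAD:
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(download_args, check=True)
```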

### Convert GPT-2 to OpenVINO IR
Since the downloaded GPT-2 model is not yet in OpenVINO IR format, we need to perform an additional step to convert it. Use the following command:

In [None]:
if not Path(model_path).exists():
    convert_command = (
        f"omz_converter --name {model_name} --precisions {precision}"
        f" --download_dir {base_model_dir} --output_dir {base_model_dir}"
    )
    display(Markdown(f"Convert command: `{convert_command}`"))
    display(Markdown(f"Converting {model_name}"))

    ! $convert_command

### Load the model

Converted models are located in a fixed directory structure, which indicates source, model name and precision. We start by building an Inference Engine object. Then we read the network architecture and model weights from the .xml and .bin files, respectively. Finally, we compile the model for the desired device. Because we use the dynamic shapes feature, which is only available on CPU, we must use `CPU` for the device. Dynamic shapes support on GPU is coming soon.

Because this text prediction model uses dynamic input shapes and the OpenVINO 2022.1 release does not support dynamic shapes on `iGPU`, you cannot switch the device to `iGPU` directly. To run inference on `iGPU`, you would first need to give the model a fixed input shape, for example by padding every input sequence to the same length.

In [None]:
# initialize inference engine
ie_core = Core()

# read the model and corresponding weights from file
model = ie_core.read_model(model=model_path, weights=model_weights_path)

# assigning dynamic shapes to every input layer
for input_layer in model.inputs:
    input_shape = input_layer.partial_shape
    input_shape[0] = -1
    input_shape[1] = -1
    model.reshape({input_layer: input_shape})

# compile the model for the CPU
compiled_model = ie_core.compile_model(model=model, device_name="CPU")

# get the first input and output nodes; the output node is used as the key into the inference results
input_keys = next(iter(compiled_model.inputs))
output_keys = next(iter(compiled_model.outputs))

`input_keys` and `output_keys` hold the first input node and the first output node of the network. For the gpt-2 model, the input has shape [`batch size`, `sequence length`] and the output has shape [`batch size`, `sequence length`, `vocab size`].
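
To make these shapes concrete, here is a small sketch that uses a random array as a stand-in for the model output; it does not call OpenVINO. The vocabulary size of 50257 is GPT-2's; the sequence length of 8, the `int64` input type, and the `dummy_*` names are assumptions chosen for illustration.

```python
import numpy as np

batch_size, sequence_length, vocab_size = 1, 8, 50257  # GPT-2 vocabulary size

# Input: one row of token IDs per batch item -> [batch size, sequence length].
# The int64 dtype is an assumption made for this illustration.
dummy_input_ids = np.zeros((batch_size, sequence_length), dtype=np.int64)

# Stand-in for the model output (random numbers, NOT real GPT-2 logits):
# one score per vocabulary entry for every position in the sequence.
rng = np.random.default_rng(0)
dummy_logits = rng.standard_normal(
    (batch_size, sequence_length, vocab_size)).astype(np.float32)

# The scores for the next token are read at the last real position.
next_token_logits = dummy_logits[:, sequence_length - 1, :]
print(dummy_input_ids.shape)    # (1, 8)
print(dummy_logits.shape)       # (1, 8, 50257)
print(next_token_logits.shape)  # (1, 50257)
```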

## Processing

NLP models usually take a list of tokens as standard input. A token is a word, or a piece of a word, mapped to an integer. To provide the proper input, we need the vocabulary for this mapping, so let's first load the vocabulary file.

In [None]:
def load_vocab_file(vocab_file_path):
    with open(vocab_file_path, "r", encoding="utf-8") as content:
        return json.load(content)

In [None]:
vocab_file_path = f"model/public/{model_name}/gpt2/vocab.json"
vocab = load_vocab_file(vocab_file_path)
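
The loaded vocabulary is a dictionary that maps token strings to integer IDs. The sketch below shows this kind of mapping on a toy vocabulary; the entries and the names `toy_vocab` and `id_to_token` are made up for illustration and are not GPT-2's real tokens or IDs.

```python
import json

# Toy vocabulary in the same format as vocab.json (token string -> integer ID).
# The entries are made up for illustration; they are not GPT-2's real IDs.
toy_vocab_json = '{"deep": 0, "learning": 1, "is": 2, "fun": 3}'
toy_vocab = json.loads(toy_vocab_json)

# Inverse mapping, used to turn IDs back into tokens.
id_to_token = {idx: token for token, idx in toy_vocab.items()}

tokens = ["deep", "learning", "is", "fun"]
ids = [toy_vocab[token] for token in tokens]
print(ids)                            # [0, 1, 2, 3]
print([id_to_token[i] for i in ids])  # round trip back to the tokens
```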

## Define tokenizer

In [None]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

In [None]:
# the following function converts text to tokens
def tokenize(text):
    input_ids = tokenizer(text)['input_ids']
    input_ids = np.array(input_ids).reshape(1, -1)
    return input_ids

The last token in the vocabulary is the `<|endoftext|>` token. We store its index so that we can use it for padding at a later stage.

In [None]:
eos_token_id = len(vocab) - 1
tokenizer.convert_ids_to_tokens(eos_token_id)

### Define Softmax layer
We need a softmax function to convert the top-k logits into a probability distribution.

In [None]:
def softmax(x):
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    summation = e_x.sum(axis=-1, keepdims=True)
    return e_x / summation
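
As a quick check of the function above, the sketch below applies the same softmax to a toy row of logits. It includes very large values, where a naive `np.exp(1000.0)` would overflow, and a `-inf` entry, which is what the filtering steps later in this notebook produce. The input values are made up for illustration.

```python
import numpy as np


def softmax(x):
    # Same definition as above: subtract the row maximum for numerical stability.
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)


logits = np.array([[1000.0, 1001.0, -np.inf]])
probs = softmax(logits)
print(probs)               # the -inf entry gets probability 0
print(probs.sum(axis=-1))  # each row sums to 1
```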

### Set the minimum length of the sequence  
If the minimum length of the sequence has not been reached, the following code sets the score of the `eos` token to negative infinity, so that its probability becomes zero and generation continues with the next words.

In [None]:
def process_logits(input_ids, scores, eos_token_id, min_length=0):
    cur_length = input_ids.shape[-1]
    if cur_length < min_length:
        scores[:, eos_token_id] = -float("inf")
    return scores
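
A small usage sketch of the function above, on made-up scores for a four-token vocabulary. The names `toy_eos_id`, `toy_input_ids` and `toy_scores` are hypothetical. Note that the function modifies `scores` in place, so the example passes copies.

```python
import numpy as np


def process_logits(input_ids, scores, eos_token_id, min_length=0):
    # Same logic as above; note that `scores` is modified in place.
    cur_length = input_ids.shape[-1]
    if cur_length < min_length:
        scores[:, eos_token_id] = -float("inf")
    return scores


toy_eos_id = 3                      # hypothetical EOS index in a 4-token vocabulary
toy_input_ids = np.array([[0, 1]])  # current length is 2
toy_scores = np.array([[0.5, 1.0, 0.2, 4.0]])

masked = process_logits(toy_input_ids, toy_scores.copy(), toy_eos_id, min_length=5)
unmasked = process_logits(toy_input_ids, toy_scores.copy(), toy_eos_id, min_length=2)
print(masked)    # EOS score replaced by -inf because 2 < 5
print(unmasked)  # unchanged because length 2 is not below min_length 2
```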

### Top-K sampling
In Top-K sampling, we filter the K most likely next words and redistribute the probability mass among only those K next words. 

In [None]:
def get_top_k_logits(scores, top_k):
    filter_value = -float("inf")
    top_k = min(max(top_k, 1), scores.shape[-1])
    top_k_scores = -np.sort(-scores)[:, :top_k]
    indices_to_remove = scores < np.min(top_k_scores)
    filtered_scores = np.ma.array(scores, mask=indices_to_remove,
                                  fill_value=filter_value).filled()
    return filtered_scores
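
The sketch below runs the same filtering on a toy row of five scores with K = 2, then applies softmax to show how the probability mass ends up on the two surviving tokens only. The scores are made up for illustration.

```python
import numpy as np


def get_top_k_logits(scores, top_k):
    # Same logic as above: everything below the k-th largest score becomes -inf.
    top_k = min(max(top_k, 1), scores.shape[-1])
    top_k_scores = -np.sort(-scores)[:, :top_k]
    indices_to_remove = scores < np.min(top_k_scores)
    return np.ma.array(scores, mask=indices_to_remove,
                       fill_value=-float("inf")).filled()


def softmax(x):
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)


toy_scores = np.array([[2.0, 0.5, 3.0, -1.0, 1.0]])
filtered = get_top_k_logits(toy_scores, top_k=2)
probs = softmax(filtered)
print(filtered)  # only the scores 3.0 and 2.0 survive
print(probs)     # the probability mass is shared by those two tokens
```

Because the threshold is the minimum over all selected scores, this version behaves as intended for a batch size of 1, which is what this notebook uses.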

### Main Processing Function
Generating the predicted sequence.

In [None]:
def generate_sequence(input_ids, max_sequence_length=128,
                      eos_token_id=eos_token_id):
    while True:
        cur_input_len = len(input_ids[0])
        # stop before running inference if the maximum length is reached
        if cur_input_len >= max_sequence_length:
            break
        # pad the input with the EOS token up to max_sequence_length
        pad_len = max_sequence_length - cur_input_len
        model_input = np.concatenate((input_ids,
                                      [[eos_token_id] * pad_len]), axis=-1)
        # pass the padded sequence into the model
        outputs = compiled_model(inputs=[model_input])[output_keys]
        next_token_logits = outputs[:, cur_input_len - 1, :]
        # pre-process distribution
        next_token_scores = process_logits(input_ids,
                                           next_token_logits, eos_token_id)
        top_k = 20
        next_token_scores = get_top_k_logits(next_token_scores, top_k)
        # sample the next token id from the top-k distribution
        probs = softmax(next_token_scores)
        next_tokens = np.random.choice(probs.shape[-1], 1,
                                       p=probs[0], replace=True)
        # stop if the end-of-text token is sampled
        if next_tokens[0] == eos_token_id:
            break
        input_ids = np.concatenate((input_ids, [next_tokens]), axis=-1)
    return input_ids
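
To show the control flow of the generation loop without OpenVINO, here is a simplified sketch in which a toy function stands in for `compiled_model`. Everything in it is an assumption made for illustration: the six-token vocabulary, the `toy_model` behaviour, and the names `toy_generate`, `VOCAB_SIZE`, `EOS_ID` and `MAX_LEN`. It keeps the padding, the read at the last real position, and the sampling step, and leaves out top-k filtering and the minimum-length check.

```python
import numpy as np

VOCAB_SIZE, EOS_ID, MAX_LEN = 6, 5, 8  # toy values, not GPT-2's


def toy_model(model_input):
    """Stand-in for compiled_model: logits of shape [batch, sequence, vocab].

    At every position it favours the token (current token + 1) % VOCAB_SIZE.
    """
    logits = np.zeros(model_input.shape + (VOCAB_SIZE,), dtype=np.float64)
    favoured = (model_input + 1) % VOCAB_SIZE
    np.put_along_axis(logits, favoured[..., None], 10.0, axis=-1)
    return logits


def softmax(x):
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)


def toy_generate(input_ids, rng):
    while input_ids.shape[-1] < MAX_LEN:
        cur_len = input_ids.shape[-1]
        # Pad with EOS up to the fixed length, as in generate_sequence.
        padding = np.full((1, MAX_LEN - cur_len), EOS_ID, dtype=input_ids.dtype)
        model_input = np.concatenate((input_ids, padding), axis=-1)
        # The prediction for the next token sits at the last real position.
        next_logits = toy_model(model_input)[:, cur_len - 1, :]
        probs = softmax(next_logits)
        next_token = rng.choice(VOCAB_SIZE, p=probs[0])
        # Stop without appending when the end-of-text token is sampled.
        if next_token == EOS_ID:
            break
        input_ids = np.concatenate((input_ids, [[next_token]]), axis=-1)
    return input_ids


generated = toy_generate(np.array([[0]]), np.random.default_rng(0))
print(generated)
```

Since sampling is random, the printed sequence can differ between seeds; the loop only guarantees that the result starts with the prompt, never exceeds `MAX_LEN`, and never contains the EOS token.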

## Run
Set the `text` variable in the cell below to get the predicted sequence.

In [None]:
text = "Deep learning is a type of machine learning that uses neural networks"
input_ids = tokenize(text)
output_ids = generate_sequence(input_ids)
# Convert the token IDs back to text
S = tokenizer.decode(output_ids[0].tolist())
print("Input Text: ", text)
print()
print(f"Predicted Sequence: {S}")