# Sentiment Analysis with OpenVINO™

**Sentiment analysis** is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. This notebook demonstrates how to convert and run a sequence classification model using OpenVINO. 

## Imports

In [None]:
from transformers import DistilBertForSequenceClassification, AutoTokenizer
import openvino.runtime as ov
import warnings
from pathlib import Path
import numpy as np
import time
import torch

## Initializing the Model
We will use the transformer-based [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) model from Hugging Face.

In [None]:
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = DistilBertForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=checkpoint
)

## Initializing the Tokenizer

Text preprocessing cleans text-based input data so that it can be fed into the model. [Tokenization](https://towardsdatascience.com/tokenization-for-natural-language-processing-a179a891bad4) splits paragraphs and sentences into smaller units (tokens) and assigns each one an integer ID from the tokenizer's vocabulary. The model then maps these IDs to vectors in a space where similar words have similar vectors, which helps it understand the context of a sentence. Here, we will use [AutoTokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer), which loads the pre-trained tokenizer that matches the checkpoint from Hugging Face.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=checkpoint, model_max_length=128
)
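
To make the idea of tokenization concrete, here is a minimal sketch in plain Python. It is not the DistilBERT tokenizer: the vocabulary `toy_vocab` and the helper `toy_encode` are hypothetical, and splitting on whitespace is a simplification of the subword splitting that real tokenizers perform. It only illustrates how text becomes a fixed-length list of IDs plus an attention mask.

```python
# Hypothetical toy vocabulary; a real tokenizer ships with a far larger one.
toy_vocab = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "i": 4, "had": 5, "a": 6, "wonderful": 7, "day": 8,
}


def toy_encode(text, max_length=8):
    """Lowercase, split on whitespace, map words to IDs, then pad or truncate."""
    words = text.lower().split()
    ids = [toy_vocab["[CLS]"]]
    # Words missing from the vocabulary fall back to the unknown token
    ids += [toy_vocab.get(word, toy_vocab["[UNK]"]) for word in words]
    # Truncate so that the closing [SEP] token always fits
    ids = ids[: max_length - 1] + [toy_vocab["[SEP]"]]
    attention_mask = [1] * len(ids)
    # Pad to a fixed length; padded positions get mask value 0
    padding = max_length - len(ids)
    ids += [toy_vocab["[PAD]"]] * padding
    attention_mask += [0] * padding
    return {"input_ids": ids, "attention_mask": attention_mask}


encoded = toy_encode("I had a wonderful day")
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

The `max_length` argument here plays a role loosely similar to `model_max_length=128` above: it caps how many tokens are kept for one input.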

## Convert to OpenVINO

Convert both the tokenizer and the model, then connect them into a single OpenVINO model.

In [None]:
from openvino.tools.mo import convert_model
from convert_tokenizer import convert_tokenizer
from convert_tokenizer import connect_models

# Convert the tokenizer to OpenVINO model
ov_tokenizer = convert_tokenizer(tokenizer)

# A sample input is required for tracing inside convert_model(model)
example_tokens = tokenizer('example text', return_tensors='pt')
ov_model = convert_model(model, example_input={**example_tokens})

# Now connect tokenizer and the main model together
ov_combined_model = connect_models(ov_tokenizer, ov_model)

OpenVINO™ Runtime uses the [Infer Request](https://docs.openvino.ai/2023.0/openvino_docs_OV_UG_Infer_request.html) mechanism, which enables running models on different devices either synchronously or asynchronously. The model graph is passed as an argument to the OpenVINO API, and an inference request is created. The default inference mode is AUTO, but it can be changed according to the requirements and the hardware available. You can explore the different inference modes and their usage [in the documentation](https://docs.openvino.ai/2023.0/openvino_docs_Runtime_Inference_Modes_Overview.html).

In [None]:
warnings.filterwarnings("ignore")
core = ov.Core()
compiled_model = core.compile_model(ov_combined_model)

In [None]:
def softmax(x):
    """
    Convert the logits produced by the model into probabilities.
    The maximum is subtracted before exponentiating for numerical stability.
    Parameters: Logits array for a single input
    Returns: Probabilities that sum to 1
    """

    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
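
As a quick check of the function above, the sketch below repeats the same definition and applies it to made-up logits (the values are illustrative, not model output). It shows that the probabilities sum to 1 and that subtracting the maximum does not change the result: adding a constant to every logit yields the same probabilities, while keeping `np.exp` away from very large arguments.

```python
import numpy as np


def softmax(x):
    # Same definition as in the cell above
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


logits = np.array([-2.5, 3.1])  # made-up logits for [NEGATIVE, POSITIVE]
probs = softmax(logits)
# Shifting all logits by a constant leaves the softmax output unchanged
shifted_probs = softmax(logits + 1000.0)

print("Probabilities:", probs)
print("Sum:", probs.sum())
print("Unchanged by shift:", np.allclose(probs, shifted_probs))
```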

## Inference

In [None]:
from str_pack import pack_strings   # FIXME: required if string tensors are not supported

def infer(input_text):
    """
    Run inference on a batch of input strings and classify
    each one as Positive or Negative.
    Parameters: Text to be processed (list of strings)
    Returns: A list of labels, one per input string: Positive or Negative.
    """

    label = {0: "NEGATIVE", 1: "POSITIVE"}
    result = compiled_model(pack_strings(input_text))
    return [label[np.argmax(softmax(x))] for x in result[0]]
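
The post-processing step in `infer` can be tried without a compiled model. The sketch below is an assumption-laden illustration: `logits_to_labels` is a hypothetical helper, and `fake_logits` stands in for `result[0]`, assumed here to have shape `(batch, 2)`. It applies softmax row by row and also returns the probability of the chosen class, which the original function discards.

```python
import numpy as np

LABELS = {0: "NEGATIVE", 1: "POSITIVE"}


def logits_to_labels(logits):
    """Map a (batch, 2) array of logits to (label, confidence) pairs."""
    logits = np.asarray(logits, dtype=np.float64)
    # Row-wise softmax: normalize each input's logits independently
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    best = probs.argmax(axis=-1)
    return [(LABELS[int(i)], float(p[i])) for i, p in zip(best, probs)]


# Made-up logits standing in for the model output of two sentences
fake_logits = [[-3.0, 3.0], [2.0, -1.0]]
predictions = logits_to_labels(fake_logits)
for label, confidence in predictions:
    print(label, "%.3f" % confidence)
```

Because softmax preserves the ordering of the logits, the label alone could be obtained with `argmax` directly; the softmax is only needed when the confidence value is of interest.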

### For a single input sentence

In [None]:
input_text = "I had a wonderful day"
start_time = time.perf_counter()
result = infer([input_text])
end_time = time.perf_counter()
total_time = end_time - start_time
print("Label: ", result[0])
print("Total Time: ", "%.2f" % total_time, " seconds")
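
A single timed call, as above, may include one-time setup costs in addition to the inference itself. The following sketch shows one common way to get a steadier number: run a warm-up call first, then average several repeats. The helper `time_call` and the stand-in `fake_infer` are hypothetical; in the notebook, `infer` would be passed instead.

```python
import time


def time_call(fn, *args, warmup=1, repeats=5):
    """Return the mean wall-clock time of fn(*args) after warm-up runs."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def fake_infer(texts):
    # Stand-in for infer(); returns one label per input string
    return ["POSITIVE" for _ in texts]


mean_seconds = time_call(fake_infer, ["I had a wonderful day"])
print("Mean Time: %.6f seconds" % mean_seconds)
```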

### Read from a text file

In [None]:
start_time = time.perf_counter()
with open("../data/text/food_reviews.txt", "r") as f:
    input_text = f.read().splitlines()
    results = infer(input_text) # infer entire batch in one call
    for line, result in zip(input_text, results):
        print("User Input: ", line)
        print("Label: ", result, "\n")
end_time = time.perf_counter()
total_time = end_time - start_time
print("Total Time: ", "%.2f" % total_time, " seconds")