# Text Classification with BERT using Neural Modules - Inference


Note: You must run the "Text Classification with BERT using Neural Modules - Training" notebook before running this one.

In the training notebook you leveraged Neural Modules (NeMo) to train a state-of-the-art NLP model (BERT). You started with a model that had been pre-trained on a massive corpus of text and assessed its performance against a trivial dataset. You then used the SST-2 dataset to fine-tune the model for a specific use case. In that notebook the training took a few minutes over a small number of epochs, but with real-world applications and larger datasets this fine-tuning can take hours or days.

During the training process you also leveraged NetApp storage to take Snapshots of the trained model checkpoints.

In the context of Artificial Intelligence and Machine Learning, inference refers to applying a trained model to real-world data that was not part of the training in order to make a prediction based on that data. For inference, you instantiate the same neural modules you used for training, but you load the checkpoints that capture the weights learned during training.

In this notebook you will re-instantiate the enhanced model using the same model architecture and the weights stored in the checkpoints that were saved during training. You will then use the [NetApp AI Control Plane](https://blog.netapp.com/ai-control-plane) to perform data management tasks that let you assess and compare the results of the baseline and enhanced model versions. Lastly, you will investigate how to use this functionality to implement traceability and rollbacks.

![NLP Fine-tuning](figures/nlp_fine_tuning.png)

Import all required modules/functions/classes.

In [None]:
import nemo
import nemo.collections.nlp as nemo_nlp
from nemo.collections.nlp.data.datasets import BertTextClassificationDataset
from nemo.collections.nlp.nm.data_layers.text_classification_datalayer import BertTextClassificationDataLayer
from nemo.collections.nlp.nm.trainables import SequenceClassifier

from nemo.backends.pytorch.common import CrossEntropyLossNM
from nemo.utils.lr_policies import get_lr_policy
from nemo.collections.nlp.callbacks.text_classification_callback import eval_iter_callback, eval_epochs_done_callback

import os
import json
import math
import numpy as np
import pandas as pd
pd.options.display.max_colwidth = -1

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
%matplotlib inline

import torch

from netapp_jupyter_utils import netappSnapshotCreate, netappGetSnapshots, netappRestoreSnapshot
from datetime import datetime

Once again, you can safely ignore any warnings regarding "torchaudio". 

# Inference Pipeline

The following steps set up and execute the inference pipeline using the real-world data set you will be working with.

## Recreate Model Architectures

Load up the model. If you watch your utilization tabs you will see GPU memory utilization increase somewhat.

In [None]:
# Define path to data, checkpoints, logs, and model files
WORK_DIR = 'logs'
DATA_DIR = 'data/SST-2'
SPLIT_DATA_DIR = os.path.join(DATA_DIR, 'split')
MODELS_DIR = 'models/'

# instantiate the neural module factory
nf = nemo.core.NeuralModuleFactory(log_dir=WORK_DIR,
                                   create_tb_writer=True,
                                   add_time_to_log_dir=False,
                                   optimization_level='O1')

# Read in saved checkpoint/volume information
%store -r datasetPvName
%store -r modelPvName
%store -r trainingRunTag

# Pre-trained BERT model, simple classifier, and tokenizer
PRETRAINED_MODEL_NAME = 'bert-base-uncased'
MAX_SEQ_LEN = 64 # we will pad with 0's shorter sentences and truncate longer
BATCH_SIZE = 128

bert = nemo_nlp.nm.trainables.huggingface.BERT(pretrained_model_name=PRETRAINED_MODEL_NAME)
bert_hidden_size = bert.hidden_size
tokenizer = nemo_nlp.data.NemoBertTokenizer(PRETRAINED_MODEL_NAME)
mlp = SequenceClassifier(hidden_size=bert_hidden_size, 
                         num_classes=2,
                         num_layers=2,
                         log_softmax=False,
                         dropout=0.1)
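
The comment on `MAX_SEQ_LEN` describes padding and truncation. The following is a minimal sketch of that idea in plain Python, not the NeMo data layer's actual implementation: the helper name `pad_or_truncate`, the padding ID of 0, and the token IDs are all assumptions made for illustration, and the real data layer also handles special tokens.

```python
# Illustrative only: every sequence ends up exactly MAX_SEQ_LEN long.
MAX_SEQ_LEN = 64  # same value as in the cell above
PAD_ID = 0        # assumed padding token ID


def pad_or_truncate(token_ids, max_len=MAX_SEQ_LEN, pad_id=PAD_ID):
    """Return (ids, attention_mask), both exactly max_len long."""
    ids = list(token_ids)[:max_len]   # truncate sequences that are too long
    mask = [1] * len(ids)             # 1 marks a real token
    padding = max_len - len(ids)      # 0 when the sequence was truncated
    return ids + [pad_id] * padding, mask + [0] * padding


# Made-up token IDs standing in for a short and a long sentence
short_ids, short_mask = pad_or_truncate([101, 2307, 3185, 102])
long_ids, long_mask = pad_or_truncate(range(1, 101))

print(len(short_ids), sum(short_mask))  # 64 4
print(len(long_ids), sum(long_mask))    # 64 64
```

The attention mask is what lets the model ignore the padded positions, which is why the data layer returns it alongside the input IDs.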

## Load Trained Checkpoints

As you load the trained checkpoints, you will see GPU memory utilization increase further.

In [None]:
bert.restore_from('logs/checkpoints/BERT-EPOCH-3.pt')
mlp.restore_from('logs/checkpoints/SequenceClassifier-EPOCH-3.pt')

## Single Sentence Classification Inference

Set up the function you will use to perform a quick inference on a small set of new sentences. The inference will predict whether these sentences are expressing positive or negative opinions.

In [None]:
def classify_sentence(nf, tokenizer, bert, mlp, sentence):
    sentence = sentence.lower()
    tmp_file = "/tmp/tmp_sentence.tsv"
    with open(tmp_file, 'w+') as tmp_tsv:
        header = 'sentence\tlabel\n'
        line = sentence + '\t0\n'
        tmp_tsv.writelines([header, line])

    tmp_data = BertTextClassificationDataLayer(input_file=tmp_file,
                                               tokenizer=tokenizer,
                                               max_seq_length=128,
                                               batch_size=1)
    
    tmp_input, tmp_token_types, tmp_attn_mask, _ = tmp_data()
    tmp_embeddings = bert(input_ids=tmp_input,
                          token_type_ids=tmp_token_types,
                          attention_mask=tmp_attn_mask)
    tmp_logits = mlp(hidden_states=tmp_embeddings)
    tmp_logits_tensors = nf.infer(tensors=[tmp_logits, tmp_embeddings])
    tmp_probs = torch.nn.functional.softmax(torch.cat(tmp_logits_tensors[0]), dim=1).numpy()[:, 1]
    print(f'{sentence} | {tmp_probs[0]}')
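
The last step of `classify_sentence` applies a softmax to the classifier's two logits and keeps the value at index 1, the positive class. The sketch below shows that calculation with the standard library only; the logit values are hypothetical and were not produced by the model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)                       # subtract the max to avoid overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits in the order [negative, positive]
example_logits = [-1.2, 2.3]
probs = softmax(example_logits)
positive_prob = probs[1]

print(f"P(positive) = {positive_prob:.3f}")
```

Because the two probabilities sum to 1, reporting only the positive-class value is enough to describe the prediction.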

Execute the inference. If you get a deprecation warning during this execution, you can safely ignore it. 

In [None]:
sentences = ['point break is the best movie of all time',
             'the movie was a wonderful exercise in understanding the struggles of native americans',
             'the performance of diego luna had me excited and annoyed at the same time',
             'matt damon is the only good thing about this film']

for sentence in sentences:
    classify_sentence(nf, tokenizer, bert, mlp, sentence)

To view the result of each sentence classification, scan the output for lines that match the following pattern.

\[NeMo I \<date> \<time> actions:728\] Evaluating batch 0 out of 1

There should be one such line for each of the four sentences in the `sentences` list defined at the beginning of the code cell that generated this output.

The line immediately following each of these lines contains the sentence text, a pipe character, and then a decimal number between 0 and 1 indicating the model's interpretation of the sentiment expressed by that sentence. A value close to 0 indicates a strongly negative sentiment, while a value close to 1 indicates a strongly positive sentiment.
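
If you want to turn those result lines into labels, something like the following sketch could work. The helper `parse_result` and the 0.5 threshold are assumptions made for illustration, and the sample lines are made up rather than taken from the model's output.

```python
def parse_result(line, threshold=0.5):
    """Split 'sentence | score' and attach a label (threshold is an assumption)."""
    # rpartition splits on the last pipe, so pipes inside the sentence survive
    sentence, _, score_text = line.rpartition('|')
    score = float(score_text)
    label = 'positive' if score >= threshold else 'negative'
    return sentence.strip(), score, label


# Made-up output lines, not actual results from the model
sample_lines = [
    'what a wonderful film | 0.97',
    'a dull and lifeless script | 0.04',
]

for line in sample_lines:
    print(parse_result(line))
```

Scores near 0.5 indicate that the model is uncertain, so a single hard threshold is only a rough summary of the output.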

# Visualizing BERT Embeddings After Fine-tuning

Now that you have a fine-tuned BERT model, you will run the same assessment you ran during training, this time using the enhanced model. You will see if it produces an improved plot of the "good" and "bad" sample words.

In [None]:
# 11 negative words followed by 11 positive words; the plotting cell relies on this order
spectrum_words = ['abysmal', 'appalling', 'dreadful', 'awful', 'terrible',
                  'very bad', 'really bad', 'rubbish', 'unsatisfactory',
                  'bad', 'poor', 'great', 'really good', 'very good', 'awesome',
                  'fantastic', 'superb', 'brilliant', 'incredible', 'excellent',
                  'outstanding', 'perfect']

spectrum_file = os.path.join(SPLIT_DATA_DIR, 'positive_negative.tsv')
with open(spectrum_file, 'w+') as f:
    f.write('sentence\tlabel')
    for word in spectrum_words:
        f.write('\n' + word + '\t0')
        
spectrum_df = pd.read_csv(spectrum_file, delimiter='\t')

In [None]:
# Reformat text
spectrum_data = BertTextClassificationDataLayer(input_file=spectrum_file,
                                                tokenizer=tokenizer,
                                                max_seq_length=MAX_SEQ_LEN,
                                                batch_size=BATCH_SIZE)

In [None]:
# Use trained model to create embeddings

spectrum_input, spectrum_token_types, spectrum_attn_mask, spectrum_labels = spectrum_data()

spectrum_embeddings = bert(input_ids=spectrum_input,
                           token_type_ids=spectrum_token_types,
                           attention_mask=spectrum_attn_mask)

spectrum_embeddings_tensors = nf.infer(tensors=[spectrum_embeddings])

Now that you have a better-trained model, you should expect to see more clustering of "good" and "bad" words.

In [None]:
spectrum_activations = spectrum_embeddings_tensors[0][0][:,0,:].numpy()
tsne_spectrum = TSNE(n_components=2, perplexity=10, verbose=1, learning_rate=2,
                     random_state=123).fit_transform(spectrum_activations)

fig = plt.figure(figsize=(10,10))
plt.plot(tsne_spectrum[0:11, 0], tsne_spectrum[0:11, 1], 'rx')
plt.plot(tsne_spectrum[11:, 0], tsne_spectrum[11:, 1], 'bo')
for (x,y, label) in zip(tsne_spectrum[0:, 0], tsne_spectrum[0:, 1], spectrum_df.sentence.values.tolist() ):
    plt.annotate(label, # this is the text
                 (x,y), # this is the point to label
                 textcoords="offset points", # how to position the text
                 xytext=(0,10), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center

This new plot shows significant clustering of the "good" words in the far lower-left corner, and significant clustering of the "bad" words in the far upper-right corner. The enhanced model is doing a much better job of identifying and distinguishing "good" and "bad" words.
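
The visual impression of clustering can also be expressed as a number. The sketch below is one possible measure, not part of the lab: it divides the distance between the two group centroids by the average spread within each group. The function names and the sample points are hypothetical. Because t-SNE does not preserve distances faithfully, treat any such score as a rough indicator only.

```python
import math


def centroid(points):
    """Mean position of a list of equally sized coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def mean_distance(points, center):
    """Average Euclidean distance from each point to center."""
    return sum(math.dist(p, center) for p in points) / len(points)


def separation_score(group_a, group_b):
    """Distance between centroids divided by the average within-group spread."""
    ca, cb = centroid(group_a), centroid(group_b)
    spread = (mean_distance(group_a, ca) + mean_distance(group_b, cb)) / 2
    return math.dist(ca, cb) / spread  # undefined if every point coincides


# Made-up 2-D points standing in for t-SNE output
tight_bad = [(9.0, 9.0), (10.0, 9.0), (9.0, 10.0), (10.0, 10.0)]
tight_good = [(-10.0, -10.0), (-9.0, -10.0), (-10.0, -9.0), (-9.0, -9.0)]
mixed_a = [(0.0, 0.0), (5.0, 5.0), (-5.0, 4.0), (3.0, -6.0)]
mixed_b = [(1.0, 1.0), (-4.0, -5.0), (6.0, -2.0), (-3.0, 6.0)]

print(separation_score(tight_bad, tight_good))  # large: well-separated groups
print(separation_score(mixed_a, mixed_b))       # small: overlapping groups
```

In this notebook the two groups would be `tsne_spectrum[0:11]` and `tsne_spectrum[11:]`, matching the slices used for plotting.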

## Save New Model Version

Use NetApp Snapshot technology to near-instantaneously save a new version of the enhanced model so that you will be able to revert to it in the future if necessary.

In [None]:
modelTag = 'enhanced_' + trainingRunTag
modelDescription = 'Enhanced BERT model.'

# Save the fine-tuned model's configuration and weights to the model volume
bert.config.save_pretrained(MODELS_DIR)
torch.save(bert.state_dict(), os.path.join(MODELS_DIR, 'pytorch_model.bin'))

apiResponse, snapshot = netappSnapshotCreate(pvName = modelPvName, snapshotName = modelTag, snapshotComment = modelDescription)

print('API Response: ', apiResponse['state'])
print('Snapshot uuid: ', snapshot['uuid'])
print('Snapshot name: ', snapshot['name'])
print('Snapshot description: ', snapshot['comment'])

# Explore Saved Models

List all of the model versions that you have saved using NetApp Snapshot technology. If for some reason you are not happy with the results of the newly trained model, you can use a snapshot to quickly revert to one of the previous model versions.

Note that you could also use NetApp FlexClone technology to clone any one of these saved models in order to experiment with it in a sandboxed workspace. While this capability is not demonstrated in this lab, you can refer to the [NetApp AI Control Plane Technical Report](https://www.netapp.com/us/media/tr-4798.pdf) for more information.

In [None]:
snapshots = netappGetSnapshots(pvName = modelPvName)

# Print list of snapshots
print('Model Tag', '\t\t\t', 'Model Description')
for snapshot in snapshots:
    try:
        print(snapshot['name'], '\t', snapshot['comment'])
    except Exception:
        pass

The newly enhanced model appears to distinguish sentiment better than the baseline model, as suggested by the clustering of "good" and "bad" words in the above plot. However, "enhanced" models are not always more accurate than previously trained models. For the sake of demonstration, assume that this model is actually less accurate than the previous baseline version and that you want to restore that version. You will now use NetApp Snapshot technology to quickly restore the baseline model.

Note that restoring a volume to a snapshot destroys any snapshots of that volume that were taken after it. In this example you are going to restore the volume to the snapshot corresponding to the model's baseline, which will destroy the snapshot of the enhanced model that you took in the preceding step. If you need to retain those later snapshots, NetApp FlexClone technology can again help you. While this version of the lab does not include a demonstration of FlexClone, we intend to include it in a future version.
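
The effect of a restore on later snapshots can be pictured with a toy model. The sketch below is plain Python and does not call the NetApp API; the function `restore_to` and the snapshot tags are hypothetical and serve only to illustrate the ordering rule described above.

```python
def restore_to(snapshots, name):
    """Return the snapshots that survive a restore to `name`.

    `snapshots` is ordered oldest to newest. Everything taken after
    the chosen snapshot is dropped.
    """
    index = snapshots.index(name)  # raises ValueError if the name is unknown
    return snapshots[:index + 1]


# Hypothetical tags; the real ones are built from trainingRunTag
history = ['baseline_run1', 'enhanced_run1']

print(restore_to(history, 'baseline_run1'))  # ['baseline_run1']
```

Restoring to the newest snapshot leaves the list unchanged, which is why only rollbacks to older versions cost you later snapshots.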

In [None]:
# Restore the Previously Saved Model
modelTag = 'baseline_' + trainingRunTag # Model Tag of the model that we wish to restore
print('Restoring: ', modelTag)

# Restore NetApp snapshot corresponding to model version
result = netappRestoreSnapshot(pvName = modelPvName, snapshotName = modelTag)
print(result)

# Reload model
bert = nemo_nlp.nm.trainables.huggingface.BERT(pretrained_model_name = MODELS_DIR)

# Re-visualize BERT Embeddings after Restore

Visualize the BERT embeddings once again to confirm that you have reloaded the baseline (non-enhanced) BERT model.

In [None]:
# 11 negative words followed by 11 positive words; the plotting cell relies on this order
spectrum_words = ['abysmal', 'appalling', 'dreadful', 'awful', 'terrible',
                  'very bad', 'really bad', 'rubbish', 'unsatisfactory',
                  'bad', 'poor', 'great', 'really good', 'very good', 'awesome',
                  'fantastic', 'superb', 'brilliant', 'incredible', 'excellent',
                  'outstanding', 'perfect']

spectrum_file = os.path.join(SPLIT_DATA_DIR, 'positive_negative.tsv')
with open(spectrum_file, 'w+') as f:
    f.write('sentence\tlabel')
    for word in spectrum_words:
        f.write('\n' + word + '\t0')
        
spectrum_df = pd.read_csv(spectrum_file, delimiter='\t')

In [None]:
# Reformat text
spectrum_data = BertTextClassificationDataLayer(input_file=spectrum_file,
                                                tokenizer=tokenizer,
                                                max_seq_length=MAX_SEQ_LEN,
                                                batch_size=BATCH_SIZE)

In [None]:
# Use the restored baseline model to create embeddings

spectrum_input, spectrum_token_types, spectrum_attn_mask, spectrum_labels = spectrum_data()

spectrum_embeddings = bert(input_ids=spectrum_input,
                           token_type_ids=spectrum_token_types,
                           attention_mask=spectrum_attn_mask)

spectrum_embeddings_tensors = nf.infer(tensors=[spectrum_embeddings])

Now that you have reloaded the original model, you should expect the plot to once again show less clustering than you saw in the plot for the enhanced model.

In [None]:
spectrum_activations = spectrum_embeddings_tensors[0][0][:,0,:].numpy()
tsne_spectrum = TSNE(n_components=2, perplexity=10, verbose=1, learning_rate=2,
                     random_state=123).fit_transform(spectrum_activations)

fig = plt.figure(figsize=(10,10))
plt.plot(tsne_spectrum[0:11, 0], tsne_spectrum[0:11, 1], 'rx')
plt.plot(tsne_spectrum[11:, 0], tsne_spectrum[11:, 1], 'bo')
for (x,y, label) in zip(tsne_spectrum[0:, 0], tsne_spectrum[0:, 1], spectrum_df.sentence.values.tolist() ):
    plt.annotate(label, # this is the text
                 (x,y), # this is the point to label
                 textcoords="offset points", # how to position the text
                 xytext=(0,10), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center

The words are no longer clustered in this plot.

This completes the notebook activities. Please refer to the lab guide again to complete the remainder of the lab.