### Introduction to NLP Fundamentals in TensorFlow

* NLP aims to derive information from natural language, which may come as sequences of text or speech.

* Many NLP problems are also referred to as sequence-to-sequence (seq2seq) problems.

**RNN:** A recurrent neural network (RNN) is a class of artificial neural network (ANN) in which connections between nodes form a directed graph along a temporal sequence.

### Get helper functions

In [1]:
# import wget
# wget.download(url = "https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py")

In [2]:
from helper_functions import unzip_data , create_tensorboard_callback , plot_loss_curves, compare_historys

In [3]:
unzip_data("nlp_getting_started.zip")

### Visualizing a text dataset

In [4]:
import pandas as pd
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
train_df.head()

Unnamed: 0,id,keyword,location,text,target
0,1,,,Our Deeds are the Reason of this #earthquake M...,1
1,4,,,Forest fire near La Ronge Sask. Canada,1
2,5,,,All residents asked to 'shelter in place' are ...,1
3,6,,,"13,000 people receive #wildfires evacuation or...",1
4,7,,,Just got sent this photo from Ruby #Alaska as ...,1


In [5]:
# Shuffle training dataframe
train_df_shuffled = train_df.sample(frac = 1, random_state = 42)
train_df_shuffled.head()

Unnamed: 0,id,keyword,location,text,target
2644,3796,destruction,,So you have a new weapon that can cause un-ima...,1
2227,3185,deluge,,The f$&amp;@ing things I do for #GISHWHES Just...,0
5448,7769,police,UK,DT @georgegalloway: RT @Galloway4Mayor: ÛÏThe...,1
132,191,aftershock,,Aftershock back to school kick off was great. ...,0
6845,9810,trauma,"Montgomery County, MD",in response to trauma Children of Addicts deve...,0


In [6]:
# What does the test dataframe look like?
test_df.head() 

Unnamed: 0,id,keyword,location,text
0,0,,,Just happened a terrible car crash
1,2,,,"Heard about #earthquake is different cities, s..."
2,3,,,"there is a forest fire at spot pond, geese are..."
3,9,,,Apocalypse lighting. #Spokane #wildfires
4,11,,,Typhoon Soudelor kills 28 in China and Taiwan


In [7]:
# How many examples of each class?
train_df.target.value_counts()

0    4342
1    3271
Name: target, dtype: int64

In [8]:
# How many total samples?
len(train_df) , len(test_df)

(7613, 3263)

In [10]:
# Let's visualize some random training examples
import random 
random_index = random.randint(0 , len(train_df) - 5)
for row in train_df_shuffled[['text' , 'target']][random_index: random_index + 5].itertuples():
    _, text , target = row
    print(f"Target: {target}", "(real disaster)" if target > 0 else "(not real disaster)")
    print(f"Text:\n{text}\n")
    print("---\n")

Target: 1 (real disaster)
Text:
Latest : Trains derailment: 'It's the freakiest of freak accidents' - The Indian Express: The Indi... http://t.co/iLdbeJe225 #IndianNews

---

Target: 1 (real disaster)
Text:
Motorcyclist bicyclist injured in Denver collision on Broadway: http://t.co/241cN8yxjq by @kierannicholson

---

Target: 1 (real disaster)
Text:
gmtTy mhtw4fnet

Officials: Alabama home quarantined over possible Ebola case - Washington Times

---

Target: 0 (not real disaster)
Text:
Your Router is One of the Latest DDoS Attack Weapons http://t.co/vXxMvgtzvg #phone #gaming #tv #news

---

Target: 1 (real disaster)
Text:
Ignition Knock (Detonation) Sensor-Senso Standard KS94 http://t.co/IhphZCkm41 http://t.co/wuICdTTUhf

---



### Split data into training and validation sets 

In [11]:
from sklearn.model_selection import train_test_split as tts
train_sentences , val_sentences , train_labels , val_labels = tts(train_df_shuffled['text'].to_numpy(),
                                                                    train_df_shuffled['target'].to_numpy(),
                                                                    test_size = 0.1, # use 10% of training data for validation
                                                                    random_state = 42)

In [12]:
# Check the lengths
len(train_sentences) , len(val_sentences) , len(train_labels) , len(val_labels)

(6851, 762, 6851, 762)
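
The split sizes above can be reproduced without scikit-learn. The following is a minimal pure-Python sketch, not scikit-learn's implementation: the helper name `shuffle_split` and the placeholder tweets are hypothetical, and rounding the validation size up is an assumption chosen because it matches the 762 validation samples shown above.

```python
import math
import random

def shuffle_split(items, labels, test_size, seed):
    """Shuffle indices with a fixed seed, then carve off a validation slice."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_val = math.ceil(test_size * len(items))  # assumption: round the validation size up
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train_x = [items[i] for i in train_idx]
    val_x = [items[i] for i in val_idx]
    train_y = [labels[i] for i in train_idx]
    val_y = [labels[i] for i in val_idx]
    return train_x, val_x, train_y, val_y

# Placeholder data with the same number of rows as the training dataframe
texts = [f"tweet {i}" for i in range(7613)]
targets = [i % 2 for i in range(7613)]

train_x, val_x, train_y, val_y = shuffle_split(texts, targets, test_size=0.1, seed=42)
print(len(train_x), len(val_x), len(train_y), len(val_y))  # 6851 762 6851 762
```

Fixing the seed keeps the split reproducible between runs, which is the role `random_state = 42` plays in the cell above.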

In [13]:
# Check the first 10 samples
train_sentences[:10] , train_labels[:10]

(array(['@mogacola @zamtriossu i screamed after hitting tweet',
        'Imagine getting flattened by Kurt Zouma',
        '@Gurmeetramrahim #MSGDoing111WelfareWorks Green S welfare force ke appx 65000 members har time disaster victim ki help ke liye tyar hai....',
        "@shakjn @C7 @Magnums im shaking in fear he's gonna hack the planet",
        'Somehow find you and I collide http://t.co/Ee8RpOahPk',
        '@EvaHanderek @MarleyKnysh great times until the bus driver held us hostage in the mall parking lot lmfao',
        'destroy the free fandom honestly',
        'Weapons stolen from National Guard Armory in New Albany still missing #Gunsense http://t.co/lKNU8902JE',
        '@wfaaweather Pete when will the heat wave pass? Is it really going to be mid month? Frisco Boy Scouts have a canoe trip in Okla.',
        'Patient-reported outcomes in long-term survivors of metastatic colorectal cancer - British Journal of Surgery http://t.co/5Yl4DC1Tqt'],
       dtype=object),
 array([0,

* **Tokenization** - Straight mapping from token to number (can be modelled but quickly gets too big).
* **Embedding** - Richer representation of relationships between tokens (can limit size + can be learned)

In NLP, there are two main concepts for turning text into numbers:

* Tokenization - A straight mapping from a word, character or sub-word to a numerical value. There are three main levels of tokenization:

1. Word-level tokenization of the sentence "I love TensorFlow" might result in "I" being 0, "love" being 1 and "TensorFlow" being 2. In this case, every word in a sequence is considered a single token.

2. Character-level tokenization, such as converting the letters A-Z to values 1-26. In this case, every character in a sequence is considered a single token.

3. Sub-word tokenization sits between word-level and character-level tokenization. It involves breaking individual words into smaller parts and then converting those smaller parts into numbers. For example, "my favourite food is pineapple pizza" might become "my, fav, avour, rite, fo, oo, od, is, pin, ine, app, le, piz, za". These sub-words are then mapped to numerical values. In this case, every word could be considered multiple tokens.

* Embeddings - An embedding is a learnable representation of natural language in the form of a feature vector. For example, the word "dance" could be represented by the 5-dimensional vector [-0.8547, 0.4559, -0.3332, 0.9877, 0.1112]. The size of the feature vector is tuneable. There are two ways to use embeddings:

1. Create your own embedding - Once your text has been turned into numbers (required for an embedding), you can put them through an embedding layer (such as tf.keras.layers.Embedding) and an embedding representation will be learned during model training.

2. Reuse a pre-learned embedding - Many pre-trained embeddings exist online. These pre-trained embeddings have often been learned on large corpora of text (such as all of Wikipedia) and thus have a good underlying representation of natural language. You can use a pre-trained embedding to initialize your model and fine-tune it to your own specific task.
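
To make the first two tokenization levels concrete, here is a small pure-Python sketch. It only illustrates the idea; the helper `char_to_id` is a hypothetical name, and sending non-letter characters to 0 is an assumption of this example.

```python
# Word-level: every distinct word gets its own integer ID, in order of appearance.
sentence = "I love TensorFlow"
word_to_id = {}
for word in sentence.split():
    if word not in word_to_id:
        word_to_id[word] = len(word_to_id)
word_ids = [word_to_id[word] for word in sentence.split()]

# Character-level: map the letters a-z to 1-26 (anything else maps to 0 in this sketch).
def char_to_id(ch):
    ch = ch.lower()
    return ord(ch) - ord("a") + 1 if "a" <= ch <= "z" else 0

char_ids = [char_to_id(ch) for ch in "love"]

print(word_to_id)  # {'I': 0, 'love': 1, 'TensorFlow': 2}
print(word_ids)    # [0, 1, 2]
print(char_ids)    # [12, 15, 22, 5]
```

Word-level vocabularies grow with every new word, while the character-level vocabulary stays tiny but produces much longer sequences.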

### Converting text into numbers

When dealing with a text problem, one of the first things you'll have to do before you can build a model is to convert your text to numbers.

There are a few ways to do this, namely:
* Tokenization - Direct mapping of token (a token could be a word or a character) to number.
* Embedding - Create a matrix of feature vectors, one for each token (the size of the feature vector can be defined and this embedding can be learned).

### Text vectorization (tokenization)

Each sample is processed in the following steps:
1. Standardize each sample (usually lowercasing + punctuation stripping).
2. Split each sample into substrings (usually words).
3. Recombine substrings into tokens (usually n-grams, which are groups of consecutive words).
4. Index tokens (associate a unique int value with each token).
5. Transform each sample using this index, either into a vector of ints or a dense float vector.
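
The five steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than how the TensorFlow layer is implemented: the function names are hypothetical, tokens are 1- to n-grams, and indices 0 and 1 are kept free for padding and unknown tokens.

```python
import string

def standardize(text):
    # Step 1: lowercase and strip punctuation
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def make_ngrams(words, n):
    # Step 3: recombine words into 1-grams up to n-grams
    tokens = []
    for size in range(1, n + 1):
        for start in range(len(words) - size + 1):
            tokens.append(" ".join(words[start:start + size]))
    return tokens

def tokenize(text, n=2):
    # Step 2 is the whitespace split between standardizing and n-gram building
    return make_ngrams(standardize(text).split(), n)

corpus = ["There's a flood in my street!", "A flood warning"]

# Step 4: give every token a unique int (0 = padding, 1 = unknown, by assumption)
vocab = {}
for sample in corpus:
    for token in tokenize(sample):
        vocab.setdefault(token, len(vocab) + 2)

def vectorize(text):
    # Step 5: transform a sample into a vector of ints using the index
    return [vocab.get(token, 1) for token in tokenize(text)]

print(tokenize("A flood!"))      # ['a', 'flood', 'a flood']
print(vectorize("A flood!"))     # [3, 4, 9]
print(vectorize("flood alert"))  # [4, 1, 1]
```

Tokens that never appeared in the corpus ("alert", "flood alert") fall back to the unknown index.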

In [14]:
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

Enough talking about tokenization and embeddings, let's create some.

We'll practice tokenization (mapping our words to numbers) first.

To tokenize our words, we'll use the helpful preprocessing layer tf.keras.layers.experimental.preprocessing.TextVectorization.

The TextVectorization layer takes the following parameters:
* max_tokens - The maximum number of words in your vocabulary (e.g. 20000 or the number of unique words in your text), includes a value for OOV (out of vocabulary) tokens.
* standardize - Method for standardizing text. Default is "lower_and_strip_punctuation" which lowers text and removes all punctuation marks.
* split - How to split text, default is "whitespace" which splits on spaces.
* ngrams - Whether to group words into n-grams, for example, ngrams=2 builds tokens from contiguous sequences of up to 2 words.
* output_mode - How to output tokens, can be "int" (integer mapping), "binary" (multi-hot encoding, one slot per vocabulary token), "count" or "tf-idf". Newer TensorFlow releases may name these modes differently, so see the documentation for your version.
* output_sequence_length - Length of tokenized sequence to output. For example, if output_sequence_length=150, all tokenized sequences will be 150 tokens long.
* pad_to_max_tokens - Defaults to False, if True, the output feature axis will be padded to max_tokens even if the number of unique tokens in the vocabulary is less than max_tokens. Only valid in certain modes, see docs for more.

In [15]:
# Use the default TextVectorization variables
text_vectorizer = TextVectorization(max_tokens=None, # how many words in the vocabulary (all of the different words in your text)
                                    standardize="lower_and_strip_punctuation", # how to process text
                                    split="whitespace", # how to split tokens
                                    ngrams=None, # create groups of n-words?
                                    output_mode="int", # how to map tokens to numbers
                                    output_sequence_length=None) # how long should the output sequence of tokens be?
                                    # pad_to_max_tokens=True) # Not valid if using max_tokens=None

In [16]:
train_sentences[0].split()

['@mogacola', '@zamtriossu', 'i', 'screamed', 'after', 'hitting', 'tweet']

In [17]:
# Find the average number of tokens (words) in the training tweets
round(sum([len(i.split()) for i in train_sentences]) / len(train_sentences))

15

In [18]:
# Setup text vectorization with custom variables
max_vocab_length = 10000 # max number of words to have in our vocabulary
max_length = 15 # max length our sequences will be (e.g. how many words from a Tweet does our model see?)

text_vectorizer = TextVectorization(max_tokens = max_vocab_length,
                                    output_mode = "int",
                                    output_sequence_length = max_length)

In [19]:
# Fit the text vectorizer to the training data
# adapt : Fits the state of the preprocessing layer to the dataset
text_vectorizer.adapt(train_sentences)

In [20]:
# Create a sample sentence and tokenize it 
sample_sentence = "There's a flood in my street!"
text_vectorizer([sample_sentence])

<tf.Tensor: shape=(1, 15), dtype=int64, numpy=
array([[264,   3, 232,   4,  13, 698,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=int64)>

In [21]:
max_length

15

In [22]:
# Choose a random sentence from the training dataset and tokenize it
random_sentence = random.choice(train_sentences)
print(f"Original text:\n {random_sentence}\
    \n\nVectorized version:")
text_vectorizer([random_sentence])

Original text:
 70 years after #ABomb destroyd #HiroshimaÛÓ#BBC looks at wht #survived http://t.co/dLgNUuuUYn #CNV Watch Peace Vigils: http://t.co/jvkYzNDtja    

Vectorized version:


<tf.Tensor: shape=(1, 15), dtype=int64, numpy=
array([[ 325,  141,   43, 6284,    1,    1,  287,   17, 3250,  363,    1,
           1,  135,  675, 7026]], dtype=int64)>

In [23]:
# Get the unique words in the vocabulary
words_in_vocab = text_vectorizer.get_vocabulary() # Get all the unique words 
top_5_words = words_in_vocab[:5] # get the most common words
bottom_5_words= words_in_vocab[-5:] # get the least common words
print(f"Number of words in vocab: {len(words_in_vocab)}")
print(f"5 most common words: {top_5_words}")
print(f"5 least common words: {bottom_5_words}")

Number of words in vocab: 10000
5 most common words: ['', '[UNK]', 'the', 'a', 'in']
5 least common words: ['pages', 'paeds', 'pads', 'padres', 'paddytomlinson1']
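
The vocabulary above starts with `''` (padding) and `'[UNK]'` (out-of-vocabulary), followed by words in order of frequency. The sketch below mimics that behaviour in pure Python under simplifying assumptions: `build_vocab` and `encode` are hypothetical helpers, only lowercasing and whitespace splitting are applied, and the toy sentences are made up.

```python
from collections import Counter

def build_vocab(sentences, max_tokens):
    """Keep the most frequent words, reserving index 0 (padding) and 1 ([UNK])."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def encode(sentence, vocab, seq_len):
    """Map words to indices, then truncate or zero-pad to seq_len."""
    index = {word: i for i, word in enumerate(vocab)}
    ids = [index.get(word, 1) for word in sentence.lower().split()]
    ids = ids[:seq_len]
    return ids + [0] * (seq_len - len(ids))

toy_sentences = ["the flood hit the town", "the fire hit the forest", "a flood"]
toy_vocab = build_vocab(toy_sentences, max_tokens=5)

print(toy_vocab)                                                # ['', '[UNK]', 'the', 'flood', 'hit']
print(encode("the fire hit", toy_vocab, seq_len=6))             # [2, 1, 4, 0, 0, 0]
print(encode("the flood hit the town today", toy_vocab, seq_len=4))  # [2, 3, 4, 2]
```

This mirrors what the tokenized tweets earlier show: 0s pad short sequences, and 1s mark words that fell outside the 10,000-word vocabulary.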


### Creating an Embedding using an Embedding Layer
* Turns positive integers (indexes) into dense vectors of fixed size.

<https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding>

The parameters we care most about for our embedding layer:
* input_dim = the size of our vocabulary
* output_dim = the size of the output embedding vector, for example, a value of 100 means each token is represented by a vector of 100 values
* input_length = length of the sequences being passed to the embedding layer
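
Conceptually, an embedding layer is a lookup table with one row per token index. The sketch below uses tiny made-up sizes and plain Python lists; the variable names and the random initial range are assumptions for illustration, and in a real model the rows would be updated during training.

```python
import random

vocab_size = 10  # plays the role of input_dim
embed_dim = 4    # plays the role of output_dim
seq_len = 5      # plays the role of input_length

rng = random.Random(42)
# One row of embed_dim small random values per token index
table = [[rng.uniform(-0.05, 0.05) for _ in range(embed_dim)] for _ in range(vocab_size)]

def embed(token_ids):
    # Looking up an embedding is just indexing into the table
    return [table[token_id] for token_id in token_ids]

token_ids = [3, 7, 0, 0, 0]  # a tokenized, zero-padded sequence of length seq_len
embedded = embed(token_ids)

print(len(embedded), len(embedded[0]))  # 5 4
print(embedded[2] == embedded[3])       # True: both positions hold padding index 0

# The table holds input_dim * output_dim values
print(10000 * 128)  # 1280000
```

Identical token indices always map to identical vectors, which is why padded positions share the same row.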

In [26]:
from tensorflow.keras import layers
from tensorflow.keras.layers import Embedding
embedding = layers.Embedding(input_dim = max_vocab_length, # size of the vocabulary
                             embeddings_initializer = "uniform",
                             output_dim = 128,
                             input_length = max_length # how long is each input
                             )

embedding

<keras.layers.embeddings.Embedding at 0x2e2e5458d90>

In [27]:
# Get a random sentence from the training set
random_sentence = random.choice(train_sentences)
print(f'Original text:\n {random_sentence}\
    \n\nEmbedded version:')

# Embed the random sentence (turn it into dense vectors of fixed size)
sample_embed = embedding(text_vectorizer([random_sentence]))
sample_embed

Original text:
 It's not a cute dinner date Til cams nose starts bleeding    

Embedded version:


<tf.Tensor: shape=(1, 15, 128), dtype=float32, numpy=
array([[[ 0.02485852,  0.03233856, -0.04716928, ...,  0.04837019,
          0.0213588 , -0.01183138],
        [-0.01495129, -0.01103653, -0.03942865, ..., -0.02537896,
         -0.02434093, -0.02711899],
        [-0.0443208 , -0.03886548,  0.01951635, ...,  0.0177494 ,
         -0.01526488,  0.02859325],
        ...,
        [ 0.00250883, -0.04312918,  0.03395155, ..., -0.00681384,
          0.02685476, -0.03502151],
        [ 0.00250883, -0.04312918,  0.03395155, ..., -0.00681384,
          0.02685476, -0.03502151],
        [ 0.00250883, -0.04312918,  0.03395155, ..., -0.00681384,
          0.02685476, -0.03502151]]], dtype=float32)>

In [28]:
# Check out a single token's embedding
sample_embed[0][0] , sample_embed[0][0].shape , random_sentence

(<tf.Tensor: shape=(128,), dtype=float32, numpy=
 array([ 0.02485852,  0.03233856, -0.04716928,  0.03723738, -0.01368067,
        -0.02831901, -0.0383569 , -0.01348202, -0.04510629,  0.03319595,
         0.02417148,  0.02111593, -0.02403777,  0.02472608, -0.04178238,
        -0.02655692,  0.04049946, -0.0412048 , -0.04689313,  0.03895967,
         0.04284109, -0.01429711,  0.00191595, -0.03761158, -0.03745779,
        -0.00706595, -0.01503211,  0.01544204,  0.0246747 , -0.02154256,
        -0.02964962,  0.03123445,  0.04383769, -0.01013821, -0.02994606,
        -0.04985712, -0.00602702, -0.00833012, -0.03166606,  0.01255685,
         0.00366644, -0.00241054, -0.04014023, -0.03372522,  0.03008522,
        -0.03119853, -0.04816645, -0.04837053,  0.04360266,  0.02849423,
        -0.04630911,  0.01529081,  0.01092824, -0.04813163, -0.04680904,
         0.03699395,  0.00532144,  0.04095611,  0.02114103,  0.04271544,
        -0.04360133, -0.04761641, -0.0201099 , -0.00509583, -0.01948512,
  

### Experiments we're running 
* Model 0: Naive Bayes with TF-IDF encoder (baseline)
* Model 1: Feed-forward neural network (dense model)
* Model 2: LSTM (RNN)
* Model 3: GRU (RNN)
* Model 4: Bidirectional-LSTM (RNN)
* Model 5: 1D Convolutional Neural Network
* Model 6: TensorFlow Hub Pretrained Feature Extractor
* Model 7: TensorFlow Hub Pretrained Feature Extractor (10% of data)

How are we going to approach all of these? Use the standard steps in modelling with TensorFlow:
* Build a model
* Compile a model
* Fit a model
* Evaluate our model

### Model 0: Getting a baseline

As with all machine learning modelling experiments, it's important to create a baseline model so you've got a benchmark for future experiments to build upon.

To create our baseline, we'll use scikit-learn's Multinomial Naive Bayes with the TF-IDF formula to convert our words to numbers.

> **Note:** It's common practice to use non-DL algorithms as a baseline because of their speed, and then later use DL to see if you can improve upon them.

In [29]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Create tokenization and modelling pipeline
# TF-IDF (term frequency-inverse document frequency) is a statistical measure
# that evaluates how relevant a word is to a document in a collection of documents.

# This is done by multiplying two metrics: how many times a word appears in a document,
# and the inverse document frequency of the word across a set of documents.

model_0 = Pipeline([
    ("tfidf" , TfidfVectorizer()), # Convert words to numbers using tfidf 
    ("clf" , MultinomialNB()) # Model the text and "clf" stands for classifier
])

# Fit the pipeline to the training data
model_0.fit(train_sentences , train_labels)

Pipeline(steps=[('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])
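
The comments in the cell above describe TF-IDF as a product of two quantities. Here is a hand-rolled sketch of the textbook form, tf × log(N / df), on three made-up documents. It is not scikit-learn's exact formula: `TfidfVectorizer` smooths the inverse document frequency and normalizes each row by default, so its numbers will differ.

```python
import math

docs = [
    "forest fire near the town",
    "the fire is spreading",
    "the match was fire",
]
tokenized = [doc.split() for doc in docs]

def term_frequency(term, doc_tokens):
    # How many times the term appears in this document
    return doc_tokens.count(term)

def inverse_document_frequency(term):
    # log(number of documents / number of documents containing the term)
    # Only call this for terms that occur in at least one document.
    doc_freq = sum(1 for doc_tokens in tokenized if term in doc_tokens)
    return math.log(len(tokenized) / doc_freq)

def tfidf(term, doc_tokens):
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term)

print(tfidf("fire", tokenized[0]))    # 0.0, because "fire" is in every document
print(tfidf("forest", tokenized[0]))  # log(3), because "forest" is in one of three
```

Words that show up everywhere score zero, while words concentrated in a few documents score high, which is what makes the measure useful for classification.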

In [30]:
# Evaluate our baseline model
baseline_score = model_0.score(val_sentences , val_labels)
print(f"Our baseline model achieves an accuracy of: {baseline_score*100:.2f}%")

Our baseline model achieves an accuracy of: 79.27%


In [31]:
# Make predictions
baseline_preds = model_0.predict(val_sentences)
baseline_preds[:20]

array([1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
      dtype=int64)

In [32]:
train_labels

array([0, 0, 1, ..., 1, 1, 0], dtype=int64)

### Creating an evaluation function for our model experiments

We could evaluate each model's predictions with different metrics by hand every time, but this would be cumbersome and is easily fixed with a function.

Let's create one to compare our model's predictions with the truth labels using the following metrics:
* Accuracy
* Precision
* Recall
* F1
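
For intuition about what these metrics measure, the sketch below computes them by hand with a support-weighted average across classes. The helper names and the toy labels are hypothetical, and this is a simplified stand-in for the library routine rather than a replacement for it.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def weighted_scores(y_true, y_pred):
    """Average the per-class scores, weighting each class by its share of y_true."""
    totals = [0.0, 0.0, 0.0]
    for label in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == label)
        scores = per_class_scores(y_true, y_pred, label)
        for k in range(3):
            totals[k] += scores[k] * support / len(y_true)
    return tuple(totals)

toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]

toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_precision, toy_recall, toy_f1 = weighted_scores(toy_true, toy_pred)
print(toy_accuracy, toy_precision, toy_recall, toy_f1)
```

With support weighting, recall works out to the same value as accuracy, which is why those two numbers agree in the results dictionaries in this notebook.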

In [33]:
# Function to evaluate: accuracy, precision, recall, f1-score
from sklearn.metrics import accuracy_score , precision_recall_fscore_support
 
def calculate_results(y_true , y_pred):
    """
    Calculate model accuracy, precision, recall and f1 score of a binary classification model.
    """
    # Calculate model accuracy
    model_accuracy = accuracy_score(y_true , y_pred) * 100
    # Calculate model precision, recall and f1-score using "weighted" average
    model_precision, model_recall , model_f1 , _ = precision_recall_fscore_support(y_true , y_pred, average = 'weighted')
    model_results = {"accuracy": model_accuracy,
                     "precision" : model_precision,
                     "recall" : model_recall,
                     "f1" : model_f1}
    return model_results

In [34]:
# Get baseline results 
baseline_results = calculate_results(y_true = val_labels,
                                     y_pred = baseline_preds)
baseline_results

{'accuracy': 79.26509186351706,
 'precision': 0.8111390004213173,
 'recall': 0.7926509186351706,
 'f1': 0.7862189758049549}

### Model 1: A simple dense model

In [35]:
# Create a tensorboard callback (need to create a new one for each model)
from helper_functions import create_tensorboard_callback

# Create a directory to save TensorBoard logs
SAVE_DIR = "model_logs"

In [36]:
# Build model with the Functional API
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string") # inputs are 1-dimensional strings
x = text_vectorizer(inputs) # turn the input text into numbers
x = embedding(x) # create an embedding of the tokenized text
x = layers.GlobalMaxPooling1D()(x) # lower the dimensionality of the embedding (try running the model without this layer and see what happens)
outputs = layers.Dense(1, activation="sigmoid")(x) # create the output layer, want binary outputs so use sigmoid activation
model_1 = tf.keras.Model(inputs, outputs, name="model_1_dense") # construct the model
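
The pooling layer is what collapses one vector per token into one vector per tweet. A minimal sketch of global max pooling over the sequence axis, using a made-up 4-token, 3-feature example (the function name is hypothetical):

```python
def global_max_pool_1d(sequence):
    """Take the maximum of each feature across all timesteps.

    `sequence` is a list of timesteps, each a list of feature values.
    """
    return [max(feature_values) for feature_values in zip(*sequence)]

# 4 tokens, each with a 3-dimensional embedding
embedded_tweet = [
    [0.1, -0.3, 0.5],
    [0.4, 0.2, -0.1],
    [-0.2, 0.0, 0.3],
    [0.0, 0.1, 0.2],
]

pooled = global_max_pool_1d(embedded_tweet)
print(pooled)  # [0.4, 0.2, 0.5]
```

In the model above the same operation turns a (15, 128) embedded tweet into a single 128-value vector, so the final dense layer produces one prediction per tweet instead of one per token.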

In [37]:
model_1.summary()

Model: "model_1_dense"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 global_max_pooling1d (Globa  (None, 128)              0         
 lMaxPooling1D)                                                  
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
Total params: 1,280,129
Trainable params: 1,280,129
N

In [38]:
# Compile model
model_1.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

In [39]:
# Fit the model
model_1_history = model_1.fit(train_sentences, # input sentences can be a list of strings because the text preprocessing layer is built into the model
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [40]:
model_1.evaluate(val_sentences , val_labels)



[0.46970465779304504, 0.7913385629653931]

In [41]:
# Make some predictions and evaluate those
model_1_pred_probs = model_1.predict(val_sentences)
model_1_pred_probs.shape

(762, 1)

In [42]:
# Convert model prediction probabilities to label format
model_1_preds = tf.squeeze(tf.round(model_1_pred_probs)) 

In [43]:
model_1_preds[:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 1., 1., 0., 0., 1., 1., 1., 1., 0.], dtype=float32)>

In [44]:
# Calculate our model 1 results
model_1_results = calculate_results(y_true = val_labels,
                                    y_pred = model_1_preds)
model_1_results

{'accuracy': 79.13385826771653,
 'precision': 0.7957855407433384,
 'recall': 0.7913385826771654,
 'f1': 0.7886149964743017}

In [45]:
baseline_results

{'accuracy': 79.26509186351706,
 'precision': 0.8111390004213173,
 'recall': 0.7926509186351706,
 'f1': 0.7862189758049549}

### Visualizing learned embeddings

In [46]:
# Get the vocabulary from the text vectorization layer
words_in_vocab = text_vectorizer.get_vocabulary()
len(words_in_vocab) , words_in_vocab[:10]

(10000, ['', '[UNK]', 'the', 'a', 'in', 'to', 'of', 'and', 'i', 'is'])

In [47]:
max_vocab_length

10000

In [48]:
model_1.summary()

Model: "model_1_dense"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 global_max_pooling1d (Globa  (None, 128)              0         
 lMaxPooling1D)                                                  
                                                                 
 dense (Dense)               (None, 1)                 129       
                                                                 
Total params: 1,280,129
Trainable params: 1,280,129
N

In [49]:
# Get the weight matrix of embedding layer
# (these are the numerical representations of each token in our training data,
# which have been learned for 5 epochs)

embed_weights = model_1.get_layer('embedding').get_weights()
embed_weights

[array([[-0.03737159, -0.09939637,  0.09148011, ..., -0.0810068 ,
         -0.03535755, -0.03866373],
        [-0.03986329, -0.0303731 , -0.04581859, ..., -0.05121184,
         -0.03792265, -0.04977376],
        [-0.02781821, -0.06172787, -0.01588579, ..., -0.0306049 ,
         -0.04177805, -0.01413913],
        ...,
        [ 0.02937244,  0.00391371,  0.01603809, ...,  0.01262745,
          0.03735823, -0.04100728],
        [-0.0409763 , -0.01351456,  0.02660655, ..., -0.02940427,
         -0.03146774, -0.03172175],
        [-0.02211489, -0.00451796,  0.04047531, ..., -0.01452338,
         -0.01123087, -0.04482847]], dtype=float32)]

Now we've got the embedding matrix our model has learned to represent our tokens, let's see how we can visualize it.

To do so, TensorFlow has a handy tool called projector: <http://projector.tensorflow.org/>

And TensorFlow also has an incredible guide on word embeddings themselves: <https://www.tensorflow.org/tutorials/text/word_embeddings>

In [50]:
weights = model_1.get_layer('embedding').get_weights()[0]
vocab = text_vectorizer.get_vocabulary()

In [51]:
# Create embedding files (we got this from TensorFlow's word embeddings documentation)
import io
out_v = io.open('vectors.tsv', 'w', encoding='utf-8')
out_m = io.open('metadata.tsv', 'w', encoding='utf-8')

for index, word in enumerate(vocab):
  if index == 0:
    continue  # skip 0, it's padding.
  vec = weights[index]
  out_v.write('\t'.join([str(x) for x in vec]) + "\n")
  out_m.write(word + "\n")
out_v.close()
out_m.close()

In [52]:
# Download files from Colab to upload to projector
try:
    from google.colab import files
    files.download('vectors.tsv')
    files.download('metadata.tsv')
except Exception:
    pass

After downloading the files above, we can visualize them using <https://projector.tensorflow.org/>

**Resources:** If you'd like to know more about embeddings, check out:
* Jay Alammar's visualized word2vec post: <https://jalammar.github.io/illustrated-word2vec/>
* TensorFlow's Word Embeddings guide: <https://www.tensorflow.org/tutorials/text/word_embeddings>

## Recurrent Neural Networks (RNNs)

RNNs are useful for sequence data.

The premise of a recurrent neural network is to use the representation of a previous input to aid the representation of a later input.

If you want an overview of the internals of a recurrent neural network, see the following:
- MIT's sequence modelling lecture: <https://youtu.be/qjrad0V0uJE>
- Chris Olah's intro to LSTMs: <https://colah.github.io/posts/2015-08-Understanding-LSTMs/>
- Andrej Karpathy's the unreasonable effectiveness of recurrent neural networks: <http://karpathy.github.io/2015/05/21/rnn-effectiveness/>
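
To illustrate the premise that earlier inputs shape the representation of later ones, here is a deliberately tiny recurrent step in pure Python. It uses a single scalar hidden state and hand-picked weights, all of which are assumptions for illustration; real RNN layers use weight matrices learned during training.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, bias=0.0):
    """Run a scalar recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1} + bias)."""
    hidden = 0.0
    states = []
    for x in inputs:
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
        states.append(hidden)
    return states

early_signal = rnn_forward([1.0, 0.0, 0.0])
late_signal = rnn_forward([0.0, 0.0, 1.0])

print(early_signal)  # the first input keeps echoing through later states
print(late_signal)   # the state stays at 0.0 until the last input arrives
```

The same three values in a different order lead to different final states, which is the sense in which a recurrent network is sensitive to sequence order.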

### Model 2: LSTM 
LSTM = long short-term memory (one of the most popular RNN cells).

The structure of an RNN model typically looks like this:

Input (Text) -> Tokenize -> Embedding -> Layers (RNNs/dense) -> Output (Label probability)


In [53]:
# Create an LSTM model
from tensorflow.keras import layers
from tensorflow.keras.layers import LSTM
inputs = layers.Input(shape = (1 , ) , dtype = tf.string)
x = text_vectorizer(inputs)
x = embedding(x)
print(x.shape)
x = LSTM(units = 64 , return_sequences = True)(x) # When you're stacking RNN cells together, you need to return_sequences = True
print(x.shape)
x = layers.LSTM(64)(x)
print(x.shape)
x = layers.Dense(64 , activation = 'relu')(x)
outputs = layers.Dense(1 , activation = 'sigmoid')(x)

model_2 = tf.keras.Model(inputs , outputs , name = 'model_2_LSTM')

(None, 15, 128)
(None, 15, 64)
(None, 64)


In [54]:
model_2.summary()

Model: "model_2_LSTM"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 lstm (LSTM)                 (None, 15, 64)            49408     
                                                                 
 lstm_1 (LSTM)               (None, 64)                33024     
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
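
The parameter counts in the summary rows above can be checked by hand. The sketch below assumes the usual LSTM layout of four weight sets (three gates plus the candidate cell state), each with input weights, recurrent weights and a bias; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 weight sets, each: (input_dim + units) weights per unit, plus one bias per unit
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input per unit, plus one bias per unit
    return input_dim * units + units

print(lstm_params(128, 64))  # 49408, first LSTM reads 128-dim embeddings
print(lstm_params(64, 64))   # 33024, second LSTM reads the 64-dim sequence output
print(dense_params(64, 64))  # 4160
```

The second LSTM is cheaper than the first only because its input is 64 wide rather than 128 wide.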
                                                      

In [55]:
# Compile the model
model_2.compile(loss = 'binary_crossentropy',
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ['accuracy'])

In [56]:
# Fit the model
model_2.fit(x = train_sentences,
            y = train_labels,
            epochs = 5,
            validation_data = (val_sentences , val_labels)
            )

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x2e2e6b0acd0>

In [57]:
# Make predictions with LSTM model
model_2_pred_probs = model_2.predict(val_sentences)
model_2_pred_probs[:10]

array([[0.32794964],
       [0.32591593],
       [0.99734676],
       [0.15124673],
       [0.00184265],
       [0.99999446],
       [0.8675568 ],
       [0.99999595],
       [0.99999   ],
       [0.50123966]], dtype=float32)

In [58]:
# Convert model 2 pred probs to labels 
model_2_preds = tf.squeeze(tf.round(model_2_pred_probs))
model_2_preds[:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 0., 1., 0., 0., 1., 1., 1., 1., 1.], dtype=float32)>

In [59]:
# Calculate model 2 results
model_2_results = calculate_results(y_true = val_labels,
                                    y_pred = model_2_preds)
model_2_results

{'accuracy': 76.64041994750657,
 'precision': 0.7661818147921768,
 'recall': 0.7664041994750657,
 'f1': 0.7655229275917975}
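
The labels scored above came from rounding the predicted probabilities, which amounts to a threshold at 0.5. A pure-Python sketch of the same idea, using the ten probabilities printed earlier for model 2 (the helper name and the explicit `threshold` argument are assumptions of this example):

```python
def probs_to_labels(probs, threshold=0.5):
    """Return 1 where the probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]

model_2_sample_probs = [0.32794964, 0.32591593, 0.99734676, 0.15124673, 0.00184265,
                        0.99999446, 0.8675568, 0.99999595, 0.99999, 0.50123966]

sample_labels = probs_to_labels(model_2_sample_probs)
print(sample_labels)  # [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

# A stricter threshold flips the borderline last example to "not disaster"
print(probs_to_labels(model_2_sample_probs, threshold=0.6)[-1])  # 0
```

Making the threshold explicit is one way to trade precision against recall when borderline predictions such as 0.501 matter.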

In [60]:
baseline_results

{'accuracy': 79.26509186351706,
 'precision': 0.8111390004213173,
 'recall': 0.7926509186351706,
 'f1': 0.7862189758049549}

### Model 3: GRU

Another popular and effective RNN component is the GRU or gated recurrent unit.

The GRU cell has similar features to an LSTM cell but has fewer parameters.

In [61]:
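# Build an RNN using the GRU cell (this experiment also places an LSTM layer between the two GRU layers)
from tensorflow.keras import layers
inputs = layers.Input(shape = (1, ) , dtype = tf.string)
x = text_vectorizer(inputs) # turn the input text into numbers
x = embedding(x) # (None, 15, 128)
x = layers.GRU(128 , return_sequences = True)(x) # return_sequences = True keeps one output per token for the next recurrent layer
x = layers.LSTM(512 , return_sequences = True)(x)
x = layers.GRU(1024 , return_sequences = True)(x)
x = layers.MaxPool1D()(x) # halve the sequence length (default pool size of 2)
x = layers.Dense(256 , activation = 'LeakyReLU')(x)
x = layers.GlobalAveragePooling1D()(x) # average over the remaining timesteps to get one vector per tweet
outputs = layers.Dense(1 , activation = 'sigmoid')(x)
model_3 = tf.keras.Model(inputs , outputs , name = 'model_3_GRU')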

In [62]:
model_3.summary()

Model: "model_3_GRU"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 gru (GRU)                   (None, 15, 128)           99072     
                                                                 
 lstm_2 (LSTM)               (None, 15, 512)           1312768   
                                                                 
 gru_1 (GRU)                 (None, 15, 1024)          4724736   
                                                       

In [63]:
# Compile the model
model_3.compile(loss = 'binary_crossentropy',
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ['accuracy'])

In [64]:
# Fit the model
model_3.fit(x = train_sentences,
            y = train_labels,
            epochs = 5,
            validation_data = (val_sentences , val_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x2e2e8e3c610>

In [65]:
# Make some predictions with our GRU model
model_3_pred_probs = model_3.predict(val_sentences)
model_3_pred_probs[:10]

array([[2.4908960e-02],
       [6.3286465e-01],
       [9.9666882e-01],
       [5.3191483e-03],
       [2.0460395e-06],
       [9.9933851e-01],
       [6.4136481e-01],
       [9.9999905e-01],
       [9.9993122e-01],
       [2.9470533e-02]], dtype=float32)

In [66]:
# Convert model 3 pred probs to labels
model_3_preds = tf.squeeze(tf.round(model_3_pred_probs))
model_3_preds[:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 1., 1., 0., 0., 1., 1., 1., 1., 0.], dtype=float32)>

In [67]:
model_3_results = calculate_results(y_true = val_labels,
                                    y_pred = model_3_preds)
model_3_results

{'accuracy': 77.69028871391076,
 'precision': 0.7822241302284023,
 'recall': 0.7769028871391076,
 'f1': 0.7734519762210931}

In [68]:
baseline_results

{'accuracy': 79.26509186351706,
 'precision': 0.8111390004213173,
 'recall': 0.7926509186351706,
 'f1': 0.7862189758049549}

### Model 4: Bidirectional RNN
A standard RNN reads a sequence from left to right (just like you'd read an English sentence). A bidirectional RNN reads it from right to left as well as from left to right.
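The idea can be sketched with a toy scalar RNN in plain Python: run one pass forwards, run a second pass over the reversed sequence, then pair up the two states for every timestep. The functions and weight values below are invented for illustration and are far simpler than the Keras `Bidirectional` wrapper.

```python
import math


def run_rnn(sequence, w_x, w_h):
    """Toy scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1}); returns every h_t."""
    hidden, outputs = 0.0, []
    for x in sequence:
        hidden = math.tanh(w_x * x + w_h * hidden)
        outputs.append(hidden)
    return outputs


def run_bidirectional(sequence):
    """Pair each timestep's left-to-right state with its right-to-left state."""
    # Each direction gets its own (made-up) weights
    forward = run_rnn(sequence, w_x=0.5, w_h=0.8)
    # Read the sequence backwards, then flip the outputs so they line up by timestep
    backward = run_rnn(sequence[::-1], w_x=0.3, w_h=0.6)[::-1]
    return [[fwd, bwd] for fwd, bwd in zip(forward, backward)]


sequence = [1.0, 0.0, -1.0, 2.0]  # made-up inputs, one number per timestep
outputs = run_bidirectional(sequence)
for step, (fwd, bwd) in enumerate(outputs):
    print(step, round(fwd, 3), round(bwd, 3))
```

Each timestep now carries two features instead of one, so the first timestep already "knows" something about the end of the sequence. This doubling is why a bidirectional layer with 128 units produces 256 output features.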

In [69]:
# Build a bidirectional RNN in TensorFlow
from tensorflow.keras import layers
inputs = layers.Input(shape = (1 , ) , dtype = tf.string)
x = text_vectorizer(inputs)
x = embedding(x)
x = layers.Bidirectional(layers.GRU(128 , return_sequences = True))(x)
x = layers.Bidirectional(layers.LSTM(64, return_sequences = True))(x)
x = layers.GlobalMaxPool1D()(x)
outputs = layers.Dense(1 , activation = 'sigmoid')(x)
model_4 = tf.keras.Model(inputs , outputs , name = 'model_4_bidirecttional')

In [70]:
model_4.summary()

Model: "model_4_bidirecttional"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 bidirectional (Bidirectiona  (None, 15, 256)          198144    
 l)                                                              
                                                                 
 bidirectional_1 (Bidirectio  (None, 15, 128)          164352    
 nal)                                                            
                                            

In [71]:
# Compile the model
model_4.compile(loss = 'binary_crossentropy',
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ['accuracy'])

In [72]:
# Fit the model
model_4.fit(x = train_sentences,
            y = train_labels,
            epochs = 5,
            validation_data = (val_sentences , val_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x2e284458820>

In [73]:
# Make predictions with our bidirectional model
model_4_pred_probs = model_4.predict(val_sentences)
model_4_pred_probs[:10]

array([[2.4345517e-04],
       [7.3600268e-01],
       [9.9991959e-01],
       [1.3414189e-01],
       [8.5497908e-05],
       [9.9929065e-01],
       [8.9634168e-01],
       [9.9992436e-01],
       [9.9992985e-01],
       [3.6375433e-01]], dtype=float32)

In [74]:
# Convert pred probs to pred labels 
model_4_preds = tf.squeeze(tf.round(model_4_pred_probs))
model_4_preds[:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 1., 1., 0., 0., 1., 1., 1., 1., 0.], dtype=float32)>

In [75]:
# Calculate the results of our bidirectional model
model_4_results = calculate_results(y_true = val_labels,
                                    y_pred = model_4_preds)
model_4_results

{'accuracy': 76.9028871391076,
 'precision': 0.7708613696015271,
 'recall': 0.7690288713910761,
 'f1': 0.766802361272573}

### Convolutional Neural Networks for Text (and other types of sequences)

We've used CNNs for images, but images are typically 2D (height x width), whereas our text data is 1D.

Previously we've used Conv2D for our image data, but now we're going to use Conv1D.

The typical structure of a Conv1D model for sequences (in our case, text):

```
Inputs (text) -> Tokenization -> Embedding -> Layer(s)
(typically Conv1D + pooling) -> Outputs (class probabilities)
```

### Model 5: Conv1D

In [76]:
# Test out our embedding layer, Conv1D layer and max pooling
from tensorflow.keras.layers import Conv1D
embedding_test = embedding(text_vectorizer(["this is a test sentence"])) # Turn target sequence into embedding
conv_1d = Conv1D(filters = 32,
                        kernel_size = 5, # this is also referred to as an ngram of 5 (meaning it looks at 5 words at a time)
                        activation = 'LeakyReLU',
                        padding = 'valid') # default = 'valid', the output is smaller than the input shape, 'same' means output is same shape as input
conv_1d_output = conv_1d(embedding_test) # pass test embedding through conv1d layer
max_pool = layers.GlobalMaxPool1D()
max_pool_output = max_pool(conv_1d_output) # equivalent to "get the most important feature" or "get the feature with the highest value"
embedding_test.shape , conv_1d_output.shape , max_pool_output.shape

(TensorShape([1, 15, 128]), TensorShape([1, 11, 32]), TensorShape([1, 32]))
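The shapes above follow from simple arithmetic. The sketch below reproduces it in plain Python under the usual `'valid'` and `'same'` padding rules; the helper names are hypothetical and only exist for this illustration.

```python
import math


def conv1d_output_length(seq_len: int, kernel_size: int,
                         padding: str = "valid", strides: int = 1) -> int:
    """Number of timesteps a 1D convolution produces."""
    if padding == "valid":
        # The kernel must fit entirely inside the sequence
        return (seq_len - kernel_size) // strides + 1
    if padding == "same":
        # The sequence is zero-padded so only the stride shrinks it
        return math.ceil(seq_len / strides)
    raise ValueError(f"Unknown padding: {padding}")


def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    """Weights (kernel_size x in_channels per filter) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


print(conv1d_output_length(15, kernel_size=5, padding="valid"))  # 11
print(conv1d_output_length(15, kernel_size=3, padding="same"))   # 15
print(conv1d_params(in_channels=128, filters=64, kernel_size=3)) # 24640
```

A kernel of size 5 sliding over 15 tokens has 11 valid positions, which is where the `[1, 11, 32]` shape comes from. Global max pooling then keeps one value per filter, leaving `[1, 32]`.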

In [77]:
# Create 1-dimensional convolutional layer to model sequences
from tensorflow.keras import layers
inputs = layers.Input(shape = (1 , ) , dtype = tf.string)
x = text_vectorizer(inputs)
x = embedding(x)
x = layers.Conv1D(filters = 64, kernel_size = 3 , activation = 'LeakyReLU', strides = 1, padding = 'same')(x)
x = layers.GlobalMaxPool1D()(x) 
x = layers.Dense(64 , activation = 'LeakyReLU')(x)
outputs = layers.Dense(1 , activation = 'sigmoid')(x)

model_5 = tf.keras.Model(inputs , outputs , name = 'model_5_Conv1D')

In [78]:
model_5.summary()

Model: "model_5_Conv1D"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_5 (InputLayer)        [(None, 1)]               0         
                                                                 
 text_vectorization_1 (TextV  (None, 15)               0         
 ectorization)                                                   
                                                                 
 embedding (Embedding)       (None, 15, 128)           1280000   
                                                                 
 conv1d_1 (Conv1D)           (None, 15, 64)            24640     
                                                                 
 global_max_pooling1d_3 (Glo  (None, 64)               0         
 balMaxPooling1D)                                                
                                                                 
 dense_6 (Dense)             (None, 64)             

In [79]:
# Compile the model
model_5.compile(loss = 'binary_crossentropy',
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ['accuracy'])


In [80]:
# Fit the model
model_5.fit(x = train_sentences,
            y = train_labels,
            epochs = 5,
            validation_data = (val_sentences , val_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x2e28497dd60>

In [81]:
model_5_pred_probs = model_5.predict(val_sentences)
model_5_pred_probs[:10]

array([[2.4117351e-02],
       [7.0982897e-01],
       [9.9984968e-01],
       [6.5873057e-02],
       [2.5609457e-05],
       [9.9230397e-01],
       [9.9045825e-01],
       [9.9989414e-01],
       [9.9998546e-01],
       [1.5256831e-01]], dtype=float32)

In [82]:
# Convert model 5 pred probs to labels
model_5_preds = tf.squeeze(tf.round(model_5_pred_probs))
model_5_preds[:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 1., 1., 0., 0., 1., 1., 1., 1., 0.], dtype=float32)>

In [83]:
# Evaluate model 5 predictions
model_5_results = calculate_results(y_true = val_labels,
                                    y_pred = model_5_preds)
model_5_results

{'accuracy': 76.37795275590551,
 'precision': 0.765729922717656,
 'recall': 0.7637795275590551,
 'f1': 0.7613639638656104}

### **USE:** Universal Sentence Encoder as a Feature Extractor
Source: <https://tfhub.dev/google/universal-sentence-encoder/4>

### Model 6: TensorFlow Hub Pretrained Sentence Encoder

In [84]:
sample_sentence

"There's a flood in my street!"

In [85]:
import tensorflow_hub as hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
embed_samples = embed([sample_sentence,
                      "When you call the universal sentence encoder on a sentence, it turns it into numbers."])
print(embed_samples[0][:50])

tf.Tensor(
[-0.01157027  0.02485911  0.02878048 -0.01271501  0.03971541  0.08827761
  0.02680985  0.05589836 -0.01068731 -0.00597293  0.00639323 -0.01819516
  0.00030815  0.09105889  0.05874644 -0.03180626  0.01512474 -0.05162926
  0.00991366 -0.06865344 -0.04209306  0.02678978  0.03011006  0.00321068
 -0.00337968 -0.04787356  0.0226672  -0.00985928 -0.04063615 -0.01292093
 -0.04666383  0.056303   -0.03949254  0.00517684  0.02495828 -0.0701444
  0.0287151   0.04947681 -0.00633977 -0.08960192  0.0280712  -0.00808363
 -0.01360601  0.0599865  -0.10361788 -0.05195374  0.00232955 -0.02332529
 -0.03758105  0.03327728], shape=(50,), dtype=float32)


In [86]:
embed_samples.shape

TensorShape([2, 512])

In [87]:
# Create a Keras Layer using the USE pretrained layer from tensorflow hub
sentence_encoder_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                                        input_shape = [],
                                        dtype = tf.string,
                                        trainable = False,
                                        name = "USE")

In [88]:
# Create model using the Sequential API
model_6 = tf.keras.Sequential([
    sentence_encoder_layer,
    layers.Dense(64 , activation = 'relu'),
    layers.Dense(1 , activation = 'sigmoid'),
], name = 'model_6_USE')

# Compile the model
model_6.compile(loss = 'binary_crossentropy',
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ['accuracy'])

# Fit the model
history_model_6 = model_6.fit(x = train_sentences,
            y = train_labels,
            epochs = 5,
            validation_data = (val_sentences , val_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [89]:
# Make predictions with USE TF Hub Model
model_6_pred_probs = model_6.predict(val_sentences)
model_6_pred_probs[:10]

array([[0.20312235],
       [0.80665845],
       [0.9886973 ],
       [0.23531431],
       [0.74456537],
       [0.78820384],
       [0.9805012 ],
       [0.984424  ],
       [0.94469553],
       [0.11483777]], dtype=float32)

In [90]:
# Convert prediction probabilities to label
model_6_preds = tf.squeeze(tf.round(model_6_pred_probs))
model_6_preds[:10]

<tf.Tensor: shape=(10,), dtype=float32, numpy=array([0., 1., 1., 0., 1., 1., 1., 1., 1., 0.], dtype=float32)>

In [91]:
# Calculate model 6 performance metrics
model_6_results = calculate_results(y_true = val_labels,
                                    y_pred = model_6_preds)
model_6_results

{'accuracy': 81.23359580052494,
 'precision': 0.8129238565064447,
 'recall': 0.8123359580052494,
 'f1': 0.8114314841586915}

### Model 7: TF Hub Pretrained USE but with 10% of training data

Transfer learning really helps when you don't have a large dataset.

In [None]:
training_10_percent_split = int(0.1 * len(train_sentences))
train_sentences_10_percent = train_sentences[: training_10_percent_split]
train_labels_10_percent = train_labels[:training_10_percent_split]

In [None]:
pd.Series(np.array(train_labels_10_percent)).value_counts()

To recreate a model with the same architecture as one you've already built, you can use the `tf.keras.models.clone_model()` function.
* Model cloning is similar to calling a model on new inputs, except that it creates new layers (and thus new weights) instead of sharing the weights of the existing layers.

In [None]:
# Clone model_6's architecture with freshly initialised weights
model_7 = tf.keras.models.clone_model(model_6)

# Compile the model
model_7.compile(loss = 'binary_crossentropy',
                optimizer = tf.keras.optimizers.Adam(),
                metrics = ['accuracy'])

# Fit the model on the 10% training subset
history_model_7 = model_7.fit(x = train_sentences_10_percent,
                              y = train_labels_10_percent,
                              epochs = 5,
                              validation_data = (val_sentences , val_labels))

In [None]:
model_7.summary()

In [None]:
# Make predictions with the model trained on 10% of the data
model_7_pred_probs = model_7.predict(val_sentences)
model_7_pred_probs[:10]

In [None]:
# Turn pred probs into labels
model_7_preds = tf.squeeze(tf.round(model_7_pred_probs))
model_7_preds

In [None]:
model_7_results = calculate_results(y_true = val_labels,
                                    y_pred = model_7_preds)
model_7_results

### Comparing the performance of each of our models

In [None]:
# Combine model results into a DataFrame
all_model_results = pd.DataFrame({'0_baseline': baseline_results,
                                  '1_simple_dense': model_1_results,
                                  '2_lstm': model_2_results,
                                  '3_gru': model_3_results,
                                  '4_bidirectional': model_4_results,
                                  '5_conv1d': model_5_results,
                                  '6_tf_hub_use_encoder': model_6_results,
                                  '7_tf_hub_use_encoder_10_percent': model_7_results})

all_model_results = all_model_results.transpose()
all_model_results

In [None]:
# Reduce the accuracy to the same scale as other metrics
all_model_results['accuracy'] = all_model_results['accuracy']/100

In [None]:
# Plot and compare all of the model results
all_model_results.plot(kind = 'bar' , figsize = (10 , 7)).legend(bbox_to_anchor = (1.0 , 1.0))

In [None]:
# Sort model results by f1-score
all_model_results.sort_values('f1' , ascending = False)["f1"].plot(kind = 'bar' , figsize = (10 , 7))

### Uploading our model training logs to TensorBoard.dev

### Saving and loading a trained model
There are two main formats for saving a model in TensorFlow:
1. The HDF5 format
2. The `SavedModel` format (this is the default when using TensorFlow)

In [None]:
# Save TF Hub Sentence Encoder model to HDF5 format
model_6.save("model_6.h5")

In [None]:
# Load model with custom Hub Layer (custom_objects is required with the HDF5 format)
import tensorflow_hub as hub
loaded_model_6 = tf.keras.models.load_model("model_6.h5",
                                            custom_objects = {"KerasLayer": hub.KerasLayer})

Now let's save to the `SavedModel` format.

In [None]:
# Save TF Hub Sentence Encoder model to SavedModel format
model_6.save("model_6_SavedModel_format")

In [None]:
# Load in a model from the SavedModel format
loaded_model_6_SavedModel_format = tf.keras.models.load_model("model_6_SavedModel_format")

### Finding the most wrong examples

* If our best model still isn't perfect, which examples is it getting wrong?
* And of these wrong examples, which ones is it getting *most* wrong (those with prediction probabilities closest to the opposite class)?

For example, a sample that should have a label of 0 but receives a prediction probability of 0.999 (really close to 1), or vice versa.
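As a small illustration of the idea, the sketch below ranks wrong predictions by probability using plain Python. The sentences, labels and probabilities are all made up for the example and do not come from the validation set.

```python
# Hypothetical validation results: (text, target, pred_prob); all values are made up
samples = [
    ("storm warning issued for the coast", 1, 0.97),
    ("my mixtape is fire", 0, 0.91),
    ("bridge collapsed downtown", 1, 0.08),
    ("traffic was a disaster today", 0, 0.62),
    ("what a lovely sunny day", 0, 0.03),
]

# Add a predicted label to each row using a 0.5 cutoff
rows = [(text, target, int(prob > 0.5), prob) for text, target, prob in samples]

# Keep only the mistakes, then sort by probability (highest first)
wrong = [row for row in rows if row[1] != row[2]]
most_wrong = sorted(wrong, key=lambda row: row[3], reverse=True)

for text, target, pred, prob in most_wrong:
    print(f"Target: {target}, Pred: {pred}, Prob: {prob} | {text}")
```

The top of the sorted list holds the most confident false positives (target 0, probability near 1) and the bottom holds the most confident false negatives (target 1, probability near 0).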

In [None]:
val_df = pd.DataFrame({'text': val_sentences,
                       'target': val_labels,
                       'pred': model_6_preds,
                       'pred_probs': tf.squeeze(model_6_pred_probs)})
val_df.head()

In [None]:
# Find the wrong predictions and sort by prediction probabilities
most_wrong = val_df[val_df['target'] != val_df['pred']].sort_values('pred_probs' , ascending = False)
most_wrong[:10]
# These are false positives

Let's remind ourselves of the target labels:
* 0 = not disaster
* 1 = disaster

In [None]:
most_wrong.tail()
# These are false negatives

In [None]:
# Check the false positives (model predicted 1 when should've been 0)
for row in most_wrong[:10].itertuples():
    _, text, target, pred, pred_prob = row
    print(f"Target: {target}, Pred: {pred}, Prob: {pred_prob}")
    print(f"Text:\n{text}\n")
    print("----\n")

In [None]:
# Check the false negatives (model predicted 0 when should've been 1)
for row in most_wrong[-10:].itertuples():
    _, text, target, pred, pred_prob = row
    print(f"Target: {target}, Pred: {pred}, Prob: {pred_prob}")
    print(f"Text:\n{text}\n")
    print("----\n")

In [None]:
test_df

### Making predictions on the test dataset

In [None]:
# Making predictions on the test dataset and visualizing them
import random

test_sentences = test_df['text'].to_list()
test_samples = random.sample(test_sentences , 10)
for test_sample in test_samples:
    pred_prob = tf.squeeze(model_6.predict([test_sample])) # Predict on one sentence at a time
    pred = tf.round(pred_prob)
    print(f"Pred: {int(pred)}, Prob: {pred_prob}")
    print(f"Text:\n{test_sample}\n")
    print("----\n")

### The speed/score tradeoff

In [7]:
# Let's make a function to measure the time of prediction
import time 
def pred_timer(model , samples):
    '''
    Times how long a model takes to make predictions on samples.

    Returns a tuple of (total_time, time_per_pred) in seconds.
    '''
    start_time = time.perf_counter() # Get start time
    model.predict(samples) # Make predictions
    end_time = time.perf_counter() # Get finish time
    total_time = end_time - start_time # Calculate how long predictions took to make
    time_per_pred = total_time / len(samples) # Average time per sample
    return total_time , time_per_pred

In [None]:
# Calculate TF Hub Sentence Encoder time per pred
model_6_total_pred_time , model_6_time_per_pred = pred_timer(model = model_6,
                                                             samples = val_sentences)
model_6_total_pred_time , model_6_time_per_pred

In [None]:
# Calculate baseline model time per pred
baseline_total_pred_time , baseline_time_per_pred = pred_timer(model = model_0,
                                                               samples = val_sentences)
baseline_total_pred_time , baseline_time_per_pred

In [None]:
import matplotlib.pyplot as plt 

plt.figure(figsize = (10 , 7))
plt.scatter(baseline_time_per_pred , baseline_results['f1'] , label = 'baseline')
plt.scatter(model_6_time_per_pred , model_6_results['f1'] , label = 'tf_hub')
plt.legend()
plt.title("F1-score versus time per prediction")
plt.xlabel('Time per prediction')
plt.ylabel("F1-score")