# GRU & Glove Embeddings

Sequence prediction tasks require us to label each item of a sequence. Such tasks are common in natural language processing. Some examples include language modeling in which we predict the next word given a sequence of words at each step;
part-of-speech tagging, in which we predict the grammatical part of speech for each word; named entity recognition, in which we predict whether each word is part of a named entity, such as Person, Location, Product, or Organization;
and so on. 

Sometimes, in NLP literature, the sequence prediction tasks are also referred to as sequence labeling.

Although in theory we can use the **Vanilla/Elman recurrent neural networks for sequence prediction tasks, they fail to capture long-range dependencies well and perform poorly in practice**. 

Even though the vanilla/Elman RNN is well suited for modeling sequences, it has two issues that make it unsuitable for many tasks: the **inability to retain information for long-range predictions, and gradient stability**.
Recall that at their core, RNNs are computing a hidden state vector at each time step using the hidden state vector of the previous time step and an input vector at the current time step. It is this core computation that makes the RNN so powerful, but it also creates drastic numerical issues.

1. The first issue with Elman RNNs is the **difficulty in retaining long-range information**. At each time step the RNN simply updates the hidden state vector regardless of whether it made sense. As a consequence, the RNN has no control over which values are retained and which are discarded in the hidden state—that is entirely determined by the input. Intuitively, that doesn’t make sense. What is desired is some way for the RNN to decide if the update is optional, or if the update happens, by how much and what parts of the state vector, and so on.

2. The second issue with Elman RNNs is their tendency to cause **gradients to spiral out of control to zero or to infinity**. Unstable gradients that can spiral out of control are called either vanishing gradients or exploding gradients depending
on the direction in which the absolute values of the gradients are shrinking/growing. A really large absolute value of the gradient or a really small (less than 1) value can make the optimization procedure unstable. There are solutions to deal with these gradient problems in vanilla RNNs, such as the use of rectified linear units (ReLUs), gradient clipping, and careful
initialization. But none of the proposed solutions work as reliably as the technique called **gating**.

## Gating as a Solution to a Vanilla RNN’s Challenges

To intuitively understand gating, suppose that you were adding two quantities, a and b, but you wanted to control how much of b gets into the sum.
Mathematically, you can rewrite the sum a + b as:

   **a + λb**

where λ is a value between 0 and 1. If λ = 0, there is no contribution from b and if λ = 1, b contributes fully. Looking at it this way, you can interpret that λ acts as a “switch” or a “gate” in controlling the amount of b that gets into the sum.

This is the intuition behind the gating mechanism. 

Now let’s revisit the Elman RNN and see how gating can be incorporated into vanilla RNNs to make conditional updates. If the previous hidden state was h_t-1 and the current input is x_t, the recurrent update in the Elman RNN would look something like:

  **h_t = h_t-1 + F(h_t-1, x_t)**

where F is the recurrent computation of the RNN. Obviously, **this is an unconditioned sum** and has the evils described earlier. 

Now imagine if, instead of a constant, the λ in the previous example was a function of the previous hidden state vector h_t-1 and the current input x_t, and still produced the desired gating behavior; that is, a value between 0 and 1. With this gating function, our RNN update equation would appear as follows:

  **h_t = h_t-1 + λ(h_t-1, x_t) F(h_t-1, x_t)**
  
Now it becomes clear that the function λ controls how much of the current input gets to update the state h_t-1. Further, the function λ is context-dependent. This is the basic intuition behind all gated networks. The function λ is usually a sigmoid
function, which we know to produce a value between 0 and 1.


In the case of the **long short-term memory network (LSTM)** this basic intuition is extended carefully to incorporate not
only conditional updates, but also intentional forgetting of the values in the previous hidden state h_t-1. This “forgetting” happens by multiplying the previous hidden state value h_t-1 with another function, μ, that also produces values
between 0 and 1 and depends on the current input:

  **h_t = μ(h_t-1, x_t) h_t-1 + λ(h_t-1, x_t) F(h_t-1, x_t)**
  
The LSTM is only one of the many gated variants of the RNN. Another variant that’s becoming increasingly popular is the **gated recurrent unit (GRU)**. Fortunately, in PyTorch, you can simply replace the nn.RNN or nn.RNNCell with nn.LSTM or nn.LSTMCell with no other code change to switch to an LSTM (mutatis mutandis for GRU)!

The gating mechanism is an effective solution for Vanilla RNNs' problems seen earlier. It not only makes the updates controlled, but also keeps the gradient issues under check and makes training relatively easier. 


**Word Vectors** are often used as a fundamental component for downstream NLP tasks, e.g. question answering, text generation, translation, etc., so it is important to build some intuitions as to their strengths and weaknesses. Here, you will explore word vectors derived via **GloVe**.

## Import Packages

In [1]:
import os                                 # to create 'serialised' directory
import pandas as pd
import numpy as np
import json                               # for serialising dictionaries
import pickle                             # for serialising numpy arrays
import re
import time
import random
from collections import Counter           # this is for stopwords removal
from nltk.corpus import stopwords         # this is for stopwords removal

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

from sklearn.model_selection import train_test_split

import spacy                              # for tokenisation

In [2]:
pd.set_option('display.max_row', None)              # show all rows of a dataframe
pd.set_option('display.max_column', None)           # show all columns of a dataframe
pd.set_option('display.max_colwidth', None)         # show the full width of columns
pd.set_option('precision', 2)                       # round to 2 decimal points
pd.options.display.float_format = '{:,.2f}'.format  # comma separators and two decimal points: 4756.7890 => 4,756.79 and 4656 => 4,656.00 

In [3]:
TOK = spacy.load('en_core_web_sm')                                         # for tokenisation
stop_words = stopwords.words('english')                                    # for stopwords
stopwords_dict = Counter(stop_words)                                       # for stopwords
max_len = 100                                                              # for encoding

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')      # pytorch cuda

  return torch._C._cuda_getDeviceCount() > 0


## Functions

### 0. Radom Seed

In [5]:
def random_seeding(seed_value=9):
    random.seed(seed_value)                            # python 
    np.random.seed(seed_value)                         # numpy - global seeding. Sklearn uses this internally therefore there is no need to set a random seed when using Sklearn 
    torch.manual_seed(seed_value)                      # pytorch cpu
    if device=='cuda': 
        torch.cuda.manual_seed_all(seed_value)         # pytorch gpu

### 1. Read Datasets

In [6]:
def read_csv(filepath, filename, encoding='utf-8'):
    full_path = '{}\{}'.format(filepath, filename)
    return pd.read_csv(full_path, encoding=encoding, header=None)

### 2. Data Cleaning

In [7]:
def clean_tokenise(text):
    text = text.lower()
    text = re.sub(r"([.,!?])", r" \1 ", text)
    text = re.sub(r"[^a-zA-Z.,!?]+", r" ", text)
    text = ' '.join([word for word in text.split() if word not in stopwords_dict])  # remove stopwords
    
    return [token.text for token in TOK.tokenizer(text)]

### 3. Serialisation / Deserialisation

https://www.tutorialspoint.com/object_oriented_python/object_oriented_python_serialization.htm

In the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted and reconstructed later.

In serialization, an object is transformed into a format that can be stored, so as to be able to deserialize it later and recreate the original object from the serialized format.

We can do this with Pickle, JSON, ...

In [8]:
def serialise(obj_type, filepath, filename, obj):
    full_path = '{}\{}'.format(filepath, filename)
    if obj_type == 'python':
        with open(full_path, 'w') as f:
            json.dump(obj, f)
            
    if obj_type in ['numpy', 'pandas', 'tensor']:
        with open(full_path, 'wb') as f:
            pickle.dump(obj, f)                                         # protocol 0 is printable ASCII
        
        

def deserialise(obj_type, filepath, filename):
    full_path = '{}\{}'.format(filepath, filename)
    if obj_type == 'python':
        with open(full_path, 'r') as f:
            return json.load(f)
            
    if obj_type in ['numpy', 'pandas', 'tensor']:
        with open(full_path, 'rb') as f:
            return pickle.load(f)                                        # protocol 0 is printable ASCII

### 4. Vocabulary from the Traing Corpus

In [9]:
def tokens_dict(series):
    tokens = series.explode()
    tokens = tokens.tolist()
    tokens = Counter(tokens)                                    # frequency distribution

    num_words_before = len(tokens.keys())
    
    #To avoid 'RuntimeError: dictionary changed size during iteration' error, we need to make a .copy() of the dictionary.
    #This way we iterate over the original dictionary keys and delete elements on the fly.
    for k,v in tokens.copy().items():
        if v < 2:
            del tokens[k]

    tokens = {word: i for i,word in enumerate(tokens.keys())}    # word2index dictionary

    tokens['<UNK>'] = len(tokens)
    tokens['<MASK>'] = len(tokens)
    tokens['<END>'] = len(tokens)
    tokens['<START>'] = len(tokens)
    tokens[''] = len(tokens)
    
    num_words_after = len(tokens.keys())
    
    return num_words_before, num_words_after, tokens

### 5. All Glove Embeddings

In [10]:
def load_glove_from_file(glove_filepath):
    
    embeddings_dict = {} 
    
    with open(glove_filepath, mode='r', encoding="utf-8") as f:
        for index,line in enumerate(f):
            line_split = line.split()
            word = line_split[0]
            embeddings_dict[word] = np.array(line_split[1:], 'float32')
    
    return embeddings_dict

### 6. Embedding Matrix

In [11]:
def make_embedding_matrix(embeddings_dict, tokens):  
    
    embedding_size = len(next(iter(embeddings_dict.values())))     # length of each word embedding (e.g.300 dimensions)
    
    embeddings_matrix = np.zeros((len(tokens)+1, embedding_size))  # len(tokens) is the length of the vocabulary (i.e. unique words in the corpus)
                                                                   # the '+1' after len(tokens) is necessary because we want the first embedding (i.e. at index 0) to be a zero vector; this will be used for the paddings (we want all the '0' paddings to be paired with the zero vector)
    not_in_glove = []
    for token,i in tokens.items():
        if token in embeddings_dict:
            embeddings_matrix[i+1, :] = embeddings_dict[token]     # '+1' :see comment above
        else:
            not_in_glove.append(token)
            embedding_i = torch.ones(1, embedding_size)
            torch.nn.init.xavier_uniform_(embedding_i)             # if word is not in Glove emebeddings, creates a new vector with random numbers drawn from pytorch xavier_uniform distribution
            embeddings_matrix[i+1, :] = embedding_i                # '+1' :see comment above

    return not_in_glove, embeddings_matrix

### 7. Encoding

In [12]:
def encoding(input_sequence):
    encoded = np.zeros(max_len, dtype=int)
    encoded_lst = np.array([tokens.get(word, tokens['<UNK>']) for word in input_sequence])
    length = min(max_len, len(encoded_lst))
    encoded[:length] = encoded_lst[:length]
    return encoded

### 8. GRU Classifier

In [13]:
class GRUClassifier(nn.Module):
    def __init__(self, embedding_size, num_embeddings,          # embedding_size = 300, num_embeddings=95465
                 hidden_size, output_dim, dropout_p,
                 pretrained_embeddings=None, padding_idx=0):    # pretrained_embeddings = embedding_matrix
                                                                # padding_idx=0 makes sure that the padding vector (which in our embedding matrix is at index 0) doesn't get updated during training when Freeze=False
        super(GRUClassifier, self).__init__()                      

        if pretrained_embeddings is None:

            self.emb = nn.Embedding(embedding_dim=embedding_size,
                                    num_embeddings=num_embeddings,
                                    padding_idx=padding_idx)        
        else:
            pretrained_embeddings = torch.from_numpy(pretrained_embeddings).float()
            # freeze=False : the tensor does not get updated in the learning process. Equivalent to self.emb.weight.requires_grad = False
            self.emb = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=True, padding_idx=padding_idx)  
            
#             self.emb = nn.Embedding(embedding_dim=embedding_size,
#                                     num_embeddings=num_embeddings,
#                                     padding_idx=padding_idx,
#                                     _weight=pretrained_embeddings)
            
            
            # nn.Embedding is a model parameter layer, which is by default trainable.
            # If you want to fine-tune word vectors during training, these word vectors are treated as model parameters 
            # and are updated by backpropagation. You can also make it untrainable by freezing its gradient 
            # (False ==> freezes the backprop) 
#             self.emb.weight.requires_grad=False    
            
        self.gru = nn.GRUCell(input_size=embedding_size, hidden_size=hidden_size, bias=True) # initialise LSTM model
        
        self._dropout_p = dropout_p      
        self.fc1 = nn.Linear(in_features=hidden_size, out_features=hidden_size)
        self.fc2 = nn.Linear(in_features=hidden_size, out_features=output_dim)
        self.sigmoid = nn.Sigmoid()
        

    def forward(self, x_in, apply_sigmoid=False):
    # Note that we don't use the Sigmoid activation in our final layer during training because we use the 
    # nn.BCEWithLogitsLoss() loss function which automatically applies the the Sigmoid activation.
        x_embedded = self.emb(x_in.long())                # x_in is the encoded review e.g. [3386  603 1112    0    0    0    0    0    0    0]. It comes from the train_loader
        batch_size, seq_size, embedding_size = x_embedded.size()
        x_embedded = x_embedded.permute(1, 0, 2)          
        
        hx = torch.randn(batch_size, hidden_size)  # initialise hidden state
        output = []
        for i in range(seq_size):
            hx = self.gru(x_embedded[i], hx)
            output.append(hx)
        
        output = torch.stack(output, dim=0)       
        output = output.permute(1, 0, 2)
        output = output[:,-1, :]
        
        hidden_layer = F.relu(F.dropout(self.fc1(output), p=self._dropout_p))
        output_layer = self.fc2(hidden_layer).squeeze(1)   # squeeze(1) to make the prediction the same shape as the target i.e. as a scaler not a 2d-vector
        
        if apply_sigmoid: 
            output_layer = sef.sigmoid(output_layer)
        return output_layer

### 9.  Accuracy

In [14]:
def binary_acc(y_hat, y):
    y_hat_label = torch.round(torch.sigmoid(y_hat))

    correct_predictions_sum = (y_hat_label == y).sum().float()
    acc = correct_predictions_sum/y.shape[0]
    acc = torch.round(acc * 100)
    
    return acc

### 10. Parameters

In [15]:
def params(full_embedding_layer_name):
    for param in full_embedding_layer_name.parameters():
        return param


def check_params(classifier_name):
    for name, child in classifier_name.named_children():
        print('Layer name: {} --- {}'.format(name, child), end='\n\n')            
        print('ToT Params: {:,}'. format(sum(p.numel() for p in child.parameters())), end='\n\n') 
    
        count = 0
        for param in child.parameters():
            print('Param length: {:,}'.format(len(param)), end='\n\n')
            print(param, end='\n\n')
            print('Are parameters being updated during backprop? {}'.format(param.requires_grad), end='\n\n')
            count += 1

        print('Total Sets of Parameters: {}'.format(count), end='\n\n')
        print('*' * 90)
    

def num_params(classifier_name):
    
    # PyTorch torch.numel() method returns the total number of elements in the input tensor
    trainable_parameters = sum(param.numel() for param in classifier_name.parameters() if param.requires_grad)  
    all_parameters = sum(param.numel() for param in classifier_name.parameters())  
    
    return trainable_parameters, all_parameters

### 11. Inference on new data

In [16]:
class Pipeline:
    def __init__(self):
        self.tasks = []
        
    def task(self, depends_on=None):
        idx = 0
        if depends_on:
            idx = self.tasks.index(depends_on) + 1
        def inner(f):
            self.tasks.insert(idx, f)
            return f
        return inner
    
    # Add the run() method which should take in an 'input_' argument
    def run(self, input_):
        output = input_
        # Iterate through the self.tasks property, and call each function with the previous output
        for task in self.tasks:
            output = task(output)
        return output

In [17]:
inference_pipeline = Pipeline()

In [18]:
inference_pipeline.tasks = [clean_tokenise, encoding]   #calls the functions created at the top of the notebook

In [19]:
# add new functions to the pipeline

@inference_pipeline.task(depends_on=encoding)
def convert_to_tensor(text):
    return torch.Tensor(text)

@inference_pipeline.task(depends_on=convert_to_tensor)
def infer(text):
    # Disable grad
    with torch.no_grad():
        
        # Generate prediction
        prediction = classifier(text.unsqueeze(0))
        probability_value = classifier.sigmoid(prediction).item()
        
        if probability_value < 0.5:
            prediction_label = 'Negative'
        else:
            prediction_label = 'Positive'
            
    return prediction_label, probability_value

## Code

### Random Seeds

In [20]:
random_seeding()

### Directories

In [21]:
dataset_directory = r'C:\Users\Mari\Desktop\MACHINE_LEARNING\NLP_Stanford_University\BOOK\YELP\dataset'
train_set_filename = 'raw_train.csv'
test_set_filename = 'raw_test.csv'

serialise_directory = r'C:\Users\Mari\Desktop\MACHINE_LEARNING\NLP_Stanford_University\BOOK\YELP\serialised'

In [None]:
# Create the 'serialised' directory if it doesn't exist

try:
    os.makedirs(serialise_directory)
except FileExistsError:
    # directory already exists
    pass

### Read Datasets

In [None]:
# Import Datasets
train_original = read_csv(dataset_directory, train_set_filename)
test = read_csv(dataset_directory, test_set_filename)

# Add column names
train_original.columns = ['target', 'review']
test.columns = ['target', 'review']

# Split Targets from Features
X_train_original = train_original['review']
y_train_original = train_original['target']
X_test = test['review']
y_test = test['target']

# Re-label Target: In PyTorch labels need to start at 0
# 1 ==> 0 (these are negative reviews)
# 2 ==> 1 (these are positive reviews)
y_train_original = y_train_original - 1
y_test = y_test - 1

# Split 'train_original' in 'train' and 'val' 
X_train, X_val, y_train, y_val = train_test_split(X_train_original, y_train_original, test_size=0.3)

In [None]:
# print(type(X_train), X_train.shape)
# print(type(X_val), X_val.shape)
# print(type(X_test), X_test.shape)
# print(type(y_train), y_train.shape)
# print(type(y_val), y_val.shape)
# print(type(y_test), y_test.shape, end='\n\n')
# print(X_train.head(2), end='\n\n')
# print(y_train.head(2))

### Data Cleaning And Tokenization

In addition to creating a subset that has three partitions for training, validation, and testing, we also minimally clean the data by adding whitespace around punctuation symbols and removing extraneous symbols that aren’t punctuation for all the splits.


1. **apply** works on a row / column basis of a DataFrame 
2. **applymap** works element-wise on a DataFrame
3. **map** works element-wise on a Series

In [None]:
X_train = X_train.map(clean_tokenise)
X_val = X_val.map(clean_tokenise)
X_test = X_test.map(clean_tokenise)

### Glove

In [None]:
# From glove txt, create a dictionary of all glove embeddings where KEY is a WORD, and VALUE is a NUMPY ARRAY:
glove_embeddings = load_glove_from_file('C:/GloVe/glove.6B.300d.txt')

### Create Vocabulary From Training Corpus

The embedding matrix (see later) is created only from the training dataset.

The training dataset should be sufficiently rich/representative enough to cover all data you expect to see in the future.

New data must have the same integer encoding as the training data prior to being mapped onto the embedding when making a prediction.

In [None]:
# create a DICT of the unique words in the training set
num_words_before, num_words_after, tokens = tokens_dict(X_train)

### Create Embedding Matrix

The embedding is created from the training dataset.

It should be sufficiently rich/representative enough to cover all data you expect to in the future.

New data must have the same integer encoding as the training data prior to being mapped onto the embedding when making a prediction.

In [None]:
not_in_glove, embedding_matrix = make_embedding_matrix(glove_embeddings, tokens)

### Encoding Training Dataset

We need to convert our text into a numerical form that can be fed to our model as input.

1. We have create a vocabulary (see section '10. Vocabulary') where each key is a unique word from the training corpus, and each value is the index of that word in the 'tokens' dictionary.
2. Choose the maximum length of any review.
3. Encode each list of tokens by replacing each word with its index from the 'tokens' dictionary.

Note: **mean_len** (see below) is the mean of tokens length in the training set. We set the max length of the encoded reviews equal to the mean_len.

In [None]:
X_train_encoded = X_train.map(lambda input_lst: encoding(input_lst, tokens=tokens, max_len=max_len))   
X_val_encoded = X_val.map(lambda input_lst: encoding(input_lst, tokens=tokens, max_len=max_len))
X_test_encoded = X_test.map(lambda input_lst: encoding(input_lst, tokens=tokens, max_len=max_len))

### PyTorch Dataset

In [None]:
# Convert pd.Series to PyTorch Tensors
# NB: set the values in X_train, X_val and X_test as a list of arrays (as opposed to array of arrays) --- see above

x_train_tensor = torch.Tensor(list(X_train_encoded.values))
x_val_tensor = torch.Tensor(list(X_val_encoded.values))
x_test_tensor = torch.Tensor(list(X_test_encoded.values))
y_train_tensor = torch.Tensor(list(y_train.values))
y_val_tensor = torch.Tensor(list(y_val.values))
y_test_tensor = torch.Tensor(list(y_test.values))

In [None]:
# Create a full dataset (like a DataFrame in Pandas) from the two tensors
train_dataset =  TensorDataset(x_train_tensor, y_train_tensor)
val_dataset = TensorDataset(x_val_tensor, y_val_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

### Serialisation

In [None]:
# Serialise datasets after tokenisation
serialise('pandas', serialise_directory, 'X_train', X_train)
serialise('pandas', serialise_directory, 'X_val', X_val)
serialise('pandas', serialise_directory, 'X_test', X_test)
serialise('pandas', serialise_directory, 'y_train', y_train)
serialise('pandas', serialise_directory, 'y_val', y_val)
serialise('pandas', serialise_directory, 'y_test', y_test)
serialise('pandas', serialise_directory, 'mean_len', mean_len)

In [None]:
# Serialise glove embeddings (althought 'glove_embeddings' is a dictionary, its values are numpy arrays therefore we need to choose 'numpy')
serialise('numpy', serialise_directory, 'glove_embeddings', glove_embeddings)

In [None]:
# Serialise tokens
serialise('python', serialise_directory, 'tokens', tokens)
serialise('python', serialise_directory, 'num_words_before', num_words_before)
serialise('python', serialise_directory, 'num_words_after', num_words_after)

In [None]:
# Serialise datasets embedding matrix
serialise('numpy', serialise_directory, 'embedding_matrix', embedding_matrix)
serialise('python', serialise_directory, 'not_in_glove', not_in_glove)

In [None]:
# Serialise encoded datasets
serialise('pandas', serialise_directory, 'X_train_encoded', X_train_encoded)
serialise('pandas', serialise_directory, 'X_val_encoded', X_val_encoded)
serialise('pandas', serialise_directory, 'X_test_encoded', X_test_encoded)

In [None]:
# Serialise tensors
serialise('tensor', serialise_directory, 'x_train_tensor', x_train_tensor)
serialise('tensor', serialise_directory, 'x_val_tensor', x_val_tensor)
serialise('tensor', serialise_directory, 'x_test_tensor', x_test_tensor)
serialise('tensor', serialise_directory, 'y_train_tensor', y_train_tensor)
serialise('tensor', serialise_directory, 'y_val_tensor', y_val_tensor)
serialise('tensor', serialise_directory, 'y_test_tensor', y_test_tensor)

In [None]:
# Serialise PyTorch Dataset
serialise('tensor', serialise_directory, 'train_dataset', train_dataset)
serialise('tensor', serialise_directory, 'val_dataset', val_dataset)
serialise('tensor', serialise_directory, 'test_dataset', test_dataset)

### Deserialisation

In [None]:
# Deserialise datasets after tokenisation
X_train = deserialise('pandas', serialise_directory, 'X_train')
X_val = deserialise('pandas', serialise_directory, 'X_val')
X_test = deserialise('pandas', serialise_directory, 'X_test')

In [22]:
y_train = deserialise('pandas', serialise_directory, 'y_train')
y_val = deserialise('pandas', serialise_directory, 'y_val')
y_test = deserialise('pandas', serialise_directory, 'y_test')

In [23]:
# print(type(X_train), X_train.shape)
# print(type(X_val), X_val.shape)
# print(type(X_test), X_test.shape)
# print(type(y_train), y_train.shape)
# print(type(y_val), y_val.shape)
# print(type(y_test), y_test.shape, end='\n\n')
# print(X_train.head(2), end='\n\n')
# print(y_train.head(2), end='\n\n')
# print(type(mean_len))
# print(mean_len)

In [24]:
# Deserialise glove embeddings
glove_embeddings = deserialise('numpy', serialise_directory, 'glove_embeddings')

In [25]:
# print(type(glove_embeddings))
# print(len(glove_embeddings))
# print(glove_embeddings['car'])

In [26]:
# Deserialise datasets
tokens = deserialise('python', serialise_directory, 'tokens')
num_words_before = deserialise('python', serialise_directory, 'num_words_before')
num_words_after = deserialise('python', serialise_directory, 'num_words_after')

In [27]:
# print(type(tokens))
# print(len(tokens))
# print(tokens['car'], end='\n\n')
# print(type(num_words_before))
# print(num_words_before, end='\n\n')
# print(type(num_words_after))
# print(num_words_after, end='\n\n')

In [28]:
# Deserialise embedding matrix
embedding_matrix = deserialise('numpy', serialise_directory, 'embedding_matrix')
not_in_glove = deserialise('python', serialise_directory, 'not_in_glove')

In [29]:
# print(type(embedding_matrix))
# print(embedding_matrix.shape)
# print(embedding_matrix[56], end='\n\n')
# print(type(not_in_glove))
# print(len(not_in_glove))
# print(not_in_glove[:15])

In [30]:
# Deserialise encoded datasets
X_train_encoded = deserialise('numpy', serialise_directory, 'X_train_encoded')
X_val_encoded = deserialise('numpy', serialise_directory, 'X_val_encoded')
X_test_encoded = deserialise('numpy', serialise_directory, 'X_test_encoded')

In [31]:
# print(type(X_train_encoded))
# print(X_train_encoded.shape)
# print(type(X_val_encoded))
# print(X_val_encoded.shape)
# print(type(X_test_encoded))
# print(X_test_encoded.shape, end='\n\n')
# print(X_train.sample(2), end='\n\n')
# print((X_train_encoded.map(lambda x: len(x))).mean())
# print((X_val_encoded.map(lambda x: len(x))).mean())
# print((X_test_encoded.map(lambda x: len(x))).mean())

In [32]:
# Deserialise tensors
x_train_tensor = deserialise('tensor', serialise_directory, 'x_train_tensor')
x_val_tensor = deserialise('tensor', serialise_directory, 'x_val_tensor')
x_test_tensor = deserialise('tensor', serialise_directory, 'x_test_tensor')
y_train_tensor = deserialise('tensor', serialise_directory, 'y_train_tensor')
y_val_tensor = deserialise('tensor', serialise_directory, 'y_val_tensor')
y_test_tensor = deserialise('tensor', serialise_directory, 'y_test_tensor')

In [33]:
# print(type(x_train_tensor), x_train_tensor.shape)
# print(type(x_val_tensor), x_val_tensor.shape)
# print(type(x_test_tensor), x_test_tensor.shape)
# print(type(y_train_tensor), y_train_tensor.shape)
# print(type(y_val_tensor), y_val_tensor.shape)
# print(type(y_test_tensor), y_test_tensor.shape)

In [34]:
# Deserialise PyTorch Dataset
train_dataset = deserialise('tensor', serialise_directory, 'train_dataset')
val_dataset = deserialise('tensor', serialise_directory, 'val_dataset')
test_dataset = deserialise('tensor', serialise_directory, 'test_dataset')

In [35]:
# print(type(train_dataset), len(train_dataset))
# print(type(val_dataset), len(val_dataset))
# print(type(test_dataset), len(test_dataset))

### PyTorch DataLoader

In [36]:
# For small dataset is fine to use the whole training data at every training step (i.e. batch gradient descent). 
# If we want to go serious about all this, we must use mini-batch gradient descent. Thus, we need mini-batches. 
# Thus, we need to slice our dataset accordingly. Do you want to do it manually?! Me neither!
# So we use the 'DataLoader' class for this job. We tell it which dataset to use, the desired mini-batch size and if we’d 
# like to shuffle it or not. That’s it!
# Our loader will behave like an iterator, so we can loop over it and fetch a different mini-batch every time.

train_loader = DataLoader(dataset=train_dataset, batch_size=1048, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=1048, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=1048, shuffle=False)

# To retrieve a sample mini-batch, one can simply run the command below.
# It will return a list containing two tensors: one for the features, another one for the labels:
# next(iter(train_loader))

### Initialise Classifier

At its core, the training routine is responsible for instantiating the model, iterating over the dataset, computing the output of the model when given the data as input, computing the loss (how wrong the model is), and updating the model proportional to the loss. 

Although this may seem like a lot of details to manage, there are not many places to change the training routine, and as such it will become habitual in your deep learning development process.

In [37]:
num_embeddings, embedding_dim = embedding_matrix.shape
hidden_size = 64
output_dim = 1
dropout_p = 0.3

In [39]:
# Initialise classifier
# We need to send our model to the same device where the data is. If our data is made of GPU tensors, 
# our model must “live” inside the GPU as well. That's what '.to(device)' is there for.

classifier = GRUClassifier(embedding_size=embedding_dim, num_embeddings=num_embeddings, 
                           hidden_size=hidden_size, output_dim=output_dim,
                           dropout_p=dropout_p, pretrained_embeddings=embedding_matrix, padding_idx=0).to(device)

In [40]:
# Loss and optimizer
loss_func = nn.BCEWithLogitsLoss()

In [41]:
# Optimizer
optimizer = optim.Adam(classifier.parameters(), lr=0.001)
#optimizer = optim.Adam(filter(lambda p: p.requires_grad, classifier.parameters()), lr=0.01)    #filtering only for the params that require updating doesn't speed up training

### Serialise / Deserialise Embedding Parameters before training

In [42]:
params_before = params(classifier.emb)

In [43]:
# Serialise embedding params before training
serialise('tensor', serialise_directory, 'params_before', params_before)

In [44]:
# Deserialise embedding params before training
params_before = deserialise('tensor', serialise_directory, 'params_before')

In [45]:
params_before

Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.2319, -0.1954,  0.0334,  ...,  0.1139,  0.1851, -0.3686],
        [ 0.0206,  0.3291, -0.0162,  ..., -0.0801, -0.2023,  0.0679],
        ...,
        [-0.1062,  0.1081,  0.0919,  ..., -0.0722,  0.0423, -0.0406],
        [-0.0733,  0.1390, -0.0326,  ...,  0.1367,  0.0150, -0.0859],
        [-0.1353, -0.1157,  0.0118,  ..., -0.0008,  0.0778, -0.0832]])

In [46]:
# check params initialised by the classifier
check_params(classifier)

Layer name: emb --- Embedding(95399, 300, padding_idx=0)

ToT Params: 28,619,700

Param length: 95,399

Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.2319, -0.1954,  0.0334,  ...,  0.1139,  0.1851, -0.3686],
        [ 0.0206,  0.3291, -0.0162,  ..., -0.0801, -0.2023,  0.0679],
        ...,
        [-0.1062,  0.1081,  0.0919,  ..., -0.0722,  0.0423, -0.0406],
        [-0.0733,  0.1390, -0.0326,  ...,  0.1367,  0.0150, -0.0859],
        [-0.1353, -0.1157,  0.0118,  ..., -0.0008,  0.0778, -0.0832]])

Are parameters being updated during backprop? False

Total Sets of Parameters: 1

******************************************************************************************
Layer name: gru --- GRUCell(300, 64)

ToT Params: 70,272

Param length: 192

Parameter containing:
tensor([[ 0.0389, -0.0495, -0.0050,  ..., -0.0030,  0.0547, -0.1226],
        [ 0.1171, -0.0382, -0.0563,  ..., -0.0236,  0.0958, -0.0239],
        [ 0.0636,  0.1032, 

In [47]:
# trainable / all params before training
trainable_params, all_params = num_params(classifier)
print('The model has {:,} trainable parameters'.format(trainable_params))
print('The model has {:,} parameters overall'.format(all_params))

The model has 74,497 trainable parameters
The model has 28,694,197 parameters overall


## Training loop

The training loop is composed of two loops: an inner loop over minibatches in the dataset, and an outer loop, which repeats the inner loop a number of times. In the inner loop, losses are computed for each minibatch, and the optimizer is used to
update the model parameters.

In [48]:
start = time.time()
n_epochs = 10
n_epoch_freezed = 10

print('Starting training', end='\n\n')


# Enumerate epochs
epoch = 0

# For a certain number of epochs (defined by 'n_epoch_freezed'), the emebdding matrix is frozen, then it is unfrozen 
# i.e. the embeddings get trained (except for the padding vector which remains 0)
for epoch in range(n_epochs):
    if epoch < n_epoch_freezed:   
        pass   # keep the embedding layer frozen (i.e. classifier.emb.weight.requires_grad=False as set in section 8 above)
    else: 
        classifier.emb.weight.requires_grad=True

    # Training part
    classifier.train()
    
    epoch_train_loss = 0
    epoch_train_acc = 0
    
    for i, (x_train, y_train) in enumerate(train_loader):
        x_train = x_train.to(device)
        y_train = y_train.to(device)

        # Clear the gradients
        optimizer.zero_grad()
        
        # Forward propagation: compute the model output (i.e. predictions)
        y_pred = classifier(x_in=x_train)
    
        #print(x_train.requires_grad, y_train.requires_grad, y_pred.requires_grad)
                                                                                                        
        # Loss calculation
        t_loss = loss_func(y_pred, y_train)
        
        # Accuracy
        t_acc = binary_acc(y_pred, y_train)
        
        # Backward propagation: use loss to produce gradients
        t_loss.backward()
        
        # Weight optimization: use optimizer to take gradient step and update parameters (w,b) 
        optimizer.step()
        
        epoch_train_loss += t_loss.item()
        epoch_train_acc += t_acc.item()
                                                                                                             
   
    # Evaluation part
    classifier.eval() # .eval() tells PyTorch that we do not want to perform back-propagation during inference
    
    epoch_val_loss = 0
    epoch_val_acc = 0
    
    #We use torch.no_grad() which reduces memory usage and speeds up computation.
    with torch.no_grad():     #https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615/3 : torch.no_grad() deals with the autograd engine and stops it from calculating the gradients, which is the recommended way of doing validation
        for i, (x_val, y_val) in enumerate(val_loader):
            x_val = x_val.to(device)
            y_val = y_val.to(device)
        
            # Forward propagation: compute the model output (i.e. predictions)
            y_pred = classifier(x_in=x_val)     #tensors of probabilities
        
            # Loss calculation
            v_loss = loss_func(y_pred, y_val)  
            
            # Accuracy
            v_acc = binary_acc(y_pred, y_val)
            
            epoch_val_loss += v_loss.item()
            epoch_val_acc += v_acc.item()
            
    print('Epoch: {} | Train Loss: {:.3f} | Val Loss: {:.3f} | Train Acc: {:.3f} | Val Acc: {:.3f}'.format(epoch,
                                                                                epoch_train_loss/len(train_loader),
                                                                                epoch_val_loss/len(val_loader),
                                                                                epoch_train_acc/len(train_loader),
                                                                                epoch_val_acc/len(val_loader)))
    
    print(num_params(classifier), end='\n\n')
    
epoch += 1
    

print()
print('Training complete', end='\n\n')

end = time.time()
print(end - start)

Starting training

Epoch: 0 | Train Loss: 0.496 | Val Loss: 0.369 | Train Acc: 74.749 | Val Acc: 84.137
(74497, 28694197)

Epoch: 1 | Train Loss: 0.330 | Val Loss: 0.301 | Train Acc: 85.683 | Val Acc: 87.199
(74497, 28694197)

Epoch: 2 | Train Loss: 0.289 | Val Loss: 0.279 | Train Acc: 87.805 | Val Acc: 88.161
(74497, 28694197)

Epoch: 3 | Train Loss: 0.271 | Val Loss: 0.266 | Train Acc: 88.669 | Val Acc: 88.932
(74497, 28694197)

Epoch: 4 | Train Loss: 0.258 | Val Loss: 0.297 | Train Acc: 89.309 | Val Acc: 87.460
(74497, 28694197)

Epoch: 5 | Train Loss: 0.248 | Val Loss: 0.270 | Train Acc: 89.749 | Val Acc: 88.776
(74497, 28694197)

Epoch: 6 | Train Loss: 0.240 | Val Loss: 0.336 | Train Acc: 90.088 | Val Acc: 86.012
(74497, 28694197)

Epoch: 7 | Train Loss: 0.232 | Val Loss: 0.239 | Train Acc: 90.432 | Val Acc: 90.137
(74497, 28694197)

Epoch: 8 | Train Loss: 0.228 | Val Loss: 0.239 | Train Acc: 90.688 | Val Acc: 90.199
(74497, 28694197)

Epoch: 9 | Train Loss: 0.222 | Val Loss: 0.32

### Serialise / Deserialise Embedding Parameters after training

In [49]:
params_after = params(classifier.emb)

In [50]:
# Serialise embedding params after training
serialise('tensor', serialise_directory, 'params_after', params_after)

In [51]:
# Deserialise embedding params after training
params_after = deserialise('tensor', serialise_directory, 'params_after')

In [52]:
# check changes in params
params_after == params_before

tensor([[True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True],
        ...,
        [True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True],
        [True, True, True,  ..., True, True, True]])

In [53]:
# check gradients
classifier.emb.weight.grad

In [54]:
# trainable / all params after training
trainable_params, all_params = num_params(classifier)
print('The model has {:,} trainable parameters'.format(trainable_params))
print('The model has {:,} parameters overall'.format(all_params))

The model has 74,497 trainable parameters
The model has 28,694,197 parameters overall


## Evaluating on test data

To evaluate the data on the held-out test set, the code is exactly the same as the validation loop in the training routine we saw in the previous step. 

The test set should be run as little as possible. Each time you run a trained model on the test set, make a new model decision (such as changing the size of the layers), and remeasure the new retrained model on the test set, you are biasing your
modeling decisions toward the test data. In other words, if you repeat that process often enough, the test set will become meaningless as an accurate measure of truly held-out data.

In [55]:
classifier.eval()  

test_loss = 0
test_acc = 0
    
with torch.no_grad():     #torch.no_grad() deals with the autograd engine and stops it from calculating the gradients, which is the recommended way of doing validation
    for i, (x_test, y_test) in enumerate(test_loader):
        x_test = x_test.to(device) 
        y_test = y_test.to(device)
        
        # Forward propagation: compute the model output (i.e. predictions)
        y_pred = classifier(x_in=x_test)     #tensors of probabilities
        
        # Loss calculation
        tst_loss = loss_func(y_pred, y_test)
        
        # Accuracy
        tst_acc = binary_acc(y_pred, y_test)
            
        test_loss += tst_loss.item()
        test_acc += tst_acc.item()
            
print('Test Loss: {:.3f} | Test Acc: {:.3f}'.format(test_loss/len(test_loader), test_acc/len(test_loader)))

print()
print('Done!')

Test Loss: 0.318 | Test Acc: 86.405

Done!


## Inference on New Data

In [56]:
reviews = ["Suspended my account without warning and my order got delayed. I called them two days in a row to be fobbed off \
          and to wait for an email. No email. No access to my online account. 3 amazon prime payments taken from my card \
          without my permission. Just got access back to claim my money back 12 weeks later and didn't even get an apology \
          from the online chat. Didn't even get my order either.",
          'Fantastic!!!!',
          'The products they sent me always work fine but... I have Amazon prime and they sometimes deliver days late or \
          just leave the product on your doorstep out in the open easy to be stolen. Overall, my experience with Amazon is \
          fairly good',
          'Safe place never used…doorstep deliveries. I am a long standing Amazon customer and high spending. I have had \
          a safe place named for deliveries for a long time. The odd driver hitters to open the door to the bin shed and \
          out the parcels in. The majority just leave it on the doorstep. With thefts high, I find this utterly ridiculous. \
          What is the point of a safe place if it is simply ignored and never used? I live in a house with a small front \
          garden - not a difficult to access apartment block or something. Amazon - please hold your drivers accountable \
          to the requests of customers. You offer and unique and great service but this lets you down.',
          'My beautiful Jenuh killed by incompetence! She went to Dewey Veterinary Medical Center 3x! My MISTAKE!! I was \
          told she had nothing but allergies! She was filled with injections that DID NOTHING for her along with \
          medications that were ridiculously priced and again that did nothing!! 5 days after her last visit with Dewey \
          I’m in another veterinary office with her in critical condition and them saying she was on oxygen and had a very \
          bad infection! They’re telling me the rash on her body was NOT allergies, but was in fact signs of trouble from \
          infection!!! She was never treated just given more shots and new medications and here she was suffering inside! \
          (All I was told was many pets were coming in suffering from allergies) My 8 month beautiful blue and green-eyed \
          Husky died all because Dewey Vet did NOTHING for her!! She lost her life and I didn’t get to enjoy loving her \
          for many year’s because her symptoms were not taken seriously ! She didn’t have to die SHE WAS MURDERED by lack \
          of care!!!!! As a pet mom you take your babies into veterinarians offices in hopes they will properly diagnose \
          and care for your kids basicaly!!! Jenuh was not cared for! This vet told me her impacted teeth (Adult teeth \
          came in but baby teeth didn’t fall out) wasn’t urgent she could get them out when they spayed her too which I \
          said I was told by other vet that I should wait between 1-1.5 years old which he blew off and said NOT TRUE! \
          (I read about the urgency to pull teeth online yet this was also blown off as nothing) His old school veterinary \
          services may actually be OUTDATED garbage! Maybe it’s time to retire!!!!',
          "It's a good search engine and info that shows up is mostly helpful, but lately the ads are flooding the face \
          page which is not always the best sites for what you are looking for.",
          "I bought google play credit.I have the virtual card.But they blocked all my account transaction, when i \
          contacted them,they told me that they would solve it,Instead, they locked it.Even after submitting all proper \
          documents, they are refusing to reactive.Such bad experience", 
          'I have had a long term rental association (as a renter) with my Property Manager Kiera Hannaford. I would like \
          to acknowledge her professional, friendly and thorough knowledge of all things related to rental property \
          management. Kiera always does her very best to help and goes to a great deal of trouble to ensure a positive \
          outcome with all queries and requests. Her honesty is such a valuable asset when there are challenging times. \
          As a renter I have always been able to rely on Kiera for help. I wish her the very best for a very bright \
          future and would recommend her if you ever need an excellent team member in the real estate industry. ',
          "It’s short from being the best because I’m concerned over some huge privacy details that may get leaked to the \
          greater public but otherwise it’s a good engine.",
          "Always had good experience with google before but lately there's been a lot of ad popping up and not \
          necessarily nice sites just sites that paid money to google for google to put their sites on first page. \
          But google is a nice service with a lot of functions although I wonder how much personal information they \
          store of you in the database.",
          "Kindred Healthcare physical therapists (PT) and occupational therapists (OT) in Okaloosa County, Florida, \
          helped me regain use of my right arm and increase my stamina and endurance despite COPD problems. During my \
          in-patient and outpatient therapy, they were always professional and polite while encouraging me to challenge \
          myself. PT and OT helped me regain independence, self-reliance, and dignity. I am grateful.",
          "Had a problem. Resolved immediately. Great customer service!!!",
          "had 3 hour drive ahead of me and Adam the fitter could see my distress at having to wait he jacked up the truck \
          in a flash and had me on my way, top sevices from Adam and the kwikfit team.",
          "Online spas- Absolutely lovely guys and excellent customer service. The guys showed me how to set up and did \
          all the hard work for me. Nothing was any problem :). Lovely the hot tub and would highly recommend to \
          friends ! Thanks again!",
          "Mr. Handyman is a great service. The staff isprofessional and all of the jobs get done. I have used Mr. \
          Handyman several times and I couldn’t be happier."]

In [57]:
#inference_pipeline.tasks

In [58]:
inferences = [inference_pipeline.run(r) for r in reviews]

In [59]:
for i,inf in enumerate(inferences):
    print('{}. {} --- probability value {:.2%}'.format(i+1, inf[0], inf[1]))

1. Negative --- probability value 9.51%
2. Positive --- probability value 96.34%
3. Positive --- probability value 82.13%
4. Negative --- probability value 11.10%
5. Negative --- probability value 4.12%
6. Positive --- probability value 98.40%
7. Negative --- probability value 21.16%
8. Positive --- probability value 99.89%
9. Positive --- probability value 94.90%
10. Positive --- probability value 93.70%
11. Positive --- probability value 99.86%
12. Positive --- probability value 99.11%
13. Positive --- probability value 77.13%
14. Positive --- probability value 99.56%
15. Positive --- probability value 91.08%


In [None]:
# 1.
# "Suspended my account without warning and my order got delayed. I called them two days in a row to be fobbed off \
# and to wait for an email. No email. No access to my online account. 3 amazon prime payments taken from my card \
# without my permission. Just got access back to claim my money back 12 weeks later and didn't even get an apology \
# from the online chat. Didn't even get my order either." 
# 1. Negative --- probability value 3.39%    --- CORRECT

# 2.
# 'Fantastic!!!!'  
# 2. Positive --- probability value 98.76%   --- CORRECT

# 3.
# 'The products they sent me always work fine but... I have Amazon prime and they sometimes deliver days late or \
# just leave the product on your doorstep out in the open easy to be stolen. Overall, my experience with Amazon is \
# fairly good' 
# 3. Positive --- probability value 66.64%   --- CORRECT

# 4.
# 'Safe place never used…doorstep deliveries. I am a long standing Amazon customer and high spending. I have had \
# a safe place named for deliveries for a long time. The odd driver hitters to open the door to the bin shed and \
# out the parcels in. The majority just leave it on the doorstep. With thefts high, I find this utterly ridiculous. \
# What is the point of a safe place if it is simply ignored and never used? I live in a house with a small front \
# garden - not a difficult to access apartment block or something. Amazon - please hold your drivers accountable \
# to the requests of customers. You offer and unique and great service but this lets you down.'
# 4. Negative --- probability value 11.10%    --- CORRECT 

# 5.
# 'My beautiful Jenuh killed by incompetence! She went to Dewey Veterinary Medical Center 3x! My MISTAKE!! I was \
# told she had nothing but allergies! She was filled with injections that DID NOTHING for her along with \
# medications that were ridiculously priced and again that did nothing!! 5 days after her last visit with Dewey \
# I’m in another veterinary office with her in critical condition and them saying she was on oxygen and had a very \
# bad infection! They’re telling me the rash on her body was NOT allergies, but was in fact signs of trouble from \
# infection!!! She was never treated just given more shots and new medications and here she was suffering inside! \
# (All I was told was many pets were coming in suffering from allergies) My 8 month beautiful blue and green-eyed \
# Husky died all because Dewey Vet did NOTHING for her!! She lost her life and I didn’t get to enjoy loving her \
# for many year’s because her symptoms were not taken seriously ! She didn’t have to die SHE WAS MURDERED by lack \
# of care!!!!! As a pet mom you take your babies into veterinarians offices in hopes they will properly diagnose \
# and care for your kids basicaly!!! Jenuh was not cared for! This vet told me her impacted teeth (Adult teeth \
# came in but baby teeth didn’t fall out) wasn’t urgent she could get them out when they spayed her too which I \
# said I was told by other vet that I should wait between 1-1.5 years old which he blew off and said NOT TRUE! \
# (I read about the urgency to pull teeth online yet this was also blown off as nothing) His old school veterinary \
# services may actually be OUTDATED garbage! Maybe it’s time to retire!!!!'
# 5. Negative --- probability value 4.70%  --- CORRECT

# 6.
# "It's a good search engine and info that shows up is mostly helpful, but lately the ads are flooding the face \
# page which is not always the best sites for what you are looking for."
# 6. Positive --- probability value 93.89%  --- CORRECT 

#7.
# "I bought google play credit.I have the virtual card.But they blocked all my account transaction, when i \
# contacted them,they told me that they would solve it,Instead, they locked it.Even after submitting all proper \
# documents, they are refusing to reactive.Such bad experience"
# 7. Negative --- probability value 0.35%   --- CORRECT

# 8.
# 'I have had a long term rental association (as a renter) with my Property Manager Kiera Hannaford. I would like \
# to acknowledge her professional, friendly and thorough knowledge of all things related to rental property \
# management. Kiera always does her very best to help and goes to a great deal of trouble to ensure a positive \
# outcome with all queries and requests. Her honesty is such a valuable asset when there are challenging times. \
# As a renter I have always been able to rely on Kiera for help. I wish her the very best for a very bright \
# future and would recommend her if you ever need an excellent team member in the real estate industry. '
# 8. Positive --- probability value 97.86%  --- CORRECT

# 9.
# "It’s short from being the best because I’m concerned over some huge privacy details that may get leaked to the \
# greater public but otherwise it’s a good engine."
# 9. Positive --- probability value 93.98% --- CORRECT

# 10.
# "Always had good experience with google before but lately there's been a lot of ad popping up and not \
# necessarily nice sites just sites that paid money to google for google to put their sites on first page. \
# But google is a nice service with a lot of functions although I wonder how much personal information they \
# store of you in the database."
# 10. Positive --- probability value 79.14%   --- CORRECT

# 11.
# "Kindred Healthcare physical therapists (PT) and occupational therapists (OT) in Okaloosa County, Florida, \
# helped me regain use of my right arm and increase my stamina and endurance despite COPD problems. During my \
# in-patient and outpatient therapy, they were always professional and polite while encouraging me to challenge \
# myself. PT and OT helped me regain independence, self-reliance, and dignity. I am grateful."
# 11. Positive --- probability value 94.97%   --- CORRECT

# 12.
# "Had a problem. Resolved immediately. Great customer service!!!"
# 12. Positive --- probability value 79.75%   --- CORRECT 

# 13.
# "had 3 hour drive ahead of me and Adam the fitter could see my distress at having to wait he jacked up the truck \
# in a flash and had me on my way, top sevices from Adam and the kwikfit team."
# 13. Positive --- probability value 64.67% --- CORRECT

# 14.
# "Online spas- Absolutely lovely guys and excellent customer service. The guys showed me how to set up and did \
# all the hard work for me. Nothing was any problem :). Lovely the hot tub and would highly recommend to \
# friends ! Thanks again!"
# 14. Positive --- probability value 99.78%  --- CORRECT

# 15.
# "Mr. Handyman is a great service. The staff isprofessional and all of the jobs get done. I have used Mr. \
# Handyman several times and I couldn’t be happier."
# 15. Positive --- probability value 75.25% --- CORRECT