# Assignment 1:  Sentiment with Deep Neural Networks

Welcome to the first assignment of course 3. In this assignment you will explore sentiment analysis using deep neural networks. 

## Outline

- [0 Import libraries and try out Trax](#0)
- [1 Importing the data](#1)
    - [1.2 Building the vocabulary](#1.2)
    - [1.3 Converting a tweet to a tensor](#1.3)
        - [Exercise 01](#ex01)
    - [1.4 Creating a batch generator](#1.4)
        - [Exercise 02](#ex02)
- [2: Defining classes](#2)
    - [2.1 ReLU class](#2.1)
        - [Exercise 03](#ex03)
    - [2.2 Dense class](#2.2)
        - [Exercise 04](#ex04)
    - [2.3: Model](#2.3)
        - [Exercise 05](#ex05)
- [3 Training](#3)
    - [3.1 Training the Model](#3.1)
        - [Exercise 06](#ex06)
    - [3.2 Initialize a model with the trained weights](#3.2)
    - [3.3 Practice Making a prediction](#3.3)
- [4: Evaluation](#4)
    - [4.1 Computing the accuracy on a batch](#4.1)
        - [Exercise 07](#ex07)
    - [4.2 Testing your model on Validation Data](#4.2)
        - [Exercise 08](#ex08)
- [5: Testing with your own input](#5)    

In course 1, you implemented logistic regression and Naive Bayes for sentiment analysis. However, if you were to give your old models an example like:

<center> <span style='color:blue'> <b>This movie was almost good.</b> </span> </center>

Your model would have predicted a positive sentiment for that review. However, that sentence has a negative sentiment and indicates that the movie was not good. To solve those kinds of misclassifications, you will write a program that uses deep neural networks to identify sentiment in text. By completing this assignment you will: 

- Understand how you can build/design a model in Trax
- Train a model in Trax
- Use a binary cross entropy loss function
- Compute the accuracy of your model
- Predict using your own input

As you can tell, this model follows a similar structure to the one you previously implemented in the second course of this specialization. Indeed, most of the deep nets you will be implementing will have a similar structure. The only things that change are the model architecture, the inputs, and the outputs. Before starting the assignment, we will show you some cool features of `trax`, Google's deep learning library. 


Now we will show you how to compute the gradient of a certain function `f` by just using `trax.math.grad(f)`. 

- Trax source code can be found on Github: [trax](https://github.com/google/trax).
- The Trax code also uses the JAX library: [JAX](https://jax.readthedocs.io/en/latest/index.html)

<a name='0'></a>
# Part 0: Import libraries and try out Trax

- Let's import libraries and look at an example of using the trax library.

In [None]:
# Automatic gradient with replaced numpy.
#!pip -q install trax==1.2.4 # coursera

# import relevant libraries
import trax

# import trax.math.numpy
import trax.math.numpy as np

# import trax.layers
from trax import layers as tl

# import Layer from the utils.py file
from utils import Layer
import os 

In [None]:
# Create an array using trax.math.numpy
a = np.array(5.0)

# View the returned array
display(a)

print(type(a))

Notice that trax.math.numpy returns a DeviceArray from the jax library.

In [None]:
# Define a function that will use the trax.math.numpy array
def f(x):
    
    # f = x^2
    return (x**2)

In [None]:
# Call the function
print("f(a) for a is", f(a))

The gradient (derivative) of function `f` with respect to its input `x` is the derivative of $x^2$.
- The derivative of $x^2$ is $2x$.  
- When x is 5, then $2x=10$.

You can calculate the gradient of a function by using `trax.math.grad(fun=)` and passing in the name of the function.
- In this case the function you want to take the gradient of is `f`.
- The object returned (saved in `grad_f` in this example) is a function that can calculate the gradient of f for a given trax.math.numpy array.

In [None]:
# Directly use trax.math.grad to calculate the gradient (derivative) of the function
grad_f = trax.math.grad(fun=f)  # df / dx - Gradient of function f(x) with respect to x

# View the type of the returned object (it's a function)
type(grad_f)

In [None]:
# Call the newly created function and pass in a value for x (the DeviceArray stored in 'a')
grad_calculation = grad_f(a)

# View the result of calling the grad_f function
display(grad_calculation)

The function returned by `trax.math.grad` takes in x=5 and calculates the gradient of f, which is 2*x, which is 10. The value is also stored as a DeviceArray from the jax library.
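
As a sanity check on the value 10, the derivative can also be approximated numerically. The following is a minimal sketch in plain Python (no Trax required); `numerical_grad` is a hypothetical helper introduced only for this illustration, and it uses a central difference rather than automatic differentiation.

```python
def f(x):
    # Same function as in the cells above: f(x) = x^2
    return x ** 2

def numerical_grad(fun, x, eps=1e-6):
    # Central difference approximation of d(fun)/dx at x
    return (fun(x + eps) - fun(x - eps)) / (2 * eps)

approx = numerical_grad(f, 5.0)
print(approx)  # very close to 10.0
```

The approximation agrees with the analytical derivative $2x$ at $x=5$ up to a small rounding error.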

<a name='1'></a>
# Part 1: Importing the data

### 1.1 Loading in the data

Import the data set.  
- You may recognize this from earlier assignments in the specialization.
- Details of the `process_tweet` function are available in the `utils.py` file.

In [None]:
## DO NOT EDIT THIS CELL

# Import functions from the utils.py file
from utils import load_tweets
import numpy as np

# Load positive and negative tweets
all_positive_tweets, all_negative_tweets = load_tweets()

# View the total number of positive and negative tweets.
print(f"The number of positive tweets: {len(all_positive_tweets)}")
print(f"The number of negative tweets: {len(all_negative_tweets)}")

# Split positive set into validation and training
val_pos   = all_positive_tweets[4000:] # generating validation set for positive tweets
train_pos  = all_positive_tweets[:4000]# generating training set for positive tweets

# Split negative set into validation and training
val_neg   = all_negative_tweets[4000:] # generating validation set for negative tweets
train_neg  = all_negative_tweets[:4000] # generating training set for negative tweets

# Combine training data into one set
train_x = train_pos + train_neg 

# Combine validation data into one set
val_x  = val_pos + val_neg

# Set the labels for the training set (1 for positive, 0 for negative)
train_y = np.append(np.ones(len(train_pos)), np.zeros(len(train_neg)))

# Set the labels for the validation set (1 for positive, 0 for negative)
val_y  = np.append(np.ones(len(val_pos)), np.zeros(len(val_neg)))

print(f"length of train_x {len(train_x)}")
print(f"length of val_x {len(val_x)}")

Now import a function that processes tweets (we've provided this in the utils.py file).
- `process_tweet` removes unwanted characters (e.g. hashtags, hyperlinks, stock tickers) from the tweet.
- It also returns a list of words (it tokenizes the original string).

In [None]:
# Import a function that processes the tweets
from utils import process_tweet

# Try out function that processes tweets
print("original tweet at training position 0")
print(train_pos[0])

print("Tweet at training position 0 after processing:")
process_tweet(train_pos[0])

Notice that the function `process_tweet` keeps key words, removes the hash # symbol, and ignores usernames (words that begin with '@').  It also returns a list of the words.

<a name='1.2'></a>
### 1.2 Building the vocabulary

Now build the vocabulary.
- Map each word in each tweet to an integer (an "index"). 
- The following code does this for you, but please read it and understand what it's doing.
- Note that you will build the vocabulary based on the training data. 
- To do so, you will assign an index to every word by iterating over your training set.

The vocabulary will also include some special tokens
- `__PAD__`: padding
- `__</e>__`: end of line
- `__UNK__`: a token representing any word that is not in the vocabulary.

In [None]:
# Build the vocabulary
# Unit Test Note - There is no test set here only train/val

# Include special tokens 
# started with pad, end of line and unk tokens
Vocab = {'__PAD__': 0, '__</e>__': 1, '__UNK__': 2} 

# Note that we build vocab using training data
for tweet in train_x: 
    processed_tweet = process_tweet(tweet)
    for word in processed_tweet:
        if word not in Vocab: 
            Vocab[word] = len(Vocab)
    
print("Total words in vocab are",len(Vocab))
display(Vocab)

The dictionary `Vocab` will look like this:
```CPP
{'__PAD__': 0,
 '__</e>__': 1,
 '__UNK__': 2,
 'followfriday': 3,
 'top': 4,
 'engag': 5,
 ...
```

- Each unique word has a unique integer associated with it.
- The total number of words in Vocab: 9092

<a name='1.3'></a>
### 1.3 Converting a tweet to a tensor

Write a function that will convert each tweet to a tensor (a list of unique integer IDs representing the processed tweet).
- Note, the returned data type will be a **regular Python `list()`**
    - You won't use TensorFlow in this function
    - You also won't use a numpy array
    - You also won't use trax.math.numpy array
- For words in the tweet that are not in the vocabulary, set them to the unique ID for the token `__UNK__`.

##### Example
Input a tweet:
```CPP
'@happypuppy, is Maria happy?'
```

`tweet_to_tensor` will first convert the tweet into a list of tokens (including only relevant words):
```CPP
['maria', 'happi']
```

Then it will convert each word into its unique integer

```CPP
[2, 56]
```
- Notice that the word "maria" is not in the vocabulary, so it is assigned the unique integer associated with the `__UNK__` token, because it is considered "unknown."
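
The fallback to `__UNK__` described above can be sketched with a small, made-up vocabulary. This is only an illustration of the lookup step: `toy_vocab` and its IDs are hypothetical and are not the real `Vocab`, and the tokens are assumed to be already processed.

```python
# Hypothetical toy vocabulary (not the real Vocab built above)
toy_vocab = {'__PAD__': 0, '__</e>__': 1, '__UNK__': 2, 'happi': 3}

# Look up the ID of the unknown token instead of hard-coding it
unk_id = toy_vocab['__UNK__']

# Already-processed tokens; 'maria' is not in the toy vocabulary
tokens = ['maria', 'happi']

# dict.get returns the second argument when the key is missing
ids = [toy_vocab.get(tok, unk_id) for tok in tokens]
print(ids)  # [2, 3]
```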



<a name='ex01'></a>
### Exercise 01
**Instructions:** Write a function `tweet_to_tensor` that takes in a tweet and converts it to a list of integers. You can use the `Vocab` dictionary you just built to help create the tensor. 

- Use the vocab_dict parameter and not a global variable.
- Do not hard code the integer value for the `__UNK__` token.

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
<p>
<ul>
    <li>Map each word in tweet to corresponding token in 'Vocab'</li>
    <li>Use Python's dict.get(key, default) so that the lookup returns a default value if the key is not found in the dictionary.</li>
</ul>
</p>
</details>


In [None]:
# GRADED FUNCTION: tweet_to_tensor
def tweet_to_tensor(tweet, vocab_dict, unk_token='__UNK__', verbose=False):
    
    # Process the tweet into a list of words
    # where only important words are kept (stop words removed)
    word_l = None
    
    if verbose:
        print("List of words from the processed tweet:")
        print(word_l)
        
    # Initialize the list that will contain the unique integer IDs of each word
    tensor_l = []
    
    # Get the unique integer ID of the __UNK__ token
    unk_ID = None
    
    if verbose:
        print(f"The unique integer ID for the unk_token is {unk_ID}")
        
    # for each word in the list:
    for word in None: # complete this line
        
        # Get the unique integer ID.
        # If the word doesn't exist in the vocab dictionary,
        # use the unique ID for __UNK__ instead.
        word_ID = None
            
        # Append the unique integer ID to the tensor list.
        tensor_l.append(word_ID) 
    
    return tensor_l

In [None]:
print("Actual tweet is\n",val_pos[0])
print("\nTensor of tweet:\n",tweet_to_tensor(val_pos[0], vocab_dict=Vocab))

##### Expected output

```CPP
Actual tweet is
 Bro:U wan cut hair anot,ur hair long Liao bo
Me:since ord liao,take it easy lor treat as save $ leave it longer :)
Bro:LOL Sibei xialan

Tensor of tweet:
 [1065, 136, 479, 2351, 745, 8146, 1123, 745, 53, 2, 2672, 791, 2, 2, 349, 601, 2, 3489, 1017, 597, 4559, 9, 1065, 157, 2, 2]
```

In [None]:
# test tweet_to_tensor

def test_tweet_to_tensor():
    test_cases = [
        
        {
            "name":"simple_test_check",
            "input": [val_pos[1],Vocab],
            "expected":[444, 2, 304, 567, 56, 9],
            "error":"The function gives bad output for val_pos[1]. Test failed"
        },
        {
            "name":"datatype_check",
            "input":[val_pos[1],Vocab],
            "expected":type([]),
            "error":"Datatype mismatch. Need only list not np.array"
        },
        {
            "name":"without_unk_check",
            "input":[val_pos[1],Vocab],
            "expected":6,
            "error":"Unk word check not done- Please check if you included mapping for unknown word"
        }
    ]
    count = 0
    for test_case in test_cases:
        
        try:
            if test_case['name'] == "simple_test_check":
                assert test_case["expected"] == tweet_to_tensor(*test_case['input'])
                count += 1
            if test_case['name'] == "datatype_check":
                assert isinstance(tweet_to_tensor(*test_case['input']),test_case["expected"])
                count += 1
            if test_case['name'] == "without_unk_check":
                assert None not in tweet_to_tensor(*test_case['input'])
                count += 1
                
            
            
        except:
            print(test_case['error'])
    if count == 3:
        print("all tests passed")
    else:
        print(count," Tests passed out of 3")
test_tweet_to_tensor()            

<a name='1.4'></a>
### 1.4 Creating a batch generator

Most of the time in Natural Language Processing, and AI in general, we train models on batches of examples rather than on one example at a time. 
- If instead of training with batches of examples, you were to train a model with one example at a time, it would take a very long time to train the model. 
- You will now build a data generator that takes in the positive/negative tweets and returns a batch of training examples. It returns the model inputs and the targets (positive or negative labels). 

Once you create the generator, you could include it in a for loop:

```CPP
for batch_inputs, batch_targets in data_generator:
    ...
```

You can also get a single batch like this:

```CPP
batch_inputs, batch_targets = next(data_generator)
```
The generator yields the next batch each time `next` is called on it. 
- This generator returns the data in a format (tensors) that you could directly use in your model.
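
To make the generator behavior concrete, here is a minimal sketch of a batching generator over a plain list. `toy_batches` is a hypothetical helper written for this illustration only; it is not the graded `data_generator`, and it ignores tweets, padding, and targets.

```python
def toy_batches(items, batch_size, loop):
    # Hypothetical helper: yields consecutive slices of `items`
    index = 0
    while True:
        if index + batch_size > len(items):
            if not loop:
                break      # stop once the data runs out
            index = 0      # otherwise start again from the beginning
        yield items[index:index + batch_size]
        index += batch_size

gen = toy_batches([1, 2, 3, 4, 5], batch_size=2, loop=True)
first = next(gen)    # [1, 2]
second = next(gen)   # [3, 4]
third = next(gen)    # wraps around to [1, 2]

# With loop=False the generator is finite and can be turned into a list
finite = list(toy_batches([1, 2, 3, 4, 5], batch_size=2, loop=False))
print(finite)  # [[1, 2], [3, 4]]
```

Note that in this sketch an incomplete final batch is dropped rather than yielded, which mirrors the index check used in the given code of Exercise 02.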

<a name='ex02'></a>
### Exercise 02
Implement `data_generator`.

In [None]:
# GRADED: Data generator
def data_generator(data_pos, data_neg, batch_size, loop, vocab_dict):
    
    '''
    Input: 
     data_pos - Set of positive examples
     data_neg - Set of negative examples
     batch_size - number of samples per batch
     loop - True or False
     vocab_dict - dictionary mapping each word to its unique integer ID
    '''   
### START GIVEN CODE ###

    # make sure the batch size is an even number
    # to allow an equal number of positive and negative samples
    assert batch_size % 2 == 0
    
    # Number of positive examples in each batch is half of the batch size
    # same with number of negative examples in each batch
    n_to_take = batch_size // 2
    
    # Use pos_index to walk through the data_pos array
    # same with neg_index and data_neg
    pos_index = 0
    neg_index = 0
    
    # Loop indefinitely
    while True:
        
        # If the positive index plus num of positive examples
        # goes past the positive dataset,
        if pos_index + n_to_take > len(data_pos): 
            
            # If user wants to keep re-using the data, reset the index
            if loop:
                pos_index = 0
                
            # otherwise exit the loop
            else:
                # exit the loop
                break
### END GIVEN CODE ###

### START CODE HERE ###
        # If the negative index plus num of negative examples
        # goes past the negative dataset,
        if None: # complete this line
            # If user wants to keep re-using the data, reset the index
            if loop:
                
                # Reset the negative index
                neg_index = None
                
            # otherwise exit the loop
            else:
                # exit the loop
                None
### END CODE HERE ###

### START GIVEN CODE ###
        # create a batch with positive examples
        batch = []
        
        # Start from pos_index and increment i up to n_to_take
        for i in range(n_to_take):
            # get the tweet at pos_index + i
            tweet = data_pos[pos_index + i]
            
            # convert the tweet into tensors of integers representing the processed words
            tensor = tweet_to_tensor(tweet,vocab_dict)
            
            # append the tensor to the batch list
            batch.append(tensor)

### END GIVEN CODE ###
            
### START CODE HERE ###

        # Using the same batch list, start from neg_index and increment i up to n_to_take
        for i in range(None): # complete this line
            # get the tweet at neg_index + i
            tweet = None
            
            # convert the tweet into tensors of integers representing the processed words
            tensor = None
            
            # append the tensor to the batch list
            None

### END CODE HERE ###        


### START GIVEN CODE ###

        # Update the start index for positive data 
        # so that it's n_to_take positions after the current pos_index
        pos_index += n_to_take
        
        # Update the start index for negative data 
        # so that it's n_to_take positions after the current neg_index
        neg_index += n_to_take
        
        # Get the max tweet length (the length of the longest tweet) 
        # (you will pad all shorter tweets to have this length)
        max_len = max([len(t) for t in batch]) 
        
        # Initialize tensor_pad_l, which will 
        # store the padded versions of the tensors
        tensor_pad_l = []
        # Pad shorter tweets with zeros
        for tensor in batch:
            
### END GIVEN CODE ###

### START CODE HERE ###            
            # Get the number of positions to pad for this tensor so that it will be max_len long
            n_pad = None
            
            # Generate a list of zeros, with length n_pad
            pad_l = None
            
            # concatenate the tensor and the list of padded zeros
            tensor_pad = None
            
            # append the padded tensor to the list of padded tensors
            None

          
        # convert the list of padded tensors to a numpy array
        # and store this as the model inputs
        inputs = None
  
        # Generate the list of targets for the positive examples (a list of ones)
        # The length is the number of positive examples in the batch
        target_pos = None
        
        # Generate the list of targets for the negative examples (a list of zeros)
        # The length is the number of negative examples in the batch
        target_neg = None
        
        # Concatenate the positive and negative targets
        target_l = None
        
        # Convert the target list into a numpy array
        targets = None
             
### END CODE HERE ###

### GIVEN CODE ###
        # note we use yield and not return
        yield inputs, targets

Now you can use your data generator to create a data generator for the training data, and another data generator for the validation data.

In [None]:
# Create the training data generator
def train_generator(batch_size):
    return data_generator(train_pos, train_neg, batch_size, True, Vocab)

# Create the validation data generator
def val_generator(batch_size):
    return data_generator(val_pos, val_neg, batch_size, False, Vocab)

# this will print a list of 4 tensors padded with zeros
print(next(train_generator(4))) # use next to get a new batch

In [None]:
# Test the train_generator

# Create a data generator for training data,
# which produces batches of size 4 (for tensors and their respective targets)
tmp_data_gen = train_generator(batch_size = 4)

# Call the data generator to get one batch and its targets
tmp_inputs, tmp_targets = next(tmp_data_gen)

print(f"The inputs shape is {tmp_inputs.shape}")
for i,t in enumerate(tmp_inputs):
    print(f"input tensor: {t}; target {tmp_targets[i]}")

##### Expected output

```CPP
The inputs shape is (4, 14)
input tensor: [3 4 5 6 7 8 9 0 0 0 0 0 0 0]; target 1
input tensor: [10 11 12 13 14 15 16 17 18 19 20  9 21 22]; target 1
input tensor: [5738 2901 3761    0    0    0    0    0    0    0    0    0    0    0]; target 0
input tensor: [ 858  256 3652 5739  307 4458  567 1230 2767  328 1202 3761    0    0]; target 0
```
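
The trailing zeros in the expected output come from padding every tensor in a batch to the length of the longest one. A minimal sketch of that idea, using a hypothetical standalone helper `pad_batch` and made-up IDs (it assumes the padding ID is 0, as in `Vocab`):

```python
def pad_batch(batch, pad_id=0):
    # Hypothetical helper: right-pad every list in `batch` to the longest length
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

padded = pad_batch([[3, 4, 5], [10, 11, 12, 13, 14], [7]])
for row in padded:
    print(row)
```

After padding, all rows have the same length, so the batch can be converted to a rectangular array.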

Now that you have your train/val generators, you can just call them and they will return a batch: the padded tensors for your tweets as the first element and their corresponding labels as the second. Now you can go ahead and start building your neural network. 

<a name='2'></a>
# Part 2: Defining classes

We have given you the `Layer` class in `utils.py`.

```CPP
class Layer(object):
    """ Base class for layers.
    """
      
    # Constructor
    def __init__(self):
        # set weights to None
        self.weights = None

    # The forward propagation should be implemented
    # by the subclass of this Layer class
    def forward(self, x, weights):
        raise NotImplementedError

    # This function returns new weights
    # based on the input signature and random key
    def new_weights(self, input_signature, random_key):
        return ()

    # This initializes the weights
    def init(self, input_signature, random_key):
        self.weights = self.new_weights(input_signature, random_key)

    # __call__ allows an object of this class
    # to be called like it's a function.
    def __call__(self, x):
        # When this layer object is called, 
        # it calls its forward propagation function
        return self.forward(x, self.weights)
```
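
To see how `init`, `new_weights`, `forward`, and `__call__` fit together, here is a small sketch. It repeats a minimal copy of the `Layer` class so that it runs on its own, and adds a hypothetical `Scale` subclass that is not part of the assignment.

```python
class Layer(object):
    """Minimal copy of the base class shown above."""
    def __init__(self):
        self.weights = None

    def forward(self, x, weights):
        raise NotImplementedError

    def new_weights(self, input_signature, random_key):
        return ()

    def init(self, input_signature, random_key):
        self.weights = self.new_weights(input_signature, random_key)

    def __call__(self, x):
        return self.forward(x, self.weights)


class Scale(Layer):
    """Hypothetical layer that multiplies its input by a single weight."""
    def new_weights(self, input_signature, random_key):
        # A fixed weight keeps the example deterministic; random_key is ignored
        return 3.0

    def forward(self, x, weights):
        return x * weights


layer = Scale()
layer.init(input_signature=None, random_key=None)  # stores 3.0 in layer.weights
print(layer(2.0))  # 6.0
```

Calling `layer(2.0)` goes through `__call__`, which passes the stored weights to `forward`.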

<a name='2.1'></a>
### 2.1 ReLU class
You will now implement the ReLU activation function in a class below. The ReLU function looks as follows: 
<img src = "relu.jpg" style="width:300px;height:150px;"/>

$$ \mathrm{ReLU}(x) = \max(0,x) $$


<a name='ex03'></a>
### Exercise 03
**Instructions:** Implement the ReLU activation function below. Your function should take in a matrix or vector and it should transform all the negative numbers into 0 while keeping all the positive numbers intact. 

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
<p>
<ul>
    <li>Please use numpy.maximum(A,k) to find the maximum between each element in A and a scalar k</li>
</ul>
</p>
</details>


In [None]:
# GRADED FUNCTION: Relu
class Relu(Layer):
    """Relu activation function implementation"""
    def forward(self, x, weights):
        '''
        Input: 
            - x (a numpy array): the input
            - weights: Not used for relu, but is included to inherit from Layer class
        Output:
            - activation (numpy array): all positive or 0 version of x
        '''
        # Delete the weights parameter because it's not used for 
        # this Relu subclass of Layer
        del weights 
        ### START CODE HERE ###
        
        activation = None

        ### END CODE HERE ###
        
        return activation

In [None]:
# Test your relu function
x = np.array([[-2.0, -1.0, 0.0], [0.0, 1.0, 2.0]], dtype=float)
relu_layer = Relu()
print("Test data is:")
print(x)
print("Output of Relu is:")
print(relu_layer(x))

##### Expected Output
```CPP
Test data is:
[[-2. -1.  0.]
 [ 0.  1.  2.]]
Output of Relu is:
[[0. 0. 0.]
 [0. 1. 2.]]
```

<a name='2.2'></a>
### 2.2 Dense class 

### Exercise

Implement the forward function of the Dense class. 
- The forward function multiplies the input to the layer (`x`) by the weight matrix (`weights`)

$$\mathrm{forward}(\mathbf{x},\mathbf{weights}) = \mathbf{x} \times \mathbf{weights}$$

- You can use `numpy.dot` to perform the matrix multiplication.

Note that for more efficient code execution, you will use the trax version of `math`, which includes a trax version of `numpy` and also `random`.

Implement the weight initializer `new_weights` function
- Weights are initialized with a random key.
- The second parameter is a tuple for the desired shape of the weights (num_rows, num_cols)
- The num of rows for weights should equal the number of columns in x, because for forward propagation, you will multiply x times weights.
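
The shape rule above can be checked with a tiny plain-Python example. `matmul` is a hypothetical helper written with nested lists for illustration; in the assignment you would use the numpy-style `dot` instead, and the weight values here are made up.

```python
def matmul(a, b):
    # Plain-Python matrix product of two nested lists
    assert len(a[0]) == len(b), "columns of a must equal rows of b"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

x = [[2.0, 7.0, 25.0]]      # shape (1, 3): one example, three features
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]      # shape (3, 2): three rows, two units
out = matmul(x, weights)
print(out)  # [[27.0, 32.0]]
```

The result has shape (1, 2): one row per example and one column per unit.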

Please use `trax.math.random.normal(key, shape, dtype=tf.float32)` to generate random values for the weight matrix.
- `key` can be generated by calling `random.get_prng(seed=)` and passing in a number for the `seed`.
- `shape` is a tuple with the desired shape of the weight matrix.
    - The number of rows in the weight matrix should equal the number of columns in the variable `x`.  Since `x` may have 2 dimensions if it represents a single training example (row, col), or three dimensions (batch_size, row, col), get the last dimension from the tuple that holds the dimensions of x.
    - The number of columns in the weight matrix is the number of units chosen for that dense layer.  Look at the `__init__` function to see which variable stores the number of units.
- `dtype` is the data type of the values in the generated matrix; keep the default of `tf.float32`. In this case, don't explicitly set the dtype (just let it use the default value).

Set the standard deviation of the random values to 0.1
- The values generated have a mean of 0 and standard deviation of 1.
- Set the default standard deviation `stdev` to be 0.1 by multiplying each of the values in the weight matrix by `stdev`.
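
Why multiplying works: scaling every sample of a distribution by a constant scales its standard deviation by the same constant. The sketch below checks this with Python's standard library rather than Trax; the sample size and seed are arbitrary choices, and the standard `random` module is imported under the alias `py_random` so that it does not shadow the Trax `random` module used in this notebook.

```python
import random as py_random
import statistics

rng = py_random.Random(0)                        # fixed seed for repeatability
raw = [rng.gauss(0.0, 1.0) for _ in range(10000)]

stdev = 0.1
scaled = [stdev * v for v in raw]

raw_std = statistics.pstdev(raw)
scaled_std = statistics.pstdev(scaled)
print(raw_std)     # close to 1
print(scaled_std)  # close to 0.1
```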

In [None]:
# use the math module within trax
from trax import math

# use the numpy module from trax
np = math.numpy

# use the math.random module from trax
random = math.random

In [None]:
# See how the trax.math.random.normal function works
tmp_key = random.get_prng(seed=1)
print("The random seed generated by random.get_prng")
display(tmp_key)

print("choose a matrix with 2 rows and 3 columns")
tmp_shape=(2,3)
display(tmp_shape)

# Generate a weight matrix
# Note that you'll get an error if you try to set dtype to tf.float32, where tf is tensorflow
# Just avoid setting the dtype and allow it to use the default data type
tmp_weight = trax.math.random.normal(key=tmp_key, shape=tmp_shape)

print("Weight matrix generated with a normal distribution with mean 0 and stdev of 1")
display(tmp_weight)

<a name='ex04'></a>
### Exercise 04

Implement the `Dense` class.

In [None]:
# GRADED FUNCTION: Dense

class Dense(Layer):
    """
    A dense (fully-connected) layer.
    """

    # __init__ is implemented for you
    def __init__(self, n_units):
        
        # Set the number of units in this layer
        self._n_units = n_units

    # Please implement 'forward()'
    def forward(self, x, weights):
        ### START CODE HERE ###
        # Matrix multiply x and the weight matrix
        dense = None
        
        ### END CODE HERE ###
        return dense

    # new_weights
    def new_weights(self, input_signature, random_key, stdev=0.1):
        
### START CODE HERE ###
        # The input_signature has a .shape attribute that gives the shape as a tuple
        input_shape = None

        # Generate the weight matrix from a normal distribution, 
        # and standard deviation of 'stdev'        
        w = None
        
### END CODE HERE ###     
        return w

In [None]:
# Testing your Dense layer 
dense_layer = Dense(n_units=10)  #sets  number of units in dense layer
random_key = random.get_prng(seed=0)  # sets random seed
z = np.array([[2.0, 7.0, 25.0]]) # input array 

# dense_layer.init() is inherited from the Layer base class in utils.py:
# it calls new_weights(input_signature, random_key) and stores the result in self.weights.
# Calling dense_layer(z) then runs forward(z, self.weights).

dense_layer.init(z,random_key)
print("Weights are\n ",dense_layer.weights) # Returns randomly generated weights
print("Forward function output is ", dense_layer(z)) # Returns the product of the input and the weights

<a name='2.3'></a>
### 2.3: Model

Now you will implement a classifier using neural networks. Here is the model architecture you will be implementing. 

<img src = "nn.jpg" style="width:400px;height:250px;"/>

Note that the second character of `tl` is the lowercase of letter `L`, not the number 1.

- [tl.Serial](https://github.com/google/trax/blob/master/trax/layers/combinators.py#L26): Combinator that applies layers serially.  
    - You can pass in the layers as arguments to `Serial`, separated by commas. 
    - For example: `tl.Serial(tl.Embedding(...), tl.Mean(...), tl.Dense(...), tl.LogSoftmax(...))`

Please use the `help` function to view documentation for each layer.

In [None]:
# View documentation on tl.Serial
help(tl.Serial)

- [tl.Embedding](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L113): Layer constructor function for an embedding layer.  
    - `tl.Embedding(d_feature, vocab_size)`.
    - `d_feature` is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
    - `vocab_size` is the number of unique words in the given vocabulary.
    - Recall from the previous course 2, week 4, that the embedding is 

In [None]:
# View documentation for tl.Embedding
help(tl.Embedding)

In [None]:
tmp_embed = tl.Embedding(d_feature=2,vocab_size=2)
display(tmp_embed)

- [tl.Mean](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L276): Calculates means across an axis.  In this case, please choose axis = 1 to get an average embedding vector for each tweet (the embedding layer outputs one vector per word, and axis 1 runs over the words of the tweet).  
- As an illustration of how the axis argument works: if a matrix has 300 rows (embedding elements) and 10,000 columns (vocabulary words), taking the mean along axis=1 will yield a vector of 300 elements.

In [None]:
# view the documentation for tl.Mean
help(tl.Mean)

In [None]:
# Pretend the embedding matrix uses 
# 2 elements for embedding the meaning of a word
# and has a vocabulary size of 3
# So it has shape (2,3)
tmp_embed = np.array([[1,2,3,],
                    [4,5,6]
                   ])

# take the mean along axis 0
print("The mean along axis 0 creates a vector whose length equals the vocabulary size")
display(np.mean(tmp_embed,axis=0))

print("The mean along axis 1 creates a vector whose length equals the number of elements in a word embedding")
display(np.mean(tmp_embed,axis=1))
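
Inside the classifier, the mean layer receives the output of the embedding layer. Assuming that output has shape (batch size, tweet length, embedding size), averaging along axis 1 collapses the word positions and leaves one vector per tweet. The following plain-Python sketch uses made-up numbers and a hypothetical helper `mean_over_words` to show that reduction.

```python
# Hypothetical batch: 2 tweets, 3 word positions each, 2 embedding elements per word
embedded = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
]

def mean_over_words(batch):
    # Average along axis 1 (the word positions), keeping batch and feature axes
    result = []
    for tweet in batch:
        n_words = len(tweet)
        n_feat = len(tweet[0])
        result.append([sum(word[j] for word in tweet) / n_words
                       for j in range(n_feat)])
    return result

averaged = mean_over_words(embedded)
print(averaged)  # [[3.0, 4.0], [2.0, 2.0]]
```
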

- [tl.Dense](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L28): A dense (fully-connected) layer.
- `tl.Dense(n_units=)`: The parameter `n_units` is the number of units chosen for this dense layer.


In [None]:
help(tl.Dense)

In [None]:
tmp_dense = tl.Dense(n_units=2)
tmp_dense

- [tl.LogSoftmax](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L242): Implements log softmax function
- Here, you don't need to set any parameters for `LogSoftmax()`.
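
For reference, log softmax maps a vector of scores $x$ to $x_i - \log \sum_j e^{x_j}$, so exponentiating the outputs gives probabilities that sum to 1. Below is a minimal plain-Python sketch of that formula. It is not the Trax implementation, `log_softmax` is a hypothetical helper, and the standard `math` module is imported as `py_math` to avoid clashing with `trax.math`.

```python
import math as py_math

def log_softmax(scores):
    # Subtract the max first so that exp() cannot overflow
    m = max(scores)
    log_sum = m + py_math.log(sum(py_math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

log_probs = log_softmax([2.0, 0.0])
probs = [py_math.exp(lp) for lp in log_probs]
print(log_probs)   # both values are negative; the larger score gets the larger value
print(sum(probs))  # the probabilities sum to 1 (up to rounding)
```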

In [None]:
help(tl.LogSoftmax)

<a name='ex05'></a>
### Exercise 05
Implement the classifier function. 

In [None]:
# GRADED FUNCTION: classifier
def classifier(vocab_size=len(Vocab), embedding_dim=256, output_dim=2, mode='train'):
    
    # create embedding layer
    # number of rows is the embedding units
    # number of columns is the vocabulary size
    embed_layer = None
    
    # Create a mean layer, to create an "average" word embedding
    mean_layer = None
    
    # Create a dense layer, one unit for each output
    dense_output_layer = None
    
    # Create the log softmax layer (no parameters needed)
    log_softmax_layer = None
    
    # Use tl.Serial to combine all layers
    # and create the classifier
    # of type trax.layers.combinators.Serial
    model = None
    
    # return the model of type trax.layers.combinators.Serial
    return model

In [None]:
tmp_model = classifier()

In [None]:
print(type(tmp_model))
display(tmp_model)

##### Expected output

```
<class 'trax.layers.combinators.Serial'>
Serial{in=1,out=1,sublayers=[Embedding{in=1,out=1}, Mean{in=1,out=1}, Dense{in=1,out=1}, LogSoftmax{in=1,out=1}]}
```

<a name='3'></a>
# Part 3: Training

Before going into the training, we will define the inputs using `trax.supervised.Inputs`.
- run `help(trax.supervised.Inputs)` for details
- You will pass in the data generators that provide processed training & validation sets.

In [None]:
# View documentation for trax.supervised.Inputs
help(trax.supervised.Inputs)

Note that the generators are not considered "callable", and you'll get an error message if you pass them to the `trax.supervised.Inputs` like this:

```python
tweet_inputs = trax.supervised.Inputs(
      train_stream=train_generator(16), # batch size 16
      eval_stream=val_generator(16)) # validation set, batch size 16
```

The error message is:
```CPP
TypeError: 'generator' object is not callable
```

In order to pass in a callable function, wrap each generator in a function.
- Notice that the function takes in a single parameter that doesn't appear to be used.
- This is needed when trax.supervised.Inputs calls the function and passes in a seed value.

In [None]:
# Define a callable function that invokes the generator
# The parameter x is used when the trax.supervised.Inputs calls this function
# and sets a seed value.
def train_gen_callable(x):
    
    return train_generator(16)

# Define a callable function that invokes the generator
# The parameter x is used when the trax.supervised.Inputs calls this function
# and sets a seed value.
def val_gen_callable(x):
    return val_generator(16)

# Use trax.supervised.Inputs
# to process the inputs.
tweet_inputs = trax.supervised.Inputs(
      train_stream=train_gen_callable, 
      eval_stream=val_gen_callable)

A more compact way to do the same thing is to use a lambda function, like this:
```python
# Use lambda function to pass in callable functions that can invoke the generators
tweet_inputs = trax.supervised.Inputs(
      train_stream=lambda x: train_generator(16), # batch size 16
      eval_stream=lambda x: val_generator(16))
```
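
The difference between a generator object and a callable that creates one can be reproduced without Trax. In the sketch below, `toy_generator` and `call_stream` are hypothetical stand-ins: `call_stream` only mimics a consumer that calls its argument with one value, as described above, and is not the actual `trax.supervised.Inputs` logic.

```python
# Illustrative sketch only: toy_generator and call_stream are hypothetical.

def toy_generator(batch_size):
    """Yield batches of consecutive integers forever."""
    start = 0
    while True:
        yield list(range(start, start + batch_size))
        start += batch_size

def call_stream(stream_fn):
    """Call the argument with one value to obtain a stream, then read a batch."""
    stream = stream_fn(1)  # the single argument is ignored by our wrapper
    return next(stream)

# Passing a generator object fails, because a generator cannot be called.
try:
    call_stream(toy_generator(4))
    error_message = None
except TypeError as err:
    error_message = str(err)

# Passing a function (here a lambda) that creates the generator works.
first_batch = call_stream(lambda _: toy_generator(4))

print(error_message)
print(first_batch)
```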

<a name='3.1'></a>
### 3.1 Training the model

Now you are going to train your model. 

You will use `trax.supervised.Trainer` to create a model.
- Feed in a model
- Define the cost function
- Choose the optimizer
- Choose the learning rate scheduler.
- Choose the inputs
- Choose an output directory

In [None]:
# View documentation for trax.supervised.Trainer
help(trax.supervised.Trainer)

Notice the constructor shows the parameters that you will give to the Trainer.
```CPP
__init__(self, model, loss_fn, optimizer, lr_schedule, inputs, output_dir=None, ...
```

For `trax.supervised.Trainer`
- Feed in a model
    - For the `model` parameter, pass in the name of the `classifier` function that you defined earlier. (The one which creates the tl.Serial() model.)

- Define the cost function.
    - For the `loss_fn` parameter, choose `tl.CrossEntropyLoss`.
    - Cross entropy loss is a standard loss for classification problems. Here the model chooses between two classes (positive or negative sentiment).
    - Pass in the reference to the function `tl.CrossEntropyLoss` (don't include any parentheses `()` after it.)
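
For intuition, the cross entropy loss of one example is the negative log probability that the model assigns to the true class, and the batch loss is the average over examples. The sketch below assumes the model outputs log probabilities (as a log softmax layer does). The helper `cross_entropy` and the toy numbers are hypothetical, and this is not the Trax implementation.

```python
import math

# Illustrative sketch only: mean cross entropy from log probabilities.
def cross_entropy(log_probs_batch, targets):
    """Average of -log p(true class) over the batch."""
    losses = [-row[t] for row, t in zip(log_probs_batch, targets)]
    return sum(losses) / len(losses)

log_probs_batch = [
    [math.log(0.9), math.log(0.1)],  # confident in class 0
    [math.log(0.2), math.log(0.8)],  # fairly confident in class 1
]

loss_good = cross_entropy(log_probs_batch, targets=[0, 1])  # labels agree
loss_bad = cross_entropy(log_probs_batch, targets=[1, 0])   # labels disagree
print(loss_good, loss_bad)
```

The loss is small when the model puts high probability on the correct class and grows quickly when it is confidently wrong.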

In [None]:
# View metrics and loss functions that you could choose from
help(trax.layers.metrics)

Notice that some loss or evaluation metrics include: 

```CPP
AccuracyScalar
    
CrossEntropyLoss
    
L2Loss
```

In [None]:
# View documentation for tl.CrossEntropyLoss
help(tl.CrossEntropyLoss)

- Choose the value for the `optimizer` parameter:
    - Choose `trax.optimizers.Adam`
    - The Adam optimizer is a popular optimizer that tracks a learning rate for each weight and adjusts that learning rate during training.
    - Again, just pass in the reference to Adam (no parentheses)
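
To show what "a learning rate for each weight" means, here is a minimal sketch of the standard Adam update for a single scalar weight, used to minimize the toy function `(w - 3)^2`. The function `adam_step`, the toy objective, and the hyperparameter values are assumptions made for illustration. They are commonly used values, not read from `trax.optimizers.Adam`.

```python
import math

# Illustrative sketch only: one Adam update for a single scalar weight.
def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad         # running mean of gradients
    v = b2 * v + (1 - b2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # The step is scaled by 1 / sqrt(v_hat), so each weight effectively
    # gets its own learning rate based on its gradient history.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 301):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)

print(w)  # should end up close to the minimum at 3
```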

In [None]:
# View optimizers that you could choose from
help(trax.optimizers)

Notice some available optimizers include:
```CPP
adafactor
adam
base
momentum
rms_prop
sgd
sm3
```

In [None]:
# View documentation for trax.optimizers.Adam
help(trax.optimizers.Adam)

- Choose the value for the `lr_schedule` parameter, which is the learning rate scheduler.
    - Choose `trax.lr.MultifactorSchedule`.  As with the other inputs, just pass in the reference to [MultifactorSchedule](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/lr_schedules.py#L47) (don't include any parentheses).
    - A learning rate scheduler is a function that determines the learning rate depending on the training step.  For example, it may make sense to have the learning rate get smaller as the model gets closer to the optimal weights.
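
As a simple illustration of a function that maps the training step to a learning rate, the sketch below warms up linearly and then decays in proportion to one over the square root of the step. The function `toy_schedule` and its default values are hypothetical. It shows the general idea only and is not the definition of `MultifactorSchedule`.

```python
import math

# Illustrative sketch only: a toy learning rate schedule.
def toy_schedule(step, base_lr=0.01, warmup_steps=100):
    """Linear warmup to base_lr, then decay proportional to 1/sqrt(step)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr * math.sqrt(warmup_steps / step)

for step in (0, 50, 100, 400, 1600):
    print(step, toy_schedule(step))
```

The learning rate rises during warmup, peaks at `base_lr`, and then shrinks as training proceeds.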

In [None]:
# View learning rate functions that you could choose from
help(trax.lr)

Notice that three of the learning rate schedules you can choose from are:

```CPP
EvalAdjustingSchedule
MultifactorSchedule
PolicySchedule
```

In [None]:
# View documentation for trax.lr.MultifactorSchedule
help(trax.lr.MultifactorSchedule)

- Choose value for the `inputs` parameter:
    - Feed in the inputs that you created from calling `trax.supervised.Inputs`.
- Choose an output directory for the `output_dir` parameter:
    - This is the folder where your model will be saved.
    - We've chosen the directory '~/train_dir/' for the output directory.

Use `os.path.expanduser()` to expand a shortcut for the path into the full path.
- For example, expand `~/tmp_dir/`, where `~` refers to the home directory, so that it shows the full path: `/home/jovyan/tmp_dir/`

In [None]:
# Study an example of expanding a directory path
tmp_dir = '~/tmp_dir/'
tmp_dir_expand = os.path.expanduser(tmp_dir)
print(tmp_dir_expand)

After you define the model using `trax.supervised.Trainer`, train the model over a chosen number of epochs
- One `epoch` is a full set of training steps and evaluation on validation data.

For each epoch, call the `trainer`'s `.train_epoch(n_steps=..., n_eval_steps=...)` function.
- `n_steps`: number of batches to train in one epoch. For example, if `n_steps` is 100, then it will train for 100 batches and then output evaluation metrics such as accuracy, before moving on to the next epoch.
- `n_eval_steps`: number of batches of the validation set to use when evaluating on the validation data (such as calculating accuracy).
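
The bookkeeping behind epochs and steps can be sketched in a few lines. The helper `eval_points` is hypothetical and only counts steps; it does no training. It shows at which cumulative step counts the evaluation metrics would be reported for a given number of epochs and training steps per epoch.

```python
# Illustrative sketch only: eval_points is a hypothetical helper.
def eval_points(n_epochs, train_steps):
    """Return the cumulative step counts at which evaluation is reported."""
    points = []
    step = 0
    for _ in range(n_epochs):
        step += train_steps  # train for train_steps batches
        points.append(step)  # then evaluate once
    return points

one_long_epoch = eval_points(n_epochs=1, train_steps=200)
four_short_epochs = eval_points(n_epochs=4, train_steps=50)
print(one_long_epoch)
print(four_short_epochs)
```

Both settings train for 200 steps in total, but the second one reports metrics four times along the way.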

In [None]:
help(trax.supervised.Trainer.train_epoch)

<a name='ex06'></a>
### Exercise 06
Implement `train_model` to define the model and then train it for the given number of epochs and training steps.

In [None]:
# GRADED FUNCTION: train_model
def train_model(classifier, n_epochs, train_steps, eval_steps, output_dir='~/train_dir/'):
    '''
    Input: 
    classifier - the model you are building
    n_epochs - number of epochs (each epoch runs train_steps training steps, then evaluates)
    train_steps - number of training steps (batches) per epoch
    eval_steps - number of validation batches used in each evaluation
    output_dir - folder to save your files
    Output:
    trainer - the Trainer object after training
    '''
### START CODE HERE ###    
    # Expand the output directory to the full path name
    output_dir = os.path.expanduser(output_dir)

    # Create a trax.supervised.Trainer object
    trainer = trax.supervised.Trainer(
        model=None, # classifier function which you defined before
        loss_fn=None, # cross entropy loss
        optimizer=None,  # adam optimizer
        lr_schedule=None,  # Change lr schedule here. What is multi... schedule https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/lr_schedules.py#L47
        inputs=None, # Processed inputs
        output_dir=None,
        random_seed=271)
    
    # iterate through each epoch
    for _ in range(None): # complete this line
        
        # call the trainer's train_epoch function
        # set n_steps and n_eval_steps
        trainer.train_epoch(n_steps=None,
                            n_eval_steps=None)
        
### END CODE HERE ###

    return trainer

In [None]:
# Try out train_model
# Use 1 epoch and 200 training steps
# This takes about 30 seconds to run. 
n_epochs  = 1
train_steps = 200
eval_steps = 10 
tmp_output_dir = '~/model1/'
# remove previous model.pkl
!rm -f {tmp_output_dir}/model.pkl
trainer_1 = train_model(classifier, n_epochs, train_steps, eval_steps, output_dir=tmp_output_dir)

##### Expected output
```CPP
Step    200: Ran 200 train steps in 13.81 secs
Step    200: Evaluation
Step    200: train                   accuracy |  0.94375002
Step    200: train                       loss |  0.24765237
Step    200: train         neg_log_perplexity | -0.24765237
Step    200: train          sequence_accuracy |  0.94375002
Step    200: train weights_per_batch_per_core |  16.00000000
Step    200: eval                    accuracy |  0.98124999
Step    200: eval                        loss |  0.25154954
Step    200: eval          neg_log_perplexity | -0.25154954
Step    200: eval           sequence_accuracy |  0.98124999
Step    200: eval  weights_per_batch_per_core |  16.00000000
Step    200: Finished evaluation
```

In [None]:
# Try out your model with different hyperparameters
# Use 4 epochs and 50 training steps per epoch
# This takes about 30 seconds to run. 
n_epochs  = 4
train_steps = 50
eval_steps = 10 
tmp_output_dir_2 = '~/model2/'
# remove previous model.pkl
!rm -f {tmp_output_dir_2}/model.pkl
trainer_2 = train_model(classifier, n_epochs, train_steps, eval_steps, output_dir=tmp_output_dir_2)

##### Expected output

```CPP
Step     50: Ran 50 train steps in 10.79 secs
Step     50: Evaluation
Step     50: train                   accuracy |  0.61874998
Step     50: train                       loss |  0.67425668
Step     50: train         neg_log_perplexity | -0.67425668
Step     50: train          sequence_accuracy |  0.61874998
Step     50: train weights_per_batch_per_core |  16.00000000
Step     50: eval                    accuracy |  0.50625002
Step     50: eval                        loss |  0.68561053
Step     50: eval          neg_log_perplexity | -0.68561053
Step     50: eval           sequence_accuracy |  0.50625002
Step     50: eval  weights_per_batch_per_core |  16.00000000
Step     50: Finished evaluation

Step    100: Ran 50 train steps in 1.06 secs
Step    100: Evaluation
Step    100: train                   accuracy |  0.86250001
Step    100: train                       loss |  0.52874601
Step    100: train         neg_log_perplexity | -0.52874601
Step    100: train          sequence_accuracy |  0.86250001
Step    100: train weights_per_batch_per_core |  16.00000000
Step    100: eval                    accuracy |  0.81875002
Step    100: eval                        loss |  0.53125608
Step    100: eval          neg_log_perplexity | -0.53125608
Step    100: eval           sequence_accuracy |  0.81875002
Step    100: eval  weights_per_batch_per_core |  16.00000000
Step    100: Finished evaluation

Step    150: Ran 50 train steps in 0.48 secs
Step    150: Evaluation
Step    150: train                   accuracy |  0.95625001
Step    150: train                       loss |  0.37101132
Step    150: train         neg_log_perplexity | -0.37101132
Step    150: train          sequence_accuracy |  0.95625001
Step    150: train weights_per_batch_per_core |  16.00000000
Step    150: eval                    accuracy |  0.93124998
Step    150: eval                        loss |  0.39398065
Step    150: eval          neg_log_perplexity | -0.39398065
Step    150: eval           sequence_accuracy |  0.93124998
Step    150: eval  weights_per_batch_per_core |  16.00000000
Step    150: Finished evaluation

Step    200: Ran 50 train steps in 1.05 secs
Step    200: Evaluation
Step    200: train                   accuracy |  0.96249998
Step    200: train                       loss |  0.21740448
Step    200: train         neg_log_perplexity | -0.21740448
Step    200: train          sequence_accuracy |  0.96249998
Step    200: train weights_per_batch_per_core |  16.00000000
Step    200: eval                    accuracy |  0.96875000
Step    200: eval                        loss |  0.24526033
Step    200: eval          neg_log_perplexity | -0.24526033
Step    200: eval           sequence_accuracy |  0.96875000
Step    200: eval  weights_per_batch_per_core |  16.00000000
Step    200: Finished evaluation
```

<a name='3.2'></a>
### 3.2 Initialize a model with the trained weights

Now that you have trained a model, the weights are stored in the `trainer_1` and `trainer_2` objects.  To initialize a model based on these weights:
- Create an object based on the `classifier` that you defined.
- Initialize the model.
- Assign the weights that were derived from training.

In [None]:
help(trax.shapes.ShapeDtype)

Recall that the model that is returned by your `classifier` function is of type `trax.layers.combinators.Serial`.  It has an init function `.init(input_signature=..., dtype=...)`.

In [None]:
help(trax.layers.combinators.Serial.init)

In [None]:
# Create an instance of your classifier
tmp_model = classifier()

# Initialize the model, which is of type trax.layers.combinators.Serial 
tmp_model.init(input_signature=trax.shapes.ShapeDtype((1, 1), dtype=np.int32))

# Assign the weights that you recently trained to the model
# Use trainer_2 weights (in the expected output it reached a lower loss than trainer_1)
tmp_model.weights = trainer_2.model_weights 

<a name='3.3'></a>
### 3.3 Practice making a prediction

Use the training data just to see how the prediction process works.  
- Later, you will use validation data to evaluate your model's performance.


In [None]:
# Create a generator object for the training data
tmp_train_generator = train_generator(16)

# get one batch
tmp_batch = next(tmp_train_generator)

# Position 0 has the model inputs (tweets as tensors)
# position 1 has the targets (the actual labels)
tmp_inputs, tmp_targets = tmp_batch

print(f"The batch is a tuple of length {len(tmp_batch)} because position 0 contains the tweets, and position 1 contains the targets.") 
print(f"The shape of the tweet tensors is {tmp_inputs.shape} (num of examples, length of tweet tensors)")
print(f"The shape of the labels is {tmp_targets.shape}, which is the batch size.")

In [None]:
# feed the tweet tensors into the model to get a prediction
tmp_pred = tmp_model(tmp_inputs)
print(f"The prediction shape is {tmp_pred.shape}, num of tensor_tweets as rows")
print("Column 0 is the log probability of a negative sentiment (class 0)")
print("Column 1 is the log probability of a positive sentiment (class 1)")
print()
print("View the prediction array")
tmp_pred

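The model ends with a log softmax layer, so the values in the prediction array are log probabilities. Because the logarithm is monotonically increasing, comparing log probabilities gives the same ordering as comparing the probabilities themselves. To turn them into categories (negative or positive sentiment prediction), for each row: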
- Compare the probabilities in each column.
- If column 1 has a value greater than column 0, classify that as a positive tweet.
- Otherwise if column 1 is less than or equal to column 0, classify that example as a negative tweet.
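
Since the classifier's last layer is a log softmax, the columns hold log probabilities. The small sketch below, with hypothetical toy values, checks that comparing log probabilities yields the same categories as comparing the probabilities recovered with `exp`, including the tie case, which is classified as negative.

```python
import math

# Illustrative sketch only: toy log probabilities for three examples.
toy_log_probs = [
    [math.log(0.7), math.log(0.3)],  # negative is more likely
    [math.log(0.1), math.log(0.9)],  # positive is more likely
    [math.log(0.5), math.log(0.5)],  # tie, treated as negative
]

# Category from comparing log probabilities directly.
by_log_prob = [int(row[1] > row[0]) for row in toy_log_probs]

# Category from comparing the recovered probabilities.
by_prob = [int(math.exp(row[1]) > math.exp(row[0])) for row in toy_log_probs]

print(by_log_prob, by_prob)
```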

In [None]:
# turn log probabilities into category predictions
tmp_is_positive = tmp_pred[:,1] > tmp_pred[:,0]
for i, p in enumerate(tmp_is_positive):
    print(f"Neg log prob {tmp_pred[i,0]:.4f}\tPos log prob {tmp_pred[i,1]:.4f}\t is positive? {p}\t actual {tmp_targets[i]}")

Notice that since you are making a prediction using a training batch, it's more likely that the model's predictions match the actual targets (labels).  
- Every prediction that the tweet is positive matches the actual target of 1 (positive sentiment).
- Similarly, every prediction that the sentiment is not positive matches the actual target of 0 (negative sentiment).

One more useful thing to know is how to check whether a prediction matches the actual target (label).  
- The result of the calculation `is_positive` is a boolean.
- The target is of type `trax.math.numpy.int32`.
- If you expect to be doing division, you may prefer to work with decimal numbers of the data type `trax.math.numpy.float32`.

In [None]:
# View the array of booleans
print("Array of booleans")
display(tmp_is_positive)

# convert boolean to type int32
# True is converted to 1
# False is converted to 0
tmp_is_positive_int = tmp_is_positive.astype(np.int32)


# View the array of integers
print("Array of integers")
display(tmp_is_positive_int)

# convert boolean to type float32
tmp_is_positive_float = tmp_is_positive.astype(np.float32)

# View the array of floats
print("Array of floats")
display(tmp_is_positive_float)

In [None]:
tmp_pred.shape

Note that Python usually does type conversion for you when you compare a boolean to an integer:
- `True` compared to 1 is `True`; compared to any other integer it is `False`.
- `False` compared to 0 is `True`; compared to any other integer it is `False`.

In [None]:
print(f"True == 1: {True == 1}")
print(f"True == 2: {True == 2}")
print(f"False == 0: {False == 0}")
print(f"False == 2: {False == 2}")

However, we recommend that you keep track of the data types of your variables to avoid unexpected outcomes, so it helps to convert the booleans into integers.
- Compare 1 to 1 rather than comparing True to 1.

Hopefully you are now familiar with what kinds of inputs and outputs the model uses when making a prediction.
- This will help you implement a function that estimates the accuracy of the model's predictions.

<a name='4'></a>
# Part 4: Evaluation  

<a name='4.1'></a>
### 4.1 Computing the accuracy on a batch

You will now write a function that evaluates your model on the validation set and returns the accuracy. 
- `preds` contains the predictions.
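    - Its dimensions are `(batch_size, output_dim)`.  `output_dim` is two in this case.  Column 0 contains the probability that the tweet belongs to class 0 (negative sentiment). Column 1 contains the probability that it belongs to class 1 (positive sentiment). Because the model ends with a log softmax layer, these values are log probabilities, which compare in the same order as the probabilities.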
    - If the probability in column 1 is greater than the probability in column 0, then interpret this as the model's prediction that the example has label 1 (positive sentiment).  
    - Otherwise, if the probabilities are equal or the probability in column 0 is higher, the model's prediction is 0 (negative sentiment).
- `y` contains the actual labels.

<a name='ex07'></a>
### Exercise 07
Implement `compute_accuracy`.

In [None]:
# GRADED FUNCTION: compute_accuracy

def compute_accuracy(preds, y):
    """
    Input: 
        preds: a tensor of shape (dim_batch, output_dim) 
        y: a tensor of shape (dim_batch,) with the true labels
    Output: 
        accuracy: a float between 0 and 1
        num_correct: the number of correct predictions
        num_predictions: the total number of predictions
    """
    ### START CODE HERE ###

    # Create an array of booleans, 
    # True if the probability of positive sentiment is greater than
    # the probability of negative sentiment
    # else False
    is_pos =  None

    # convert the array of booleans into an array of np.int32
    is_pos_int = None
    
    # compare the array of predictions (as int32) with the target (labels) of type int32
    correct = None

    # Count the number of predictions
    num_predictions = None
    
    # convert the array of correct predictions (boolean) into an array of np.float32
    correct_float = None
    
    # Sum up the correct predictions (of type np.float32) 
    num_correct = None

    # Divide the number of correct predictions by the number of total predictions
    accuracy = num_correct / num_predictions

    ### END CODE HERE ###
    return accuracy, num_correct, num_predictions

In [None]:
# test your function
tmp_val_generator = val_generator(16)

# get one batch
tmp_batch = next(tmp_val_generator)

# Position 0 has the model inputs (tweets as tensors)
# position 1 has the targets (the actual labels)
tmp_inputs, tmp_targets = tmp_batch

# feed the tweet tensors into the model to get a prediction
tmp_pred = tmp_model(tmp_inputs)

tmp_acc, tmp_num_correct, tmp_num_predictions = compute_accuracy(preds=tmp_pred, y=tmp_targets)

print(f"Model's prediction accuracy on a single validation batch is: {100 * tmp_acc}%")
print(f"Number of correct predictions {tmp_num_correct}; number of total observations predicted {tmp_num_predictions}")

##### Expected output

```
Model's prediction accuracy on a single validation batch is: 93.75%
Number of correct predictions 15.0; number of total observations predicted 16
```

<a name='4.2'></a>
### 4.2 Testing your model on Validation Data

Now you will test your model's prediction accuracy on validation data. 

This program will take in a data generator and your model. 
- The generator allows you to get batches of data. You can use it with a `for` loop:

```
for batch in iterator: 
   # do something with that batch
```

`batch` is a tuple of length 2. 
- Position 0 contains the tweets as tensors.
- Position 1 contains the targets (actual labels, positive or negative sentiment).
- You can feed the tweets into the model and it will return the predictions for the batch. 


<a name='ex08'></a>
### Exercise 08

**Instructions:** 
- Compute the accuracy over all the batches in the validation iterator. 
- Make use of `compute_accuracy`, which you recently implemented, and return the overall accuracy.
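
One reason to accumulate the counts of correct predictions and total predictions, rather than averaging the per-batch accuracies, is that batches need not all be the same size. The toy numbers below are hypothetical and assume a smaller final batch, which may or may not happen with your generator. They show how the two ways of aggregating can disagree.

```python
# Illustrative sketch only: toy per-batch results as
# (number of correct predictions, number of predictions).
batch_results = [(15, 16), (16, 16), (1, 4)]

# Overall accuracy: total correct divided by total predictions.
total_correct = sum(correct for correct, _ in batch_results)
total_pred = sum(n for _, n in batch_results)
overall_accuracy = total_correct / total_pred

# Averaging batch accuracies gives the small batch too much weight.
mean_of_batch_accuracies = (
    sum(correct / n for correct, n in batch_results) / len(batch_results)
)

print(overall_accuracy, mean_of_batch_accuracies)
```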

In [None]:
# GRADED FUNCTION: test_model

def test_model(generator, model):
    '''
    Input: 
        generator: an iterator instance that provides batches of inputs and targets
        model: a model instance 
    Output: 
        acc: float corresponding to the accuracy
    '''
    
    accuracy = 0.
    total_num_correct = 0
    total_num_pred = 0
    count = 0
    
    ### START CODE HERE ###
    for batch in generator: 
        
        # Retrieve the inputs from the batch
        inputs = None
        
        # Retrieve the targets (actual labels) from the batch
        targets = None
        
        # Make predictions using the inputs
        pred = None
        
        # Calculate accuracy for the batch by comparing its predictions and targets
        batch_accuracy, batch_num_correct, batch_num_pred = None 
        
        # Update the total number of correct predictions
        # by adding the number of correct predictions from this batch
        total_num_correct += None
        
        # Update the total number of predictions 
        # by adding the number of predictions made for the batch
        total_num_pred += None

    # Calculate accuracy over all examples
    accuracy = None
    
    ### END CODE HERE ###
    return accuracy

In [None]:
# DO NOT EDIT THIS CELL
# testing the accuracy of your model: this takes around 20 seconds
model = classifier() # creates an instance of your classifier
model.init(trax.shapes.ShapeDtype((1, 1), dtype=np.int32))
model.weights = trainer_2.model_weights # Assigned trained model weights to the model

accuracy = test_model(val_generator(16), model)

print(f'The accuracy of your model on the validation set is {accuracy:.4f}', )

##### Expected Output

```CPP
The accuracy of your model on the validation set is 0.9640
```

<a name='5'></a>
# Part 5: Testing with your own input

Finally, you will test with your own input. You will see that deep nets are more powerful than the older methods you have used before. Although you got close to 100% accuracy on the first two assignments, the task was much easier. 

In [None]:
# this is used to predict on your own sentence
def predict(sentence):
    inputs = np.array(tweet_to_tensor(sentence, vocab_dict=Vocab))
    
    # Batch size 1, add dimension for batch, to work with the model
    inputs = inputs[None, :]  
    
    # predict with the model
    preds_probs = model(inputs)
    
    # Turn probabilities into categories
    preds = int(preds_probs[0, 1] > preds_probs[0, 0])
    
    sentiment = "negative"
    if preds == 1:
        sentiment = 'positive'

    return preds, sentiment

In [None]:
# try a positive sentence
sentence = "It's such a nice day, think i'll be taking Sid to Ramsgate, fish and chips for lunch at Peter's fish factory and then the beach maybe"
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")

print()
# try a negative sentence
sentence = "I hated my day, it was the worst, I'm so sad."
tmp_pred, tmp_sentiment = predict(sentence)
print(f"The sentiment of the sentence \n***\n\"{sentence}\"\n***\nis {tmp_sentiment}.")

Notice that the model seems to prefer predicting positive sentiment, even for a sentence that looks negative.

### On Deep Nets

Deep nets allow you to understand and capture dependencies that you would not have been able to capture with a simple linear regression or logistic regression. 
- They also allow you to make better use of pre-trained embeddings for classification and tend to generalize better.