<table class="tfo-notebook-buttons" align="left">
  <td>
    <a href="https://colab.research.google.com/github/martin-fabbri/colab-notebooks/blob/master/deeplearning.ai/nlp/c3_w2_assignment_deep_n_grams.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>    
  </td>
  <td>
    <a href="https://github.com/martin-fabbri/colab-notebooks/blob/master/deeplearning.ai/nlp/c3_w2_assignment_deep_n_grams.ipynb" target="_parent"><img src="https://raw.githubusercontent.com/martin-fabbri/colab-notebooks/master/assets/github.svg" alt="View On Github"/></a>  </td>
</table>

# Assignment 2:  Deep N-grams

Welcome to the second assignment of course 3. In this assignment, you will explore Recurrent Neural Networks (RNNs).
- You will use the fundamentals of Google's [trax](https://github.com/google/trax) package, which can be used to implement many kinds of deep learning models.

By completing this assignment, you will learn how to implement models from scratch:
- Convert a line of text into a tensor
- Create an iterator to feed data to the model
- Define a GRU model using `trax`
- Train the model using `trax`
- Evaluate your model using perplexity
- Make predictions with your own model
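Perplexity is the exponential of the average negative log-likelihood that the model assigns to the true next characters. Lower is better, and a model that guesses uniformly over V symbols scores exactly V. Below is a minimal NumPy sketch. It assumes the model emits natural-log probabilities of shape `(batch, seq_len, vocab_size)`, as a LogSoftmax layer would. The function name `perplexity` and its optional `mask` argument are hypothetical and are not the assignment's graded API.

```python
import numpy as np

def perplexity(log_probs, targets, mask=None):
    """Perplexity of integer `targets` under per-position `log_probs`.

    log_probs: (batch, seq_len, vocab_size) natural-log probabilities.
    targets:   (batch, seq_len) int array of true next tokens.
    mask:      optional (batch, seq_len) 0/1 array; 0 marks positions to
               ignore (e.g. padding). Hypothetical parameter for this sketch.
    """
    # Pick out log p(target) at every position.
    target_log_probs = np.take_along_axis(
        log_probs, targets[..., None], axis=-1
    )[..., 0]
    if mask is None:
        mask = np.ones_like(target_log_probs)
    # Average negative log-likelihood over the non-masked positions.
    mean_nll = -np.sum(target_log_probs * mask) / np.sum(mask)
    return float(np.exp(mean_nll))

# A model that is uniform over 4 symbols has perplexity 4.
uniform = np.log(np.full((2, 3, 4), 0.25))
print(perplexity(uniform, np.zeros((2, 3), dtype=int)))  # 4 (up to rounding)
```

The mask matters once lines are padded to a common length, because padded positions would otherwise be scored as if they were real predictions.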


## Outline

- [Overview](#0)
- [Part 1: Importing the Data](#1)
    - [1.1 Loading in the data](#1.1)
    - [1.2 Convert a line to tensor](#1.2)
        - [Exercise 01](#ex01)
    - [1.3 Batch generator](#1.3)
        - [Exercise 02](#ex02)
    - [1.4 Repeating Batch generator](#1.4)        
- [Part 2: Defining the GRU model](#2)
    - [Exercise 03](#ex03)
- [Part 3: Training](#3)
    - [3.1 Training the Model](#3.1)
        - [Exercise 04](#ex04)
- [Part 4:  Evaluation](#4)
    - [4.1 Evaluating using the deep nets](#4.1)
        - [Exercise 05](#ex05)
- [Part 5: Generating the language with your own model](#5)    
- [Summary](#6)

<a name='0'></a>
### Overview

Your task will be to predict the next set of characters using the previous characters. 
- Although this task sounds simple, it is pretty useful.
- You will start by converting a line of text into a tensor.
- Then you will create a generator to feed data into the model.
- You will train a neural network to predict the next set of characters of a defined length.
- You will use embeddings for each character and feed them as inputs to your model.
    - Many natural language tasks rely on embeddings for predictions.
- Your model will convert each character to its embedding, run the embeddings through a Gated Recurrent Unit (`GRU`), and pass the GRU output through a linear layer to predict the next set of characters.
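The first two steps can be sketched in plain Python and NumPy: converting a line to a tensor, then batching lines for the model. This sketch makes three assumptions, all common for character-level models. Each character is encoded by its code point (`ord`), an end-of-sentence code of 1 is appended, and 0 is used for padding. The names `line_to_tensor`, `batch_lines`, and `EOS` are illustrative and are not the graded functions.

```python
import numpy as np

EOS = 1  # assumed end-of-sentence code; any code not used by the text works

def line_to_tensor(line, eos=EOS):
    """Map each character to its Unicode code point and append an EOS marker."""
    return [ord(ch) for ch in line] + [eos]

def batch_lines(lines, pad=0):
    """Pad encoded lines to a common length and stack them into one batch.

    Returns a (batch, max_len) array of codes and a 0/1 mask that marks
    the real (non-padding) positions.
    """
    encoded = [line_to_tensor(line) for line in lines]
    max_len = max(len(seq) for seq in encoded)
    batch = np.full((len(encoded), max_len), pad, dtype=np.int32)
    mask = np.zeros((len(encoded), max_len), dtype=np.int32)
    for row, seq in enumerate(encoded):
        batch[row, :len(seq)] = seq
        mask[row, :len(seq)] = 1
    return batch, mask

batch, mask = batch_lines(["abc", "hello"])
print(batch)  # first row: [97 98 99 1 0 0]
print(mask)   # first row: [1 1 1 1 0 0]
```

When code points are used directly as IDs, the embedding table must cover every code point that can appear. For example, 256 entries suffice for Latin-1 text.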

<img src = "model.png" style="width:600px;height:150px;"/>

The figure above gives you a summary of what you are about to implement. 
- You will get the embeddings;
- Stack the embeddings on top of each other;
- Run them through two layers with a ReLU activation in the middle;
- Finally, you will compute the softmax. 
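The shapes in this summary can be traced with a toy NumPy forward pass using random weights. This is only an illustration of the listed steps. It treats the two layers as plain dense layers (an assumption made to keep the sketch small), whereas the assignment's model gets its recurrence from the GRU built in `trax`. The sizes `vocab_size`, `d_model`, and the parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 256, 16  # toy sizes (assumed)

# Hypothetical randomly initialised parameters.
embedding = rng.normal(size=(vocab_size, d_model))
w1, b1 = rng.normal(size=(d_model, d_model)), np.zeros(d_model)
w2, b2 = rng.normal(size=(d_model, vocab_size)), np.zeros(vocab_size)

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def forward(char_ids):
    """char_ids: int array (seq_len,) -> probabilities (seq_len, vocab_size)."""
    x = embedding[char_ids]           # look up and stack: (seq_len, d_model)
    h = np.maximum(0.0, x @ w1 + b1)  # first layer followed by ReLU
    logits = h @ w2 + b2              # second layer: (seq_len, vocab_size)
    return softmax(logits)            # one distribution per position

probs = forward(np.array([ord(c) for c in "hello"]))
print(probs.shape)                           # (5, 256)
print(np.allclose(probs.sum(axis=-1), 1.0))  # True
```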

To predict the next character:
- Use the softmax output and identify the character with the highest probability.
- That character is the prediction for the next character.
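These two steps amount to an argmax over the output distribution, followed by mapping the index back to a character. Below is a small sketch that assumes the vocabulary is indexed by code point, so `chr` inverts the encoding. A sampling variant is included because always taking the argmax tends to produce repetitive text. Both function names are hypothetical.

```python
import numpy as np

def predict_next_char(probs):
    """Greedy choice: the character whose code has the highest probability.

    probs: 1-D array over the vocabulary, assumed indexed by code point.
    The argmax is the same if log-probabilities are passed instead.
    """
    return chr(int(np.argmax(probs)))

def sample_next_char(probs, rng):
    """Alternative: draw a code at random in proportion to its probability."""
    return chr(int(rng.choice(len(probs), p=probs)))

# Toy distribution over 256 codes with half the mass on 'e' (code 101).
probs = np.full(256, 0.5 / 255)
probs[ord("e")] = 0.5
print(predict_next_char(probs))  # e
print(sample_next_char(probs, np.random.default_rng(0)))
```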

In [None]:
%%capture
!pip install trax

In [None]:
import trax
from trax import layers as tl

import numpy as np
import torch 
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In [None]:
!pip list | grep 'trax\|jax'

jax                           0.2.7                
jaxlib                        0.1.57+cuda101       
trax                          1.3.7                


In [None]:
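def get_batch(source, i, bptt=35):
    '''
    Return an (input, target) pair starting at row `i` of a batchified `source`.

    `data` has shape (seq_len, batch_size). `target` holds the same positions
    shifted one step ahead, flattened to shape (seq_len * batch_size,).
    `bptt` is the truncated backpropagation-through-time window length.
    '''
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].view(-1)
    
    return data, target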


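def batchify(data, bsz):
    '''
    Reshape a 1-D token stream into an (nbatch, bsz) tensor whose columns
    are independent, contiguous sub-streams.
    '''
    # Work out how cleanly we can divide the dataset into bsz parts.
    nbatch = data.size(0) // bsz
    # Trim off any extra elements that wouldn't cleanly fit (remainders).
    data = data.narrow(0, 0, nbatch * bsz)
    # Evenly divide the data across the bsz batches; after the transpose,
    # each column is one sub-stream and each row is one time step.
    data = data.view(bsz, -1).t().contiguous()
    return data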


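# Detach the hidden state from the computation graph between batches.
def detach(hidden):
    """
    Detach a hidden state from the computation graph.

    `hidden` may be a single tensor (e.g. a GRU state) or a tuple of tensors
    (e.g. an LSTM's (h, c) pair); tuples are handled recursively. Detaching
    between batches stops gradients from flowing back into earlier batches.
    """
    if isinstance(hidden, torch.Tensor):
        return hidden.detach()
    else:
        return tuple(detach(v) for v in hidden)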
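To see what `batchify` and `get_batch` produce, here is a NumPy mirror of the same arithmetic applied to a toy stream of 26 IDs. The names `batchify_np` and `get_batch_np` are hypothetical. They swap `narrow`/`view` for slicing/`reshape` but otherwise follow the PyTorch helpers above.

```python
import numpy as np

def batchify_np(data, bsz):
    """NumPy mirror of `batchify`: trim, then lay the stream out as (nbatch, bsz)."""
    nbatch = len(data) // bsz
    data = data[: nbatch * bsz]
    # reshape(bsz, -1) gives one contiguous chunk per row; .T turns each into a column.
    return data.reshape(bsz, -1).T

def get_batch_np(source, i, bptt=35):
    """NumPy mirror of `get_batch`: inputs plus next-step targets, flattened."""
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    target = source[i + 1:i + 1 + seq_len].reshape(-1)
    return data, target

stream = np.arange(26)            # stand-in for 26 token IDs
source = batchify_np(stream, 4)   # 26 // 4 = 6 rows; the last 2 IDs are dropped
print(source.shape)               # (6, 4)
print(source[:, 0])               # [0 1 2 3 4 5]  (first sub-stream)
data, target = get_batch_np(source, 0, bptt=3)
print(data.shape, target.shape)   # (3, 4) (12,)
```

Each column is an independent sub-stream, so the target for `data[t, b]` is `source[t + 1, b]`. Flattening the targets lines them up with predictions reshaped to `(seq_len * bsz, vocab_size)`.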