<a href="https://colab.research.google.com/github/rvraghvender/DeepLearningProjects/blob/main/NaturalLanguageProcessing/CharacterLevelLanguageModel/Character_level_language_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Character level language model - Dinosaurus Island

Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this project they have returned.

We are in charge of a special task: leading biology researchers are creating new breeds of dinosaurs and bringing them to life on Earth, and our job is to name them. If a dinosaur does not like its name, it might go berserk ;). So choose wisely!

<table>
<td>
<img src="https://github.com/rvraghvender/DeepLearningProjects/blob/main/NaturalLanguageProcessing/CharacterLevelLanguageModel/images/dino.jpg?raw=true" style="width:250;height:300px;">

</td>

</table>

Luckily, we are equipped with some deep learning concepts, and we will use them to save the day! Our assistant has collected a list of all the dinosaur names they could find and compiled them into this [dataset](dinos.txt). To create new dinosaur names, we will build a character-level language model. Our algorithm will learn the different name patterns and randomly generate new names.

To complete this project, we need to:
 * Store text data for processing using an RNN
 * Build a character-level text generation model using an RNN
 * Sample novel sequences in an RNN
 * Explain the vanishing/exploding gradient problem in RNNs
 * Apply gradient clipping as a solution for exploding gradients



## Table of Contents

- [Packages](#0)
- [1 - Problem Statement](#1)
    - [1.1 - Dataset and Preprocessing](#1-1)
    - [1.2 - Overview of the Model](#1-2)
- [2 - Building Blocks of the Model](#2)
    - [2.1 - Clipping the Gradients in the Optimization Loop](#2-1)
        - [TODO 1 - clip](#ex-1)
    - [2.2 - Sampling](#2-2)
        - [TODO 2 - sample](#ex-2)
- [3 - Building the Language Model](#3)
    - [3.1 - Gradient Descent](#3-1)
        - [TODO 3 - optimize](#ex-3)
    - [3.2 - Training the Model](#3-2)
        - [TODO 4 - model](#ex-4)
- [4 - Writing like Shakespeare](#4)
- [5 - References](#5)

## Packages

In [2]:
import numpy as np
from utils import *
import random
import pprint
import copy

## 1 - Problem Statement

### 1.1 - Dataset and Preprocessing

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary sizes.

In [10]:
# Read the whole file and lowercase it; the context manager closes the file.
with open('dinos.txt', 'r') as f:
    data = f.read().lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print(f'There are {data_size} total characters and {vocab_size} unique characters in our data.')

There are 19909 total characters and 27 unique characters in our data.


- The characters are a-z (26 characters) plus the newline character "\n".
- Here the newline character "\n" plays a role similar to the `<EOS>` ("end of sentence") token.
    - In this project, "\n" marks the end of a dinosaur name rather than the end of a sentence.
- `char_to_ix`: In the cell below, we'll create a Python dictionary that maps each character to an index from 0 to 26.
- `ix_to_char`: Then, we will create a second Python dictionary that maps each index back to the corresponding character.
    - This will help us figure out which index corresponds to which character in the probability distribution output by the softmax layer.

In [11]:
chars = sorted(chars)
print(chars)

['\n', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


In [16]:
char_to_ix = {ch:i for i,ch in enumerate(chars)}
ix_to_char = {i:ch for i,ch in enumerate(chars)}
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(ix_to_char)

{   0: '\n',
    1: 'a',
    2: 'b',
    3: 'c',
    4: 'd',
    5: 'e',
    6: 'f',
    7: 'g',
    8: 'h',
    9: 'i',
    10: 'j',
    11: 'k',
    12: 'l',
    13: 'm',
    14: 'n',
    15: 'o',
    16: 'p',
    17: 'q',
    18: 'r',
    19: 's',
    20: 't',
    21: 'u',
    22: 'v',
    23: 'w',
    24: 'x',
    25: 'y',
    26: 'z'}
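
As a quick check of the two mappings, the sketch below encodes a name into indices and decodes it back. The vocabulary is rebuilt locally so the snippet runs on its own, and the name `"rex"` is a made-up example, not an entry taken from the dataset.

```python
# Rebuild the 27-character vocabulary locally so the snippet is self-contained.
chars = ['\n'] + [chr(c) for c in range(ord('a'), ord('z') + 1)]
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

# Hypothetical example name, terminated by the newline (end-of-name) token.
name = "rex\n"

# Encode each character as its index, then decode the indices back to text.
indices = [char_to_ix[ch] for ch in name]
decoded = "".join(ix_to_char[i] for i in indices)

print(indices)        # [18, 5, 24, 0]
print(repr(decoded))  # 'rex\n'
```

The round trip returns the original string, which is what lets us turn sampled indices back into readable names later.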


### 1.2 - Overview of the Model

Our model will have the following structure:

- Initialize parameters
- Run the optimization loop
    - Forward propagation to compute the loss function
    - Backward propagation to compute the gradients of the loss function with respect to the parameters
    - Clip the gradients to avoid exploding gradients
    - Using the gradients, update our parameters with the gradient descent update rule.
- Return the learned parameters.
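
The shape of this loop can be sketched on a toy problem. The example below is not the RNN itself: it replaces forward and backward propagation with a one-parameter quadratic loss, and the names `loss_and_grad`, `learning_rate`, and `max_value` are illustrative assumptions. It only shows the order of the steps: forward, backward, clip, update.

```python
import numpy as np

def loss_and_grad(w, target=3.0):
    """Toy stand-in for forward and backward propagation (hypothetical helper)."""
    loss = (w - target) ** 2      # forward pass: compute the loss
    grad = 2.0 * (w - target)     # backward pass: d(loss)/dw
    return loss, grad

w = 50.0                 # initialize the parameter far from the optimum
learning_rate = 0.1      # assumed step size
max_value = 5.0          # assumed clipping threshold

for step in range(200):  # optimization loop
    loss, grad = loss_and_grad(w)
    grad = np.clip(grad, -max_value, max_value)  # clip the gradient
    w = w - learning_rate * grad                 # gradient descent update

print(float(w))  # ends very close to the target value 3.0
```

Early in training the raw gradient is large, so clipping limits each step to `learning_rate * max_value`; once the parameter is close to the optimum, clipping no longer has any effect.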

<img src="https://github.com/rvraghvender/DeepLearningProjects/blob/main/NaturalLanguageProcessing/CharacterLevelLanguageModel/images/rnn.png?raw=true" style="width:450;height:300px;">
<caption><center><font color='purple'><b>Figure 1</b>: Recurrent Neural Network, similar to the one in the "Building a Recurrent Neural Network - Step by Step" notebook</font></center></caption>

- At each time-step, the RNN tries to predict what the next character is, given the previous characters.
- $X$ = ($x^{<1>}$, $x^{<2>}$, ... ,$x^{<T_x>}$) is a list of characters from the training set.
- $Y$ = ($y^{<1>}$, $y^{<2>}$, ... ,$y^{<T_x>}$) is the same list of characters but shifted one character forward.
- At every time-step $t$, $y^{<t>}$ = $x^{<t+1>}$. The target at time $t$ is the same as the input at time $t+1$.
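
The relationship between $X$ and $Y$ can be illustrated with a single name. This is a minimal sketch under two assumptions that are not stated above: the first input $x^{<1>}$ is represented by `None` (standing for an all-zero vector), and the last target is the newline index that marks the end of the name. The name `"rex"` is a made-up example.

```python
# Rebuild the vocabulary locally so the snippet is self-contained.
chars = ['\n'] + [chr(c) for c in range(ord('a'), ord('z') + 1)]
char_to_ix = {ch: i for i, ch in enumerate(chars)}

name = "rex"  # hypothetical training example

# Assumption: None is a placeholder for an all-zero first input x^{<1>}.
X = [None] + [char_to_ix[ch] for ch in name]

# Y is X shifted one step forward, ending with the end-of-name index.
Y = X[1:] + [char_to_ix['\n']]

print(X)  # [None, 18, 5, 24]
print(Y)  # [18, 5, 24, 0]
```

At every position except the last, `Y[t]` equals `X[t + 1]`, which is the shift described above.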


## 2 - Building Blocks of the Model

In this part, we will build two important blocks of the overall model:

1. Gradient clipping: to avoid exploding gradients
2. Sampling: a technique used to generate characters

We will apply these two functions to build the model.

### 2.1 - Clipping the Gradients in the Optimization Loop

In this section, we will implement the `clip` function that we will call inside our optimization loop.

#### Exploding gradients

- When gradients become very large, they are called "exploding gradients".
- Exploding gradients make training more difficult, because the resulting parameter updates may be so large that they "overshoot" the optimal values.
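
To see why gradients can explode in an RNN, note that backpropagating through time multiplies the gradient by the recurrent weight matrix once per time-step. The sketch below is a simplified illustration that ignores the derivative of the activation function; the matrix `W` and the 20-step horizon are assumptions chosen for the demonstration, not values from this project.

```python
import numpy as np

# Hypothetical recurrent weights whose largest singular value (1.5) exceeds 1.
W = 1.5 * np.eye(3)

# Gradient arriving at the last time-step.
grad = np.ones(3)

norms = []
for t in range(20):        # propagate back through 20 time-steps
    grad = W.T @ grad      # one multiplication by W^T per step
    norms.append(np.linalg.norm(grad))

# The norm grows by a factor of 1.5 at every step.
print(norms[0], norms[-1])
```

After 20 steps the gradient norm is larger by a factor of $1.5^{19}$ than after the first step, which is the kind of growth that clipping is meant to contain.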