In [11]:
import numpy as np
from utils import *
import random

# Step 01 - Processing the data

   * In this step we read the input names, one per line, treating each name as a sequence that ends with a '\n' character, and we build the list of unique characters that forms the vocabulary.

   * We maintain two hash tables, char_to_index and index_to_char, which map each vocabulary character to an index and each index back to its character.

In [13]:
with open('IndianNames.txt', 'r') as f:
    input_data = f.read().lower()

unique_chars = sorted(set(input_data))
input_data_size, vocab_chars_size = len(input_data), len(unique_chars)

char_to_index = { ch:ind for ind,ch in enumerate(unique_chars) } #Hashtable mapping char -> index
index_to_char = { ind:ch for ind,ch in enumerate(unique_chars) } #Hashtable mapping index -> char
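As a quick sanity check of the two mappings, the sketch below builds them from a small in-memory string rather than from `IndianNames.txt` (the sample string is illustrative, not the file's contents) and round-trips one name through `char_to_index` and `index_to_char`.

```python
# Illustrative stand-in for the contents of the names file (an assumption).
sample_data = "rachana\nsuram\n".lower()

unique_chars = sorted(set(sample_data))
char_to_index = {ch: ind for ind, ch in enumerate(unique_chars)}
index_to_char = {ind: ch for ind, ch in enumerate(unique_chars)}

# Encode a name as a list of indices, then decode it back to a string.
encoded = [char_to_index[ch] for ch in "suram"]
decoded = "".join(index_to_char[i] for i in encoded)

print(unique_chars)
print(encoded, decoded)
```

Because the characters are sorted before enumeration, '\n' always receives index 0, which is why the newline label appears as 0 in the training output.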

# Step 02 - Building the model

Building the model includes the following:

    a) Training the model
    
    b) Optimizing:
    
        b.1) Forward propagation
        b.2) Backward propagation
        b.3) Gradient clipping
        b.4) Parameter update
        
    c) Sampling, to check that training is progressing.

## 2.1 - Gradient Clipping

Gradient clipping guards against the exploding gradients problem, in which gradients take on extremely large values.

Given a range [-n, n], each gradient value num is checked against the range and clamped to it:

    a) num > n => num = n
    
    b) num < -n => num = -n
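The clamping rule above can be illustrated on a single array with `np.clip`. This is a minimal sketch; the gradient values and the threshold are made up for illustration.

```python
import numpy as np

max_value = 5  # illustrative threshold n
grad = np.array([[-12.0, 0.5], [3.0, 7.5]])  # made-up gradient values

# Values below -n become -n, values above n become n, the rest are unchanged.
clipped = np.clip(grad, -max_value, max_value)
print(clipped)
```

Only -12.0 and 7.5 fall outside [-5, 5], so they become -5.0 and 5.0 while 0.5 and 3.0 pass through unchanged.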

In [14]:
def clip(gradients, maxValue):
    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
    
    for ele in [dWaa, dWax, dWya, db, dby]:
        np.clip(ele, -1 * maxValue, maxValue, out=ele)
    
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients

## 2.2 - Sampling

Sampling assumes that the model has been trained and can generate the next character, and thereby a whole name, when it is fed an input character. Without randomness, only one name would ever be generated for a given input character. The steps for sampling are:

**Step 01**: Retrieve the parameters into their respective variables, and initialize $x^{\langle 1 \rangle} = \vec{0}$ and $a^{\langle 0 \rangle} = \vec{0}$.

**Step 02**: Calculate $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$ using the following equations:

$$ a^{\langle t+1 \rangle} = \tanh(W_{ax}  x^{\langle t+1 \rangle } + W_{aa} a^{\langle t \rangle } + b)\tag{1}$$

$$ z^{\langle t + 1 \rangle } = W_{ya}  a^{\langle t + 1 \rangle } + b_y \tag{2}$$

$$ \hat{y}^{\langle t+1 \rangle } = softmax(z^{\langle t + 1 \rangle })\tag{3}$$

**Note:** The vectors **x** and **a** are 2D arrays (column vectors), not 1D arrays.
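Equations (1) to (3) can be traced for a single time step as in the sketch below. The weights are random placeholders and the `softmax` helper is defined locally as an assumption, since the notebook's own `softmax` comes from `utils` and is not shown here.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a column vector (local stand-in).
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

rng = np.random.default_rng(0)
vocab_size, n_a = 27, 50  # default sizes used by model() in this notebook

# Random placeholder weights (assumption: small Gaussian values).
Wax = rng.standard_normal((n_a, vocab_size)) * 0.01
Waa = rng.standard_normal((n_a, n_a)) * 0.01
Wya = rng.standard_normal((vocab_size, n_a)) * 0.01
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))  # x<1> = 0 as a 2D column vector
a_prev = np.zeros((n_a, 1))    # a<0> = 0 as a 2D column vector

a = np.tanh(Wax @ x + Waa @ a_prev + b)  # equation (1)
z = Wya @ a + by                         # equation (2)
y_hat = softmax(z)                       # equation (3)

print(a.shape, y_hat.shape, float(y_hat.sum()))
```

Keeping `x` and `a_prev` as columns of shape `(n, 1)` is what lets the matrix products and bias additions line up without relying on broadcasting.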

**Step 03**: Sample the next character from $\hat{y}^{\langle t+1 \rangle}$, the probability vector produced by the softmax. If the index with the highest probability were picked every time, each input character would always yield the same name. To avoid this, we draw the index at random according to these probabilities using np.random.choice().
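The difference between always taking the most probable index and drawing an index at random can be seen on a toy distribution. This is a sketch with a made-up four-character probability vector; the seed is fixed only so that a run is repeatable.

```python
import numpy as np

# Made-up probability column vector over a 4-character vocabulary.
y = np.array([[0.1], [0.6], [0.2], [0.1]])

# Greedy choice: the same index every time.
greedy = [int(np.argmax(y)) for _ in range(5)]

# Random choice weighted by the probabilities in y.
np.random.seed(0)
sampled = [int(np.random.choice(range(len(y)), p=y[:, 0])) for _ in range(5)]

print(greedy)
print(sampled)
```

The greedy list is always `[1, 1, 1, 1, 1]`, while the sampled list can contain any index, with more probable indices appearing more often.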

**Step 04**: Overwrite $x^{\langle t \rangle}$ with $x^{\langle t+1 \rangle}$, the one-hot vector of the sampled character: initialize all values to zero and set the entry at index idx to 1.
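The one-hot update in Step 04 amounts to the following. This is a minimal sketch in which the index value 18 is just an example.

```python
import numpy as np

vocab_size = 27
idx = 18  # example index of the character sampled at the previous step

# Start from all zeros and switch on only the sampled character's entry.
x = np.zeros((vocab_size, 1))
x[idx] = 1

print(x.shape, int(x.sum()), int(np.argmax(x)))
```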


In [15]:
def sample(parameters, char_to_ix, seed):
    
    # Retrieve parameters
    
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]

    # Initialize x and a_prev with zeros
    x = np.zeros((vocab_size, 1))
    a_prev = np.zeros((n_a, 1))
    
    # indices starts as an empty list; the index of each generated character is appended to it.
    indices = []
    
    # idx holds the index of the character sampled at the current step (-1 before the first step).
    idx = -1 
    
    # Loop until a newline character is sampled or 50 characters have been generated.
    counter = 0
    newline_character = char_to_ix['\n']
    
    while (idx != newline_character and counter != 50):
        
        # The three equations from Step 02: forward propagation
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)
        
        np.random.seed(counter+seed) 
        
        # sampling the index values out of the probabilities in y
        idx = np.random.choice(range(len(y)), p=y[:, 0])

        # Append the index to "indices[]"
        indices.append(idx)
        
        # Step-04
        x = np.zeros((vocab_size, 1))
        x[idx] = 1
        
        # Update "a_prev" to be "a"
        a_prev = a
        
        seed += 1
        counter +=1

    if (counter == 50):
        indices.append(char_to_ix['\n'])
    
    return indices

## 2.3 - Optimizing the Model


In this step we perform forward propagation, backward propagation, gradient clipping, and the parameter update:

a) Perform forward propagation -> rnn_forward() method defined in utils

b) Perform backward propagation -> rnn_backward() method defined in utils

c) Perform gradient clipping -> clip() defined in section 2.1

d) Update the parameters -> update_parameters() method defined in utils
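The body of `update_parameters()` is not shown in this notebook. As an assumption, a plain gradient descent update would look like the sketch below; the name `update_parameters_sketch` and the toy 2x2 arrays are hypothetical, and the real helper in `utils` may differ.

```python
import numpy as np

def update_parameters_sketch(parameters, gradients, lr):
    # Hypothetical stand-in for utils.update_parameters: plain gradient descent.
    for name in ("Wax", "Waa", "Wya", "b", "by"):
        parameters[name] -= lr * gradients["d" + name]
    return parameters

# Toy parameters and gradients with matching keys (made-up values).
params = {name: np.ones((2, 2)) for name in ("Wax", "Waa", "Wya", "b", "by")}
grads = {"d" + name: np.full((2, 2), 0.5) for name in params}

params = update_parameters_sketch(params, grads, lr=0.01)
print(params["Wax"])
```

Each entry moves from 1.0 to 0.995, that is, by the learning rate times the gradient.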


In [16]:
def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
        
    # a) Forward propagate 
    loss, cache = rnn_forward(X, Y, a_prev, parameters)
    
    # b) Backpropagate
    gradients, a = rnn_backward(X, Y, parameters, cache)
    
    # Gradient clipping with range [-5, 5]
    gradients = clip(gradients, 5)
    
    # Update parameters
    parameters = update_parameters(parameters, gradients, learning_rate)
    
    return loss, gradients, a[len(X)-1]

## 2.4 - Training the Model


a) In this step we initialize the parameters -> Wax (n_a, n_x), Waa (n_a, n_a), Wya (n_y, n_a), b (n_a, 1), by (n_y, 1).

b) Calculate the initial loss. This gives the smoothed loss a sensible starting value before the first update.

c) Prepare a list of names by stripping the '\n' from each line, then shuffle this list.

d) Initialize a_prev (n_a, 1) with zeros.

e) Run the optimization loop for the given number of iterations.

    e.1) Retrieve the next example, split it into its characters, and map each character to its index using char_to_index.
   
    e.2) Set X to [None] + the list of indices created above.
    
    e.3) Set Y to the list of indices of the example + [char_to_index['\n']].
    
    e.4) Run one optimization step (forward propagation -> backward propagation -> gradient clipping -> parameter update).
    
    e.5) Sample from the model every 2000 iterations to check that it is training properly.
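Steps e.1 to e.3 can be traced for a single name as in the sketch below. It assumes the vocabulary is '\n' plus the 26 lowercase letters, which matches the default `vocab_size = 27` but is not verified against the data file.

```python
import string

# Assumption: the vocabulary is '\n' plus the 26 lowercase letters.
vocab = sorted("\n" + string.ascii_lowercase)
char_to_index = {ch: i for i, ch in enumerate(vocab)}

example = "rachana"
indices = [char_to_index[c] for c in example]

X = [None] + indices                 # None is the placeholder for the first time step
Y = indices + [char_to_index["\n"]]  # X shifted left by one, ending with the newline index

print(X)
print(Y)
```

At every time step the label in `Y` is the character that follows the input in `X`, and the final label is the newline that ends the name.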


In [19]:
def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27, verbose = False):

    
    # x.shape(n_x, m, t_x), n_x = n_y = vocab_size
    n_x, n_y = vocab_size, vocab_size
    
    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    
    # Calculate initial loss, which helps in smoothing the loss in further time steps
    loss = get_initial_loss(vocab_size, dino_names)
    
    # examples is the list of names, lowercased and stripped of surrounding whitespace
    with open("IndianNames.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]
    
    # Random shuffle of examples (.seed() initialize the random number generator)
    np.random.seed(0)
    np.random.shuffle(examples)
    
    # Initialize the previous activation to zeros.
    a_prev = np.zeros((n_a, 1))
    
    # Optimization loop
    for j in range(num_iterations):
        
        
        # idx is the index of the current example in examples (wraps around after a full pass).
        idx = j % len(examples)
        
        # Set the input X to [None] + list of indices of characters 
        X = [None] + [char_to_ix[c] for c in examples[idx]]
        
        # Set the labels Y 
        ix_newline = char_to_ix['\n']
        Y = X[1:] + [ix_newline]

        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters)
        
        # debug statements to aid in correctly forming X, Y
        if verbose and j in [0, len(examples) -1, len(examples)]:
            print("j = " , j, "idx = ", idx,) 
            
        if verbose and j in [0]:
            print("single_example =", examples[idx])
            print("single_example_ix", X[1:])
            print(" X = ", X, "\n", "Y =       ", Y, "\n")
        
        # Smooth the loss with smooth() from utils so the printed values are less noisy.
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, generate dino_names names with sample() to check that the model is learning properly
        if j % 2000 == 0:
            
            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
            
            # Reset the seed, then sample and print dino_names names
            seed = 0
            for name in range(dino_names):
                
                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)
                
                seed += 1  # Increment the seed so each sampled name uses a different, but reproducible, seed.
      
            print('\n')
        
    return parameters
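The helpers `get_initial_loss()` and `smooth()` come from `utils` and are not shown in this notebook. One plausible reading, stated here as an assumption, is that the initial loss is the loss of a uniform guess over the vocabulary for a fixed number of characters, and that smoothing is an exponential moving average. The function names, the `beta` value, and the per-example loss below are hypothetical.

```python
import math

def initial_loss_sketch(vocab_size, seq_length):
    # Loss of a model that predicts uniformly over the vocabulary
    # for seq_length characters (hypothetical stand-in for get_initial_loss).
    return -math.log(1.0 / vocab_size) * seq_length

def smooth_sketch(loss, curr_loss, beta=0.999):
    # Exponential moving average of the loss (hypothetical stand-in for smooth).
    return beta * loss + (1 - beta) * curr_loss

loss = initial_loss_sketch(27, 7)
print(round(loss, 3))

loss = smooth_sketch(loss, 26.0)  # 26.0 is a made-up per-example loss
print(round(loss, 3))
```

Under these assumptions the starting value is 7 * ln(27), about 23.07, which is close to the loss printed at iteration 0 in the output of the next cell.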

In [20]:
parameters = model(input_data, index_to_char, char_to_index, verbose = True)

j =  0 idx =  0
single_example = rachana
single_example_ix [18, 1, 3, 8, 1, 14, 1]
 X =  [None, 18, 1, 3, 8, 1, 14, 1] 
 Y =        [18, 1, 3, 8, 1, 14, 1, 0] 

Iteration: 0, Loss: 23.074155

Nkzxwtdmeqoeyhsqwasjjjvu
Kneb
Kzxwtdmeqoeyhsqwasjjjvu
Neb
Zxwtdmeqoeyhsqwasjjjvu
Eb
Xwtdmeqoeyhsqwasjjjvu


Iteration: 2000, Loss: 20.366060

Ilvspoan
Gha
Hutsibhariavaris
Ic
Wsrkan
A
Tsibhariavaris


Iteration: 4000, Loss: 18.508251

Laxtrkakamhavanisanadesm
Hej
Iytso
La
Xutn
Ca
Utn


Iteration: 6000, Loss: 17.703618

Nevtrdan
Jika
Kutrm
Nad
Wutma
Gaahnra
Tushh


j =  6469 idx =  6469
j =  6470 idx =  0
Iteration: 8000, Loss: 16.949048

Mayrla
Jeed
Kutrea
Mad
Wrppag
Da
Surak


Iteration: 10000, Loss: 16.758561

Nhutsl
Khad
Lutti
Nad
Vuto
Faajlo
Suram


Iteration: 12000, Loss: 16.558236

Onvisi
Mika
Mussi
Oid
Votpal
Gaakji
Sushi


Iteration: 14000, Loss: 16.369947

Mevoti
Kika
Kussi
Maeabisa
Voti
Eea
Suram


Iteration: 16000, Loss: 16.336368

Nevtun
Khad
Kusnmagama
Nabafira
Vrusghana
Daajmpa
Suram