# Character level language model - Dinosaurus land

Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go beserk, so choose wisely! 

<table>
<td>
<img src="images/dino.jpg" style="width:250;height:300px;">

</td>

</table>

Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this [dataset](dinos.txt). (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath! 

In [39]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.nn.init as init
import numpy as np
from torch.autograd import Variable
from torch.nn.utils import clip_grad_value_

In [40]:
# In this cell you will write code to do the following:
# Each line in dinos.txt contains the names of dinosaurs which will be used for training
# step 1: read the entire contents of the file as a string
# step 2: convert the entire string to lowercase
# step 3: extract the characters that make up the string in a set. This will be the vocabulary.
# step 4: print the size of the string (which will be total number of characters in the training set) and
#         size of vocabulary. Note that '\n' is part of vocabulary and it will be used as EOS character 
#         in our model
data = []
with open("dinos.txt") as fileobj:
    for line in fileobj:
        line = line.lower()
        for ch in line:
            data.append(ch)
vocab = set(data)
vocab_size = len(vocab)
print(vocab_size)


27


In [41]:
# In this cell you will write code to do the following:
# step 1: create a dictionary that maps characters to indices
# step 2: create another dictionary that maps indices to characters
char_to_idx = {}
idx_to_char = {}
for i, word in enumerate(vocab):
    char_to_idx[word] = i
    idx_to_char[i] = word

In [42]:
# Let's initialize few variables
n_hidden_nodes = 50
num_iterations = 35000
torch.backends.cudnn.enabled = False
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### 1.2 - Overview of the model

A Pictorial representation of the model you will build is given below: 
    
<img src="images/rnn1.png" style="width:450;height:300px;">
<caption><center> **Figure 1**: Recurrent Neural Network, similar to what you had built in the previous notebook "Building a RNN - Step by Step".  </center></caption>

At each time-step, the RNN tries to predict what is the next character given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is such that at every time-step $t$, we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.

In [52]:
# In this cell you will write code to do the following:
# You need to create a class inherited from nn.Module that uses nn.RNN and nn.Linear along with F.log_softmax
# (instead of softmax) to forward the log probabilites to the caller of the forward method of this class.
# Note that you have to accumulate log probabilities with respect to every character in the sequence 
# and then forward this.
# You also have to explicitly initialize the parameters of RNN and Linear modules in the init method of this
# class. We will use 1 layer unidirectional RNN. Weight parameters for RNN are weight_ih_l0 and weight_hh_l0. 
# These are wrongly documented as weight_ih_l[0] and weight_hh_l[0] in the documentation. Bias parameters are 
# bias_ih_l0 and bias_hh_l0.
class cllm(nn.Module):
    def __init__(self,n_hidden_nodes,inputsize,vocab_size):
        super(cllm,self).__init__()
        self.rnn = nn.RNN(inputsize,n_hidden_nodes)
        self.linear = nn.Linear(n_hidden_nodes,vocab_size)
        #self.softmax = nn.LogSoftmax(dim=-1)
        self.weight_initialize
    
    def weight_initialize():
        init.normal_(self.rnn.weight_ih_l0)
        init.normal_(self.rnn.weight_hh_l0)
        init.constant_(self.rnn.bias_ih_l0,1)
        init.constant_(self.rnn.bias_hh_l0,1)
        
    def forward(self,inputs,h0):
        out,hidden = self.rnn(inputs,h0)
        out = self.linear(out)
        out = F.log_softmax(out)
        return out,hidden
        

In [54]:
# In this cell you will write code to do the following:
# step 1: Instantiate the model and port it to GPU
# step 2: set loss criterion to NLL loss
# step 3: set optimizer to SGD with a suitable learning rate
inputsize = 27
model = cllm(n_hidden_nodes,inputsize,vocab_size).to(device)
lossCriterion = nn.NLLLoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)


In [55]:
# Build list of all dinosaur names (training examples).
# Note that earlier we had just built the vocabulary.
# examples variable below is a list of dinosaur names in lowercase with all leading and trainling white spaces
# removed.
# The examples are randomly shuffled.
with open("dinos.txt") as f:
    examples = f.readlines()
examples = [x.lower().strip() for x in examples]
    
# Shuffle list of all dinosaur names
np.random.seed(0)
np.random.shuffle(examples)


In [56]:
#sampling the name

class Sampling(nn.Module):
    def __init__(self,myModel,n_hidden_nodes,inputsize,vocab_size):
        super(Sampling,self).__init__()
        self.rnncell = nn.RNNCell(inputsize,n_hidden_nodes)
        self.linear = nn.Linear(n_hidden_nodes,vocab_size)
        self.softmax = nn.Softmax(dim=-1)
        self.initialize_weights()
        
    def forward(self,one_hot,h_0):
        h_n = self.rnncell(one_hot,h_0)
        output = self.linear(h_n)
        output = self.softmax(output)
        return output,h_n
    
    def initialize_weights(self):
        self.rnncell.weight_ih_l0 = myModel.state_dict()['rnn.weight_ih_l0']
        self.rnncell.weight_hh_l0 = myModel.state_dict()['rnn.weight_hh_l0']
        self.rnncell.bias_ih_l0 = myModel.state_dict()['rnn.bias_ih_l0']
        self.rnncell.bias_hh_l0 = myModel.state_dict()['rnn.weight_hh_l0']

In [57]:
def get_dino_name(model,char_to_idx,idx_to_char):
    
    h_0 = torch.zeros(1,n_hidden_nodes)
    input_0 = torch.zeros(1,vocab_size)

    sampleModel = Sampling(model,n_hidden_nodes,27,vocab_size)
    dino_name = []
    h_n = h_0
    vocab_array = np.arange(0,vocab_size)
    one_hot = torch.zeros(1,vocab_size)
    for i in range(10):
        output,h_n = sampleModel.forward(input_0,h_n)
        log_prob = output.detach().cpu().numpy().ravel()
        index = np.random.choice(vocab_array,p = log_prob)
        one_hot[0,index] = 1
        if(idx_to_char[index]!='\n'):
            dino_name.append(index)
        input_0 = one_hot
    
    for i in dino_name:
        print("".join(idx_to_char[i]),end="")
    print("\n")
    
    

In [58]:
# Fill or complete the code where required.
# We will do training in this cell.
# Batch size will be 1 (one).
batch_size = 1
h_0 = torch.zeros(1,1,n_hidden_nodes)# fill h_0 with tensor of zeros. h_0 is the a_0 u saw in your lectures
      # see documentation to determine dimension of h_0 and accordingly create the tensor

for j in range(num_iterations):      # number of training iterations
    index = j % len(examples) # choose the index of a dinosaur name. Index is guarenteed to be in 
                              # range  of 0 to # examples - 1
    data = [char_to_idx[ch] for ch in examples[index]] # create the list of indices of charecters that
                                                      # make up the chosen dinosaur name
    label = data[1:] + [char_to_idx["\n"]] # the label list. Here label at each time instant is the character 
                                          # next to the input character at that time. So y(t) = x(t+1). When
                                          # t is final time instant, label will be EOS character which is 
                                          # '\n' for us
    
    # You are required to do the following below:
    # Convert data to a tensor of one hot representations
    # Convert label to a LongTensor of indices required for NLL loss. See documentation for clarity.
    # Do the forward propagation, receive log probabilities, compute loss at every time instant and aggregate 
    # the loss.
    # Print the loss in every iteration.
   
    #tensor into one hot
    seq_len = len(data)
    one_hot = torch.zeros(seq_len,1,inputsize)
    for i in range(seq_len):        
        one_hot[i,0,data[i]] = 1
    
    one_hot = one_hot.to(device)
    
    #convert label to a LongTensor of indices required for NLL loss. See documentation for clarity.
    label = torch.tensor(label,dtype=torch.long).reshape(seq_len,-1)
    label = label.to(device)
    
    model.zero_grad()
    total_loss = 0
    hidden = h_0.to(device)
    
    #back-propagation
    logprob,hidden = model.forward(one_hot,hidden)
    for i in range(seq_len):
        total_loss += lossCriterion(logprob[i],label[i])
    
    
    
    
    total_loss.backward()
    clip_grad_value_(model.parameters(),5)  #gradient clipping
    optimizer.step()
    loss = total_loss/seq_len
    #print("total Loss",(loss.item()))
    
    if(j%1000 == 0):
        print("Iteration %d loss %d"%(j,loss))
        
        for index in range(5):
            get_dino_name(model,char_to_idx,idx_to_char)
        
          
    
    
        



Iteration 0 loss 2
bpxqejhfei

jwmvqsmvfk

lixryfwejd

mrnbocmxpo

wgfvapsvy

Iteration 1000 loss 2
ebigroqvsr

ecdgjvvpsj

xvjegcslpd

mwtddtrkzz

wpbeyaw

Iteration 2000 loss 1
qpqmblxoso

mtfzbdhpk

lwkvepbow

sgfnapecv

croxnuwnwf

Iteration 3000 loss 1
wpusxbfzj

tljecvxdsz

mcfawserm

cnstvzdsc

whtxldct

Iteration 4000 loss 1
tgfkvhvnu

yffksesnzv

osffldkttl

nsinfkknef

wspwetznmk

Iteration 5000 loss 1
qfsxcaukrl

mjzkjeehh

gnjvqugbx

gkqrrbftqb

qstmbjjfn

Iteration 6000 loss 1
kevbvjurae

csabvsfmgi

hwdaceyhkk

ejcadnrjq

sdxroglauz

Iteration 7000 loss 1
idtfteruw

llvrlcxqb

tyadquernk

ptqvpnosn

jrcpnwgzh

Iteration 8000 loss 1
ogmctuiig

wjusbqujch

lcwdvbdtl

gerstuplty

lohillxtfn

Iteration 9000 loss 1
hnndigppx

bndvqmnai

gbgvjeqrm

rdsovohcet

zbqjuhsch

Iteration 10000 loss 1
cwjgtxfarm

swzvkowewt

wtyzqusaey

ndkopokvvs

mzdnflgdpq

Iteration 11000 loss 1
tosdcxyjqw

wkzgbzibp

osgndxfuvq

jeflxczyfc

qwdebynqfz

Iteration 12000 loss 1
zamjvxvmad

dtpdyncwmx

In [49]:
#save the model
torch.save(model.state_dict(),'./rnn.pth')

In [50]:
#load the model
myModel = cllm(n_hidden_nodes,inputsize,vocab_size).to(device)
myModel.load_state_dict(torch.load('./rnn.pth'))
