<a href="https://colab.research.google.com/github/surajsrivathsa/ovgu_deeplearning/blob/master/Assignment_6_Realistic_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Deep Learning programming task

**Assignment 6:** Realistic Language Modelling and RNN

**Team members:**
1. Sanjeeth Busnur Indushekar: 224133 : sanjeeth.busnur@st.ovgu.de
2. Aditya Dey : 230580 : aditya.dey@st.ovgu.de
3. Suraj Shashidhar: 230052 : suraj.shashidhar@st.ovgu.de

**Tasks to be done:**

1) RNN that can be trained on variable-length sequences
(You can use keras layers, but implement the masking of the loss and the loss aggregation yourself)


2) Sampling variable length sequences from this RNN
If those two parts did not work out at all, include the re-implementation of the last assignment with keras functionality


3) Bonus: Language Model experiments


**Major errors made**

1) Did not save the model after training resulting in loss of compute time and resources. Later added code to save it and reload if necessary.

2) Tried to run RNN with stateless(previous hidden activation is notcarried forward for the sequence) --> set stateful = True

3) While language generation tried to generate entire sequence at once. Changed the code to generate character by character.

4) Had added the maxchar length break condition and not break on stop_sequence tag character, changed the code. However we see that with additional max char break condition, generated output looks much better.


5) Tried multiplying log probabilities instead of adding them, changed it to addition later


**Summary**


1) Language models related to both losses generate almost similar output. They hold structure in some places, but are filled with unnecessary spaces as probability of space is nearly 0.97, hence this keeps on repetaing.

2) Without stopping for stop character and if we keep on sampling, the network doesn't stop providing outputs. This may result in too much text overloading compute resource. Hence here for demo purpose we added max len of 5000 as break after generating long sequence.

3) Even though bible doesn't have word "abrams" but has word "Abrams" we see that probability of "abrams" > "Abrams" due to character level generation and probability and not word level



**Sample King James Bible**

1:1 In the beginning God created the heaven and the earth.

1:2 And the earth was without form, and void; and darkness was upon
the face of the deep. And the Spirit of God moved upon the face of the
waters.

1:3 And God said, Let there be light: and there was light.

1:4 And God saw the light, that it was good: and God divided the light
from the darkness.


In [0]:
import os
%tensorflow_version 2.x
import tensorflow as tf
from sklearn import preprocessing
from google.colab import files
from google.colab import drive
import matplotlib.pyplot as plt
import pandas as pd
import copy
import numpy as np

In [2]:
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing import image
from tensorflow.keras import datasets, layers, models
from tensorflow.keras import initializers
import tensorboard
import time
from datetime import datetime
from keras import backend as K
from prepare_data2 import parse_seq
import pickle

Using TensorFlow backend.


In [0]:
# 448 is the longest length sequence there are 31k sequences

In [3]:
  print(os.getcwd())
  print(tf.__version__)

/content
2.2.0


In [0]:
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [4]:

path = '.'
 
files = os.listdir(path)
for name in files:
    print(name)

.config
deeplearning.tfrecords
prepare_data2.py
the-king-james-bible.txt
deeplearning_vocab
__pycache__
sample_data


# **Data Preprocessing - King James Bible**

**Summary:** Data was already converted to sequences according to regex and serialized in local machine

1) Read the serialized data and vocab.

2) Parse the serialized data and convert it to category according to vocab dict

3) Use batch padding and max sequence size of 448 which was displayed to be the sequence of max length during serialization.

4) Convert the categorical data to onehot encoded data, so the dimensions must be **[batch size=128, max time steps = 448, vocab size = 78]**

5) Print all three data one with no padding, one with padding and one hot encoded so that we can see the difference for first few tensors

In [6]:


# this is just a datasets of "bytes" (not understandable)
data = tf.data.TFRecordDataset("deeplearning.tfrecords")


#data.padded_batch
#batched_data = data.padded_batch(batch_size = 128, drop_remainder=True)
# this maps a parser function that properly interprets the bytes over the dataset
# (with fixed sequence length 200)
# if you change the sequence length in preprocessing you also need to change it here
data = data.map(lambda x: parse_seq(x))


batched_categorical_data = data.padded_batch(batch_size=128, padded_shapes=448,padding_values=0, drop_remainder=True)

# a map from characters to indices
vocab = pickle.load(open("deeplearning_vocab", mode="rb"))
vocab_size = len(vocab)
# inverse mapping: indices to characters
ind_to_ch = {ind: ch for (ch, ind) in vocab.items()}
ch_to_ind = {v: k for k, v in ind_to_ch.items()}
print(vocab)
print(vocab_size)

{'a': 3, 'B': 4, 'r': 5, 'b': 6, 'p': 7, 'N': 8, '?': 9, 'C': 10, 'U': 11, '\ufeff': 12, 'l': 13, 'J': 14, 'd': 15, 'j': 16, ',': 17, 'I': 18, 'v': 19, 'M': 20, 'n': 21, 'g': 22, '4': 23, 'y': 24, '6': 25, '0': 26, 'T': 27, 'O': 28, '*': 29, 'x': 30, ')': 31, 'i': 32, 's': 33, 'L': 34, 'f': 35, 'z': 36, 'e': 37, '3': 38, 't': 39, 'V': 40, 'w': 41, '.': 42, 'G': 43, '!': 44, 'R': 45, ' ': 46, 'Q': 47, 'H': 48, '\n': 49, 'u': 50, '8': 51, 'E': 52, 'P': 53, '7': 54, 'Z': 55, 'o': 56, 'm': 57, ';': 58, '(': 59, 'h': 60, 'K': 61, 'D': 62, ':': 63, '1': 64, 'W': 65, 'q': 66, 'c': 67, '9': 68, "'": 69, 'S': 70, '-': 71, 'Y': 72, 'F': 73, '2': 74, 'A': 75, '5': 76, 'k': 77, '<PAD>': 0, '<S>': 1, '</S>': 2}
78


In [7]:
for num, elem in enumerate(data):
  if(num > 1):
    break;
  print(elem)
  print(" ====== Creating Labels by taking slices ==========")
  print(elem[1:])


tf.Tensor(
[ 1 12 27 60 37 46 73 32  5 33 39 46  4 56 56 77 46 56 35 46 20 56 33 37
 33 63 46 46 10  3 13 13 37 15 46 43 37 21 37 33 32 33 49 49 49  2], shape=(46,), dtype=int32)
tf.Tensor(
[12 27 60 37 46 73 32  5 33 39 46  4 56 56 77 46 56 35 46 20 56 33 37 33
 63 46 46 10  3 13 13 37 15 46 43 37 21 37 33 32 33 49 49 49  2], shape=(45,), dtype=int32)
tf.Tensor(
[ 1 46 18 21 46 39 60 37 46  6 37 22 32 21 21 32 21 22 46 43 56 15 46 67
  5 37  3 39 37 15 46 39 60 37 46 60 37  3 19 37 21 46  3 21 15 46 39 60
 37 46 37  3  5 39 60 42 49 49  2], shape=(59,), dtype=int32)
tf.Tensor(
[46 18 21 46 39 60 37 46  6 37 22 32 21 21 32 21 22 46 43 56 15 46 67  5
 37  3 39 37 15 46 39 60 37 46 60 37  3 19 37 21 46  3 21 15 46 39 60 37
 46 37  3  5 39 60 42 49 49  2], shape=(58,), dtype=int32)


In [0]:
def create_label(ds):
  return ds[1:];

all_label_data = data.map(create_label)
batched_label_data = all_label_data.padded_batch(batch_size=128, padded_shapes=448,padding_values=0, drop_remainder=True)

In [0]:
def onehotencode(ds):
  
  new_data = tf.one_hot(indices = ds, depth = vocab_size)
  return new_data;

onehot_encoded_batch_data = batched_categorical_data.map(onehotencode)
onehot_encoded_label_data = batched_label_data.map(onehotencode)
#new_data = data.map(onehotencode)
#list(new_data.as_numpy_iterator())[1:5]
#list(data.as_numpy_iterator())[1:2]
#tf.one_hot(data, depth=vocab_size)

In [10]:
#Display varying sizes of each sequence in data vs padded sequence in batched categorical data vs onehot encoded added elements

for batch_num, (original_element, padded_element, padded_label, onehotencoded_padded_element, onehotencoded_label) in enumerate(zip(data, batched_categorical_data, batched_label_data, onehot_encoded_batch_data, onehot_encoded_label_data)):
  if(batch_num > 3):
    break;
  print("Batch number is : {}".format(batch_num))
  print(" ===== ======= ======== ")
  print("original unpadded sequence")
  print(original_element)
  print(" ===== ======= ======== ")
  print("padded input sequence")
  print(padded_element)
  print(" ===== ======= ======== ")
  print("padded label")
  print(padded_label)
  print(" ======= ======= ======")
  print("onehot padded input")
  print(onehotencoded_padded_element)
  print(" ===== ======= ======== ")
  print("onehot padded label")
  print(onehotencoded_label)
  print("====== ======= =====")
  print()



Batch number is : 0
original unpadded sequence
tf.Tensor(
[ 1 12 27 60 37 46 73 32  5 33 39 46  4 56 56 77 46 56 35 46 20 56 33 37
 33 63 46 46 10  3 13 13 37 15 46 43 37 21 37 33 32 33 49 49 49  2], shape=(46,), dtype=int32)
padded input sequence
tf.Tensor(
[[ 1 12 27 ...  0  0  0]
 [ 1 46 18 ...  0  0  0]
 [ 1 46 75 ...  0  0  0]
 ...
 [ 1 46 75 ...  0  0  0]
 [ 1 46 75 ...  0  0  0]
 [ 1 46 75 ...  0  0  0]], shape=(128, 448), dtype=int32)
padded label
tf.Tensor(
[[12 27 60 ...  0  0  0]
 [46 18 21 ...  0  0  0]
 [46 75 21 ...  0  0  0]
 ...
 [46 75 21 ...  0  0  0]
 [46 75 21 ...  0  0  0]
 [46 75 21 ...  0  0  0]], shape=(128, 448), dtype=int32)
onehot padded input
tf.Tensor(
[[[0. 1. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [1. 0. 0. ... 0. 0. 0.]
  [1. 0. 0. ... 0. 0. 0.]
  [1. 0. 0. ... 0. 0. 0.]]

 [[0. 1. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  ...
  [1. 0. 0. ... 0. 0. 0.]
  [1. 0. 0. ... 0. 0. 0.]
  [1. 0. 0

# **Testing out Keras RNN based full layers (Fully Prebuilt RNNS) before applying to full king james data**

In [0]:
#Making RNNs stateful, that is it remembers previous activations or state and can use it for next  character
def build_model(vocab_size, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [0]:
model = build_model(
  vocab_size = len(vocab),
  rnn_units=448,
  batch_size=128)

In [0]:
# Creating a sample tensor of the same shape as each batch and fitting the model and printing the summary
# This also hekps print model summary as model is not initialized just by defining it, 
# it has to be first instantiated really by feeding the data

example_input_tensor = tf.Variable(tf.initializers.GlorotUniform(seed = 0)(shape=[128, 448, vocab_size]))
example_prediction = model(example_input_tensor)
#print(example_prediction)
example_prediction.shape

TensorShape([128, 448, 78])

In [0]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru (GRU)                    multiple                  709632    
_________________________________________________________________
dense (Dense)                multiple                  35022     
Total params: 744,654
Trainable params: 744,654
Non-trainable params: 0
_________________________________________________________________


In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [0]:
# Define custom loss
"""
def custom_loss(one_batch_data, max_len):

    # Create a loss function that adds the MSE loss to the mean of all squared activations of a specific layer
    def loss(labels,logits):
      non_zero_counts = tf.math.count_nonzero(input=one_batch_data, axis = 1, dtype=tf.dtypes.float32)
      non_zero_counts = non_zero_counts - 1
      mask = tf.sequence_mask(non_zero_counts, max_len, dtype=tf.dtypes.float32 )

      loss_tnsr = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

      masked_loss = tf.multiply(loss_tnsr, mask)

      summed_loss_per_sequence_of_batch = tf.reduce_sum(masked_loss, axis = 1)

      average_loss_per_sequence_of_batch = tf.divide(summed_loss_per_sequence_of_batch, non_zero_counts)

      return average_loss_per_sequence_of_batch

    return loss
"""  
    
    

def loss_fnc(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

"""
example_batch_loss  = loss(target_example_batch, example_prediction)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())
"""

'\nexample_batch_loss  = loss(target_example_batch, example_prediction)\nprint("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")\nprint("scalar_loss:      ", example_batch_loss.numpy().mean())\n'

In [0]:
# model = tf.keras.models.load_model('model.h5', custom_objects={'loss': custom_loss(one_batch_data, max_len=448)})

model.compile(optimizer='adam', loss= loss_fnc, metrics=['accuracy'])
epochs = 2

In [0]:
history = model.fit(zip(onehot_encoded_batch_data, batched_label_data),epochs=epochs)


Epoch 1/2
Epoch 2/2


AttributeError: ignored

# **Testing out loss masking on toy dataset before applying to Final RNN model**

**Summary:** We are testing two custom loss functions both are masking based.

**custom_loss_fnc_1**: After masking don't do reduce sum and average it by non-zero counts. Instead calculate sum of loss for every sequence, if we have 128 as batch size then we will have 128 losses.
Divide each loss element by its corresponding non zero count per sequence. Finally average this 128 dim tensor.

**custom_loss_fnc_2**: Same as mentioned in assignment, once masked directly take reduce sum and average by non-zero count sum


As expected loss_fnc 1 gives a bit higher loss than loss fnc 2 as first we take sequence wiae operation and then average out for entire batch

![alt text](https://drive.google.com/uc?id=1DK7odnZeI6Hq8nbxU_Ce4uKGbm45O8oV)

In [0]:
def loss_function(labels, logits):
  return tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

In [0]:
non_zero_counts = tf.Variable(tf.initializers.GlorotUniform(seed = 0)(shape=[128, ]))
print(non_zero_counts)
non_zero_counts = non_zero_counts - 1
print(non_zero_counts)

<tf.Variable 'Variable:0' shape=(128,) dtype=float32, numpy=
array([-0.08848309,  0.06307742,  0.09187527,  0.08291286, -0.03411262,
        0.11339895,  0.09436277, -0.09279253,  0.10694189,  0.01230136,
       -0.11142575, -0.11904617,  0.07565774, -0.09266244, -0.04355676,
        0.00621504, -0.0226154 , -0.08840972, -0.08646747, -0.10446479,
       -0.10289109,  0.1522908 , -0.02661152, -0.09176905,  0.06746647,
       -0.13089491, -0.06654619, -0.0944139 , -0.13866946,  0.05551672,
       -0.10106552, -0.02755217, -0.14622976,  0.06731291, -0.0801073 ,
        0.02602693,  0.01420946, -0.08935799, -0.1409924 ,  0.03435118,
        0.12686832,  0.12743042, -0.06754315,  0.15068687,  0.1249377 ,
        0.10851289,  0.11357336, -0.02233432, -0.03777158, -0.1305582 ,
        0.03638826, -0.01808612,  0.08201273, -0.13927397,  0.00888909,
       -0.14504269, -0.02896619, -0.02775975,  0.06411932,  0.11143847,
       -0.08448561,  0.05828932,  0.12321164, -0.14291412,  0.07584174,
   

In [0]:
# No difference between loss functions of keras and tensorflow
# tf.nn.sparse_softmax_cross_entropy_with_logits vs tf.keras.sparse_softmax_cross_entropy_with_logits

for x, y in zip(onehot_encoded_batch_data.take(2), batched_label_data.take(2)):
  non_zero_counts = tf.math.count_nonzero(input=y,axis = 1, dtype=tf.dtypes.float32)
  mask = tf.sequence_mask(non_zero_counts, 448, dtype=tf.dtypes.float32 )
  example_prediction = model(x)
  #loss_tnsr_keras = loss_fnc(y, example_prediction)
  #print(loss_tnsr_keras)
  print(non_zero_counts)
  print("====== ========")
  print()
  loss_tnsr_flow = loss_function(y, example_prediction)
  print(loss_tnsr_flow)
  print()
  print("========== summed_loss_per_sequence_of_batch without masking ==============")
  print()
  summed_loss_per_sequence_of_batch_without_masking = tf.reduce_sum(loss_tnsr_flow, axis = 1)
  print(summed_loss_per_sequence_of_batch_without_masking)
  print()
  print(" ==== ====== ==")
  print(" Masked loss")
  masked_loss = tf.multiply(loss_tnsr_flow, mask)
  print(masked_loss)
  print()
  print("========== summed_loss_per_sequence_of_batch ==============")
  print()
  summed_loss_per_sequence_of_batch = tf.reduce_sum(masked_loss, axis = 1)
  print(summed_loss_per_sequence_of_batch)
  print()
  print("========= average_loss_per_sequence_of_batch ========= ")
  average_loss_per_sequence_of_batch = tf.divide(summed_loss_per_sequence_of_batch, non_zero_counts)
  print(average_loss_per_sequence_of_batch)
  print()
  
  #print(mask)

tf.Tensor(
[ 45.  58. 146.  58.  89. 119. 116. 148.  93. 130. 125. 180. 178.  55.
 170. 106. 133.  80. 121.  56. 162. 201. 129.  55. 161. 175. 255. 109.
 246. 204. 193. 128.  75. 134. 134. 145. 207.  86. 139. 100. 206. 112.
 110.  76. 105. 134.  98.  98. 143. 109. 234. 154. 142. 102. 132. 119.
  73. 186.  91. 148.  64. 133. 241. 142. 194.  73. 106. 134. 103. 135.
 221. 146. 194. 262.  99. 167.  82.  90. 188. 111. 168. 110. 109. 115.
 139. 108.  87. 167. 150. 110. 102. 116. 141.  74. 234. 181. 103. 152.
 116. 103.  97.  97. 132. 178.  75. 162. 129. 116. 117. 130. 109.  85.
  60.  98.  77.  49. 102.  75.  55. 106.  76.  62. 106.  89.  70.  90.
  81.  62.], shape=(128,), dtype=float32)

tf.Tensor(
[[1.6106581e+01 3.3956265e+00 3.7805769e-01 ... 1.7343017e-03
  1.7344207e-03 1.7345398e-03]
 [5.0245047e-01 2.6580746e+00 2.8654106e+00 ... 1.7312076e-03
  1.7312076e-03 1.7312076e-03]
 [5.0337994e-01 2.0717266e+00 4.8572069e-01 ... 1.7175222e-03
  1.7176411e-03 1.7177602e-03]
 ...
 [5.0717580e

**Loss function - 2**

![alt text](https://drive.google.com/uc?id=1DMUmJpvKnAqho3uDWSgUDeAMH7x08lbo)

In [0]:
for x, y in zip(onehot_encoded_batch_data.take(2), batched_label_data.take(2)):
  non_zero_counts = tf.math.count_nonzero(input=y,axis = 1, dtype=tf.dtypes.float32)
  mask = tf.sequence_mask(non_zero_counts, 448, dtype=tf.dtypes.float32 )
  example_prediction = model(x)
  #loss_tnsr_keras = loss_fnc(y, example_prediction)
  #print(loss_tnsr_keras)
  print(non_zero_counts)
  sum_non_zero_counts_per_batch = tf.reduce_sum(non_zero_counts, axis = None)
  print("====== ========")
  print()
  loss_tnsr_flow = loss_function(y, example_prediction)
  print(loss_tnsr_flow)
  print()
  print("========== summed_loss_per_sequence_of_batch without masking ==============")
  print()
  summed_loss_per_sequence_of_batch_without_masking = tf.reduce_sum(loss_tnsr_flow, axis = 1)
  print(summed_loss_per_sequence_of_batch_without_masking)
  print()
  print(" ==== ====== ==")
  print(" Masked loss")
  masked_loss = tf.multiply(loss_tnsr_flow, mask)
  print(masked_loss)
  print()
  print("========== summed_loss_per_sequence_of_batch ==============")
  print()
  summed_loss_per_sequence_of_batch = tf.reduce_sum(masked_loss, axis = None)
  print(summed_loss_per_sequence_of_batch)
  print()
  print("========= average_loss_per_sequence_of_batch ========= ")
  average_loss_per_sequence_of_batch = tf.divide(summed_loss_per_sequence_of_batch, sum_non_zero_counts_per_batch)
  print(average_loss_per_sequence_of_batch)
  print()
  

tf.Tensor(
[ 45.  58. 146.  58.  89. 119. 116. 148.  93. 130. 125. 180. 178.  55.
 170. 106. 133.  80. 121.  56. 162. 201. 129.  55. 161. 175. 255. 109.
 246. 204. 193. 128.  75. 134. 134. 145. 207.  86. 139. 100. 206. 112.
 110.  76. 105. 134.  98.  98. 143. 109. 234. 154. 142. 102. 132. 119.
  73. 186.  91. 148.  64. 133. 241. 142. 194.  73. 106. 134. 103. 135.
 221. 146. 194. 262.  99. 167.  82.  90. 188. 111. 168. 110. 109. 115.
 139. 108.  87. 167. 150. 110. 102. 116. 141.  74. 234. 181. 103. 152.
 116. 103.  97.  97. 132. 178.  75. 162. 129. 116. 117. 130. 109.  85.
  60.  98.  77.  49. 102.  75.  55. 106.  76.  62. 106.  89.  70.  90.
  81.  62.], shape=(128,), dtype=float32)

tf.Tensor(
[[1.6111551e+01 3.3855002e+00 3.6975500e-01 ... 1.7357297e-03
  1.7358487e-03 1.7358487e-03]
 [5.0986779e-01 2.6823816e+00 2.8770697e+00 ... 1.7297795e-03
  1.7300176e-03 1.7300176e-03]
 [5.1856112e-01 2.0734143e+00 4.8905951e-01 ... 1.7176411e-03
  1.7176411e-03 1.7177602e-03]
 ...
 [5.2241099e

# **Complete RNN-GRU based model for the assignment**

**summary:** Both models were run for 20 epochs one after the other. Model took batch size of 128, max time step size of 448, onehot size of 78

**Model with custom_loss_fnc_1** : Loss = 1.05, time per epoch = 28 sec, hidden units = 450

**Model with custom_loss_fnc_2** : Loss = 1.43, time per epoch = 28 sec, hidden units = 450

**Model with custom_loss_fnc_2** : epochs = 33, Loss = 1.43, time per epoch = 25 sec, hidden units = 512, learning rate = 0.0012

In [0]:
def build_model(vocab_size, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

In [0]:
gru_training_model_512 = build_model(
  vocab_size = len(vocab),
  rnn_units=512,
  batch_size=128)

In [13]:
example_input_tensor = tf.Variable(tf.initializers.GlorotUniform(seed = 0)(shape=[128, 448, vocab_size]))
example_prediction = gru_training_model_512(example_input_tensor)
#print(example_prediction)
example_prediction.shape

TensorShape([128, 448, 78])

In [15]:
gru_training_model_512.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru (GRU)                    multiple                  909312    
_________________________________________________________________
dense (Dense)                multiple                  40014     
Total params: 949,326
Trainable params: 949,326
Non-trainable params: 0
_________________________________________________________________


In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir3 = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix3 = os.path.join(checkpoint_dir3, "ckpt_512act_{epoch}")

checkpoint_callback3=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix3,
    save_weights_only=True)

In [0]:
def custom_masked_loss2(one_batch_data, max_len, logits, labels):

    non_zero_counts = tf.math.count_nonzero(input=one_batch_data, axis = 1, dtype=tf.dtypes.float32)

    #Do -1 as the last element of each sequence isn’t used as input.
    non_zero_counts = non_zero_counts - 1

    #Creating the mask
    mask = tf.sequence_mask(non_zero_counts, max_len, dtype=tf.dtypes.float32 )

    #Calculate loss for each element of the batch and timestep
    loss_tnsr = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

    #Create a mask based on non-zero sequence
    masked_loss = tf.multiply(loss_tnsr, mask)

    summed_loss_per_sequence_of_batch = tf.reduce_sum(masked_loss, axis = None)
    summed_non_zero_counts_of_batch = tf.reduce_sum(non_zero_counts, axis = None)
    average_masked_loss_per_batch = tf.divide(summed_loss_per_sequence_of_batch, summed_non_zero_counts_of_batch)

    return average_masked_loss_per_batch;

In [0]:
@tf.function
def train_step(inp, target, one_batch_data, max_len):
  with tf.GradientTape() as tape:
    predictions = gru_training_model_512(inp)
    loss = custom_masked_loss2(one_batch_data, max_len, logits = predictions, labels=target)
  grads = tape.gradient(loss, gru_training_model_512.trainable_variables)
  optimizer.apply_gradients(zip(grads, gru_training_model_512.trainable_variables))

  return loss;

In [0]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0012)

In [23]:
# Training step
EPOCHS = 33
max_len = 448
for epoch in range(EPOCHS):
  start = time.time()

  # initializing the hidden state at the start of every epoch
  # initally hidden is None
  hidden = gru_training_model_512.reset_states()

  for (batch_n, (onehot_inp, categ_inp, categ_target)) in enumerate(zip(onehot_encoded_batch_data, batched_categorical_data, batched_label_data)):
    loss = train_step(inp = onehot_inp, target = categ_target, one_batch_data = categ_inp, max_len = max_len)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    gru_training_model_512.save_weights(checkpoint_prefix3.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

gru_training_model_512.save_weights(checkpoint_prefix3.format(epoch=epoch))

Epoch 1 Batch 0 Loss 4.357568740844727
Epoch 1 Batch 100 Loss 2.378333330154419
Epoch 1 Batch 200 Loss 2.0631933212280273
Epoch 1 Loss 1.9581
Time taken for 1 epoch 25.494893312454224 sec

Epoch 2 Batch 0 Loss 2.03416109085083
Epoch 2 Batch 100 Loss 1.9421584606170654
Epoch 2 Batch 200 Loss 1.8093279600143433
Epoch 2 Loss 1.7557
Time taken for 1 epoch 24.220762968063354 sec

Epoch 3 Batch 0 Loss 1.7994821071624756
Epoch 3 Batch 100 Loss 1.7473204135894775
Epoch 3 Batch 200 Loss 1.5871973037719727
Epoch 3 Loss 1.5589
Time taken for 1 epoch 24.2340407371521 sec

Epoch 4 Batch 0 Loss 1.6117326021194458
Epoch 4 Batch 100 Loss 1.5695502758026123
Epoch 4 Batch 200 Loss 1.4285595417022705
Epoch 4 Loss 1.4173
Time taken for 1 epoch 24.21273708343506 sec

Epoch 5 Batch 0 Loss 1.4783350229263306
Epoch 5 Batch 100 Loss 1.445587158203125
Epoch 5 Batch 200 Loss 1.320628046989441
Epoch 5 Loss 1.3097
Time taken for 1 epoch 24.18663001060486 sec

Epoch 6 Batch 0 Loss 1.3788082599639893
Epoch 6 Batch 1

In [24]:
tf.train.latest_checkpoint(checkpoint_dir3)

'./training_checkpoints/ckpt_512act_32'

In [25]:
!mkdir -p saved_model_3

gru_training_model_512.save('saved_model_3/my_model1') 

!zip -r /content/saved_model_3.zip /content/saved_model_3


Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: saved_model_3/my_model1/assets
  adding: content/saved_model_3/ (stored 0%)
  adding: content/saved_model_3/my_model1/ (stored 0%)
  adding: content/saved_model_3/my_model1/variables/ (stored 0%)
  adding: content/saved_model_3/my_model1/variables/variables.index (deflated 42%)
  adding: content/saved_model_3/my_model1/variables/variables.data-00000-of-00002 (deflated 71%)
  adding: content/saved_model_3/my_model1/variables/variables.data-00001-of-00002 (deflated 7%)
  adding: content/saved_model_3/my_model1/saved_model.pb (deflated 90%)
  adding: content/saved_model_3/my_model1/assets/ (stored 0%)


In [0]:
gru_training_model = build_model(
  vocab_size = len(vocab),
  rnn_units=448,
  batch_size=128)

In [0]:
example_input_tensor = tf.Variable(tf.initializers.GlorotUniform(seed = 0)(shape=[128, 448, vocab_size]))
example_prediction = gru_training_model(example_input_tensor)
#print(example_prediction)
example_prediction.shape

TensorShape([128, 448, 78])

In [0]:
gru_training_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_1 (GRU)                  multiple                  709632    
_________________________________________________________________
dense_1 (Dense)              multiple                  35022     
Total params: 744,654
Trainable params: 744,654
Non-trainable params: 0
_________________________________________________________________


In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [0]:
def custom_masked_loss1(one_batch_data, max_len, logits, labels):

    non_zero_counts = tf.math.count_nonzero(input=one_batch_data, axis = 1, dtype=tf.dtypes.float32)

    #Do -1 as the last element of each sequence isn’t used as input.
    non_zero_counts = non_zero_counts - 1

    #Creating the mask
    mask = tf.sequence_mask(non_zero_counts, max_len, dtype=tf.dtypes.float32 )

    #Calculate loss for each element of the batch and timestep
    loss_tnsr = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

    #Create a mask based on non-zero sequence
    masked_loss = tf.multiply(loss_tnsr, mask)

    summed_loss_per_sequence_of_batch = tf.reduce_sum(masked_loss, axis = 1)

    average_loss_per_sequence_of_batch = tf.divide(summed_loss_per_sequence_of_batch, non_zero_counts)

    #Since the loss is already masked and averaged per sequence
    average_masked_loss_per_batch = tf.reduce_mean(average_loss_per_sequence_of_batch, axis = None)

    return average_masked_loss_per_batch;

In [0]:
def custom_masked_loss2(one_batch_data, max_len, logits, labels):

    non_zero_counts = tf.math.count_nonzero(input=one_batch_data, axis = 1, dtype=tf.dtypes.float32)

    #Do -1 as the last element of each sequence isn’t used as input.
    non_zero_counts = non_zero_counts - 1

    #Creating the mask
    mask = tf.sequence_mask(non_zero_counts, max_len, dtype=tf.dtypes.float32 )

    #Calculate loss for each element of the batch and timestep
    loss_tnsr = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

    #Create a mask based on non-zero sequence
    masked_loss = tf.multiply(loss_tnsr, mask)

    summed_loss_per_sequence_of_batch = tf.reduce_sum(masked_loss, axis = None)
    summed_non_zero_counts_of_batch = tf.reduce_sum(non_zero_counts, axis = None)
    average_masked_loss_per_batch = tf.divide(summed_loss_per_sequence_of_batch, summed_non_zero_counts_of_batch)

    return average_masked_loss_per_batch;

**Training using custom loss function 1**

In [0]:
@tf.function
def train_step(inp, target, one_batch_data, max_len):
  with tf.GradientTape() as tape:
    predictions = gru_training_model(inp)
    loss = custom_masked_loss1(one_batch_data, max_len, logits = predictions, labels=target)
  grads = tape.gradient(loss, gru_training_model.trainable_variables)
  optimizer.apply_gradients(zip(grads, gru_training_model.trainable_variables))

  return loss;

In [0]:
optimizer = tf.keras.optimizers.Adam()

In [0]:
# Training step
EPOCHS = 20
max_len = 448
for epoch in range(EPOCHS):
  start = time.time()

  # initializing the hidden state at the start of every epoch
  # initally hidden is None
  hidden = gru_training_model.reset_states()

  for (batch_n, (onehot_inp, categ_inp, categ_target)) in enumerate(zip(onehot_encoded_batch_data, batched_categorical_data, batched_label_data)):
    loss = train_step(inp = onehot_inp, target = categ_target, one_batch_data = categ_inp, max_len = max_len)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    gru_training_model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

gru_training_model.save_weights(checkpoint_prefix.format(epoch=epoch))

Epoch 1 Batch 0 Loss 4.357480049133301
Epoch 1 Batch 100 Loss 2.440483570098877
Epoch 1 Batch 200 Loss 2.0773138999938965
Epoch 1 Loss 1.9816
Time taken for 1 epoch 27.535837411880493 sec

Epoch 2 Batch 0 Loss 2.0602526664733887
Epoch 2 Batch 100 Loss 1.9782841205596924
Epoch 2 Batch 200 Loss 1.8430869579315186
Epoch 2 Loss 1.8067
Time taken for 1 epoch 26.861476182937622 sec

Epoch 3 Batch 0 Loss 1.8456518650054932
Epoch 3 Batch 100 Loss 1.8182569742202759
Epoch 3 Batch 200 Loss 1.6516454219818115
Epoch 3 Loss 1.6413
Time taken for 1 epoch 27.246646642684937 sec

Epoch 4 Batch 0 Loss 1.680924892425537
Epoch 4 Batch 100 Loss 1.665950059890747
Epoch 4 Batch 200 Loss 1.5056744813919067
Epoch 4 Loss 1.5073
Time taken for 1 epoch 27.61521577835083 sec

Epoch 5 Batch 0 Loss 1.5570275783538818
Epoch 5 Batch 100 Loss 1.5552847385406494
Epoch 5 Batch 200 Loss 1.4008651971817017
Epoch 5 Loss 1.4060
Time taken for 1 epoch 27.77303457260132 sec

Epoch 6 Batch 0 Loss 1.4630892276763916
Epoch 6 Bat

In [0]:
tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints/ckpt_19'

In [0]:
gru_training_model2 = build_model(
  vocab_size = len(vocab),
  rnn_units=448,
  batch_size=128)

example_input_tensor = tf.Variable(tf.initializers.GlorotUniform(seed = 0)(shape=[128, 448, vocab_size]))
example_prediction = gru_training_model2(example_input_tensor)
#print(example_prediction)
example_prediction.shape

gru_training_model2.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_2 (GRU)                  multiple                  709632    
_________________________________________________________________
dense_2 (Dense)              multiple                  35022     
Total params: 744,654
Trainable params: 744,654
Non-trainable params: 0
_________________________________________________________________


In [0]:
# Directory where the checkpoints will be saved
checkpoint_dir2 = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix2 = os.path.join(checkpoint_dir2, "ckpt_model2_{epoch}")

checkpoint_callback2=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix2,
    save_weights_only=True)

In [0]:
@tf.function
def train_step2(inp, target, one_batch_data, max_len):
  with tf.GradientTape() as tape:
    predictions = gru_training_model2(inp)
    loss = custom_masked_loss2(one_batch_data, max_len, logits = predictions, labels=target)
  grads = tape.gradient(loss, gru_training_model2.trainable_variables)
  optimizer.apply_gradients(zip(grads, gru_training_model2.trainable_variables))

  return loss;

In [0]:
optimizer2 = tf.keras.optimizers.Adam()

In [0]:
# Training step
EPOCHS = 20
max_len = 448
for epoch in range(EPOCHS):
  start = time.time()

  # initializing the hidden state at the start of every epoch
  # initally hidden is None
  hidden = gru_training_model2.reset_states()

  for (batch_n, (onehot_inp, categ_inp, categ_target)) in enumerate(zip(onehot_encoded_batch_data, batched_categorical_data, batched_label_data)):
    loss = train_step2(inp = onehot_inp, target = categ_target, one_batch_data = categ_inp, max_len = max_len)

    if batch_n % 100 == 0:
      template = 'Epoch {} Batch {} Loss {}'
      print(template.format(epoch+1, batch_n, loss))

  # saving (checkpoint) the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    gru_training_model2.save_weights(checkpoint_prefix2.format(epoch=epoch))

  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
  print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

gru_training_model2.save_weights(checkpoint_prefix2.format(epoch=epoch))

Epoch 1 Batch 0 Loss 4.358142375946045
Epoch 1 Batch 100 Loss 2.385197162628174
Epoch 1 Batch 200 Loss 2.2783901691436768
Epoch 1 Loss 2.1744
Time taken for 1 epoch 27.627805471420288 sec

Epoch 2 Batch 0 Loss 2.5182697772979736
Epoch 2 Batch 100 Loss 2.1150431632995605
Epoch 2 Batch 200 Loss 2.0190112590789795
Epoch 2 Loss 1.9390
Time taken for 1 epoch 26.934789896011353 sec

Epoch 3 Batch 0 Loss 2.00642466545105
Epoch 3 Batch 100 Loss 1.9639248847961426
Epoch 3 Batch 200 Loss 1.8999260663986206
Epoch 3 Loss 1.8389
Time taken for 1 epoch 27.448675870895386 sec

Epoch 4 Batch 0 Loss 1.8690279722213745
Epoch 4 Batch 100 Loss 1.8850500583648682
Epoch 4 Batch 200 Loss 1.820862889289856
Epoch 4 Loss 1.7749
Time taken for 1 epoch 27.508951663970947 sec

Epoch 5 Batch 0 Loss 1.7887191772460938
Epoch 5 Batch 100 Loss 1.8267598152160645
Epoch 5 Batch 200 Loss 1.7662551403045654
Epoch 5 Loss 1.7269
Time taken for 1 epoch 27.330742597579956 sec

Epoch 6 Batch 0 Loss 1.7258942127227783
Epoch 6 Ba

In [0]:
!mkdir -p saved_model_1
!mkdir -p saved_model_2

In [0]:
gru_training_model.save('saved_model_1/my_model1') 
gru_training_model2.save('saved_model_2/my_model2') 

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: saved_model_1/my_model1/assets
INFO:tensorflow:Assets written to: saved_model_2/my_model2/assets


In [0]:
!zip -r /content/saved_model_1.zip /content/saved_model_1
!zip -r /content/saved_model_2.zip /content/saved_model_2

  adding: content/saved_model_1/ (stored 0%)
  adding: content/saved_model_1/my_model1/ (stored 0%)
  adding: content/saved_model_1/my_model1/saved_model.pb (deflated 90%)
  adding: content/saved_model_1/my_model1/assets/ (stored 0%)
  adding: content/saved_model_1/my_model1/variables/ (stored 0%)
  adding: content/saved_model_1/my_model1/variables/variables.index (deflated 41%)
  adding: content/saved_model_1/my_model1/variables/variables.data-00000-of-00002 (deflated 71%)
  adding: content/saved_model_1/my_model1/variables/variables.data-00001-of-00002 (deflated 8%)
  adding: content/saved_model_2/ (stored 0%)
  adding: content/saved_model_2/my_model2/ (stored 0%)
  adding: content/saved_model_2/my_model2/saved_model.pb (deflated 91%)
  adding: content/saved_model_2/my_model2/assets/ (stored 0%)
  adding: content/saved_model_2/my_model2/variables/ (stored 0%)
  adding: content/saved_model_2/my_model2/variables/variables.index (deflated 41%)
  adding: content/saved_model_2/my_model2/v

In [0]:
from google.colab import files
files.download("/content/saved_model_1.zip")
files.download("/content/saved_model_2.zip")

# **Language Model**

**Summary:** Language is generated by both models, we also tested generating language by remving stop condition on the stop_character.

Both the languages generated by models 1 and 2 were almost the same and followed some structure. Some most repeating words like Jesus, LORD and GOD appeared in the output

Without the stop consition, the language model did not stop and started to fill the text cell. Hence we added a max char length of 5000 to avoid crashing of our notebook. If left to its own then the model doesn't seem to stop and would eventually result in filling up of memory of the system, hence breaks it.

Also more activations could improve the performance of the languae model as it could remember more parts of the data for a bit more time

**Model with 512 hidden units and loss function 2:** Provided a better output with no unnecessary spaces, line breaks and conformed to the structure mentioned in the bible

**Model with 450 hidden units and loss function 1:** Provided worse results than the 512 units one but same output as that of  loss function 2 450 units. It had unecessary spaces , line breaks and some special characters appearing where it shouldn't

**Model with 450 hidden units and loss function 2:** Provided the same output as 450 units loss function 1

**Language model for model with 512 hidden activations and loss function 2**

In [26]:
language_model_512 = build_model(vocab_size=len(vocab),  rnn_units=512, batch_size=1)

language_model_512.load_weights(tf.train.latest_checkpoint(checkpoint_dir3))

language_model_512.build(tf.TensorShape([1, None, vocab_size]))

language_model_512.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_1 (GRU)                  multiple                  909312    
_________________________________________________________________
dense_1 (Dense)              multiple                  40014     
Total params: 949,326
Trainable params: 949,326
Non-trainable params: 0
_________________________________________________________________


In [27]:
line_size = 10
max_char_per_line = 500
stop_char = "</S>"
start_char = "<S>"
start_ind = ch_to_ind["<S>"]
stop_ind = ch_to_ind["</S>"]
start_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = start_ind, depth = vocab_size), 0), 0)
stop_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = stop_ind, depth = vocab_size), 0), 0)
print(start_sequence_onehot_encode.shape)
string = ""
tmp = None
index_list = list(range(vocab_size))

(1, 1, 78)


In [28]:
print(tf.expand_dims(start_sequence_onehot_encode, 0))
out_one_time_step = tf.nn.softmax(axis=-1,logits=tf.expand_dims(start_sequence_onehot_encode, 0))
out_one_time_step

tf.Tensor(
[[[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]], shape=(1, 1, 1, 78), dtype=float32)


<tf.Tensor: shape=(1, 1, 1, 78), dtype=float32, numpy=
array([[[[0.01254417, 0.0340986 , 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0

In [0]:
def softmax_output(logits):
  return tf.nn.softmax(axis = -1, logits = logits)

In [32]:
for line in range(15):
  string = ""
  ind = 0
  out_ind = None
  language_model_512.reset_states()
  while(out_ind != ch_to_ind["</S>"]):
    if(ind == 0):
      predicted_char_logits = language_model_512(start_sequence_onehot_encode)
      predicted_char_softmax = softmax_output(predicted_char_logits)
      #print(predicted_char_softmax)
    else:
      next_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = ind, depth = vocab_size), 0), 0)
      predicted_char_logits = language_model_512(next_sequence_onehot_encode)
      predicted_char_softmax = softmax_output(predicted_char_logits)

    np_array = predicted_char_softmax.numpy()
    
    ind = np.random.choice( index_list, p=np_array.flatten())
    out_ind = ind
    if(ind != ch_to_ind["</S>"]):
      #string = string + "\n";
      #break;    
      string = string + ind_to_ch[ind]
    
    
  
  print(string)

 Parain among you.



My herd, is from things yourselves of, saying will I say?  
 And he cast fords
to greater that they may all their own sight he will know, many that
secreth some mother, let him doest shall purso rule over me as I out
of thestung?  
 And Moses beseech you to years of gold, and dragouse not the
forms of the fruits we found shall take the maits, and how
charged in the altar that stood by saints; 
 There are partbedmoness of their own corruptions, anded the country
of nine anger, saying, What they seen to bring
noneness to pray, and often many at tearitl.


 Let her closhing for nought with the holy
aposklessed, and my God had the bond of Christ.


 Every sign chareth all things from deather.


 And he shall out of them that hath an exatines was law wait at your
old and unwisers, the world would cause them to the parpase of one
of these feightiness.


 We do well in the way, I will be desolate,
by one man's people might stand and forsaken.


 So kind send common taken

**Language model for model with 450 hidden activations and loss function 1**

In [0]:
language_model = build_model(vocab_size=len(vocab),  rnn_units=448, batch_size=1)

language_model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))



<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7fac36b4ff60>

In [0]:
language_model.build(tf.TensorShape([1, None, vocab_size]))

In [0]:
language_model.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_4 (GRU)                  multiple                  709632    
_________________________________________________________________
dense_4 (Dense)              multiple                  35022     
Total params: 744,654
Trainable params: 744,654
Non-trainable params: 0
_________________________________________________________________


In [0]:
line_size = 10
max_char_per_line = 500
stop_char = "</S>"
start_char = "<S>"
start_ind = ch_to_ind["<S>"]
stop_ind = ch_to_ind["</S>"]
start_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = start_ind, depth = vocab_size), 0), 0)
stop_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = stop_ind, depth = vocab_size), 0), 0)
print(start_sequence_onehot_encode.shape)
string = ""
tmp = None
index_list = list(range(vocab_size))

(1, 1, 78)


In [0]:
print(tf.expand_dims(start_sequence_onehot_encode, 0))
out_one_time_step = tf.nn.softmax(axis=-1,logits=tf.expand_dims(start_sequence_onehot_encode, 0))
out_one_time_step

tf.Tensor(
[[[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
    0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]]], shape=(1, 1, 1, 78), dtype=float32)


<tf.Tensor: shape=(1, 1, 1, 78), dtype=float32, numpy=
array([[[[0.01254417, 0.0340986 , 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0.01254417, 0.01254417, 0.01254417,
          0.01254417, 0.01254417, 0

In [0]:
def softmax_output(logits):
  return tf.nn.softmax(axis = -1, logits = logits)

In [0]:
for line in range(15):
  string = ""
  ind = 0
  out_ind = None
  language_model.reset_states()
  while(out_ind != ch_to_ind["</S>"]):
    if(i == 0):
      predicted_char_logits = language_model(start_sequence_onehot_encode)
      predicted_char_softmax = softmax_output(predicted_char_logits)
      #print(predicted_char_softmax)
    else:
      next_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = ind, depth = vocab_size), 0), 0)
      predicted_char_logits = language_model(next_sequence_onehot_encode)
      predicted_char_softmax = softmax_output(predicted_char_logits)

    np_array = predicted_char_softmax.numpy()
    
    ind = np.random.choice( index_list, p=np_array.flatten())
    out_ind = ind
    if(ind != ch_to_ind["</S>"]):
      #string = string + "\n";
      #break;    
      string = string + ind_to_ch[ind]
    
    
  
  print(string)

prot my night
borth it ye beccut, The rewe
to cumbors which his visclorright acrood; thy sight und
mers, an which whence I may devagrises, thou
mare bectud for the twere spirt.


0d.


re
ghorsing agrait faturntimer for him; for this

men.


, When
whear I shralive the chill seven the LORD, whather
surils firve the Loble gves thim divich an that one incersife: 
Uzz.


GD!

(GOD Jesus have connottles?


d:

oven:

6m
go mosces. Thus gromong mud.


res
of our hove chuit the Spiright the Gibien we prame of
herof, Go all the clirightwee, are Gals
of manus in thinco Fetwered bringthing thine an his
saccrich, all the doscerelves,
and drighth him not be by cries.


!


ak
thou had from Gisma.


y marither.


Th the
Lept.




Language generated by loss function 2

In [0]:
language_model2 = build_model(vocab_size=len(vocab),  rnn_units=448, batch_size=1)

language_model2.load_weights(tf.train.latest_checkpoint(checkpoint_dir2))

language_model2.build(tf.TensorShape([1, None, vocab_size]))

language_model2.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_3 (GRU)                  multiple                  709632    
_________________________________________________________________
dense_3 (Dense)              multiple                  35022     
Total params: 744,654
Trainable params: 744,654
Non-trainable params: 0
_________________________________________________________________


In [0]:
line_size = 10
max_char_per_line = 500
stop_char = "</S>"
start_char = "<S>"
start_ind = ch_to_ind["<S>"]
stop_ind = ch_to_ind["</S>"]
start_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = start_ind, depth = vocab_size), 0), 0)
stop_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = stop_ind, depth = vocab_size), 0), 0)
print(start_sequence_onehot_encode.shape)
string = ""
tmp = None
index_list = list(range(vocab_size))

(1, 1, 78)


In [0]:
def softmax_output2(logits):
  return tf.nn.softmax(axis = -1, logits = logits)

In [0]:
for line in range(30):
  string = ""
  ind = 0
  out_ind = None
  language_model2.reset_states()
  while(out_ind != ch_to_ind["</S>"]):
    if(i == 0):
      predicted_char_logits = language_model2(start_sequence_onehot_encode)
      predicted_char_softmax = softmax_output2(predicted_char_logits)
      #print(predicted_char_softmax)
    else:
      next_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = ind, depth = vocab_size), 0), 0)
      predicted_char_logits = language_model2(next_sequence_onehot_encode)
      predicted_char_softmax = softmax_output2(predicted_char_logits)

    np_array = predicted_char_softmax.numpy()
    
    ind = np.random.choice( index_list, p=np_array.flatten())
    out_ind = ind   
    if(ind != ch_to_ind["</S>"]):
      string = string + ind_to_ch[ind]
 
  print(string)

RD of
forther, braks it, goth the Kascoows God?


w
Glore from he bowr him not.


,)

p.


, 
'
had il all.


Kwn
whose is mormer you, a and things, thou
for Ellariah, which worrifes when hopsed hound on
hight up by a prayser.


,) 
ys
ramingur lam went is oul rices.


y?


e; 
's
hach which are her saypith as 
w, I
beke whring ols ir that yiras see plest
the dethiff: 
Je

n.


:

zrow
thor hear him great ambrati neming them unto wither,
and even also his in thou ath
which which from the law: but I chimitions.




Berusthong of the Lord of for hear; 
pp:
but cupt Jush prock, thferews you la,
dessige, the scerit, but it whichithan, 
0 Gaving.


7Kortay;

RD's that is firt be
ghrand to aris even cortionsatich withat: thme ht Rair have
glorce; 
8
hand not wam him, What have he
sebr is ab cruthing throm Eshrown with it; 
LORD:
Behold the his rechipher there hath laamone; for
his
Leaziah they
had God, arcambrings of Egypt the prostsong on live me,
men theron of shrece be from Chramces of dr

Without considering **</S>** as stop and allowing network to stop itself we see network breaks and overloads colab notebook with text data. hence we have set the char limit to 5000 after running without this limit. Basically network never stops, It keeps on going till we put a break  

In [0]:
for line in range(10):
  string = ""
  ind = 0
  out_ind = None
  language_model2.reset_states()
  for i in range(5000):
    if(i == 0):
      predicted_char_logits = language_model2(start_sequence_onehot_encode)
      predicted_char_softmax = softmax_output2(predicted_char_logits)
      #print(predicted_char_softmax)
    else:
      next_sequence_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = ind, depth = vocab_size), 0), 0)
      predicted_char_logits = language_model2(next_sequence_onehot_encode)
      predicted_char_softmax = softmax_output2(predicted_char_logits)

    np_array = predicted_char_softmax.numpy()
    
    ind = np.random.choice( index_list, p=np_array.flatten())
    out_ind = ind
    
    if(ind == ch_to_ind["</S>"]):
      continue;
      #string = string + "\n";
      #break;
    
    
    string = string + ind_to_ch[ind]
     
  print(string)
  print(" ============ ============== ============= ")

 And Gloard the forath all tood I reignd fuen int.

e unto Jerusale, even the arebod of king.

ooved do
he the mere shall nwerthinse is stree eville mority thie said unto dost conce
fime, by the LORD thou all two his lotedideth thy brought bub.

 leven attereth that ye rones;
ous me was one down: gee pave a woll and deats and of have of ye, I whrece the head
hand not enter into the son, The father, and thisterkel lifolive my for these basked in
his joicray wncmitifey, aroumited from thereof,
lve ys them that dlest for unto the though wosk
go teelves, and, Ipren the man the are y atingh the faliet herver: eisnsely sholl not
go vomesy wither them in my Gine; ide the sinilustion
uncording, and the kingled the vissee in all; the
God be say, youse thou hast in the
Lord becaue it, and these, belony shall and at him.

hose no go they
abomen; and that every nive mes: worves saigh yorness forlaned the land
wintite: and cled it an yourness of tirned to desterd.

rea dest all head now ale him,
wa

In [0]:
for i in range(line_size):
  ch_to_ind(start_char)

# **Bonus Section - Applying Language model**

Please note that we haven't tried to directly replace LORD with LOOD in bble ---> retrain the model and compare due to compute resource crunch. Instead we would be usoing the same model to compare between different words. Most interesting results were found for Abrams vs abrams.

**There is also probability of start sequence, but since we always give the first character as start sequence, its probability is artifically 1.**

**Without stateful = True, Rnn might not remember its predecessors important information** and hence cannot use the historical info to update current information. This may generate characters with probability independent of each other. More like what happens in Naive Bayes where there is an assumption of **conditional independence**.

Probabilities for model of 450 activations and loss function 2:

**Take LORD and LOOD for example**
**Summary:** 
Feed network with start_sequence_char S_tag ---> Find probability of L. (a)


Now feed L to network --> Find probability of O (b)

Feed O to network again ---> find probability of R (c)

Feed R to network again ---> find probability of D (d)

Probability of word LORD is a*b*c*d and take log of it according to the formula.

Now to the same with word LOOD.

The log probability of LORD > log prob of LOOD.

Do this for words 

2) **GOD vs DOG vs dog**

3) **Abram vs abram vs Arbam**

**Summary:** As expected for case 1 log prob of LORD > LOOD and same holds for 2ndd words pro(dog) > prob(GOD)


**Log probability for GOD vs DOG vs dog are : -27.079181671142578 , -47.95939636230469, -34.88587951660156**

**Log probability for LORD vs LOOD are : -27.39081382751465 , -45.164154052734375** 

**But we see that prob(Abram) > prob(abram) > prob(Arbam) Log probability for Abram vs abram vs Arbam are : -33.820594787597656 , -28.015478134155273, -41.56006240844726**

**Reason:** Even though Abram is a name/Noun and starts with capital A and probability of whole word Abram > abram as per simple word search in the king james bible.

**But thing is we are generating character by character, hence probability of start --> A --> b  << start --> a --> b as seen below. This results in small "abrams" having more probability than camel case "Abrams"**

current character and proba: A, -18.6795711517334

current character and proba: b, -25.245803833007812

current character and proba: r, -25.974218368530273

current character and proba: a, -30.421810150146484

current character and proba: m, -33.820594787597656

=========== ============= ========

current character and proba: a, -11.571039199829102

current character and proba: b, -15.479758262634277

current character and proba: r, -19.45343017578125

current character and proba: a, -24.344633102416992

current character and proba: m, -28.015478134155273

=========== ============= ========

Probabilities for model of 512 activations and loss function 2:

**Take LORD and LOOD for example**
**Summary:** 
Feed network with start_sequence_char S_tag ---> Find probability of L. (a)


Now feed L to network --> Find probability of O (b)

Feed O to network again ---> find probability of R (c)

Feed R to network again ---> find probability of D (d)

Probability of word LORD is a*b*c*d and take log of it according to the formula.

Now to the same with word LOOD.

The log probability of LORD > log prob of LOOD.

current character and proba: L, -15.66189193725586

current character and proba: O, -15.690693855285645

current character and proba: R, -19.119600296020508

current character and proba: D, -19.31654167175293

=========== ============= ========

current character and proba: L, -13.107548713684082

current character and proba: O, -15.677525520324707

current character and proba: O, -21.368053436279297

current character and proba: D, -23.021589279174805

=========== ============= ========

Log probability for LORD vs LOOD are : -19.31654167175293 , -23.021589279174805




2) **GOD vs DOG vs dog vs god**

current character and proba: G, -11.321189880371094

current character and proba: O, -13.071130752563477

current character and proba: D, -14.047483444213867

=========== ============= ========

current character and proba: D, -2.5910139083862305

current character and proba: O, -8.607620239257812
current character and proba: G, -15.203033447265625
=========== ============= ========

current character and proba: d, -11.219135284423828

current character and proba: o, -20.93477439880371

current character and proba: g, -29.923145294189453

=========== ============= ========

current character and proba: g, -8.381122589111328

current character and proba: o, -16.48155975341797

current character and proba: d, -22.021503448486328

=========== ============= ========

Log probability for GOD vs DOG vs dog vs god are : -14.047483444213867 , -15.203033447265625, -29.923145294189453, -22.021503448486328

3) **Abram vs abram vs Arbam**

current character and proba: A, -5.761716365814209

current character and proba: b, -10.800869941711426

current character and proba: r, -14.162717819213867

current character and proba: a, -19.593551635742188

current character and proba: m, -22.035017013549805

=========== ============= ========

current character and proba: a, -12.653196334838867

current character and proba: b, -17.244342803955078

current character and proba: r, -25.514951705932617

current character and proba: a, -28.499645233154297

current character and proba: m, -32.45140075683594

=========== ============= ========

current character and proba: A, -19.785926818847656

current character and proba: r, -31.840320587158203

current character and proba: b, -39.851715087890625

current character and proba: a, -43.39006423950195

current character and proba: m, -50.917354583740234

=========== ============= ========

Log probability for Abram vs abram vs Arbam are : -22.035017013549805 , -32.45140075683594, -50.917354583740234






![alt text](https://drive.google.com/uc?id=1eL5U6eyZihECmWErZHoQx0nzrIe62UOJ)

In [0]:
predicted_char_logits = language_model2(start_sequence_onehot_encode)
predicted_char_softmax = softmax_output2(predicted_char_logits)
print(predicted_char_softmax)
np_array = predicted_char_softmax.numpy()
tmp_array = np_array.flatten()


tf.Tensor(
[[[2.12039536e-14 8.37471809e-15 6.85631348e-13 8.74304291e-08
   1.15932475e-09 4.41184334e-08 7.35895123e-09 2.03050476e-11
   1.02917417e-08 2.89543340e-08 8.89758336e-11 7.39147077e-10
   1.96617167e-11 3.30223280e-08 2.45882481e-10 2.64938649e-09
   1.79954870e-11 2.02320621e-07 9.12394049e-10 2.75503509e-10
   5.95913874e-10 1.63898321e-07 8.41319991e-09 1.05264723e-14
   2.74374468e-10 3.19579263e-14 6.40001451e-15 1.45123458e-09
   1.19471666e-09 4.23740348e-11 2.39680470e-10 2.94248048e-06
   4.62800775e-09 1.62578018e-09 6.62761512e-11 2.94862925e-08
   2.98785552e-09 4.48625270e-09 1.33363084e-14 3.70850739e-10
   2.02745709e-09 2.27049828e-08 1.49620067e-08 3.19294591e-11
   3.44046369e-09 1.51796242e-09 9.96089935e-01 7.26451954e-10
   6.70469347e-10 3.90631286e-03 5.33783739e-10 8.51387290e-15
   6.16130980e-10 2.25722260e-10 3.43686092e-14 1.74746224e-11
   1.45528505e-08 4.94860366e-08 1.35537581e-08 1.97638461e-10
   7.24767329e-11 2.70808848e-10 1.57273938e

6.627615e-11

In [0]:
ch_to_ind["L"]

34

In [0]:
print("Character index and the highest probability is : {}, {}".format(46, tmp_array[46]))
print("Character string with highrst probability: {}".format(ind_to_ch[46]))

Character index and the highest probability is : 46, 0.9960899353027344
Characyer string with highrst probability:  


In [0]:
# tf.expand_dims(tf.expand_dims(tf.one_hot(indices = start_ind, depth = vocab_size), 0), 0)
log_prob = 1.0
predicted_char_logits=None
predicted_char_softmax=None
string = "LORD"
current_char = None
for i in range(len(string)):
  if(i == 0):
    predicted_char_logits = language_model2(start_sequence_onehot_encode)
    predicted_char_softmax = softmax_output2(predicted_char_logits)
    np_array = predicted_char_softmax.numpy()
    tmp_array = np_array.flatten()
    
    actual_prob = tmp_array[ch_to_ind[string[i]]]
    log_prob = log_prob * tf.math.log(actual_prob)
    print("current character and proba: {}, {}".format(string[i], log_prob))
    current_char = string[i]
  else:
    current_char_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = ch_to_ind[current_char], depth = vocab_size), 0), 0)
    predicted_char_logits = language_model2(current_char_onehot_encode)
    predicted_char_softmax = softmax_output2(predicted_char_logits)
    np_array = predicted_char_softmax.numpy()
    tmp_array = np_array.flatten()
    
    actual_prob = tmp_array[ch_to_ind[string[i]]]
    log_prob = log_prob * tf.math.log(actual_prob)
    print("current character and proba: {}, {}".format(string[i], log_prob))
    current_char = string[i]


current character and proba: L, -23.39385414123535
current character and proba: O, 9.980377197265625
current character and proba: R, -17.146482467651367
current character and proba: D, 0.0199663657695055


In [0]:
def find_log_prob_for_string(string):
  log_prob = 0.0
  predicted_char_logits=None
  predicted_char_softmax=None
  string = string
  current_char = None
  for i in range(len(string)):
    if(i == 0):
      predicted_char_logits = language_model2(start_sequence_onehot_encode)
      predicted_char_softmax = softmax_output2(predicted_char_logits)
      np_array = predicted_char_softmax.numpy()
      tmp_array = np_array.flatten()
      
      actual_prob = tmp_array[ch_to_ind[string[i]]]
      log_prob = log_prob + tf.math.log(actual_prob)
      print("current character and proba: {}, {}".format(string[i], log_prob))
      current_char = string[i]
    else:
      current_char_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = ch_to_ind[current_char], depth = vocab_size), 0), 0)
      predicted_char_logits = language_model2(current_char_onehot_encode)
      predicted_char_softmax = softmax_output2(predicted_char_logits)
      np_array = predicted_char_softmax.numpy()
      tmp_array = np_array.flatten()
      
      actual_prob = tmp_array[ch_to_ind[string[i]]]
      log_prob = log_prob + tf.math.log(actual_prob)
      print("current character and proba: {}, {}".format(string[i], log_prob))
      current_char = string[i]
  print("=========== ============= ========")
  print()
  return log_prob;


In [0]:
print("Log probability for LORD vs LOOD are : {} , {}".format(find_log_prob_for_string("LORD"),find_log_prob_for_string("LOOD")))

current character and proba: L, -25.012290954589844
current character and proba: O, -26.0313720703125
current character and proba: R, -27.389360427856445
current character and proba: D, -27.39081382751465

current character and proba: L, -26.758562088012695
current character and proba: O, -27.96279525756836
current character and proba: O, -44.068145751953125
current character and proba: D, -45.164154052734375

Log probability for LORD vs LOOD are : -27.39081382751465 , -45.164154052734375


In [0]:
print("Log probability for GOD vs DOG vs dog are : {} , {}, {}".format(find_log_prob_for_string("GOD"),find_log_prob_for_string("DOG"),find_log_prob_for_string("dog")))

current character and proba: G, -26.17877197265625
current character and proba: O, -26.806764602661133
current character and proba: D, -27.079181671142578

current character and proba: D, -20.507095336914062
current character and proba: O, -33.156158447265625
current character and proba: G, -47.95939636230469

current character and proba: d, -18.361841201782227
current character and proba: o, -30.30622100830078
current character and proba: g, -34.88587951660156

Log probability for GOD vs DOG vs dog are : -27.079181671142578 , -47.95939636230469, -34.88587951660156


In [0]:
print("Log probability for Abram vs abram vs Arbam are : {} , {}, {}".format(find_log_prob_for_string("Abram"),find_log_prob_for_string("abram"),find_log_prob_for_string("Arbam")))

current character and proba: A, -18.6795711517334
current character and proba: b, -25.245803833007812
current character and proba: r, -25.974218368530273
current character and proba: a, -30.421810150146484
current character and proba: m, -33.820594787597656

current character and proba: a, -11.571039199829102
current character and proba: b, -15.479758262634277
current character and proba: r, -19.45343017578125
current character and proba: a, -24.344633102416992
current character and proba: m, -28.015478134155273

current character and proba: A, -19.575712203979492
current character and proba: r, -28.545684814453125
current character and proba: b, -34.47380828857422
current character and proba: a, -36.628562927246094
current character and proba: m, -41.560062408447266

Log probability for Abram vs abram vs Arbam are : -33.820594787597656 , -28.015478134155273, -41.560062408447266


In [0]:
def find_log_prob_for_string_512(string):
  log_prob = 0.0
  predicted_char_logits=None
  predicted_char_softmax=None
  string = string
  current_char = None
  for i in range(len(string)):
    if(i == 0):
      predicted_char_logits = language_model_512(start_sequence_onehot_encode)
      predicted_char_softmax = softmax_output(predicted_char_logits)
      np_array = predicted_char_softmax.numpy()
      tmp_array = np_array.flatten()
      
      actual_prob = tmp_array[ch_to_ind[string[i]]]
      log_prob = log_prob + tf.math.log(actual_prob)
      print("current character and proba: {}, {}".format(string[i], log_prob))
      current_char = string[i]
    else:
      current_char_onehot_encode = tf.expand_dims(tf.expand_dims(tf.one_hot(indices = ch_to_ind[current_char], depth = vocab_size), 0), 0)
      predicted_char_logits = language_model_512(current_char_onehot_encode)
      predicted_char_softmax = softmax_output(predicted_char_logits)
      np_array = predicted_char_softmax.numpy()
      tmp_array = np_array.flatten()
      
      actual_prob = tmp_array[ch_to_ind[string[i]]]
      log_prob = log_prob + tf.math.log(actual_prob)
      print("current character and proba: {}, {}".format(string[i], log_prob))
      current_char = string[i]
  print("=========== ============= ========")
  print()
  return log_prob;

In [35]:
print("Log probability for LORD vs LOOD are : {} , {}".format(find_log_prob_for_string_512("LORD"),find_log_prob_for_string_512("LOOD")))

current character and proba: L, -15.66189193725586
current character and proba: O, -15.690693855285645
current character and proba: R, -19.119600296020508
current character and proba: D, -19.31654167175293

current character and proba: L, -13.107548713684082
current character and proba: O, -15.677525520324707
current character and proba: O, -21.368053436279297
current character and proba: D, -23.021589279174805

Log probability for LORD vs LOOD are : -19.31654167175293 , -23.021589279174805


In [37]:
print("Log probability for GOD vs DOG vs dog are : {} , {}, {}, {}".format(find_log_prob_for_string_512("GOD"),find_log_prob_for_string_512("DOG"),find_log_prob_for_string_512("dog"), find_log_prob_for_string_512("god")))

current character and proba: G, -11.321189880371094
current character and proba: O, -13.071130752563477
current character and proba: D, -14.047483444213867

current character and proba: D, -2.5910139083862305
current character and proba: O, -8.607620239257812
current character and proba: G, -15.203033447265625

current character and proba: d, -11.219135284423828
current character and proba: o, -20.93477439880371
current character and proba: g, -29.923145294189453

current character and proba: g, -8.381122589111328
current character and proba: o, -16.48155975341797
current character and proba: d, -22.021503448486328

Log probability for GOD vs DOG vs dog are : -14.047483444213867 , -15.203033447265625, -29.923145294189453, -22.021503448486328


In [39]:
print("Log probability for Abram vs abram vs Arbam are : {} , {}, {}".format(find_log_prob_for_string_512("Abram"),find_log_prob_for_string_512("abram"),find_log_prob_for_string_512("Arbam")))

current character and proba: A, -5.761716365814209
current character and proba: b, -10.800869941711426
current character and proba: r, -14.162717819213867
current character and proba: a, -19.593551635742188
current character and proba: m, -22.035017013549805

current character and proba: a, -12.653196334838867
current character and proba: b, -17.244342803955078
current character and proba: r, -25.514951705932617
current character and proba: a, -28.499645233154297
current character and proba: m, -32.45140075683594

current character and proba: A, -19.785926818847656
current character and proba: r, -31.840320587158203
current character and proba: b, -39.851715087890625
current character and proba: a, -43.39006423950195
current character and proba: m, -50.917354583740234

Log probability for Abram vs abram vs Arbam are : -22.035017013549805 , -32.45140075683594, -50.917354583740234
