# *Text Prediction using Sequential Models*
 **UNI:** sk4819 | **Name:** Shreyans Kothari


### 1) Problem: 
Text prediction models use Natural Language Processing and Machine Learning to predict the next word(s) in a sentence. The user types a few words and the model predicts the words most likely to follow. Text prediction models have many uses, depending on the industry and field in which they are employed. They are ubiquitous and assist us daily, sometimes without our even realizing it. If you type a few words into Google's search bar, its autofill feature completes the sentence for you so that you don't have to type every word, saving you time. In Gmail, the model learns your writing style from the emails you send and over time starts recommending words and phrases. Instead of typing "Dear Prof. Morales, I hope you're doing well!", all you have to do is start typing "Dear.." and Gmail does the rest, again saving a lot of time.


The biggest problem these models solve is speed. Text prediction models make many processes more efficient, increasing the speed at which people conduct business, talk to each other, and live their lives. As these models become faster and more accurate, our interactions and communication keep getting easier. In a sense, text prediction models speed up the development of our economies by allowing people to find the right information and connect with the right people efficiently.


### 2) Data: 
I wanted to tackle this problem in a slightly different way: I wanted to create a model personalized to me and my writing style.

I trained the model on papers, emails, and creative writing (short stories, poems, unfinished longer stories, etc.) that I wrote before and during graduate school. I thought it would be interesting to teach the model my style. I put all the texts into a few documents and imported them into Python using the textract package.

In addition to my personal texts, I also included a corpus from the nltk library, the WSJ articles corpus, to increase the number of unique tokens and broaden the scope of the model. The only preprocessing I did was combining all the texts, removing extra whitespace and punctuation, and converting all words to lowercase. The final dataset had a total of 168,719 words, of which 13,673 were unique. 
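
As a rough illustration of this cleaning step, the sketch below applies the same regular expression that the Preprocessing cell of this notebook uses. The sample sentence is made up for the example. Note that apostrophes count as punctuation, so contractions and possessives split into separate tokens such as `t` and `s`, which is likely why those single-letter tokens show up in some of the generated text further down.

```python
import re

def clean_text(txt_in):
    # Replace every run of non-alphanumeric characters with a single space,
    # lowercase the result, and split on whitespace.
    return re.sub(r"[^A-Za-z0-9]+", " ", txt_in).lower().strip().split()

# Hypothetical sample sentence, not taken from the training corpus
sample = "Don't let me down. The workers' living conditions improved by 5%!"
sample_tokens = clean_text(sample)

print(sample_tokens)
print("Total tokens:", len(sample_tokens))
print("Unique tokens:", len(set(sample_tokens)))
```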


### 3) Deep Learning:
In order to build a text prediction model, we need a sequential model because the order of the words matters. If we tokenized the words and treated them as independent data points, the model would have no way to learn which words tend to follow which. The sequential deep learning models I use in this project let the model learn the "right" order of words to output for a given input. 
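
A small illustration of why order matters, using two made-up sentences: they contain exactly the same words, so a representation that only counts words cannot tell them apart, even though they mean different things.

```python
from collections import Counter

# Two hypothetical sentences with identical words in a different order
a = "the dog bit the man".split()
b = "the man bit the dog".split()

same_counts = Counter(a) == Counter(b)  # bag-of-words view
same_order = a == b                     # sequence view

print("Same word counts:", same_counts)
print("Same sequence:", same_order)
```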

### Importing Libraries

In [None]:
!pip install textract

In [2]:
import pandas as pd
import numpy as np
import nltk
import re
import tensorflow as tf
from google.colab import drive
#import docx
import textract
from google.colab import files
import pickle
path_in = "/content/drive/MyDrive/ML_FinalProject/"
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
cd "/content/drive/MyDrive/ML_FinalProject/"

/content/drive/MyDrive/ML_FinalProject


### Importing Data

In [6]:
text1 = textract.process("/content/drive/MyDrive/ML_FinalProject/train1.docx")
text2 = textract.process("/content/drive/MyDrive/ML_FinalProject/train2.docx")
text3 = textract.process("/content/drive/MyDrive/ML_FinalProject/train3.docx")
text1 = text1.decode("utf-8") 
text2 = text2.decode("utf-8") 
text3 = text3.decode("utf-8") 

In [None]:
nltk.download("book")
from nltk.book import text7
text7 = str(' '.join(text7[:]))

In [8]:
text =  text1  + " " + text2 + " " + text3 + " " + text7

#### Text Excerpts

In [7]:
text[10000:11000]

'ng their collective bargaining power will leave the management with no choice but to invest in improving the workers’ living conditions. A union led by the workers will also validate the negotiation for better healthcare, daily quotas, and wage outcomes for both permanent and temporary workers. Restricted Flow of Information The Assamese tea farmers in large plantations are greatly reliant on the Tocklai Tea Research Association, a research institute established by the Tea Board of India, for information on fertilizer and pesticide use, water irrigation management, soil erosion, and other sustainable farming practices. However, the smallholder farmers have been unable to benefit from these land management and farming best practices due to accessibility issues. The information is made available in systems and formats that lie outside the capacity and reach of small plantation owners. The lack of information affects not only the smallholder farmers’ productivity and yields (and conseque

In [8]:
text[20000:21000]

's. KalaaCo is extremely thankful to your team for all your help in facilitating the translation of KalaaCo from a simple idea into a tangible social enterprise that seeks to improve the market outcomes for small-scale artisans in Rajasthan, India.\n\nPlease feel free to reach out to us at sk4819@columbia.edu if you have any follow up questions or comments. The Rajasthani artform encompasses traditions and techniques that have been passed down generations, and customs that have been practiced for over decades (some over a millennia). An untrained eye might fail to notice the influences of the various empires – Mauryan, Rajput, Mughal, Hindu, British, etc. – that ruled over this region. The hundreds of thousands Rajasthani craftspeople, artisans, and handicraft workers constantly traverse the intersection of a myriad cultural and ethnic identities. The intricacies and the delicate-nature of their art renders these small-scale artists unable to compete with large-scale corporations whose

In [9]:
text[100000:110000]

'ountless times, “Stop! Open your eyes, you dimwits! Look at how senseless all this walking is!” Whenever this feeling starts overpowering me, I have to calm myself down by looking away. Statues are not supposed to have thoughts like these, you know. Don’t let me down He told me plainly, "It is up to you now. Don\'t let me down."\xa0 It was a warm evening and Raul was playing with his toy car in the backyard. I had just put a teapot on the stove when my father walked in.\xa0 "Would you like some tea?"\xa0 "Why, yes. Do you have any of the Verbena Lemon left? I\'ll have some of that. Please and thank you." I took out the blue ceramic cups one aunt Galdys had gifted me on my wedding day. The blue had started to dull over the years but I could never bring myself to throw them away. They were Peter\'s favorite cups and throwing them away felt like I was betraying him. I poured the hot brown liquid into the cups and sat down across from my father. He was lost in his thoughts and I let him b

### Preprocessing

In [9]:
import string

# turn a doc into clean tokens

def clean_doc(doc):
    # replace '--' with a space ' '
    #doc = doc.replace('--', ' ')
    # split into tokens by white space
    tokens = doc.split()
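    # replace a token with ' ' when it is a substring of string.punctuation (e.g. a lone '!');
    # punctuation attached to a word is left in place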
    tokens = [' ' if w in string.punctuation else w for w in tokens]
    # remove remaining tokens that are not alphabetic
    #tokens = [word for word in tokens if word.isalpha()]
    # make lower case
    tokens = [word.lower() for word in tokens]
    #tokens = tokens.strip()
    return tokens

def clean_text(txt_in):
    import re
    clean = re.sub('[^A-Za-z0-9]+', " ", txt_in).lower().strip()
    clean = clean.split()
    return clean
 
tokens = clean_text(text)#clean_doc(text)

number_of_unique_tokens = len(set(tokens))

print('Total Tokens: %d' % len(tokens))
print('Unique Tokens: %d' % number_of_unique_tokens)
print('These are the first 50 tokens: %s' % tokens[:50])

# A key design decision is how long the input sequences should be. 
# They need to be long enough to allow the model to learn the context for the words to predict. 
# This input length will also define the length of seed text used to generate new sequences 
# when we use the model.
# There is no correct answer. With enough time and resources, we could explore the ability of 
# the model to learn with differently sized input sequences.

sequence_length = 7

# organize into sequences of tokens of input words plus one output word
length = sequence_length + 1  # 7 input words + 1 target word
sequences = list()
for i in range(length, len(tokens)):
    # select sequence of tokens
    seq = tokens[i-length:i]
    # convert into a line
    line = ' '.join(seq)
    # store
    sequences.append(line)

print ('Total Sequences: %d' % len(sequences))
print ('This is the first sequence: {0}'.format(sequences[0]))

Total Tokens: 168719
Unique Tokens: 13673
These are the first 50 tokens: ['the', 'modern', 'connotations', 'of', 'tea', 'extend', 'beyond', 'the', 'idea', 'of', 'a', 'beverage', 'it', 'is', 'increasingly', 'becoming', 'a', 'symbol', 'of', 'healthy', 'living', 'and', 'well', 'being', 'in', 'addition', 'to', 'its', 'associated', 'health', 'benefits', 'a', 'rapidly', 'growing', 'middle', 'class', 'in', 'emerging', 'countries', 'has', 'boosted', 'global', 'consumption', 'of', 'tea', 'which', 'currently', 'sits', 'at', '5']
Total Sequences: 168711
This is the first sequence: the modern connotations of tea extend beyond the
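
To make the windowing concrete, here is a minimal sketch that reuses the same loop on the first twelve tokens printed above. The helper name `make_windows` is made up for this example. Each window holds seven input words followed by one target word, and the loop yields `len(tokens) - length` windows, which matches the count above: 168,719 - 8 = 168,711. Because `range` stops at `len(tokens) - 1`, the final possible window is not generated, so the very last token never appears as a target.

```python
def make_windows(tokens, sequence_length):
    # Same loop as in the cell above, wrapped in a function
    length = sequence_length + 1
    windows = []
    for i in range(length, len(tokens)):
        windows.append(tokens[i - length:i])
    return windows

toy_tokens = "the modern connotations of tea extend beyond the idea of a beverage".split()
windows = make_windows(toy_tokens, sequence_length=7)

for w in windows:
    inputs, target = w[:-1], w[-1]
    print(" ".join(inputs), "->", target)

print("Windows:", len(windows))
```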


### 4) Model Fitting

#### Model 1: Basic LSTM with 200 nodes and 100 epochs

In [None]:
with tf.device('/device:GPU:0'):
  import numpy as np
  from keras.preprocessing.text import Tokenizer
  from tensorflow.keras.utils import to_categorical
  from keras.models import Sequential
  from keras.layers import Dense, Conv1D, Flatten
  from keras.layers import LSTM
  from keras.layers import Embedding

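  # map every word to an integer index (indices start at 1)
  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sequences)
  sequ = tokenizer.texts_to_sequences(sequences)

  # +1 because index 0 is reserved and never assigned to a word
  vocab_size = number_of_unique_tokens + 1

  # the first 7 columns are the input words, the last column is the word to predict
  sequences0 = np.array(sequ)
  X, y = sequences0[:,:-1], sequences0[:,-1]
  # one-hot encode the target word
  y = to_categorical(y, num_classes=vocab_size)

  model = Sequential()
  # the second argument is the embedding dimension, set here to sequence_length (7)
  model.add(Embedding(vocab_size, sequence_length, input_length=sequence_length))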
  model.add(LSTM(200))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(vocab_size, activation='softmax'))
  
  print(model.summary())

  model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

  model.fit(X, y, batch_size=80, epochs=100)


In [12]:
# Saving model 1
pickle.dump(model, open(path_in + "model1.pkl", "wb"))



INFO:tensorflow:Assets written to: ram://d03a68a4-3be9-4aa5-9402-e01babe44eba/assets




In [None]:
print (X.shape)
prediction = model.predict(X[0].reshape(1,sequence_length))
print (prediction.shape)
print (prediction)

In [98]:
# Trying different phrases:
## "Raul was playing with his toy car" --> 'Raul was playing with his toy car in the case whereas supply facilitate china'
# "I think there has been an error" --> "I think there has been an error and would tend to a lack of"
# "I am a dual degree" --> "I am a dual degree masters candidate from a several development"
# "Data Science and Machine Learning " --> "Data Science and Machine Learning  practices to address them the artisans"
# "The artisans from" --> "The artisans from the vehicle war and politics programs"
# "I dont know where to start" --> "I dont know where to start more about vox food and economic"
# "Artificial Intelligence is going to" --> "Artificial Intelligence is going to examine the extinction of rajasthani artforms"
test = ['Once upon a time there was a']


# index -> word lookup for decoding predictions
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))  # https://stackoverflow.com/a/43927939/246508
n_words = 6  # number of words to append to each seed phrase

for t in test:
    phrase = t
    for _ in range(n_words):
        # encode the whole phrase so far (words unknown to the tokenizer are dropped)
        encoded = tokenizer.texts_to_sequences([phrase])
        prediction = model.predict(np.array(encoded))
        # greedy choice: take the most probable next word
        predicted_word = np.argmax(prediction)
        phrase = phrase + " " + reverse_word_map[predicted_word]

    print(phrase)


Once upon a time there was a touch coated in the western world
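
The generation procedure can also be written as a small reusable function. The sketch below is only illustrative: `generate`, `toy_table`, and `toy_predict_next` are made-up names, and the lookup table is a stand-in for the trained network. Unlike the cell above, which feeds the whole growing phrase back into the model, this version keeps only the last `sequence_length` words. That is an assumption on my part, chosen so the context matches the seven-word window used during training.

```python
def generate(seed_text, predict_next, n_words, sequence_length=7):
    """Greedily extend seed_text by n_words using a next-word predictor."""
    words = seed_text.lower().split()
    for _ in range(n_words):
        context = words[-sequence_length:]  # keep only the most recent words
        words.append(predict_next(context))
    return " ".join(words)

# Stand-in for the trained model: a hypothetical lookup on the last word only
toy_table = {"a": "time", "time": "there", "there": "was", "was": "a"}

def toy_predict_next(context):
    return toy_table.get(context[-1], "a")

result = generate("once upon a", toy_predict_next, n_words=4)
print(result)
```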


Model 1 had a score of 0.6066, which is not too bad for a first model (the `fit` call above passes no validation data, so this appears to be accuracy on the training sequences). I had the model complete a few phrases (above), and most of the completions are grammatically plausible, though not contextually meaningful. It was interesting to see the model produce sentences that could somewhat make sense in the right context. 

#### Model 2: Adding an additional LSTM layer with 100 nodes

In [None]:
with tf.device('/device:GPU:0'):
  import numpy as np
  from keras.preprocessing.text import Tokenizer
  from tensorflow.keras.utils import to_categorical
  from keras.models import Sequential
  from keras.layers import Dense, Conv1D, Flatten
  from keras.layers import LSTM
  from keras.layers import Embedding

  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sequences)
  sequ = tokenizer.texts_to_sequences(sequences)

  vocab_size = number_of_unique_tokens + 1

  sequences0 = np.array(sequ)
  X, y = sequences0[:,:-1], sequences0[:,-1]
  y = to_categorical(y, num_classes=vocab_size)

  model = Sequential()
  model.add(Embedding(vocab_size, sequence_length, input_length=sequence_length))
  model.add(LSTM(200, return_sequences=True ))
  model.add(LSTM(100))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(vocab_size, activation='softmax'))
  
  print(model.summary())

  model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

  model.fit(X, y, batch_size=80, epochs=100)

In [10]:
# Saving model 2
path_in = "/content/drive/MyDrive/ML_FinalProject/"
pickle.dump(model, open(path_in + "model2.pkl", "wb"))



INFO:tensorflow:Assets written to: ram://2d6cde0c-f050-48d5-ba81-be8599f1aec7/assets




In [None]:
print (X.shape)
prediction = model.predict(X[0].reshape(1,sequence_length))
print (prediction.shape)
print (prediction)

In [21]:
# Trying different phrases:
## Hi how are you doing today
# "Raul was playing with his toy car" --> 'Raul was playing with his toy car in india and india and india'
# "I think there has been an error" --> "I think there has been an error decrease working going to more ways"
# "I am a dual degree" --> "I am a dual degree is i believe you realizes to"
# "Data Science and Machine Learning" --> "Data Science and Machine Learning separate focus on fundamental years peddle"
# "I dont know where to start" --> "I dont know where to start read surrounded out about it harder"
# "Artificial Intelligence is going to" --> "Artificial Intelligence is going to distrust and programs in rajasthan arl"
test = ['Hi how are you doing today']


# index -> word lookup for decoding predictions
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))  # https://stackoverflow.com/a/43927939/246508
n_words = 6  # number of words to append to each seed phrase

for t in test:
    phrase = t
    for _ in range(n_words):
        # encode the whole phrase so far (words unknown to the tokenizer are dropped)
        encoded = tokenizer.texts_to_sequences([phrase])
        prediction = model.predict(np.array(encoded))
        # greedy choice: take the most probable next word
        predicted_word = np.argmax(prediction)
        phrase = phrase + " " + reverse_word_map[predicted_word]

    print(phrase)


Hi how are you doing today s steps in the past several


Model 2's accuracy decreased to 0.4765, down from Model 1's 0.6066. Adding a second LSTM layer did not improve the model; it had the opposite effect. The text predictions make even less sense than those from Model 1, so Model 1 appears to be the better of the two.
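
The `model.fit(X, y, batch_size=80, epochs=100)` calls in this notebook do not pass any validation data, so the accuracy Keras reports is computed on the training sequences. One way to get a held-out score is sketched below. It is an illustration, not something the notebook does: `chronological_split` is a made-up helper and the 10% fraction is an arbitrary choice. Since neighbouring windows overlap heavily, taking the validation sequences from the end of the corpus keeps the overlap between the two sets to the boundary region.

```python
def chronological_split(sequences, val_fraction=0.1):
    """Hold out the last val_fraction of the sequences for validation."""
    n_val = int(len(sequences) * val_fraction)
    split = len(sequences) - n_val
    return sequences[:split], sequences[split:]

# Toy stand-in for the list of 8-word training sequences
toy_sequences = [f"sequence {i}" for i in range(20)]
train_seqs, val_seqs = chronological_split(toy_sequences, val_fraction=0.1)

print(len(train_seqs), "training sequences,", len(val_seqs), "validation sequences")
```

Keras's `fit` also accepts `validation_split` and `validation_data` arguments, which would report a validation accuracy after every epoch.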

#### Model 3: Model with one SimpleRNN layer and 200 nodes

In [22]:
with tf.device('/device:GPU:0'):
  import numpy as np
  from keras.preprocessing.text import Tokenizer
  from tensorflow.keras.utils import to_categorical
  from keras.models import Sequential
  from keras.layers import Dense, Conv1D, Flatten
  from keras.layers import LSTM, SimpleRNN
  from keras.layers import Embedding

  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sequences)
  sequ = tokenizer.texts_to_sequences(sequences)

  vocab_size = number_of_unique_tokens + 1

  sequences0 = np.array(sequ)
  X, y = sequences0[:,:-1], sequences0[:,-1]
  y = to_categorical(y, num_classes=vocab_size)

  model = Sequential()
  model.add(Embedding(vocab_size, sequence_length, input_length=sequence_length))
  model.add(SimpleRNN(200))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(vocab_size, activation='softmax'))
  
  print(model.summary())

  model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

  model.fit(X, y, batch_size=80, epochs=100)
  pickle.dump(model, open(path_in + "model3.pkl", "wb"))

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 7, 7)              95718     
                                                                 
 simple_rnn (SimpleRNN)      (None, 200)               41600     
                                                                 
 dense_2 (Dense)             (None, 100)               20100     
                                                                 
 dense_3 (Dense)             (None, 13674)             1381074   
                                                                 
Total params: 1,538,492
Trainable params: 1,538,492
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/10

INFO:tensorflow:Assets written to: ram://4e8672ef-8c23-4c15-aab6-217eb9c6d242/assets
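
The parameter counts in the summary above can be checked by hand. The sketch below uses the standard formulas for each layer type; the helper function names are made up for this example. It shows that roughly 90% of the parameters sit in the final softmax layer, because that layer has one output per vocabulary entry.

```python
def embedding_params(vocab_size, dim):
    # one vector of length dim per vocabulary index
    return vocab_size * dim

def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + one bias per unit
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # weight matrix + one bias per unit
    return input_dim * units + units

vocab_size = 13673 + 1  # unique tokens + reserved index 0

layer_params = [
    embedding_params(vocab_size, 7),   # embedding_1
    simple_rnn_params(7, 200),         # simple_rnn
    dense_params(200, 100),            # dense_2
    dense_params(100, vocab_size),     # dense_3
]

total_params = sum(layer_params)
print(layer_params, total_params)
print("Share in output layer: %.2f" % (layer_params[-1] / total_params))
```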


In [30]:
# Trying different phrases:
## Hi how are you doing today sir
# "Raul was playing with his toy car" --> 'Raul was playing with his toy car of the company is a share'
# "I think there has been an error" --> "I think there has been an error and the company s of the"
# "I am a dual degree" --> "I am a dual degree i am a share of the"
# "Data Science and Machine Learning" --> "Data Science and Machine Learning and the company s of the"
# "I dont know where to start" --> "I dont know where to start i am a share of the"
# "Artificial Intelligence is going to" --> "Artificial Intelligence is going to the company is a share of the company s"
test = ['Artificial Intelligence is going to']


# index -> word lookup for decoding predictions
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))  # https://stackoverflow.com/a/43927939/246508
n_words = 9  # number of words to append to each seed phrase

for t in test:
    phrase = t
    for _ in range(n_words):
        # encode the whole phrase so far (words unknown to the tokenizer are dropped)
        encoded = tokenizer.texts_to_sequences([phrase])
        prediction = model.predict(np.array(encoded))
        # greedy choice: take the most probable next word
        predicted_word = np.argmax(prediction)
        phrase = phrase + " " + reverse_word_map[predicted_word]

    print(phrase)


Artificial Intelligence is going to the company is a share of the company s


Model 3 performed even worse than Model 2: accuracy fell from about 0.48 to about 0.16. The text predictions do not make much sense either. 


#### Model 4: Model with one LSTM layer and one SimpleRNN layer with a dropout

In [31]:
with tf.device('/device:GPU:0'):
  import numpy as np
  from keras.preprocessing.text import Tokenizer
  from tensorflow.keras.utils import to_categorical
  from keras.models import Sequential
  from keras.layers import Dense, Conv1D, Flatten
  from keras.layers import LSTM, SimpleRNN
  from keras.layers import Embedding, Dropout

  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sequences)
  sequ = tokenizer.texts_to_sequences(sequences)

  vocab_size = number_of_unique_tokens + 1

  sequences0 = np.array(sequ)
  X, y = sequences0[:,:-1], sequences0[:,-1]
  y = to_categorical(y, num_classes=vocab_size)

  model = Sequential()
  model.add(Embedding(vocab_size, sequence_length, input_length=sequence_length))
  model.add(LSTM(200, return_sequences= True))
  model.add(SimpleRNN(100))
  model.add(Dropout(0.2))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(vocab_size, activation='softmax'))
  
  print(model.summary())

  model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

  model.fit(X, y, batch_size=80, epochs=100)
  pickle.dump(model, open(path_in + "model4.pkl", "wb"))


Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 7, 7)              95718     
                                                                 
 lstm_2 (LSTM)               (None, 7, 200)            166400    
                                                                 
 simple_rnn_1 (SimpleRNN)    (None, 100)               30100     
                                                                 
 dropout (Dropout)           (None, 100)               0         
                                                                 
 dense_4 (Dense)             (None, 100)               10100     
                                                                 
 dense_5 (Dense)             (None, 13674)             1381074   
                                                                 
Total params: 1,683,392
Trainable params: 1,683,392
No



INFO:tensorflow:Assets written to: ram://99e9752c-e36c-481e-a897-89dcbdd6beb3/assets




In [41]:
# Trying different phrases:
## Hi how are you doing today
# "Raul was playing with his toy car" --> 'Raul was playing with his toy car in the company the company said'
# "I think there has been an error" --> "I think there has been an error in the interview the company s"
# "I am a dual degree" --> "I am a dual degree corps the company s the editorial"
# "Data Science and Machine Learning" --> "Data Science and Machine Learning tracking of the company s mother"
# "I dont know where to start" --> "I dont know where to start the other market the company said"
# "Artificial Intelligence is going to" --> "Artificial Intelligence is going to the two bloc of the u"
test = ['Hi how are you doing today']


# index -> word lookup for decoding predictions
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))  # https://stackoverflow.com/a/43927939/246508
n_words = 6  # number of words to append to each seed phrase

for t in test:
    phrase = t
    for _ in range(n_words):
        # encode the whole phrase so far (words unknown to the tokenizer are dropped)
        encoded = tokenizer.texts_to_sequences([phrase])
        prediction = model.predict(np.array(encoded))
        # greedy choice: take the most probable next word
        predicted_word = np.argmax(prediction)
        phrase = phrase + " " + reverse_word_map[predicted_word]

    print(phrase)


Hi how are you doing today the company s is t 1


Model 4 performed better than Model 3, but was still not as good as Models 1 and 2.

#### Model 5: Model with one LSTM layer with 600 nodes

In [43]:
with tf.device('/device:GPU:0'):
  import numpy as np
  from keras.preprocessing.text import Tokenizer
  from tensorflow.keras.utils import to_categorical
  from keras.models import Sequential
  from keras.layers import Dense, Conv1D, Flatten
  from keras.layers import LSTM, SimpleRNN
  from keras.layers import Embedding, Dropout

  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sequences)
  sequ = tokenizer.texts_to_sequences(sequences)

  vocab_size = number_of_unique_tokens + 1

  sequences0 = np.array(sequ)
  X, y = sequences0[:,:-1], sequences0[:,-1]
  y = to_categorical(y, num_classes=vocab_size)

  model = Sequential()
  model.add(Embedding(vocab_size, sequence_length, input_length=sequence_length))
  model.add(LSTM(600))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(vocab_size, activation='softmax'))
  
  print(model.summary())

  model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

  model.fit(X, y, batch_size=80, epochs=100)
  pickle.dump(model, open(path_in + "model5.pkl", "wb"))


Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 7, 7)              95718     
                                                                 
 lstm_3 (LSTM)               (None, 600)               1459200   
                                                                 
 dense_6 (Dense)             (None, 100)               60100     
                                                                 
 dense_7 (Dense)             (None, 13674)             1381074   
                                                                 
Total params: 2,996,092
Trainable params: 2,996,092
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/10



INFO:tensorflow:Assets written to: ram://12d87985-5355-4445-95b1-4ab8d24fd32d/assets


In [122]:
# Trying different phrases:
## Hi how are you doing today sir
# 'I am so happy that' --> ''I am so happy that I am able to understand her'
# "I think that he" --> "I think that he might get sure new two arms"
# 'I am a student at' --> "I am a student at boston university i aim to be"
# "Raul was playing with his toy car" --> 'Raul was playing with his toy car in the company the company said'
# "I think there has been an error" --> "I think there has been an error in the interview the company s"
# "I am a dual degree" --> "I am a dual degree corps the company s the editorial"
# "Data Science and Machine Learning" --> "Data Science and Machine Learning tracking of the company s mother"
# "I dont know where to start" --> "I dont know where to start the other market the company said"
# "Artificial Intelligence is going to" --> "Artificial Intelligence is going to the two bloc of the u"
test = ['I think that I ']


# Number of words to append to each seed phrase
n_words = 6

# Map token ids back to words: https://stackoverflow.com/a/43927939/246508
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))

for t in test:
    text = t
    for _ in range(n_words):
        # Encode the text so far, predict the next token id, and append its word
        encoded = tokenizer.texts_to_sequences([text])
        prediction = model.predict(np.array(encoded))
        predicted_word = np.argmax(prediction)
        text = text + " " + reverse_word_map[predicted_word]

    print(text)


I think that I  m not challenging sure to be


Model 5 performed well, noticeably better than all the previous models. It reached an accuracy of 0.92 (measured on the training data, since `model.fit` is called without a validation split), and its predictions made more sense in most cases. For example, given "I am so happy that", it returned "I am so happy that I am able to understand her", a grammatical and meaningful English sentence. Given "I am a student at", the model output "I am a student at boston university i aim to be". This is interesting because I completed my undergraduate degree at Boston University; the model seems to be learning my style and my writing fairly well.
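The `model.fit` calls in this notebook pass no validation data, so the accuracy Keras reports is computed on the same windows the model was trained on. A minimal sketch of holding out part of the windows is given below; `split_windows` and the toy data are hypothetical, and each row is assumed to hold 8 token ids (7 inputs plus the target), matching the input length of 7 shown in the model summaries.

```python
import random


def split_windows(windows, val_fraction=0.1, seed=0):
    """Shuffle token-id windows and split them into train and validation lists."""
    shuffled = list(windows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy windows standing in for the rows of `sequences0` (8 token ids each).
toy_windows = [[i + j for j in range(8)] for i in range(20)]
train, val = split_windows(toy_windows, val_fraction=0.2)
print(len(train), len(val))  # 16 4

# With the notebook's arrays, the held-out rows would be split into X_val and
# y_val the same way as X and y, then passed to
# model.fit(..., validation_data=(X_val, y_val)).
```

If the windows were produced by sliding over the text, neighbouring windows overlap, so a random split like this one still leaks some training text into the validation set; splitting by source document would be stricter.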

#### Model 6: Model with one LSTM layer with 800 nodes

In [123]:
with tf.device('/device:GPU:0'):
  import numpy as np
  from keras.preprocessing.text import Tokenizer
  from tensorflow.keras.utils import to_categorical
  from keras.models import Sequential
  from keras.layers import Dense, Conv1D, Flatten
  from keras.layers import LSTM, SimpleRNN
  from keras.layers import Embedding, Dropout

  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sequences)
  sequ = tokenizer.texts_to_sequences(sequences)

  vocab_size = number_of_unique_tokens + 1

  sequences0 = np.array(sequ)
  X, y = sequences0[:,:-1], sequences0[:,-1]
  y = to_categorical(y, num_classes=vocab_size)

  model = Sequential()
  model.add(Embedding(vocab_size, sequence_length, input_length=sequence_length))
  model.add(LSTM(800))
  model.add(Dense(100, activation='relu'))
  model.add(Dense(vocab_size, activation='softmax'))
  
  print(model.summary())

  model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

  model.fit(X, y, batch_size=80, epochs=100)
  pickle.dump(model, open(path_in + "model6.pkl", "wb"))


Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_4 (Embedding)     (None, 7, 7)              95718     
                                                                 
 lstm_4 (LSTM)               (None, 800)               2585600   
                                                                 
 dense_8 (Dense)             (None, 100)               80100     
                                                                 
 dense_9 (Dense)             (None, 13674)             1381074   
                                                                 
Total params: 4,142,492
Trainable params: 4,142,492
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/10



INFO:tensorflow:Assets written to: ram://898a3a33-73b7-4a24-a9b7-ffa6e13ce13b/assets




In [156]:
# Trying different phrases:
## Hi how are you doing today sir
# 'I am so happy that' --> ''I am so happy that I am able to understand her'
# "I think that he" --> "I think that he might get sure new two arms"
# 'I am a student at' --> "I am a student at boston university i aim to be"
# "Raul was playing with his toy car" --> 'Raul was playing with his toy car in the company the company said'
# "I think there has been an error" --> "I think there has been an error in the interview the company s"
# "I am a dual degree" --> "I am a dual degree corps the company s the editorial"
# "Data Science and Machine Learning" --> "Data Science and Machine Learning tracking of the company s mother"
# "I dont know where to start" --> "I dont know where to start the other market the company said"
# "Artificial Intelligence is going to" --> "Artificial Intelligence is going to the two bloc of the u"
test = ['why is the belt below the buckle']


# Number of words to append to each seed phrase
n_words = 6

# Map token ids back to words: https://stackoverflow.com/a/43927939/246508
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))

for t in test:
    text = t
    for _ in range(n_words):
        # Encode the text so far, predict the next token id, and append its word
        encoded = tokenizer.texts_to_sequences([text])
        prediction = model.predict(np.array(encoded))
        predicted_word = np.argmax(prediction)
        text = text + " " + reverse_word_map[predicted_word]

    print(text)


why is the belt below the buckle with the good level of the


#### Model 7: One LSTM layer with 1000 nodes and one fully-connected layer with 500 nodes

In [158]:
with tf.device('/device:GPU:0'):
  import numpy as np
  from keras.preprocessing.text import Tokenizer
  from tensorflow.keras.utils import to_categorical
  from keras.models import Sequential
  from keras.layers import Dense, Conv1D, Flatten
  from keras.layers import LSTM, SimpleRNN
  from keras.layers import Embedding, Dropout

  tokenizer = Tokenizer()
  tokenizer.fit_on_texts(sequences)
  sequ = tokenizer.texts_to_sequences(sequences)

  vocab_size = number_of_unique_tokens + 1

  sequences0 = np.array(sequ)
  X, y = sequences0[:,:-1], sequences0[:,-1]
  y = to_categorical(y, num_classes=vocab_size)

  model = Sequential()
  model.add(Embedding(vocab_size, sequence_length, input_length=sequence_length))
  model.add(LSTM(1000))
  model.add(Dense(500, activation='relu'))
  model.add(Dense(vocab_size, activation='softmax'))
  
  print(model.summary())

  model.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

  model.fit(X, y, batch_size=80, epochs=100)
  pickle.dump(model, open(path_in + "model7.pkl", "wb"))


Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_6 (Embedding)     (None, 7, 7)              95718     
                                                                 
 lstm_6 (LSTM)               (None, 1000)              4032000   
                                                                 
 dense_12 (Dense)            (None, 500)               500500    
                                                                 
 dense_13 (Dense)            (None, 13674)             6850674   
                                                                 
Total params: 11,478,892
Trainable params: 11,478,892
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/



INFO:tensorflow:Assets written to: ram://ae9ad397-8607-4018-8188-900db0117a47/assets
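The parameter counts in the summaries for Models 5, 6, and 7 can be checked by hand. The sketch below uses the standard counts for embedding, LSTM, and dense layers, with the vocabulary size (13674) and embedding width (7) read off the summaries; the helper names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    """One embedding vector of width `embed_dim` per vocabulary entry."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units + 1) * units)


def dense_params(inputs, units):
    """A weight matrix plus one bias per output unit."""
    return inputs * units + units


def total_params(vocab_size, embed_dim, lstm_units, dense_units):
    """Embedding -> LSTM -> Dense(relu) -> Dense(softmax over the vocabulary)."""
    return (embedding_params(vocab_size, embed_dim)
            + lstm_params(embed_dim, lstm_units)
            + dense_params(lstm_units, dense_units)
            + dense_params(dense_units, vocab_size))


model5 = total_params(13674, 7, 600, 100)
model6 = total_params(13674, 7, 800, 100)
model7 = total_params(13674, 7, 1000, 500)
print(model5, model6, model7)  # 2996092 4142492 11478892
```

In Model 7, the final softmax layer alone accounts for 6,850,674 of the 11,478,892 parameters, because widening the hidden dense layer from 100 to 500 units multiplies the size of the vocabulary-wide output matrix.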


In [None]:
print (X.shape)
prediction = model.predict(X[0].reshape(1,sequence_length))
print (prediction.shape)
print (prediction)
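The cell above feeds the model exactly `sequence_length` token ids, whereas the phrase-testing cells in this notebook pass seeds of whatever length the phrase happens to have. One possible way to keep prediction inputs the same shape as the training windows is sketched below. `fit_to_window` is a hypothetical helper, and left-padding with id 0 is an assumption: the Keras `Tokenizer` never assigns index 0 to a word, but these models were not trained on padded input, so padded predictions may be unreliable.

```python
def fit_to_window(token_ids, window, pad_id=0):
    """Keep the last `window` ids, left-padding with `pad_id` when too short."""
    tail = list(token_ids)[-window:]
    return [pad_id] * (window - len(tail)) + tail


short_seed = fit_to_window([12, 5, 9], 7)
long_seed = fit_to_window(list(range(1, 11)), 7)
print(short_seed)  # [0, 0, 0, 0, 12, 5, 9]
print(long_seed)   # [4, 5, 6, 7, 8, 9, 10]

# Keras offers the same behaviour through pad_sequences(seqs, maxlen=window),
# which pads and truncates at the start of each sequence by default.
```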

In [21]:

# Trying different phrases:
## Hi how are you doing today sir
# 'I am a student at' --> "i am a student at columbia university studying rescue project i"
# "I am a dual degree" --> "I am a dual degree master s student at i am"
# "my name is" --> "my name is shreyans kothari and i am a dual degree masters"
# "Data Science and Machine Learning" --> "Data Science and Machine Learning data collection instruments art from 2 the mindless is"
# "Artificial Intelligence is going to" --> "Artificial Intelligence is going to you today is you a others as you big"
test = ['My name is']


# Number of words to append to each seed phrase
n_words = 9

# Map token ids back to words: https://stackoverflow.com/a/43927939/246508
reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))

for t in test:
    text = t
    for _ in range(n_words):
        # Encode the text so far, predict the next token id, and append its word
        encoded = tokenizer.texts_to_sequences([text])
        prediction = model.predict(np.array(encoded))
        predicted_word = np.argmax(prediction)
        text = text + " " + reverse_word_map[predicted_word]

    print(text)


My name is shreyans kothari and i am a dual degree masters


Model 7 performed the best of all the models. It achieved an accuracy of 0.985, the highest of all. The predictions from the model support the high accuracy; the model learned the training text very well. When I input "My name is", the model outputs "My name is Shreyans Kothari and I am a dual degree..."; I have used that same sentence in many cover letters and emails. However, it still cannot produce output that makes contextual and grammatical sense when I input arbitrary phrases like "Artificial Intelligence is going to". This suggests that the model fits my data very well but does not generalize well enough to perform well on other text.
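One possible contributor to the weak output on unfamiliar phrases, which the notebook does not check, is out-of-vocabulary words: a Keras `Tokenizer` created without an `oov_token` drops any word it did not see in `fit_on_texts`, so the model may receive a shorter seed than the one typed. A minimal sketch of such a check follows. `unknown_words` and the toy vocabulary are hypothetical, and the lowercase-and-split step only approximates the tokenizer's own preprocessing.

```python
def unknown_words(prompt, word_index):
    """Return the words of `prompt` that are missing from `word_index`."""
    return [w for w in prompt.lower().split() if w not in word_index]


# Toy vocabulary standing in for tokenizer.word_index.
toy_word_index = {"my": 1, "name": 2, "is": 3, "going": 4, "to": 5}

missing = unknown_words("Artificial Intelligence is going to", toy_word_index)
print(missing)  # ['artificial', 'intelligence']
```

With the notebook's objects, the call would be `unknown_words(t, tokenizer.word_index)` for each seed phrase `t`.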

### 5) Best Model:
Model 7 is the best of all the architectures I tried because it fulfills the purpose for which it was created. I hoped to develop a model that would understand my style and learn from my past writing, and the results from Model 7 indicate that it does just that. The model is highly personalized to me: it outputs phrases that I could use in emails, cover letters, and other pieces that I write in the future. When I input "My name is", it outputs "my name is shreyans kothari and i am a dual degree masters..", words I would definitely use in a sentence that begins with "My name is". Attaching this model (or a slightly better version) to an app like my email client would let me work more efficiently by automating mundane tasks like responding to emails.
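For an autocomplete feature like the one described above, it may be more useful to offer a few candidate words than only the single most likely one. The sketch below ranks a probability vector and returns the top `k` words; `top_k_words` and the toy values are hypothetical, with `toy_index_word` playing the role of the notebook's `reverse_word_map` and `toy_probs` the role of one row of `model.predict` output.

```python
def top_k_words(probabilities, index_word, k=3):
    """Return the `k` most probable words, skipping ids with no word attached."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)
    return [index_word[i] for i in ranked if i in index_word][:k]


# Toy id-to-word map and softmax output; index 0 has no word attached.
toy_index_word = {1: "shreyans", 2: "a", 3: "the", 4: "not"}
toy_probs = [0.0, 0.55, 0.25, 0.15, 0.05]

suggestions = top_k_words(toy_probs, toy_index_word, k=3)
print(suggestions)  # ['shreyans', 'a', 'the']
```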