#### **LSTM for next word predictions**

`Thanos` is an iconic Marvel villain who brought Earth's mightiest heroes, `the Avengers`, to their knees when he wiped out half of the population to correct what he saw as a resource imbalance and a lack of gratitude. He has some memorable quotes, and we will try to train a model to recreate them.

**Dependencies**

In [46]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional, Embedding
from tensorflow.keras.regularizers import l2
from tensorflow.keras.preprocessing.text import Tokenizer
from nltk.tokenize import sent_tokenize
import numpy as np

**Data Import and Model Building**

We will use a collection of the most famous Thanos quotes as our corpus, pre-process it, and generate sliding-window n-grams to use as input-output pairs for our model.

In [47]:
file_path = r"C:\Documents\Thanos quotes.txt"

In [48]:
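# Read the quotes once, skipping lines that start with "Thanos".
# encoding="utf-8" keeps apostrophes and ellipses from being garbled on Windows.
with open(file_path, encoding="utf-8") as file:
    quotes = [line.strip() for line in file if not line.startswith("Thanos")]

# Repeat the small corpus 200 times to give the model more samples per epoch.
text = quotes * 200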

In [49]:
corpus = " ".join(text)

In [50]:
sents = sent_tokenize(corpus)
sents

['- "Fun isn’t something one considers when balancing the universe.',
 'But this… does put a smile on my face."',
 '- "Dread it.',
 'Run from it.',
 'Destiny arrives all the same."',
 '- "The hardest choices require the strongest wills."',
 '- "You’re strong.',
 'But I could snap my fingers, and you’d all cease to exist."',
 '- "Perfectly balanced, as all things should be."',
 '- "Reality is often disappointing."',
 '- "In time, you will know what it’s like to lose, to feel so desperately that you’re right, yet to fail all the same."',
 '- "I ignored my destiny once.',
 'I cannot do that again."',
 '- "A small price to pay for salvation."',
 '- "I hope they remember you."',
 '- "I am inevitable."',
 '- "You could not live with your own failure.',
 'And where did that bring you?',
 'Back to me."',
 '- "I will shred this universe down to its last atom and create a new one, a grateful universe."',
 '- "I don’t even know who you are."',
 '- "Your planet was on the brink of co

In [51]:
tokenizer = Tokenizer(num_words=200,filters='""-.\n')
tokenizer.fit_on_texts(sents)

In [52]:
seqs = tokenizer.texts_to_sequences(sents)
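
To make the tokenization step concrete, here is a rough pure-Python approximation of what `fit_on_texts` and `texts_to_sequences` do: lowercase the text, replace the filtered characters with spaces, rank words by frequency starting at index 1, and drop words that are unknown or ranked at or beyond `num_words`. The helper names `fit_word_index` and `to_sequences` are made up for this sketch, and the real `Tokenizer` has more options than shown here.

```python
from collections import Counter

FILTERS = '"-.\n'  # same characters the notebook passes as `filters`


def _split(text, filters=FILTERS):
    # Lowercase, blank out filtered characters, then split on whitespace.
    table = str.maketrans({ch: " " for ch in filters})
    return text.lower().translate(table).split()


def fit_word_index(texts):
    # Most frequent word gets index 1; index 0 is left unused.
    counts = Counter(word for text in texts for word in _split(text))
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def to_sequences(texts, word_index, num_words=None):
    # Words that are unknown, or ranked at or beyond num_words, are dropped.
    seqs = []
    for text in texts:
        ids = [word_index[w] for w in _split(text) if w in word_index]
        seqs.append([i for i in ids if num_words is None or i < num_words])
    return seqs


toy_sents = ['- "Dread it.', 'Run from it.', 'Destiny arrives all the same."']
toy_index = fit_word_index(toy_sents)
toy_seqs = to_sequences(toy_sents, toy_index)
print(toy_index)
print(toy_seqs)  # [[2, 1], [3, 4, 1], [5, 6, 7, 8, 9]]
```

Note that a word outside the fitted vocabulary simply disappears from the sequence, so a seed phrase can end up shorter than it looks.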

In [53]:
max_len = max([len(i) for i in seqs])

**Generating inputs and outputs**

In [54]:
Xs, ys = [], []
window_size = 2

# Slide a window over each sequence: two consecutive words in, the next word out.
for seq in seqs:
    for i in range(len(seq) - window_size):
        Xs.append(seq[i:i + window_size])
        ys.append(seq[i + window_size])
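
As a small worked example of the windowing above, the sketch below applies the same loop to a single toy sequence. The function name `make_pairs` and the ids in `toy_seq` are invented for illustration.

```python
def make_pairs(seq, window_size=2):
    # Each window of `window_size` ids is an input; the id that follows is the target.
    inputs, targets = [], []
    for i in range(len(seq) - window_size):
        inputs.append(seq[i:i + window_size])
        targets.append(seq[i + window_size])
    return inputs, targets


toy_seq = [5, 6, 7, 8, 9]  # hypothetical ids for a five-word sentence
toy_X, toy_y = make_pairs(toy_seq)
print(toy_X)  # [[5, 6], [6, 7], [7, 8]]
print(toy_y)  # [7, 8, 9]
```

A sentence with `window_size` tokens or fewer yields no pairs at all, so very short sentences such as "Dread it." contribute nothing to training.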


In [55]:
vocab_size = len(tokenizer.word_index) + 1  # +1 because word indices start at 1; 0 is reserved

In [56]:
X = np.array(Xs)
y = np.array(ys)

**Building the neural network**

In [57]:
model = Sequential([
    Embedding(vocab_size, output_dim=256),
    Bidirectional(LSTM(128, return_sequences=True)),
    Dropout(0.2),
    LSTM(100),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dense(vocab_size, activation='softmax')
])

In [58]:
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
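
The `sparse_categorical_crossentropy` loss takes integer targets, which is why `y` can stay as word ids instead of being one-hot encoded. For a single example it is the negative log of the probability the softmax assigns to the true word. The snippet below is a minimal hand-rolled illustration with made-up probabilities, not Keras's implementation (which also averages over the batch).

```python
import math


def sparse_categorical_crossentropy(probs, target_id):
    # Loss for one example: negative log of the probability given to the true id.
    return -math.log(probs[target_id])


toy_probs = [0.1, 0.7, 0.2]  # hypothetical softmax output over a 3-word vocabulary
loss_good = sparse_categorical_crossentropy(toy_probs, 1)
loss_bad = sparse_categorical_crossentropy(toy_probs, 0)
print(round(loss_good, 4))  # 0.3567
print(round(loss_bad, 4))   # 2.3026
```

The loss shrinks toward zero as the model puts more probability on the correct next word.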

In [59]:
model.fit(X,y,epochs=50)

Epoch 1/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 16s 11ms/step - accuracy: 0.5912 - loss: 2.4225
Epoch 2/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 1.0000 - loss: 0.1982
Epoch 3/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 9s 10ms/step - accuracy: 1.0000 - loss: 0.1092
Epoch 4/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 9s 10ms/step - accuracy: 1.0000 - loss: 0.0708
Epoch 5/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 9s 10ms/step - accuracy: 1.0000 - loss: 0.0490
Epoch 6/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 9s 11ms/step - accuracy: 1.0000 - loss: 0.0353
Epoch 7/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 1.0000 - loss: 0.0272
Epoch 8/50
857/857 ━━━━━━━━━━━━━━━━━━━━ 10s 11ms/step - accuracy: 1.0000 - loss: 0.0220
Epoch 9/50
857/857

<keras.src.callbacks.history.History at 0x13a15ecd040>

In [60]:
model.summary()

In [61]:
seed_text = "Fun isn't"

In [62]:
# Invert the tokenizer's word -> index mapping so predicted ids can be turned back into words.
word_index = {index: word for word, index in tokenizer.word_index.items()}

In [63]:
pred_text = seed_text

In [64]:
for _ in range(20):
    # Encode the text generated so far; words missing from the vocabulary are dropped.
    seed_seq = np.array(tokenizer.texts_to_sequences([seed_text]))
    # Take the most probable next word id (verbose=0 silences the per-call progress bar).
    pred = int(np.argmax(model.predict(seed_seq, verbose=0)))
    if pred in word_index:
        seed_text += " " + word_index[pred]
        pred_text += " " + word_index[pred]

In [65]:
pred_text

"Fun isn't something something something considers considers often disappointing disappointing disappointing considers often disappointing disappointing disappointing disappointing considers often disappointing disappointing disappointing"