# <b><p style="background-color: #ff6200; font-family:calibri; color:white; font-size:100%; font-family:Verdana; text-align:center; border-radius:15px 50px;">Task 39 -> Implement with TensorFlow/Keras (RNN)</p></b>

## <span style='color:#ff6200'> Importing Libraries</span>

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import requests

## <span style='color:#ff6200'> Sample Data</span>

In [2]:
url = 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
response = requests.get(url)
text = response.text

print(text[:500])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


## <span style='color:#ff6200'>Preprocessing</span>

In [3]:
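# Treat every line of the play as one training sentence.
sentences = text.split('\n')

# Keep only the most frequent words; index 0 is reserved for padding.
max_words = 10000
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(sentences)
total_words = min(max_words, len(tokenizer.word_index) + 1)

# Convert each sentence to a list of integer word ids.
sequences = tokenizer.texts_to_sequences(sentences)

# Expand each sentence into all of its prefixes of length >= 2,
# so that every word after the first becomes a prediction target once.
input_sequences = []
for seq in sequences:
    for i in range(1, len(seq)):
        n_gram_sequence = seq[:i+1]
        input_sequences.append(n_gram_sequence)

# Left-pad with zeros to a fixed length; longer prefixes keep only their last 30 tokens.
max_sequence_length = 30
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length, padding='pre')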
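
To make the prefix expansion and pre-padding above concrete, here is a minimal pure-Python sketch of the same idea. The helper names `make_prefix_sequences` and `pad_pre` and the token ids are hypothetical, chosen only for illustration; `pad_pre` imitates what `pad_sequences(..., padding='pre')` does with its default truncation, it is not the Keras implementation.

```python
def make_prefix_sequences(token_ids):
    """Return every prefix of length >= 2, mirroring seq[:i+1] for i in 1..len-1."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]


def pad_pre(seq, maxlen, pad_value=0):
    """Left-pad with pad_value; if the sequence is too long, keep its last maxlen items."""
    seq = seq[-maxlen:]
    return [pad_value] * (maxlen - len(seq)) + seq


# Hypothetical token ids for one line of text.
line = [4, 17, 9, 23]
prefixes = make_prefix_sequences(line)
padded = [pad_pre(p, maxlen=6) for p in prefixes]
for row in padded:
    print(row)
```

A line of four tokens yields three training rows, one for each word after the first, and the real tokens always sit at the right-hand end of each row.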

In [4]:
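# Context = every token but the last; target = the last token of each sequence.
X, y = input_sequences[:, :-1], input_sequences[:, -1]

# y holds integer word ids, the label format sparse_categorical_crossentropy expects.
y = np.array(y)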
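
The slicing in the cell above can be checked on a tiny array. This is a sketch with made-up token ids, and the names `padded`, `X_demo`, and `y_demo` are hypothetical. It shows why the model sees contexts that are one token shorter than `max_sequence_length`.

```python
import numpy as np

# Hypothetical padded sequences (0 is the padding id).
padded = np.array([
    [0, 0, 0, 0, 4, 17],
    [0, 0, 0, 4, 17, 9],
    [0, 0, 4, 17, 9, 23],
])

X_demo = padded[:, :-1]  # all but the last column: the context
y_demo = padded[:, -1]   # the last column: the word id to predict

print(X_demo.shape, y_demo.shape)
print(y_demo.tolist())
```

Each row of `X_demo` has five columns for an original length of six, which is why the prediction code later pads its input to `max_sequence_length-1`.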

## <span style='color:#ff6200'>Build Model</span>

In [5]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(total_words, 50),  
    tf.keras.layers.SimpleRNN(100, return_sequences=False),
    tf.keras.layers.Dense(total_words, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
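
As a rough size check on this architecture, the parameter count can be worked out by hand. The sketch below assumes `total_words` is 10000, which holds only if the vocabulary reaches the `max_words` cap, and uses the standard shapes for embedding, simple recurrent, and dense layers. The function name `count_parameters` is hypothetical.

```python
def count_parameters(vocab_size, embed_dim=50, rnn_units=100):
    """Count trainable parameters for Embedding -> SimpleRNN -> Dense(softmax)."""
    # Embedding: one embed_dim-sized vector per word id.
    embedding = vocab_size * embed_dim
    # SimpleRNN: input kernel + recurrent kernel + bias.
    simple_rnn = rnn_units * (embed_dim + rnn_units + 1)
    # Dense: one weight per (unit, word) pair plus one bias per word.
    dense = (rnn_units + 1) * vocab_size
    return {
        "embedding": embedding,
        "simple_rnn": simple_rnn,
        "dense": dense,
        "total": embedding + simple_rnn + dense,
    }


# Assumption: total_words == 10000.
counts = count_parameters(vocab_size=10_000)
print(counts)
```

Under that assumption the recurrent layer holds only a small share of the weights. Most of them sit in the embedding and in the output layer, both of which scale with the vocabulary size.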

In [6]:
model.fit(X, y, epochs=100, verbose=1)

Epoch 1/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 207s 39ms/step - accuracy: 0.0479 - loss: 6.8024
Epoch 2/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 258s 38ms/step - accuracy: 0.0956 - loss: 5.8860
Epoch 3/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 207s 39ms/step - accuracy: 0.1134 - loss: 5.5342
Epoch 4/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 259s 38ms/step - accuracy: 0.1255 - loss: 5.2535
Epoch 5/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 260s 38ms/step - accuracy: 0.1407 - loss: 5.0165
Epoch 6/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 200s 38ms/step - accuracy: 0.1526 - loss: 4.8038
Epoch 7/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 206s 39ms/step - accuracy: 0.1670 - loss: 4.6163
Epoch 8/100
5272/5272 ━━━━━━━━━━━━━━━━━━━━ 206s 39ms/step - accuracy: 0.1863 - loss:

<keras.src.callbacks.history.History at 0x7843a862ce20>

Training is too slow for the time and compute available: each epoch in the log above takes roughly 200 to 260 seconds, so many more epochs would add hours of compute. Prolonged training also tends to give diminishing returns, with only marginal gains in performance for the added cost and delay. I therefore decided to halt further training and spend the remaining resources on the other project objectives.
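
The cost of the full run can be estimated from the numbers in the training log. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes the step time stays near 39 ms and that `model.fit` used the Keras default batch size of 32, since no `batch_size` was passed. The batch size of 256 is a hypothetical alternative.

```python
import math

steps_per_epoch = 5272    # from the training log
seconds_per_step = 0.039  # roughly 38-39 ms/step in the log
planned_epochs = 100      # the epochs argument passed to model.fit

epoch_seconds = steps_per_epoch * seconds_per_step
total_hours = epoch_seconds * planned_epochs / 3600
print(f"~{epoch_seconds:.0f} s per epoch, ~{total_hours:.1f} h for {planned_epochs} epochs")

# Assumption: default batch size of 32, so this is an upper bound on the sample count
# (the final batch of an epoch may be smaller).
approx_samples = steps_per_epoch * 32
steps_at_256 = math.ceil(approx_samples / 256)
print(f"at most {approx_samples} samples; about {steps_at_256} steps per epoch at batch size 256")
```

A larger batch size cuts the number of steps per epoch, but each step would likely take longer, so the real saving would have to be measured.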

## <span style='color:#ff6200'>Predictions</span>

In [7]:
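def predict_next_word(model, tokenizer, text, max_sequence_length):
    """Return the most likely next word for `text`, or '' if no word can be mapped."""
    # The model was trained on contexts of length max_sequence_length - 1.
    sequence = tokenizer.texts_to_sequences([text])[0]
    sequence = pad_sequences([sequence], maxlen=max_sequence_length-1, padding='pre')

    # Greedy decoding: take the id with the highest predicted probability.
    predicted_probabilities = model.predict(sequence, verbose=0)
    predicted_word_index = int(np.argmax(predicted_probabilities, axis=-1)[0])

    # Id 0 is the padding id and has no word, so fall back to an empty string.
    index_to_word = {index: word for word, index in tokenizer.word_index.items()}
    predicted_word = index_to_word.get(predicted_word_index, '')

    return predicted_word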
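
The decoding step in this function can be tried without a trained model. The sketch below uses a hypothetical three-word vocabulary and made-up probabilities to show the argmax lookup, including the case where the padding id 0 gets the highest score and no word exists for it. The helper name `decode_argmax` is an assumption for illustration.

```python
import numpy as np

# Hypothetical vocabulary in the shape of tokenizer.word_index (ids start at 1).
word_index = {"the": 1, "and": 2, "to": 3}
index_to_word = {index: word for word, index in word_index.items()}


def decode_argmax(probabilities, index_to_word):
    """Pick the most likely id per row and map it to a word ('' if the id has no word)."""
    ids = np.argmax(probabilities, axis=-1)
    return [index_to_word.get(int(i), "") for i in ids]


# One row of made-up probabilities over ids 0..3 (id 0 is padding).
probs = np.array([[0.05, 0.60, 0.25, 0.10]])
print(decode_argmax(probs, index_to_word))

# If the padding id wins, there is no word to return.
padding_wins = np.array([[0.70, 0.10, 0.10, 0.10]])
print(decode_argmax(padding_wins, index_to_word))
```

Looking the id up with `.get` rather than with square brackets avoids a `KeyError` in the second case.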

In [8]:
input_text = 'to be'
predicted_word = predict_next_word(model, tokenizer, input_text, max_sequence_length)
print(f"Next word prediction: {predicted_word}")

Next word prediction: the


In [9]:
def generate_text(model, tokenizer, seed_text, max_sequence_length, num_words):
    generated_text = seed_text
    for _ in range(num_words):
        next_word = predict_next_word(model, tokenizer, generated_text, max_sequence_length)
        if next_word:
            generated_text += ' ' + next_word
        else:
            break
    return generated_text
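
The generation loop feeds its own output back in as the next input. The sketch below reproduces that loop shape with the predictor passed in as a function, so it can be run with a stub instead of the trained model. The names `generate_with`, `stub_predict`, and the lookup `table` are hypothetical.

```python
def generate_with(predict_fn, seed_text, num_words):
    """Same loop shape as generate_text, with the predictor passed in."""
    generated = seed_text
    for _ in range(num_words):
        next_word = predict_fn(generated)
        if not next_word:
            # Stop early when the predictor has nothing to offer.
            break
        generated += " " + next_word
    return generated


# Hypothetical stand-in for the model: look up the last word in a fixed table.
table = {"be": "the", "the": "matter", "matter": "that"}


def stub_predict(text):
    return table.get(text.split()[-1], "")


print(generate_with(stub_predict, "to be", 10))
```

Because the stub has no entry for "that", the loop stops after three words even though ten were requested. This is the early-exit path that the `if next_word` check in `generate_text` is there to handle.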

In [10]:
seed_text = 'to be'
num_words_to_generate = 10
generated_line = generate_text(model, tokenizer, seed_text, max_sequence_length, num_words_to_generate)
print(f"Generated line: {generated_line}")

Generated line: to be the matter that we may have it so the mayor
