### Data used
### https://github.com/rishabh89007/Time_Series_Datasets/blob/main/Nuclear%20Capacity.csv
### We are going to use an LSTM to generate text from a corpus by predicting the next word

In [11]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from nltk.translate.bleu_score import sentence_bleu

# Sample text corpus
corpus = """The evolution of Apple iPhones has been nothing short of remarkable. 
It all began in 2007 when Steve Jobs unveiled the very first iPhone, revolutionizing the smartphone industry. 
With its sleek design, multi-touch screen, and the introduction of the App Store, it set a new standard for mobile devices. 
Over the years, iPhones have continually pushed the boundaries of technology. 
From the iPhone 4's Retina display to the iPhone 5's Lightning connector, each iteration brought something new and exciting. 
The iPhone 6 and 6 Plus introduced larger screens, while the iPhone X incorporated facial recognition technology. 
Today, the iPhone continues to be a symbol of innovation, with features like the A15 Bionic chip, 5G connectivity, and the ProRAW camera capabilities.
As we look to the future, it's clear that Apple's commitment to evolution and excellence will ensure that iPhones remain at the forefront of the smartphone industry.
One of the most noteworthy advancements in recent iPhone models is the introduction of ProRAW capabilities. 
With ProRAW, Apple has given photographers and videographers a powerful tool to capture and edit images like never before. 
ProRAW allows users to capture photos in a format that retains all the data from the camera's sensor, providing an incredible level of flexibility in post-processing. 
This means you can adjust aspects like exposure, color balance, and even individual elements of the image with unprecedented precision. 
The combination of ProRAW and the iPhone's computational photography features ensures that you can capture stunning images in challenging lighting conditions. 
It's a game-changer for both amateur photographers and professionals, as it enables them to push the boundaries of their creativity and produce truly remarkable photos and videos with their iPhones."""

# Tokenize the text and convert it to sequences
# Remember the concept of tokenizing from our lesson on data representation
tokenizer = Tokenizer()
tokenizer.fit_on_texts([corpus])
total_words = len(tokenizer.word_index) + 1

input_sequences = []
for line in corpus.split('\n'):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# Pad sequences for consistent input size
max_sequence_length = max([len(x) for x in input_sequences])
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_length, padding='pre')

# Create input and target data
X, y = input_sequences[:,:-1], input_sequences[:,-1]

# Convert target to one-hot encoding
y = tf.keras.utils.to_categorical(y, num_classes=total_words)

# Define the LSTM model
model = Sequential([
    Embedding(total_words, 100, input_length=max_sequence_length-1),
    LSTM(150),
    Dense(total_words, activation='softmax')
])

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Train the model
model.fit(X, y, epochs=100, verbose=2)

# Function to generate text
def generate_text(seed_text, next_words, model, max_sequence_length):
    generated_text = seed_text
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_length-1, padding='pre')
        predicted_probs = model.predict(token_list, verbose=0)[0]
        predicted_index = np.argmax(predicted_probs)
        
        # Map the predicted index back to its word (None for the padding index 0)
        output_word = tokenizer.index_word.get(int(predicted_index))
        
        # If the output word is found, append it to the generated text
        if output_word is not None:
            seed_text += " " + output_word
            generated_text += " " + output_word
        else:
            break  # Stop generating if no word is found
        
    return generated_text


Epoch 1/100
9/9 - 1s - loss: 5.1500 - 842ms/epoch - 94ms/step
Epoch 2/100
9/9 - 0s - loss: 5.0864 - 163ms/epoch - 18ms/step
Epoch 3/100
9/9 - 0s - loss: 4.8822 - 158ms/epoch - 18ms/step
Epoch 4/100
9/9 - 0s - loss: 4.8214 - 163ms/epoch - 18ms/step
Epoch 5/100
9/9 - 0s - loss: 4.7801 - 170ms/epoch - 19ms/step
Epoch 6/100
9/9 - 0s - loss: 4.7554 - 161ms/epoch - 18ms/step
Epoch 7/100
9/9 - 0s - loss: 4.7144 - 162ms/epoch - 18ms/step
Epoch 8/100
9/9 - 0s - loss: 4.6812 - 167ms/epoch - 19ms/step
Epoch 9/100
9/9 - 0s - loss: 4.6318 - 160ms/epoch - 18ms/step
Epoch 10/100
9/9 - 0s - loss: 4.5970 - 160ms/epoch - 18ms/step
Epoch 11/100
9/9 - 0s - loss: 4.5251 - 160ms/epoch - 18ms/step
Epoch 12/100
9/9 - 0s - loss: 4.4488 - 158ms/epoch - 18ms/step
Epoch 13/100
9/9 - 0s - loss: 4.3755 - 160ms/epoch - 18ms/step
Epoch 14/100
9/9 - 0s - loss: 4.2798 - 159ms/epoch - 18ms/step
Epoch 15/100
9/9 - 0s - loss: 4.1991 - 162ms/epoch - 18ms/step
Epoch 16/100
9/9 - 0s - loss: 4.1114 - 170ms/epoch - 19ms/step
... (output truncated)
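
The loss printed above is categorical cross-entropy, which Keras reports in nats (natural log). One rough way to read it is as perplexity, `exp(loss)`: the number of equally likely words the model is effectively choosing between. The sketch below uses only the two loss values visible in the log; the helper names are made up for this example, and `uniform_baseline_loss` is just the loss of a model that guesses uniformly over a vocabulary of a given size.

```python
import math

def perplexity(cross_entropy_nats):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)

def uniform_baseline_loss(vocab_size):
    """Loss of a model that assigns probability 1/vocab_size to every word."""
    return math.log(vocab_size)

# Loss values copied from the training log above (epochs 1 and 16).
logged_losses = {1: 5.1500, 16: 4.1114}
for epoch, loss in logged_losses.items():
    print(f"epoch {epoch}: loss={loss:.4f}, perplexity={perplexity(loss):.1f}")
```

Comparing the first-epoch loss with `uniform_baseline_loss(total_words)` is a quick sanity check: an untrained model should start close to that value.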

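The sequence-building step in the training cell (the loop over `token_list[:i+1]`, then `pad_sequences` with `padding='pre'`, then the `X`/`y` split) can be hard to picture. Here is a toy walkthrough in plain Python, without Keras. The sentence, the word index, and the helper names `build_prefix_sequences` and `pad_pre` are made up for illustration; they only mimic what the Keras utilities do for this simple case.

```python
# Toy illustration of the prefix-sequence construction (pure Python, no Keras).

def build_prefix_sequences(token_ids):
    """Return every prefix of length >= 2: [t0, t1], [t0, t1, t2], ..."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]

def pad_pre(sequences, pad_value=0):
    """Left-pad each sequence with pad_value up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [[pad_value] * (max_len - len(seq)) + seq for seq in sequences]

# Hypothetical word index; id 0 is left unused so it can serve as padding,
# which is also why the notebook sets total_words = len(word_index) + 1.
word_index = {"the": 1, "iphone": 2, "set": 3, "a": 4, "new": 5, "standard": 6}
sentence = "the iphone set a new standard"
token_ids = [word_index[w] for w in sentence.split()]

padded = pad_pre(build_prefix_sequences(token_ids))
X = [row[:-1] for row in padded]  # everything except the last token: model input
y = [row[-1] for row in padded]   # the last token: the word to predict

for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

Each row pairs a left-padded context with the single word that follows it, so one sentence of six words yields five training examples.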
In [14]:
generate_text("The evolution of the iphone is", 10, model, max_sequence_length)

'The evolution of the iphone is been nothing short of remarkable remarkable with like the a15'

### Funny :-) Play around with the parameters of the generate_text function, such as the seed text and the number of words to generate
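
One more knob worth trying: `generate_text` always picks the single most likely word (`np.argmax`), which is probably why the output above repeats "remarkable". A common alternative is to sample from the predicted distribution, with a temperature that controls how adventurous the choice is. The function below is a minimal sketch of that idea and not part of the original notebook; the name `sample_with_temperature` and its parameters are assumptions for this example.

```python
import math
import random

def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Sample an index from probs after rescaling by temperature.

    temperature < 1 sharpens the distribution (closer to argmax),
    temperature > 1 flattens it (more random choices).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Rescale log-probabilities; the small floor avoids log(0).
    logits = [math.log(max(float(p), 1e-12)) / temperature for p in probs]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Example with a made-up distribution over four word indices.
example_probs = [0.05, 0.6, 0.25, 0.1]
print([sample_with_temperature(example_probs, temperature=0.8) for _ in range(10)])
```

To try it, replace `np.argmax(predicted_probs)` in `generate_text` with `sample_with_temperature(predicted_probs, temperature=0.8)`. Sampling can occasionally pick index 0, the padding index, which has no word attached, so generation would stop early in that case.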