
# The process for building a Text Generation Model

* Define what you want the text generation model to achieve (e.g., chatbot responses, creative writing, code generation).
* Consider the style, complexity, and length of the text to be generated.
* Collect a large dataset of text that is representative of the style and content you want to generate.
* Clean the text data (remove unwanted characters, correct spelling) and preprocess it (tokenization, lowercasing, removing stop words if necessary).
* Choose a deep neural network architecture that can handle sequences for text generation.
* Frame the problem as a sequence modelling task in which the model learns to predict the next word (or character) in a sequence.
* Train the model on your text data.

# Text Generation Model using Python

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
# Load the tiny_shakespeare dataset together with its metadata
dataset, info = tfds.load('tiny_shakespeare', with_info=True, as_supervised=False)

Downloading and preparing dataset Unknown size (download: Unknown size, generated: 1.06 MiB, total: 1.06 MiB) to /root/tensorflow_datasets/tiny_shakespeare/1.0.0...

Dataset tiny_shakespeare downloaded and prepared to /root/tensorflow_datasets/tiny_shakespeare/1.0.0. Subsequent calls will reuse this data.


# Converting the text into numeric form

In [None]:
# Get the text of the training split (the split is stored as a single example)
text = next(iter(dataset['train']))['text'].numpy().decode('utf-8')

# Create a sorted list of unique characters in the text
vocab = sorted(set(text))

# Create a mapping from characters to their index
char2idx = {char: idx for idx, char in enumerate(vocab)}

# Create an array that maps indices to characters
idx2char = np.array(vocab)

# Convert the text to a sequence of integers
text_as_int = np.array([char2idx[c] for c in text])

# Set the length of each input sequence
seq_length = 100

# Calculate the number of sequences we can generate from the text
examples_per_epoch = len(text) // (seq_length + 1)

# Create a TensorFlow Dataset object from the sequence of integers
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

# Group the integers into sequences of the specified length
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
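
The encoding and chunking steps in the cell above can be checked on a toy string without TensorFlow. The following is a minimal sketch in plain Python; the sample string, the chunk length of 5, and every name ending in `_demo` are assumptions made for illustration and are not part of the notebook.

```python
# Toy round trip: characters -> integer IDs -> characters.
sample_demo = "First Citizen"
vocab_demo = sorted(set(sample_demo))
char2idx_demo = {ch: i for i, ch in enumerate(vocab_demo)}
idx2char_demo = vocab_demo  # a list indexed by ID plays the role of the NumPy array

ids_demo = [char2idx_demo[ch] for ch in sample_demo]
decoded_demo = "".join(idx2char_demo[i] for i in ids_demo)

# Chunk the IDs into pieces of seq_length + 1, dropping an incomplete final chunk,
# which mirrors batch(seq_length + 1, drop_remainder=True).
chunk_len_demo = 4 + 1  # hypothetical seq_length of 4
chunks_demo = [
    ids_demo[start:start + chunk_len_demo]
    for start in range(0, len(ids_demo) - chunk_len_demo + 1, chunk_len_demo)
]

print(decoded_demo == sample_demo)
print(len(chunks_demo), "chunks of length", chunk_len_demo)
```

The number of chunks equals `len(text) // (seq_length + 1)`, which is the same expression the notebook uses for `examples_per_epoch`.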


In [None]:
# Define a function to split the input and target text
def split_input_target(chunk):
    # The input text is all characters except the last one
    input_text = chunk[:-1]
    # The target text is all characters except the first one
    target_text = chunk[1:]
    return input_text, target_text

# Apply the split_input_target function to each sequence in the dataset
dataset = sequences.map(split_input_target)
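
To see what the split produces, the same slicing can be applied to an ordinary Python string. This is only an illustrative sketch; `split_demo` and the sample word are hypothetical and stand in for the tensor chunks used in the notebook.

```python
def split_demo(chunk):
    """Apply the same slicing as split_input_target to a plain string."""
    return chunk[:-1], chunk[1:]

input_demo, target_demo = split_demo("First")

# At every position the target is the character that follows the input character.
for step, (seen, expected) in enumerate(zip(input_demo, target_demo)):
    print(f"step {step}: input {seen!r} -> target {expected!r}")
```

Input and target have the same length, so the model receives one training signal per time step rather than one per sequence.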


In [None]:
# Set the batch size for training
BATCH_SIZE = 64

# Set the buffer size for shuffling the dataset
BUFFER_SIZE = 10000

# Prepare the dataset for training
dataset = (
    dataset
    # Shuffle the dataset with the specified buffer size
    .shuffle(BUFFER_SIZE)
    # Batch the dataset with the specified batch size, dropping the last batch if it's smaller than the batch size
    .batch(BATCH_SIZE, drop_remainder=True)
    # Prefetch batches to improve performance
    .prefetch(tf.data.AUTOTUNE)
)
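
Because both `batch` calls use `drop_remainder=True`, the number of training batches per epoch follows from two integer divisions. Below is a small sketch of that arithmetic; the function name `batches_per_epoch` and the corpus length of 1,000,000 characters are assumptions for illustration, not measurements of the actual dataset.

```python
def batches_per_epoch(num_chars, seq_length, batch_size):
    """Count full batches when incomplete sequences and batches are dropped."""
    # Each training sequence consumes seq_length + 1 characters.
    num_sequences = num_chars // (seq_length + 1)
    # Only full batches are kept.
    return num_sequences // batch_size

# Hypothetical corpus length, with the notebook's seq_length and BATCH_SIZE.
print(batches_per_epoch(num_chars=1_000_000, seq_length=100, batch_size=64))
```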


In [None]:
# Define the size of the vocabulary
vocab_size = len(vocab)

# Define the dimensionality of the embedding layer
embedding_dim = 256

# Define the number of units in the LSTM layer
rnn_units = 1024

# Define the batch size
batch_size = BATCH_SIZE

# Define a function to build the RNN model
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    # Create a Sequential model
    model = tf.keras.Sequential([
        # Embedding layer: Maps each character index to a dense vector of embedding_dim dimensions
        tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),

        # LSTM layer: Long Short-Term Memory (LSTM) with rnn_units units
        # return_sequences=True outputs one vector per time step, so the Dense layer
        # can predict the next character at every position
        # stateful=True carries the final state of one batch over as the initial state of the next
        # recurrent_initializer='glorot_uniform' replaces the default orthogonal initializer
        tf.keras.layers.LSTM(rnn_units, return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'),

        # Dense layer: A fully connected layer with vocab_size units
        tf.keras.layers.Dense(vocab_size)
    ])

    return model

# Build the model
model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)
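
The size of this model can be estimated by hand. The sketch below counts trainable parameters, assuming the standard layout of a Keras LSTM layer (four gates, each with an input kernel, a recurrent kernel, and one bias vector). The function name `count_parameters` and the vocabulary size of 65 are assumptions for illustration; the real value is `len(vocab)`.

```python
def count_parameters(vocab_size, embedding_dim, rnn_units):
    """Estimate trainable parameters for Embedding -> LSTM -> Dense."""
    # One embedding vector per vocabulary entry.
    embedding = vocab_size * embedding_dim
    # Four gates, each with input weights, recurrent weights, and a bias.
    lstm = 4 * (rnn_units * (embedding_dim + rnn_units) + rnn_units)
    # Weight matrix plus bias producing one logit per vocabulary entry.
    dense = rnn_units * vocab_size + vocab_size
    return {
        "embedding": embedding,
        "lstm": lstm,
        "dense": dense,
        "total": embedding + lstm + dense,
    }

# 65 is an assumed vocabulary size, not a value taken from the notebook output.
counts_demo = count_parameters(vocab_size=65, embedding_dim=256, rnn_units=1024)
for name, value in counts_demo.items():
    print(f"{name}: {value:,}")
```

Under these assumptions almost all parameters sit in the LSTM layer; calling `model.summary()` on the built model reports the actual figures.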


In [None]:
# Define the loss function
def loss(labels, logits):
    # Use sparse categorical crossentropy loss
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

# Compile the model
model.compile(optimizer='adam', loss=loss)
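
For a single prediction, sparse categorical cross-entropy with `from_logits=True` is the negative log of the softmax probability assigned to the correct class. The following plain-Python sketch shows that computation for one time step; the function name `cross_entropy_from_logits` is hypothetical, and the code illustrates the formula rather than TensorFlow's implementation.

```python
import math

def cross_entropy_from_logits(label, logits):
    """Return -log(softmax(logits)[label]) using a numerically stable log-sum-exp."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(x - largest) for x in logits))
    return log_sum_exp - logits[label]

# With identical logits every class has probability 1/4, so the loss is log(4).
uniform_loss_demo = cross_entropy_from_logits(2, [0.0, 0.0, 0.0, 0.0])
print(uniform_loss_demo, math.log(4))
```

One consequence is that a model predicting all characters with roughly equal probability has a loss close to the natural logarithm of the vocabulary size, which is a useful reference point for the first epoch.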


In [None]:
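import os

# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Checkpoint file name; {epoch} is filled in with the epoch number
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

# Save only the model weights at the end of every epoch
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)

# Train the model
EPOCHS = 10
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])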

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
# Rebuild the model for inference with a batch size of 1
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

# Load the weights of the trained model from the latest checkpoint
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

# Build the model with a specific input shape for inference
model.build(tf.TensorShape([1, None]))


In [None]:
def generate_text(model, start_string):
    # Number of characters to generate
    num_generate = 1000

    # Convert the start string to indices using char2idx mapping
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)  # Add a batch dimension

    # Initialize an empty list to store the generated text
    text_generated = []

    # Reset the model's state
    model.reset_states()

    # Generate text
    for i in range(num_generate):
        # Get predictions from the model
        predictions = model(input_eval)

        # Remove the batch dimension; the result has shape (sequence_length, vocab_size)
        predictions = tf.squeeze(predictions, 0)

        # Sample a character ID from the logits of the last time step
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

        # Add the predicted character to the generated text
        text_generated.append(idx2char[predicted_id])

        # Use the predicted character as the input for the next prediction
        input_eval = tf.expand_dims([predicted_id], 0)

    # Combine the start string and the generated text
    return (start_string + ''.join(text_generated))

# Example usage
print(generate_text(model, start_string=u"QUEEN: So, lets end this"))


QUEEN: So, lets end this
Lord ClipOLED:
Nay, good my consul. Rone of me.
There should she should be about man into law the Tower,
Or Joy aUple a disolveren, brother,
The citizens the duke disprisent,
Renowned him till I will hear it like a gentle entertainung of his three,
That we disdain'd it out of it as any title
Look, to be made to end;
And never lawful lords of this land's charge?

First Polrmunater:
Patience it, and I
be even concealed dit, and say 'tis o, my young uncle: gentlemen.

GLOUCESTER:
Came bethow the day, I of the accent, would thrust a
linel-governy.

GLOUCESTER:
No warrant here doubliun, and bug dearly age:
I do not need at such a happy since
I
Nay! how to murderes the pault the fair contrary.

SICINIUS:
Wewher it got!

MARCIUS:
Be so? Well please toward years it we to him:
For times, your honour yet, my deaths of my
majesty-bedger in threces,
Which didst be new-won upon bewith us unlengation.
She is healthly.
As least and longing place.
Rale.

LUCIO:
I know our disp
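
The generation loop samples the next character from the model's logits instead of always taking the most likely one, which is why two runs with the same start string can differ. The plain-Python sketch below illustrates sampling from logits; `sample_from_logits` is a hypothetical helper, not the implementation of `tf.random.categorical`.

```python
import math
import random

def sample_from_logits(logits, rng):
    """Draw one index with probability proportional to exp(logit)."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() from overflowing.
    weights = [math.exp(x - largest) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng_demo = random.Random(0)

# One logit dominates, so index 1 is drawn every time.
peaked_demo = [sample_from_logits([0.0, 1000.0, 0.0], rng_demo) for _ in range(20)]
# Equal logits give every index the same chance.
uniform_demo = [sample_from_logits([0.0, 0.0, 0.0], rng_demo) for _ in range(20)]

print(peaked_demo)
print(uniform_demo)
```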

In [None]:
print(generate_text(model, start_string=u"QUEEN:O, rath"))


QUEEN:O, rather be robed by done?

Page:
This, may it is the
strange and broks in rose would sing,
Whilst yet I thought the leason of ChaRDIUS:
You have resides as those the spirit of your silence!

KING RICHARD III:
Away to it, and she was absent bed.
USERSEY:
Ay, my brother Romeo:
I was clouding to sendisg him to kin.
For London Clirature Was writ in Clifford's heart;
Safe yours a pluck o'er the carempt are
Being one gueds o' the noble
Off I could deny thee in Rome children's mett!

DUKE VINCENTIO:
The ridges of this one three feasts never be
Forced to Lady's sake to meet my brother.

Lord:
Now a favour, give me at the bask,
Highton your lips, these Volscian so
its speech: for we were heard it encied,
And yet I can leate thy secret subrets, minusant lands:
Yet to mine honour,
To know or morsolver?

LARTIUS:
My Lord of Norfolk, by your having, thou becoved and selfuring;
His name hath closs the treader deny remember:
Straweth Probbanca:
How dons these thousand?
ISABELLA:
'Thalived, my