

**Language Modeling** is like teaching a computer to finish your sentences. Suppose you start writing "The weather today is..." and someone jumps in to complete it with "...sunny." A language model does the same thing: it predicts the next word (or words) from the words that come before. It learns to do this from a large amount of text, picking up which words tend to follow which in human language.
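
As a rough illustration of next-word prediction, the sketch below counts which word follows which in a tiny made-up corpus and predicts the most frequent follower. The corpus and the helper names `train_bigram_model` and `predict_next` are invented for this example; real language models are far more sophisticated than simple counting.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count how often each word is followed by each other word."""
    follow_counts = defaultdict(Counter)
    words = corpus.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts

def predict_next(follow_counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = follow_counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus, used only for illustration
toy_corpus = (
    "the weather today is sunny . "
    "the weather tomorrow is sunny . "
    "the forecast today is rainy ."
)
bigram_model = train_bigram_model(toy_corpus)
print(predict_next(bigram_model, "is"))   # sunny
print(predict_next(bigram_model, "the"))  # weather
```

Here "sunny" follows "is" twice and "rainy" only once, so "sunny" is the prediction.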

**Text Generation** takes this a step further. By calling a language model repeatedly, each time appending its prediction to the input, a program can create entirely new sentences or even paragraphs that sound like they could have been written by a human. It's like giving the computer a theme and watching it write its own story or article. The model is first trained on a large dataset of text, from which it learns patterns, grammar, and contextual information. It then uses this learned knowledge to generate new text from a given prompt or condition.

**Do Large Language Models (LLMs) Work on the Same Principle?**

Yes. Large Language Models (LLMs) such as GPT (Generative Pre-trained Transformer) and Google’s PaLM work on the same principle, but at a much larger scale and with more complex architectures. They are deep neural networks trained on vast amounts of text, much of it from the internet, which enables them to generate more coherent, diverse, and contextually relevant text. These models capture finer nuances of language, can maintain context over longer stretches of text, and can even mimic specific writing styles.

During text generation, the model takes a seed input, such as a sentence or a keyword, and uses what it has learned to assign a probability to each possible next word or character. One of them is selected, either the most probable or one sampled according to those probabilities, and appended to the text. The model repeats this step, so that each new prediction takes the text so far as context, until a desired length or stopping condition is met.
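
The loop described above can be sketched in a few lines. In this minimal example, a hand-written probability table (`TOY_MODEL`, purely hypothetical) stands in for a trained model, and the context is reduced to the last character only; a real model would condition on much more of the preceding text.

```python
import random

# Hypothetical next-character probabilities, standing in for a trained model
TOY_MODEL = {
    "a": {"b": 0.7, "c": 0.3},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 1.0},
}

def generate(seed, num_generate, rng):
    """Extend `seed` one character at a time by sampling from TOY_MODEL."""
    generated = list(seed)
    for _ in range(num_generate):
        distribution = TOY_MODEL[generated[-1]]  # condition on the last character
        candidates = list(distribution.keys())
        weights = list(distribution.values())
        next_char = rng.choices(candidates, weights=weights, k=1)[0]
        generated.append(next_char)              # the output becomes part of the input
    return "".join(generated)

sample = generate("a", num_generate=20, rng=random.Random(0))
print(sample)
```

The stopping condition here is a fixed number of generated characters; stopping on a special end-of-text symbol would be another option.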

In [17]:
import tensorflow as tf
import numpy as np

# Download the dataset
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')


In [18]:
# Read the text and decode it to a Python string
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')

# Create a mapping from unique characters to indices, and back
vocab = sorted(set(text))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

# Encode the whole text as an array of integer character IDs
text_as_int = np.array([char2idx[c] for c in text])
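
To make the character-to-index mapping concrete, here is a small round trip on a short string. It follows the same construction as the cell above but uses plain Python lists and hypothetical `sample_*` names so that it runs on its own.

```python
sample_text = "to be or not to be"

# Same construction as above, applied to a tiny string
sample_vocab = sorted(set(sample_text))
sample_char2idx = {ch: i for i, ch in enumerate(sample_vocab)}
sample_idx2char = sample_vocab  # a list indexed by character ID

encoded = [sample_char2idx[ch] for ch in sample_text]
decoded = "".join(sample_idx2char[i] for i in encoded)

print(sample_vocab)            # [' ', 'b', 'e', 'n', 'o', 'r', 't']
print(encoded[:5])             # [6, 4, 0, 1, 2]
print(decoded == sample_text)  # True
```

Each distinct character gets one ID, so decoding the IDs recovers the original string exactly.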


In [19]:
# Length, in characters, of each input sequence
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)

# Stream of individual character IDs
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

# Group the stream into chunks of seq_length + 1 characters
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

# Split each chunk into input and target text;
# the target is the input shifted one character to the left
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

# Shuffle and batch the data
BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
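
The chunking and input/target split can be traced by hand on a short string. The sketch below mirrors the pipeline above in plain Python; `make_examples` and `steps_per_epoch` are hypothetical helpers written for this illustration, and the corpus length of 1,115,394 characters is an assumption that the text above does not state.

```python
def make_examples(token_ids, seq_length):
    """Cut token_ids into chunks of seq_length + 1 and split each chunk."""
    chunk_size = seq_length + 1
    num_chunks = len(token_ids) // chunk_size  # mirrors drop_remainder=True
    examples = []
    for k in range(num_chunks):
        chunk = token_ids[k * chunk_size:(k + 1) * chunk_size]
        examples.append((chunk[:-1], chunk[1:]))  # (input, target)
    return examples

def steps_per_epoch(num_chars, seq_length, batch_size):
    """Number of full batches left after both drop_remainder steps."""
    return (num_chars // (seq_length + 1)) // batch_size

pairs = make_examples(list("First Citizen"), seq_length=4)
for input_chars, target_chars in pairs:
    print(repr("".join(input_chars)), "->", repr("".join(target_chars)))

# Assumed corpus length; not stated in the text above
print(steps_per_epoch(1_115_394, seq_length=100, batch_size=64))  # 172
```

The 13-character string yields two pairs, "Firs" with target "irst" and " Cit" with target "Citi", while the leftover "zen" is dropped. Under the assumed corpus length, the batch count comes out at 172 steps per epoch.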


In [20]:
vocab_size = len(vocab)  # Number of unique characters
embedding_dim = 256      # Size of each character embedding vector
rnn_units = 1024         # Number of GRU units

# Define the model: character IDs -> embeddings -> GRU -> per-character logits
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    # stateful=True keeps the GRU's hidden state between calls instead of
    # resetting it for every batch; no batch size is fixed here
    tf.keras.layers.GRU(rnn_units, return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size),
])

# The batch size is not specified in the architecture, so the training and
# generation code must both accommodate the statefulness of the model.

# Run a single batch through the model to build it
for input_example_batch, target_example_batch in dataset.take(1):
    # Forward pass only; output shape is (batch_size, seq_length, vocab_size)
    example_batch_predictions = model(input_example_batch)

# Print the model summary now that the model has been built
model.summary()
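
The final `Dense(vocab_size)` layer outputs one raw score (logit) per character rather than a probability. As a minimal sketch in plain Python, and not Keras's actual implementation, the following shows how a softmax turns logits into probabilities and how cross-entropy scores them against the correct character. The function names and example values are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])

logits = [2.0, 0.5, -1.0]
print(softmax(logits))
print(cross_entropy_from_logits(logits, target_index=0))

# A model that scores all 8 classes equally has loss log(8)
uniform_loss = cross_entropy_from_logits([0.0] * 8, target_index=3)
print(math.isclose(uniform_loss, math.log(8)))  # True
```

The last lines give a useful reference point: a model with no preference among V characters has a loss of log(V), so a training loss well below that indicates the model has learned something.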


In [21]:
def loss(labels, logits):
    # The model outputs raw logits (no softmax layer), hence from_logits=True
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)

# Train the model
EPOCHS = 10
history = model.fit(dataset, epochs=EPOCHS)


Epoch 1/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m123s[0m 699ms/step - loss: 3.0876
Epoch 2/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m125s[0m 719ms/step - loss: 1.9244
Epoch 3/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m124s[0m 714ms/step - loss: 1.6457
Epoch 4/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m124s[0m 717ms/step - loss: 1.5101
Epoch 5/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m125s[0m 719ms/step - loss: 1.4252
Epoch 6/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m127s[0m 734ms/step - loss: 1.3649
Epoch 7/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m128s[0m 738ms/step - loss: 1.3221
Epoch 8/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m133s[0m 769ms/step - loss: 1.2866
Epoch 9/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m140s[0m 811ms/step - loss: 1.2549
Epoch 10/10
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[

In [45]:
def generate_text(model, start_string, num_generate=1000):
    # Convert start_string to character IDs (vectorizing)
    input_eval = [char2idx[c] for c in start_string]
    input_eval = tf.expand_dims(input_eval, 0)  # Add a batch dimension, shape: (1, len(start_string))

    # List to store the generated characters
    text_generated = []

    # Reset the recurrent state so generation does not start from
    # whatever state training left behind
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.RNN):
            layer.reset_states()

    for i in range(num_generate):
        predictions = model(input_eval)
        # predictions shape is (batch_size, sequence_length, vocab_size)
        # Keep only the logits for the last time step
        predictions = predictions[:, -1, :]  # Shape: (batch_size, vocab_size)
        # Sample the next character ID from the predicted distribution
        predicted_id = tf.random.categorical(predictions, num_samples=1)[0, 0].numpy()

        # Feed only the new character back in; the stateful GRU carries the context
        input_eval = tf.expand_dims([predicted_id], 0)

        # Save the generated character
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

# Generate text
print(generate_text(model, start_string="JULIET: "))


JULIET: you as that lies us answer a quarteran slave's tongue
And thus I said Kate to be to enocure he,
Procurve kill why, the dishonour news,
Did no man did wrangling write life
From the vault and like mine, fare you work?
And hath charitenced my sons.

Signior Grelion:
Help me, in mistrust, ann words,
That Edward should be pulled us.

LEONTES:
Go, cithe design, or else he is
A thing to learn to favours in this land,
Poor ears to Liceo oftend 'gainst Menenius
With closely counten and suspicion
As thought my clivaring world corrage warrant?
Who both is fair wreth, and shalt be ng clothe ere
doth gain both rime nights: thou shaming stumplers' record.

QUEEN MARGARET:
An andave us, I would yield do well
But what woe doth husband you promised
The provost blosd behome against the easter lend the very light:
And soon bribe and leave in bribe, our sight son
would flesh up the tempest to his friends.

VALET:
Ay, or else you fit your part, God have young
Both Strobt villain: I will attend thy 
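
The `generate_text` function above picks each character with `tf.random.categorical`, which samples from the distribution defined by the logits instead of always taking the highest-scoring character. The plain-Python sketch below contrasts the two strategies. `sample_from_logits` and `greedy_from_logits` are hypothetical helpers, not TensorFlow functions, and the logits are made-up values.

```python
import math
import random

def sample_from_logits(logits, rng):
    """Draw an index with probability proportional to exp(logit)."""
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def greedy_from_logits(logits):
    """Always return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.0, 2.0, 0.5]
rng = random.Random(42)
draws = [sample_from_logits(logits, rng) for _ in range(1000)]
counts = [draws.count(i) for i in range(len(logits))]

print(counts)                      # index 1 is drawn most often, but not always
print(greedy_from_logits(logits))  # 1
```

Greedy selection would make the model repeat the same continuation for a given context, whereas sampling lets less likely characters appear occasionally, which is one reason the generated text above is varied, if not always well spelled.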