# Lab 4: Transformer-Based Text Generation

This notebook implements a mini Transformer model for word-level text generation.

Pipeline:
1. Load dataset
2. Word tokenization
3. Create n-gram sequences
4. Positional encoding
5. Transformer block
6. Train model
7. Generate text


In [1]:
!pip install tensorflow




In [2]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences


In [3]:
text = """
artificial intelligence is transforming modern society.
it is used in healthcare finance education and transportation.
machine learning allows systems to improve automatically with experience.
data plays a critical role in training intelligent systems.
large datasets help models learn complex patterns.
deep learning uses multi layer neural networks.
neural networks are inspired by biological neurons.
each neuron processes input and produces an output.
training a neural network requires optimization techniques.
gradient descent minimizes the loss function.

natural language processing helps computers understand human language.
text generation is a key task in nlp.
language models predict the next word or character.
recurrent neural networks handle sequential data.
lstm and gru models address long term dependency problems.

transformer models changed the field of nlp.
they rely on self attention mechanisms.
attention allows the model to focus on relevant context.
transformers process data in parallel.
modern language models are based on transformers.

education is being improved using artificial intelligence.
intelligent tutoring systems personalize learning.
automated grading saves time for teachers.
online education platforms use recommendation systems.

ethical considerations are important in artificial intelligence.
fairness transparency and accountability must be ensured.
data privacy and security are major concerns.

text generation models can create stories poems and articles.
generated text should be meaningful and coherent.
continuous learning is essential in the field of ai.
""".lower()

print("Corpus length:", len(text))


Corpus length: 1611


In [4]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])

total_words = len(tokenizer.word_index) + 1

print("Total vocabulary size:", total_words)


Total vocabulary size: 151
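
The `+ 1` in `total_words` accounts for index 0, which the tokenizer never assigns to a word and which padding uses later. The sketch below illustrates a frequency-ranked word index in plain Python. It assumes a simple whitespace split with periods removed, not Keras's exact filtering rules, and `build_word_index` is a hypothetical helper that is not part of the notebook.

```python
from collections import Counter

def build_word_index(corpus: str) -> dict[str, int]:
    """Map each word to an integer ID, most frequent first, starting at 1."""
    words = corpus.lower().replace(".", " ").split()
    counts = Counter(words)
    # IDs start at 1 so that 0 stays free for padding.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

sample = "neural networks learn. neural networks generalize."
word_index = build_word_index(sample)
vocab_size = len(word_index) + 1  # +1 for the reserved padding index 0
print(word_index)
print("vocab_size:", vocab_size)
```

Frequent words receive small IDs, and the output layer needs `vocab_size` units so that every ID, including the unused 0, has a slot.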


In [5]:
input_sequences = []

for line in text.split("\n"):
    token_list = tokenizer.texts_to_sequences([line])[0]

    for i in range(1, len(token_list)):
        n_gram = token_list[:i+1]
        input_sequences.append(n_gram)

print("Total sequences:", len(input_sequences))
print(input_sequences[:5])


Total sequences: 188
[[12, 13], [12, 13, 4], [12, 13, 4, 31], [12, 13, 4, 31, 19], [12, 13, 4, 31, 19, 32]]
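
Each prefix printed above is one training example: all tokens except the last form the context, and the last token is the target. A minimal sketch of that expansion, using the first four IDs from the printed output (`prefix_sequences` is a hypothetical helper name):

```python
def prefix_sequences(token_ids: list[int]) -> list[list[int]]:
    """Return every prefix of length >= 2, mirroring token_list[:i+1] in the loop above."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]

line_ids = [12, 13, 4, 31]  # first four token IDs from the printed output
prefixes = prefix_sequences(line_ids)
for seq in prefixes:
    print(seq[:-1], "->", seq[-1])  # context -> target word
```

A line with n tokens therefore contributes n - 1 training sequences.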


In [6]:
max_len = max(len(seq) for seq in input_sequences)

input_sequences = pad_sequences(input_sequences, maxlen=max_len, padding='pre')

X = input_sequences[:, :-1]
y = input_sequences[:, -1]

# one-hot encode output
y = tf.keras.utils.to_categorical(y, num_classes=total_words)

print("X shape:", X.shape)
print("y shape:", y.shape)


X shape: (188, 8)
y shape: (188, 151)
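
To make the padding and the `X`/`y` split concrete, here is a small NumPy sketch of what `padding='pre'` does to three short sequences. `pre_pad` is a hypothetical stand-in written for illustration; the notebook itself uses `pad_sequences`.

```python
import numpy as np

def pre_pad(sequences, max_len):
    """Left-pad each sequence with zeros to max_len (like padding='pre')."""
    padded = np.zeros((len(sequences), max_len), dtype=np.int32)
    for row, seq in enumerate(sequences):
        seq = seq[-max_len:]  # keep the most recent tokens if the sequence is too long
        padded[row, max_len - len(seq):] = seq
    return padded

seqs = [[12, 13], [12, 13, 4], [12, 13, 4, 31]]
padded = pre_pad(seqs, max_len=4)
X_demo, y_demo = padded[:, :-1], padded[:, -1]  # context columns, target column
print(X_demo)
print(y_demo)
```

Pre-padding keeps the target in the last column of every row, which is why a single slice separates inputs from labels.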


In [7]:
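def positional_encoding(length, depth):
    """Return a (length, depth) sinusoidal positional encoding; depth must be even."""
    half_depth = depth // 2  # half the channels hold sines, half hold cosines

    positions = np.arange(length)[:, np.newaxis]                # (length, 1)
    depths = np.arange(half_depth)[np.newaxis, :] / half_depth  # (1, depth/2), values in [0, 1)

    angle_rates = 1 / (10000**depths)
    angle_rads = positions * angle_rates                        # (length, depth/2)

    # Concatenate (rather than interleave) the sine and cosine halves
    pos_encoding = np.concatenate(
        [np.sin(angle_rads), np.cos(angle_rads)],
        axis=-1
    )

    return tf.cast(pos_encoding, dtype=tf.float32)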
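
A quick way to sanity-check the encoding is to reproduce it in NumPy alone and inspect its shape and first row. The sketch below follows the same formula, with `positional_encoding_np` as an illustrative name, and uses the notebook's values `length=8` and `depth=64`.

```python
import numpy as np

def positional_encoding_np(length: int, depth: int) -> np.ndarray:
    """NumPy-only version of the sinusoidal encoding (depth assumed even)."""
    half = depth // 2
    positions = np.arange(length)[:, np.newaxis]     # (length, 1)
    depths = np.arange(half)[np.newaxis, :] / half   # (1, half)
    angle_rads = positions / (10000 ** depths)       # (length, half)
    return np.concatenate([np.sin(angle_rads), np.cos(angle_rads)], axis=-1)

pe = positional_encoding_np(length=8, depth=64)
print(pe.shape)
print(pe[0, :3], pe[0, 32:35])  # position 0: sine half, then cosine half
```

At position 0 every angle is zero, so the sine half is all zeros and the cosine half is all ones. Later positions rotate each channel at a different rate, which gives every position a distinct vector.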


In [8]:
from tensorflow.keras.layers import LayerNormalization, MultiHeadAttention, Dense

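def transformer_block(x, embed_dim, num_heads, ff_dim):
    """Post-norm encoder block: self-attention and a feed-forward network, each with a residual."""

    # Self-attention: query and value are both x (the key defaults to the value)
    attention = MultiHeadAttention(
        num_heads=num_heads,
        key_dim=embed_dim
    )(x, x)

    # Residual connection followed by layer normalization
    x = LayerNormalization(epsilon=1e-6)(x + attention)

    # Position-wise feed-forward network, projected back to embed_dim
    ffn = Dense(ff_dim, activation="relu")(x)
    ffn = Dense(embed_dim)(ffn)

    x = LayerNormalization(epsilon=1e-6)(x + ffn)

    return x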
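
The core of the block is scaled dot-product self-attention. The sketch below is a single-head NumPy version with random weights, intended only to show the shapes and the row-wise softmax. It omits the multiple heads, output projection, and biases that `MultiHeadAttention` adds, and like the block above it applies no causal or padding mask.

```python
import numpy as np

def softmax(scores, axis=-1):
    scores = scores - scores.max(axis=axis, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention; x has shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len) similarity matrix
    weights = softmax(scores)                # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 8, 64  # same sizes as the notebook's input length and embed_dim
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Row i of `weights` says how much position i draws from every other position, so each output vector is a context-dependent mixture of the whole sequence.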


In [9]:
from tensorflow.keras.layers import Input, Embedding, GlobalAveragePooling1D
from tensorflow.keras.models import Model

embed_dim = 64

inputs = Input(shape=(max_len-1,))
x = Embedding(total_words, embed_dim)(inputs)

# add positional encoding
pos_encoding = positional_encoding(max_len-1, embed_dim)
x = x + pos_encoding

# transformer encoder block
x = transformer_block(x, embed_dim, num_heads=2, ff_dim=128)

x = GlobalAveragePooling1D()(x)
outputs = Dense(total_words, activation="softmax")(x)

model = Model(inputs, outputs)

model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)

model.summary()


In [10]:
history = model.fit(
    X, y,
    epochs=200,
    verbose=1
)


Epoch 1/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 497ms/step - accuracy: 0.0030 - loss: 5.2286
Epoch 2/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.0110 - loss: 4.9804     
Epoch 3/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - accuracy: 0.0394 - loss: 4.8635 
Epoch 4/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.0100 - loss: 4.7694     
Epoch 5/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - accuracy: 0.0259 - loss: 4.8127     
Epoch 6/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.0405 - loss: 4.7069 
Epoch 7/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.0309 - loss: 4.7276     
Epoch 8/200
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.0363 - loss: 4.7382 
Epoch 9/200
[1m6/6[0m [32m━━━━━━━━━━

In [11]:
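def generate_transformer(seed_text, next_words=25):
    """Greedily extend seed_text by up to next_words words, one prediction at a time."""
    for _ in range(next_words):
        # Encode the text so far and left-pad (or truncate) to the model's input length
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_len-1, padding='pre')

        # Greedy decoding: pick the most probable next word
        predicted = int(np.argmax(model.predict(token_list, verbose=0), axis=-1)[0])

        # index_word is the inverse of word_index; index 0 (padding) has no word
        output_word = tokenizer.index_word.get(predicted, "")
        if not output_word:
            break

        seed_text += " " + output_word

    return seed_text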


In [12]:
print(generate_transformer("artificial intelligence"))
print()
print(generate_transformer("transformer models"))


artificial intelligence is transforming modern society society are based on transformers transformers be models can for teachers modern on self attention mechanisms allows the model to focus

transformer models changed the field of nlp be meaningful and coherent coherent systems to improve automatically with experience experience on self attention mechanisms allows the model to
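
The doubled words in this output ("society society", "coherent coherent") are typical of greedy argmax decoding. One common alternative is temperature sampling, sketched below in NumPy. `sample_with_temperature` is a hypothetical helper and the probabilities are made up for illustration. In the notebook it would be applied to the probability vector returned by `model.predict` in place of `np.argmax`.

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Rescale a probability vector by temperature, then sample one index from it."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled)), scaled

probs = np.array([0.7, 0.2, 0.1])  # illustrative next-word probabilities
_, sharp = sample_with_temperature(probs, temperature=0.5, rng=np.random.default_rng(0))
_, flat = sample_with_temperature(probs, temperature=2.0, rng=np.random.default_rng(0))
print("T=0.5:", sharp.round(3))
print("T=2.0:", flat.round(3))
```

Temperatures below 1 concentrate probability on the top word and approach greedy decoding, while temperatures above 1 flatten the distribution and produce more varied, less predictable text.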


## Observations — Transformer Text Generation

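The word-level Transformer model generates more meaningful and structured sentences than the character-level LSTM model implemented in this lab.

Observations:
• Words are always correctly formed, because the model predicts whole words from the vocabulary
• Sentences are partially meaningful: the output stitches together phrases from the training corpus
• Context is preserved better than with the character-level LSTM
• Repetition still occurs (for example "society society" and "coherent coherent") due to the small dataset and greedy argmax decoding

The model learned word relationships and sentence structure from only 188 training sequences, although at this scale much of the output is memorized from the corpus.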


## Comparison — LSTM vs Transformer

| Feature | LSTM Model | Transformer Model |
|---|---|---|
| Tokenization | Character-level | Word-level |
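| Training Speed | Slow (processes time steps sequentially) | Faster (processes positions in parallel) |
| Context Handling | Limited by recurrent memory | Self-attention over the whole input sequence |
| Output Quality | Mostly characters | Meaningful sentences |
| Modern Usage | Older approach | State-of-the-art |

Conclusion:
In this lab, the Transformer model produced more readable text than the character-level LSTM. Part of the gap comes from word-level tokenization, and part from self-attention, which lets every position attend directly to every other position and so captures long-range dependencies.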


## Conclusion

In this lab, we implemented two text generation approaches:

1. Character-level LSTM model
2. Transformer-based model

Key Learnings:
• Text must be tokenized before training neural networks
• Sequence models learn patterns to predict next tokens
• Temperature controls how random (and therefore how "creative") sampling is during generation
• Transformers generally outperform RNN/LSTM models on NLP tasks

The experiment illustrates why modern AI systems rely on Transformer architectures for language modeling.
