# Lab 4 — Text Generation using LSTM

This notebook implements a character-level LSTM model that learns patterns from a small AI corpus and generates new text.

Pipeline:
1. Load dataset
2. Character tokenization
3. Create sequences
4. Train LSTM model
5. Generate new text


In [None]:
!pip install tensorflow




In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM


In [None]:
text = """
artificial intelligence is transforming modern society.
it is used in healthcare finance education and transportation.
machine learning allows systems to improve automatically with experience.
data plays a critical role in training intelligent systems.
large datasets help models learn complex patterns.
deep learning uses multi layer neural networks.
neural networks are inspired by biological neurons.
each neuron processes input and produces an output.
training a neural network requires optimization techniques.
gradient descent minimizes the loss function.

natural language processing helps computers understand human language.
text generation is a key task in nlp.
language models predict the next word or character.
recurrent neural networks handle sequential data.
lstm and gru models address long term dependency problems.

transformer models changed the field of nlp.
they rely on self attention mechanisms.
attention allows the model to focus on relevant context.
transformers process data in parallel.
modern language models are based on transformers.

education is being improved using artificial intelligence.
intelligent tutoring systems personalize learning.
automated grading saves time for teachers.
online education platforms use recommendation systems.

ethical considerations are important in artificial intelligence.
fairness transparency and accountability must be ensured.
data privacy and security are major concerns.

text generation models can create stories poems and articles.
generated text should be meaningful and coherent.
continuous learning is essential in the field of ai.
""".lower()

print("Corpus length:", len(text))
print(text[:250])


Corpus length: 1611

artificial intelligence is transforming modern society.
it is used in healthcare finance education and transportation.
machine learning allows systems to improve automatically with experience.
data plays a critical role in training intelligent syste


In [None]:
# Get all unique characters
chars = sorted(list(set(text)))

# Create mappings
char2idx = {c:i for i,c in enumerate(chars)}
idx2char = {i:c for c,i in char2idx.items()}

vocab_size = len(chars)

print("Unique characters:", vocab_size)
print(chars)


Unique characters: 29
['\n', ' ', '.', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


In [None]:
seq_length = 40
step = 3

sequences = []
next_chars = []

for i in range(0, len(text) - seq_length, step):
    sequences.append(text[i:i+seq_length])
    next_chars.append(text[i+seq_length])

print("Number of sequences:", len(sequences))
print("Example sequence:\n", sequences[0])


Number of sequences: 524
Example sequence:
 
artificial intelligence is transforming


In [None]:
X = np.zeros((len(sequences), seq_length, vocab_size), dtype=np.bool_)
y = np.zeros((len(sequences), vocab_size), dtype=np.bool_)

for i, seq in enumerate(sequences):
    for t, char in enumerate(seq):
        X[i, t, char2idx[char]] = 1
    y[i, char2idx[next_chars[i]]] = 1

print("X shape:", X.shape)
print("y shape:", y.shape)


X shape: (524, 40, 29)
y shape: (524, 29)


In [None]:
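model = Sequential([
    # Declare the input shape with an explicit Input layer; passing
    # input_shape to LSTM raises a UserWarning in Keras 3.
    tf.keras.Input(shape=(seq_length, vocab_size)),
    LSTM(128),
    Dense(vocab_size, activation='softmax')
])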

model.compile(
    loss='categorical_crossentropy',
    optimizer='adam'
)

model.summary()




In [None]:
history = model.fit(
    X, y,
    batch_size=64,
    epochs=30
)


Epoch 1/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 12ms/step - loss: 3.3589
Epoch 2/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 3.1923 
Epoch 3/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 2.9841 
Epoch 4/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss: 2.9593 
Epoch 5/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 2.9556 
Epoch 6/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 2.9110 
Epoch 7/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 2.9351 
Epoch 8/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 2.8788 
Epoch 9/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 2.8829 
Epoch 10/30
[1m9/9[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 2.8576 
Epoch 11/30
[1m9/9

In [None]:
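def sample(preds, temperature=1.0):
    """Sample a character index from `preds` after temperature scaling."""
    preds = np.asarray(preds).astype("float64")
    # Divide the log-probabilities by the temperature (epsilon avoids log(0)).
    preds = np.log(preds + 1e-8) / temperature
    # Re-normalize with a softmax so the values sum to 1 again.
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.random.choice(len(preds), p=preds)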


In [None]:
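def generate_text(seed, length=300, temperature=0.4):
    """Extend `seed` by `length` sampled characters.

    `seed` must be exactly seq_length characters long and use only
    characters present in char2idx.
    """
    generated = seed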

    for _ in range(length):
        x_pred = np.zeros((1, seq_length, vocab_size))

        for t, char in enumerate(seed):
            x_pred[0, t, char2idx[char]] = 1

        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, temperature)
        next_char = idx2char[next_index]

        generated += next_char
        seed = seed[1:] + next_char

    return generated


In [None]:
# The seed must match the window length the model was trained on.
seed = text[:seq_length]
print(generate_text(seed, temperature=0.3))



artificial intelligence is transforming ts ae tte  ae  ati an a e ii ae tian ta atri in atas t ae   t iae  ie s tea  od iati  ae e  ae  te  e t aatt ata ta o tat  aeanian aarane ane  ee isle see mese s mese te iom ae te  t i a te at o re  aee n saen nuase aatin ate t aiaiio ted t t e ie  ttn  iee n tas aron te atle aat a tan a ttan   ea 


In [None]:
# The seed must match the window length the model was trained on.
seed = text[:seq_length]

print("Temperature 0.2 (safe):\n")
print(generate_text(seed, temperature=0.2))

print("\n\nTemperature 0.5 (balanced):\n")
print(generate_text(seed, temperature=0.5))

print("\n\nTemperature 0.8 (creative):\n")
print(generate_text(seed, temperature=0.8))


Temperature 0.2 (safe):


artificial intelligence is transforming ee ate an e at i ae n ata a t an  a ae  ao  ae  ati an an tat ao  at ate   ate an t ae t  ae n ae  ine se se ees aee te ae tae  ie iatae  at t  a  e   al   e  an ian ate  ane a ne tean arate e ae se en ae eat ate at an ae a to  ae ae t at  in ate a t o at  at  e ae in aen ane aee n ee aesaten aee t


Temperature 0.5 (balanced):


artificial intelligence is transformings useinn seee  t teian t la poe on  e ce taien s iseseem nes dlpe n  e  s a th  i n t  te i arnesdahat tte attan sma itee  n em s aes i msiamel in tee d ies nes tte auaen rae nn eesne aue ameesanae araen tacims iesato tiai is d to te  n an  e akid na rsaan e tate m tod ad tca  aet i da i eamn n met 


Temperature 0.8 (creative):


artificial intelligence is transformingdsse la mhoa to nioodaaintt cnpe tmihaas atai  noitasmimeanmetnta psn ua  oe ciiidieentden ttpano  ruashtt  cgin atteain souis methmopme icsom mtee e
spnn d sdaanl an rn nenariaanenaeg 
sludt

## Observations — LSTM Text Generation

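The model picked up basic character statistics from the corpus: the samples are dominated by frequent letters (a, e, t, i, n) and spaces, but they rarely form words longer than two or three letters.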

Temperature effects:
- Low temperature (0.2): repetitive, conservative text built from a few frequent characters
- Medium temperature (0.5): more varied characters, with occasional word-like fragments
- High temperature (0.8): the most varied output and the least coherent

Because the dataset is very small (1,611 characters, 524 training sequences), the model learns character frequencies and short letter combinations better than full language structure.
This is expected for a character-level LSTM trained on so little text.
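
One way to put the training log in context is to compare the reported loss with the loss of a model that guesses uniformly over the 29 characters, which is ln(29), about 3.37. The sketch below does that arithmetic and converts the losses to perplexity. The variable names are illustrative, and the loss values are copied from epochs 1 and 10 of the log above (the log is cut off after that, so later epochs are not covered).

```python
import math

vocab_size = 29             # unique characters reported by the notebook
first_epoch_loss = 3.3589   # epoch 1 loss copied from the training log
tenth_epoch_loss = 2.8576   # epoch 10 loss copied from the training log

# Cross-entropy (in nats) of a model that gives every character equal probability.
uniform_loss = math.log(vocab_size)


def perplexity(loss):
    """Convert a cross-entropy loss in nats to perplexity."""
    return math.exp(loss)


print(f"uniform baseline loss: {uniform_loss:.4f}")
print(f"epoch 1 perplexity:    {perplexity(first_epoch_loss):.1f}")
print(f"epoch 10 perplexity:   {perplexity(tenth_epoch_loss):.1f}")
```

Perplexity can be read as the effective number of characters the model is choosing between at each step. A value close to the vocabulary size means the model is near uniform guessing, and lower values mean it has narrowed its choices.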

Temperature is a sampling parameter that controls randomness during text generation: lower values concentrate probability on the most likely characters, and higher values flatten the distribution.
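
To see the effect of temperature in isolation, the sketch below applies the same rescaling used in `sample()` to a small made-up distribution. The helper name `apply_temperature` and the values in `toy_probs` are assumptions chosen for illustration, not model output. Subtracting the maximum before exponentiating is an added numerical-stability step that does not change the result.

```python
import numpy as np


def apply_temperature(probs, temperature):
    """Rescale a probability distribution by dividing its log-probabilities by `temperature`."""
    logits = np.log(np.asarray(probs, dtype="float64") + 1e-8) / temperature
    # Subtract the max before exponentiating; the softmax result is unchanged.
    exp_logits = np.exp(logits - np.max(logits))
    return exp_logits / np.sum(exp_logits)


# Hypothetical next-character probabilities, used only for illustration.
toy_probs = [0.5, 0.3, 0.15, 0.05]

for temperature in (0.2, 0.5, 0.8, 1.0):
    rescaled = apply_temperature(toy_probs, temperature)
    print(f"T={temperature}: {np.round(rescaled, 3)}")
```

At a temperature of 1.0 the distribution is returned essentially unchanged. As the temperature drops toward 0.2, most of the probability mass moves to the most likely character, which is why the low-temperature samples repeat the same few characters.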
