## This project demonstrates how to use an LSTM (Long Short-Term Memory) neural network to generate text from a user-supplied seed phrase. It also uses a temperature parameter to control how random (creative) the output text is.



### 1. `import numpy as np`

* **Numpy** is a core library for numerical computing in Python.
* `np` is the commonly used alias for numpy.
* Useful functions: arrays, math operations, random generation, etc.

---

### 2. `import tensorflow as tf`

* **TensorFlow** is an open-source library for machine learning and deep learning.
* `tf` is the standard alias.

---

### 3. `from tensorflow.keras.models import Sequential`

* Imports the **Sequential model class** from Keras.
* It is used to **build models layer by layer**, where each layer has exactly one input tensor and one output tensor.

 
### 4. `from tensorflow.keras.layers import LSTM, Dense, Embedding`

These are **types of layers** that can be added to a Sequential model:

#### `LSTM`:

* Long Short-Term Memory layer.
* A type of RNN (Recurrent Neural Network) used to process **sequential data** like text or time series.
* Captures long-term dependencies in sequences.

#### `Dense`:

* Fully connected layer.
* Each input node connects to each output node.
* Used at the end of the network (e.g., for classification or regression output).
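
To make "each input node connects to each output node" concrete, here is a minimal NumPy sketch of the computation a dense layer performs, followed by a softmax. The function names, weight values, and shapes are illustrative assumptions, not Keras internals.

```python
import numpy as np

def dense_forward(x, weights, bias):
    """Compute x @ weights + bias, the affine map of a fully connected layer."""
    return x @ weights + bias

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

# Hypothetical layer: 3 input features, 2 output units.
x = np.array([1.0, 2.0, 3.0])
weights = np.array([[0.1, 0.4],
                    [0.2, 0.5],
                    [0.3, 0.6]])  # shape (inputs, outputs): one weight per connection
bias = np.array([0.0, 0.1])

scores = dense_forward(x, weights, bias)
probs = softmax(scores)
print(scores, probs)
```

The weight matrix has one entry per input-output pair, which is what "fully connected" means. In the text model later in this notebook, the final `Dense` layer has one output unit per vocabulary word.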

#### `Embedding`:

* Used in NLP.
* Converts **integer-encoded words** into dense vectors of fixed size (word embeddings).
* Often the first layer in NLP models.
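
Conceptually, an embedding layer is a lookup table with one row per word index. The sketch below shows that idea in plain NumPy. The vocabulary size, vector size, and random values are assumptions chosen for illustration; a real layer learns the table during training.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 5      # hypothetical vocabulary of 5 word indices (0 is padding)
embedding_dim = 3   # each word becomes a 3-dimensional vector

# The embedding "layer" is a table with one row per word index.
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# An integer-encoded sentence, e.g. the output of a tokenizer.
encoded_sentence = np.array([2, 4, 1])

# Looking up the vectors is plain row indexing.
vectors = embedding_table[encoded_sentence]
print(vectors.shape)
```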



### 5. `from tensorflow.keras.preprocessing.sequence import pad_sequences`

* `pad_sequences` is used to **ensure all sequences have the same length** by:

  * Padding shorter sequences with zeros (or another value).
  * Truncating longer sequences.
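
The padding and truncating rules can be sketched in pure Python. This is a simplified stand-in, not the Keras implementation: the helper name `pad_pre` is hypothetical, and it assumes `maxlen` is at least 1. It mirrors the `pad_sequences` defaults, which pad and truncate at the front of each sequence.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                 # truncate from the front
        padding = [value] * (maxlen - len(seq))   # fill the front with `value`
        padded.append(padding + seq)
    return padded

result = pad_pre([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]], maxlen=4)
print(result)
```

Padding at the front keeps the most recent words next to the position being predicted, which suits next-word models like the one in this notebook.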



### 6. `from tensorflow.keras.preprocessing.text import Tokenizer`

* `Tokenizer` is used to:

  * Vectorize a text corpus.
  * Convert text into sequences of integers (word indices).
  * Create a word index dictionary.
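
The following pure-Python sketch shows the idea behind a word index: more frequent words get smaller indices, and index 0 stays free for padding. It is a rough approximation under stated assumptions. The helper names are hypothetical, and it skips the punctuation filtering that the real `Tokenizer` applies.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer, giving the most frequent word index 1."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() sorts by count and keeps first-seen order for ties.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index):
    """Replace each known word with its index; unknown words are dropped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]

corpus = ["the cat sat", "the dog sat down"]
word_index = build_word_index(corpus)
sequences = texts_to_sequences(corpus, word_index)
print(word_index)
print(sequences)
```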


### 7. `import random`

* Python’s built-in **random module**.
* Used for generating random numbers, selecting random items, etc.


### Summary Table:

| Component       | Purpose                             |
| --------------- | ----------------------------------- |
| `np`            | Numerical computing with arrays     |
| `tf`            | Deep learning with TensorFlow       |
| `Sequential`    | Model architecture builder          |
| `LSTM`          | Sequence learning (e.g., text/time) |
| `Dense`         | Fully connected layer               |
| `Embedding`     | Word embedding (vectorizing text)   |
| `pad_sequences` | Equal-length sequence padding       |
| `Tokenizer`     | Convert text to numerical tokens    |
| `random`        | General-purpose randomness          |



In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
import random

In [2]:
padded = pad_sequences([[1, 2, 3], [4, 5]], maxlen=4)

padded

array([[0, 1, 2, 3],
       [0, 0, 4, 5]])

## Temperature 

### Temperature is a tuning parameter that controls the randomness of predictions in text generation. It rescales the distribution over predicted next words before sampling.
- Temperature must be positive. Values close to 0 (e.g. 0.2) make sampling almost deterministic.
- There is no upper limit, but in practice the highest values used are usually between 1 and 2.

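For example, suppose the model predicts these next-word probabilities:

hello: 0.6  
world: 0.2  
this: 0.1  
deep: 0.1

#### If temperature = 1.0 (default):
- Keeps the probabilities as they are
- Sampling follows the model's own distribution, balancing predictability and variety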

#### If temperature = 0.1 (low):
- Sharpens the distribution
- Makes sampling nearly deterministic: high-probability words are picked almost every time


#### If temperature = 1.5 (high):
- Flattens the distribution
- Makes sampling more random: even low-probability words can be picked

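## Mathematically:

`preds = np.log(preds + 1e-8) / temperature`

The rescaled values are then passed through a softmax (exponentiate and normalize) to get the new sampling probabilities. The small constant `1e-8` avoids taking the log of zero.

- Low temperature ⇒ the gaps between logits (scores before softmax) grow ⇒ the top prediction dominates.

- High temperature ⇒ the gaps between logits shrink ⇒ more exploration.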
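
As a worked example, the sketch below applies the formula to the four-word distribution above (hello 0.6, world 0.2, this 0.1, deep 0.1). The helper name `apply_temperature` is an assumption made for this illustration. It subtracts the maximum before exponentiating, a common stability trick that does not change the result.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability distribution: log, divide by temperature, softmax."""
    logits = np.log(np.asarray(probs, dtype="float64") + 1e-8) / temperature
    exp_logits = np.exp(logits - np.max(logits))  # shift by the max for stability
    return exp_logits / np.sum(exp_logits)

words = ["hello", "world", "this", "deep"]
probs = [0.6, 0.2, 0.1, 0.1]

for temperature in (0.1, 1.0, 1.5):
    rescaled = apply_temperature(probs, temperature)
    print(temperature, dict(zip(words, np.round(rescaled, 3))))
```

At temperature 0.1, "hello" takes almost all of the probability mass. At 1.0 the distribution is unchanged. At 1.5, "hello" drops to a little under half and the other words gain.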



In [3]:
# Imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Corpus (Poetic)
corpus = [
    "This world a canvas vast and wide",
    "Where dreams and hopes in hearts reside",
    "With every dawn a chance to grow",
    "In sunlit fields or moonlit glow",
    "Through trials faced and joys embraced",
    "We find our path our steps retraced",
    "In unity we rise and stand",
    "Together we shape this wondrous land"
]

# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

# Create input sequences
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line.lower()])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# Padding
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre')
input_sequences = np.array(input_sequences)

# Predictors and labels
X = input_sequences[:, :-1]
y = tf.keras.utils.to_categorical(input_sequences[:, -1], num_classes=total_words)

# Model
model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_sequence_len - 1))
model.add(LSTM(100))
model.add(Dense(total_words, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, verbose=0)  # Train silently




<keras.src.callbacks.history.History at 0x241fd545c50>
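
The sequence-building loop in the training cell can be illustrated on a single tokenized line. The sketch below uses made-up token IDs and a hypothetical helper name. It shows how every prefix of a line becomes one training example: the context is all tokens but the last, and the target is the last token.

```python
def make_training_pairs(token_list):
    """Return (context, next_word) pairs from every prefix of length >= 2."""
    pairs = []
    for i in range(1, len(token_list)):
        n_gram = token_list[:i + 1]
        pairs.append((n_gram[:-1], n_gram[-1]))  # all but last -> predict last
    return pairs

# Hypothetical token IDs for a four-word line.
pairs = make_training_pairs([7, 3, 9, 4])
for context, target in pairs:
    print(context, "->", target)
```

In the notebook, the contexts are then left-padded to a common length to form `X`, and the targets are one-hot encoded to form `y`.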

### Temperature-Based Text Generation Function

In [4]:
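def sample_with_temperature(preds, temperature=1.0):
    # float64 keeps the renormalized probabilities precise enough for multinomial()
    preds = np.asarray(preds).astype("float64")
    # Rescale log-probabilities; 1e-8 guards against log(0)
    preds = np.log(preds + 1e-8) / temperature
    # Softmax: exponentiate and renormalize so the values sum to 1
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample; the result is a one-hot row, so argmax gives the sampled index
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)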

def generate_text(seed_text, next_words, model, max_sequence_len, temperature=1.0):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text.lower()])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
        predictions = model.predict(token_list, verbose=0)[0]
        predicted_index = sample_with_temperature(predictions, temperature)
        
        # Map the sampled index back to its word; index 0 is padding and has no word
        output_word = tokenizer.index_word.get(predicted_index, "")
        if output_word:
            seed_text += " " + output_word
    return seed_text


In [7]:
seed_text = "in"
print("Temp 0.2:", generate_text(seed_text, next_words=8, model=model, max_sequence_len=max_sequence_len, temperature=0.2))
print("Temp 0.7:", generate_text(seed_text, next_words=10, model=model, max_sequence_len=max_sequence_len, temperature=0.7))
print("Temp 1.2:", generate_text(seed_text, next_words=11, model=model, max_sequence_len=max_sequence_len, temperature=1.2))


Temp 0.2: in sunlit fields or moonlit glow wide reside grow
Temp 0.7: in unity we rise and stand embraced land retraced reside reside
Temp 1.2: in unity we or and stand embraced land retraced reside reside hearts


### Rhyming Enhancement
We can post-process the generated text so that the last words of lines rhyme. One way is to use the `pronouncing` library, which is based on the CMU Pronouncing Dictionary.

In [None]:
# !pip install pronouncing

In [16]:

# Imports
import numpy as np
import tensorflow as tf
import pronouncing
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Poetic Corpus
corpus = [
    "This world a canvas vast and wide",
    "Where dreams and hopes in hearts reside",
    "With every dawn a chance to grow",
    "In sunlit fields or moonlit glow",
    "Through trials faced and joys embraced",
    "We find our path our steps retraced",
    "In unity we rise and stand",
    "Together we shape this wondrous land"
]

# Tokenizer
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

# Input Sequences
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line.lower()])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# Padding
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre')
input_sequences = np.array(input_sequences)

# X and y
X = input_sequences[:, :-1]
y = tf.keras.utils.to_categorical(input_sequences[:, -1], num_classes=total_words)

# Model
model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_sequence_len - 1))
model.add(LSTM(100))
model.add(Dense(total_words, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train
model.fit(X, y, epochs=300, verbose=0)


<keras.src.callbacks.history.History at 0x24185b81690>

## Text Generation with Temperature & Rhyming

In [17]:
# Temperature Sampling
def sample_with_temperature(preds, temperature=1.0):
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds + 1e-8) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return np.random.choice(len(preds), p=preds)

# Format as Verse
def format_as_verse(text, words_per_line=5):
    words = text.strip().split()
    return "\n".join([" ".join(words[i:i+words_per_line]) for i in range(0, len(words), words_per_line)])

# Generate Text
def generate_text(seed_text, next_words, model, max_sequence_len, temperature=1.0, rhyme_last_word=None):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text.lower()])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
        predictions = model.predict(token_list, verbose=0)[0]

        # Sample word with temperature
        next_index = sample_with_temperature(predictions, temperature)
        # Map the sampled index back to its word; index 0 is padding and has no word
        output_word = tokenizer.index_word.get(next_index, "")

        if output_word:
            seed_text += " " + output_word

    # Format
    lines = format_as_verse(seed_text).split("\n")

    # Rhyme adjustment (optional): end every 2nd line with a rhyme of `rhyme_last_word`
    if rhyme_last_word:
        rhymes = pronouncing.rhymes(rhyme_last_word)
        # First rhyme that is also in the model's vocabulary, if any
        new_last_word = next((r for r in rhymes if r in tokenizer.word_index), None)
        if new_last_word:
            for i in range(1, len(lines), 2):
                words = lines[i].split()
                words[-1] = new_last_word
                lines[i] = " ".join(words)
    return "\n".join(lines)


## Generate a Rhymed Poem

In [18]:
seed_text = "with every"
output = generate_text(seed_text, next_words=20, model=model, max_sequence_len=max_sequence_len, temperature=0.8, rhyme_last_word="glow")
print("Generated Poem:\n")
print(output)


Generated Poem:

with every dawn a chance
to grow grow wide grow
wide wide wide wide hearts
hearts hearts wide reside grow
reside reside


### Rhyming Word Injection Too Early
- Cause: The `rhyme_last_word` substitution is introduced too early or overrides words the model generated naturally.

- Fix: Apply the rhyming correction after generation, and only to the last word of each line.
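
One possible way to apply the correction after generation is to make each chosen line end with a rhyme of the previous line's last word. The sketch below is an illustration under stated assumptions, not the notebook's method. The function name, the `rhymes_for` callback, and the toy rhyme table are hypothetical. In practice `rhymes_for` could be `pronouncing.rhymes`, and `vocabulary` could be `tokenizer.word_index`.

```python
def rhyme_line_endings(lines, rhymes_for, vocabulary, rhyme_every=2):
    """Make every `rhyme_every`-th line end with a rhyme of the previous line's last word."""
    result = list(lines)
    for i in range(rhyme_every - 1, len(result), rhyme_every):
        if i == 0 or not result[i].split() or not result[i - 1].split():
            continue  # nothing to rhyme with, or an empty line
        target = result[i - 1].split()[-1]
        # Only accept rhymes that are in the vocabulary and differ from the target.
        candidate = next(
            (r for r in rhymes_for(target) if r in vocabulary and r != target), None
        )
        if candidate:
            words = result[i].split()
            words[-1] = candidate
            result[i] = " ".join(words)
    return result

# Stand-in rhyme table for the demo (an assumption, not real dictionary data).
toy_rhymes = {"wide": ["reside", "tide"], "grow": ["glow", "snow"]}
vocabulary = {"wide", "reside", "grow", "glow", "hearts"}

poem = rhyme_line_endings(
    ["this world is wide", "where dreams and hearts",
     "a chance to grow", "in fields or hearts"],
    rhymes_for=lambda word: toy_rhymes.get(word, []),
    vocabulary=vocabulary,
)
print("\n".join(poem))
```

Lines whose predecessor has no in-vocabulary rhyme are left unchanged, so the model's own wording is overridden only when a usable rhyme exists.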

### Add Filtering for Repeats
Add a small filter that stops the same word from appearing three or more times in a row:

In [19]:
def filter_repetitions(text):
    words = text.split()
    filtered = []
    for i, word in enumerate(words):
        if i >= 2 and word == words[i-1] == words[i-2]:
            continue  # Drop the third and any later copy in a run of the same word
        filtered.append(word)
    return " ".join(filtered)
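
The same idea can also be written with `itertools.groupby`, which makes the run limit configurable. This is an alternative sketch, not a replacement for the function above. The name `limit_runs` and the `max_run` parameter are assumptions introduced here.

```python
from itertools import groupby, islice

def limit_runs(text, max_run=2):
    """Keep at most `max_run` copies of any word that repeats consecutively."""
    kept = []
    for _, run in groupby(text.split()):
        kept.extend(islice(run, max_run))  # take only the first `max_run` copies
    return " ".join(kept)

cleaned = limit_runs("wide wide wide wide hearts hearts reside")
print(cleaned)
```

With the default `max_run=2` this keeps the same words as `filter_repetitions`. Setting `max_run=1` removes consecutive repeats entirely.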


In [20]:
def generate_poem(seed_text, next_words, model, max_sequence_len, temperature=1.0, words_per_line=5, rhyme_every=2):
    text = seed_text

    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([text.lower()])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
        predictions = model.predict(token_list, verbose=0)[0]

        next_index = sample_with_temperature(predictions, temperature)
        # Map the sampled index back to its word; index 0 is padding and has no word
        output_word = tokenizer.index_word.get(next_index, "")

        if output_word:
            text += " " + output_word

    # Filter repetitive patterns
    text = filter_repetitions(text)

    # Format as lines
    lines = format_as_verse(text, words_per_line=words_per_line).split("\n")

    # Rhyming adjustment: swap the last word of every nth line for an
    # in-vocabulary word that rhymes with it (lines are not matched to each other)
    for i in range(rhyme_every - 1, len(lines), rhyme_every):
        last_word = lines[i].split()[-1]
        rhymes = pronouncing.rhymes(last_word)
        candidate = next((r for r in rhymes if r in tokenizer.word_index), None)
        if candidate:
            words = lines[i].split()
            words[-1] = candidate
            lines[i] = " ".join(words)

    return "\n".join(lines)


In [21]:
poem = generate_poem("through trials", next_words=30, model=model, max_sequence_len=max_sequence_len, temperature=1.2)
print("Generated Poem:\n")
print(poem)


Generated Poem:

through trials faced and joys
embraced reside reside steps embraced
hearts wide land wide wide
retraced hearts land land reside
wide retraced wide hearts wide
wide reside wide
