**Character-Level Text Generation with Vanilla RNN** 

**Goal**: Train an RNN to generate text character by character.

**Tasks**
- Implement a simple RNN from scratch using NumPy (or PyTorch/TensorFlow for automatic differentiation).
- Train on a small text corpus (e.g., Shakespeare, Wikipedia snippets, or song lyrics).
- Experiment with:
    - Different hidden layer sizes
    - Sequence lengths (how many characters to unroll)
    - Temperature sampling (controlling randomness in generation)
- Compare outputs between trained and untrained models.
  
**Expected Outcome**:  
The model should generate somewhat coherent (but imperfect) text.  
The exercise should also build an understanding of vanishing gradients and why simple RNNs struggle with long sequences.  
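
The vanishing-gradient effect mentioned above can be made concrete with a small NumPy experiment. This is only a sketch under assumptions that are not part of this notebook: a tanh RNN with no input, a small random recurrent matrix (`weight_scale=0.05`), and a hypothetical helper named `gradient_norms`. It tracks how strongly the hidden state at step t still depends on the initial hidden state.

```python
import numpy as np

def gradient_norms(hidden_size=16, steps=30, weight_scale=0.05, seed=0):
    """Track the norm of d h_t / d h_0 while unrolling a tanh RNN with no input."""
    rng = np.random.default_rng(seed)
    W_h = rng.normal(scale=weight_scale, size=(hidden_size, hidden_size))
    h = rng.normal(size=(1, hidden_size))
    jacobian = np.eye(hidden_size)  # d h_0 / d h_0
    norms = []
    for _ in range(steps):
        h = np.tanh(h @ W_h)
        # Row-vector convention: entry [i, j] is d h_t[j] / d h_{t-1}[i]
        # = W_h[i, j] * (1 - h_t[j]^2); broadcasting scales each column.
        step_jacobian = W_h * (1.0 - h**2)
        jacobian = jacobian @ step_jacobian  # chain rule across time steps
        norms.append(float(np.linalg.norm(jacobian)))
    return norms

norms = gradient_norms()
print(f"step 1: {norms[0]:.3e}, step 30: {norms[-1]:.3e}")
```

With small recurrent weights, each step multiplies the gradient by a matrix that shrinks it, so the influence of early characters on late predictions fades quickly. Larger weights can produce the opposite problem, exploding gradients.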


# Exploration

![Unfolded recurrent neural network](https://upload.wikimedia.org/wikipedia/commons/b/b5/Recurrent_neural_network_unfold.svg)

In [1]:
import tensorflow as tf
from tensorflow.keras import Sequential


model_VanillaRNN = Sequential()
model_VanillaRNN.add(tf.keras.layers.Input(shape=(None, 1)))  # Shape (time steps, features): each character is a single feature at time t
model_VanillaRNN.add(tf.keras.layers.Dense(64, activation='tanh'))

# Will have to build a custom model

In [2]:
data = tf.ones((1,5,))
dense = tf.keras.layers.Dense(5, activation='tanh')
dense.build((None, 5))
dense.set_weights([tf.ones((5, 5)), tf.zeros((5,))])
dense(data)

# Seems like a good way to apply layers

<tf.Tensor: shape=(1, 5), dtype=float32, numpy=
array([[0.99990916, 0.99990916, 0.99990916, 0.99990916, 0.99990916]],
      dtype=float32)>

In [3]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense

class VanillaRNN:
    def __init__(self):
        pass
    def fit(self):
        self.input = tf.ones((10,10,))

        self.layer1 = tf.keras.layers.Dense(10, activation='tanh')
        self.layer1.build((None, 10))
        print(self.layer1(self.input))
# made it a class

In [4]:
v = VanillaRNN()
v.fit()

tf.Tensor(
[[ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0.97452015]
 [ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0.97452015]
 [ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0.97452015]
 [ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0.97452015]
 [ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0.97452015]
 [ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0.97452015]
 [ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0.97452015]
 [ 0.48476276 -0.356358   -0.36308432 -0.9673964   0.9025267  -0.74758697
  -0.47762728  0.3305781  -0.9375403   0

In [5]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense

class VanillaRNN:
    def __init__(self):
        self.hidden_state1 = None

    def fit(self, text=[0,1,1,0,1,0,1,1,0,1]):
        self.input = tf.ones((10,10,))

        self.layer1 = Dense(10, activation='linear')
        self.layer1.build((None, 10))

        for t in text:
            self.hidden_state1 = tf.tanh(self.layer1() + self.layer1(self.hidden_state1))

        return self.layer1(self.hidden_state1)
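# Not runnable as written: self.layer1() is called without an input and self.hidden_state1 starts as None
# Need to research the structure of hidden states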

In [6]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense

class VanillaRNN:
    def __init__(self):
        self.A = 10
        self.B = 10
        self.time = 20

        self.hidden_state1 = np.random.random((1, self.A))  # 1xA
        self.layer1 = Dense(self.B, activation='linear')  # B

    def fit(self, text=None):
        # Test data
        text = np.random.random((self.time, 1, self.A))  # Tx1xA

        for Ti in range(self.time):
            s1 = self.layer1(text[Ti])  # 1xA x AxB -> 1xB
            s2 = self.layer1(self.hidden_state1)  #  1xA x AxB -> 1xB
            
            self.hidden_state1 = tf.tanh(s1 + s2)  # 1xB

        return self.layer1(self.hidden_state1)  # 1xB
    
    # Need to update weights and fix update for hidden state


In [7]:
model = VanillaRNN()
output = model.fit()
print(output)

tf.Tensor(
[[ 0.95217055  0.18014412  0.09028164  0.18123281 -0.01465768 -0.5539251
  -0.6086999  -1.163328   -0.21046673  0.97643435]], shape=(1, 10), dtype=float32)


In [8]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense

class VanillaRNN:
    def __init__(self):
        self.A = 10
        self.B = 10
        self.time = 20

        self.hidden_state1 = np.random.random((1, self.A))  # 1xA
        self.layer1 = Dense(self.B, activation='linear')  # all x B

    def fit(self, text=None):
        # Test data
        text = np.random.random((self.time, 1, self.A))  # Tx1xA

        for Ti in range(self.time):
            s1 = self.layer1(text[Ti])  # 1xA x AxB -> 1xB
            s2 = self.layer1(self.hidden_state1)  #  1xA x AxB -> 1xB
            
            self.hidden_state1 = tf.tanh(s1 + s2)  # 1xB

        return self.layer1(self.hidden_state1)  # 1xB
    
    # Need to update weights and fix update for hidden state
    # Moreover this whole model is a single layer and doesn't seem extendable
    # So it would seem that we would need a different approach to build a RNN

**Review**

The code above resembles a hidden-state update, but this model does not fulfill **REQ-003: CONFIGURABLE HIDDEN LAYER SIZES**, among others.

**Possible models for multiple layers:**
- List of layer objects
- List of layer matrices / tensor of layer matrices
- Using some TensorFlow-specific architecture for multiple layers or networks


**Possible solutions:**
- Take a look at the functional Keras API
- Subclass the `Model` class
    - Use a custom `call` and `__init__`




In [9]:
import numpy as np
import tensorflow as tf
class HiddenState(tf.keras.layers.Layer):
    def __init__(self, units, activation=None):
        super(HiddenState, self).__init__()
        self.units = units
        
        self.hidden_state = self.add_weight(
            shape=(1, units),
            initializer="random_normal" # or zeros etc
        )


    def call(self, inputs):
        return tf.matmul(inputs, self.hidden_state)

In [10]:
A = 10
B = 5

batch_size = 25
input_layer = tf.random.normal((batch_size, 1)) # Ax1

hidden_state1 = HiddenState(units=A)  # 1xA

layer1 = Dense(units=B, activation='linear')  # B x all
layer1.build((None, A))

print(input_layer.shape)
x = hidden_state1(input_layer)
print(x.shape) 
x = layer1(x) # 25xA *xB -> 25xB
print(x.shape) 
# output : 25xB

(25, 1)
(25, 10)
(25, 5)


In [11]:
x

<tf.Tensor: shape=(25, 5), dtype=float32, numpy=
array([[ 0.009206  ,  0.02962599,  0.00361145, -0.00293425,  0.00607665],
       [ 0.01516538,  0.04880397,  0.00594927, -0.0048337 ,  0.01001029],
       [-0.00819033, -0.02635743, -0.00321301,  0.00261052, -0.00540623],
       [ 0.01466852,  0.04720503,  0.00575436, -0.00467534,  0.00968232],
       [-0.0245201 , -0.07890856, -0.00961907,  0.00781536, -0.0161851 ],
       [-0.00754231, -0.02427203, -0.0029588 ,  0.00240398, -0.00497849],
       [-0.00832139, -0.02677922, -0.00326442,  0.0026523 , -0.00549274],
       [ 0.00589585,  0.01897356,  0.0023129 , -0.0018792 ,  0.00389171],
       [-0.02860088, -0.09204098, -0.01121993,  0.00911603, -0.01887872],
       [-0.01550188, -0.04988687, -0.00608128,  0.00494095, -0.0102324 ],
       [-0.02481974, -0.07987285, -0.00973662,  0.00791086, -0.01638289],
       [-0.00486389, -0.01565257, -0.00190807,  0.00155028, -0.00321053],
       [ 0.01525478,  0.04909168,  0.00598434, -0.0048622 ,  0.

In [12]:
hidden_state1.hidden_state

<Variable path=hidden_state/variable, shape=(1, 10), dtype=float32, value=[[ 0.04654356 -0.02881666 -0.02108963 -0.00619787 -0.05357994  0.02723821
   0.00670306 -0.04887664 -0.00453065 -0.01233678]]>

Now we need to define the model and its `call` method, then compile it.

In [13]:
import numpy as np
import tensorflow as tf


class HiddenState(tf.keras.layers.Layer):
    def __init__(self, units, activation=None):
        super(HiddenState, self).__init__()
        self.units = units
        
        self.hidden_state = self.add_weight(
            shape=(1, units),
            initializer="random_normal" # or zeros etc
        )


    def call(self, inputs):
        return tf.matmul(inputs, self.hidden_state)
    

class VanillaRNN(tf.keras.Model):
    def __init__(self, n_hidden_layers : int, units_per_layer : list):
        super(VanillaRNN, self).__init__()
        self.layers = []
        for _, units in enumerate(units_per_layer):
            self.layers.append(Dense(units, activation='tanh'))

        self.hidden_state = HiddenState(units=units_per_layer[0])

    def call(self, inputs):
        pass

**Review**
- Apparently every layer must have its own hidden state. That means we have two options:
    - Build an RNN layer (based on `Layer` or `Dense`) that includes its own hidden state
    - Keep a list of hidden states, one per layer, in the model


In [None]:
import numpy as np
import tensorflow as tf


class HiddenState(tf.keras.layers.Layer):
    def __init__(self, units, activation=None):
        super(HiddenState, self).__init__()
        self.units = units
        
        self.hidden_state = self.add_weight(
            shape=(1, units),
            initializer="random_normal" # or zeros etc
        )


    def call(self, inputs):
        return tf.matmul(inputs, self.hidden_state)
    

class VanillaRNN(tf.keras.Model):
    def __init__(self, n_hidden_layers : int, units_per_layer : list):
        super(VanillaRNN, self).__init__()

        self.layers = []
        self.hidden_states = []

        for _, units in enumerate(units_per_layer):
            self.layers.append(Dense(units, activation='tanh'))
            self.hidden_states.append(HiddenState(units=units))


    def call(self, inputs):
        """
        Procedure in pseudocode:
        at time step t:
            update hidden states as you forward propagate
            
            out of model:
                compile with adam, and sparse crossentropy loss
                optimize with fit


        Update hidden state:
            h =  tanh(Wx * x + Wh * h + b)
        """
        

Apparently each layer needs two weight matrices:
- `W_h` for the hidden state
- `W_x` for the inputs
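
The update rule from the docstring above, `h = tanh(Wx * x + Wh * h + b)`, can be sketched in plain NumPy before wiring it into Keras. The helper names `init_layer`, `rnn_step`, and `forward`, the layer sizes, and the initialization scale are illustrative assumptions, not code from this notebook. The sketch also keeps one hidden state per layer, as discussed in the earlier review.

```python
import numpy as np

def init_layer(input_dim, units, rng):
    """Parameters of one recurrent layer: input weights, recurrent weights, bias."""
    return {
        "W_x": rng.normal(scale=0.1, size=(input_dim, units)),
        "W_h": rng.normal(scale=0.1, size=(units, units)),
        "b": np.zeros(units),
    }

def rnn_step(x, h, params):
    """One time step: h_new = tanh(x @ W_x + h @ W_h + b), row-vector convention."""
    return np.tanh(x @ params["W_x"] + h @ params["W_h"] + params["b"])

def forward(inputs, layers):
    """Run a stack of layers over inputs of shape (batch, time, features)."""
    batch_size, time_steps, _ = inputs.shape
    # One hidden state per layer, each starting at zero.
    hidden_states = [np.zeros((batch_size, p["W_h"].shape[0])) for p in layers]
    for t in range(time_steps):
        x = inputs[:, t, :]
        for i, params in enumerate(layers):
            hidden_states[i] = rnn_step(x, hidden_states[i], params)
            x = hidden_states[i]  # this layer's output feeds the next layer
    return hidden_states

rng = np.random.default_rng(0)
units_per_layer = [10, 20, 30]
input_dim = 8
layers = []
for units in units_per_layer:
    layers.append(init_layer(input_dim, units, rng))
    input_dim = units  # the next layer consumes this layer's output

final_states = forward(rng.normal(size=(4, 5, 8)), layers)
print([h.shape for h in final_states])
```

Keeping `W_x` of shape `(input_dim, units)` and `W_h` of shape `(units, units)` is what lets the hidden size differ from the input size, which the earlier single-`Dense` attempts could not do.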

In [None]:
import numpy as np
import tensorflow as tf


class LayerRNN(tf.keras.layers.Layer):
    def __init__(self, units, activation=None):
        super(LayerRNN, self).__init__()
        self.units = units

        self.W_x = None
        self.W_h = None

        self.hidden_state = None
        
        self.b = None

    def build(self, input_shape):

        batch_size = input_shape[0]
        input_shape = input_shape[-1]


        self.W_x = self.add_weight(
            shape=(input_shape, self.units),
            initializer="random_normal", # or zeros etc
            trainable=True,
        )

        self.W_h = self.add_weight(
            shape=(self.units, self.units),
            initializer="random_normal", # or zeros etc
            trainable=True,
        )

        self.hidden_state = tf.zeros((batch_size, self.units))

        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True
        )


    def call(self, inputs, reset_state=False):
        if reset_state:
            self.hidden_state = tf.zeros_like(self.hidden_state)

        x = tf.matmul(inputs, self.W_x)  
        h = tf.matmul(self.hidden_state, self.W_h)
        self.hidden_state = tf.tanh(x + h + self.b)

        return self.hidden_state
    



class VanillaRNN(tf.keras.Model):
    def __init__(self, n_hidden_layers : int, units_per_layer : list):
        super(VanillaRNN, self).__init__()

        self.layer_list = []

        for _, units in enumerate(units_per_layer):
            self.layer_list.append(LayerRNN(units, activation='tanh'))



    def call(self, inputs):
        """
        Procedure in pseudocode:
        at time step t:
            update hidden states as you forward propagate
            
            out of model:
                compile with adam, and sparse crossentropy loss
                optimize with fit


        Update hidden state:
            h =  tanh(Wx * x + Wh * h + b)
        """
        batch_size = inputs.shape[0]
        time_steps = inputs.shape[1]

        for T in range(time_steps):
            x  = inputs[:, T, :]
            for layer in self.layers:
                x = layer(x)    

In [16]:
VanillaRNNModel = VanillaRNN(n_hidden_layers=3, units_per_layer=[10, 20, 30])
VanillaRNNModel.build((None, 5, 10))  # Assuming input shape is (batch_size, time_steps, features)
print(VanillaRNNModel.summary())




None


In [17]:
# Generate synthetic data
def generate_data(num_samples, sequence_length, num_features):
    X = np.random.rand(num_samples, sequence_length, num_features)  # Random sequences
    y = np.sum(X, axis=(1, 2)) % 2  # Intended as parity labels, but X is float, so y is a float in [0, 2), not 0 or 1
    return X, y

# Parameters
num_samples = 1000
sequence_length = 10
num_features = 5
n_hidden_layers = 2
units_per_layer = [64, 32]

# Generate data
X, y = generate_data(num_samples, sequence_length, num_features)

# Create and compile the model
model = VanillaRNN(n_hidden_layers=n_hidden_layers, units_per_layer=units_per_layer)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=32)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss}, Accuracy: {accuracy}")

Epoch 1/10


TypeError: Exception encountered when calling VanillaRNN.call().

Expected int32, but got None of type 'NoneType'.

Arguments received by VanillaRNN.call():
  • inputs=tf.Tensor(shape=(None, 10, 5), dtype=float32)

In [18]:
import numpy as np
import tensorflow as tf

class LayerRNN(tf.keras.layers.Layer):
    def __init__(self, units, activation=None):
        super(LayerRNN, self).__init__()
        self.units = units
        self.activation = activation

    def build(self, input_shape):
        input_dim = input_shape[-1]

        self.W_x = self.add_weight(
            shape=(input_dim, self.units),
            initializer="random_normal",
            trainable=True,
        )

        self.W_h = self.add_weight(
            shape=(self.units, self.units),
            initializer="random_normal",
            trainable=True,
        )

        self.b = self.add_weight(
            shape=(self.units,),
            initializer='zeros',
            trainable=True
        )

    def call(self, inputs, hidden_state):
        x = tf.matmul(inputs, self.W_x)  
        h = tf.matmul(hidden_state, self.W_h)
        hidden_state = tf.tanh(x + h + self.b)

        return hidden_state

class VanillaRNN(tf.keras.Model):
    def __init__(self, n_hidden_layers: int, units_per_layer: list):
        super(VanillaRNN, self).__init__()
        self.layers_list = [LayerRNN(units) for units in units_per_layer]

    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]
        time_steps = inputs.shape[1]

        # Initialize hidden state for the first layer
        hidden_states = [tf.zeros((batch_size, layer.units)) for layer in self.layers_list]

        for t in range(time_steps):
            x = inputs[:, t, :]  # Get the input for the current time step
            for i, layer in enumerate(self.layers_list):
                hidden_states[i] = layer(x, hidden_states[i])  # Update hidden state

                # Pass the output to the next layer
                x = hidden_states[i]

        return hidden_states[-1]  # Return the output of the last layer



In [19]:
VanillaRNNModel = VanillaRNN(n_hidden_layers=3, units_per_layer=[10, 20, 30])
VanillaRNNModel.build((None, 5, 10))  # Assuming input shape is (batch_size, time_steps, features)
print(VanillaRNNModel.summary())



None


In [20]:
# Generate synthetic data
def generate_data(num_samples, sequence_length, num_features):
    X = np.random.rand(num_samples, sequence_length, num_features)  # Random sequences
    y = np.sum(X, axis=(1, 2)) % 2  # Intended as parity labels, but X is float, so y is a float in [0, 2), not 0 or 1
    return X, y

# Parameters
num_samples = 1000
sequence_length = 10
num_features = 5
n_hidden_layers = 2
units_per_layer = [64, 32]

# Generate data
X, y = generate_data(num_samples, sequence_length, num_features)

# Create and compile the model
model = VanillaRNN(n_hidden_layers=n_hidden_layers, units_per_layer=units_per_layer)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

print(model.summary())

# Train the model
model.fit(X, y, epochs=10, batch_size=32)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss}, Accuracy: {accuracy}")

None
Epoch 1/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.0000e+00 - loss: 5.7938
Epoch 2/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.0000e+00 - loss: 0.7056
Epoch 3/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0000e+00 - loss: 0.6942
Epoch 4/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0000e+00 - loss: 0.6929
Epoch 5/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0000e+00 - loss: 0.6944
Epoch 6/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0000e+00 - loss: 0.6925
Epoch 7/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0000e+00 - loss: 0.6923
Epoch 8/10
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0000e+00 - loss: 0.6914
Epoch 9/10
32/3

In [22]:
model.summary()

**Review**
- First successful run
- Open questions: what should `y` be, and which loss and metrics fit the task?
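
On the question of what `y` could be: `sparse_categorical_crossentropy` expects integer class indices as targets and, by default, a probability distribution over classes as the model output. The following NumPy sketch is not Keras's implementation, and the function names and example values are made up for illustration. It only makes the expected shapes and types explicit.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class ids, shape (batch,)
    probs:  class probabilities, shape (batch, num_classes)
    """
    y_true = np.asarray(y_true, dtype=int)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

logits = np.array([[2.0, 0.5], [0.1, 1.5], [0.0, 0.0]])
probs = softmax(logits)
y = np.array([0, 1, 1])  # integer labels, one per sample
loss = sparse_categorical_crossentropy(y, probs)
accuracy = float(np.mean(probs.argmax(axis=-1) == y))
print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")
```

A uniform prediction over K classes gives a loss of ln K, which is a useful baseline when reading training logs. For character-level prediction this suggests a final layer with `vocab_size` outputs and targets given as character indices.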


## AI-Generated Review of Attempts and Progress

### What You've Tried

1. **Initial Exploration**
    - Started with basic Keras/TensorFlow layers (Dense) to simulate RNN-like behavior.
    - Experimented with custom classes and manual hidden state updates.

2. **Custom RNN Layer Development**
    - Created `HiddenState` and `LayerRNN` classes to encapsulate hidden state logic.
    - Explored different ways to manage hidden states, including per-layer hidden states.
    - Implemented custom weight matrices (`W_x`, `W_h`) and bias for each layer.

3. **Model Architecture**
    - Built a `VanillaRNN` model using lists of custom RNN layers.
    - Iterated on the model’s `call` method to process sequences step-by-step and propagate hidden states through layers.

4. **Data Generation and Training**
    - Generated synthetic sequence data for testing.
    - Compiled and trained the custom RNN model using Keras’ training loop.

### What Worked

- Successfully implemented a multi-layer vanilla RNN from scratch using TensorFlow’s low-level API.
- Managed hidden states for each layer and updated them at each time step.
- The model could be compiled, trained, and evaluated using standard Keras methods.
- Achieved a working training loop with synthetic data and observed loss/accuracy metrics.

### What Didn’t Work / Challenges

- Early attempts using only Dense layers did not capture the recurrent nature of RNNs.
- Managing hidden states outside of the Keras layer/model structure was cumbersome and error-prone.
- Initial custom classes lacked proper weight management and did not generalize to multiple layers.
- Some confusion around input/output shapes and how to propagate hidden states between layers and time steps.
- Output layer and loss function needed to be carefully chosen to match the task (e.g., classification vs. regression).

### Key Learnings

- Each RNN layer must maintain its own hidden state and update it at every time step.
- Custom RNNs require careful management of weights and state, but TensorFlow’s subclassing API makes this possible.
- Building from scratch deepens understanding of RNN internals, including vanishing gradients and sequence processing.
- Keras’ model API allows integration of custom layers with standard training workflows.

**Next Steps:**  
- Experiment with different sequence lengths, hidden sizes, and temperature sampling for text generation.
- Add output layers suitable for character-level prediction (e.g., softmax over vocabulary).
- Try training on real text data and compare outputs from trained vs. untrained models.

**Note**: The classes were moved to `rnn.py`, and utilities for text vectorization were implemented in `utils`.
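
`utils.Vectorizer` itself is not shown in this notebook. A hypothetical minimal version, inferred only from how it is used in the cells below (`tokenize`, `vectorize`, `vocab_size`, `char2idx`, `idx2char`, `text`), might look like the following. The real implementation may differ.

```python
import numpy as np

class Vectorizer:
    """Hypothetical character-level vectorizer (a stand-in for utils.Vectorizer)."""

    def __init__(self, text):
        self.text = text
        vocab = sorted(set(text))
        self.char2idx = {c: i for i, c in enumerate(vocab)}
        self.idx2char = {i: c for c, i in self.char2idx.items()}
        self.vocab_size = len(vocab)

    def tokenize(self):
        """Map every character of the stored text to its integer index."""
        return [self.char2idx[c] for c in self.text]

    def vectorize(self, tokens):
        """One-hot encode tokens into shape (1, seq_len, vocab_size)."""
        one_hot = np.eye(self.vocab_size, dtype=np.float32)[tokens]
        return one_hot[np.newaxis, ...]

text = "Pets are beloved companions that bring joy"
vectorizer = Vectorizer(text)
one_hot = vectorizer.vectorize(vectorizer.tokenize())

# Next-character prediction: inputs drop the last character,
# targets are the same sequence shifted left by one.
X = one_hot[:, :-1, :]
y = np.argmax(one_hot[:, 1:, :], axis=-1)
print(X.shape, y.shape)
```

Shifting the sequence by one position means each input character is paired with the character that follows it, which is the supervision signal for character-level generation.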

In [23]:
import numpy as np
import tensorflow as tf

from utils import Vectorizer
from rnn import VanillaRNN


# Sample text for training
text = "Pets are beloved companions that bring joy"
# Initialize the vectorizer
vectorizer = Vectorizer(text)

# Tokenize and vectorize the training text
tokens = vectorizer.tokenize()
vector_one_hot = vectorizer.vectorize(tokens)
vocab_size = vectorizer.vocab_size

# Initialize and compile the VanillaRNN model
vanilla_rnn = VanillaRNN(n_hidden_layers=2, units_per_layer=[128, 64], vocab_size=vocab_size)
vanilla_rnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Prepare input and target for next-character prediction
X = vector_one_hot[:, :-1, :]  # Input: all but last character
y = np.argmax(vector_one_hot[:, 1:, :], axis=-1)  # Target: all but first character

# Fit the model
vanilla_rnn.fit(X, y, epochs=100)

Epoch 1/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 15s 15s/step - accuracy: 0.0244 - loss: 3.0450
Epoch 2/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.1463 - loss: 3.0129
Epoch 3/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 80ms/step - accuracy: 0.3171 - loss: 2.9809
Epoch 4/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 76ms/step - accuracy: 0.4390 - loss: 2.9484
Epoch 5/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 155ms/step - accuracy: 0.5610 - loss: 2.9149
Epoch 6/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 77ms/step - accuracy: 0.6341 - loss: 2.8797
Epoch 7/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step - accuracy: 0.6829 - loss: 2.8423
Epoch 8/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 79ms/step - accuracy: 0.6829 - loss: 2.8020
Epoch 9/100
1/1 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x1aaf22c3850>

In [24]:
def generate_text(model, vectorizer, seed_text, num_chars, temperature=1.0):
    # Tokenize and vectorize the seed text.
    tokens = vectorizer.tokenize() if seed_text == vectorizer.text else [vectorizer.char2idx[c] for c in seed_text]
    # Generate one-hot vectors; shape: (1, seq_len, vocab_size)
    input_seq = vectorizer.vectorize(tokens)
    generated = seed_text

    

    for i in range(num_chars):
        # Predict next character probabilities from the current sequence.
        logits = model(input_seq)  # shape: (1, time_steps, vocab_size)
        last_logits = logits[0, -1, :] / temperature
        probs = tf.nn.softmax(last_logits).numpy()

             
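        # next_token = np.random.choice(range(vectorizer.vocab_size), p=probs)
        # Sampling from the distribution is disabled because it led to nonsensical results.
        # Greedy decoding is used instead; argmax ignores temperature, so that argument has no effect here.
        next_token = np.argmax(probs)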
        
        # Append predicted character.
        generated += vectorizer.idx2char[next_token]
        # Append one-hot vector for the new token.
        next_one_hot = tf.one_hot([next_token], depth=vectorizer.vocab_size)
        next_one_hot = tf.expand_dims(next_one_hot, axis=1)  # shape: (1, 1, vocab_size)
        input_seq = tf.concat([input_seq, next_one_hot], axis=1)

        print(vectorizer.idx2char[next_token], end="")
    return generated



seed = "are"

print("Streaming text: ", seed, end="")
generated_text = generate_text(vanilla_rnn, vectorizer, seed, num_chars=100, temperature=2)
print("\nGenerated Text:")
print(generated_text)

Streaming text:  are beloved companions that bring joyt bring joyt atibg nh ts that bring joyt bring joyt bring joyt bri
Generated Text:
are beloved companions that bring joyt bring joyt atibg nh ts that bring joyt bring joyt bring joyt bri
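
Because `np.argmax` picks the same token whatever the temperature, the `temperature=2` argument does not influence the output above. The following NumPy sketch of temperature sampling shows what the commented-out `np.random.choice` line would do. The function name `sample_with_temperature` and the example logits are illustrative assumptions.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max()  # numerical stability
    probs = np.exp(scaled)
    probs = probs / probs.sum()
    token = int(rng.choice(len(probs), p=probs))
    return token, probs

logits = np.array([3.0, 1.0, 0.2])
for temperature in (0.5, 1.0, 2.0):
    _, probs = sample_with_temperature(logits, temperature, np.random.default_rng(0))
    print(temperature, np.round(probs, 3))
```

Lower temperatures concentrate probability on the most likely character, while higher temperatures flatten the distribution. On a corpus this small, a temperature of 2 may well explain the nonsensical samples noted in the code comment, so values below 1 are worth trying before falling back to greedy decoding.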
