### **State University of Campinas - UNICAMP** <br>
**Course**: MC886A <br>
**Professor**: Marcelo da Silva Reis <br>
**TA (PED)**: Marcos Vinicius Souza Freire

---

### **Hands-On: Deep Learning with PyTorch**
##### Notebook: 02 RNN
---

### **Table of Contents**

1. [**Objectives**](#objectives) <br>
2. [**Prerequisites**](#prerequisites) <br>
3. [**Recurrent Neural Network (RNN)**](#recurrent-neural-network-rnn) <br>
4. [**Implementation of a RNN**](#implementation-of-a-rnn) <br>
  4.1. [**Implementing for a larger corpus**](#implementing-for-a-larger-corpus) <br>
5. [**REFERENCES**](#references)

---

#### **Objectives**
- Understand how Recurrent Neural Networks (RNNs) process sequential data through a hidden state.
- Advance from feedforward Deep Neural Networks to RNNs.
- Implement a character-level RNN in PyTorch, first on a toy string and then on a larger corpus.

---



#### **Prerequisites**
- Install PyTorch and some extra packages.
- Have Python and a Jupyter Notebook ready (great for interactive demos).

Installing PyTorch (the full setup is described in `00-setup.ipynb` from hands-on 00):

- `pip install torch torchvision`

- `pip install nbformat`

- `pip install torchmetrics`

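- To plot interactive graphs, install Plotly: `pip install plotly`

- The code cells also import `tqdm` (progress bars) and `requests` (corpus download): `pip install tqdm requests`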

---

### **Recurrent Neural Network (RNN)**

**Definition:**
A Recurrent Neural Network is a type of neural network designed to process sequential data by maintaining a hidden state that captures information from previous time steps. Unlike feedforward networks, RNNs have connections that form directed cycles, allowing information to persist across time steps. This makes them particularly suited for tasks like natural language processing, time series prediction, and speech recognition.

**Formula:**
For a basic RNN:

Hidden state update at time step $t$:
$h_t = \text{activation}(\mathbf{W_{xh}}x_t + \mathbf{W_{hh}}h_{t-1} + b_h)$

Output at time step $t$:
$y_t = \text{activation}_{\text{out}}(\mathbf{W_{hy}}h_t + b_y)$

Where:
- $x_t$ is the input at time step $t$
- $h_t$ is the hidden state at time step $t$
- $h_{t-1}$ is the hidden state from the previous time step
- $y_t$ is the output at time step $t$
- $\mathbf{W_{xh}}$ is the weight matrix for input-to-hidden connections
- $\mathbf{W_{hh}}$ is the weight matrix for hidden-to-hidden recurrent connections
- $\mathbf{W_{hy}}$ is the weight matrix for hidden-to-output connections
- $b_h$ and $b_y$ are bias terms
- $\text{activation}$ is typically tanh or ReLU
- $\text{activation}_{\text{out}}$ depends on the task (e.g., sigmoid for binary classification, softmax for multi-class)

RNNs handle sequential data. The cell below draws a basic RNN unrolled over three time steps.

In [None]:
# --------------------------------------------------------------------------
# Recurrent Neural Network
# --------------------------------------------------------------------------

import plotly.graph_objects as go
import numpy as np

# Reuse the shared configuration
node_size = 40
font_size = 12
layer_colors = {'input': '#636EFA', 'bias': '#00CC96',
               'hidden': '#FFA15A', 'output': '#EF553B',
               'activation': '#AB63FA', 'recurrent': '#19D3F3'}

def create_network(nodes, edges, recurrent_edges, title):
    fig = go.Figure()
    legend_groups = set()

    # Create regular edges
    for i, ((src, dest), weight) in enumerate(edges.items()):
        x0, y0 = nodes[src]['x'], nodes[src]['y']
        x1, y1 = nodes[dest]['x'], nodes[dest]['y']
        fig.add_trace(go.Scatter(
            x=[x0, x1, None], y=[y0, y1, None],
            line=dict(width=1, color='gray'),
            mode='lines',
            hoverinfo='text',
            text=f'Weight: {weight}',
            showlegend=i == 0,
            legendgroup='weights',
            name='Weights'
        ))

    # Create recurrent edges with curved lines
    for i, ((src, dest), weight) in enumerate(recurrent_edges.items()):
        x0, y0 = nodes[src]['x'], nodes[src]['y']
        x1, y1 = nodes[dest]['x'], nodes[dest]['y']

        # Calculate control points for curved line
        # For horizontal recurrent connections
        if abs(y1 - y0) < 0.1:
            # Create a curved path above if going right to left
            if x1 < x0:
                xcp = (x0 + x1) / 2
                ycp = max(y0, y1) + 0.5
            # Create a curved path below if going left to right
            else:
                xcp = (x0 + x1) / 2
                ycp = min(y0, y1) - 0.5
        # For vertical recurrent connections
        else:
            # Create a curved path to the right
            xcp = max(x0, x1) + 0.5
            ycp = (y0 + y1) / 2

        # Generate points for a quadratic Bezier curve
        t = np.linspace(0, 1, 20)
        x_bezier = (1-t)**2 * x0 + 2*(1-t)*t * xcp + t**2 * x1
        y_bezier = (1-t)**2 * y0 + 2*(1-t)*t * ycp + t**2 * y1

        fig.add_trace(go.Scatter(
            x=x_bezier, y=y_bezier,
            line=dict(width=2, color=layer_colors['recurrent'], dash='dot'),
            mode='lines',
            hoverinfo='text',
            text=f'Recurrent Weight: {weight}',
            showlegend=i == 0 and 'recurrent' not in legend_groups,
            legendgroup='recurrent',
            name='Recurrent Weights'
        ))
        if 'recurrent' not in legend_groups and i == 0:
            legend_groups.add('recurrent')

    # Create nodes with legend groups
    for node in nodes:
        # Determine node type from color
        node_type = [k for k, v in layer_colors.items() if v == node['color']][0]

        fig.add_trace(go.Scatter(
            x=[node['x']], y=[node['y']],
            mode='markers+text',
            marker=dict(size=node_size, color=node['color']),
            text=node.get('label', ''),
            textposition="top center",
            hoverinfo='text',
            hovertext=node.get('formula', ''),
            showlegend=node_type not in legend_groups,
            legendgroup=node_type,
            name=f'{node_type.capitalize()} Node'
        ))
        if node_type not in legend_groups:
            legend_groups.add(node_type)

    # Add activation functions as separate traces
    activation_added = False
    for node in nodes:
        if 'activation' in node:
            fig.add_trace(go.Scatter(
                x=[node['x'] + 0.25],  # Offset from node
                y=[node['y'] + 0.1],   # Vertical adjustment
                mode='text',
                text=node['activation'],
                textfont=dict(color=layer_colors['activation'], size=font_size),
                showlegend=not activation_added,
                legendgroup='activation',
                name='Activation Function'
            ))
            activation_added = True

    # Add time step labels
    for t in range(3):
        fig.add_trace(go.Scatter(
            x=[t], y=[-2.5],
            mode='text',
            text=f"t={t+1}",
            textfont=dict(size=14, color='black'),
            showlegend=False
        ))

    fig.update_layout(
        title=title,
        template='plotly_white',
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        margin=dict(l=20, r=150, t=50, b=50),
        legend=dict(
            x=1.05,
            y=0.5,
            xanchor='left',
            yanchor='middle',
            itemsizing='constant'
        ),
        width=800,
        height=500
    )
    return fig

# --------------------------------------------------------------------------
# Recurrent Neural Network (RNN) unrolled over three time steps
# --------------------------------------------------------------------------
rnn_nodes = [
    # Time step 1
    {'x': 0, 'y': -1, 'label': 'x₁', 'color': layer_colors['input'],
     'formula': 'Input at time t=1'},
    {'x': 0, 'y': 0, 'label': 'h₁', 'color': layer_colors['hidden'],
     'activation': 'tanh(Σ)', 'formula': 'h₁ = tanh(W_xh·x₁ + W_hh·h₀ + b_h)'},
    {'x': 0, 'y': 1, 'label': 'y₁', 'color': layer_colors['output'],
     'activation': 'σ(Σ)', 'formula': 'y₁ = σ(W_hy·h₁ + b_y)'},

    # Time step 2
    {'x': 1, 'y': -1, 'label': 'x₂', 'color': layer_colors['input'],
     'formula': 'Input at time t=2'},
    {'x': 1, 'y': 0, 'label': 'h₂', 'color': layer_colors['hidden'],
     'activation': 'tanh(Σ)', 'formula': 'h₂ = tanh(W_xh·x₂ + W_hh·h₁ + b_h)'},
    {'x': 1, 'y': 1, 'label': 'y₂', 'color': layer_colors['output'],
     'activation': 'σ(Σ)', 'formula': 'y₂ = σ(W_hy·h₂ + b_y)'},

    # Time step 3
    {'x': 2, 'y': -1, 'label': 'x₃', 'color': layer_colors['input'],
     'formula': 'Input at time t=3'},
    {'x': 2, 'y': 0, 'label': 'h₃', 'color': layer_colors['hidden'],
     'activation': 'tanh(Σ)', 'formula': 'h₃ = tanh(W_xh·x₃ + W_hh·h₂ + b_h)'},
    {'x': 2, 'y': 1, 'label': 'y₃', 'color': layer_colors['output'],
     'activation': 'σ(Σ)', 'formula': 'y₃ = σ(W_hy·h₃ + b_y)'},
]

# Regular feed-forward connections
rnn_edges = {
    (0, 1): 'W_xh',  # x₁ to h₁
    (1, 2): 'W_hy',  # h₁ to y₁
    (3, 4): 'W_xh',  # x₂ to h₂
    (4, 5): 'W_hy',  # h₂ to y₂
    (6, 7): 'W_xh',  # x₃ to h₃
    (7, 8): 'W_hy',  # h₃ to y₃
}

# Recurrent connections (with special curved rendering)
rnn_recurrent_edges = {
    (1, 4): 'W_hh',  # h₁ to h₂
    (4, 7): 'W_hh',  # h₂ to h₃
}

fig5 = create_network(rnn_nodes, rnn_edges, rnn_recurrent_edges, "Recurrent Neural Network (RNN)")
fig5.show()
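
The diagram repeats the labels `W_xh`, `W_hh` and `W_hy` at every time step because the weights are shared across time. A small check with `nn.RNN` may make this concrete. The sizes below are arbitrary assumptions. Note that PyTorch stores two bias vectors (`bias_ih_l0` and `bias_hh_l0`) where the formula uses a single $b_h$.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes (assumptions)
input_dim, hidden_size = 64, 128
rnn = nn.RNN(input_dim, hidden_size, batch_first=True)

# List the parameters of the single-layer RNN
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))

n_params = sum(p.numel() for p in rnn.parameters())
print(n_params)  # 24832 = 128*64 + 128*128 + 128 + 128

# The same module handles sequences of different lengths,
# so the parameter count does not depend on the number of time steps
out_short, _ = rnn(torch.randn(1, 3, input_dim))
out_long, _ = rnn(torch.randn(1, 10, input_dim))
print(out_short.shape, out_long.shape)
```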

### **Implementation of a RNN**


RNNs handle sequential data. We'll predict the next character in a text sequence.
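
To make the task concrete, the sketch below lists the (context, next character) pairs that a sliding window of three characters produces for the toy string `"hello world"`. The variable name `pairs` is introduced here only for illustration.

```python
text = "hello world"
seq_length = 3

# Each context of seq_length characters is paired with the character that follows it
pairs = [(text[i:i + seq_length], text[i + seq_length])
         for i in range(len(text) - seq_length)]

for context, target in pairs:
    print(repr(context), "->", repr(target))
# 'hel' -> 'l'
# 'ell' -> 'o'
# 'llo' -> ' '
# 'lo ' -> 'w'
# 'o w' -> 'o'
# ' wo' -> 'r'
# 'wor' -> 'l'
# 'orl' -> 'd'
```

With 11 characters and a window of 3, there are 8 training examples.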

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchmetrics import Accuracy
from tqdm import tqdm
import plotly.graph_objects as go

# Prepare data
text = "hello world"
chars = sorted(list(set(text)))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}

seq_length = 3
sequences = []
targets = []
for i in range(len(text) - seq_length):
    seq = text[i:i+seq_length]
    target = text[i+seq_length]
    sequences.append([char_to_idx[ch] for ch in seq])
    targets.append(char_to_idx[target])

X = torch.tensor(sequences, dtype=torch.long)
y = torch.tensor(targets, dtype=torch.long)

# Define the RNN model
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):
        x = self.embedding(x)
        out, hidden = self.rnn(x, hidden)
        out = self.fc(out[:, -1, :])
        return out, hidden

    def init_hidden(self, batch_size):
        return torch.zeros(1, batch_size, self.hidden_size) # Assuming 1 layer

# Training setup
input_size = len(chars)
hidden_size = 128
output_size = len(chars)
model = RNN(input_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
accuracy = Accuracy(task="multiclass", num_classes=output_size)

# List to store losses
train_losses = []

# Training loop
for epoch in tqdm(range(100), desc='Training'):
    model.train()
    hidden = model.init_hidden(X.size(0))
    optimizer.zero_grad()
    output, hidden = model(X, hidden)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    train_losses.append(loss.item())
    if epoch % 20 == 0:
        accuracy.update(output, y)
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}, Accuracy: {accuracy.compute():.2f}')
        accuracy.reset()

# Plot loss
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=list(range(len(train_losses))),
    y=train_losses,
    mode='lines',
    name='Train Loss',
    line=dict(color='royalblue', width=2),
    hovertemplate='Epoch: %{x}<br>Loss: %{y:.4f}<extra></extra>'
))

fig.update_layout(
    title='RNN Training Progress',
    xaxis_title='Epoch',
    yaxis_title='Loss',
    template='plotly_white',
    hovermode='x unified',
    margin=dict(l=40, r=20, t=60, b=20),
    showlegend=True,
    xaxis=dict(tickmode='linear', dtick=20),
    yaxis=dict(range=[0, max(train_losses)*1.1])
)

fig.show()

# Prediction function
def predict(model, char_seq):
    model.eval()
    with torch.no_grad():
        seq = [char_to_idx[ch] for ch in char_seq]
        seq = torch.tensor([seq], dtype=torch.long)
        hidden = model.init_hidden(1)
        output, _ = model(seq, hidden)
        _, predicted_idx = torch.max(output, 1)
        return idx_to_char[predicted_idx.item()]

print('Predicting an example:\n')
print(predict(model, 'hel'))  # Should predict 'l': "hel" is followed by 'l' in the training text
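
The model above passes `out[:, -1, :]` to the linear layer, that is, only the hidden state of the last time step. A short shape check, using arbitrary small sizes chosen for illustration, shows what `nn.RNN` returns when `batch_first=True`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary sizes for illustration (assumptions)
batch, seq_len, features, hidden = 4, 3, 8, 16

rnn = nn.RNN(input_size=features, hidden_size=hidden, batch_first=True)
x = torch.randn(batch, seq_len, features)  # (batch, seq_len, features)
h0 = torch.zeros(1, batch, hidden)         # (num_layers, batch, hidden)

out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([4, 3, 16]): one hidden state per time step
print(hn.shape)   # torch.Size([1, 4, 16]): final hidden state only

# For a single-layer, unidirectional RNN the last time step of `out`
# is the same tensor as the final hidden state
same = torch.allclose(out[:, -1, :], hn[0])
print(same)  # True
```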

#### **Implementing for a larger corpus**

In this part, we train on the text of *Alice's Adventures in Wonderland* by Lewis Carroll [5].

In [None]:
import plotly.graph_objects as go
import numpy as np

# Configuration with additional layer types
node_size = 40
font_size = 12
layer_colors = {
    'input': '#636EFA',
    'embedding': '#AB63FA',
    'hidden': '#FFA15A',
    'output': '#EF553B',
    'activation': '#00CC96',
    'recurrent': '#19D3F3'
}

def create_rnn_network(nodes, edges, recurrent_edges, title):
    fig = go.Figure()
    legend_groups = set()

    # Create regular edges
    for i, ((src, dest), weight) in enumerate(edges.items()):
        x0, y0 = nodes[src]['x'], nodes[src]['y']
        x1, y1 = nodes[dest]['x'], nodes[dest]['y']
        fig.add_trace(go.Scatter(
            x=[x0, x1, None], y=[y0, y1, None],
            line=dict(width=1.5, color='gray'),
            mode='lines',
            hoverinfo='text',
            text=f'Weight Matrix: {weight}',
            showlegend=i == 0,
            legendgroup='weights',
            name='Parameters'
        ))

    # Create recurrent edges with downward curved lines
    for i, ((src, dest), weight) in enumerate(recurrent_edges.items()):
        x0, y0 = nodes[src]['x'], nodes[src]['y']
        x1, y1 = nodes[dest]['x'], nodes[dest]['y']

        # Calculate curved path with downward concavity
        xcp = (x0 + x1)/2  # Keep x control point in the middle
        ycp = (y0 + y1)/2 - 0.5  # Lower the y control point for downward curve

        t = np.linspace(0, 1, 20)
        x_bezier = (1-t)**2 * x0 + 2*(1-t)*t * xcp + t**2 * x1
        y_bezier = (1-t)**2 * y0 + 2*(1-t)*t * ycp + t**2 * y1

        fig.add_trace(go.Scatter(
            x=x_bezier, y=y_bezier,
            line=dict(width=2, color=layer_colors['recurrent'], dash='dot'),
            mode='lines',
            hoverinfo='text',
            text=f'Recurrent Weight: {weight}',
            showlegend=i == 0,
            legendgroup='recurrent',
            name='Recurrent Weights'
        ))

    # Create nodes with dimension information
    for node in nodes:
        node_type = node['type']
        fig.add_trace(go.Scatter(
            x=[node['x']], y=[node['y']],
            mode='markers+text',
            marker=dict(size=node_size, color=layer_colors[node_type]),
            text=node.get('label', ''),
            textposition="top center",
            hoverinfo='text',
            hovertext=node['description'],
            showlegend=node_type not in legend_groups,
            legendgroup=node_type,
            name=f'{node_type.capitalize()} Layer'
        ))
        if node_type not in legend_groups:
            legend_groups.add(node_type)

    # Add activation functions
    activation_added = False
    for node in nodes:
        if 'activation' in node:
            fig.add_trace(go.Scatter(
                x=[node['x'] + 0.3],
                y=[node['y'] + 0.1],
                mode='text',
                text=node['activation'],
                textfont=dict(color=layer_colors['activation'], size=font_size),
                showlegend=not activation_added,
                legendgroup='activation',
                name='Activation'
            ))
            activation_added = True

    # Add time step labels
    for t in range(3):
        fig.add_trace(go.Scatter(
            x=[t], y=[-3.2],
            mode='text',
            text=f"Time Step {t+1}",
            textfont=dict(size=14, color='black'),
            showlegend=False
        ))

    fig.update_layout(
        title=dict(text=title, x=0.05, font=dict(size=20)),
        template='plotly_white',
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=[-0.5, 2.5]),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=[-3.5, 2]),
        margin=dict(l=20, r=150, t=80, b=60),
        legend=dict(
            x=1.05,
            y=0.5,
            xanchor='left',
            yanchor='middle',
            itemsizing='constant',
            font=dict(size=12)
        ),
        width=1000,
        height=600
    )
    return fig

# Define network nodes with dimension information
rnn_nodes = [
    # Time Step 1
    {'x': 0, 'y': -2, 'type': 'input', 'label': 'x₁',
     'description': 'Input Character\n(Index: 0-38)'},
    {'x': 0, 'y': -1, 'type': 'embedding', 'label': 'e₁ (64D)',
     'description': 'Embedding Layer\nConverts index to dense vector'},
    {'x': 0, 'y': 0, 'type': 'hidden', 'label': 'h₁ (128D)',
     'activation': 'tanh',
     'description': 'Hidden State\nh₁ = tanh(W_ih·e₁ + W_hh·h₀ + b_h)'},
    {'x': 0, 'y': 1, 'type': 'output', 'label': 'ŷ₁ (39D)',
     'activation': 'softmax',
     'description': 'Output Prediction\nŷ₁ = softmax(W_ho·h₁ + b_o)'},

    # Time Step 2
    {'x': 1, 'y': -2, 'type': 'input', 'label': 'x₂',
     'description': 'Input Character\n(Index: 0-38)'},
    {'x': 1, 'y': -1, 'type': 'embedding', 'label': 'e₂ (64D)',
     'description': 'Embedding Layer\nConverts index to dense vector'},
    {'x': 1, 'y': 0, 'type': 'hidden', 'label': 'h₂ (128D)',
     'activation': 'tanh',
     'description': 'Hidden State\nh₂ = tanh(W_ih·e₂ + W_hh·h₁ + b_h)'},
    {'x': 1, 'y': 1, 'type': 'output', 'label': 'ŷ₂ (39D)',
     'activation': 'softmax',
     'description': 'Output Prediction\nŷ₂ = softmax(W_ho·h₂ + b_o)'},

    # Time Step 3
    {'x': 2, 'y': -2, 'type': 'input', 'label': 'x₃',
     'description': 'Input Character\n(Index: 0-38)'},
    {'x': 2, 'y': -1, 'type': 'embedding', 'label': 'e₃ (64D)',
     'description': 'Embedding Layer\nConverts index to dense vector'},
    {'x': 2, 'y': 0, 'type': 'hidden', 'label': 'h₃ (128D)',
     'activation': 'tanh',
     'description': 'Hidden State\nh₃ = tanh(W_ih·e₃ + W_hh·h₂ + b_h)'},
    {'x': 2, 'y': 1, 'type': 'output', 'label': 'ŷ₃ (39D)',
     'activation': 'softmax',
     'description': 'Output Prediction\nŷ₃ = softmax(W_ho·h₃ + b_o)'},
]

# Define connections
rnn_edges = {
    # Time Step 1
    (0, 1): 'Embedding Matrix',
    (1, 2): 'W_ih',
    (2, 3): 'W_ho',
    # Time Step 2
    (4, 5): 'Embedding Matrix',
    (5, 6): 'W_ih',
    (6, 7): 'W_ho',
    # Time Step 3
    (8, 9): 'Embedding Matrix',
    (9, 10): 'W_ih',
    (10, 11): 'W_ho',
}

rnn_recurrent_edges = {
    (2, 6): 'W_hh',
    (6, 10): 'W_hh'
}

fig = create_rnn_network(rnn_nodes, rnn_edges, rnn_recurrent_edges,
                        "RNN Architecture with Embedding Layer")
fig.show()
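
The diagram above places an embedding layer between the character index and the hidden state. As a rough sketch of what that layer does, `nn.Embedding` is a lookup table with one learnable row per index. The sizes 39 and 64 mirror the labels in the diagram and are used here purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 39 indices and 64 dimensions, taken from the diagram labels for illustration
embedding = nn.Embedding(num_embeddings=39, embedding_dim=64)

indices = torch.tensor([[0, 5, 38]])  # (batch=1, seq_len=3) character indices
vectors = embedding(indices)
print(vectors.shape)  # torch.Size([1, 3, 64])

# Each output vector is simply the matching row of the weight matrix
is_row_lookup = torch.equal(vectors[0, 1], embedding.weight[5])
print(is_row_lookup)  # True
```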

In [None]:
import plotly.graph_objects as go
import numpy as np

# Define layer colors and descriptions
layer_colors = {
    'input': '#636EFA',      # Blue
    'embedding': '#00CC96',  # Green
    'rnn': '#FFA15A',        # Orange
    'fc': '#AB63FA',         # Purple
    'output': '#19D3F3',     # Cyan
    'activation': '#FF6692',  # Pink
    'recurrence': 'red'      # Red
}

layer_descriptions = {
    'input': "Input Layer\nSequence of character indices",
    'embedding': "Embedding Layer\nMaps characters to dense vectors",
    'rnn': "RNN Layer\nProcesses sequence with hidden state recurrence",
    'fc': "Fully Connected Layer\nMaps hidden states to vocabulary logits",
    'output': "Output Layer\nSequence of predicted character logits",
    'recurrence': "Hidden State Recurrence\nConnects RNN cells over time steps"
}

def create_rnn_visualization():
    fig = go.Figure()

    # Define layer positions and properties
    x_positions = [0, 2, 4, 6, 8]
    layer_heights = [4, 3, 3, 3, 4]  # Larger height for input/output to represent sequence
    layer_depths = [2, 2, 2, 2, 2]
    layer_types = ['input', 'embedding', 'rnn', 'fc', 'output']
    labels = [
        'Input\nSequence length: 40',
        'Embedding\nDim: 64',
        'RNN\nHidden size: 128',
        'Fully Connected\nOutput: vocab_size',
        'Output\nSequence length: 40'
    ]
    activations = [None, None, 'tanh', None, None]  # RNN uses tanh by default

    legend_groups = set()

    # Create layers
    for i in range(len(x_positions)):
        create_3d_layer(fig, x_positions[i], layer_heights[i], layer_depths[i],
                        layer_types[i], labels[i], legend_groups,
                        layer_descriptions[layer_types[i]], activations[i])

    # Add internal structure for RNN layer to show recurrence
    add_rnn_internal(fig, x_positions[2], layer_heights[2], layer_depths[2], legend_groups)

    # Add layer connections
    add_layer_connections(fig, x_positions)

    # Configure layout
    fig.update_layout(
        title=dict(
            text="Recurrent Neural Network (RNN) Architecture for Text Generation",
            x=0.05,
            font=dict(size=24)
        ),
        template='plotly_white',
        margin=dict(l=20, r=300, t=100, b=20),
        legend=dict(
            title="Layer Types",
            x=1.05,
            y=0.5,
            xanchor='left',
            yanchor='middle',
            font=dict(size=12)
        ),
        width=1400,
        height=600,
        scene=dict(
            xaxis=dict(showgrid=False, zeroline=False, visible=False),
            yaxis=dict(showgrid=False, zeroline=False, visible=False),
            zaxis=dict(showgrid=False, zeroline=False, visible=False),
            aspectmode='manual',
            aspectratio=dict(x=2, y=1, z=0.5)
        ),
        annotations=[
            dict(
                x=1.05,
                y=0.9,
                xref='paper',
                yref='paper',
                text="<b>RNN Components:</b><br>"
                     "- Embedding: Character to vector mapping<br>"
                     "- RNN: Sequential processing with hidden states<br>"
                     "- Linear: Hidden to logits transformation<br>"
                     "- Output: Predicted character sequence",
                showarrow=False,
                align='left',
                font=dict(size=14)
            )
        ]
    )
    return fig

def create_3d_layer(fig, x_pos, height, depth, layer_type, label_text,
                    legend_groups, description, activation=None):
    # Create 3D box structure
    x = [x_pos, x_pos + 1.2, x_pos + 1.2, x_pos] * 2
    y = [-height / 2, -height / 2, height / 2, height / 2] * 2
    z = [0] * 4 + [depth] * 4

    # Define edges for the 3D box
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4), (0, 4), (1, 5), (2, 6), (3, 7)]
    for i, (start, end) in enumerate(edges):
        show_legend = (layer_type not in legend_groups) and (i == 0)
        fig.add_trace(go.Scatter3d(
            x=[x[start], x[end]],
            y=[y[start], y[end]],
            z=[z[start], z[end]],
            mode='lines',
            line=dict(color=layer_colors[layer_type], width=2),
            showlegend=show_legend,
            name=layer_descriptions[layer_type].split('\n')[0],
            legendgroup=layer_type,
            hoverinfo='text',
            hovertext=description
        ))
    if layer_type not in legend_groups:
        legend_groups.add(layer_type)

    # Add activation function annotation if applicable
    if activation:
        fig.add_trace(go.Scatter3d(
            x=[x_pos + 0.6],
            y=[height / 2 + 0.3],
            z=[depth / 2],
            mode='text',
            text=f'σ = {activation}',
            textfont=dict(color=layer_colors['activation'], size=14),
            showlegend='activation' not in legend_groups,
            name='Activation Function',
            legendgroup='activation',
            hoverinfo='text',
            hovertext=f"Activation function: {activation}"
        ))
        if 'activation' not in legend_groups:
            legend_groups.add('activation')

    # Add layer label
    fig.add_trace(go.Scatter3d(
        x=[x_pos + 0.6],
        y=[-height / 2 - 0.5],
        z=[0],
        mode='text',
        text=label_text,
        textfont=dict(color='black', size=14),
        hoverinfo='none',
        showlegend=False
    ))

def add_rnn_internal(fig, x_pos, height, depth, legend_groups):
    # Add small boxes to represent unrolled RNN cells
    x_small = [x_pos + 0.2, x_pos + 0.6, x_pos + 1.0]
    small_size = 0.3
    for x in x_small:
        small_x = [x - small_size / 2, x + small_size / 2, x + small_size / 2, x - small_size / 2] * 2
        small_y = [-small_size / 2, -small_size / 2, small_size / 2, small_size / 2] * 2
        small_z = [depth / 2 - small_size / 2] * 4 + [depth / 2 + small_size / 2] * 4
        edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4), (0, 4), (1, 5), (2, 6), (3, 7)]
        for start, end in edges:
            fig.add_trace(go.Scatter3d(
                x=[small_x[start], small_x[end]],
                y=[small_y[start], small_y[end]],
                z=[small_z[start], small_z[end]],
                mode='lines',
                line=dict(color='black', width=2),
                showlegend=False
            ))
    # Add connections between small boxes to indicate hidden state flow
    for i in range(2):
        fig.add_trace(go.Scatter3d(
            x=[x_small[i] + small_size / 2, x_small[i + 1] - small_size / 2],
            y=[0, 0],
            z=[depth / 2, depth / 2],
            mode='lines',
            line=dict(color=layer_colors['recurrence'], width=4),
            showlegend=i == 0 and 'recurrence' not in legend_groups,
            name=layer_descriptions['recurrence'].split('\n')[0],
            legendgroup='recurrence',
            hoverinfo='text',
            hovertext=layer_descriptions['recurrence']
        ))

    # Add legend entry for hidden state recurrence
    if 'recurrence' not in legend_groups:
        legend_groups.add('recurrence')

def add_layer_connections(fig, x_positions):
    operations = [
        ("Embedding", "Maps indices to vectors"),
        ("RNN", "Processes sequence with tanh"),
        ("Linear", "Maps hidden to logits"),
        ("Output", "Produces predictions")
    ]

    for i in range(len(x_positions) - 1):
        x_start = x_positions[i] + 1.2
        x_end = x_positions[i + 1]
        fig.add_trace(go.Scatter3d(
            x=np.linspace(x_start, x_end, 30),
            y=np.zeros(30),
            z=np.zeros(30),
            mode='lines',
            line=dict(color='gray', width=1),
            hoverinfo='text',
            hovertext=f"Operation: {operations[i][0]}<br>{operations[i][1]}",
            showlegend=False
        ))
        fig.add_trace(go.Scatter3d(
            x=[(x_start + x_end) / 2],
            y=[0.5],
            z=[0],
            mode='text',
            text=operations[i][0],
            textfont=dict(size=12, color='black'),
            showlegend=False
        ))

# Generate and display the visualization
fig = create_rnn_visualization()
fig.show()

In [None]:
# Import libraries
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import plotly.graph_objects as go
from torch.utils.data import Dataset, DataLoader
import requests
from tqdm.auto import tqdm

# Use a GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Download Alice's Adventures in Wonderland
url = "https://www.gutenberg.org/files/11/11-0.txt"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop early if the download failed
text = response.text

# Preprocess: lowercase, keep only letters and whitespace (spaces, newlines)
text = text.lower()
text = ''.join([c for c in text if c.isalpha() or c.isspace()])

# Create vocabulary
chars = sorted(list(set(text)))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}
vocab_size = len(chars)
print(f"Vocabulary size: {vocab_size}")

# Define sequence length and create sequences
sequence_length = 40
sequences = []
targets = []
for i in range(0, len(text) - sequence_length):
    seq = text[i:i + sequence_length]
    target = text[i + 1:i + sequence_length + 1]
    sequences.append([char_to_idx[c] for c in seq])
    targets.append([char_to_idx[c] for c in target])
print(f"Number of sequences: {len(sequences)}")

# Custom Dataset class
class TextDataset(Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return torch.tensor(self.sequences[idx], dtype=torch.long), torch.tensor(self.targets[idx], dtype=torch.long)

# Create dataset and dataloader
dataset = TextDataset(sequences, targets)
batch_size = 256
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Define the RNN model
class RNNModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_size):
        super(RNNModel, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        x = self.embedding(x)  # (batch_size, sequence_length, embedding_dim)
        output, _ = self.rnn(x)  # (batch_size, sequence_length, hidden_size)
        output = self.fc(output)  # (batch_size, sequence_length, vocab_size)
        return output

# Hyperparameters
embedding_dim = 64
hidden_size = 128
learning_rate = 0.001
num_epochs = 50

# Initialize model
model = RNNModel(vocab_size, embedding_dim, hidden_size).to(device)
print(model)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
losses = []
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    # Use tqdm as a context manager
    with tqdm(dataloader, desc=f'Epoch {epoch+1}/{num_epochs}', leave=True) as progress_bar:
        for batch_idx, (seq, target) in enumerate(progress_bar, start=1):
            seq, target = seq.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(seq)
            # Flatten to (batch_size * sequence_length, vocab_size) for the loss
            output = output.view(-1, vocab_size)
            target = target.view(-1)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            # Show the batch loss and the running epoch average.
            # batch_idx is used because progress_bar.n can lag behind the loop.
            progress_bar.set_postfix({
                'batch_loss': f'{loss.item():.4f}',
                'epoch_loss': f'{total_loss / batch_idx:.4f}'
            })

    avg_loss = total_loss / len(dataloader)
    losses.append(avg_loss)
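
The training loop above flattens the model output and the targets before calling the loss, because every position in the sequence is its own classification problem. The sketch below, on random tensors with made-up sizes, suggests that this flattening gives the same value as passing `nn.CrossEntropyLoss` the class dimension in second place, which it also accepts.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up sizes for illustration (assumptions)
batch, seq_len, vocab = 2, 5, 7
logits = torch.randn(batch, seq_len, vocab)           # model output
targets = torch.randint(0, vocab, (batch, seq_len))   # next-character indices

criterion = nn.CrossEntropyLoss()

# Option 1: flatten to (batch * seq_len, vocab) and (batch * seq_len,)
flat_loss = criterion(logits.view(-1, vocab), targets.view(-1))

# Option 2: move the class dimension to position 1 -> (batch, vocab, seq_len)
perm_loss = criterion(logits.permute(0, 2, 1), targets)

print(torch.allclose(flat_loss, perm_loss))  # True
```

Both forms average the loss over all `batch * seq_len` positions.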

In [None]:
# Function to generate text
def generate_text(model, start_seq, length=200):
    model.eval()
    # Use at most the last sequence_length characters as context
    seq = [char_to_idx[c] for c in start_seq[-sequence_length:]]
    generated = start_seq
    with torch.no_grad():
        for _ in range(length):
            seq_tensor = torch.tensor(seq, dtype=torch.long).unsqueeze(0).to(device)
            output = model(seq_tensor)
            last_output = output[0, -1, :]  # Logits for the next character
            probs = torch.softmax(last_output, dim=0)
            # Sample with torch.multinomial: np.random.choice can reject
            # float32 probabilities whose sum is not exactly 1
            next_char_idx = torch.multinomial(probs, num_samples=1).item()
            generated += idx_to_char[next_char_idx]
            # Slide the context window, capped at sequence_length
            seq = (seq + [next_char_idx])[-sequence_length:]
    return generated

# Generate text
start_seq = "alice was beginning to get very tired"
generated_text = generate_text(model, start_seq)
print("Generated Text:\n", generated_text)

# Plot training loss with Plotly
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(1, len(losses) + 1)), y=losses, mode='lines+markers', name='Loss'))
fig.update_layout(title='Training Loss', xaxis_title='Epoch', yaxis_title='Loss')
fig.show()
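
The generation function above samples the next character from the softmax distribution instead of always taking the most likely one. The sketch below contrasts the two strategies on a made-up logit vector. The logits and the number of draws are assumptions for illustration only.

```python
import torch

torch.manual_seed(0)

# Made-up logits for a three-character vocabulary (illustrative only)
logits = torch.tensor([2.0, 1.0, 0.1])
probs = torch.softmax(logits, dim=0)

# Greedy decoding always returns the same index
greedy = torch.argmax(probs).item()
print(greedy)  # 0

# Sampling returns each index roughly in proportion to its probability
samples = torch.multinomial(probs, num_samples=1000, replacement=True)
counts = torch.bincount(samples, minlength=3)
print(probs)
print(counts)
```

Greedy decoding tends to produce repetitive text, while sampling keeps some variety at the cost of occasional unlikely characters.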

### **REFERENCES**

**This hands-on was based on or inspired by the following reference materials:**

- Deep Learning with PyTorch by Manning Publications [1]
- PyTorch Official Documentation [2]
- PyTorch Tutorials [3]
- Learn PyTorch for Deep Learning: Zero to Mastery [4]


[1] Stevens, E., Antiga, L., & Viehmann, T. (2020). Deep Learning with PyTorch: Build, train, and tune neural networks using Python tools. Manning.

[2] PyTorch (2025). PyTorch documentation. The Linux Foundation. https://docs.pytorch.org/docs/stable/index.html

[3] PyTorch (2024). Welcome to PyTorch Tutorials. The Linux Foundation. https://docs.pytorch.org/tutorials/

[4] Bourke, D. (2023). Learn PyTorch for Deep Learning: Zero to Mastery. https://www.learnpytorch.io/

[5] Project Gutenberg (2025). Alice's Adventures in Wonderland, by Lewis Carroll. https://www.gutenberg.org/files/11/11-0.txt