
# **Week 3 Hands-on Lab: Building a Simplified Transformer Encoder**

In this hands-on lab you will build a simplified Transformer encoder block to see how input embeddings, positional encodings, and a feedforward layer work together. We use PyTorch throughout. Self-attention is not part of this simplified block.

# **Part 1: Input Embedding and Positional Encoding**

**1. Generate Input Data**
Represent a sample sentence as a sequence of token IDs. The IDs below are chosen by hand for illustration; each one must be smaller than `vocab_size`.


In [1]:
import torch
import torch.nn as nn
import numpy as np

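# Example token IDs for one sentence (simplified for illustration)
token_ids = torch.tensor([[1, 2, 3, 4, 5]])  # shape (1, 5): one sentence, five tokens
vocab_size = 10  # number of distinct token IDs the embedding layer accepts (0 to 9)
embedding_dim = 8  # length of each token's embedding vector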
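
The token IDs above are written by hand. As a minimal sketch of where such IDs could come from, the snippet below maps a sentence to IDs with a toy whitespace tokenizer. The sentence, the `vocab` dictionary, and the `tokenize` helper are hypothetical and were picked only so that the result matches `[1, 2, 3, 4, 5]`; real models typically rely on trained subword tokenizers.

```python
# Hypothetical toy vocabulary; reserving ID 0 for unknown words is an assumption.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def tokenize(sentence, vocab):
    """Lowercase, split on whitespace, and map each word to its ID."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]

sample_ids = tokenize("The cat sat on mat", vocab)
print(sample_ids)  # [1, 2, 3, 4, 5]
```

Wrapping the result as `torch.tensor([sample_ids])` would give the same `(1, 5)` tensor used in this lab.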

**2. Create an Embedding Layer**
Implement the embedding layer to convert token IDs into dense vectors.

In [2]:
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded_tokens = embedding_layer(token_ids)
print("Embedded Tokens:\n", embedded_tokens)

Embedded Tokens:
 tensor([[[-0.0476,  0.5313, -1.8818, -0.2800, -0.0036,  0.4372, -0.0532,
           0.1149],
         [ 0.1361, -0.6000, -1.2875,  0.7560,  1.3221,  0.6297, -1.9284,
          -0.7103],
         [ 0.3835,  2.5033,  0.0379,  1.3129,  1.0807,  0.6865, -0.0671,
           1.2258],
         [ 0.6793,  0.8648,  0.0030,  1.3668, -2.3055,  1.6827, -0.7441,
           1.3466],
         [ 0.7979, -1.0823, -1.8698, -1.6705, -0.5799,  0.1817,  0.8907,
          -1.6750]]], grad_fn=<EmbeddingBackward0>)
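
An embedding layer is a trainable lookup table: each token ID selects one row of a `(vocab_size, embedding_dim)` weight matrix. The sketch below illustrates this by indexing the weight matrix directly. The fixed seed is an addition for reproducibility; the weights are initialized randomly, so the values printed above will differ from run to run without it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so reruns give the same random weights

token_ids = torch.tensor([[1, 2, 3, 4, 5]])
embedding_layer = nn.Embedding(10, 8)  # vocab_size=10, embedding_dim=8

embedded = embedding_layer(token_ids)
# Indexing the weight matrix with the same IDs returns the same rows.
looked_up = embedding_layer.weight[token_ids]

print(embedded.shape)                    # torch.Size([1, 5, 8])
print(torch.equal(embedded, looked_up))  # True
```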


**3. Add Positional Encoding**
Incorporate positional encoding to provide positional information to the model.


In [3]:
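def positional_encoding(seq_len, embedding_dim):
    # Sinusoidal encoding; this implementation assumes an even embedding_dim.
    position = np.arange(seq_len)[:, np.newaxis]  # shape (seq_len, 1)
    # One frequency per sin/cos pair: 10000^(-k / embedding_dim) for k = 0, 2, 4, ...
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even columns: sine
    pe[:, 1::2] = np.cos(position * div_term)  # odd columns: cosine
    return torch.tensor(pe, dtype=torch.float)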

seq_len = token_ids.size(1)
pos_encoding = positional_encoding(seq_len, embedding_dim)
print("Positional Encoding:\n", pos_encoding)


Positional Encoding:
 tensor([[ 0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00,  0.0000e+00,  1.0000e+00],
        [ 8.4147e-01,  5.4030e-01,  9.9833e-02,  9.9500e-01,  9.9998e-03,
          9.9995e-01,  1.0000e-03,  1.0000e+00],
        [ 9.0930e-01, -4.1615e-01,  1.9867e-01,  9.8007e-01,  1.9999e-02,
          9.9980e-01,  2.0000e-03,  1.0000e+00],
        [ 1.4112e-01, -9.8999e-01,  2.9552e-01,  9.5534e-01,  2.9996e-02,
          9.9955e-01,  3.0000e-03,  1.0000e+00],
        [-7.5680e-01, -6.5364e-01,  3.8942e-01,  9.2106e-01,  3.9989e-02,
          9.9920e-01,  4.0000e-03,  9.9999e-01]])
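
In this sinusoidal scheme, the even column `k` at position `pos` holds `sin(pos / 10000^(k / embedding_dim))` and the following odd column holds the cosine of the same angle. As a quick check, the sketch below recomputes a single row with only the standard library. The `pe_row` helper is hypothetical and, like the function above, assumes an even `embedding_dim`.

```python
import math

def pe_row(pos, embedding_dim):
    """Sinusoidal encoding for one position (assumes an even embedding_dim)."""
    row = []
    for k in range(0, embedding_dim, 2):
        angle = pos / (10000 ** (k / embedding_dim))
        row.extend([math.sin(angle), math.cos(angle)])  # one sin/cos pair per frequency
    return row

row1 = pe_row(1, 8)
print(row1)
```

The first entries are sin(1) ≈ 0.84147, cos(1) ≈ 0.54030, and sin(0.1) ≈ 0.099833, which agree with the second row of the tensor printed above.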


Add the positional encoding to the embedded tokens:

In [4]:
embedded_with_pos = embedded_tokens + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos)


Embedded Tokens with Positional Encoding:
 tensor([[[-0.0476,  1.5313, -1.8818,  0.7200, -0.0036,  1.4372, -0.0532,
           1.1149],
         [ 0.9775, -0.0597, -1.1877,  1.7510,  1.3321,  1.6296, -1.9274,
           0.2897],
         [ 1.2928,  2.0872,  0.2366,  2.2930,  1.1007,  1.6863, -0.0651,
           2.2258],
         [ 0.8204, -0.1252,  0.2985,  2.3221, -2.2755,  2.6823, -0.7411,
           2.3466],
         [ 0.0411, -1.7360, -1.4803, -0.7494, -0.5399,  1.1809,  0.8947,
          -0.6750]]], grad_fn=<AddBackward0>)
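
The call to `unsqueeze(0)` turns the `(seq_len, embedding_dim)` encoding into shape `(1, seq_len, embedding_dim)`, so broadcasting adds the same positional offsets to every sentence in a batch. The sketch below illustrates this with a batch of two; the zero and `arange` tensors are stand-ins chosen only to make the shapes easy to follow.

```python
import torch

batch_size, seq_len, embedding_dim = 2, 5, 8

# Stand-ins for the real tensors: only the shapes matter here.
embedded = torch.zeros(batch_size, seq_len, embedding_dim)
pos = torch.arange(seq_len * embedding_dim, dtype=torch.float).reshape(seq_len, embedding_dim)

summed = embedded + pos.unsqueeze(0)  # (1, 5, 8) broadcasts over the batch dimension

print(pos.unsqueeze(0).shape)             # torch.Size([1, 5, 8])
print(summed.shape)                       # torch.Size([2, 5, 8])
print(torch.equal(summed[0], summed[1]))  # True: same offsets for every sentence
```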


# **Part 2: Add a Feedforward Layer**

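**1. Define a Feedforward Neural Network**
Implement a simple two-layer feedforward network as part of the encoder. It expands each token vector from `embedding_dim` to 16 features, applies a ReLU, and projects back to `embedding_dim`.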


In [5]:
feedforward = nn.Sequential(
    nn.Linear(embedding_dim, 16),
    nn.ReLU(),
    nn.Linear(16, embedding_dim)
)
ff_output = feedforward(embedded_with_pos)
print("Feedforward Output:\n", ff_output)


Feedforward Output:
 tensor([[[-0.3757,  0.5520, -0.4770,  0.2611, -0.1558, -0.2438, -0.3839,
           0.6963],
         [-0.0732,  0.3629, -0.2447,  0.1214,  0.1462,  0.0155,  0.0546,
           0.2149],
         [-0.3086,  0.2660, -0.6446,  0.5495,  0.2360, -0.0111, -0.1559,
           1.0200],
         [-0.0326,  0.1907, -0.8252,  0.5016,  0.3373, -0.0055,  0.0163,
           1.2718],
         [ 0.1800,  0.3794,  0.1056, -0.1509,  0.1604,  0.2099,  0.2076,
          -0.0501]]], grad_fn=<ViewBackward0>)
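
Because `nn.Linear` acts on the last dimension only, the feedforward network transforms each token position independently with the same weights. The sketch below checks this by comparing one position taken from the batched output with the result of passing that position through the network alone. The random input is a stand-in for `embedded_with_pos`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed for reproducible random weights and inputs

feedforward = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
x = torch.randn(1, 5, 8)  # random stand-in for embedded_with_pos

full = feedforward(x)          # all five positions at once
single = feedforward(x[0, 2])  # only the third position

print(full.shape)                                     # torch.Size([1, 5, 8])
print(torch.allclose(full[0, 2], single, atol=1e-5))  # True
```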


# **Part 3: Combine the Components into an Encoder Block**

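**1. Define the Encoder Block**
Combine the embedding, positional encoding, and feedforward components into an encoder block. The block also adds a residual connection around the feedforward network and applies layer normalization to the result.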


In [6]:
class TransformerEncoderBlock(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16),
            nn.ReLU(),
            nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

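    def forward(self, x):
        embed = self.embedding(x)  # (batch, seq_len, embedding_dim)
        # Build the encoding on the same device as the embeddings
        pos_enc = positional_encoding(x.size(1), embed.size(2)).to(embed.device)
        embed_with_pos = embed + pos_enc.unsqueeze(0)
        ff_output = self.feedforward(embed_with_pos)
        # Residual connection followed by layer normalization
        return self.layer_norm(embed_with_pos + ff_output)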

encoder = TransformerEncoderBlock(vocab_size, embedding_dim)
output = encoder(token_ids)
print("Encoder Output:\n", output)


Encoder Output:
 tensor([[[ 0.6291,  0.9954,  0.0744, -1.1658, -0.4841,  1.5911, -0.0660,
          -1.5743],
         [-1.1803, -0.2941, -0.8616,  0.8310, -0.9540,  2.0155,  0.1298,
           0.3137],
         [ 1.4635,  0.2129,  0.4341, -1.1405, -1.3900,  0.2033,  1.1868,
          -0.9702],
         [-0.3388, -0.4817, -0.2582,  2.0535, -0.1916, -0.8219,  1.1719,
          -1.1332],
         [-0.4899, -0.1472,  0.9754,  1.1978,  0.5335, -1.2628,  0.8510,
          -1.6579]]], grad_fn=<NativeLayerNormBackward0>)
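
The last line of `forward` adds the feedforward output back onto its input (a residual connection) and then normalizes each token vector over its features. As a rough illustration, the sketch below reproduces `nn.LayerNorm` by hand. It relies on the layer's default initialization (scale 1, shift 0), and the random tensors are stand-ins for the real activations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(1, 5, 8)          # stand-in for embed_with_pos
ff_output = torch.randn(1, 5, 8)  # stand-in for the feedforward output
residual = x + ff_output          # residual (skip) connection

layer_norm = nn.LayerNorm(8)
normed = layer_norm(residual)

# Manual version: normalize each token vector over its 8 features.
mean = residual.mean(dim=-1, keepdim=True)
var = residual.var(dim=-1, unbiased=False, keepdim=True)  # biased variance, as LayerNorm uses
manual = (residual - mean) / torch.sqrt(var + layer_norm.eps)

print(torch.allclose(normed, manual, atol=1e-5))  # True
print(normed.mean(dim=-1))                        # values close to 0 for every token
```

This is why each row of the encoder output above has roughly zero mean and unit variance.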


# **Part 4: Experiment with Different Inputs**

* **Test with Different Sentences:** Replace `token_ids` with new examples to observe how the encoder processes different inputs. Every ID must be smaller than `vocab_size`.
* **Modify Hyperparameters:** Experiment with different embedding sizes, feedforward dimensions, or positional encoding scales (for example, the constant `10000.0`) to see their effect on the output.
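
One possible way to run these experiments is sketched below. `ConfigurableEncoderBlock`, its `ff_dim` parameter, and `sinusoidal_encoding` are hypothetical names introduced for this example; they mirror the lab's block but expose the feedforward width as an argument. The token IDs and hyperparameter pairs are arbitrary choices.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_encoding(seq_len, embedding_dim):
    """Same sinusoidal scheme as in Part 1, written with torch (even embedding_dim assumed)."""
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, embedding_dim, 2, dtype=torch.float)
                         * -(math.log(10000.0) / embedding_dim))
    pe = torch.zeros(seq_len, embedding_dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

class ConfigurableEncoderBlock(nn.Module):
    """Hypothetical variant of the lab's block with a tunable hidden size."""
    def __init__(self, vocab_size, embedding_dim, ff_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)
        embed = embed + sinusoidal_encoding(x.size(1), embed.size(2)).unsqueeze(0)
        return self.layer_norm(embed + self.feedforward(embed))

# Two sentences of three tokens each; every ID must be < vocab_size.
new_token_ids = torch.tensor([[7, 0, 9], [3, 3, 1]])

shapes = {}
for embedding_dim, ff_dim in [(8, 16), (16, 64)]:
    block = ConfigurableEncoderBlock(vocab_size=10, embedding_dim=embedding_dim, ff_dim=ff_dim)
    shapes[(embedding_dim, ff_dim)] = tuple(block(new_token_ids).shape)

print(shapes)  # {(8, 16): (2, 3, 8), (16, 64): (2, 3, 16)}
```

The output shape follows `embedding_dim` only; `ff_dim` changes the hidden width inside the feedforward network but not the size of the result.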


# **Summary**

By completing this lab, you have:

* Understood the role of embedding, positional encoding, and feedforward layers in the Transformer encoder.
* Gained hands-on experience implementing a core component of the Transformer architecture.
* Developed a deeper appreciation for the architecture’s design and functionality.

This lab builds foundational knowledge of the Transformer, preparing you for more advanced concepts like self-attention.
