
# **Week 3 Hands-on Lab: Building a Simplified Transformer Encoder**

In this hands-on lab, you will implement a simplified Transformer encoder block to see how input embeddings, positional encodings, and feedforward layers work together. The lab uses PyTorch. The block built here does not include self-attention.

# **Part 1: Input Embedding and Positional Encoding**

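**1. Generate Input Data**

Define a sample sentence and tokenize it into a numerical format. The cells below skip real tokenization and use hand-picked token IDs; every ID must be smaller than `vocab_size`.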


In [6]:
import torch
import torch.nn as nn
import numpy as np

# Example sentence and token IDs (simplified for illustration)
token_ids = torch.tensor([[1, 2, 3, 4, 5]])  # Tokenized sentence
vocab_size = 10  # Vocabulary size
embedding_dim = 8  # Embedding size
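
The cell above hard-codes `token_ids`. As a rough illustration of where such IDs could come from, the sketch below builds a toy whitespace vocabulary. The sample sentence, the `<pad>` entry at ID 0, and the helper name `build_vocab` are assumptions made for this example and are not part of the lab.

```python
def build_vocab(sentences):
    """Map each distinct whitespace-separated word to an integer ID (hypothetical helper)."""
    vocab = {"<pad>": 0}  # assumption: reserve ID 0 for padding
    for sentence in sentences:
        for word in sentence.lower().split():
            # Assign the next free ID the first time a word is seen
            vocab.setdefault(word, len(vocab))
    return vocab

sentence = "Transformers encode tokens in parallel"  # made-up sample sentence
vocab = build_vocab([sentence])
token_ids = [vocab[word] for word in sentence.lower().split()]

print(vocab)
print(token_ids)  # [1, 2, 3, 4, 5]
```

With five distinct words, the IDs come out as 1 through 5, which matches the hand-written tensor and stays below `vocab_size = 10`.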

In [7]:
import torch
import torch.nn as nn

# Define tokenized sentences (IDs should be within vocab_size)
token_ids_1 = torch.tensor([[1, 2, 3, 4, 5]])  # Example 1
token_ids_2 = torch.tensor([[6, 7, 8, 9, 2]])  # Example 2
token_ids_3 = torch.tensor([[3, 5, 7, 1, 9]])  # Example 3

# Parameters
vocab_size = 10  # Vocabulary size
embedding_dim = 8  # Embedding size

# Define embedding layer
embedding = nn.Embedding(vocab_size, embedding_dim)

# Get embeddings
embedding_1 = embedding(token_ids_1)
embedding_2 = embedding(token_ids_2)
embedding_3 = embedding(token_ids_3)

# Print results
print("Embeddings for Sentence 1:\n", embedding_1)
print("\nEmbeddings for Sentence 2:\n", embedding_2)
print("\nEmbeddings for Sentence 3:\n", embedding_3)

Embeddings for Sentence 1:
 tensor([[[ 1.2439, -0.3636,  0.1571,  0.1304,  1.7929,  0.5730,  0.2452,
          -0.7472],
         [ 1.1265, -0.1054,  0.1853, -1.2709,  0.1538,  0.2244,  0.6790,
           0.5241],
         [ 0.7050, -0.0054, -0.7490,  1.6957,  0.4317,  1.0350, -2.0611,
          -0.4975],
         [ 0.7463, -0.1773,  0.3654, -0.6961,  1.5702,  0.2138, -1.9310,
           0.0756],
         [ 0.9414, -0.8223, -0.6779,  0.1547, -0.3171,  1.6001, -1.3324,
           0.1818]]], grad_fn=<EmbeddingBackward0>)

Embeddings for Sentence 2:
 tensor([[[-0.6317, -0.5269, -1.1740,  2.2211,  1.4646,  0.5983,  0.1780,
           0.9335],
         [-0.0181,  0.5755,  0.8578,  0.2542,  1.3986, -0.3414,  0.0848,
          -1.9094],
         [-1.0302, -0.0383,  0.2648,  0.0040,  0.5926, -0.2198, -0.3749,
           1.0893],
         [ 0.0978, -0.8249, -0.1186, -1.2166, -0.0740,  2.1932, -0.0941,
          -1.0248],
         [ 1.1265, -0.1054,  0.1853, -1.2709,  0.1538,  0.2244,  0.6790,
 

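**2. Create an Embedding Layer**

Implement the embedding layer to convert token IDs into dense vectors. Because `embedding_layer` is a new, randomly initialized layer, its vectors differ from those printed by `embedding` above.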

In [8]:
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded_tokens_1 = embedding_layer(token_ids_1)
print("Embedded Tokens:\n", embedded_tokens_1)
embedded_tokens_2 = embedding_layer(token_ids_2)
print("Embedded Tokens:\n", embedded_tokens_2)
embedded_tokens_3 = embedding_layer(token_ids_3)
print("Embedded Tokens:\n", embedded_tokens_3)

Embedded Tokens:
 tensor([[[ 0.5992, -0.3391,  0.6635,  0.4097,  0.8510, -0.2971,  0.8541,
          -0.6802],
         [ 1.4388,  0.9362,  0.0233,  0.0526,  0.1267, -1.7471,  0.2546,
          -1.4998],
         [ 0.5110,  0.5424,  1.2411, -0.5663,  0.5679, -0.9699,  0.1383,
          -2.1414],
         [ 0.2437,  0.2788,  0.5861,  0.0316, -2.1589, -0.3765, -0.7783,
           0.6599],
         [ 1.1690, -0.9870, -0.1704,  0.8806,  0.8097, -1.3048,  1.1746,
           0.6297]]], grad_fn=<EmbeddingBackward0>)
Embedded Tokens:
 tensor([[[-2.4727e-01, -1.1428e+00, -6.6926e-01,  2.2455e-01, -1.5599e+00,
          -3.9196e-01, -1.3258e-01,  1.1621e+00],
         [ 4.9729e-01,  1.3950e+00, -2.5457e-01,  1.8279e-01,  2.1942e-03,
          -1.1572e+00,  3.5733e-01, -5.4347e-01],
         [ 4.4993e-01, -7.2765e-01,  1.3750e+00, -7.0125e-01,  5.2682e-01,
          -3.8529e-01,  2.8117e+00, -1.7598e+00],
         [-6.4449e-01, -1.5905e+00, -1.3379e+00,  2.5243e+00,  6.0160e-01,
           5.2089
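
One way to read `nn.Embedding` is as a trainable lookup table: each token ID selects one row of the layer's weight matrix. The sketch below checks this by comparing the layer's output with plain indexing into `weight`. The fixed seed and the variable names `looked_up` and `indexed` are choices made for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so repeated runs give the same weights

vocab_size, embedding_dim = 10, 8
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
token_ids = torch.tensor([[1, 2, 3, 4, 5]])

looked_up = embedding_layer(token_ids)        # shape: (1, 5, 8)
indexed = embedding_layer.weight[token_ids]   # same rows, selected by indexing

print(looked_up.shape)
print(torch.equal(looked_up, indexed))
```

Since the weight matrix has only `vocab_size` rows, an ID of `vocab_size` or larger has no row to select, which is why the token IDs must stay within the vocabulary.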

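**3. Add Positional Encoding**

Incorporate positional encoding to provide positional information to the model. The embedding layer alone assigns the same vector to a token wherever it appears, so a position-dependent signal is added to each embedding.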
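
The `positional_encoding` function in the next cell appears to follow the standard sinusoidal scheme. Written out, with $pos$ the token position, $i$ the index of a sine/cosine pair, and $d$ standing for `embedding_dim` (symbols chosen here for illustration), it computes:

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)
$$

In the code, `div_term` holds the factors $10000^{-2i/d}$, computed as $\exp\!\left(-2i \cdot \ln(10000)/d\right)$.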


In [9]:
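def positional_encoding(seq_len, embedding_dim):
    # Column vector of positions 0..seq_len-1, shape (seq_len, 1)
    position = np.arange(seq_len)[:, np.newaxis]
    # One frequency per sin/cos pair: 10000^(-2i / embedding_dim)
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even dimensions
    # Slice div_term so an odd embedding_dim (one fewer cosine column) also works
    pe[:, 1::2] = np.cos(position * div_term[: embedding_dim // 2])  # odd dimensions
    return torch.tensor(pe, dtype=torch.float)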

seq_len = token_ids_1.size(1)
pos_encoding = positional_encoding(seq_len, embedding_dim)
print("Positional Encoding:\n", pos_encoding)


Positional Encoding:
 tensor([[ 0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00,  0.0000e+00,  1.0000e+00],
        [ 8.4147e-01,  5.4030e-01,  9.9833e-02,  9.9500e-01,  9.9998e-03,
          9.9995e-01,  1.0000e-03,  1.0000e+00],
        [ 9.0930e-01, -4.1615e-01,  1.9867e-01,  9.8007e-01,  1.9999e-02,
          9.9980e-01,  2.0000e-03,  1.0000e+00],
        [ 1.4112e-01, -9.8999e-01,  2.9552e-01,  9.5534e-01,  2.9996e-02,
          9.9955e-01,  3.0000e-03,  1.0000e+00],
        [-7.5680e-01, -6.5364e-01,  3.8942e-01,  9.2106e-01,  3.9989e-02,
          9.9920e-01,  4.0000e-03,  9.9999e-01]])
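
To check individual entries of the printed table by hand, the sketch below evaluates the sinusoidal formula for a single position and dimension using only the standard library. The helper name `pe_entry` is hypothetical.

```python
import math

def pe_entry(pos, k, d_model):
    """Sinusoidal encoding value at position `pos`, dimension `k` (hypothetical helper)."""
    i = k // 2  # dimensions 2i and 2i+1 share one frequency
    angle = pos / (10000 ** (2 * i / d_model))
    return math.sin(angle) if k % 2 == 0 else math.cos(angle)

# Recompute the row for position 1 with embedding_dim = 8
row1 = [pe_entry(1, k, 8) for k in range(8)]
print([f"{value:.5f}" for value in row1])
```

The first four values are sin(1), cos(1), sin(0.1), and cos(0.1), which agree with the second row of the positional encoding printed above (0.84147, 0.54030, 0.099833, 0.99500).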


Add the positional encoding to the embedded tokens:

In [15]:
embedded_with_pos_1 = embedded_tokens_1 + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos_1)
embedded_with_pos_2 = embedded_tokens_2 + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos_2)
embedded_with_pos_3 = embedded_tokens_3 + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos_3)

Embedded Tokens with Positional Encoding:
 tensor([[[ 0.5992,  0.6609,  0.6635,  1.4097,  0.8510,  0.7029,  0.8541,
           0.3198],
         [ 2.2803,  1.4765,  0.1231,  1.0476,  0.1367, -0.7472,  0.2556,
          -0.4998],
         [ 1.4203,  0.1262,  1.4398,  0.4138,  0.5879,  0.0299,  0.1403,
          -1.1415],
         [ 0.3848, -0.7112,  0.8816,  0.9869, -2.1289,  0.6230, -0.7753,
           1.6599],
         [ 0.4122, -1.6406,  0.2190,  1.8017,  0.8496, -0.3056,  1.1786,
           1.6296]]], grad_fn=<AddBackward0>)
Embedded Tokens with Positional Encoding:
 tensor([[[-2.4727e-01, -1.4277e-01, -6.6926e-01,  1.2245e+00, -1.5599e+00,
           6.0804e-01, -1.3258e-01,  2.1621e+00],
         [ 1.3388e+00,  1.9353e+00, -1.5473e-01,  1.1778e+00,  1.2194e-02,
          -1.5722e-01,  3.5833e-01,  4.5653e-01],
         [ 1.3592e+00, -1.1438e+00,  1.5737e+00,  2.7882e-01,  5.4682e-01,
           6.1451e-01,  2.8137e+00, -7.5980e-01],
         [-5.0337e-01, -2.5805e+00, -1.0424e+00,
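
Because `pos_encoding.unsqueeze(0)` has shape `(1, seq_len, embedding_dim)`, broadcasting lets the same encoding be added to several sentences at once. The following is a minimal sketch, assuming the three example sentences are stacked into one batch. `sinusoidal_pe` is a hypothetical pure-PyTorch re-implementation of the lab's function that assumes an even dimension.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_pe(seq_len, dim):
    """Sinusoidal positional encoding in pure PyTorch (assumes an even `dim`)."""
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float) * -(math.log(10000.0) / dim)
    )
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

torch.manual_seed(0)
embedding = nn.Embedding(10, 8)

# Three sentences of length 5 stacked into one batch: shape (3, 5)
batch = torch.tensor([[1, 2, 3, 4, 5],
                      [6, 7, 8, 9, 2],
                      [3, 5, 7, 1, 9]])

embedded = embedding(batch)          # (3, 5, 8)
pe = sinusoidal_pe(batch.size(1), 8) # (5, 8)
with_pos = embedded + pe.unsqueeze(0)  # (1, 5, 8) broadcasts over the batch

print(with_pos.shape)
```

Every sentence in the batch receives the same positional offsets, so `with_pos[n] - embedded[n]` should equal `pe` for each sentence `n`, up to floating-point rounding.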

# **Part 2: Add a Feedforward Layer**

**1. Define a Feedforward Neural Network**

Implement a simple feedforward layer as part of the encoder.


In [16]:
feedforward = nn.Sequential(
    nn.Linear(embedding_dim, 16),
    nn.ReLU(),
    nn.Linear(16, embedding_dim)
)
ff_output = feedforward(embedded_with_pos_1)
print("Feedforward Output:\n", ff_output)
ff_output = feedforward(embedded_with_pos_2)
print("Feedforward Output:\n", ff_output)
ff_output = feedforward(embedded_with_pos_3)
print("Feedforward Output:\n", ff_output)

Feedforward Output:
 tensor([[[-0.1571,  0.0935,  0.0161, -0.2819,  0.1451,  0.1614, -0.2966,
           0.0821],
         [-0.2075,  0.1575, -0.1235, -0.2343,  0.1288,  0.3077, -0.0289,
           0.0817],
         [ 0.0547, -0.1875,  0.0627, -0.0278,  0.1793,  0.1280, -0.0423,
           0.2004],
         [-0.1711,  0.0078, -0.1642, -0.0315,  0.2445,  0.0642, -0.1423,
           0.1153],
         [-0.1161,  0.1205,  0.0069, -0.1451,  0.1816,  0.2675, -0.0855,
           0.0290]]], grad_fn=<ViewBackward0>)
Feedforward Output:
 tensor([[[-0.3326,  0.0317, -0.2559, -0.2773,  0.1369, -0.0585, -0.1441,
           0.2352],
         [-0.3508,  0.2016, -0.1888, -0.4201,  0.1795,  0.1333, -0.2951,
           0.0887],
         [ 0.4556, -0.2816,  0.2130, -0.0072,  0.0354,  0.1057,  0.2030,
           0.2080],
         [-0.1844,  0.5111,  0.0770,  0.1655, -0.0082,  0.3401,  0.3829,
           0.0276],
         [-0.1472, -0.0150, -0.0347, -0.0994,  0.1801,  0.1836, -0.1619,
           0.1063]]],
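
The feedforward layer acts on the last dimension only, so each token position is transformed independently with the same weights. The sketch below illustrates this by comparing one call on a whole sequence with separate calls per token. The random input `x` and the name `hidden_dim` are assumptions for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim = 8, 16

feedforward = nn.Sequential(
    nn.Linear(embedding_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, embedding_dim),
)

x = torch.randn(1, 5, embedding_dim)  # stand-in for embedded tokens

full = feedforward(x)  # whole sequence at once: (1, 5, 8)
# Apply the same network to each token vector separately, then restack
per_token = torch.stack([feedforward(x[0, t]) for t in range(5)]).unsqueeze(0)

print(torch.allclose(full, per_token, atol=1e-5))

# Weights and biases: 8*16 + 16 + 16*8 + 8
n_params = sum(p.numel() for p in feedforward.parameters())
print(n_params)  # 280
```

The parameter count does not depend on the sequence length, since the same two linear layers are reused at every position.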

# **Part 3: Combine the Components into an Encoder Block**

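**1. Define the Encoder Block**

Combine the embedding, positional encoding, and feedforward components into an encoder block. The block also adds a residual connection around the feedforward layer and applies layer normalization to the sum.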


In [18]:
class TransformerEncoderBlock(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Position-wise feedforward network with a hidden size of 16
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16),
            nn.ReLU(),
            nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)  # (batch, seq_len, embedding_dim)
        # Build the encoding on the same device as the embeddings
        pos_enc = positional_encoding(x.size(1), embed.size(2)).to(embed.device)
        embed_with_pos = embed + pos_enc.unsqueeze(0)
        ff_output = self.feedforward(embed_with_pos)
        # Residual connection followed by layer normalization
        return self.layer_norm(embed_with_pos + ff_output)

encoder = TransformerEncoderBlock(vocab_size, embedding_dim)
output = encoder(token_ids_1)
print("Encoder Output:\n", output)
output = encoder(token_ids_2)
print("Encoder Output:\n", output)
output = encoder(token_ids_3)
print("Encoder Output:\n", output)


Encoder Output:
 tensor([[[-0.9931, -0.0568,  0.0662,  1.6722, -1.1605,  0.3061, -1.0908,
           1.2567],
         [-0.7680, -0.9188,  0.2827, -1.3189,  0.0515,  1.9645, -0.2098,
           0.9168],
         [ 0.4904, -1.3195, -1.1697, -1.1967,  0.5845,  1.2958,  0.2502,
           1.0650],
         [ 0.2816, -2.2282, -0.3123,  1.1132, -0.2728,  1.2251, -0.0148,
           0.2082],
         [-1.4596, -0.1393,  0.3228,  1.4708, -0.5630, -0.4751, -0.7369,
           1.5802]]], grad_fn=<NativeLayerNormBackward0>)
Encoder Output:
 tensor([[[-1.3450,  0.2081,  0.8977,  1.2969, -1.7840,  0.5439, -0.1959,
           0.3783],
         [ 0.6199,  1.1988,  0.4147,  0.7988, -1.7309,  0.1333, -1.5314,
           0.0967],
         [ 1.2797, -0.3752,  1.5477, -0.0791,  0.0944, -0.2915, -1.9122,
          -0.2637],
         [-0.0841, -1.5040, -0.3662,  1.5027,  0.0521, -0.2158, -0.9373,
           1.5527],
         [-1.6643, -1.2346,  0.4919, -0.4657,  0.7313,  1.4379, -0.0939,
           0.7974]
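
Two properties of this simplified block can be checked directly. First, without self-attention, each output vector depends only on its own token and position, so changing one token should leave the other positions untouched. Second, a freshly initialized `nn.LayerNorm` leaves each output vector with roughly zero mean and unit standard deviation. The sketch below is a self-contained approximation of the lab's block. The names `TinyEncoderBlock` and `sinusoidal_pe` are hypothetical, and the positional encoding is rewritten in PyTorch under the assumption of an even dimension.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_pe(seq_len, dim):
    """Sinusoidal positional encoding in pure PyTorch (assumes an even `dim`)."""
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float) * -(math.log(10000.0) / dim)
    )
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

class TinyEncoderBlock(nn.Module):
    """Embedding + positional encoding + feedforward + residual + layer norm."""
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16), nn.ReLU(), nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)
        h = embed + sinusoidal_pe(x.size(1), embed.size(2)).unsqueeze(0)
        return self.layer_norm(h + self.feedforward(h))

torch.manual_seed(0)
encoder = TinyEncoderBlock(vocab_size=10, embedding_dim=8)

a = torch.tensor([[1, 2, 3, 4, 5]])
b = torch.tensor([[1, 2, 9, 4, 5]])  # differs from `a` only at position 2

with torch.no_grad():
    out_a, out_b = encoder(a), encoder(b)

# Per-position comparison: only position 2 should differ
same = [torch.allclose(out_a[0, t], out_b[0, t]) for t in range(5)]
print(same)

# Per-token statistics after layer normalization
print(out_a.mean(dim=-1))
print(out_a.std(dim=-1, unbiased=False))
```

In a full Transformer encoder, self-attention would mix information across positions, so a change at one position would generally affect the others as well.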

# **Part 4: Experiment with Different Inputs**

* **Test with different sentences.** Replace `token_ids` with new examples to observe how the encoder processes different inputs. Keep every ID below `vocab_size`, or increase `vocab_size` to match.
* **Modify hyperparameters.** Experiment with different embedding sizes, feedforward dimensions, or positional encoding scales to see their effect on the output.
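
One possible way to run these experiments is to expose the settings as constructor arguments and loop over a few configurations. The sketch below is an assumption-laden variant of the lab's block: `ConfigurableEncoderBlock`, `hidden_dim`, `pe_scale`, and `sinusoidal_pe` are hypothetical names introduced here, and the chosen configurations are arbitrary.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_pe(seq_len, dim):
    """Sinusoidal positional encoding in pure PyTorch (assumes an even `dim`)."""
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float) * -(math.log(10000.0) / dim)
    )
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

class ConfigurableEncoderBlock(nn.Module):
    """Variant of the lab's block; `hidden_dim` and `pe_scale` are hypothetical knobs."""
    def __init__(self, vocab_size, embedding_dim, hidden_dim=16, pe_scale=1.0):
        super().__init__()
        self.pe_scale = pe_scale
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embedding_dim),
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)
        # Scale the positional signal relative to the token embeddings
        pos = self.pe_scale * sinusoidal_pe(x.size(1), embed.size(2))
        h = embed + pos.unsqueeze(0)
        return self.layer_norm(h + self.feedforward(h))

token_ids = torch.tensor([[2, 4, 6, 8, 0]])  # a new example; all IDs < vocab_size

results = {}
for embedding_dim, hidden_dim, pe_scale in [(8, 16, 1.0), (16, 64, 1.0), (8, 16, 0.1)]:
    torch.manual_seed(0)  # same seed so runs are comparable
    block = ConfigurableEncoderBlock(10, embedding_dim, hidden_dim, pe_scale)
    out = block(token_ids)
    n_params = sum(p.numel() for p in block.parameters())
    results[(embedding_dim, hidden_dim, pe_scale)] = (tuple(out.shape), n_params)
    print(embedding_dim, hidden_dim, pe_scale, tuple(out.shape), n_params)
```

The output keeps the shape `(batch, seq_len, embedding_dim)` in every configuration, while the parameter count grows with `embedding_dim` and `hidden_dim`. Changing `pe_scale` alters how strongly position influences the result without adding parameters.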


# **Summary**

By completing this lab, you have:

* Understood the role of embedding, positional encoding, and feedforward layers in the Transformer encoder.
* Gained hands-on experience implementing a core component of the Transformer architecture.
* Developed a deeper appreciation for the architecture’s design and functionality.

This lab builds foundational knowledge of the Transformer, preparing you for more advanced concepts like self-attention.
