
# **Week 3 Hands-on Lab: Building a Simplified Transformer Encoder**

This hands-on lab introduces the Transformer architecture by having you implement a simplified Transformer encoder. You will see how input embeddings, positional encodings, and feedforward layers work together in an encoder block. The lab uses PyTorch throughout; self-attention is left out to keep the block simple.

# **Part 1: Input Embedding and Positional Encoding**

**1. Generate Input Data**
Define a sample sentence as a tensor of token IDs. The IDs are hard-coded here for illustration rather than produced by a tokenizer.


In [1]:
import torch
import torch.nn as nn
import numpy as np

# Example sentence and token IDs (simplified for illustration)
token_ids = torch.tensor([[1, 2, 3, 4, 5]])  # Tokenized sentence
vocab_size = 10  # Vocabulary size
embedding_dim = 8  # Embedding size
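
The token IDs above are hard-coded. As a minimal sketch of where such IDs could come from, the snippet below maps a whitespace-split sentence to integers through a toy vocabulary. The sentence, the `vocab` dictionary, and the `<pad>`/`<unk>` entries are hypothetical and chosen only so that the result matches `token_ids`; real tokenizers are considerably more involved.

```python
# Hypothetical toy vocabulary: index 0 is reserved for padding, 9 for unknown words.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5, "<unk>": 9}

sentence = "the cat sat on mat"  # assumed example sentence

# Look up each word, falling back to <unk> for out-of-vocabulary words.
ids = [vocab.get(word, vocab["<unk>"]) for word in sentence.split()]

# Wrap in an outer list to add the batch dimension: shape (1, seq_len).
token_id_list = [ids]
print(token_id_list)  # [[1, 2, 3, 4, 5]]
```

Passing `token_id_list` to `torch.tensor(...)` would give the same tensor as above. Every ID must be smaller than `vocab_size`, otherwise the embedding lookup in the next step fails with an index error.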

**2. Create an Embedding Layer**
Implement the embedding layer to convert token IDs into dense vectors.

In [2]:
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded_tokens = embedding_layer(token_ids)
print("Embedded Tokens:\n", embedded_tokens)

Embedded Tokens:
 tensor([[[-0.3395,  0.1483, -0.0963,  0.1969,  0.3873, -0.8638, -0.8736,
           0.0853],
         [-1.0874, -0.7826, -0.9653,  0.4712,  0.5091,  0.3613, -0.0342,
           1.4888],
         [ 0.2657,  0.3678,  1.8991, -2.1339, -0.2700, -0.4976, -2.3552,
           1.1595],
         [-0.7438, -1.0204,  0.0696, -0.8761,  1.7035,  0.3335,  0.5493,
           0.6885],
         [-0.0164,  1.9604, -0.5623,  0.2379, -0.3100, -0.1869, -0.2129,
           0.7123]]], grad_fn=<EmbeddingBackward0>)
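
An embedding layer is a learnable lookup table: each token ID selects one row of the weight matrix. The sketch below checks this and fixes the random seed so that repeated runs print the same values. The seed value is an arbitrary choice, and the variable names are local to this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the random initial weights are reproducible

vocab_size, embedding_dim = 10, 8
token_ids = torch.tensor([[1, 2, 3, 4, 5]])

embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded = embedding_layer(token_ids)

# The weight matrix has one row per vocabulary entry.
print(embedding_layer.weight.shape)  # torch.Size([10, 8])

# Indexing the weight matrix with the token IDs gives the same result
# as calling the layer.
looked_up = embedding_layer.weight[token_ids]
print(embedded.shape)                    # torch.Size([1, 5, 8])
print(torch.equal(embedded, looked_up))  # True
```

Because the weights are initialized randomly, the values printed in this lab will differ from run to run unless a seed is set.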


**3. Add Positional Encoding**
Add sinusoidal positional encodings so that the model receives information about token order, which the embeddings alone do not carry.


In [3]:
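def positional_encoding(seq_len, embedding_dim):
    # Column vector of positions 0..seq_len-1, shape (seq_len, 1)
    position = np.arange(seq_len)[:, np.newaxis]
    # One frequency per sin/cos pair: 1 / 10000^(2i / embedding_dim)
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = np.cos(position * div_term)  # odd dimensions (assumes an even embedding_dim)
    return torch.tensor(pe, dtype=torch.float)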

seq_len = token_ids.size(1)
pos_encoding = positional_encoding(seq_len, embedding_dim)
print("Positional Encoding:\n", pos_encoding)


Positional Encoding:
 tensor([[ 0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00,  0.0000e+00,  1.0000e+00],
        [ 8.4147e-01,  5.4030e-01,  9.9833e-02,  9.9500e-01,  9.9998e-03,
          9.9995e-01,  1.0000e-03,  1.0000e+00],
        [ 9.0930e-01, -4.1615e-01,  1.9867e-01,  9.8007e-01,  1.9999e-02,
          9.9980e-01,  2.0000e-03,  1.0000e+00],
        [ 1.4112e-01, -9.8999e-01,  2.9552e-01,  9.5534e-01,  2.9996e-02,
          9.9955e-01,  3.0000e-03,  1.0000e+00],
        [-7.5680e-01, -6.5364e-01,  3.8942e-01,  9.2106e-01,  3.9989e-02,
          9.9920e-01,  4.0000e-03,  9.9999e-01]])
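
The function above computes $PE(pos, 2i) = \sin(pos / 10000^{2i/d})$ and $PE(pos, 2i+1) = \cos(pos / 10000^{2i/d})$, where $d$ is `embedding_dim`. As a sanity check, the sketch below evaluates single entries with the standard library only and compares them against the printed table. The helper name `pe_value` and its `base` parameter are introduced here for illustration.

```python
import math

def pe_value(pos, k, d_model, base=10000.0):
    """Sinusoidal encoding at position `pos`, dimension index `k`."""
    i = k // 2  # each sin/cos pair shares one frequency
    angle = pos / (base ** (2 * i / d_model))
    return math.sin(angle) if k % 2 == 0 else math.cos(angle)

# Row for position 1 with d_model = 8; compare with the second row printed above.
row1 = [pe_value(1, k, 8) for k in range(8)]
print([f"{v:.4e}" for v in row1])

# Position 0 alternates sin(0) = 0 and cos(0) = 1.
row0 = [pe_value(0, k, 8) for k in range(8)]
print(row0)
```

Low dimension indices oscillate quickly with position, while high indices change slowly, which is why the last columns of the table stay close to 0 and 1 for such a short sequence.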


Add the positional encoding to the embedded tokens:

In [4]:
embedded_with_pos = embedded_tokens + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos)


Embedded Tokens with Positional Encoding:
 tensor([[[-0.3395,  1.1483, -0.0963,  1.1969,  0.3873,  0.1362, -0.8736,
           1.0853],
         [-0.2459, -0.2423, -0.8655,  1.4662,  0.5191,  1.3612, -0.0332,
           2.4888],
         [ 1.1750, -0.0484,  2.0977, -1.1538, -0.2500,  0.5022, -2.3532,
           2.1595],
         [-0.6027, -2.0104,  0.3651,  0.0793,  1.7335,  1.3331,  0.5523,
           1.6884],
         [-0.7732,  1.3067, -0.1729,  1.1589, -0.2700,  0.8123, -0.2089,
           1.7123]]], grad_fn=<AddBackward0>)
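
The `unsqueeze(0)` call adds a leading batch dimension so that one `(seq_len, embedding_dim)` encoding is shared by every sentence in a batch. The following sketch illustrates the shapes with stand-in tensors; the batch size of 2 and the placeholder values are assumptions made for the example.

```python
import torch

# Stand-in for the embeddings of 2 sentences with 5 tokens each.
batch = torch.zeros(2, 5, 8)
# Stand-in for a positional encoding of shape (seq_len, embedding_dim).
pos = torch.arange(5 * 8, dtype=torch.float).reshape(5, 8)

with_pos = batch + pos.unsqueeze(0)  # (2, 5, 8) + (1, 5, 8) -> (2, 5, 8)
print(with_pos.shape)

# Every sentence in the batch receives the same positional encoding.
print(torch.equal(with_pos[0], with_pos[1]))

# Broadcasting would also align the shapes without the explicit unsqueeze.
print(torch.equal(with_pos, batch + pos))
```

Keeping the explicit `unsqueeze(0)` makes the intended shape visible to the reader, even though broadcasting alone would give the same result here.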


# **Part 2: Add a Feedforward Layer**

**1. Define a Feedforward Neural Network**
Implement a small two-layer feedforward network (8 → 16 → 8, with a ReLU in between) as part of the encoder.


In [5]:
feedforward = nn.Sequential(
    nn.Linear(embedding_dim, 16),
    nn.ReLU(),
    nn.Linear(16, embedding_dim)
)
ff_output = feedforward(embedded_with_pos)
print("Feedforward Output:\n", ff_output)


Feedforward Output:
 tensor([[[ 0.4244, -0.2674, -0.0852,  0.0999, -0.0216,  0.0861, -0.2141,
          -0.3497],
         [ 0.6305, -0.1091, -0.0598,  0.0283,  0.2752, -0.0334, -0.0078,
          -0.9175],
         [ 0.5153, -0.2356, -0.2940,  0.1847,  0.0290, -0.0071, -0.4606,
          -0.5978],
         [ 0.8648, -0.0198, -0.1940, -0.1126,  0.4668, -0.0191, -0.0405,
          -1.2217],
         [ 0.4955, -0.2597, -0.1891,  0.1449,  0.0370,  0.0496, -0.1711,
          -0.4550]]], grad_fn=<ViewBackward0>)
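
`nn.Linear` acts on the last dimension only, so the feedforward network is applied to each token position independently, with the same weights at every position. A minimal sketch of that property, using a random tensor as a stand-in for `embedded_with_pos` and an arbitrary seed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for reproducibility

embedding_dim = 8
feedforward = nn.Sequential(
    nn.Linear(embedding_dim, 16),
    nn.ReLU(),
    nn.Linear(16, embedding_dim),
)

x = torch.randn(1, 5, embedding_dim)  # random stand-in for embedded_with_pos

full = feedforward(x)          # all 5 positions at once
single = feedforward(x[0, 2])  # only the vector at position 2

print(full.shape)  # torch.Size([1, 5, 8])
# The row for position 2 matches the result of feeding that vector alone
# (up to floating-point rounding).
print(torch.allclose(full[0, 2], single, atol=1e-5))
```

This is why the layer on its own cannot mix information between tokens.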


# **Part 3: Combine the Components into an Encoder Block**

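**1. Define the Encoder Block**
Combine the embedding, positional encoding, and feedforward components into an encoder block. The block also adds a residual connection around the feedforward network and applies layer normalization to the result.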


In [6]:
class TransformerEncoderBlock(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(TransformerEncoderBlock, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16),
            nn.ReLU(),
            nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        # x: token IDs of shape (batch, seq_len)
        embed = self.embedding(x)
        pos_enc = positional_encoding(x.size(1), embed.size(2))
        embed_with_pos = embed + pos_enc.unsqueeze(0)
        ff_output = self.feedforward(embed_with_pos)
        # Residual connection followed by layer normalization
        return self.layer_norm(embed_with_pos + ff_output)

encoder = TransformerEncoderBlock(vocab_size, embedding_dim)
output = encoder(token_ids)
print("Encoder Output:\n", output)


Encoder Output:
 tensor([[[ 0.4006,  0.8551,  0.2504, -1.5370, -1.0545,  0.8056,  1.3405,
          -1.0608],
         [ 0.4120,  0.0219, -0.7859, -0.5379, -1.9107,  1.0950,  0.2965,
           1.4089],
         [-1.2847, -0.8070,  1.9712,  0.8418,  0.6551, -0.4991, -0.5816,
          -0.2958],
         [-1.5421, -0.4892, -0.6163,  0.7889, -1.1101,  1.3615,  0.7692,
           0.8382],
         [-1.2884,  1.2501, -0.3774,  1.5386, -0.3200, -1.3350, -0.0814,
           0.6135]]], grad_fn=<NativeLayerNormBackward0>)
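
With its default, freshly initialized scale and shift, `nn.LayerNorm` rescales each token vector to roughly zero mean and unit variance over the embedding dimension. The sketch below checks this on the first row of the output printed above and re-implements the normalization by hand in plain Python. The helper `layer_norm` is written for this example only, and it omits the learned scale and shift.

```python
import math

def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    # Biased variance (divide by N), which is what nn.LayerNorm uses.
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

# First token vector of the encoder output printed above.
row0 = [0.4006, 0.8551, 0.2504, -1.5370, -1.0545, 0.8056, 1.3405, -1.0608]
mean0 = sum(row0) / len(row0)
var0 = sum((v - mean0) ** 2 for v in row0) / len(row0)
print(f"mean={mean0:.4f}, variance={var0:.4f}")

# Normalizing an arbitrary vector yields the same statistics.
normed = layer_norm([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
normed_mean = sum(normed) / len(normed)
normed_var = sum((v - normed_mean) ** 2 for v in normed) / len(normed)
print(f"mean={normed_mean:.4f}, variance={normed_var:.4f}")
```

After training, the learned scale and shift parameters may move the statistics away from 0 and 1.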


# **Part 4: Experiment with Different Inputs**

* **Test with different sentences.** Replace `token_ids` with new examples to observe how the encoder processes different inputs.
* **Modify hyperparameters.** Experiment with different embedding sizes, feedforward dimensions, or positional encoding scales to see their effect on the output.


In [7]:

# Example 1: A longer sentence
token_ids_1 = torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
output_1 = encoder(token_ids_1)
print("Encoder Output with longer sentence:\n", output_1)

# Example 2: A shorter sentence
token_ids_2 = torch.tensor([[1, 2, 3]])
output_2 = encoder(token_ids_2)
print("Encoder Output with shorter sentence:\n", output_2)

# Example 3: A sentence with repeated tokens
token_ids_3 = torch.tensor([[1, 1, 2, 2, 3]])
output_3 = encoder(token_ids_3)
print("Encoder Output with repeated tokens:\n", output_3)


Encoder Output with longer sentence:
 tensor([[[ 0.4006,  0.8551,  0.2504, -1.5370, -1.0545,  0.8056,  1.3405,
          -1.0608],
         [ 0.4120,  0.0219, -0.7859, -0.5379, -1.9107,  1.0950,  0.2965,
           1.4089],
         [-1.2847, -0.8070,  1.9712,  0.8418,  0.6551, -0.4991, -0.5816,
          -0.2958],
         [-1.5421, -0.4892, -0.6163,  0.7889, -1.1101,  1.3615,  0.7692,
           0.8382],
         [-1.2884,  1.2501, -0.3774,  1.5386, -0.3200, -1.3350, -0.0814,
           0.6135],
         [-1.8068,  0.4872, -0.1251, -0.4395, -0.2045,  2.0094, -0.2817,
           0.3611],
         [-0.9836,  0.1381, -0.5481,  0.8788,  0.5302,  1.8115, -0.3180,
          -1.5090],
         [ 0.5170,  1.8270, -0.2839, -0.5932, -0.8515,  0.9909, -1.4978,
          -0.1086]]], grad_fn=<NativeLayerNormBackward0>)
Encoder Output with shorter sentence:
 tensor([[[ 0.4006,  0.8551,  0.2504, -1.5370, -1.0545,  0.8056,  1.3405,
          -1.0608],
         [ 0.4120,  0.0219, -0.7859, -0.5379, -1
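
One pattern in these outputs is that the leading rows for the longer and shorter sentences repeat the rows of the original encoder output. Since this simplified block has no attention, the output at a position depends only on the token at that position and on its index. The sketch below reproduces this with a self-contained copy of the block; the torch-based `positional_encoding` is a re-implementation for this example, and the seed is arbitrary.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(seq_len, dim):
    # Torch re-implementation of the sinusoidal encoding used in this lab.
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float) * -(math.log(10000.0) / dim)
    )
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

class TransformerEncoderBlock(nn.Module):
    # Same structure as the block defined in Part 3.
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16), nn.ReLU(), nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        dim = self.embedding.embedding_dim
        h = self.embedding(x) + positional_encoding(x.size(1), dim).unsqueeze(0)
        return self.layer_norm(h + self.feedforward(h))

torch.manual_seed(0)  # arbitrary seed
encoder = TransformerEncoderBlock(vocab_size=10, embedding_dim=8)

shorter = encoder(torch.tensor([[1, 2, 3]]))
longer = encoder(torch.tensor([[1, 2, 3, 4, 5, 6, 7, 8]]))
# The first three positions are unaffected by the extra tokens.
same_prefix = torch.allclose(shorter[0], longer[0, :3], atol=1e-5)
print("Shared prefix identical:", same_prefix)

repeated = encoder(torch.tensor([[1, 1, 2, 2, 3]]))
# Token 1 appears at positions 0 and 1, but the positional encodings differ.
same_token_same_output = torch.allclose(repeated[0, 0], repeated[0, 1])
print("Repeated token gives identical rows:", same_token_same_output)
```

A block with self-attention would not have this property, because each position would then also depend on the other tokens in the sentence.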

In [8]:

# Modify hyperparameters
embedding_dim_list = [8, 16, 32]  # different embedding sizes
ff_dim_list = [16, 32, 64]  # different feedforward dimensions
pos_scale_list = [10000.0, 5000.0, 20000.0]  # different positional encoding scales

# Positional encoding with a configurable scale (defined once, outside the loops)
def positional_encoding_modified(seq_len, embedding_dim, scale):
    position = np.arange(seq_len)[:, np.newaxis]
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(scale) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)
    pe[:, 1::2] = np.cos(position * div_term)
    return torch.tensor(pe, dtype=torch.float)

# Loop variable is named emb_dim so the global embedding_dim is not overwritten
for emb_dim in embedding_dim_list:
    for ff_dim in ff_dim_list:
        for pos_scale in pos_scale_list:
            print(f"\nExperiment with embedding_dim={emb_dim}, ff_dim={ff_dim}, pos_scale={pos_scale}")

            # New encoder with modified hyperparameters
            encoder_modified = TransformerEncoderBlock(vocab_size, emb_dim)
            encoder_modified.feedforward = nn.Sequential(
                nn.Linear(emb_dim, ff_dim),
                nn.ReLU(),
                nn.Linear(ff_dim, emb_dim)
            )

            # Run the block's steps by hand so that pos_scale is actually applied;
            # encoder_modified(token_ids) would still use the fixed 10000.0 scale
            embed = encoder_modified.embedding(token_ids)
            pos_enc = positional_encoding_modified(token_ids.size(1), emb_dim, pos_scale)
            embed_with_pos = embed + pos_enc.unsqueeze(0)
            ff_output = encoder_modified.feedforward(embed_with_pos)
            output_modified = encoder_modified.layer_norm(embed_with_pos + ff_output)
            print("Modified Encoder Output:\n", output_modified)



Experiment with embedding_dim=8, ff_dim=16, pos_scale=10000.0
Modified Encoder Output:
 tensor([[[-1.1230, -1.3283, -0.0488,  0.1102, -0.9538,  1.5336,  1.0786,
           0.7314],
         [ 0.3642, -1.1404, -0.0900,  0.3431, -1.3979,  0.6861, -0.6580,
           1.8929],
         [-0.2039, -1.3532, -1.6873,  1.4019,  0.2283,  0.7110,  0.0335,
           0.8696],
         [ 1.3296, -0.7619,  0.7278,  0.7544,  0.1753,  0.5033, -1.8868,
          -0.8417],
         [-0.0370, -0.4359, -1.1212,  0.8535, -0.5538,  1.2891, -1.3857,
           1.3909]]], grad_fn=<NativeLayerNormBackward0>)

Experiment with embedding_dim=8, ff_dim=16, pos_scale=5000.0
Modified Encoder Output:
 tensor([[[-0.2874,  0.3574, -0.0981, -1.7109,  0.7651,  1.9446, -0.3999,
          -0.5708],
         [ 1.2407, -0.6639,  0.6269, -0.8499, -1.5044,  0.7030, -0.7878,
           1.2354],
         [ 1.5458, -0.8047,  0.1134,  0.0747, -1.0863,  0.1619, -1.3695,
           1.3648],
         [-1.1717,  0.6742, -0.1544,  1.4
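
Because every experiment creates a freshly initialized encoder, the printed values are not directly comparable across settings. One quantity that does change predictably is the number of learnable parameters. The sketch below counts them by hand for the block used in this lab; the helper `count_parameters` is introduced here for illustration, and the vocabulary size of 10 is taken from Part 1.

```python
def count_parameters(vocab_size, embedding_dim, ff_dim):
    """Learnable parameters of the lab's encoder block for the given sizes."""
    embedding = vocab_size * embedding_dim
    linear_in = embedding_dim * ff_dim + ff_dim          # weights + biases
    linear_out = ff_dim * embedding_dim + embedding_dim  # weights + biases
    layer_norm = 2 * embedding_dim                       # scale and shift
    return embedding + linear_in + linear_out + layer_norm

for d in [8, 16, 32]:
    for ff in [16, 32, 64]:
        total = count_parameters(10, d, ff)
        print(f"embedding_dim={d}, ff_dim={ff}: {total} parameters")
```

The sinusoidal positional encoding has no learnable parameters, so the positional encoding scale does not affect the count.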

# **Summary**

By completing this lab, you have:

* Understood the role of embedding, positional encoding, and feedforward layers in the Transformer encoder.
* Gained hands-on experience implementing a core component of the Transformer architecture.
* Developed a deeper appreciation for the architecture’s design and functionality.

This lab builds foundational knowledge of the Transformer, preparing you for more advanced concepts like self-attention.
