
# **Week 3 Hands-on Lab: Building a Simplified Transformer Encoder**

This hands-on lab helps you understand the Transformer architecture by implementing a simplified Transformer encoder. You will learn how input embeddings, positional encodings, and feedforward layers work together in an encoder block. The block is deliberately simplified: it leaves out self-attention, which is covered later. We will use PyTorch to build it.

# **Part 1: Input Embedding and Positional Encoding**

**1. Generate Input Data**
Define a sample sentence as a sequence of token IDs. The IDs are hard-coded here for illustration rather than produced by a real tokenizer.


In [1]:
import torch
import torch.nn as nn
import numpy as np

# Example sentence and token IDs (simplified for illustration)
token_ids = torch.tensor([[1, 2, 3, 4, 5]])  # Tokenized sentence
vocab_size = 10  # Vocabulary size
embedding_dim = 8  # Embedding size
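
The token IDs above are hard-coded. As a minimal sketch of where such IDs could come from, the snippet below builds a toy whitespace vocabulary. The sentence, the `build_vocab` helper, and the choice to reserve ID 0 are all assumptions made for illustration; they are not part of the lab.

```python
import torch

def build_vocab(sentence):
    # Hypothetical helper: map each distinct lowercase word to an integer ID.
    # ID 0 is left unused here (an assumption) so real tokens start at 1.
    vocab = {}
    for word in sentence.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1
    return vocab

sentence = "the cat sat on mat"  # illustrative sentence, not from the lab
vocab = build_vocab(sentence)
toy_token_ids = torch.tensor([[vocab[w] for w in sentence.lower().split()]])
print(toy_token_ids)  # tensor([[1, 2, 3, 4, 5]])
```

Real tokenizers usually split text into subword units and use a much larger vocabulary, but the result is the same kind of integer tensor.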

**2. Create an Embedding Layer**
Implement the embedding layer to convert token IDs into dense vectors.

In [2]:
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded_tokens = embedding_layer(token_ids)
print("Embedded Tokens:\n", embedded_tokens)

Embedded Tokens:
 tensor([[[ 0.2448, -0.2579,  0.2403,  1.4379, -0.1118,  0.1126,  0.3993,
          -0.4770],
         [-0.6451,  0.4526,  2.6516,  0.1126, -1.6482,  1.1013,  0.5487,
          -0.5053],
         [ 0.1692, -0.3968, -0.1419,  0.4164, -1.2760, -1.0351, -0.6201,
          -1.0014],
         [ 0.6725,  1.1687,  0.4561, -1.9743, -0.7707, -0.1076, -1.9188,
           0.2917],
         [ 0.2650, -0.5033,  0.2574,  0.4430,  1.8879, -0.7403,  2.0639,
           0.1423]]], grad_fn=<EmbeddingBackward0>)
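
An embedding layer is essentially a trainable lookup table. The short check below, using the same sizes as the lab and a fixed seed chosen here for repeatability, suggests that calling the layer is equivalent to indexing its weight matrix by token ID.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the check is repeatable
embedding_layer = nn.Embedding(10, 8)  # same sizes as vocab_size and embedding_dim
token_ids = torch.tensor([[1, 2, 3, 4, 5]])

embedded = embedding_layer(token_ids)
# Indexing the weight matrix by token ID gives the same vectors.
looked_up = embedding_layer.weight[token_ids]

print(embedded.shape)                    # torch.Size([1, 5, 8])
print(torch.equal(embedded, looked_up))  # True
```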


**3. Add Positional Encoding**
Add sinusoidal positional encodings so the model has information about each token's position in the sequence.
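
For reference, the sinusoidal scheme computed by the function in the next cell can be written as follows, where $pos$ is the token position, $i$ indexes the sine/cosine pairs, and $d$ is the embedding dimension (the symbol names are chosen here for illustration):

$$
PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d}}\right)
$$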


In [3]:
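def positional_encoding(seq_len, embedding_dim):
    # Column vector of positions: shape (seq_len, 1)
    position = np.arange(seq_len)[:, np.newaxis]
    # One frequency per sin/cos pair: 1 / 10000^(2i / embedding_dim)
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even columns
    # Slice div_term so an odd embedding_dim does not cause a shape mismatch
    pe[:, 1::2] = np.cos(position * div_term[: embedding_dim // 2])  # odd columns
    return torch.tensor(pe, dtype=torch.float)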

seq_len = token_ids.size(1)
pos_encoding = positional_encoding(seq_len, embedding_dim)
print("Positional Encoding:\n", pos_encoding)


Positional Encoding:
 tensor([[ 0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00,  0.0000e+00,  1.0000e+00],
        [ 8.4147e-01,  5.4030e-01,  9.9833e-02,  9.9500e-01,  9.9998e-03,
          9.9995e-01,  1.0000e-03,  1.0000e+00],
        [ 9.0930e-01, -4.1615e-01,  1.9867e-01,  9.8007e-01,  1.9999e-02,
          9.9980e-01,  2.0000e-03,  1.0000e+00],
        [ 1.4112e-01, -9.8999e-01,  2.9552e-01,  9.5534e-01,  2.9996e-02,
          9.9955e-01,  3.0000e-03,  1.0000e+00],
        [-7.5680e-01, -6.5364e-01,  3.8942e-01,  9.2106e-01,  3.9989e-02,
          9.9920e-01,  4.0000e-03,  9.9999e-01]])
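
As a sanity check, the sketch below recomputes the same table in NumPy and compares every entry with a direct evaluation of sin(pos / 10000^(2i/d)) and cos(pos / 10000^(2i/d)). The function name `positional_encoding_np` is a stand-in chosen here, and the sketch assumes an even embedding dimension.

```python
import math
import numpy as np

def positional_encoding_np(seq_len, embedding_dim):
    # Same computation as the lab's function, kept in NumPy (assumes an even embedding_dim).
    position = np.arange(seq_len)[:, np.newaxis]
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)
    pe[:, 1::2] = np.cos(position * div_term)
    return pe

pe = positional_encoding_np(5, 8)
for pos in range(5):
    for i in range(4):
        angle = pos / (10000 ** (2 * i / 8))
        # Even column holds the sine, the following odd column holds the cosine.
        assert math.isclose(pe[pos, 2 * i], math.sin(angle), abs_tol=1e-9)
        assert math.isclose(pe[pos, 2 * i + 1], math.cos(angle), abs_tol=1e-9)
print("matches the closed-form definition")
```

This also explains the first row of the printed output: at position 0 every sine is 0 and every cosine is 1.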


Add the positional encoding to the embedded tokens:

In [4]:
embedded_with_pos = embedded_tokens + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos)


Embedded Tokens with Positional Encoding:
 tensor([[[ 2.4482e-01,  7.4211e-01,  2.4028e-01,  2.4379e+00, -1.1180e-01,
           1.1126e+00,  3.9929e-01,  5.2305e-01],
         [ 1.9633e-01,  9.9293e-01,  2.7514e+00,  1.1076e+00, -1.6382e+00,
           2.1013e+00,  5.4969e-01,  4.9468e-01],
         [ 1.0785e+00, -8.1292e-01,  5.6792e-02,  1.3964e+00, -1.2560e+00,
          -3.5344e-02, -6.1813e-01, -1.4076e-03],
         [ 8.1359e-01,  1.7874e-01,  7.5164e-01, -1.0190e+00, -7.4069e-01,
           8.9198e-01, -1.9158e+00,  1.2917e+00],
         [-4.9184e-01, -1.1569e+00,  6.4680e-01,  1.3641e+00,  1.9279e+00,
           2.5887e-01,  2.0679e+00,  1.1423e+00]]], grad_fn=<AddBackward0>)
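
The `unsqueeze(0)` call adds a leading batch axis so that one positional table can be broadcast across every sequence in a batch. The sketch below illustrates this with placeholder tensors (a zero batch and an `arange` table, both assumptions used only to make the effect visible).

```python
import torch

batch = torch.zeros(2, 5, 8)  # two sequences, shape (batch, seq_len, dim)
pos = torch.arange(5 * 8, dtype=torch.float).reshape(5, 8)  # stand-in for a positional table

summed = batch + pos.unsqueeze(0)  # (2, 5, 8) + (1, 5, 8) -> (2, 5, 8)
print(summed.shape)                       # torch.Size([2, 5, 8])
print(torch.equal(summed[0], summed[1]))  # True: each sequence receives the same table
```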


# **Part 2: Add a Feedforward Layer**

**1. Define a Feedforward Neural Network**
Implement a simple two-layer feedforward network (linear, ReLU, linear) as part of the encoder. It is applied to each token vector independently.


In [5]:
feedforward = nn.Sequential(
    nn.Linear(embedding_dim, 16),
    nn.ReLU(),
    nn.Linear(16, embedding_dim)
)
ff_output = feedforward(embedded_with_pos)
print("Feedforward Output:\n", ff_output)


Feedforward Output:
 tensor([[[ 0.1721,  0.3681,  0.3267, -0.1444,  0.0419, -0.5988,  0.3962,
           0.2048],
         [ 0.0111,  0.3325,  0.3242, -0.4747,  0.0032, -0.6876,  0.3106,
           0.4848],
         [ 0.0057,  0.1898,  0.3249, -0.2349,  0.0977, -0.5837,  0.2542,
           0.2067],
         [-0.0278, -0.0283,  0.0932,  0.0022, -0.0068, -0.3013,  0.1213,
           0.1411],
         [ 0.4647,  0.4109,  0.0456, -0.2412,  0.1968, -0.9455, -0.0137,
           0.0439]]], grad_fn=<ViewBackward0>)
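
Because `nn.Linear` acts on the last dimension only, the feedforward network treats every token position separately. The sketch below, which uses a random tensor as a stand-in for `embedded_with_pos` and a seed chosen here, compares the batched result with a token-by-token loop and counts the parameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
feedforward = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
x = torch.randn(1, 5, 8)  # random stand-in for embedded_with_pos

full = feedforward(x)
# Apply the network to one token at a time and stack the results.
per_token = torch.stack([feedforward(x[0, t]) for t in range(5)]).unsqueeze(0)

print(torch.allclose(full, per_token, atol=1e-5))  # True
n_params = sum(p.numel() for p in feedforward.parameters())
print(n_params)  # 280 = (8*16 + 16) + (16*8 + 8)
```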


# **Part 3: Combine the Components into an Encoder Block**

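**1. Define the Encoder Block**
Combine the embedding, positional encoding, and feedforward components into an encoder block. The block also adds a residual connection around the feedforward network and applies layer normalization to the result.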


In [6]:
class TransformerEncoderBlock(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Position-wise feedforward network with a hidden size of 16
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16),
            nn.ReLU(),
            nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)  # (batch, seq_len, embedding_dim)
        # Build the encoding on the same device as the embeddings
        pos_enc = positional_encoding(x.size(1), embed.size(2)).to(embed.device)
        embed_with_pos = embed + pos_enc.unsqueeze(0)
        ff_output = self.feedforward(embed_with_pos)
        # Residual connection followed by layer normalization
        return self.layer_norm(embed_with_pos + ff_output)

encoder = TransformerEncoderBlock(vocab_size, embedding_dim)
output = encoder(token_ids)
print("Encoder Output:\n", output)


Encoder Output:
 tensor([[[-1.0448, -0.5450,  0.9632,  1.7050,  0.0192,  0.0734, -1.6066,
           0.4356],
         [ 0.9758, -1.8320,  0.3634,  0.9183, -0.4838, -1.2596,  0.5426,
           0.7754],
         [ 2.4935, -0.1755, -0.5465, -0.1991, -1.1428, -0.0416, -0.3174,
          -0.0705],
         [-1.1091, -1.5216, -0.2775,  0.4408,  0.4907,  1.9340, -0.2957,
           0.3384],
         [ 0.5874, -1.0779, -0.9069,  1.1809, -1.3886, -0.0961,  0.1830,
           1.5183]]], grad_fn=<NativeLayerNormBackward0>)
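
The final `nn.LayerNorm` rescales each token vector separately. A minimal sketch, using a random tensor as a stand-in for the residual sum and a freshly initialized layer (so its learned scale is 1 and its shift is 0), shows that every output row ends up with a mean near 0 and a variance near 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer_norm = nn.LayerNorm(8)
x = torch.randn(1, 5, 8)  # random stand-in for embed_with_pos + ff_output

out = layer_norm(x)
token_means = out.mean(dim=-1)                # close to 0 for every token
token_vars = out.var(dim=-1, unbiased=False)  # close to 1 for every token
print(token_means)
print(token_vars)
```

This is why the rows of the encoder output above all sit on a similar scale, whatever the embedding and feedforward layers produced.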


# **Part 4: Experiment with Different Inputs**

* Test with Different Sentences
Replace token_ids with new examples to observe how the encoder processes different inputs. Every token ID must be smaller than vocab_size, otherwise the embedding lookup raises an index error.
* Modify Hyperparameters
Experiment with different embedding sizes, feedforward dimensions, or positional encoding scales to see their effect on the output.
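
One way to run these experiments is sketched below. `ConfigurableEncoderBlock` is a hypothetical variant of the lab's encoder block with an added `ff_dim` argument, and the token IDs and size pairs are arbitrary choices for illustration. The positional encoding is repeated so the snippet runs on its own, and it assumes even embedding sizes.

```python
import numpy as np
import torch
import torch.nn as nn

def positional_encoding(seq_len, embedding_dim):
    # Sinusoidal encoding as in Part 1 (assumes an even embedding_dim).
    position = np.arange(seq_len)[:, np.newaxis]
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)
    pe[:, 1::2] = np.cos(position * div_term)
    return torch.tensor(pe, dtype=torch.float)

class ConfigurableEncoderBlock(nn.Module):
    # Hypothetical variant of TransformerEncoderBlock with a tunable ff_dim.
    def __init__(self, vocab_size, embedding_dim, ff_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)
        x_pos = embed + positional_encoding(x.size(1), embed.size(2)).unsqueeze(0)
        return self.layer_norm(x_pos + self.feedforward(x_pos))

# Two new sequences of length 3; every ID must be < vocab_size.
new_token_ids = torch.tensor([[7, 3, 9], [0, 2, 4]])
shapes = {}
for embedding_dim, ff_dim in [(8, 16), (16, 32), (32, 64)]:
    block = ConfigurableEncoderBlock(vocab_size=10, embedding_dim=embedding_dim, ff_dim=ff_dim)
    shapes[(embedding_dim, ff_dim)] = tuple(block(new_token_ids).shape)
print(shapes)
```

The output shape follows the input batch size, sequence length, and embedding size; the feedforward width changes only the capacity of the hidden layer.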


# **Summary**

By completing this lab, you have:

* Understood the role of embedding, positional encoding, and feedforward layers in the Transformer encoder.
* Gained hands-on experience implementing a core component of the Transformer architecture.
* Developed a deeper appreciation for the architecture's design and functionality.

This lab builds foundational knowledge of the Transformer, preparing you for more advanced concepts like self-attention.
