# Course: Introduction To GenAI

*Notebook: Building_a_Simplified_Transformer_Encoder.ipynb*

<a href="https://colab.research.google.com/github/gassaf2/IntroductionToGenAI/blob/main/week3/Building_a_Simplified_Transformer_Encoder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Week 3 Hands-on Lab: Building a Simplified Transformer Encoder**

In this hands-on lab, you will implement a basic Transformer encoder to understand how the architecture works. You will see how input embeddings, positional encodings, and feedforward layers work together in an encoder block. The implementation uses PyTorch.

# **Part 1: Input Embedding and Positional Encoding**

**1. Generate Input Data**
Define a sample sentence in numerical form. Here the token IDs are hard-coded rather than produced by a tokenizer.


In [38]:
import torch
import torch.nn as nn
import numpy as np

# Example sentence and token IDs (simplified for illustration)
token_ids = torch.tensor([[1, 2, 3, 4, 5]])  # Tokenized sentence
vocab_size = 10  # Vocabulary size
embedding_dim = 8  # Embedding size


# Georges Assaf: modified token_ids, vocab_size, and embedding_dim.
# These values override the definitions above and are used in the rest of the notebook.
token_ids = torch.tensor([[1, 2, 3, 4, 5, 6]])  # Tokenized sentence
vocab_size = 20  # Vocabulary size
embedding_dim = 16  # Embedding size
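
The cell above starts from ready-made token IDs. As a minimal sketch of the tokenization step it mentions, the following builds a toy word-level vocabulary. The sentence, the `<pad>` entry, and the names `sample_sentence`, `toy_vocab`, and `sample_ids` are illustrative assumptions, not part of the original lab.

```python
# Hypothetical example sentence; the lab does not state which sentence the IDs represent.
sample_sentence = "the cat sat on the mat"
words = sample_sentence.split()

# Reserve ID 0 for padding, then assign IDs in order of first appearance.
toy_vocab = {"<pad>": 0}
for word in words:
    if word not in toy_vocab:
        toy_vocab[word] = len(toy_vocab)

# Map each word to its ID; the repeated word "the" reuses the same ID.
sample_ids = [toy_vocab[word] for word in words]
print(sample_ids)      # [1, 2, 3, 4, 1, 5]
print(len(toy_vocab))  # 6 entries, so vocab_size must be at least 6
```

Wrapping the list as `torch.tensor([sample_ids])` would give a batch of shape `(1, 6)`, the same layout as `token_ids`.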


**2. Create an Embedding Layer**
Implement the embedding layer to convert token IDs into dense vectors.

In [39]:
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded_tokens = embedding_layer(token_ids)
print("Embedded Tokens:\n", embedded_tokens)

Embedded Tokens:
 tensor([[[ 1.2946, -1.6176, -1.4281, -0.3603,  0.3961, -0.5066, -0.8373,
           0.0259, -0.1213, -0.8311,  0.0902, -0.4314, -0.9263,  0.1306,
           0.3347, -1.4473],
         [ 0.2507,  0.0974,  0.0320,  0.1344, -0.0562,  1.7124,  0.2595,
          -0.6400,  1.0983,  1.5185,  1.6119, -0.4933,  2.1423,  0.6859,
           1.7850, -2.0000],
         [ 1.0554, -2.2970,  0.2895,  0.5738, -1.5454,  0.4678,  0.6180,
           0.3973, -0.9604,  1.2303,  0.3447, -0.9984,  1.9419, -0.2734,
           1.9376, -2.0838],
         [-0.3119,  1.6137, -0.4106, -0.3783,  0.7321, -0.1668, -1.2336,
           0.9647, -0.3289,  0.0798,  0.5524,  0.0797,  1.1040,  1.2905,
          -1.2435, -0.0028],
         [-0.1095,  1.1084,  0.3514, -0.4367,  1.5886,  0.4718,  0.1962,
           2.0646,  0.9635, -0.4830, -1.0385, -0.6251,  0.1583,  0.2104,
          -0.2026, -0.5687],
         [-0.3105, -0.4894,  0.7193, -1.4652, -0.0137, -0.4502,  0.6476,
           0.8255,  0.2186, -0.260
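
An embedding layer can be read as a trainable lookup table. The sketch below checks that reading on a separate layer; `demo_embedding`, `demo_ids`, and the fixed seed are assumptions made for illustration, so the values will not match the output printed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random initial weights are repeatable

demo_embedding = nn.Embedding(num_embeddings=20, embedding_dim=16)
demo_ids = torch.tensor([[1, 2, 3, 4, 5, 6]])  # shape (batch=1, seq_len=6)

demo_vectors = demo_embedding(demo_ids)
print(demo_vectors.shape)  # torch.Size([1, 6, 16])

# Each token ID selects one row of the weight matrix.
same_as_lookup = torch.equal(demo_vectors, demo_embedding.weight[demo_ids])
print(same_as_lookup)  # True
```

Because every ID indexes a row, each ID must be smaller than `num_embeddings`.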

**3. Add Positional Encoding**
Add a sinusoidal positional encoding so the model receives information about token order, which the embeddings alone do not carry.


In [40]:
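def positional_encoding(seq_len, embedding_dim):
    # Sinusoidal encoding; assumes embedding_dim is even.
    position = np.arange(seq_len)[:, np.newaxis]  # shape (seq_len, 1)
    # One frequency per sin/cos pair: 10000^(-2i / embedding_dim)
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even columns use sine
    pe[:, 1::2] = np.cos(position * div_term)  # odd columns use cosine
    return torch.tensor(pe, dtype=torch.float)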

seq_len = token_ids.size(1)
pos_encoding = positional_encoding(seq_len, embedding_dim)
print("Positional Encoding:\n", pos_encoding)


Positional Encoding:
 tensor([[ 0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,
          0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,
          1.0000e+00],
        [ 8.4147e-01,  5.4030e-01,  3.1098e-01,  9.5042e-01,  9.9833e-02,
          9.9500e-01,  3.1618e-02,  9.9950e-01,  9.9998e-03,  9.9995e-01,
          3.1623e-03,  9.9999e-01,  1.0000e-03,  1.0000e+00,  3.1623e-04,
          1.0000e+00],
        [ 9.0930e-01, -4.1615e-01,  5.9113e-01,  8.0658e-01,  1.9867e-01,
          9.8007e-01,  6.3203e-02,  9.9800e-01,  1.9999e-02,  9.9980e-01,
          6.3245e-03,  9.9998e-01,  2.0000e-03,  1.0000e+00,  6.3246e-04,
          1.0000e+00],
        [ 1.4112e-01, -9.8999e-01,  8.1265e-01,  5.8275e-01,  2.9552e-01,
          9.5534e-01,  9.4726e-02,  9.9550e-01,  2.9996e-02,  9.9955e-01,
          9.4867e-03,  9.9995e-01,  3.0000e-03,  1.0000e+00,  9.4868e-04,
          1.0000e+00]

Add the positional encoding to the embedded tokens:

In [41]:
embedded_with_pos = embedded_tokens + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos)


Embedded Tokens with Positional Encoding:
 tensor([[[ 1.2946e+00, -6.1765e-01, -1.4281e+00,  6.3972e-01,  3.9609e-01,
           4.9344e-01, -8.3733e-01,  1.0259e+00, -1.2126e-01,  1.6891e-01,
           9.0194e-02,  5.6857e-01, -9.2628e-01,  1.1306e+00,  3.3474e-01,
          -4.4732e-01],
         [ 1.0921e+00,  6.3766e-01,  3.4296e-01,  1.0848e+00,  4.3619e-02,
           2.7074e+00,  2.9112e-01,  3.5952e-01,  1.1083e+00,  2.5184e+00,
           1.6151e+00,  5.0674e-01,  2.1433e+00,  1.6859e+00,  1.7853e+00,
          -1.0000e+00],
         [ 1.9647e+00, -2.7131e+00,  8.8061e-01,  1.3804e+00, -1.3467e+00,
           1.4479e+00,  6.8123e-01,  1.3953e+00, -9.4042e-01,  2.2301e+00,
           3.5104e-01,  1.5669e-03,  1.9439e+00,  7.2657e-01,  1.9382e+00,
          -1.0838e+00],
         [-1.7074e-01,  6.2370e-01,  4.0207e-01,  2.0448e-01,  1.0277e+00,
           7.8855e-01, -1.1389e+00,  1.9602e+00, -2.9895e-01,  1.0793e+00,
           5.6191e-01,  1.0797e+00,  1.1070e+00,  2.2905e+00
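
The addition relies on broadcasting: `pos_encoding` has shape `(seq_len, embedding_dim)` while the embeddings have a leading batch dimension. The sketch below illustrates this with stand-in tensors; the batch size of 2 and the names `fake_batch`, `fake_pe`, and `summed` are assumptions for the example.

```python
import torch

fake_batch = torch.zeros(2, 6, 16)  # hypothetical batch of 2 sequences
fake_pe = torch.ones(6, 16)         # stand-in for a (seq_len, embedding_dim) encoding

# unsqueeze(0) gives shape (1, 6, 16), which broadcasts across the batch dimension.
summed = fake_batch + fake_pe.unsqueeze(0)
print(fake_pe.unsqueeze(0).shape)  # torch.Size([1, 6, 16])
print(summed.shape)                # torch.Size([2, 6, 16])

# Every sequence in the batch receives the same positional values.
print(torch.equal(summed[0], summed[1]))  # True
```

PyTorch would also broadcast the 2-D tensor directly because trailing dimensions are aligned; `unsqueeze(0)` simply makes the batch axis explicit.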

# **Part 2: Add a Feedforward Layer**

**1. Define a Feedforward Neural Network**
Implement a simple feedforward layer as part of the encoder.


In [42]:
feedforward = nn.Sequential(
    nn.Linear(embedding_dim, 16),
    nn.ReLU(),
    nn.Linear(16, embedding_dim)
)
ff_output = feedforward(embedded_with_pos)
print("Feedforward Output:\n", ff_output)


Feedforward Output:
 tensor([[[-0.1572, -0.2207, -0.4988, -0.2427, -0.4057, -0.2053,  0.4138,
           0.1484, -0.2072,  0.4187, -0.1409,  0.3280, -0.0890,  0.1992,
          -0.0903, -0.5097],
         [-0.1450,  0.2609, -0.6227, -0.0029, -0.5559,  0.2653, -0.0430,
           0.1175, -0.2884,  0.4779, -0.5457,  0.1647,  0.2225,  0.6340,
          -0.0211, -0.7522],
         [-0.2102,  0.0908, -0.6323, -0.3001, -0.7394, -0.1145,  0.1823,
           0.0314, -0.5904,  0.5787, -0.5822,  0.3015,  0.2708,  0.4998,
          -0.0301, -0.9231],
         [ 0.0528, -0.0779, -0.2815, -0.0507, -0.1358,  0.1043,  0.2363,
          -0.0069,  0.0647,  0.2521, -0.1855,  0.3481, -0.1037,  0.0311,
          -0.0835, -0.3319],
         [ 0.0192, -0.0683, -0.1524,  0.0207, -0.2571, -0.0028,  0.1125,
           0.0142,  0.0444,  0.3107, -0.3062,  0.2579, -0.0363, -0.0447,
          -0.0898, -0.4226],
         [ 0.2295,  0.0241, -0.0819,  0.0567, -0.1269, -0.0971, -0.0043,
           0.1104,  0.0459,  0.
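
`nn.Linear` acts on the last dimension only, so the feedforward network transforms each token position independently with the same weights. The following sketch checks that property on random data; `ffn`, `x_demo`, and the seed are illustrative assumptions rather than the notebook's own variables.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ffn = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
x_demo = torch.randn(1, 6, 16)  # (batch, seq_len, embedding_dim)

# Apply the network to the whole sequence at once.
full = ffn(x_demo)

# Apply it to each position separately, then reassemble the sequence.
one_by_one = torch.stack([ffn(x_demo[0, t]) for t in range(x_demo.size(1))]).unsqueeze(0)

print(full.shape)                                   # torch.Size([1, 6, 16])
print(torch.allclose(full, one_by_one, atol=1e-5))  # True
```

Since no information moves between positions here, mixing across tokens has to come from another component, such as the self-attention layer this lab leaves out.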

# **Part 3: Combine the Components into an Encoder Block**

**1. Define the Encoder Block**
Combine the embedding, positional encoding, and feedforward components into an encoder block, with a residual connection and layer normalization around the feedforward layer.


In [43]:
class TransformerEncoderBlock(nn.Module):
    # Simplified block: embedding + positional encoding + feedforward (no self-attention).
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16),  # hidden size fixed at 16
            nn.ReLU(),
            nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)  # (batch, seq_len, embedding_dim)
        pos_enc = positional_encoding(x.size(1), embed.size(2))
        embed_with_pos = embed + pos_enc.unsqueeze(0)  # broadcast over the batch
        ff_output = self.feedforward(embed_with_pos)
        # Residual connection followed by layer normalization
        return self.layer_norm(embed_with_pos + ff_output)

encoder = TransformerEncoderBlock(vocab_size, embedding_dim)
output = encoder(token_ids)
print("Encoder Output:\n", output)


Encoder Output:
 tensor([[[ 1.3294e-01,  1.4697e+00, -6.7457e-01, -2.7197e-01, -1.8065e-03,
           1.5336e+00, -2.2453e+00,  7.3532e-01, -4.6048e-01, -5.3451e-01,
          -5.0691e-02,  1.3836e+00,  1.7041e-01, -1.5720e+00, -2.3713e-01,
           6.2297e-01],
         [ 7.7386e-01,  1.2560e+00, -6.5581e-02,  6.4131e-01, -4.5938e-01,
           1.5621e+00, -4.6505e-01,  5.0597e-01, -2.7963e+00, -7.2114e-01,
          -5.6513e-01,  6.3266e-01,  2.1282e-02,  7.1487e-01, -2.1978e-01,
          -8.1577e-01],
         [ 7.2185e-01, -1.0716e+00, -2.4328e-01, -2.7066e-01,  8.2456e-01,
           1.0211e+00, -1.4951e+00,  1.4192e-01, -9.2208e-01,  9.3943e-01,
          -1.8538e+00,  1.4610e+00, -8.3058e-01, -2.6823e-02,  1.4873e+00,
           1.1674e-01],
         [-1.8008e+00, -1.3058e+00,  5.2797e-01,  1.0503e+00,  2.6161e-01,
           3.7693e-01,  2.7536e-01,  1.1176e+00,  1.4152e-02,  1.8708e+00,
           4.7820e-01, -1.9664e+00, -7.2031e-01, -4.9409e-02,  3.6081e-03,
          -
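
The final step of the block is layer normalization, which rescales each token vector to roughly zero mean and unit variance across its features. A minimal sketch of that effect follows, using a random stand-in for `embed_with_pos + ff_output`; the names `norm`, `residual_sum`, and `normed` are assumptions for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
norm = nn.LayerNorm(16)  # default affine parameters: weight = 1, bias = 0

# Stand-in for the residual sum, deliberately shifted and scaled.
residual_sum = torch.randn(1, 6, 16) * 3 + 2

normed = norm(residual_sum)
token_means = normed.mean(dim=-1)                # approximately 0 for every token
token_stds = normed.std(dim=-1, unbiased=False)  # approximately 1 for every token
print(token_means)
print(token_stds)
```

This is consistent with the encoder output above, where each row contains values centered near zero regardless of the scale of the embeddings.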

# **Part 4: Experiment with Different Inputs**

* **Test with Different Sentences:** Replace `token_ids` with new examples to observe how the encoder processes different inputs. Every token ID must be smaller than `vocab_size`.
* **Modify Hyperparameters:** Experiment with different embedding sizes, feedforward dimensions, or positional encoding scales to see their effect on the output.


# Experiment
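
One possible starting point for these experiments is sketched below. It is a self-contained variant of the encoder block, not the lab's reference solution: `SketchEncoderBlock`, `sinusoidal_pe`, the `hidden_dim` parameter, and the chosen configurations are all assumptions introduced for illustration.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_pe(seq_len, dim):
    # Torch-only variant of the notebook's positional_encoding (assumes even dim).
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float) * -(math.log(10000.0) / dim)
    )
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

class SketchEncoderBlock(nn.Module):
    # Hypothetical variant of TransformerEncoderBlock with a configurable hidden_dim.
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embedding_dim),
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        embed = self.embedding(x)
        pos = sinusoidal_pe(x.size(1), embed.size(2)).unsqueeze(0)
        embed_with_pos = embed + pos
        return self.layer_norm(embed_with_pos + self.feedforward(embed_with_pos))

# Try several (embedding_dim, hidden_dim, token IDs) combinations.
configs = [
    (8, 16, [[1, 2, 3]]),
    (16, 32, [[4, 5, 6, 7, 8, 9]]),
    (32, 64, [[1, 2], [3, 4]]),  # a batch of two short sequences
]
shapes = []
for emb_dim, hidden_dim, ids in configs:
    block = SketchEncoderBlock(vocab_size=20, embedding_dim=emb_dim, hidden_dim=hidden_dim)
    out = block(torch.tensor(ids))
    shapes.append(tuple(out.shape))
print(shapes)  # [(1, 3, 8), (1, 6, 16), (2, 2, 32)]
```

The output shape always follows `(batch, seq_len, embedding_dim)`; the feedforward hidden size changes the number of parameters but not the output shape.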

# **Summary**

By completing this lab, you have:

* Understood the role of embedding, positional encoding, and feedforward layers in the Transformer encoder.
* Gained hands-on experience implementing a core component of the Transformer architecture.
* Developed a deeper appreciation for the architecture’s design and functionality.

This lab builds foundational knowledge of the Transformer, preparing you for more advanced concepts like self-attention.
