**Name: Mahesh Marathe**

**PRN: 22211533**

**Roll No.: 391035**

**Batch: A2**

**Title:** Create a Transformer Using the PyTorch Library

**Objectives:**

1. To understand the architecture and functioning of the Transformer model in deep learning.
2. To implement a custom Transformer architecture using PyTorch.
3. To learn the practical usage of attention mechanisms, especially self-attention.
4. To explore the role of Transformer models in modern NLP tasks such as machine translation, summarization, and question answering.

**Theory:**

The Transformer model, introduced in the paper “Attention Is All You Need” (Vaswani et al., 2017), is a deep learning architecture built entirely on attention mechanisms, with no recurrence or convolutions. It has become the foundation for models such as BERT, GPT, and T5.

**Key Components of the Transformer:**

- **Input Embedding + Positional Encoding:** Because Transformers have no recurrence, positional encodings are added to the input embeddings to supply token-order information.
- **Multi-Head Self-Attention:** Lets each token attend to every position in the sequence, with several heads learning different relationships in parallel, which improves context understanding.
- **Feed-Forward Neural Network:** The attention output at each position is passed through the same fully connected network, applied to every position independently.
- **Add & Norm:** Residual connections and layer normalization around each sub-layer keep training stable and improve gradient flow.
- **Encoder and Decoder Blocks:**
  - Encoder: Processes the input sequence into a sequence of contextual representations.
  - Decoder: Generates the output sequence using masked self-attention over the target tokens and cross-attention over the encoder outputs.
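
The self-attention step described above can be illustrated with a minimal sketch of scaled dot-product attention for a single head. This is a simplification and not the code used later in this notebook: the function name `scaled_dot_product_attention` and the tensor sizes are chosen here for illustration, and the input is used directly as queries, keys, and values, whereas a real layer would first apply learned linear projections.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Single-head attention. q, k, v have shape (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq_len, seq_len)
    # Each row becomes a probability distribution over positions
    weights = torch.softmax(scores, dim=-1)
    # Weighted sum of the value vectors
    return weights @ v, weights

torch.manual_seed(0)
x = torch.randn(2, 5, 16)  # illustrative sizes: batch=2, seq_len=5, d_k=16
# Assumption for brevity: x serves as queries, keys, and values without projections
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape)      # same shape as the input
print(weights.shape)  # one weight per (query position, key position) pair
```

Multi-head attention runs several such heads in parallel on lower-dimensional projections of the input and concatenates their outputs.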

**Applications:**

- Machine Translation
- Text Summarization
- Chatbots
- Speech Recognition
- Text Generation


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn import Transformer
import math


In [None]:
torch.manual_seed(42)


<torch._C.Generator at 0x7ecbcc5da630>

In [None]:
src_vocab_size = 1000
tgt_vocab_size = 1000
embedding_size = 512
num_heads = 8
num_encoder_layers = 3
num_decoder_layers = 3
dropout = 0.1
src_seq_length = 10
tgt_seq_length = 10
batch_size = 32


In [None]:
class PositionalEncoding(nn.Module):
    def __init__(self, emb_size, maxlen=5000):
        super(PositionalEncoding, self).__init__()
        pe = torch.zeros(maxlen, emb_size)
        position = torch.arange(0, maxlen, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, emb_size, 2).float() * (-math.log(10000.0) / emb_size))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + self.pe[:, :x.size(1)]
        return x
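
For reference, the `PositionalEncoding` class follows the sinusoidal scheme from Vaswani et al. (2017), which can be written as below. Here $pos$ is the token position, $i$ indexes the pairs of embedding dimensions, and $d_{\text{model}}$ corresponds to `emb_size`.

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),
\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
```

The `div_term` tensor is the same denominator computed in log space, since $\exp\!\left(2i \cdot \frac{-\ln 10000}{d_{\text{model}}}\right) = 10000^{-2i/d_{\text{model}}}$, with `torch.arange(0, emb_size, 2)` supplying the values $2i$.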

In [None]:
class TransformerModel(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_size, nhead, num_encoder_layers, num_decoder_layers, dropout):
        super(TransformerModel, self).__init__()
        self.src_embedding = nn.Embedding(src_vocab, emb_size)
        self.tgt_embedding = nn.Embedding(tgt_vocab, emb_size)
        self.pos_encoder = PositionalEncoding(emb_size)
        self.pos_decoder = PositionalEncoding(emb_size)
        self.transformer = Transformer(d_model=emb_size, nhead=nhead, num_encoder_layers=num_encoder_layers,
                                       num_decoder_layers=num_decoder_layers, dropout=dropout)
        self.fc_out = nn.Linear(emb_size, tgt_vocab)

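    def forward(self, src, tgt):
        # src, tgt: (batch, seq_len) tensors of token indices
        src_emb = self.pos_encoder(self.src_embedding(src))
        tgt_emb = self.pos_decoder(self.tgt_embedding(tgt))
        # Causal mask so each target position attends only to itself and earlier positions
        tgt_len = tgt.size(1)
        tgt_mask = torch.triu(torch.full((tgt_len, tgt_len), float('-inf'), device=tgt.device), diagonal=1)
        # nn.Transformer expects (seq_len, batch, emb_size) by default, hence the permutes
        output = self.transformer(src_emb.permute(1, 0, 2), tgt_emb.permute(1, 0, 2), tgt_mask=tgt_mask)
        output = self.fc_out(output.permute(1, 0, 2))  # (batch, tgt_len, tgt_vocab)
        return output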
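
Decoder self-attention is normally restricted with a causal mask, so that a target position cannot look at later tokens. The sketch below shows how such a mask behaves. It is an illustration only: the length `seq_len = 4` and the all-zero `scores` are assumptions made to keep the effect easy to see.

```python
import torch

seq_len = 4  # illustrative target length

# -inf above the diagonal blocks attention to future positions; 0 elsewhere
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

# Assumption: uniform raw scores, so the mask alone shapes the result
scores = torch.zeros(seq_len, seq_len)
weights = torch.softmax(scores + mask, dim=-1)

print(mask)
print(weights)  # row t spreads its weight evenly over positions 0..t
```

A float mask of this form can be passed to `nn.Transformer` through its `tgt_mask` argument.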

In [None]:
model = TransformerModel(src_vocab=src_vocab_size, tgt_vocab=tgt_vocab_size,
                         emb_size=embedding_size, nhead=num_heads,
                         num_encoder_layers=num_encoder_layers,
                         num_decoder_layers=num_decoder_layers,
                         dropout=dropout)




In [None]:
src = torch.randint(0, src_vocab_size, (batch_size, src_seq_length))
tgt = torch.randint(0, tgt_vocab_size, (batch_size, tgt_seq_length))

In [None]:
output = model(src, tgt)
print("Transformer output shape:", output.shape)

Transformer output shape: torch.Size([32, 10, 1000])


**Conclusion:**

This assignment demonstrates the construction of a basic Transformer model using PyTorch. By combining embedding layers and a hand-written sinusoidal positional encoding with PyTorch's built-in `nn.Transformer` module, and by checking the output shape on random token sequences, we gain a clearer understanding of how attention mechanisms power modern deep learning models. Transformers remain central to NLP and are also applied in audio, vision, and multi-modal AI systems.