# CS455 Project
by Ehtisham Khalid 2021147<br>



## Urdu2Eng Transformer Using Sinusoidal Positional Embeddings

In [None]:
# Install with pip (recommended to run in a Jupyter cell or script, not directly in Python shell)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  # or change cu118 to your CUDA version or use 'cpu'
!pip install spacy
!pip install nltk
!python -m spacy download en_core_web_sm


Looking in indexes: https://download.pytorch.org/whl/cu118
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
Collecting torch
  Downloading https://download.pytorch.org/whl/cu118/torch-2.7.0%2Bcu118-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (28 kB)
Collecting sympy>=1.13.3 (from torch)
  Downloading https://download.pytorch.org/whl/sympy-1.13.3-py3-none-any.whl.metadata (12 kB)
Collecting nvidia-cuda-nvrtc-cu11==11.8.89 (from torch)
  Downloading https://download.pytorch.org/whl/cu118/nvidia_cuda_nvrtc_cu11-11.8.89-py3-none-manylinux1_x86_64.whl (23.2 MB)
Collecting nvidia-cuda-runtime-cu11==11.8.89 (from torch)
  Downloading https://download.pytorch.org/whl/cu118/nvidia_cuda_runtime_cu11-11.8.89-py3-none-manylinux1_x86_64.whl (875 kB)

In [6]:
import copy
from typing import Optional, Any, Union, Callable

import torch
import math
import time
import torch.nn as nn
from torch import Tensor
import torch.nn.functional as F
from torch.nn import Module
from torch.nn import MultiheadAttention
from torch.nn import ModuleList
from torch.nn.init import xavier_uniform_
from torch.nn import Dropout
from torch.nn import Linear
from torch.nn import LayerNorm

import spacy

from collections import Counter
import io
#from torchtext.vocab import vocab
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

from nltk.translate.bleu_score import sentence_bleu
#from torchtext.data.metrics import bleu_score

import sys

In [7]:
def _get_activation_fn(activation: str) -> Callable[[Tensor], Tensor]:
    if activation == "relu":
        return F.relu
    elif activation == "gelu":
        return F.gelu

    raise RuntimeError("activation should be relu/gelu, not {}".format(activation))
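
For reference, the two activations accepted by `_get_activation_fn` can be written in plain Python. This is a minimal sketch, assuming the exact (erf-based) form of GELU; the names `relu_scalar` and `gelu_scalar` are illustrative and not part of the notebook.

```python
import math

def relu_scalar(x: float) -> float:
    """ReLU: keep positive values, clamp negatives to zero."""
    return max(0.0, x)

def gelu_scalar(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for v in (-2.0, 0.0, 2.0):
    print(v, relu_scalar(v), round(gelu_scalar(v), 4))
```

Unlike ReLU, GELU lets small negative inputs produce small negative outputs instead of a hard zero.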

### Encoder

In [8]:
class TransformerEncoderLayer(Module):

    __constants__ = ['batch_first', 'norm_first']

    def __init__(self, d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1,
                 activation: Union[str, Callable[[Tensor], Tensor]] = F.relu,
                 layer_norm_eps: float = 1e-5, batch_first: bool = False, norm_first: bool = False,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(TransformerEncoderLayer, self).__init__()
        self.self_attn = MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first,
                                            **factory_kwargs)
        # Implementation of Feedforward model
        self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)#input features,output features
        self.dropout = Dropout(dropout)
        self.linear2 = Linear(dim_feedforward, d_model, **factory_kwargs)

        self.norm_first = norm_first
        self.norm1 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
        self.norm2 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
        self.dropout1 = Dropout(dropout)
        self.dropout2 = Dropout(dropout)

        # Legacy string support for activation function.
        if isinstance(activation, str):
            activation = _get_activation_fn(activation)

        if activation is F.relu:
            self.activation_relu_or_gelu = 1
        elif activation is F.gelu:
            self.activation_relu_or_gelu = 2
        else:
            self.activation_relu_or_gelu = 0
        self.activation = activation

    def __setstate__(self, state):
        if 'activation' not in state:
            state['activation'] = F.relu
        super(TransformerEncoderLayer, self).__setstate__(state)

    def forward(self, src: Tensor, src_mask: Optional[Tensor] = None,
                src_key_padding_mask: Optional[Tensor] = None) -> Tensor:

        if (src.dim() == 3 and not self.norm_first and not self.training and
            self.self_attn.batch_first and
            self.self_attn._qkv_same_embed_dim and self.activation_relu_or_gelu and
            self.norm1.eps == self.norm2.eps and
            ((src_mask is None and src_key_padding_mask is None)
             if src.is_nested
             else (src_mask is None or src_key_padding_mask is None))):
            tensor_args = (
                src,
                self.self_attn.in_proj_weight,
                self.self_attn.in_proj_bias,
                self.self_attn.out_proj.weight,
                self.self_attn.out_proj.bias,
                self.norm1.weight,
                self.norm1.bias,
                self.norm2.weight,
                self.norm2.bias,
                self.linear1.weight,
                self.linear1.bias,
                self.linear2.weight,
                self.linear2.bias,
            )  # weights and biases that the fast path reads
            if (not torch.overrides.has_torch_function(tensor_args) and
                    # We have to use a list comprehension here because TorchScript
                    # doesn't support generator expressions.
                    all([(x.is_cuda or 'cpu' in str(x.device)) for x in tensor_args]) and
                    (not torch.is_grad_enabled() or all([not x.requires_grad for x in tensor_args]))):
                return torch._transformer_encoder_layer_fwd(
                    src,
                    self.self_attn.embed_dim,
                    self.self_attn.num_heads,
                    self.self_attn.in_proj_weight,
                    self.self_attn.in_proj_bias,
                    self.self_attn.out_proj.weight,
                    self.self_attn.out_proj.bias,
                    self.activation_relu_or_gelu == 2,
                    False,  # norm_first, currently not supported
                    self.norm1.eps,
                    self.norm1.weight,
                    self.norm1.bias,
                    self.norm2.weight,
                    self.norm2.bias,
                    self.linear1.weight,
                    self.linear1.bias,
                    self.linear2.weight,
                    self.linear2.bias,
                    src_mask if src_mask is not None else src_key_padding_mask,
                )
        x = src
        if self.norm_first:
            x = x + self._sa_block(self.norm1(x), src_mask, src_key_padding_mask)
            x = x + self._ff_block(self.norm2(x))
        else:
            x = self.norm1(x + self._sa_block(x, src_mask, src_key_padding_mask))
            x = self.norm2(x + self._ff_block(x))

        return x

    # self-attention block
    def _sa_block(self, x: Tensor,
                  attn_mask: Optional[Tensor], key_padding_mask: Optional[Tensor]) -> Tensor:
        x = self.self_attn(x, x, x,
                           attn_mask=attn_mask,
                           key_padding_mask=key_padding_mask,
                           need_weights=False)[0]
        return self.dropout1(x)

    # feed forward block
    def _ff_block(self, x: Tensor) -> Tensor:
        x = self.linear2(self.dropout(self.activation(self.linear1(x))))
        return self.dropout2(x)
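
The `norm_first` flag only changes where layer normalization sits relative to the residual connection. The sketch below illustrates both orderings on a single vector. It assumes a layer norm without learned scale and shift, and `toy_sublayer` is a hypothetical stand-in for the attention or feed-forward block.

```python
import math
from typing import Callable, List

Vector = List[float]

def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, i.e. the residual connection."""
    return [p + q for p, q in zip(a, b)]

def post_norm(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """norm_first=False: add the residual, then normalize."""
    return layer_norm(add(x, sublayer(x)))

def pre_norm(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """norm_first=True: normalize the sublayer input, then add the residual."""
    return add(x, sublayer(layer_norm(x)))

def toy_sublayer(x: Vector) -> Vector:
    """Hypothetical sublayer: doubles every component."""
    return [2.0 * v for v in x]

x = [1.0, 2.0, 3.0, 4.0]
print(post_norm(x, toy_sublayer))
print(pre_norm(x, toy_sublayer))
```

In the post-norm case the block output is always normalized, while in the pre-norm case the unnormalized input `x` is carried through the residual path.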

In [9]:
def _get_clones(module, N):
    return ModuleList([copy.deepcopy(module) for i in range(N)])
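
`_get_clones` relies on `copy.deepcopy` so that every layer in the stack owns its own parameters. A small sketch of that behaviour, using a hypothetical `ToyLayer` class in place of a real module:

```python
import copy

class ToyLayer:
    """Hypothetical stand-in for a layer that holds mutable parameters."""
    def __init__(self, weights):
        self.weights = weights

def get_clones_sketch(module, n):
    """Return n independent deep copies of module."""
    return [copy.deepcopy(module) for _ in range(n)]

base = ToyLayer([0.1, 0.2])
layers = get_clones_sketch(base, 3)

# Changing one clone must not affect the template or the other clones.
layers[0].weights[0] = 9.9
print(base.weights, layers[0].weights, layers[1].weights)
```

A shallow copy would share the same weight list across all layers, which amounts to weight tying rather than a stack of distinct layers.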

In [10]:
# TransformerEncoder is a stack of N encoder layers
class TransformerEncoder(Module):

    __constants__ = ['norm']

# encoder_layer: an instance of the TransformerEncoderLayer() class (required).
# num_layers: the number of sub-encoder-layers in the encoder (required).
# norm: the layer normalization component (optional).
    def __init__(self, encoder_layer, num_layers, norm=None, enable_nested_tensor=True):
        super(TransformerEncoder, self).__init__()
        self.layers = _get_clones(encoder_layer, num_layers)
        self.num_layers = num_layers
        self.norm = norm
        self.enable_nested_tensor = enable_nested_tensor

# Pass the input through the encoder layers in turn.
# src: the sequence fed to the encoder (required).
# mask: the attention mask for the src sequence (optional).
# Float attention masks are additive: -inf blocks a position and 0 leaves it visible.
# src_key_padding_mask: the mask for the src keys per batch (optional). Sequences in a
#   batch usually have different lengths and are padded to a common length; this mask
#   marks the padded positions so that attention ignores them.
# forward() simply runs the forward pass of each encoder layer in order.
    def forward(self, src: Tensor, mask: Optional[Tensor] = None, src_key_padding_mask: Optional[Tensor] = None) -> Tensor:
        output = src
        convert_to_nested = False
        first_layer = self.layers[0]
        if isinstance(first_layer, torch.nn.TransformerEncoderLayer):
            if (not first_layer.norm_first and not first_layer.training and
                    first_layer.self_attn.batch_first and
                    first_layer.self_attn._qkv_same_embed_dim and first_layer.activation_relu_or_gelu and
                    first_layer.norm1.eps == first_layer.norm2.eps and
                    src.dim() == 3 and self.enable_nested_tensor) :
                if src_key_padding_mask is not None and not output.is_nested and mask is None:
                    tensor_args = (
                        src,
                        first_layer.self_attn.in_proj_weight,
                        first_layer.self_attn.in_proj_bias,
                        first_layer.self_attn.out_proj.weight,
                        first_layer.self_attn.out_proj.bias,
                        first_layer.norm1.weight,
                        first_layer.norm1.bias,
                        first_layer.norm2.weight,
                        first_layer.norm2.bias,
                        first_layer.linear1.weight,
                        first_layer.linear1.bias,
                        first_layer.linear2.weight,
                        first_layer.linear2.bias,
                    )
                    if not torch.overrides.has_torch_function(tensor_args):
                        if output.is_cuda or 'cpu' in str(output.device):
                            convert_to_nested = True
                            output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not())

        for mod in self.layers:
            if convert_to_nested:
                output = mod(output, src_mask=mask)
            else:
                output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)

        if convert_to_nested:
            output = output.to_padded_tensor(0.)

        if self.norm is not None:
            output = self.norm(output)

        return output


### Decoder

In [11]:
class TransformerDecoderLayer(Module):
    __constants__ = ['batch_first', 'norm_first']

    def __init__(self, d_model: int, nhead: int, dim_feedforward: int = 2048, dropout: float = 0.1,
                 activation: Union[str, Callable[[Tensor], Tensor]] = F.relu,
                 layer_norm_eps: float = 1e-5, batch_first: bool = False, norm_first: bool = False,
                 device=None, dtype=None) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        super(TransformerDecoderLayer, self).__init__()
        self.self_attn = MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first,
                                            **factory_kwargs)
        self.multihead_attn = MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=batch_first,
                                                 **factory_kwargs)
        # Implementation of Feedforward model
        self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)
        self.dropout = Dropout(dropout)
        self.linear2 = Linear(dim_feedforward, d_model, **factory_kwargs)

        self.norm_first = norm_first
        self.norm1 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
        self.norm2 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
        self.norm3 = LayerNorm(d_model, eps=layer_norm_eps, **factory_kwargs)
        self.dropout1 = Dropout(dropout)
        self.dropout2 = Dropout(dropout)
        self.dropout3 = Dropout(dropout)

        # Legacy string support for activation function.
        if isinstance(activation, str):
            self.activation = _get_activation_fn(activation)
        else:
            self.activation = activation

    def __setstate__(self, state):
        if 'activation' not in state:
            state['activation'] = F.relu
        super(TransformerDecoderLayer, self).__setstate__(state)

    def forward(self, tgt: Tensor, memory: Tensor, tgt_mask: Optional[Tensor] = None, memory_mask: Optional[Tensor] = None,
                tgt_key_padding_mask: Optional[Tensor] = None, memory_key_padding_mask: Optional[Tensor] = None) -> Tensor:

        x = tgt
        if self.norm_first:
            x = x + self._sa_block(self.norm1(x), tgt_mask, tgt_key_padding_mask)
            x = x + self._mha_block(self.norm2(x), memory, memory_mask, memory_key_padding_mask)
            x = x + self._ff_block(self.norm3(x))
        else:
            x = self.norm1(x + self._sa_block(x, tgt_mask, tgt_key_padding_mask))
            x = self.norm2(x + self._mha_block(x, memory, memory_mask, memory_key_padding_mask))
            x = self.norm3(x + self._ff_block(x))

        return x

    # self-attention block
    def _sa_block(self, x: Tensor,
                  attn_mask: Optional[Tensor], key_padding_mask: Optional[Tensor]) -> Tensor:
        x = self.self_attn(x, x, x,
                           attn_mask=attn_mask,
                           key_padding_mask=key_padding_mask,
                           need_weights=False)[0]
        return self.dropout1(x)

    # multihead attention block
    def _mha_block(self, x: Tensor, mem: Tensor,
                   attn_mask: Optional[Tensor], key_padding_mask: Optional[Tensor]) -> Tensor:
        x = self.multihead_attn(x, mem, mem,
                                attn_mask=attn_mask,
                                key_padding_mask=key_padding_mask,
                                need_weights=False)[0]
        return self.dropout2(x)

    # feed forward block
    def _ff_block(self, x: Tensor) -> Tensor:
        x = self.linear2(self.dropout(self.activation(self.linear1(x))))
        return self.dropout3(x)
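
The decoder layer calls attention twice: `_sa_block` uses the target sequence for queries, keys and values, while `_mha_block` takes queries from the target and keys and values from the encoder `memory`. The single-head sketch below shows the difference in shapes. It is an illustration only: it omits the learned projections, masks and dropout that `MultiheadAttention` applies, and the function names are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: one output row per query row."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

tgt = [[1.0, 0.0], [0.0, 1.0]]                  # 2 decoder positions
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 encoder positions

self_out = attention(tgt, tgt, tgt)         # self-attention
cross_out = attention(tgt, memory, memory)  # cross-attention over the encoder output
print(len(self_out), len(cross_out))
```

Both outputs have one row per decoder position; only the set of positions being attended to changes.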


In [12]:
# TransformerDecoder is a stack of N Decoder layers
class TransformerDecoder(Module):
    __constants__ = ['norm']

    def __init__(self, decoder_layer, num_layers, norm=None):
        super(TransformerDecoder, self).__init__()
        self.layers = _get_clones(decoder_layer, num_layers)
        self.num_layers = num_layers
        self.norm = norm

    def forward(self, tgt: Tensor, memory: Tensor, tgt_mask: Optional[Tensor] = None,
                memory_mask: Optional[Tensor] = None, tgt_key_padding_mask: Optional[Tensor] = None,
                memory_key_padding_mask: Optional[Tensor] = None) -> Tensor:

        output = tgt

        for mod in self.layers:
            output = mod(output, memory, tgt_mask=tgt_mask,
                         memory_mask=memory_mask,
                         tgt_key_padding_mask=tgt_key_padding_mask,
                         memory_key_padding_mask=memory_key_padding_mask)

        if self.norm is not None:
            output = self.norm(output)

        return output


### Embeddings and Positional Embeddings

In [13]:
# Sinusoidal positional encoding:
#   PE(pos, 2i)   = sin(pos / 10000**(2i/d))
#   PE(pos, 2i+1) = cos(pos / 10000**(2i/d))
# pos is the position of the token in the sequence (pos 0 is the first token,
# pos 1 the second, and so on),
# i indexes the sin/cos pairs within the encoding vector being filled,
# and d is the embedding size.
class PositionalEncoding(nn.Module):
    def __init__(self, emb_size: int, dropout, maxlen: int = 5000):
        super(PositionalEncoding, self).__init__()
        den = torch.exp(- torch.arange(0, emb_size, 2) * math.log(10000) / emb_size)
        pos = torch.arange(0, maxlen).reshape(maxlen, 1)
        pos_embedding = torch.zeros((maxlen, emb_size))
        pos_embedding[:, 0::2] = torch.sin(pos * den)
        pos_embedding[:, 1::2] = torch.cos(pos * den)
        pos_embedding = pos_embedding.unsqueeze(-2)

        self.dropout = nn.Dropout(dropout)
        self.register_buffer('pos_embedding', pos_embedding)

    def forward(self, token_embedding: Tensor):
        return self.dropout(token_embedding +
                            self.pos_embedding[:token_embedding.size(0),:])

class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, emb_size):
        super(TokenEmbedding, self).__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.emb_size = emb_size
    def forward(self, tokens: Tensor):
        return self.embedding(tokens.long()) * math.sqrt(self.emb_size)
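
To make the `PositionalEncoding` buffer concrete, the sketch below computes the encoding for a single position in plain Python, using the same `exp(-k * log(10000) / emb_size)` frequency term as the class above. The helper name `sinusoidal_position` and the tiny embedding size of 4 are assumptions chosen for readability.

```python
import math
from typing import List

def sinusoidal_position(pos: int, emb_size: int) -> List[float]:
    """Encoding vector for one position.

    Even index k holds sin(pos * freq_k); the following odd index holds cos(pos * freq_k),
    with freq_k = 10000 ** (-k / emb_size).
    """
    vec = [0.0] * emb_size
    for k in range(0, emb_size, 2):
        angle = pos * math.exp(-k * math.log(10000.0) / emb_size)
        vec[k] = math.sin(angle)
        if k + 1 < emb_size:
            vec[k + 1] = math.cos(angle)
    return vec

pe0 = sinusoidal_position(0, 4)
pe1 = sinusoidal_position(1, 4)
print(pe0)
print([round(v, 4) for v in pe1])
```

Position 0 always encodes to alternating 0 and 1, since sin(0) = 0 and cos(0) = 1. Later dimensions use lower frequencies, so they change more slowly from one position to the next.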

### Seq2Seq Transformer Module

In [14]:
class Seq2SeqTransformer(nn.Module):
    def __init__(self, num_encoder_layers: int, num_decoder_layers: int,
                 emb_size: int, src_vocab_size: int, tgt_vocab_size: int,
                 dim_feedforward:int = 512, dropout:float = 0.1):
        super(Seq2SeqTransformer, self).__init__()
        encoder_layer = TransformerEncoderLayer(d_model=emb_size, nhead=NHEAD,
                                                dim_feedforward=dim_feedforward)
        self.transformer_encoder = TransformerEncoder(encoder_layer, num_layers=num_encoder_layers)
        decoder_layer = TransformerDecoderLayer(d_model=emb_size, nhead=NHEAD,
                                                dim_feedforward=dim_feedforward)
        self.transformer_decoder = TransformerDecoder(decoder_layer, num_layers=num_decoder_layers)

        self.generator = nn.Linear(emb_size, tgt_vocab_size)
        self.src_tok_emb = TokenEmbedding(src_vocab_size, emb_size)
        self.tgt_tok_emb = TokenEmbedding(tgt_vocab_size, emb_size)
        self.positional_encoding = PositionalEncoding(emb_size, dropout=dropout)

    def forward(self, src: Tensor, trg: Tensor, src_mask: Tensor,
                tgt_mask: Tensor, src_padding_mask: Tensor,
                tgt_padding_mask: Tensor, memory_key_padding_mask: Tensor):
        src_emb = self.positional_encoding(self.src_tok_emb(src))
        #print(src_emb)
        tgt_emb = self.positional_encoding(self.tgt_tok_emb(trg))
        #print(tgt_emb)
        memory = self.transformer_encoder(src_emb, src_mask, src_padding_mask)
        #print(memory)
        outs = self.transformer_decoder(tgt_emb, memory, tgt_mask, None,
                                        tgt_padding_mask, memory_key_padding_mask)
        #print(outs)
        return self.generator(outs)

    def encode(self, src: Tensor, src_mask: Tensor):
        return self.transformer_encoder(self.positional_encoding(
                            self.src_tok_emb(src)), src_mask)

    def decode(self, tgt: Tensor, memory: Tensor, tgt_mask: Tensor):
        return self.transformer_decoder(self.positional_encoding(
                          self.tgt_tok_emb(tgt)), memory,
                          tgt_mask)

### Mask Generation

In [15]:
def generate_square_subsequent_mask(sz):
    mask = (torch.triu(torch.ones((sz, sz), device=DEVICE)) == 1).transpose(0, 1)
    mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
    return mask

def create_mask(src: torch.Tensor, tgt: torch.Tensor, pad_idx: int, device: torch.device):
    """
    Creates source/target masks and padding masks that match the current shapes of src and tgt.

    Args:
        src: Tensor of shape [src_len, batch_size]
        tgt: Tensor of shape [tgt_len, batch_size]
        pad_idx: Integer pad token index
        device: torch.device

    Returns:
        src_mask:      [src_len, src_len]
        tgt_mask:      [tgt_len, tgt_len]
        src_padding:   [batch_size, src_len]
        tgt_padding:   [batch_size, tgt_len]
    """
    src_len, batch_size = src.shape
    tgt_len, _ = tgt.shape

    # Source has no causal structure; just a zero mask
    src_mask = torch.zeros((src_len, src_len), device=device).type(torch.bool)

    # Target uses causal mask
    tgt_mask = generate_square_subsequent_mask(tgt_len).to(device)

    # Padding masks [batch_size, seq_len]
    src_padding_mask = (src == pad_idx).transpose(0, 1)  # [batch_size, src_len]
    tgt_padding_mask = (tgt == pad_idx).transpose(0, 1)  # [batch_size, tgt_len]

    return src_mask, tgt_mask, src_padding_mask, tgt_padding_mask
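
The two kinds of mask are easier to see on a tiny example. This sketch rebuilds them with nested lists instead of tensors: the causal mask has the same 0 / -inf pattern that `generate_square_subsequent_mask` produces, and the padding mask is shown in the `[batch_size, seq_len]` layout that `create_mask` returns. The function names and the toy batch are assumptions.

```python
from typing import List

NEG_INF = float("-inf")

def causal_mask(size: int) -> List[List[float]]:
    """Row i may attend to columns 0..i (0.0); later columns are blocked (-inf)."""
    return [[0.0 if col <= row else NEG_INF for col in range(size)]
            for row in range(size)]

def padding_mask(batch: List[List[int]], pad_idx: int) -> List[List[bool]]:
    """batch is [batch_size][seq_len]; True marks padded positions to ignore."""
    return [[tok == pad_idx for tok in seq] for seq in batch]

for row in causal_mask(3):
    print(row)
print(padding_mask([[2, 7, 3, 0], [2, 3, 0, 0]], pad_idx=0))
```

The causal mask is shared by every sentence in the batch, whereas the padding mask differs per sentence.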


### Dataset Preprocessing

In [16]:
from google.colab import files
files.upload()  # Upload kaggle.json here


Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"<redacted>","key":"<redacted>"}'}

In [17]:
import os
import json

# Make directory and move the file
!mkdir -p ~/.kaggle

!mv kaggle.json ~/.kaggle/

# Set permissions
!chmod 600 ~/.kaggle/kaggle.json

# Set environment variables
os.environ['KAGGLE_CONFIG_DIR'] = "/root/.kaggle"


In [18]:
from google.colab import userdata
import os

#os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
#os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')

In [19]:
!kaggle datasets download -d zainuddin123/parallel-corpus-for-english-urdu-language


! unzip "parallel-corpus-for-english-urdu-language.zip"

Dataset URL: https://www.kaggle.com/datasets/zainuddin123/parallel-corpus-for-english-urdu-language
License(s): unknown
Downloading parallel-corpus-for-english-urdu-language.zip to /content
  0% 0.00/419k [00:00<?, ?B/s]
100% 419k/419k [00:00<00:00, 1.06GB/s]
Archive:  parallel-corpus-for-english-urdu-language.zip
  inflating: Dataset/english-corpus.txt  
  inflating: Dataset/urdu-corpus.txt  


In [20]:
import spacy
import torch
from torch.utils.data import Dataset
from collections import Counter
import pandas as pd
from sklearn.model_selection import train_test_split

# Load spaCy Urdu tokenizer
nlp_urdu = spacy.blank("ur")

# ----------------------------
# Step 1: Load Data
# ----------------------------
def load_parallel_corpus(eng_path, urd_path):
    with open(eng_path, 'r', encoding='utf-8') as f:
        eng_lines = f.read().splitlines()
    with open(urd_path, 'r', encoding='utf-8') as f:
        urd_lines = f.read().splitlines()
    assert len(eng_lines) == len(urd_lines), "Mismatch in lines!"
    return pd.DataFrame({"English": eng_lines, "Urdu": urd_lines})


# ----------------------------
# Step 2: Urdu Tokenizer
# ----------------------------
def urdu_tokenize(text):
    doc = nlp_urdu(text)
    return [token.text for token in doc]


# ----------------------------
# Step 3: Build Separate Vocabularies for Urdu and English
# ----------------------------
def build_vocab(urdu_texts, eng_texts, min_freq=2):
    urdu_counter = Counter()
    eng_counter = Counter()

    # Tokenizing and counting frequencies for Urdu
    for ur in urdu_texts:
        urdu_counter.update(urdu_tokenize(ur))

    # Tokenizing and counting frequencies for English
    for en in eng_texts:
        eng_counter.update(en.strip().split())

    # Build Urdu vocabulary
    urdu_vocab = {"<PAD>": 0, "<UNK>": 1, "<SOS>": 2, "<EOS>": 3}
    urdu_idx = 4
    for tok, freq in urdu_counter.items():
        if freq >= min_freq:
            urdu_vocab[tok] = urdu_idx
            urdu_idx += 1

    # Build English vocabulary
    eng_vocab = {"<PAD>": 0, "<UNK>": 1, "<SOS>": 2, "<EOS>": 3}
    eng_idx = 4
    for tok, freq in eng_counter.items():
        if freq >= min_freq:
            eng_vocab[tok] = eng_idx
            eng_idx += 1

    return urdu_vocab, eng_vocab


# ----------------------------
# Step 4: Convert to IDs
# ----------------------------
def tokens_to_ids(tokens, vocab, max_len):
    ids = [vocab["<SOS>"]]
    ids += [vocab.get(t, vocab["<UNK>"]) for t in tokens]
    ids.append(vocab["<EOS>"])
    if len(ids) < max_len:
        ids += [vocab["<PAD>"]] * (max_len - len(ids))
    else:
        ids = ids[:max_len]
    return ids


# ----------------------------
# Step 5: Dataset
# ----------------------------
class EngUrdDataset(Dataset):
    def __init__(self, dataframe, urdu_vocab, eng_vocab, max_len=32):
        self.df = dataframe
        self.urdu_vocab = urdu_vocab
        self.eng_vocab = eng_vocab
        self.max_len = max_len

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        eng = self.df.iloc[idx]['English']
        urd = self.df.iloc[idx]['Urdu']

        eng_ids = tokens_to_ids(eng.strip().split(), self.eng_vocab, self.max_len)
        urd_ids = tokens_to_ids(urdu_tokenize(urd), self.urdu_vocab, self.max_len)

        return {
            "input_ids": torch.tensor(urd_ids, dtype=torch.long),
            "labels": torch.tensor(eng_ids, dtype=torch.long)
        }


# ----------------------------
# Step 6: Preprocessing Pipeline with Split
# ----------------------------
def preprocess_pipeline(eng_path, urd_path, max_len=32, val_ratio=0.1, test_ratio=0.1):
    df = load_parallel_corpus(eng_path, urd_path)

    # Split into train/val/test
    train_df, temp_df = train_test_split(df, test_size=val_ratio + test_ratio, random_state=42)
    val_df, test_df = train_test_split(temp_df, test_size=test_ratio / (val_ratio + test_ratio), random_state=42)

    # Build separate vocabularies for Urdu and English
    urdu_vocab, eng_vocab = build_vocab(train_df['Urdu'], train_df['English'], min_freq=2)

    # Print max vocab size for both English and Urdu
    print("Max Vocabulary Size for Urdu:", len(urdu_vocab))
    print("Max Vocabulary Size for English:", len(eng_vocab))

    # Create datasets
    train_dataset = EngUrdDataset(train_df, urdu_vocab, eng_vocab, max_len=max_len)
    val_dataset = EngUrdDataset(val_df, urdu_vocab, eng_vocab, max_len=max_len)
    test_dataset = EngUrdDataset(test_df, urdu_vocab, eng_vocab, max_len=max_len)

    return train_dataset, val_dataset, test_dataset, urdu_vocab, eng_vocab
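
The vocabulary and ID-conversion steps above can be traced on a toy corpus. This is a simplified sketch: it uses whitespace tokenization for everything, and the names `build_vocab_sketch` and `encode` as well as the three sentences are made up for illustration.

```python
from collections import Counter
from typing import Dict, List

SPECIALS = {"<PAD>": 0, "<UNK>": 1, "<SOS>": 2, "<EOS>": 3}

def build_vocab_sketch(sentences: List[str], min_freq: int = 2) -> Dict[str, int]:
    """Keep tokens seen at least min_freq times; ids start after the special tokens."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = dict(SPECIALS)
    for tok, freq in counts.items():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Wrap in <SOS>/<EOS>, map unknown tokens to <UNK>, then pad or truncate to max_len."""
    ids = [vocab["<SOS>"]] + [vocab.get(t, vocab["<UNK>"]) for t in tokens] + [vocab["<EOS>"]]
    ids += [vocab["<PAD>"]] * (max_len - len(ids))  # adds nothing when ids is already long enough
    return ids[:max_len]

vocab = build_vocab_sketch(["i am here", "i am late", "you are here"])
print(vocab)
print(encode("i am early".split(), vocab, max_len=8))
print(encode("i am here i am here".split(), vocab, max_len=6))
```

Two details are worth noticing. Tokens below `min_freq` ("late", "you", "are") never enter the vocabulary and become `<UNK>` at encoding time. Truncation cuts from the end, so a sentence longer than `max_len - 2` tokens loses its `<EOS>` marker.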

In [21]:
eng_path = "/content/Dataset/english-corpus.txt"
urd_path = "/content/Dataset/urdu-corpus.txt"
train_dataset, val_dataset, test_dataset, urdu_vocab, eng_vocab = preprocess_pipeline(eng_path, urd_path)

Max Vocabulary Size for Urdu: 2968
Max Vocabulary Size for English: 2815
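
The pipeline was run with the default `val_ratio=0.1` and `test_ratio=0.1`, which `preprocess_pipeline` turns into a two-stage split. The arithmetic is sketched below for a hypothetical corpus of 1000 sentence pairs. The real corpus size is not shown in this notebook, and scikit-learn's own rounding may differ by a row.

```python
from typing import Tuple

def split_sizes(n_rows: int, val_ratio: float = 0.1, test_ratio: float = 0.1) -> Tuple[int, int, int]:
    """Approximate (train, val, test) sizes of the two-stage split."""
    holdout_fraction = val_ratio + test_ratio              # stage 1: fraction held out of training
    test_share_of_holdout = test_ratio / holdout_fraction  # stage 2: share of the holdout used for test
    n_holdout = round(n_rows * holdout_fraction)
    n_test = round(n_holdout * test_share_of_holdout)
    return n_rows - n_holdout, n_holdout - n_test, n_test

print(split_sizes(1000))
```

The second `test_size` has to be rescaled because it is a fraction of the held-out rows, not of the full dataset.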


In [22]:
from torch.nn.utils.rnn import pad_sequence

PAD_IDX = 0   # assuming 0 is your pad token in both input_ids and labels

def collate_fn(batch):
    # batch is a list of dicts, each with 'input_ids' and 'labels' tensors
    src_seqs = [example['input_ids'] for example in batch]
    tgt_seqs = [example['labels']    for example in batch]

    # pad_sequence defaults to (max_len, batch_size)
    src_batch = pad_sequence(src_seqs, padding_value=PAD_IDX)
    tgt_batch = pad_sequence(tgt_seqs, padding_value=PAD_IDX)
    return src_batch, tgt_batch
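
`pad_sequence` returns a time-major tensor, so each batch reaches the model as `[seq_len, batch_size]`. The sketch below reproduces that layout with plain lists; `pad_and_stack_time_major` is a hypothetical helper, not a PyTorch function.

```python
from typing import List

def pad_and_stack_time_major(seqs: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Pad every sequence to the longest one, then transpose [batch][time] -> [time][batch]."""
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad_value] * (max_len - len(s)) for s in seqs]
    return [[padded[b][t] for b in range(len(padded))] for t in range(max_len)]

batch = pad_and_stack_time_major([[2, 5, 3], [2, 3]])
print(batch)  # each inner list is one time step across the batch
```

In this notebook every example is already padded to `max_len=32` by `tokens_to_ids`, so the call mostly stacks the sequences into a `[32, batch_size]` tensor.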


In [23]:
train_dataloader = DataLoader(
    train_dataset,
    batch_size=128,
    shuffle=True,
    collate_fn=collate_fn,
)

val_dataloader = DataLoader(
    val_dataset,
    batch_size=128,
    shuffle=False,
    collate_fn=collate_fn,
)

test_dataloader = DataLoader(
    test_dataset,
    batch_size=128,
    shuffle=False,
    collate_fn=collate_fn,
)


In [24]:
BATCH_SIZE = 128

PAD_IDX = urdu_vocab['<PAD>']  # padding token
BOS_IDX = urdu_vocab['<SOS>']  # beginning-of-sentence token
EOS_IDX = urdu_vocab['<EOS>']  # end-of-sentence token

### Model Instantiation and Training

In [25]:
def train_epoch(model, train_iter, optimizer):
  model.train()
  losses = 0
  for idx, (src, tgt) in enumerate(train_iter):

      src = src.to(device)   # now [T=32, B=128]
      tgt = tgt.to(device)    # now [T=32, B=128]

      tgt_input = tgt[:-1, :]

      src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input, PAD_IDX, DEVICE)

      # print(f"src shape: {src.shape}")
      # print(f"tgt shape: {tgt.shape}")
      # print(f"src_mask shape: {src_mask.shape}")
      # print(f"tgt_mask shape: {tgt_mask.shape}")
      # print(f"src_padding_mask: {src_padding_mask.shape}")

      #logits = model(src, tgt_input, src_mask, tgt_mask,src_padding_mask, tgt_padding_mask, src_padding_mask)
      logits = model(
        src,
        tgt_input,
        src_mask=src_mask,                           # [T_src, T_src]
        tgt_mask=tgt_mask,                           # [T_tgt, T_tgt]
        src_padding_mask=src_padding_mask,       # [B, T_src]
        tgt_padding_mask=tgt_padding_mask,       # [B, T_tgt]
        memory_key_padding_mask=src_padding_mask     # same as encoder padding
      )


      optimizer.zero_grad()

      tgt_out = tgt[1:,:]
      loss = loss_fn(logits.reshape(-1, logits.shape[-1]), tgt_out.reshape(-1))
      loss.backward()

      optimizer.step()
      losses += loss.item()
  return losses / len(train_iter)
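
The slices `tgt[:-1, :]` and `tgt[1:, :]` implement teacher forcing: at every step the decoder sees the gold tokens so far and is scored on the next one. A small sketch on a single hypothetical target sequence (the token ids are made up):

```python
# One target sequence: <SOS>=2, two word ids, <EOS>=3, then <PAD>=0.
tgt = [2, 11, 12, 3, 0, 0]

tgt_input = tgt[:-1]  # what the decoder is fed
tgt_out = tgt[1:]     # what the decoder must predict, shifted left by one

for fed, expected in zip(tgt_input, tgt_out):
    print(f"decoder input {fed:>2} -> expected output {expected:>2}")
```

The input and output have the same length, which is why the logits and `tgt_out` can be flattened and compared position by position in the loss.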

In [26]:
@torch.no_grad()  # no gradients are needed for validation, which saves memory
def evaluate(model, val_iter):
    model.eval()
    losses = 0
    for idx, (src, tgt) in (enumerate(val_iter)):
        src = src.to(device)
        tgt = tgt.to(device)

        tgt_input = tgt[:-1, :]

        src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input, PAD_IDX, DEVICE)

        logits = model(src, tgt_input, src_mask, tgt_mask,
                                src_padding_mask, tgt_padding_mask, src_padding_mask)
        tgt_out = tgt[1:,:]
        loss = loss_fn(logits.reshape(-1, logits.shape[-1]), tgt_out.reshape(-1))
        losses += loss.item()
    return losses / len(val_iter)

In [27]:
SRC_VOCAB_SIZE = len(urdu_vocab)
TGT_VOCAB_SIZE = len(eng_vocab)

EMB_SIZE = 512

NHEAD = 8

FFN_HID_DIM = 512

BATCH_SIZE = 128

NUM_ENCODER_LAYERS = 3

NUM_DECODER_LAYERS = 3

NUM_EPOCHS = 16  # overridden to 15 in the training cell

DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device = DEVICE  # lowercase alias used by the training and decoding functions

transformer = Seq2SeqTransformer(NUM_ENCODER_LAYERS, NUM_DECODER_LAYERS,
                                 EMB_SIZE, SRC_VOCAB_SIZE, TGT_VOCAB_SIZE,
                                 FFN_HID_DIM)

for p in transformer.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

transformer_spe = transformer.to(device)

loss_fn = torch.nn.CrossEntropyLoss(ignore_index=PAD_IDX)

optimizer = torch.optim.Adam(
    transformer.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9
)
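
Passing `ignore_index=PAD_IDX` to the loss means padded target positions contribute nothing, and the mean is taken over the remaining positions only. The plain-Python sketch below illustrates that averaging rule; `cross_entropy_ignore` and the toy logits are assumptions, and the sketch assumes at least one non-ignored target.

```python
import math
from typing import List

def cross_entropy_ignore(logits: List[List[float]], targets: List[int], ignore_index: int) -> float:
    """Mean negative log-likelihood over targets that are not ignore_index."""
    total, counted = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padded position: skipped entirely
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[target]
        counted += 1
    return total / counted

logits = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [5.0, 1.0, 1.0]]
targets = [1, 2, 0]  # the last target is padding (index 0)
loss = cross_entropy_ignore(logits, targets, ignore_index=0)
print(round(loss, 4))
```

Without `ignore_index`, a model could lower its loss simply by predicting padding well, which says nothing about translation quality.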

In [28]:
NUM_EPOCHS = 15
spe_tloss = []
spe_vloss = []

for epoch in range(1, NUM_EPOCHS+1):
    start_time = time.time()
    train_loss = train_epoch(transformer_spe, train_dataloader, optimizer)
    end_time = time.time()
    val_loss = evaluate(transformer_spe, val_dataloader)
    spe_tloss.append(train_loss)
    spe_vloss.append(val_loss)
    print((f"Epoch: {epoch}, Train loss: {train_loss:.3f}, Val loss: {val_loss:.3f}, "
          f"Epoch time = {(end_time - start_time):.3f}s"))



Epoch: 1, Train loss: 4.701, Val loss: 3.661, Epoch time = 22.425s
Epoch: 2, Train loss: 3.478, Val loss: 2.978, Epoch time = 21.716s
Epoch: 3, Train loss: 2.920, Val loss: 2.562, Epoch time = 22.126s
Epoch: 4, Train loss: 2.524, Val loss: 2.258, Epoch time = 22.887s
Epoch: 5, Train loss: 2.212, Val loss: 2.015, Epoch time = 22.930s
Epoch: 6, Train loss: 1.950, Val loss: 1.830, Epoch time = 22.472s
Epoch: 7, Train loss: 1.731, Val loss: 1.677, Epoch time = 22.452s
Epoch: 8, Train loss: 1.546, Val loss: 1.565, Epoch time = 22.646s
Epoch: 9, Train loss: 1.381, Val loss: 1.480, Epoch time = 22.688s
Epoch: 10, Train loss: 1.247, Val loss: 1.387, Epoch time = 22.565s
Epoch: 11, Train loss: 1.125, Val loss: 1.333, Epoch time = 22.568s
Epoch: 12, Train loss: 1.014, Val loss: 1.297, Epoch time = 22.693s
Epoch: 13, Train loss: 0.927, Val loss: 1.262, Epoch time = 22.652s
Epoch: 14, Train loss: 0.843, Val loss: 1.217, Epoch time = 23.076s
Epoch: 15, Train loss: 0.767, Val loss: 1.219, Epoch time
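
The log above can be summarised with a few lines of Python. The sketch below copies the printed losses into lists, picks the epoch with the lowest validation loss, and converts that loss to perplexity. It assumes that `train_epoch` and `evaluate` return a mean per-token cross-entropy in nats (the default reduction of `CrossEntropyLoss`); the helper names `best_epoch` and `perplexity` are illustrative and not part of the notebook.

```python
import math

# Losses copied from the training log above (epochs 1-15).
train_losses = [4.701, 3.478, 2.920, 2.524, 2.212, 1.950, 1.731, 1.546,
                1.381, 1.247, 1.125, 1.014, 0.927, 0.843, 0.767]
val_losses = [3.661, 2.978, 2.562, 2.258, 2.015, 1.830, 1.677, 1.565,
              1.480, 1.387, 1.333, 1.297, 1.262, 1.217, 1.219]

def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest loss in the list."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]

def perplexity(cross_entropy):
    """Per-token perplexity for a mean cross-entropy given in nats (assumed)."""
    return math.exp(cross_entropy)

epoch, loss = best_epoch(val_losses)
gap = val_losses[-1] - train_losses[-1]  # final validation loss minus training loss
print(f"best epoch: {epoch}, val loss: {loss:.3f}, perplexity: {perplexity(loss):.2f}")
print(f"final train/val gap: {gap:.3f}")
```

Validation loss stops improving after epoch 14 while training loss keeps falling, which is the usual cue for early stopping or for checkpointing on validation loss.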

### Results


In [29]:
def greedy_decode(model, src, src_mask, max_len, start_symbol):
    """Generate target ids one token at a time, always taking the argmax."""
    src = src.to(device)
    src_mask = src_mask.to(device)

    # Encode the source once; the decoder attends to this memory at every step.
    memory = model.encode(src, src_mask).to(device)
    # ys holds the ids generated so far, shape [cur_len, 1], starting with BOS.
    ys = torch.ones(1, 1).fill_(start_symbol).type(torch.long).to(device)
    for i in range(max_len-1):
        # Causal mask so each position only attends to earlier positions.
        tgt_mask = (generate_square_subsequent_mask(ys.size(0))
                                    .type(torch.bool)).to(device)
        out = model.decode(ys, memory, tgt_mask)
        out = out.transpose(0, 1)
        # Vocabulary scores (logits) for the last decoder position only.
        prob = model.generator(out[:, -1])

        _, next_word = torch.max(prob, dim = 1)
        next_word = next_word.item()

        ys = torch.cat([ys, torch.ones(1, 1).type_as(src.data).fill_(next_word)], dim=0)
        if next_word == EOS_IDX:
            break
    return ys

In [30]:
def translate(model, src, src_vocab, tgt_vocab, src_tokenizer):
    model.eval()
    # Wrap the source ids in BOS/EOS; out-of-vocabulary tokens map to <UNK>.
    tokens = [BOS_IDX] + [src_vocab.get(tok, src_vocab["<UNK>"]) for tok in src_tokenizer(src)]+ [EOS_IDX]
    num_tokens = len(tokens)
    src = (torch.LongTensor(tokens).reshape(num_tokens, 1) )
    # All-False mask: every source position may attend to every other position.
    src_mask = (torch.zeros(num_tokens, num_tokens)).type(torch.bool)
    # Inference only, so gradients are not needed.
    with torch.no_grad():
        tgt_tokens = greedy_decode(model,  src, src_mask, max_len=num_tokens + 5, start_symbol=BOS_IDX).flatten()
    # Map ids back to words, skipping the leading BOS and any EOS/PAD tokens.
    rev_tgt_vocab = {idx: token for token, idx in tgt_vocab.items()}
    return ' '.join(
        rev_tgt_vocab.get(i, "<UNK>") for i in tgt_tokens[1:].tolist()
        if rev_tgt_vocab.get(i, "<UNK>") not in ("<EOS>", "<PAD>")
    )


In [31]:
translate(transformer_spe, "میں سونے کیلئے بستر پر لیٹ چکا ہوں", urdu_vocab, eng_vocab, urdu_tokenize)

'i have to go to bed'

In [None]:
translate(transformer_spe, "ایک کتا ہمارا پیچھا کر رہا ہے", urdu_vocab, eng_vocab, urdu_tokenize)

'a dog is following us'

In [32]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def tensor_to_sentence(tensor, vocab):
    inv_vocab = {idx: word for word, idx in vocab.items()}
    return [inv_vocab.get(tok.item(), "<unk>") for tok in tensor if tok.item() not in [PAD_IDX, BOS_IDX, EOS_IDX]]

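def compute_bleu_score(model, dataloader, src_vocab, tgt_vocab):
    """Average smoothed sentence-level BLEU over the dataloader.

    Predictions are teacher-forced: the decoder is fed the gold prefix
    tgt[:-1], so this scores next-token prediction rather than the
    free-running decoding done by greedy_decode. src_vocab is unused.
    """
    model.eval()
    bleu_scores = []
    smooth_fn = SmoothingFunction().method4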

    with torch.no_grad():
        for src, tgt in dataloader:
            src = src.to(device)
            tgt = tgt.to(device)

            tgt_input = tgt[:-1, :]
            src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input, PAD_IDX, device)

            logits = model(src, tgt_input, src_mask, tgt_mask,
                           src_padding_mask, tgt_padding_mask, src_padding_mask)

            predicted_tokens = torch.argmax(logits, dim=-1)  # [tgt_len, batch_size]

            for i in range(predicted_tokens.shape[1]):
                pred_sentence = tensor_to_sentence(predicted_tokens[:, i], tgt_vocab)
                ref_sentence = tensor_to_sentence(tgt[1:, i], tgt_vocab)
                bleu = sentence_bleu([ref_sentence], pred_sentence, smoothing_function=smooth_fn)
                bleu_scores.append(bleu)

    avg_bleu = sum(bleu_scores) / len(bleu_scores)
    print(f"\n✅ Average BLEU Score on Validation Set: {avg_bleu:.4f}")
    return avg_bleu


In [33]:
compute_bleu_score(transformer_spe, val_dataloader, urdu_vocab, eng_vocab)



✅ Average BLEU Score on Validation Set: 0.4004


0.40036160279914484

# **ENHANCEMENT 1 Evaluation Metrics**

In [3]:
!pip install evaluate
!pip install sacrebleu

Collecting sacrebleu
  Downloading sacrebleu-2.5.1-py3-none-any.whl.metadata (51 kB)
Collecting portalocker (from sacrebleu)
  Downloading portalocker-3.1.1-py3-none-any.whl.metadata (8.6 kB)
Collecting colorama (from sacrebleu)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Downloading sacrebleu-2.5.1-py3-none-any.whl (104 kB)
Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Downloading portalocker-3.1.1-py3-none-any.whl (19 kB)
Installing collected packages: portalocker, colorama, sacrebleu
Successfully installed colorama-0.4.6 portalocker-3.1.1 sacrebleu-2.5.1


In [34]:
import torch
import evaluate

# Load METEOR and chrF evaluators. With its default word_order=0 the "chrf"
# metric computes plain chrF; chrF++ needs word_order=2 passed to compute().
meteor_metric = evaluate.load("meteor")
chrf_metric = evaluate.load("chrf")

# tensor_to_sentence is already defined in the BLEU cell above and is reused here.

# Function to compute METEOR and chrF on teacher-forced predictions
def compute_meteor_chrf(model, dataloader, tgt_vocab):
    model.eval()
    all_preds = []
    all_refs = []

    with torch.no_grad():
        for src, tgt in dataloader:
            src = src.to(device)
            tgt = tgt.to(device)

            tgt_input = tgt[:-1, :]
            src_mask, tgt_mask, src_padding_mask, tgt_padding_mask = create_mask(src, tgt_input, PAD_IDX, device)

            logits = model(src, tgt_input, src_mask, tgt_mask,
                           src_padding_mask, tgt_padding_mask, src_padding_mask)

            predicted_tokens = torch.argmax(logits, dim=-1)

            for i in range(predicted_tokens.shape[1]):
                pred_sentence = tensor_to_sentence(predicted_tokens[:, i], tgt_vocab)
                ref_sentence = tensor_to_sentence(tgt[1:, i], tgt_vocab)

                all_preds.append(" ".join(pred_sentence))
                all_refs.append(" ".join(ref_sentence))

    # Compute metrics
    meteor_score = meteor_metric.compute(predictions=all_preds, references=all_refs)["meteor"]
    chrf_score = chrf_metric.compute(predictions=all_preds, references=all_refs)["score"]

    print(f"✅ METEOR Score: {meteor_score:.4f}")
    print(f"✅ CHRF++ Score: {chrf_score:.4f}")
    return meteor_score, chrf_score


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [35]:
compute_meteor_chrf(transformer_spe, val_dataloader, eng_vocab)


✅ METEOR Score: 0.6664
✅ CHRF++ Score: 59.8330


(np.float64(0.6663797850183125), 59.83299409369018)

# **ENHANCEMENT 2 Transfer Learning (e.g., MarianMT)**

In [36]:
!pip install transformers sentencepiece




In [37]:
from transformers import MarianMTModel, MarianTokenizer

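# Loads the English→Urdu checkpoint. The custom model above translates Urdu→English,
# so a like-for-like comparison would use the ur→en checkpoint ("Helsinki-NLP/opus-mt-ur-en").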
model_name = "Helsinki-NLP/opus-mt-en-ur"
marian_tokenizer = MarianTokenizer.from_pretrained(model_name)
marian_model = MarianMTModel.from_pretrained(model_name).to("cuda" if torch.cuda.is_available() else "cpu")



In [38]:
def translate_with_marian(sentences, model, tokenizer, device="cuda" if torch.cuda.is_available() else "cpu"):
    """
    Translate a list of English sentences into Urdu using MarianMT.
    """
    translations = []
    batch_size = 8  # You can adjust for performance

    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i+batch_size]
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(device)
        translated = model.generate(**encoded)
        decoded = tokenizer.batch_decode(translated, skip_special_tokens=True)
        translations.extend(decoded)

    return translations


In [39]:
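# Collect input and reference sentences for the MarianMT model (English → Urdu).
# NOTE: val_dataloader yields (Urdu source, English target) batches, while the
# lookups below decode src with eng_vocab and tgt with urdu_vocab. The collected
# strings are therefore id-to-word mismatches rather than real sentences.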
marian_inputs = []
ref_urdu_sentences = []

for src, tgt in val_dataloader:
    for i in range(src.shape[1]):  # batch dimension
        src_sentence = tensor_to_sentence(src[:, i], eng_vocab)
        tgt_sentence = tensor_to_sentence(tgt[1:, i], urdu_vocab)
        marian_inputs.append(" ".join(src_sentence))
        ref_urdu_sentences.append(" ".join(tgt_sentence))


In [40]:
marian_outputs = translate_with_marian(marian_inputs, marian_model, marian_tokenizer)


In [41]:
import evaluate
meteor = evaluate.load("meteor")
chrf = evaluate.load("chrf")
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# BLEU computation
bleu_score = corpus_bleu([[ref.split()] for ref in ref_urdu_sentences],
                         [hyp.split() for hyp in marian_outputs],
                         smoothing_function=SmoothingFunction().method4)

# METEOR + CHRF
meteor_score = meteor.compute(predictions=marian_outputs, references=ref_urdu_sentences)["meteor"]
chrf_score = chrf.compute(predictions=marian_outputs, references=ref_urdu_sentences)["score"]

print(f"\n🔁 MarianMT Evaluation:")
print(f"✅ BLEU Score: {bleu_score:.4f}")
print(f"✅ METEOR Score: {meteor_score:.4f}")
print(f"✅ CHRF++ Score: {chrf_score:.4f}")


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!



🔁 MarianMT Evaluation:
✅ BLEU Score: 0.0001
✅ METEOR Score: 0.0091
✅ CHRF++ Score: 8.3722


## 📊 Model Evaluation Summary: Custom Transformer vs MarianMT (Transfer Learning)

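This section compares our **custom-trained Transformer model** (Urdu → English) with the **pretrained MarianMT model** (Helsinki-NLP/opus-mt-en-ur, English → Urdu, used without fine-tuning). Both are scored on batches from the same validation dataloader using three metrics: **BLEU**, **METEOR**, and **chrF**. The two sets of numbers are not directly comparable: the custom model is scored on teacher-forced predictions with averaged sentence-level BLEU, MarianMT is scored on generated output with corpus-level BLEU, and the two models translate in opposite directions.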

---

### ✅ Custom Transformer Model Results
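| Metric         | Score      |
|----------------|------------|
| **BLEU**       | 0.4004     |
| **METEOR**     | 0.6664     |
| **chrF**       | 59.83      |

- 📌 **Interpretation**:
  - A BLEU score of **0.4004** (0–1 scale; mean of smoothed sentence-level scores) indicates substantial n-gram overlap with the reference translations for a low-resource pair like Urdu–English. The predictions are teacher-forced, so this figure is likely optimistic compared with free-running greedy decoding.
  - The METEOR score of **0.6664** suggests that most reference words are matched, counting stems and synonyms, with reasonable word order.
  - The chrF score of **59.83** (0–100 scale) shows that character n-gram overlap is also high. The cell prints this as "CHRF++", but `chrf_metric.compute` is called with its defaults, which give plain chrF; chrF++ additionally counts word n-grams (`word_order=2`).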
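
To make "n-gram overlap" concrete, here is a minimal sketch of the clipped (modified) n-gram precision that BLEU is built on. It is not NLTK's implementation: full BLEU also combines several n-gram orders with a geometric mean and applies a brevity penalty. The reference sentence is hypothetical and chosen only for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(hypothesis, reference, n):
    """Share of hypothesis n-grams found in the reference, with each
    n-gram credited at most as often as it occurs in the reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

hyp = "i have to go to bed".split()   # model output shown earlier in the notebook
ref = "i am going to bed".split()     # hypothetical reference, for illustration only

for n in (1, 2):
    print(f"{n}-gram precision: {clipped_precision(hyp, ref, n):.2f}")
```

Clipping is what stops a hypothesis from being rewarded for repeating a matching word: "to" appears twice in the hypothesis but is credited only once because the reference contains it once.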

---

### 🔁 MarianMT (Transfer Learning) Results
| Metric         | Score      |
|----------------|------------|
| **BLEU**       | 0.0001     |
| **METEOR**     | 0.0091     |
| **chrF**       | 8.37       |

- ⚠️ **Interpretation**:
  - The pretrained MarianMT model scores **near zero** on every metric in this setup.
  - These scores most likely reflect the evaluation pipeline rather than MarianMT itself. The validation batches hold Urdu source ids and English target ids, but the collection cell decodes `src` with `eng_vocab` and `tgt` with `urdu_vocab`, so MarianMT receives strings such as "play model helps <UNK> <UNK>" instead of real English sentences, and the references are mismatched in the same way.
  - chrF at **8.37** shows the same breakdown at the character level.
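
One way a vocabulary mismatch can arise inside the evaluation pipeline itself is decoding token ids with the wrong side's vocabulary. The sketch below uses two tiny hypothetical vocabularies (not the notebook's `urdu_vocab` and `eng_vocab`) and a hypothetical helper `ids_to_tokens` to show that the same ids turn into unrelated words when looked up in the other language's table.

```python
def ids_to_tokens(ids, vocab, specials=()):
    """Map ids back to tokens via the inverse of `vocab`, skipping special ids."""
    inverse = {idx: tok for tok, idx in vocab.items()}
    return [inverse.get(i, "<UNK>") for i in ids if i not in specials]

# Hypothetical toy vocabularies: ids are assigned independently per language,
# so the same id means something unrelated on each side.
src_vocab = {"<PAD>": 0, "<BOS>": 1, "<EOS>": 2, "ایک": 3, "کتا": 4}
tgt_vocab = {"<PAD>": 0, "<BOS>": 1, "<EOS>": 2, "play": 3, "model": 4}
specials = {0, 1, 2}

src_ids = [1, 3, 4, 2]  # <BOS> ایک کتا <EOS> in the source vocabulary

right = ids_to_tokens(src_ids, src_vocab, specials)  # source ids, source vocabulary
wrong = ids_to_tokens(src_ids, tgt_vocab, specials)  # source ids, target vocabulary

print("decoded with source vocab:", " ".join(right))
print("decoded with target vocab:", " ".join(wrong))
```

For a like-for-like comparison, source ids would be decoded with the source vocabulary, references with the target vocabulary, and the pretrained checkpoint would translate in the same direction as the custom model.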

---

## 🧠 Key Takeaways

- 🔬 **Custom Model Scores Higher Here**: On the recorded numbers, our Transformer scores far above MarianMT on all three metrics. The gap cannot be attributed to **domain-specific training** alone, because the two evaluations differ in translation direction, in decoding (teacher-forced versus generated), and in how sentences were rebuilt from token ids.
- 🧪 **MarianMT Needs a Fair Setup First**: Pretrained models like MarianMT may still need **fine-tuning on representative data** (Urdu script, domain-specific vocabulary, etc.), but that should be judged only after evaluating the matching direction (ur→en) on correctly decoded input.
- 📈 **BLEU Is Not Enough**: All three metrics agree on the ranking here, so **METEOR and chrF** mainly serve as a cross-check on BLEU at the word and character level rather than revealing a different picture.

---

## 🔧 Future Improvements

1. **Fine-tune MarianMT** on our dataset for better cross-model benchmarking.
2. **Expand training data** with more Urdu–English pairs from news, health, or literature domains.
3. **Try subword tokenization (e.g., SentencePiece)** in our model to improve generalization across rare or unseen words.
4. **Implement beam search** during decoding to increase fluency in generated sentences.
5. **Visualize attention maps** to understand what the model focuses on during translation, which is useful for debugging and presentations.
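
As a rough illustration of item 4, the sketch below implements a plain beam search over a toy next-token table. It is an assumption-laden example, not a drop-in replacement for `greedy_decode`: `step_fn`, the token ids, and the probabilities are hypothetical, there is no length normalization, and wiring it to the Transformer would require a `step_fn` that runs `model.decode` and `model.generator` on the prefix.

```python
import math

def beam_search(step_fn, bos, eos, beam_size=3, max_len=10):
    """Return (tokens, log_prob) of the best hypothesis found.

    step_fn(prefix) must return a dict {next_token: log_prob}.
    """
    beams = [([bos], 0.0)]   # unfinished hypotheses
    finished = []            # hypotheses that have emitted EOS
    for _ in range(max_len - 1):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_fn(prefix).items():
                candidates.append((prefix + [token], score + logp))
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:
            break
    finished.extend(beams)   # unfinished hypotheses still compete at the end
    return max(finished, key=lambda c: c[1])

# Toy model: BOS=0, EOS=1, and two word ids 2 and 3 (all hypothetical).
BOS, EOS = 0, 1
table = {
    (0,): {2: math.log(0.6), 3: math.log(0.4)},
    (0, 2): {1: math.log(0.55), 3: math.log(0.45)},
    (0, 3): {1: math.log(0.9), 2: math.log(0.1)},
}

def step_fn(prefix):
    # Unknown prefixes end the sentence with probability 1.
    return table.get(tuple(prefix), {EOS: 0.0})

greedy_tokens, greedy_score = beam_search(step_fn, BOS, EOS, beam_size=1)
beam_tokens, beam_score = beam_search(step_fn, BOS, EOS, beam_size=2)
print("beam_size=1:", greedy_tokens, round(math.exp(greedy_score), 2))
print("beam_size=2:", beam_tokens, round(math.exp(beam_score), 2))
```

With `beam_size=1` the search is greedy and commits to the locally best first token; with `beam_size=2` it keeps the runner-up alive long enough to find a sequence with higher total probability.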

---

## 🏁 Conclusion

This experiment suggests that a small custom Transformer can learn useful Urdu → English translation in a **low-resource language setting**, reaching a teacher-forced BLEU of 0.40 on the validation set. Pretrained models offer convenience, but **they need to be evaluated in the matching direction on correctly decoded input, and possibly fine-tuned, before they can be compared with task-specific baselines**. The custom Transformer can be further improved with the targeted enhancements listed above.


In [42]:
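# ref_urdu_sentences holds the reference targets collected in In [39],
# not output from the custom model, despite the "Custom Model Urdu" label.
for i in range(5):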
    print(f"ENGLISH: {marian_inputs[i]}")
    print(f"Custom Model Urdu: {ref_urdu_sentences[i]}")
    print(f"MarianMT Urdu: {marian_outputs[i]}")
    print("-" * 50)


ENGLISH: play model helps <UNK> <UNK>
Custom Model Urdu: جو خدمت کوریائی درخت
MarianMT Urdu: ماڈل حکمت عملی نے مدد فرمائی (جیسے لوگ دیکھ سکتے ہیں)۔
--------------------------------------------------
ENGLISH: have model dream beyond
Custom Model Urdu: اگر تبدیلی میرا ہلکا
MarianMT Urdu: مثال کے طور پر اُنہوں نے ایک نمونہ قائم کِیا ہے جس پر عمل کرنے سے ہم بہت کچھ سیکھ سکتے ہیں ۔
--------------------------------------------------
ENGLISH: behind have model down whom behind rest
Custom Model Urdu: ہوں اگر پکڑی نقصان
MarianMT Urdu: پیچھے پیچھے پیچھے کے لئے ایک نمونہ ہے جو پیچھے رہ گیا ہے
--------------------------------------------------
ENGLISH: behind may lets whos door knife out impressed ever
Custom Model Urdu: مذاق تو موٹا زیادہ پیانو اسی سال
MarianMT Urdu: دروازے پر دستک دینے والے کے پیچھے ہو سکتا ہے کبھی اس سے متاثر ہو سکتا ہے
--------------------------------------------------
ENGLISH: play see understand everyone happened
Custom Model Urdu: جو قبول لچکدار کرو
MarianMT Urdu: کھیلتے ہ