# Exploring the Transformer Architecture: An English-French Translation Engine

**Group:**
* Havidz Ridho Pratama - 122140160
* Royfran Roger Valentino - 122140239

This project implements the Transformer architecture from scratch in PyTorch for a machine translation task. The exploration focuses on making each stage of building the model clear, from data preparation through inference.

## 1. Data Preparation (Text Preprocessing)

This stage covers every step needed to turn raw text (sentences) into tensors that the model can process.

**Steps Performed:**
1.  **Loading the Data**: Two `.csv` files (`small_vocab_en.csv` and `small_vocab_fr.csv`) are read as plain text, line by line, and then combined into a single Pandas DataFrame with one sentence pair per row.
2.  **Tokenization**: We tokenize with `spaCy`, using the `en_core_web_sm` model for English and `fr_core_news_sm` for French. All tokens are lowercased.
3.  **Building the Vocabulary**: A custom `Vocabulary` class maps each unique token to an integer index. It reserves four special tokens (`<PAD>`, `<SOS>`, `<EOS>`, `<UNK>`) and uses `freq_threshold=2`, so tokens that occur fewer than two times are left out and are encoded as `<UNK>`.
4.  **Custom Dataset**: A `TranslationDataset` class (a subclass of PyTorch's `Dataset`) is defined. Its `__getitem__` method fetches a sentence pair, numericalizes it (token to index), and wraps each sequence in `<SOS>` and `<EOS>`.
5.  **Collate Function (Padding)**: A custom callable class, `MyCollate`, takes a group of samples (a `batch`) and pads every sequence with `<PAD>` so that all sequences in the batch share the same length, which is required to stack them into a single tensor.
6.  **DataLoader**: A random sample of 20,000 sentence pairs is split into 19,000 training pairs and 1,000 validation pairs, served by `train_loader` and `val_loader` with `BATCH_SIZE=100`.
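
Steps 3 and 4 can be illustrated on a toy corpus. The following is a minimal sketch, assuming a plain whitespace tokenizer in place of spaCy; the helper names `build_vocab` and `numericalize` are hypothetical and only mirror the behavior described above.

```python
from collections import Counter

SPECIALS = ["<PAD>", "<SOS>", "<EOS>", "<UNK>"]  # indices 0..3, as in the list above


def build_vocab(sentences, freq_threshold=2):
    """Map every token seen at least `freq_threshold` times to an integer index."""
    # Assumption: whitespace splitting stands in for the spaCy tokenizer.
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    stoi = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok, count in counts.items():
        if count >= freq_threshold:
            stoi[tok] = len(stoi)
    return stoi


def numericalize(sentence, stoi):
    """Token-to-index conversion wrapped in <SOS> ... <EOS>; rare tokens become <UNK>."""
    body = [stoi.get(tok, stoi["<UNK>"]) for tok in sentence.lower().split()]
    return [stoi["<SOS>"]] + body + [stoi["<EOS>"]]


corpus = ["the cat sleeps", "the dog sleeps", "the bird sings"]
stoi = build_vocab(corpus)
ids = numericalize("the cat sleeps", stoi)
print(stoi)
print(ids)
```

In this toy corpus `cat` occurs only once, so it falls below the threshold and is encoded as `<UNK>` (index 3).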

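The padding in step 5 amounts to the following, sketched here with plain Python lists rather than `pad_sequence`. The helper name `pad_batch` is hypothetical.

```python
PAD_IDX = 0  # index of <PAD>, matching the special-token layout described above


def pad_batch(sequences, pad_idx=PAD_IDX):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_idx] * (max_len - len(seq)) for seq in sequences]


batch = [[1, 4, 5, 2], [1, 4, 2], [1, 4, 5, 6, 7, 2]]
padded = pad_batch(batch)
for row in padded:
    print(row)
```
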
In [10]:
!python -m spacy download en_core_web_sm
!python -m spacy download fr_core_news_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
Collecting fr-core-news-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-3.8.0/fr_core_news_sm-3.8.0-py3-none-any.whl (16.3 MB)
Installing collected packages

In [11]:
# ==============================================================================
# DATA PREPARATION (Text Preprocessing)
# ==============================================================================
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import spacy
from collections import Counter

# Load the dataset: each file holds one sentence per line
try:
    with open('small_vocab_en.csv', 'r', encoding='utf-8') as f:
        en_sentences = [line.strip() for line in f]

    with open('small_vocab_fr.csv', 'r', encoding='utf-8') as f:
        fr_sentences = [line.strip() for line in f]

    # The two files must be line-aligned; stop here instead of leaving `df` undefined
    if len(en_sentences) != len(fr_sentences):
        raise ValueError(f"Line counts differ! English: {len(en_sentences)}, French: {len(fr_sentences)}")

    df = pd.DataFrame({
        'english': en_sentences,
        'french': fr_sentences
    })

    print("Successfully loaded and merged 2 files. Sample data:")
    print(df.head())

except FileNotFoundError:
    # Fall back to a one-row placeholder DataFrame
    print("Make sure 'small_vocab_en.csv' and 'small_vocab_fr.csv' have been uploaded.")
    df = pd.DataFrame({"english": ["hello world"], "french": ["bonjour le monde"]})

# Setup Tokenizer (Spacy)

spacy_eng = spacy.load('en_core_web_sm')
spacy_fra = spacy.load('fr_core_news_sm')

def tokenize_eng(text):
    return [tok.text.lower() for tok in spacy_eng.tokenizer(text)]

def tokenize_fra(text):
    return [tok.text.lower() for tok in spacy_fra.tokenizer(text)]

# Setup Vocabulary
class Vocabulary:
    def __init__(self, tokenizer, freq_threshold=2):
        self.itos = {0: "<PAD>", 1: "<SOS>", 2: "<EOS>", 3: "<UNK>"}
        self.stoi = {"<PAD>": 0, "<SOS>": 1, "<EOS>": 2, "<UNK>": 3}
        self.tokenizer = tokenizer
        self.freq_threshold = freq_threshold

    def __len__(self):
        return len(self.itos)

    def build_vocabulary(self, sentence_list):
        word_counts = Counter()
        idx = 4

        for sentence in sentence_list:
            for word in self.tokenizer(sentence):
                word_counts[word] += 1

        for word, count in word_counts.items():
            if count >= self.freq_threshold:
                self.stoi[word] = idx
                self.itos[idx] = word
                idx += 1

    def numericalize(self, text):
        tokenized_text = self.tokenizer(text)
        return [self.stoi.get(token, self.stoi["<UNK>"]) for token in tokenized_text]

# Build the vocabularies
freq_threshold = 2
eng_vocab = Vocabulary(tokenize_eng, freq_threshold)
fra_vocab = Vocabulary(tokenize_fra, freq_threshold)

# Cap the sample size so sampling also works when fewer than 20,000 rows are available
df_sample = df.sample(min(20000, len(df)), random_state=42).reset_index(drop=True)

eng_vocab.build_vocabulary(df_sample['english'].tolist())
fra_vocab.build_vocabulary(df_sample['french'].tolist())

print(f"English vocabulary size: {len(eng_vocab)}")
print(f"French vocabulary size: {len(fra_vocab)}")

# Custom Dataset
class TranslationDataset(Dataset):
    def __init__(self, df, eng_vocab, fra_vocab):
        self.df = df
        self.eng_vocab = eng_vocab
        self.fra_vocab = fra_vocab

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        eng_sentence = self.df.iloc[index]['english']
        fra_sentence = self.df.iloc[index]['french']

        eng_tokenized = [self.eng_vocab.stoi["<SOS>"]] + self.eng_vocab.numericalize(eng_sentence) + [self.eng_vocab.stoi["<EOS>"]]
        fra_tokenized = [self.fra_vocab.stoi["<SOS>"]] + self.fra_vocab.numericalize(fra_sentence) + [self.fra_vocab.stoi["<EOS>"]]

        return torch.tensor(eng_tokenized), torch.tensor(fra_tokenized)

# Collate Function (Padding)
class MyCollate:
    def __init__(self, pad_idx):
        self.pad_idx = pad_idx

    def __call__(self, batch):
        source_seqs = [item[0] for item in batch]
        target_seqs = [item[1] for item in batch]

        # Pad sequences
        source_padded = nn.utils.rnn.pad_sequence(source_seqs, batch_first=True, padding_value=self.pad_idx)
        target_padded = nn.utils.rnn.pad_sequence(target_seqs, batch_first=True, padding_value=self.pad_idx)

        return source_padded, target_padded

# Setup DataLoader
train_df = df_sample.iloc[:19000]
val_df = df_sample.iloc[19000:]

train_dataset = TranslationDataset(train_df, eng_vocab, fra_vocab)
val_dataset = TranslationDataset(val_df, eng_vocab, fra_vocab)

PAD_IDX = eng_vocab.stoi["<PAD>"]
BATCH_SIZE = 100

collate_fn = MyCollate(pad_idx=PAD_IDX)

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_fn)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, collate_fn=collate_fn)

print("DataLoaders ready.")

Successfully loaded and merged 2 files. Sample data:
                                             english  \
0  new jersey is sometimes quiet during autumn , ...   
1  the united states is usually chilly during jul...   
2  california is usually quiet during march , and...   
3  the united states is sometimes mild during jun...   
4  your least liked fruit is the grape , but my l...   

                                              french  
0  new jersey est parfois calme pendant l' automn...  
1  les états-unis est généralement froid en juill...  
2  california est généralement calme en mars , et...  
3  les états-unis est parfois légère en juin , et...  
4  votre moins aimé fruit est le raisin , mais mo...  

## 2. Transformer Architecture Definition

The Transformer is implemented from scratch by defining a class for each of its building blocks.

**Classes Defined:**
1.  **`PositionalEncoding`**: Adds position information to the *embeddings* using the `sin` and `cos` formulas, because the Transformer has no built-in notion of token order.
2.  **`MultiHeadAttention`**: Implements multi-head attention on top of *Scaled Dot-Product Attention*. The layer projects the Query, Key, and Value, splits them into several *heads*, computes the *attention* scores, applies the *mask*, and merges the heads back through an output projection.
3.  **`PositionwiseFeedForward`**: A simple MLP (Linear -> ReLU -> Linear, with dropout after the ReLU) applied after the *attention* block.
4.  **`EncoderLayer`**: One Encoder block made of `MultiHeadAttention` (self-attention) and `PositionwiseFeedForward`, each followed by a *residual connection* and *layer normalization*.
5.  **`DecoderLayer`**: One Decoder block made of two `MultiHeadAttention` layers (self-attention with a *look-ahead mask*, then *encoder-decoder attention*) and a `PositionwiseFeedForward`, again with a *residual connection* and *layer normalization* after each sublayer.
6.  **`Seq2SeqTransformer`**: The main class that ties all components together. It is also responsible for:
    * Initializing the `nn.Embedding` layers for the source and the target.
    * Building the stacks of `EncoderLayer` and `DecoderLayer`.
    * Defining `make_src_mask` (for padding) and `make_trg_mask` (for padding + *look-ahead*).
    * Defining the `forward` pass from `src` and `trg` to the output *logits* (raw, unnormalized output values).
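
For reference, the sinusoidal formula behind `PositionalEncoding` is `PE(pos, 2i) = sin(pos / 10000^(2i / d_model))` and `PE(pos, 2i + 1) = cos(pos / 10000^(2i / d_model))`. The following pure-Python sketch evaluates it directly; the function name is hypothetical and an even `d_model` is assumed.

```python
import math


def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # `i` walks over the even dimensions, so i / d_model equals 2k / d_model
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)      # even dimensions use sine
            table[pos][i + 1] = math.cos(angle)  # odd dimensions use cosine
    return table


pe = positional_encoding(max_len=4, d_model=6)
print([round(v, 4) for v in pe[0]])
print([round(v, 4) for v in pe[1]])
```

Position 0 always encodes to alternating 0 and 1 (since `sin(0) = 0` and `cos(0) = 1`), while later positions rotate each sine/cosine pair at a different frequency.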

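The core computation inside each head of `MultiHeadAttention` is `softmax(Q K^T / sqrt(head_dim)) V`, with masked positions set to a large negative score before the softmax. The single-head sketch below uses plain Python lists instead of tensors; the function names and toy values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, mask=None):
    """Single-head scaled dot-product attention on lists of vectors.

    mask[i][j] == 0 hides key j from query i, mirroring masked_fill(mask == 0, -1e9).
    """
    head_dim = len(K[0])
    output, weights = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim) for k in K]
        if mask is not None:
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors
        output.append([sum(w[j] * V[j][d] for j in range(len(V))) for d in range(len(V[0]))])
    return output, weights


Q = K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
causal = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]  # lower-triangular look-ahead mask
out, weights = attention(Q, K, V, mask=causal)
for row in weights:
    print([round(w, 3) for w in row])
```
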
In [12]:
# ==============================================================================
# TRANSFORMER ARCHITECTURE DEFINITION
# ==============================================================================
import math

# Class Positional Encoding
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len, device):
        super(PositionalEncoding, self).__init__()
        self.encoding = torch.zeros(max_len, d_model).to(device)
        position = torch.arange(0, max_len).float().unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model))

        self.encoding[:, 0::2] = torch.sin(position * div_term)
        self.encoding[:, 1::2] = torch.cos(position * div_term)
        self.encoding = self.encoding.unsqueeze(0)

    def forward(self, x):
        return x + self.encoding[:, :x.size(1)].detach()

# Class Multi-Head Attention
class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model
        self.head_dim = d_model // num_heads

        self.q_linear = nn.Linear(d_model, d_model)
        self.k_linear = nn.Linear(d_model, d_model)
        self.v_linear = nn.Linear(d_model, d_model)
        self.out_linear = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)

        # Linear projections
        Q = self.q_linear(query)
        K = self.k_linear(key)
        V = self.v_linear(value)

        # Split into heads
        Q = Q.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
        K = K.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
        V = V.view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled Dot-Product Attention
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.head_dim)

        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)

        attention = torch.softmax(scores, dim=-1)
        context = torch.matmul(attention, V)

        # Concatenate heads
        context = context.transpose(1, 2).contiguous().view(batch_size, -1, self.d_model)
        return self.out_linear(context)

# Class Position-wise Feed-Forward
class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model, ff_dim, dropout=0.1):
        super(PositionwiseFeedForward, self).__init__()
        self.linear1 = nn.Linear(d_model, ff_dim)
        self.linear2 = nn.Linear(ff_dim, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.linear2(self.dropout(torch.relu(self.linear1(x))))

# Class Encoder Layer
class EncoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, ff_dim, dropout=0.1):
        super(EncoderLayer, self).__init__()
        self.attention = MultiHeadAttention(d_model, num_heads)
        self.norm1 = nn.LayerNorm(d_model)
        self.feed_forward = PositionwiseFeedForward(d_model, ff_dim, dropout)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask):
        attn_output = self.attention(x, x, x, mask)
        x = x + self.dropout(attn_output)
        x = self.norm1(x)

        ff_output = self.feed_forward(x)
        x = x + self.dropout(ff_output)
        x = self.norm2(x)
        return x

# Class Decoder Layer
class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, ff_dim, dropout=0.1):
        super(DecoderLayer, self).__init__()
        self.self_attention = MultiHeadAttention(d_model, num_heads)
        self.norm1 = nn.LayerNorm(d_model)
        self.encoder_attention = MultiHeadAttention(d_model, num_heads)
        self.norm2 = nn.LayerNorm(d_model)
        self.feed_forward = PositionwiseFeedForward(d_model, ff_dim, dropout)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, enc_output, src_mask, trg_mask):
        self_attn_output = self.self_attention(x, x, x, trg_mask)
        x = x + self.dropout(self_attn_output)
        x = self.norm1(x)

        enc_attn_output = self.encoder_attention(x, enc_output, enc_output, src_mask)
        x = x + self.dropout(enc_attn_output)
        x = self.norm2(x)

        ff_output = self.feed_forward(x)
        x = x + self.dropout(ff_output)
        x = self.norm3(x)
        return x

# Full Transformer class
class Seq2SeqTransformer(nn.Module):
    def __init__(self, src_vocab_size, trg_vocab_size, d_model, num_heads, num_layers,
                 ff_dim, max_len, dropout, device):
        super(Seq2SeqTransformer, self).__init__()
        self.device = device
        self.src_embedding = nn.Embedding(src_vocab_size, d_model)
        self.trg_embedding = nn.Embedding(trg_vocab_size, d_model)
        self.pos_encoding = PositionalEncoding(d_model, max_len, device)

        self.encoder_layers = nn.ModuleList([EncoderLayer(d_model, num_heads, ff_dim, dropout) for _ in range(num_layers)])
        self.decoder_layers = nn.ModuleList([DecoderLayer(d_model, num_heads, ff_dim, dropout) for _ in range(num_layers)])

        self.fc_out = nn.Linear(d_model, trg_vocab_size)
        self.dropout = nn.Dropout(dropout)

    def make_src_mask(self, src):
        src_mask = (src != PAD_IDX).unsqueeze(1).unsqueeze(2)
        return src_mask.to(self.device)

    def make_trg_mask(self, trg):
        trg_pad_mask = (trg != PAD_IDX).unsqueeze(1).unsqueeze(3)
        trg_len = trg.shape[1]
        trg_sub_mask = torch.tril(torch.ones((trg_len, trg_len), device=self.device)).bool()
        trg_mask = trg_pad_mask & trg_sub_mask
        return trg_mask

    def forward(self, src, trg):
        src_mask = self.make_src_mask(src)
        trg_mask = self.make_trg_mask(trg)

        src_emb = self.dropout(self.pos_encoding(self.src_embedding(src)))
        trg_emb = self.dropout(self.pos_encoding(self.trg_embedding(trg)))

        enc_output = src_emb
        for layer in self.encoder_layers:
            enc_output = layer(enc_output, src_mask)

        dec_output = trg_emb
        for layer in self.decoder_layers:
            dec_output = layer(dec_output, enc_output, src_mask, trg_mask)

        return self.fc_out(dec_output)

print("Transformer architecture classes defined successfully.")

Transformer architecture classes defined successfully.
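
The combination performed by `make_trg_mask` can be checked on a tiny example. This sketch uses nested lists for a single sequence instead of batched tensors. As in the code above, the padding mask is applied along the query (row) axis and combined with a lower-triangular look-ahead mask through a logical AND.

```python
PAD_IDX = 0  # index of <PAD>


def make_trg_mask(trg):
    """Build one T x T boolean mask for a single target sequence.

    Entry [i][j] is True when query position i may attend to key position j:
    position i must not be padding, and j must not lie in the future (j <= i).
    """
    length = len(trg)
    return [
        [trg[i] != PAD_IDX and j <= i for j in range(length)]
        for i in range(length)
    ]


trg = [1, 7, 9, 0]  # <SOS>, two tokens, one <PAD>
mask = make_trg_mask(trg)
for row in mask:
    print([int(v) for v in row])
```

The last row is fully masked because that query position is padding. With right-padded sequences its label is also `<PAD>`, so a loss configured with `ignore_index=PAD_IDX` does not count it.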


## 3. Training Process

This stage trains the model for 1 epoch and reports detailed performance metrics, as the assignment requires.

**Steps Performed:**
1.  **Initialization**: The `Seq2SeqTransformer` model is initialized with the chosen hyperparameters (for example `D_MODEL=256` and `NUM_LAYERS=3`, kept small to speed up training). Weights are initialized with `xavier_uniform_` for stability.
2.  **Loss & Optimizer**: We use `CrossEntropyLoss` with `ignore_index=PAD_IDX` so that *padding* does not contribute to the *loss*. The optimizer is `Adam` with a learning rate of `0.0001`.
3.  **`get_accuracy` function**: A helper that computes per-token accuracy while ignoring `<PAD>` tokens.
4.  **`train_epoch` and `evaluate` functions**:
    * Both functions iterate over a `DataLoader`.
    * They run the *forward pass* with the target sliced for *teacher forcing*: `trg[:, :-1]` is the decoder input and `trg[:, 1:]` is the label.
    * They compute the *loss* and *accuracy* for every *batch*.
    * **Per-batch reporting**: As the assignment requires, `train_epoch` prints **TrainLoss** and **TrainAcc** and `evaluate` prints **ValLoss** and **ValAcc** **at the end of every batch**.
5.  **Training Run**: The model is trained for only **1 epoch**.
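
The teacher-forcing slicing in step 4 can be seen on a small padded batch. This sketch uses plain lists and illustrative token indices; the list comprehensions play the role of `trg[:, :-1]` and `trg[:, 1:]`.

```python
PAD_IDX, SOS, EOS = 0, 1, 2

# One padded target batch: <SOS> w1 w2 ... <EOS> <PAD>...
trg = [
    [SOS, 5, 6, 7, EOS],
    [SOS, 8, 9, EOS, PAD_IDX],
]

decoder_input = [row[:-1] for row in trg]  # trg[:, :-1]: what the decoder sees
labels = [row[1:] for row in trg]          # trg[:, 1:]: what it must predict

for inp, lab in zip(decoder_input, labels):
    print(inp, "->", lab)
```

At every position the label is simply the next token of the input, so the decoder learns to predict token `t + 1` from the ground-truth tokens up to `t`.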

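The per-token accuracy from step 3 can be sketched without tensors as follows. `masked_accuracy` is a hypothetical stand-in for `get_accuracy` that takes already-argmaxed predictions.

```python
PAD_IDX = 0


def masked_accuracy(predictions, labels, pad_idx=PAD_IDX):
    """Fraction of non-padding positions where the prediction equals the label."""
    pairs = [
        (p, t)
        for pred_row, label_row in zip(predictions, labels)
        for p, t in zip(pred_row, label_row)
        if t != pad_idx
    ]
    if not pairs:
        return 0.0
    return sum(p == t for p, t in pairs) / len(pairs)


labels = [[5, 6, 7, 2], [8, 9, 2, 0]]
predictions = [[5, 6, 6, 2], [8, 9, 2, 4]]  # the last prediction sits on a <PAD> label
accuracy = masked_accuracy(predictions, labels)
print(accuracy)
```

Seven positions carry a real label and six of them are predicted correctly; the prediction on the padded position is not counted either way.
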
In [13]:
# ==============================================================================
# TRAINING PROCESS
# ==============================================================================

# Initialize the model and hyperparameters
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

SRC_VOCAB_SIZE = len(eng_vocab)
TRG_VOCAB_SIZE = len(fra_vocab)
D_MODEL = 256
NUM_HEADS = 8
NUM_LAYERS = 3
FF_DIM = 512
MAX_LEN = 100
DROPOUT = 0.1
LEARNING_RATE = 0.0001
NUM_EPOCHS = 1

model = Seq2SeqTransformer(SRC_VOCAB_SIZE, TRG_VOCAB_SIZE, D_MODEL, NUM_HEADS,
                           NUM_LAYERS, FF_DIM, MAX_LEN, DROPOUT, DEVICE).to(DEVICE)

# Weight initialization
def initialize_weights(m):
    if hasattr(m, 'weight') and m.weight.dim() > 1:
        nn.init.xavier_uniform_(m.weight.data)
model.apply(initialize_weights)

optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# Helper functions (accuracy & loops)
def get_accuracy(output, target, pad_idx):
    output = output.argmax(dim=-1)

    non_pad_mask = (target != pad_idx)

    correct = (output == target)[non_pad_mask].float()

    if len(correct) == 0:
        return torch.tensor(0.0)

    acc = correct.sum() / len(correct)
    return acc

def train_epoch(model, loader, optimizer, criterion, pad_idx, device):
    model.train()
    epoch_loss = 0
    epoch_acc = 0

    for i, batch in enumerate(loader):
        src, trg = batch
        src, trg = src.to(device), trg.to(device)

        optimizer.zero_grad()

        output = model(src, trg[:, :-1])

        output_flat = output.reshape(-1, output.shape[2])
        target_flat = trg[:, 1:].reshape(-1)

        loss = criterion(output_flat, target_flat)
        loss.backward()
        optimizer.step()

        # Compute metrics
        batch_loss = loss.item()
        batch_acc = get_accuracy(output, trg[:, 1:], pad_idx).item()

        epoch_loss += batch_loss
        epoch_acc += batch_acc

        # Per-batch report
        print(f"  Batch {i+1}/{len(loader)} | TrainLoss: {batch_loss:.4f} | TrainAcc: {batch_acc:.4f}")

    return epoch_loss / len(loader), epoch_acc / len(loader)

def evaluate(model, loader, criterion, pad_idx, device):
    model.eval()
    epoch_loss = 0
    epoch_acc = 0

    with torch.no_grad():
        for i, batch in enumerate(loader):
            src, trg = batch
            src, trg = src.to(device), trg.to(device)

            output = model(src, trg[:, :-1])
            output_flat = output.reshape(-1, output.shape[2])
            target_flat = trg[:, 1:].reshape(-1)

            loss = criterion(output_flat, target_flat)

            # Compute metrics
            batch_loss = loss.item()
            batch_acc = get_accuracy(output, trg[:, 1:], pad_idx).item()

            epoch_loss += batch_loss
            epoch_acc += batch_acc

            # Per-batch report
            print(f"  Batch {i+1}/{len(loader)} | ValLoss: {batch_loss:.4f} | ValAcc: {batch_acc:.4f}")

    return epoch_loss / len(loader), epoch_acc / len(loader)

# Training
print(f"Starting training for {NUM_EPOCHS} epoch(s)...")

for epoch in range(NUM_EPOCHS):
    print(f"\n--- Epoch {epoch+1}/{NUM_EPOCHS} ---")

    print("Running training...")
    train_loss, train_acc = train_epoch(model, train_loader, optimizer, criterion, PAD_IDX, DEVICE)

    print("\nRunning validation...")
    val_loss, val_acc = evaluate(model, val_loader, criterion, PAD_IDX, DEVICE)

    print("-" * 50)
    print(f"EPOCH {epoch+1} RESULTS:")
    print(f"  Avg Train Loss: {train_loss:.4f} | Avg Train Acc: {train_acc:.4f}")
    print(f"  Avg Val Loss  : {val_loss:.4f} | Avg Val Acc  : {val_acc:.4f}")
    print("-" * 50)

print("Training finished.")

Starting training for 1 epoch(s)...

--- Epoch 1/1 ---
Running training...
  Batch 1/190 | TrainLoss: 5.9446 | TrainAcc: 0.0075
  Batch 2/190 | TrainLoss: 5.4090 | TrainAcc: 0.0838
  Batch 3/190 | TrainLoss: 5.1529 | TrainAcc: 0.0923
  Batch 4/190 | TrainLoss: 4.9883 | TrainAcc: 0.0908
  Batch 5/190 | TrainLoss: 4.8319 | TrainAcc: 0.0807
  Batch 6/190 | TrainLoss: 4.7717 | TrainAcc: 0.0787
  Batch 7/190 | TrainLoss: 4.6265 | TrainAcc: 0.0739
  Batch 8/190 | TrainLoss: 4.6473 | TrainAcc: 0.0787
  Batch 9/190 | TrainLoss: 4.5519 | TrainAcc: 0.0790
  Batch 10/190 | TrainLoss: 4.5403 | TrainAcc: 0.0946
  Batch 11/190 | TrainLoss: 4.5334 | TrainAcc: 0.0932
  Batch 12/190 | TrainLoss: 4.5081 | TrainAcc: 0.0977
  Batch 13/190 | TrainLoss: 4.4688 | TrainAcc: 0.1000
  Batch 14/190 | TrainLoss: 4.4991 | TrainAcc: 0.0824
  Batch 15/190 | TrainLoss: 4.3906 | TrainAcc: 0.1074
  Batch 16/190 | TrainLoss: 4.4095 | TrainAcc: 0.0883
  Batch 17/190 | TrainLoss: 4.3741 | TrainAcc: 0.1018
  Batch 18/190

## 4. Inference Process (Translation)

This stage documents how the trained model is used to translate new sentences from English into French.

**Steps Performed:**
1.  **`translate_sentence` function**: A dedicated function is written for inference.
2.  **Autoregressive Mode**: Unlike training, inference runs *autoregressively* (one token at a time):
    * The source sentence passes through the **Encoder** *only once* to obtain its context representation (`enc_output`).
    * The **Decoder** starts from the `<SOS>` token.
    * The Decoder predicts the next token (`pred_token`).
    * `pred_token` is then appended to the Decoder input to predict the token after it.
    * This repeats until the model predicts `<EOS>` or the maximum length is reached.
3.  **Trial Run**: We apply the function to a sentence from the *validation set* and to a new custom sentence.
    * **Result**: Because the model was trained for only 1 epoch, the translations are still rough. The validation example reproduces the sentence skeleton (`il aime ... , ... et ... .`) but repeats the frequent token `les`, and the custom sentence yields an incoherent output. The goal here is to demonstrate that the inference *pipeline* works end to end.
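
The autoregressive loop described in step 2 can be sketched independently of the model. In this minimal example `next_token_scores` is a hypothetical stand-in for the decoder plus output layer, and `toy_decoder` is a scripted scorer used only for illustration.

```python
SOS, EOS = 1, 2


def greedy_decode(next_token_scores, max_len=50):
    """Autoregressive greedy decoding.

    `next_token_scores(prefix)` stands in for the decoder: it takes the tokens
    generated so far and returns one score per vocabulary entry.
    """
    tokens = [SOS]
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        pred_token = max(range(len(scores)), key=scores.__getitem__)  # argmax
        tokens.append(pred_token)
        if pred_token == EOS:
            break
    return tokens[1:]  # drop <SOS>, as translate_sentence does


def toy_decoder(prefix):
    """Hypothetical scorer: emits tokens 3, 4, 5 in turn and then <EOS>."""
    script = [3, 4, 5, EOS]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if idx == target else 0.0 for idx in range(6)]


full = greedy_decode(toy_decoder)
clipped = greedy_decode(toy_decoder, max_len=2)
print(full)
print(clipped)
```

The first call stops because `<EOS>` is predicted, and the second stops because `max_len` is reached; these are the two exit conditions listed above.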

In [14]:
# ==============================================================================
# INFERENCE PROCESS (TRANSLATION)
# ==============================================================================

def translate_sentence(model, sentence, eng_vocab, fra_vocab, device, max_len=50):
    model.eval()

    # Tokenize the source sentence
    if isinstance(sentence, str):
        tokens = [tok.text.lower() for tok in spacy_eng.tokenizer(sentence)]
    else:
        tokens = [tok.lower() for tok in sentence]

    # Convert to indices and add <SOS> and <EOS>
    tokens_numerical = [eng_vocab.stoi["<SOS>"]] + [eng_vocab.stoi.get(token, eng_vocab.stoi["<UNK>"]) for token in tokens] + [eng_vocab.stoi["<EOS>"]]
    src_tensor = torch.LongTensor(tokens_numerical).unsqueeze(0).to(device)

    src_mask = model.make_src_mask(src_tensor)

    with torch.no_grad():
        enc_output = model.src_embedding(src_tensor)
        enc_output = model.pos_encoding(enc_output)
        for layer in model.encoder_layers:
            enc_output = layer(enc_output, src_mask)

    # Decoding loop (autoregressive)
    trg_indices = [fra_vocab.stoi["<SOS>"]]

    for i in range(max_len):
        trg_tensor = torch.LongTensor(trg_indices).unsqueeze(0).to(device)
        trg_mask = model.make_trg_mask(trg_tensor)

        with torch.no_grad():
            dec_output = model.trg_embedding(trg_tensor)
            dec_output = model.pos_encoding(dec_output)
            for layer in model.decoder_layers:
                dec_output = layer(dec_output, enc_output, src_mask, trg_mask)

            output = model.fc_out(dec_output)

        pred_token = output.argmax(2)[:, -1].item()
        trg_indices.append(pred_token)

        if pred_token == fra_vocab.stoi["<EOS>"]:
            break

    # Convert the indices back into words
    trg_tokens = [fra_vocab.itos[i] for i in trg_indices]

    return trg_tokens[1:]

print("\n--- INFERENCE TEST ---")

example_idx = 5
src_text = val_df.iloc[example_idx]['english']
trg_text = val_df.iloc[example_idx]['french']

print(f"Source sentence (English): {src_text}")
print(f"Reference translation (French): {trg_text}")

translation_result = translate_sentence(model, src_text, eng_vocab, fra_vocab, DEVICE)
print(f"Model translation: {' '.join(translation_result)}")

custom_sentence = "a cat is sitting on the roof"
print(f"\nCustom sentence: {custom_sentence}")
translation_result = translate_sentence(model, custom_sentence, eng_vocab, fra_vocab, DEVICE)
print(f"Model translation: {' '.join(translation_result)}")


--- INFERENCE TEST ---
Source sentence (English): he likes grapefruit , grapes , and oranges .
Reference translation (French): il aime le pamplemousse , les raisins et les oranges .
Model translation: il aime les les , les les et les les . <EOS>

Custom sentence: a cat is sitting on the roof
Model translation: elle aime est le , . . <EOS>
