---
### THE CODE IS ORGANIZED AS FOLLOWS.
---
# 1️⃣ Tokenizer & Dataset

## TokenizerDataset (Dataset class)
- `__init__` → `sources`, `targets`, `tokenizer_name`, `max_length`, `decoder_start_token_id`  
- `__len__` → number of samples in the dataset  
- `__getitem__` → 
  - `encoder_input_ids`, `encoder_attention_mask`
  - `decoder_input_ids`, `decoder_attention_mask`
  - `decoder_target_ids`
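
The relationship between the decoder tensors returned by `__getitem__` can be sketched as below, assuming teacher forcing with a shifted-right decoder input. The helper name `build_decoder_tensors` and the example IDs are hypothetical, and tokenization via `tokenizer_name` is left out; this is an illustration, not the original class.

```python
import torch

def build_decoder_tensors(target_ids, pad_token_id, decoder_start_token_id):
    """Derive decoder inputs, mask and targets from a padded 1-D tensor of target IDs."""
    # Decoder input = start token followed by the targets without their last position.
    start = torch.tensor([decoder_start_token_id], dtype=target_ids.dtype)
    decoder_input_ids = torch.cat([start, target_ids[:-1]])
    # Attention mask: 1 for real tokens, 0 for padding.
    decoder_attention_mask = (decoder_input_ids != pad_token_id).long()
    # The start token may share its ID with the pad token, so force it to be visible.
    decoder_attention_mask[0] = 1
    return {
        "decoder_input_ids": decoder_input_ids,
        "decoder_attention_mask": decoder_attention_mask,
        "decoder_target_ids": target_ids,
    }

example = build_decoder_tensors(
    torch.tensor([5, 6, 7, 0, 0]), pad_token_id=0, decoder_start_token_id=0
)
```

At each position the decoder sees the previous target token and is trained to predict the current one; padded target positions are later excluded from the loss.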

## collate_fn
- Stacks the samples of a batch into tensors:  
  - `'encoder_input_ids'`, `'encoder_attention_mask'`  
  - `'decoder_input_ids'`, `'decoder_attention_mask'`  
  - `'decoder_target_ids'`  
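
A minimal sketch of such a `collate_fn`, assuming every sample is already padded to the same length so that `torch.stack` works without further padding. The `KEYS` tuple and the dummy samples are illustrative only.

```python
import torch

KEYS = (
    "encoder_input_ids", "encoder_attention_mask",
    "decoder_input_ids", "decoder_attention_mask",
    "decoder_target_ids",
)

def collate_fn(batch):
    """Stack per-sample tensors into (batch_size, seq_len) tensors, key by key."""
    return {key: torch.stack([sample[key] for sample in batch]) for key in KEYS}

# Three dummy samples, each holding length-4 tensors under every key.
samples = [{key: torch.full((4,), i) for key in KEYS} for i in range(3)]
batch = collate_fn(samples)
```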

## DataLoader
- Configured with `batch_size`, `shuffle`, and `collate_fn`  
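
For illustration, the three arguments could be wired together as follows. `ToyDataset` and `toy_collate` are hypothetical stand-ins that carry a single key; the real dataset returns all five tensors.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in that returns fixed-length ID tensors."""
    def __init__(self, n_samples=10, seq_len=6):
        self.data = torch.arange(n_samples * seq_len).reshape(n_samples, seq_len)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {"encoder_input_ids": self.data[idx]}

def toy_collate(batch):
    return {"encoder_input_ids": torch.stack([s["encoder_input_ids"] for s in batch])}

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True, collate_fn=toy_collate)
# The last batch is smaller because 10 is not a multiple of 4.
batch_shapes = [tuple(b["encoder_input_ids"].shape) for b in loader]
```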


# 2️⃣ Model Architecture (Seq2Seq LLM)

## 2.1 Encoder
- **TokenEmbed** → Token embedding  
- **PositionelEncod** → Positional encoding  
- **MultiHeadAttention** → Self-attention  
- **FeedForward** → FFN (with GELU or SwiGLU)  
- **TransformerEncoderBlockLLM** → 
  - LayerNorm  
  - Self-Attention  
  - FeedForward  
  - DropPath  
- **TransformersEncoderLLM** → 
  - Embedding + Positional encoding  
  - Encoder blocks (ModuleList)  
  - LayerNorm  
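
One way an encoder block built from these parts might look is sketched below. It assumes a pre-norm layout (LayerNorm before each sub-layer) and uses `nn.MultiheadAttention` as a stand-in for the custom `MultiHeadAttention`; the class name `EncoderBlockSketch` and all sizes are hypothetical.

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly drops the whole residual branch per sample."""
    def __init__(self, drop_prob=0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over the remaining dimensions.
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = x.new_empty(shape).bernoulli_(keep_prob)
        return x * mask / keep_prob

class EncoderBlockSketch(nn.Module):
    def __init__(self, d_model=32, n_heads=4, d_ff=64, drop_path=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.drop_path = DropPath(drop_path)

    def forward(self, x, key_padding_mask=None):
        # key_padding_mask: True marks padded positions that attention should ignore.
        h = self.norm1(x)
        attn_out, _ = self.attn(
            h, h, h, key_padding_mask=key_padding_mask, need_weights=False
        )
        x = x + self.drop_path(attn_out)
        x = x + self.drop_path(self.ffn(self.norm2(x)))
        return x

block = EncoderBlockSketch().eval()
out = block(torch.randn(2, 5, 32))  # (batch, seq_len, d_model)
```

The full encoder would apply embedding and positional encoding first, run a `ModuleList` of such blocks, and finish with a final LayerNorm.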

## 2.2 Decoder
- **TransformerDecoderBlockLLM** → 
  - Masked Self-Attention  
  - Cross-Attention  
  - FeedForward  
  - LayerNorm + DropPath + gamma parameters  
- **TransformerDecoderLLM** → 
  - Embedding + Positional encoding  
  - Decoder blocks  
  - LayerNorm  
  - `lm_head` → linear layer projecting to the vocabulary size  
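
The decoder block can be sketched along the same lines. This version assumes pre-norm, interprets the gamma parameters as LayerScale-style learnable per-channel scales on each residual branch, and omits DropPath for brevity; `DecoderBlockSketch` and `gamma_init` are hypothetical names, and `nn.MultiheadAttention` again stands in for the custom attention module.

```python
import torch
import torch.nn as nn

class DecoderBlockSketch(nn.Module):
    def __init__(self, d_model=32, n_heads=4, d_ff=64, gamma_init=1e-2):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm3 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Assumed meaning of "gamma": one learnable scale vector per residual branch.
        self.gamma1 = nn.Parameter(gamma_init * torch.ones(d_model))
        self.gamma2 = nn.Parameter(gamma_init * torch.ones(d_model))
        self.gamma3 = nn.Parameter(gamma_init * torch.ones(d_model))

    def forward(self, x, memory):
        t = x.size(1)
        # Boolean causal mask: True above the diagonal blocks attention to future tokens.
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.norm1(x)
        sa, _ = self.self_attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + self.gamma1 * sa
        # Cross-attention: queries from the decoder, keys/values from the encoder output.
        h = self.norm2(x)
        ca, _ = self.cross_attn(h, memory, memory, need_weights=False)
        x = x + self.gamma2 * ca
        x = x + self.gamma3 * self.ffn(self.norm3(x))
        return x

block = DecoderBlockSketch().eval()
x = torch.randn(1, 4, 32)       # decoder hidden states
memory = torch.randn(1, 6, 32)  # encoder output
out = block(x, memory)
```

Because of the causal mask, changing a later decoder position leaves the outputs at earlier positions unchanged.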

## 2.3 Seq2Seq Model
- **Seq2SeqLLM** → 
  - Combines the Encoder and Decoder  
  - `forward` → `src_tokens` → encoder → `tgt_tokens` → decoder → logits  
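
The data flow of `forward` can be illustrated with a compact stand-in. This sketch swaps the custom encoder and decoder for PyTorch's built-in `nn.Transformer`, shares one embedding between both sides, and skips positional encoding; all of these are simplifying assumptions, and `Seq2SeqSketch` is a hypothetical name.

```python
import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    def __init__(self, vocab_size=100, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            dim_feedforward=64, batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        # src_tokens -> encoder -> memory
        memory = self.transformer.encoder(self.embed(src_tokens))
        # tgt_tokens -> decoder (causally masked, attends to memory) -> hidden states
        t = tgt_tokens.size(1)
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=tgt_tokens.device), diagonal=1
        )
        hidden = self.transformer.decoder(self.embed(tgt_tokens), memory, tgt_mask=causal)
        # hidden states -> logits over the vocabulary
        return self.lm_head(hidden)

model = Seq2SeqSketch()
logits = model(torch.randint(0, 100, (2, 7)), torch.randint(0, 100, (2, 5)))
```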


# 3️⃣ Generation (Top-K / Top-P)
- **generate_seq2seq**
  - `src_texts` → tokenized  
  - `dec_input_ids` → start tokens (pad or bos)  
  - Loop that generates one token per step, up to `max_len`:
    - Top-K filtering  
    - Top-P (nucleus) filtering  
    - Sampling with `torch.multinomial`  
  - Result → list of decoded strings  
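
The per-step filtering and sampling could be implemented along these lines. The function names `top_k_top_p_filter` and `sample_next_token` are hypothetical, and the sketch covers a single decoding step on `(batch, vocab)` logits rather than the full `generate_seq2seq` loop.

```python
import torch

def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Set logits outside the top-k set and outside the nucleus to -inf."""
    logits = logits.clone()
    if top_k > 0:
        k = min(top_k, logits.size(-1))
        # Value of the k-th largest logit in each row.
        kth = torch.topk(logits, k, dim=-1).values[..., -1, None]
        logits[logits < kth] = float("-inf")
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        # Shift right so the first token that crosses top_p is still kept.
        remove[..., 1:] = remove[..., :-1].clone()
        remove[..., 0] = False
        # Map the mask from sorted order back to vocabulary order.
        remove = remove.scatter(-1, sorted_idx, remove)
        logits[remove] = float("-inf")
    return logits

def sample_next_token(logits, top_k=50, top_p=0.9):
    probs = torch.softmax(top_k_top_p_filter(logits, top_k, top_p), dim=-1)
    return torch.multinomial(probs, num_samples=1)  # shape (batch, 1)

logits = torch.log(torch.tensor([[0.05, 0.5, 0.15, 0.3]]))
filtered = top_k_top_p_filter(logits, top_k=3, top_p=0.7)
next_token = sample_next_token(logits, top_k=3, top_p=0.7)
```

In the example, Top-K removes token 0 and Top-P then removes token 2, so only tokens 1 and 3 can be sampled.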


# 4️⃣ Loss & Metrics

## 4.1 Loss
- **masked_cross_entropy_loss**
  - CrossEntropyLoss with `ignore_index=pad_token_id`  
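
A minimal sketch of this loss, assuming logits of shape `(batch, seq_len, vocab)` and integer targets of shape `(batch, seq_len)`; the signature is an assumption.

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy_loss(logits, targets, pad_token_id):
    """Mean cross-entropy over non-pad target positions."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq_len, vocab)
        targets.reshape(-1),                  # (batch * seq_len,)
        ignore_index=pad_token_id,
    )

logits = torch.randn(2, 3, 5)
targets = torch.tensor([[1, 2, 0], [3, 0, 0]])  # 0 is the pad ID in this example
loss = masked_cross_entropy_loss(logits, targets, pad_token_id=0)
```

With `ignore_index`, padded positions contribute neither to the sum nor to the denominator of the mean.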

## 4.2 Accuracy
- **accuracy_fn** → token-level accuracy computed with a pad mask  
- **top_k_accuracy_fn** → number of tokens whose correct label is among the top-k predictions  
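
Both metrics can be sketched as below. The signatures are assumptions, and this version normalizes the hit count by the number of non-pad tokens so that both functions return a fraction.

```python
import torch

def accuracy_fn(logits, targets, pad_token_id):
    mask = targets != pad_token_id
    correct = (logits.argmax(dim=-1) == targets) & mask
    return correct.sum().item() / max(mask.sum().item(), 1)

def top_k_accuracy_fn(logits, targets, pad_token_id, k=5):
    mask = targets != pad_token_id
    topk = logits.topk(k, dim=-1).indices  # (batch, seq_len, k)
    hit = (topk == targets.unsqueeze(-1)).any(dim=-1) & mask
    return hit.sum().item() / max(mask.sum().item(), 1)

logits = torch.tensor([[[0.1, 0.9, 0.5, 0.0],    # argmax = 1
                        [0.9, 0.1, 0.5, 0.0],    # argmax = 0, runner-up = 2
                        [0.3, 0.2, 0.1, 0.0]]])  # padded position, ignored
targets = torch.tensor([[1, 2, 0]])
acc = accuracy_fn(logits, targets, pad_token_id=0)
top2 = top_k_accuracy_fn(logits, targets, pad_token_id=0, k=2)
```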

## 4.3 Perplexity & BLEU
- **calculate_llm_metrics**
  - Perplexity → CrossEntropyLoss(reduction='none') and exp(mean)  
  - Sentence-level BLEU → `sentence_bleu` + smoothing  
  - Corpus-level BLEU → `corpus_bleu` + smoothing  
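
The perplexity part can be sketched as follows, assuming the mean is taken over non-pad tokens only; the function name `perplexity` is hypothetical.

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits, targets, pad_token_id):
    """exp of the mean per-token cross-entropy, ignoring pad positions."""
    token_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    )
    mask = (targets.reshape(-1) != pad_token_id).float()
    mean_loss = (token_loss * mask).sum() / mask.sum().clamp(min=1)
    return math.exp(mean_loss.item())

# Uniform logits over 8 classes: every token costs ln(8), so perplexity equals 8.
ppl = perplexity(torch.zeros(1, 3, 8), torch.tensor([[1, 2, 0]]), pad_token_id=0)
```

A model that spreads its probability evenly over `V` tokens has perplexity `V`, which makes the value easy to sanity-check against the vocabulary size.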


# 5️⃣ Training Loop
- **LLM_Train**
  - Epoch loop  
  - Forward pass: `model(src_tokens, tgt_tokens)`  
  - Loss backward → optimizer.step + scheduler.step  
  - Metric computation: `accuracy`, `top5`, `ppl`, `BLEU (sent + corpus)`  
  - Every `sample_interval` epochs: sample generation with Top-K + Top-P  
  - Logging via postfix and a per-epoch print  
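
The core of one epoch might look like the sketch below. `TinySeq2Seq` is a hypothetical stand-in that only shares the call signature `model(src_tokens, tgt_tokens)`, a constant `LambdaLR` replaces the warmup scheduler, and passing `decoder_input_ids` as `tgt_tokens` is an assumption. Metrics, sample generation and logging are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySeq2Seq(nn.Module):
    """Hypothetical stand-in: model(src_tokens, tgt_tokens) -> logits."""
    def __init__(self, vocab_size=20, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, tgt_tokens):
        context = self.embed(src_tokens).mean(dim=1, keepdim=True)
        return self.head(self.embed(tgt_tokens) + context)

def train_one_epoch(model, batches, optimizer, scheduler, pad_token_id, device):
    model.train()
    total = 0.0
    for batch in batches:
        src = batch["encoder_input_ids"].to(device)
        dec_in = batch["decoder_input_ids"].to(device)
        tgt = batch["decoder_target_ids"].to(device)
        logits = model(src, dec_in)  # forward pass
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tgt.reshape(-1),
            ignore_index=pad_token_id,
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # stepped once per batch, after the optimizer
        total += loss.item()
    return total / len(batches)

torch.manual_seed(0)
device = "cpu"
model = TinySeq2Seq().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)
batch = {
    "encoder_input_ids": torch.randint(1, 20, (4, 6)),
    "decoder_input_ids": torch.randint(1, 20, (4, 5)),
    "decoder_target_ids": torch.randint(1, 20, (4, 5)),
}
losses = [train_one_epoch(model, [batch], optimizer, scheduler, 0, device)
          for _ in range(20)]
```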



# 6️⃣ Optimizer & Scheduler
- **AdamW** → lr, weight_decay  
- **get_linear_schedule_with_warmup**
  - `num_warmup_steps`  
  - `num_training_steps`  
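
To make the role of the two parameters concrete, the following sketch computes the learning-rate multiplier that a linear warmup followed by linear decay is generally understood to apply. It is a plain-Python illustration under that assumption, not the library's code, and `linear_warmup_decay` is a hypothetical name.

```python
def linear_warmup_decay(step, num_warmup_steps, num_training_steps):
    """LR multiplier: ramps 0 -> 1 during warmup, then decays linearly to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

# Multiplier at the start, mid-warmup, end of warmup, mid-decay, and the final step.
multipliers = [linear_warmup_decay(s, 10, 100) for s in (0, 5, 10, 55, 100)]
```

The effective learning rate at a given step is the optimizer's base `lr` times this multiplier, so `num_training_steps` should match the total number of `scheduler.step()` calls.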



# 7️⃣ Device / CUDA
- `device = "cuda" if torch.cuda.is_available() else "cpu"`  
- The model and batch tensors are moved to the GPU: `.to(device)`  
- `torch.cuda.empty_cache()` → call before re-running in a notebook
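
Put together, the device handling could look like this; `move_batch` is a hypothetical helper, and the code falls back to the CPU when no GPU is available.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def move_batch(batch, device):
    """Move every tensor in a batch dict to the target device."""
    return {key: value.to(device) for key, value in batch.items()}

batch = move_batch({"encoder_input_ids": torch.zeros(2, 3, dtype=torch.long)}, device)

if torch.cuda.is_available():
    # Releases cached but unused GPU memory; tensors still referenced are not freed.
    torch.cuda.empty_cache()
```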
