# Turkish Legal Question-Answering System with LoRA Fine-tuning

This notebook demonstrates fine-tuning Trendyol LLM 7B on a Turkish legal Q&A dataset using QLoRA (4-bit quantization + LoRA).

## Project Overview
- **Base Model**: Trendyol/Trendyol-LLM-7b-chat-v0.1
- **Dataset**: turkish-law-chatbot (14.9K Q&A pairs)
- **Method**: QLoRA (4-bit quantization + LoRA adapters)
- **Hardware**: 8GB VRAM

In [1]:
import os

# CUDA memory allocation optimization
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

## 1. Load Model with 4-bit Quantization

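**Quantization:** Stores model weights in 4 bits instead of 16-bit (or 32-bit) floats
- Memory: Trendyol LLM 7B → ~4GB (4-bit)
- Uses NF4 (4-bit NormalFloat) for better accuracy
- Compute dtype: bfloat16 for stable training
- Double quantization (`bnb_4bit_use_double_quant=True`) also compresses the quantization constants
- The base model itself is optimized for Turkish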
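As a rough check on the memory figure above, weight storage can be estimated as parameter count × bits per weight. This is a back-of-the-envelope sketch, not a measurement: the parameter count is the one reported by `print_trainable_parameters()` later in this notebook, and `weight_memory_gib` is a hypothetical helper. It ignores quantization constants and any layers kept in higher precision, which is probably why the measured footprint (3.69 GB) is somewhat higher than the 4-bit estimate.

```python
def weight_memory_gib(n_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB, ignoring quantization overhead."""
    return n_params * bits_per_weight / 8 / 1024**3

# Parameter count as reported by print_trainable_parameters() in this notebook.
N_PARAMS = 6_855_315_456

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(N_PARAMS, bits):.2f} GiB")
```

The same helper can be used to compare other bit widths before choosing a quantization setting for a given GPU.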

In [2]:
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Model name - Turkish optimized model by Trendyol
model_id = "Trendyol/Trendyol-LLM-7b-chat-v0.1"

print("Loading model...")
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

print("✓ Model loaded successfully")
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")

Loading model...


✓ Model loaded successfully
Memory footprint: 3.69 GB


## 2. Test Base Model

Test the base model before fine-tuning to establish a baseline.

In [3]:
# Test prompt
messages = [
    {"role": "user", "content": "Trafik cezalarına itiraz süreci nasıl işler?"}
]

# Tokenize and generate
text = tokenizer.apply_chat_template(messages, 
                                     tokenize=False, 
                                     add_generation_prompt=True)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

Trafik cezalarına itiraz süreci, trafik cezası aldığınıza dair tebligatınızın gelmesiyle başlar.Express'te yer alan bilgilere göre, trafik cezasına itiraz etmek için öncelikle cezanın size tebliğ edildiği tarihten itibaren 15 gün içinde itiraz etmeniz gerekiyor. Bu süre içinde itiraz etmezseniz, ceza kesinleşiyor ve ödeme yapmanız gerekiyor. İtiraz etmek için trafik cezası aldığınız tarihten itibaren 15 gün içinde itiraz etmeniz gerekiyor. İtiraz etmek için trafik cezasının size tebliğ edildiği tarihten itibaren 15 gün içinde itiraz etmeniz gerekiyor. İtiraz etmek için trafik cezasının size tebliğ edildiği tarihten itibaren 15 gün içinde itiraz etmeniz gerekiyor. İtiraz etmek için trafik cezasının size tebliğ edildiği tarihten itibaren 15 gün içinde itiraz etmeniz gerekiyor. İtiraz etmek için trafik cezasının size tebliğ edildiği tarihten itibaren 15 gün içinde itiraz etmeniz gerekiyor. İtiraz etmek için trafik cezasının size tebliğ edildiği tarihten itibaren 15 gün içinde itiraz etmen
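The base-model output above falls into a loop, repeating the same sentence until the token limit is reached. One simple way to put a number on this is to count how many sentences duplicate an earlier one. The helper below, `repeated_sentence_ratio`, is a hypothetical illustration and not part of the original notebook; it splits naively on periods, which is only a rough approximation for real text.

```python
def repeated_sentence_ratio(text: str) -> float:
    """Fraction of sentences that duplicate an earlier sentence in the same text."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return 0.0
    seen = set()
    repeats = 0
    for sentence in sentences:
        if sentence in seen:
            repeats += 1
        seen.add(sentence)
    return repeats / len(sentences)

# Toy inputs for illustration; in the notebook one would pass `response` instead.
looping = "A. B. B. B."
clean = "A. B. C."
print(repeated_sentence_ratio(looping), repeated_sentence_ratio(clean))
```

Applying the same function to the base and fine-tuned responses would give a like-for-like comparison, assuming both are generated with the same decoding settings.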

## 3. Load Dataset

Turkish legal Q&A dataset with ~14.9K question-answer pairs.

In [4]:
from datasets import load_dataset

# Load dataset
dataset = load_dataset("Renicames/turkish-law-chatbot")

print(f"Train examples: {len(dataset['train'])}")
print(f"Test examples: {len(dataset['test'])}")
print(f"\nColumns: {dataset['train'].column_names}")
print(f"\nFirst example:")
print(dataset['train'][0])

Train examples: 13354
Test examples: 1500

Columns: ['Soru', 'Cevap']

First example:
{'Soru': "Anayasa madde 1'e göre, türkiye'nin devlet şekli nedir", 'Cevap': "Anayasa madde 1'e göre, türkiye'nin devlet şekli cumhuriyettir. bu madde, türkiye'nin yönetim biçiminin halkın egemenliğine dayandığını ve bu yönetim biçiminin cumhuriyet olduğunu belirler. cumhuriyet, halkın kendi kendini yönetme biçimi olarak kabul edilir ve türkiye cumhuriyeti'nin temel yönetim ilkesi olarak anayasal güvence altına alınmıştır."}


## 4. Define Formatting Function

Converts Q&A pairs into the chat format the model expects.

In [5]:
def formatting_func(example):
    """
    Converts a single Q&A example to chat format.

    Args:
        example: Dict with 'Soru' (question) and 'Cevap' (answer) keys

    Returns:
        Formatted text string in chat template format
    """
    messages = [
        {"role": "user", "content": example['Soru']},
        {"role": "assistant", "content": example['Cevap']}
    ]

    # Apply chat template (adds special tokens)
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False  # Include answer for training
    )

    return text

# Test formatting function
formatted_example = formatting_func(dataset['train'][0])
print("Formatted example:")
print(formatted_example[:300], "...")

Formatted example:
<s>[INST] Anayasa madde 1'e göre, türkiye'nin devlet şekli nedir [/INST] Anayasa madde 1'e göre, türkiye'nin devlet şekli cumhuriyettir. bu madde, türkiye'nin yönetim biçiminin halkın egemenliğine dayandığını ve bu yönetim biçiminin cumhuriyet olduğunu belirler. cumhuriyet, halkın kendi kendini yöne ...


## 5. Configure LoRA

**LoRA (Low-Rank Adaptation):**
- Trains only small adapter matrices instead of the full model
- 0.24% trainable parameters
- Memory efficient and fast
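To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer: the frozen projection `W @ x` plus a scaled low-rank update `(alpha / r) * B @ (A @ x)`. The toy shapes and values are illustrative assumptions, and `matvec` and `lora_forward` are hypothetical helpers, not PEFT APIs. With `B` initialized to zero, the adapted layer starts out identical to the frozen one.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, r, alpha):
    """Frozen projection W @ x plus the scaled low-rank update (alpha / r) * B @ (A @ x)."""
    base = matvec(W, x)               # frozen weights, never updated
    update = matvec(B, matvec(A, x))  # trainable path through the rank-r bottleneck
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes: d_in = d_out = 3, rank r = 1 (the notebook uses r=16, lora_alpha=32).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, 0.5, 0.5]]              # shape (r, d_in)
B_zero = [[0.0], [0.0], [0.0]]     # shape (d_out, r); zero init leaves the output unchanged
B_trained = [[1.0], [0.0], [-1.0]] # made-up values standing in for a trained adapter
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, r=1, alpha=2))     # equals W @ x
print(lora_forward(W, A, B_trained, x, r=1, alpha=2))  # base output shifted by the adapter
```

Only `A` and `B` receive gradients during training, which is where the memory savings come from.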

**Parameters:**
- `r=16`: Rank of adapter matrices
- `lora_alpha=32`: Scaling factor; the adapter update is multiplied by `lora_alpha / r` (here 32 / 16 = 2), so higher values give the adapters more influence over the frozen model
- Target modules: Attention layers (q, k, v, o projections)
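The trainable-parameter count follows from these settings: each adapted linear layer gains two matrices, `A` (r × d_in) and `B` (d_out × r). The sketch below assumes an architecture that the notebook does not state explicitly (32 decoder layers, hidden size 4096, and square q/k/v/o projections, as in Llama-2-7B-style models); `lora_param_count` is a hypothetical helper. Under those assumptions the result agrees with the count printed by `print_trainable_parameters()`.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Each adapted linear layer gains A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

# Assumed architecture (not stated in the notebook): 32 layers, hidden size 4096,
# and four square attention projections per layer.
HIDDEN, LAYERS, MODULES_PER_LAYER = 4096, 32, 4
R, ALPHA = 16, 32

trainable = lora_param_count(HIDDEN, HIDDEN, R) * MODULES_PER_LAYER * LAYERS
print(f"trainable params: {trainable:,}")
print(f"share of 6,855,315,456 total: {100 * trainable / 6_855_315_456:.4f}%")
print(f"LoRA scaling factor alpha / r: {ALPHA / R}")
```

Doubling `r` doubles the trainable parameters, so the rank is a direct lever on adapter size.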

In [6]:
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

# LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Prepare quantized model for training
model = prepare_model_for_kbit_training(model)

# Enable gradient checkpointing (saves memory)
model.gradient_checkpointing_enable()

# Add LoRA adapters to model
model = get_peft_model(model, lora_config)

print("✓ LoRA configured")
model.print_trainable_parameters()

✓ LoRA configured
trainable params: 16,777,216 || all params: 6,855,315,456 || trainable%: 0.2447


## 6. Configure Training Arguments

**Key Parameters:**
- `num_train_epochs=2`: Train for 2 complete passes through the dataset
- `per_device_train_batch_size=2`: Process 2 examples per forward pass
- `gradient_accumulation_steps=4`: Accumulate gradients over 4 batches, so the effective batch size is 2 × 4 = 8
- `learning_rate=2e-4`: Standard for LoRA
- `bf16=True`: Use bfloat16 for memory efficiency
- `save_strategy="epoch"`: Save checkpoint after each epoch
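These settings determine how many optimizer steps the run takes. The sketch below estimates that number from the 13,354 training examples reported when the dataset was loaded, assuming a single GPU; `optimizer_steps` is a hypothetical helper, and the exact rounding of the last partial accumulation window may vary between library versions.

```python
import math

def optimizer_steps(n_examples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Estimate optimizer updates; exact rounding can vary with the library version."""
    batches_per_epoch = math.ceil(n_examples / (per_device_batch * n_devices))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs

effective_batch = 2 * 4  # per_device_train_batch_size * gradient_accumulation_steps
total_steps = optimizer_steps(n_examples=13354, per_device_batch=2, grad_accum=4, epochs=2)
print(effective_batch, total_steps)
```

With `logging_steps=100`, an estimate like this also indicates roughly how many loss values the training log will contain.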

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./trendyol-turkish-law-lora",        # Output directory for checkpoints
    num_train_epochs=2,                              # 2 epochs
    per_device_train_batch_size=2,                   # Batch size per device
    gradient_accumulation_steps=4,                   # Effective batch = 8
    learning_rate=2e-4,                              # Learning rate
    logging_steps=100,                               # Log every 100 steps
    save_strategy="epoch",                           # Save by epochs
    bf16=True,                                       # Use bfloat16 for training
    optim="paged_adamw_8bit",                        # 8-bit optimizer for memory efficiency
)

print("✓ Training arguments configured")

✓ Training arguments configured


## 7. Create Trainer

SFTTrainer (Supervised Fine-Tuning Trainer) handles the training loop.

In [8]:
from trl import SFTTrainer

# Create trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    formatting_func=formatting_func
)

print("✓ Trainer created and ready")

✓ Trainer created and ready


## 8. Start Training

In [9]:
# Start training
trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.


Step,Training Loss
100,1.8169
200,1.2779
300,1.1715
400,1.1218
500,1.0785
600,1.0301
700,0.9816
800,0.9354
900,0.8781
1000,0.9149


TrainOutput(global_step=3340, training_loss=0.8177105366826771, metrics={'train_runtime': 16875.1108, 'train_samples_per_second': 1.583, 'train_steps_per_second': 0.198, 'total_flos': 9.926673952815514e+16, 'train_loss': 0.8177105366826771, 'entropy': 0.6569879136647389, 'num_tokens': 1993798.0, 'mean_token_accuracy': 0.8258876568952184, 'epoch': 2.0})

## 9. Save Model

After training completes, save the LoRA adapters.

In [None]:
# Save LoRA adapters
model.save_pretrained("./trendyol-turkish-law-lora-final")
tokenizer.save_pretrained("./trendyol-turkish-law-lora-final")
print("✓ Model saved")

✓ Model saved


## 10. Evaluation

Reload the base model with the saved LoRA adapters, then test the fine-tuned model and compare its answer with the base model output from Section 2.

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Model paths
base_model_name = "Trendyol/Trendyol-LLM-7b-chat-v0.1"
adapter_path = "./trendyol-turkish-law-lora-final"

print("Loading base model...")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

print("Loading LoRA adapters...")
model = PeftModel.from_pretrained(base_model, adapter_path)

print("✓ Fine-tuned model loaded successfully\n")

Loading base model...


Loading LoRA adapters...
✓ Fine-tuned model loaded successfully



### 10.1 Single Example Test

Test a single question to see model performance.

In [2]:
# Test question
test_question = "Trafik cezalarına itiraz süreci nasıl işler?"

messages = [
    {"role": "user", "content": test_question}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

print(f"Question: {test_question}\n")
print("Generating response...")

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True  # sampling enabled; the base-model test in Section 2 did not set this flag
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("\nFine-tuned model response:")
print(response)

Question: Trafik cezalarına itiraz süreci nasıl işler?

Generating response...

Fine-tuned model response:
Trafik cezalarına itiraz süreci, cezanın tebliğinden itibaren 15 gün içinde sulh ceza hakimliğine başvurarak yapılabilir. 


### 10.2 Test Set Evaluation

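Evaluate model performance on the full test set (1500 examples). The loss is computed over each whole formatted sequence (question and answer tokens), averaged per example, and perplexity is the exponential of that average loss.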

In [7]:
# Evaluate on test set
from tqdm import tqdm
from datasets import load_dataset

# Load test data
test_set = load_dataset("Renicames/turkish-law-chatbot", split="test")

print(f"Evaluating on {len(test_set)} test examples...")

model.eval()
total_loss = 0
num_samples = 0

for example in tqdm(test_set):
    messages = [
        {"role": "user", "content": example['Soru']},
        {"role": "assistant", "content": example['Cevap']}
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs.input_ids)
        total_loss += outputs.loss.item()
        num_samples += 1

avg_loss = total_loss / num_samples
perplexity = torch.exp(torch.tensor(avg_loss)).item()

print(f"\n{'='*50}")
print(f"Test Loss: {avg_loss:.4f}")
print(f"Perplexity: {perplexity:.4f}")
print(f"{'='*50}")

Evaluating on 1500 test examples...


100%|██████████| 1500/1500 [09:42<00:00,  2.58it/s]


Test Loss: 0.7859
Perplexity: 2.1945
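The evaluation cell averages one loss value per example and exponentiates the result. A common alternative weights each example by its token count, so long answers contribute proportionally more. The sketch below contrasts the two; the per-example losses and token counts are made-up values for illustration, and both function names are hypothetical.

```python
import math

def perplexity_per_example(losses):
    """exp of the unweighted mean of per-example losses (the approach in the cell above)."""
    return math.exp(sum(losses) / len(losses))

def perplexity_per_token(losses, token_counts):
    """exp of the token-weighted mean loss, so long examples count proportionally more."""
    total_nll = sum(loss * n for loss, n in zip(losses, token_counts))
    return math.exp(total_nll / sum(token_counts))

# The reported average test loss corresponds to the reported perplexity (up to rounding).
print(round(math.exp(0.7859), 4))

# Hypothetical per-example mean losses and token counts, for illustration only.
losses = [0.5, 1.0]
token_counts = [300, 100]
print(perplexity_per_example(losses), perplexity_per_token(losses, token_counts))
```

The two numbers diverge when example lengths vary widely, so it helps to state which averaging was used when reporting perplexity.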





## 11. Summary

### Training Results
- **Training Loss**: 0.818 (average over the whole run, as reported by `trainer.train()`)
- **Training Time**: ~4.7 hours (3340 steps)
- **Trainable Parameters**: 16.8M (0.24% of total)

### Evaluation Results
- **Test Loss**: 0.7859
- **Perplexity**: 2.19 (lower is better)
- **Model Size**: 33MB (LoRA adapters only)

### Key Achievements
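- ✅ Fine-tuned a 7B model on 8GB VRAM
- ✅ On the sample question, the fine-tuned model answers with the relevant Turkish legal terminology (e.g. "sulh ceza hakimliği")
- ✅ No repetition loop on the sample question, unlike the base model's answer
- ✅ Test perplexity of 2.19; the base model's perplexity was not measured, so this is not yet a before/after comparison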

### Next Steps
- Deploy model as API or chatbot
- Fine-tune further on domain-specific data
- Experiment with different LoRA ranks
- Test on edge cases and complex legal queries