# English to Swedish Poetry Translation with Unsloth

This notebook fine-tunes Llama 3.2 3B Instruct with Unsloth (4-bit loading plus LoRA adapters) to translate English poetry into Swedish.

## Hardware Requirements
- GPU: RTX 3060 (12GB) or better
- RAM: 16GB+ recommended

## Dataset
- Training data: `english_to_swedish_poetry_translation.json`
- **1884 translation examples** from **111 poems**
- Format: Alpaca (instruction, input, output)
- **Modern Swedish Poets (1940-1990 style):**
  - Tomas Tranströmer (Nobel Prize 2011)
  - Harry Martinson (Nobel Prize 1974)
  - Gunnar Ekelöf
  - Werner Aspenström
  - Karin Boye
  - Lars Gustafsson
- Plus classic poets: Viktor Rydberg, Verner von Heidenstam, Esaias Tegnér
- Multiple granularities: full poems, stanzas, and multi-line excerpts

## 1. Install Dependencies

In [1]:
%pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
%pip install torchvision

Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-99bfmy7b/unsloth_04a79c7939224f498b82064221bfc26c
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-99bfmy7b/unsloth_04a79c7939224f498b82064221bfc26c
  Resolved https://github.com/unslothai/unsloth.git to commit e51d3ea2e498fc893770d92ca6727bd113918480
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Note: you may need to restart the kernel to use updated packages.


## 2. Import Libraries

In [2]:
from unsloth import FastLanguageModel
import torch
import json
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments
import os

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## 3. Configuration

In [None]:
# Model configuration
max_seq_length = 2048  # Unsloth supports RoPE Scaling internally
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage

# Training configuration - IMPROVED for better quality
EPOCHS = 4  # Increased from 2 for larger dataset (1884 examples)
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 1e-4  # Reduced for more stable training with larger dataset
MAX_STEPS = -1  # Set to -1 for full training
WARMUP_STEPS = 50  # Increased warmup for larger dataset

# Train/validation split
VALIDATION_SPLIT = 0.05  # Use 5% for validation

# Paths
DATA_PATH = "../data/english_to_swedish_poetry_translation.json"
OUTPUT_DIR = "./outputs/translation_model"

print("Configuration:")
print(f"  Epochs: {EPOCHS}")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Gradient accumulation: {GRADIENT_ACCUMULATION_STEPS}")
print(f"  Effective batch size: {BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS}")
print(f"  Learning rate: {LEARNING_RATE}")
print(f"  Warmup steps: {WARMUP_STEPS}")
print(f"  Validation split: {VALIDATION_SPLIT * 100}%")

## 4. Load Model

In [4]:
# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-3b-instruct",  # Choose from Unsloth's optimized models
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

print(f"Model loaded: {model.config.model_type}")
print(f"Vocabulary size: {len(tokenizer)}")

==((====))==  Unsloth 2026.1.4: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    NVIDIA GeForce RTX 3060. Num GPUs = 1. Max memory: 11.633 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Model loaded: llama
Vocabulary size: 128256


## 5. Configure LoRA for Fine-tuning

In [None]:
# Add LoRA adapters for efficient fine-tuning
# IMPROVED: Higher rank for better quality with larger dataset
model = FastLanguageModel.get_peft_model(
    model,
    r=32,  # LoRA rank - increased from 16 for better quality
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,  # Increased to match rank
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",     # Supports any, but = "none" is optimized
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=True,  # Enable Rank Stabilized LoRA for better training
    loftq_config=None, # LoftQ
)

print("LoRA adapters configured with rank=32 and RSLoRA enabled")
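With `use_rslora=True`, the adapter update is scaled by `lora_alpha / sqrt(r)` rather than the standard `lora_alpha / r`. The short sketch below compares the two factors for the values in this cell; the helper name `lora_scale` is hypothetical and is not part of Unsloth or PEFT.

```python
import math


def lora_scale(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Scaling factor applied to the low-rank update (hypothetical helper)."""
    if use_rslora:
        # Rank-stabilized LoRA divides by the square root of the rank
        return lora_alpha / math.sqrt(r)
    # Standard LoRA divides by the rank itself
    return lora_alpha / r


standard = lora_scale(32, 32, use_rslora=False)
stabilized = lora_scale(32, 32, use_rslora=True)
print(f"Standard LoRA scale: {standard:.3f}")
print(f"rsLoRA scale:        {stabilized:.3f}")
```

With `r=32` and `lora_alpha=32`, the rsLoRA factor is roughly 5.66 times the standard one, so the adapter has a stronger effective update at the same learning rate. That may be worth keeping in mind when tuning `LEARNING_RATE`.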

## 6. Load and Prepare Dataset

In [6]:
# Load the alpaca-formatted JSON data
with open(DATA_PATH, 'r', encoding='utf-8') as f:
    data = json.load(f)

print(f"Loaded {len(data)} training examples")
print(f"\nExample entry:")
print(f"Instruction: {data[0]['instruction']}")
print(f"Input: {data[0]['input'][:100]}...")
print(f"Output: {data[0]['output'][:100]}...")

Loaded 790 training examples

Example entry:
Instruction: Translate the following English poem to Swedish.
Input: Invocation

O Muse! my foam-born sister!
Thou only god, at whose altar
I trust and offer.
Thou, who ...
Output: Åkallan

O musa! Min skum-födda syster!
Du enda gud, till hvars altare
jag tror och offrar.
Du, som ...
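The run above loaded 790 examples, while the Dataset section lists 1884, so it may be worth checking what the file actually contains before training. The sketch below is one possible sanity check for Alpaca-style records; the helper `find_invalid_records` and the sample data are illustrative assumptions, not part of the original notebook.

```python
REQUIRED_FIELDS = ("instruction", "input", "output")


def find_invalid_records(records):
    """Return (index, field) pairs where a field is missing, not a string, or blank."""
    invalid = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                invalid.append((index, field))
    return invalid


# Small illustrative sample; in the notebook this would be the loaded `data` list
sample = [
    {"instruction": "Translate the following English text to Swedish.",
     "input": "I am the trumpeter", "output": "Jag är trumpetaren"},
    {"instruction": "Translate the following English text to Swedish.",
     "input": "", "output": "Jag är trumpetaren"},
    {"instruction": "Translate the following English text to Swedish.",
     "input": "I am the trumpeter"},
]

problems = find_invalid_records(sample)
print(f"{len(sample)} records checked, {len(problems)} problems: {problems}")
```

In the notebook, `find_invalid_records(data)` would flag records that could produce empty prompts or empty targets after formatting.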


In [7]:
# Define the alpaca prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise generation will go on forever
        # (input_text avoids shadowing Python's built-in input())
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

# Convert to HuggingFace Dataset
dataset = Dataset.from_list(data)
dataset = dataset.map(formatting_prompts_func, batched=True)

# Split into train and validation
dataset_split = dataset.train_test_split(test_size=VALIDATION_SPLIT, seed=3407)
train_dataset = dataset_split['train']
eval_dataset = dataset_split['test']

print(f"Total examples: {len(dataset)}")
print(f"Training examples: {len(train_dataset)}")
print(f"Validation examples: {len(eval_dataset)}")
print(f"\nFormatted example (first 500 chars):")
print(train_dataset[0]['text'][:500])

Map: 100%|██████████| 790/790 [00:00<00:00, 90057.90 examples/s]

Total examples: 790
Training examples: 750
Validation examples: 40

Formatted example (first 500 chars):
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following English text to Swedish.

### Input:
The Trumpets
I am the trumpeter

### Response:
Trumpetaren
Jag är trumpetaren<|eot_id|>





## 7. Configure Training

In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # Can make training 5x faster for short sequences
    args=TrainingArguments(
        per_device_train_batch_size=BATCH_SIZE,
        per_device_eval_batch_size=BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        warmup_steps=WARMUP_STEPS,
        max_steps=MAX_STEPS,
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=10,  # Log every 10 steps
        eval_strategy="steps",  # Evaluate during training
        eval_steps=100,  # Evaluate every 100 steps (larger dataset)
        save_strategy="steps",  # Save during training
        save_steps=100,  # Save every 100 steps
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",  # Cosine schedule for smoother training
        seed=3407,
        output_dir=OUTPUT_DIR,
        save_total_limit=3,  # Keep 3 checkpoints
        load_best_model_at_end=True,  # Load best model after training
        metric_for_best_model="eval_loss",
    ),
)

print("Trainer configured")
print(f"Effective batch size: {BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS}")
# Ceiling division: a final, partially filled accumulation window still counts as an optimizer step
steps_per_epoch = -(-len(train_dataset) // (BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS))
print(f"Total training steps per epoch: {steps_per_epoch}")
print(f"Total training steps: {steps_per_epoch * EPOCHS}")
print("Evaluation every 100 steps")
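The step arithmetic can be checked against the training log in section 8, which reports 188 total steps for 750 examples over 2 epochs with a per-device batch size of 2 and 4 accumulation steps. The sketch below assumes the trainer rounds up at both stages (batches per epoch, then optimizer steps per epoch). That assumption matches the logged figure, but the function names are hypothetical.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int, grad_accum_steps: int) -> int:
    """Optimizer steps in one epoch, assuming partial batches and windows round up."""
    batches = math.ceil(num_examples / batch_size)
    return math.ceil(batches / grad_accum_steps)


def total_steps(num_examples: int, batch_size: int, grad_accum_steps: int, epochs: int) -> int:
    """Optimizer steps over the whole run."""
    return steps_per_epoch(num_examples, batch_size, grad_accum_steps) * epochs


print(steps_per_epoch(750, 2, 4))  # 94
print(total_steps(750, 2, 4, 2))   # 188
```

Floor division would give 93 steps per epoch and 186 in total here, which is why rounding up matters when choosing `eval_steps` and `save_steps` relative to the run length.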

## 8. Train the Model

In [9]:
# Show GPU stats before training
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA GeForce RTX 3060. Max memory = 11.633 GB.
3.07 GB of memory reserved.


In [10]:
# Start training
trainer_stats = trainer.train()

# Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)

print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 750 | Num Epochs = 2 | Total steps = 188
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 24,313,856 of 3,237,063,680 (0.75% trained)


Step,Training Loss,Validation Loss
50,0.9516,0.961267
100,0.6287,0.750344
150,0.5133,0.618352


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


Peak reserved memory = 6.225 GB.
Peak reserved memory for training = 3.155 GB.
Peak reserved memory % of max memory = 53.512 %.
Peak reserved memory for training % of max memory = 27.121 %.


## 9. Test the Model

In [None]:
# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Test translation with modern poetry style
test_instruction = "Translate the following English poem to Swedish."
test_input = """Two o'clock: moonlight. The train has stopped
out in the middle of the plain. Far away, points of light in a town,
flickering cold at the horizon.

As when someone has gone into a dream so deep
she'll never remember having been there
when she returns to her room."""

# Format the input
prompt = alpaca_prompt.format(
    test_instruction,
    test_input,
    "",  # output - leave blank for generation
)

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

print("Testing translation...\n")
print(f"Input English text:\n{test_input}\n")
print("=" * 50)

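outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    use_cache=True,
    do_sample=True,  # explicit, so temperature/top_p apply regardless of the model's default generation config
    temperature=0.7,
    top_p=0.9,
)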

decoded_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

# Extract just the response part
response = decoded_output.split("### Response:")[-1].strip()

print(f"\nSwedish translation:\n{response}")
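Splitting on `"### Response:"` and taking the last piece works as long as the generated text never repeats that marker. The sketch below shows a slightly more defensive alternative that strips the known prompt prefix instead. It assumes the decoded text starts with the exact prompt string (which may not hold for every tokenizer) and falls back to splitting on the first marker otherwise. `extract_response` is a hypothetical helper.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, prompt: str) -> str:
    """Return the generated continuation from a decoded prompt+completion string."""
    if decoded.startswith(prompt):
        # Preferred path: drop the exact prompt we sent
        return decoded[len(prompt):].strip()
    # Fallback: keep everything after the first marker
    return decoded.split(RESPONSE_MARKER, 1)[-1].strip()


prompt = ALPACA_PROMPT.format(
    "Translate the following English text to Swedish.",
    "The snow falls slowly over the sleeping houses.",
    "",
)
# Simulated decoded output in which the model repeats the marker
decoded = prompt + "Snön faller\n### Response:\nextra text"

print(repr(extract_response(decoded, prompt)))
print(repr(decoded.split(RESPONSE_MARKER)[-1].strip()))
```

In this simulated case the prefix-based version keeps the full continuation, while the last-piece split keeps only the text after the repeated marker.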

## 10. More Test Examples

In [None]:
def translate_to_swedish(english_text):
    """Helper function to translate English poetry to Swedish"""
    prompt = alpaca_prompt.format(
        "Translate the following English text to Swedish.",
        english_text,
        "",
    )
    
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
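    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        use_cache=True,
        do_sample=True,  # explicit, so temperature/top_p apply regardless of the model's default generation config
        temperature=0.7,
        top_p=0.9,
    )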
    
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    response = decoded.split("### Response:")[-1].strip()
    return response

# Test with modern poetry examples
test_examples = [
    "I believe in the solitary human being, in her who walks alone.",
    "Spring lies desolate. The velvet-dark ditch crawls by my side without reflections.",
    "The snow falls slowly over the sleeping houses.",
    "There is a place beyond words where language cannot reach.",
    "Yes, of course it hurts when buds are breaking.",
]

print("Testing multiple translations:\n")
print("=" * 70)

for i, test in enumerate(test_examples, 1):
    print(f"\nTest {i}:")
    print(f"English: {test}")
    translation = translate_to_swedish(test)
    print(f"Swedish: {translation}")
    print("-" * 70)

## 11. Save the Model

In [13]:
# Save LoRA adapters only (much smaller)
model.save_pretrained("translation_model_lora")
tokenizer.save_pretrained("translation_model_lora")

print("LoRA adapters saved to: translation_model_lora/")

LoRA adapters saved to: translation_model_lora/


In [14]:
# Optional: Save merged model (base + LoRA) in 16bit
# This creates a standalone model that doesn't need LoRA adapters
model.save_pretrained_merged(
    "translation_model_merged_16bit",
    tokenizer,
    save_method="merged_16bit",
)

print("Merged 16-bit model saved to: translation_model_merged_16bit/")

Found HuggingFace hub cache directory: /home/johan/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00002.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 3748.26it/s]


Note: tokenizer.model not found (this is OK for non-SentencePiece models)


Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [00:51<00:00, 25.97s/it]


Unsloth: Merge process complete. Saved to `/home/johan/git/Trainingdata/notebooks/translation_model_merged_16bit`
Merged 16-bit model saved to: translation_model_merged_16bit/


In [16]:
# Optional: Save as 4-bit quantized GGUF for llama.cpp
# Useful for running locally with CPU or smaller GPUs
model.save_pretrained_gguf(
    "translation_model",
    tokenizer,
    quantization_method="q4_k_m",
)

print("GGUF model saved to: translation_model/")

Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /home/johan/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00002.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files: 100%|██████████| 2/2 [00:00<00:00, 1118.18it/s]


Note: tokenizer.model not found (this is OK for non-SentencePiece models)


Unsloth: Merging weights into 16bit: 100%|██████████| 2/2 [01:02<00:00, 31.39s/it]


Unsloth: Merge process complete. Saved to `/home/johan/git/Trainingdata/notebooks/translation_model`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF bf16 might take 3 minutes.
\        /    [2] Converting GGUF bf16 to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: llama.cpp folder exists but binaries not found - will rebuild
Unsloth: Updating system package directories
Unsloth: All required system packages already installed!
Unsloth: Install llama.cpp and building - please wait 1 to 3 minutes
Unsloth: Install GGUF and other packages


RuntimeError: Unsloth: GGUF conversion failed: [FAIL] Command `pip install gguf protobuf sentencepiece mistral_common` failed with exit code 1
stdout: error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.12/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
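The GGUF export failed because the bare `pip install` it ran was rejected by an externally managed Python installation (PEP 668). The pip message itself suggests running from a virtual environment. The sketch below is one way to check the kernel's environment before calling `save_pretrained_gguf`; the helper names are hypothetical, and the marker-file location follows PEP 668 as I understand it.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_externally_managed() -> bool:
    """True when the PEP 668 marker file sits in the standard library directory."""
    stdlib_dir = sysconfig.get_path("stdlib")
    return stdlib_dir is not None and (Path(stdlib_dir) / "EXTERNALLY-MANAGED").exists()


if in_virtualenv():
    print("Running inside a virtual environment; pip installs should be allowed.")
elif is_externally_managed():
    print("System Python is externally managed; start the kernel from a venv before exporting to GGUF.")
else:
    print("No virtual environment detected, but no EXTERNALLY-MANAGED marker was found either.")
```

If the check reports an externally managed interpreter, recreating the kernel from a virtual environment (as the pip message suggests) is the likely fix. That has not been verified against this particular setup.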

