In [1]:
from qwen_finetuning import QwenFineTuningConfig, QwenFineTuning

In [2]:
config = QwenFineTuningConfig(
    model_name="Qwen/Qwen3-8B",
    train_file="data/train.jsonl",
    output_dir="./results_optimized",
    
    # Batch configuration - GOOD ✓
    batch_size=1,                    # Appropriate for 24GB VRAM
    gradient_accumulation_steps=16,  # Effective batch size = 16 ✓
    
    # Learning rate - GOOD ✓
    learning_rate=2e-4,              # Common starting point for LoRA fine-tuning
    
    # Training duration - CONSIDER INCREASE
    num_epochs=2,                    # Currently 2; consider 3 for ~87K samples
    
    # Sequence length - GOOD ✓
    max_length=512,                  # Sufficient for Italian Q&A
    
    # LoRA hyperparameters - MOSTLY GOOD
    lora_r=16,                       # Good rank ✓
    lora_alpha=32,                   # Follows the alpha = 2 * r heuristic ✓
    lora_dropout=0.1,                # IMPROVED: Increased from 0.05 to 0.1
    
    # CRITICAL ADDITIONS
    max_grad_norm=1.0,               # ADDED: Gradient clipping for stability
    
    # Performance optimizations - GOOD ✓
    dataset_num_proc=4,
    dataloader_num_workers=4,
    torch_empty_cache_steps=4,
)
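
The batch and LoRA settings above interact through two small pieces of arithmetic: the effective batch size is the per-device batch size times the accumulation steps, and standard LoRA scales its low-rank update by alpha / r. A minimal sketch, assuming a single GPU; the helper names and the `num_devices` parameter are illustrative and not part of `qwen_finetuning`:

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return batch_size * grad_accum_steps * num_devices


def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    return lora_alpha / lora_r


print(effective_batch_size(1, 16))  # 16, matching the config comment
print(lora_scaling(32, 16))         # 2.0
```

Keeping alpha at twice the rank means the update scale stays at 2.0 if the rank is changed later, which is one reason the heuristic is popular.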

In [3]:
config.print_config()


✓ Configuration set
Model: Qwen/Qwen3-8B
Learning rate: 0.0002
Max gradient norm: 1.0
Batch size: 1
Effective batch size: 16
Dataset processing cores: 4
Cache writer batch size: 500
DataLoader workers: 4
DataLoader optimizations: pin_memory=True, persistent_workers=True
GPU cache management: empty every 4 steps
Training stability: gradient clipping enabled at 1.0
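
To make the "gradient clipping enabled at 1.0" line concrete, here is a pure-Python sketch of clipping by global norm, the same idea as PyTorch's `clip_grad_norm_`. It assumes the wrapper forwards `max_grad_norm` to a trainer that clips this way; the function name and the nested-list representation of gradients are illustrative only:

```python
import math


def clip_by_global_norm(grads: list[list[float]], max_norm: float, eps: float = 1e-6):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Only shrink: gradients already inside the norm ball are left unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm


grads = [[3.0, 4.0], [0.0, 0.0]]  # global norm = 5.0
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)     # 5.0
print(clipped)  # roughly [[0.6, 0.8], [0.0, 0.0]]
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is capped.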


In [4]:
# Create fine-tuning instance
finetuner = QwenFineTuning(config)


✓ Environment loaded, HF token available


In [5]:
# Load training data
train_data = finetuner.load_jsonl(config.train_file)
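
The implementation of `finetuner.load_jsonl` is not shown in this notebook. As a rough idea of what such a loader typically does, here is a standalone sketch using only the standard library; the function below is hypothetical and may differ from the real method in validation and return type:

```python
import json
from pathlib import Path


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-empty line of a JSONL file."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so bad records are easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records
```

Reading with an explicit UTF-8 encoding matters here because the questions are in Italian and contain accented characters.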


In [6]:
finetuner.run_complete_finetuning(train_data=train_data)


Train Dataset: 86929 examples
Categories: unknown(86929)
Answer distribution: A(24205), B(24441), C(24625), D(11146), E(2512)
Loading model and tokeniser...
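
The answer distribution reported above is visibly skewed: D is about half as frequent as A, B, or C, and E accounts for under 3% of the examples. A short sketch that turns the logged counts into shares and a majority-class baseline (the counts are copied from the log; nothing else is assumed):

```python
counts = {"A": 24205, "B": 24441, "C": 24625, "D": 11146, "E": 2512}  # from the log above

total = sum(counts.values())
assert total == 86929  # matches the reported dataset size

shares = {label: n / total for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:6.2%}")

# A model that always answers with the most frequent label sets a floor for accuracy.
majority_label = max(counts, key=counts.get)
print(f"majority baseline: {majority_label} at {shares[majority_label]:.2%}")
```

Any evaluation accuracy is worth reading against that baseline, and per-label accuracy for D and E is worth checking separately.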


Trainable parameters:
trainable params: 43,646,976 || all params: 8,234,382,336 || trainable%: 0.5301
✓ Using target_modules='all-linear' for optimal performance
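
The trainable parameter count can be checked by hand. LoRA adds `r * (in_features + out_features)` parameters per adapted linear layer. The sketch below assumes the Qwen3-8B layer shapes listed in the code (36 decoder layers, hidden size 4096, 1024-wide key/value projections, MLP width 12288) and assumes `all-linear` adapts the seven projections in each decoder layer but not the output head; these shapes are not stated in the log, so treat them as assumptions:

```python
LORA_R = 16
NUM_LAYERS = 36  # assumed Qwen3-8B depth

# (in_features, out_features) of each linear projection in one decoder layer (assumed).
LINEAR_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 12288),
    "up_proj": (4096, 12288),
    "down_proj": (12288, 4096),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) next to each adapted weight."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in LINEAR_SHAPES.values())
trainable = per_layer * NUM_LAYERS
all_params = 8_234_382_336  # from the log above

print(f"trainable params: {trainable:,}")                  # 43,646,976
print(f"trainable%: {100 * trainable / all_params:.4f}")   # 0.5301
```

Under these assumptions the result matches the logged figures exactly, which also shows that doubling `lora_r` would double the trainable parameter count.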

Example prompt format:
<|im_start|>user
Domanda: Il diario clinico ha lo scopo di:...

A) Permettere la ricostruzione del decorso clinico del residente documentando le scelt...

Optimizations enabled:
  - Dataset processing: 4 CPU cores
  - Memory-efficient caching: batch size 500
  - Optimized DataLoader: 4 workers, pin_memory, persistent_workers
  - GPU memory management: cache clearing every 4 steps
  - Gradient clipping: max_grad_norm=1.0 for training stability
Setting up trainer with optimized DataLoader configuration and gradient clipping...
✓ Processing and caching dataset with memory optimization...
Formatting dataset with 4 processes (memory: 18.5GB)...


✓ Formatting complete (memory: 18.5GB → 18.7GB)
✓ Processing complete (memory: 18.3GB → 18.7GB)
✓ Saving to cache with optimized batch size (500)...


✓ Dataset cached efficiently to: cache/processed_datasets/342ab7c6db43e6f8df6ed1e851ed55d9
✓ Total memory usage: 18.3GB → 18.7GB
✓ Training optimizations enabled:
  - DataLoader workers: 4 (parallel data loading)
  - Pin memory: True (faster GPU transfer)
  - Persistent workers: True (reduced startup overhead)
  - GPU cache clearing: every 4 steps
  - Gradient clipping: max_grad_norm=1.0 (training stability)




Adding EOS to train dataset: 86929 examples
Tokenizing train dataset: 86929 examples
Packing train dataset: 86929 examples
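
The "Packing train dataset" step means short examples are combined into longer training sequences, so the number of sequences (and therefore optimizer steps per epoch) is lower than the raw example count suggests. The sketch below shows the simplest variant, concatenate and chunk; the trainer used here may apply a different packing strategy, and `pack_sequences` is an illustrative name:

```python
from typing import Iterable


def pack_sequences(tokenized: Iterable[list[int]], max_length: int) -> list[list[int]]:
    """Concatenate token lists and cut the stream into blocks of max_length."""
    packed, buffer = [], []
    for tokens in tokenized:
        buffer.extend(tokens)
        while len(buffer) >= max_length:
            packed.append(buffer[:max_length])
            buffer = buffer[max_length:]
    if buffer:
        packed.append(buffer)  # final, possibly shorter block
    return packed


examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
print(pack_sequences(examples, max_length=4))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

With `max_length=512` and short multiple-choice questions, several examples would fit into each packed sequence, which reduces padding waste.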

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


✓ Trainer configured with gradient clipping for improved stability
Starting training with gradient clipping enabled...


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step   Training Loss
  20   2.4077
  40   1.6102
  60   1.2773
  80   1.1440
 100   1.1531
 120   1.1308
 140   1.1208
 160   1.1036
 180   1.1023
 200   1.1066
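
The logged losses are easier to interpret as perplexities, and the table shows that most of the improvement happens early. A small sketch, assuming the logged value is the mean token-level cross-entropy in nats (so perplexity is `exp(loss)`); the values are copied from the table above:

```python
import math

# (step, training loss) pairs copied from the log above
loss_log = [
    (20, 2.4077), (40, 1.6102), (60, 1.2773), (80, 1.1440), (100, 1.1531),
    (120, 1.1308), (140, 1.1208), (160, 1.1036), (180, 1.1023), (200, 1.1066),
]

for step, loss in loss_log:
    print(f"step {step:>3}: loss {loss:.4f}  perplexity {math.exp(loss):.2f}")

# Share of the total loss drop that happened in the first 80 steps.
first, at_80, last = loss_log[0][1], loss_log[3][1], loss_log[-1][1]
print(f"drop by step 80: {(first - at_80) / (first - last):.0%}")
```

A training loss that flattens this early is not by itself evidence of convergence or overfitting; a held-out evaluation set would be needed to tell.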


Saving model...
✓ Training completed


In [7]:
print(f"\n{config.num_epochs}-epoch fine-tuning completed successfully")
print(f"Model saved to: {config.output_dir}")


2-epoch fine-tuning completed successfully
Model saved to: ./results_optimized
