## Finetune LLM

### This notebook walks through the full LLM finetuning process using the Unsloth library on NVIDIA GPUs

In [1]:
from unsloth import FastLanguageModel, UnslothTrainingArguments, UnslothTrainer
import torch
from datasets import load_dataset
import wandb
import os

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: OpenAI failed to import - ignoring for now.
🦥 Unsloth Zoo will now patch everything to make training faster!


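Log in to Weights & Biases (wandb) for results logging. Note that the training arguments below report to TensorBoard only; add "wandb" to `report_to` to send metrics to wandb as well.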

In [2]:
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mlysun-pn[0m ([33mlysun-pn-ukrainian-catholic-university[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

Configuration:


In [3]:
model_name = "meta-llama/Llama-2-7b-chat-hf"
dataset_path = "../../data/taras_shevchenko_poetry.jsonl"
output_dir = "../../shevchenko_finetuned_lora"
lora_rank = 16
lora_alpha = lora_rank # Scaling factor, often set equal to rank
max_seq_length = 1024

Load the pretrained LLM:

In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=None, # Auto-detect: bf16 on GPUs that support it, fp16 otherwise
    load_in_4bit=True, # Load the base weights in 4-bit (QLoRA-style finetuning)
)

==((====))==  Unsloth 2025.4.3: Fast Llama patching. Transformers: 4.51.3.
   \\   /|    NVIDIA GeForce RTX 4060 Ti. Num GPUs = 1. Max memory: 15.697 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Apply Unsloth LoRA adapters:

In [5]:
model = FastLanguageModel.get_peft_model(
    model,
    r=lora_rank,
    lora_alpha=lora_alpha,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing=True, # Saves memory by recomputing activations
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.4.3 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
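
As a sanity check on the adapter size, the sketch below recomputes the number of LoRA parameters by hand. It assumes the standard Llama-2-7B dimensions (hidden size 4096, MLP intermediate size 11008, 32 decoder layers), which are not stated in this notebook, and uses the fact that LoRA adds two low-rank matrices of shapes `rank x in` and `out x rank` per adapted linear layer. The helper name `lora_param_count` is illustrative only.

```python
# Assumed Llama-2-7B dimensions (not stated in the notebook itself).
HIDDEN = 4096         # hidden size
INTERMEDIATE = 11008  # MLP intermediate size
NUM_LAYERS = 32       # consistent with "patched 32 layers" in the log above
RANK = 16             # lora_rank from the configuration cell

# (in_features, out_features) of every projection listed in target_modules.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """Count LoRA weights: rank * (in + out) per adapted layer, no biases."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(MODULE_SHAPES, RANK, NUM_LAYERS)
print(f"{total:,}")  # 39,976,960
```

Under these assumptions the result agrees with the `Trainable parameters = 39,976,960` figure that the trainer banner reports later in the notebook.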


Load dataset:


In [6]:
# Load the dataset and split it into train (90%) and evaluation (10%) sets
dataset = load_dataset("json", data_files=dataset_path, split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42, shuffle=True)
train_dataset, eval_dataset = splits["train"], splits["test"]
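
The trainer cell further down reads the `"text"` field of each record, so the JSONL file is expected to hold one JSON object per line with that key. A minimal way to check a file before loading it might look like the sketch below. The helper `check_jsonl_text_field` and the two sample records are hypothetical and only illustrate the expected shape; they are not taken from the actual dataset.

```python
import json
import tempfile
from pathlib import Path


def check_jsonl_text_field(path, field="text"):
    """Return the record count; raise if a record lacks a non-empty string `field`."""
    count = 0
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            value = record.get(field) if isinstance(record, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"line {line_no}: missing or empty '{field}'")
            count += 1
    return count


# Hypothetical two-record file, used only to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    lines = [json.dumps({"text": t}, ensure_ascii=False) for t in ("first example", "second example")]
    sample.write_text("\n".join(lines) + "\n", encoding="utf-8")
    n_records = check_jsonl_text_field(sample)

print(n_records)  # 2
```

Running the same helper on `dataset_path` would catch malformed lines before tokenization starts.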

Set the training arguments (batch size, optimizer, precision, checkpointing, and logging):

In [7]:
training_args = UnslothTrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4, # Increase to simulate larger batch size if VRAM is limited
    warmup_steps=10,
    num_train_epochs=5,
    learning_rate=5e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=5, # Log metrics every 5 steps
    optim="adamw_8bit", # Memory-efficient optimizer
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    # --- Checkpointing ---
    save_strategy="steps", # Save checkpoints periodically
    save_steps=50, # Save a checkpoint every 50 steps (adjust as needed)
    save_total_limit=2, # Keep at most 2 checkpoints on disk
    # --- Evaluation (Optional) ---
    eval_steps=50 if eval_dataset else None, # Only takes effect when eval_strategy="steps" is also set
    # --- Logging ---
    report_to=["tensorboard"], # Log to TensorBoard (or add "wandb")
)

Initialize the trainer:

In [8]:

trainer = UnslothTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=training_args,
    packing=False, # Set True to pack several short sequences into one sample (less padding, but samples are no longer one text each)
)


Unsloth: Tokenizing ["text"] (num_proc=8):   0%|          | 0/207 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=8):   0%|          | 0/23 [00:00<?, ? examples/s]

Train the adapter:

In [9]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 207 | Num Epochs = 5 | Total steps = 30
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 4 x 1) = 32
 "-____-"     Trainable parameters = 39,976,960/7,000,000,000 (0.57% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
5,3.5059
10,3.1403
15,2.8917
20,2.7164
25,2.5928
30,2.4552


TrainOutput(global_step=30, training_loss=2.8837226231892905, metrics={'train_runtime': 1163.401, 'train_samples_per_second': 0.89, 'train_steps_per_second': 0.026, 'total_flos': 3.539777974566912e+16, 'train_loss': 2.8837226231892905})
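
The `Total steps = 30` in the banner can be reproduced from the configuration with a little arithmetic. The sketch below assumes that partial gradient-accumulation groups at the end of an epoch are floored away, which is consistent with the number logged in this run but may differ across `transformers` versions. The helper name `total_optimizer_steps` is illustrative.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Estimate optimizer steps, assuming incomplete accumulation groups are dropped."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


# Values from this notebook: 207 training examples, batch 8, accumulation 4, 5 epochs.
steps = total_optimizer_steps(207, 8, 4, 5)
effective_batch = 8 * 4 * 1
print(steps, effective_batch)  # 30 32
```

With only 30 optimizer steps, the `save_steps=50` and `eval_steps=50` thresholds are never reached in this run, and the 10 warmup steps cover a third of training.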

Save the adapter:

In [10]:


final_adapter_path = os.path.join(output_dir, "final_adapter")
model.save_pretrained(final_adapter_path)
tokenizer.save_pretrained(final_adapter_path)
print(f"Final LoRA adapter saved to {final_adapter_path}")

# You can also save the full model if needed, but usually just the adapter is fine
# model.save_pretrained_merged("shevchenko_full_model", tokenizer, save_method = "merged_16bit")


Final LoRA adapter saved to ../../shevchenko_finetuned_lora/final_adapter
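
To use the adapter in a fresh session, the saved directory can be loaded again. The sketch below follows the loading pattern from the Unsloth example notebooks, where the adapter directory is passed as `model_name`. The wrapper function `load_finetuned` is hypothetical, and the call requires a CUDA GPU, so it is shown without being executed.

```python
def load_finetuned(adapter_dir, max_seq_length=1024):
    """Load the base model plus the saved LoRA adapter for inference (needs a CUDA GPU)."""
    # Imported lazily so that defining this helper does not require a GPU.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,      # directory written by model.save_pretrained(...)
        max_seq_length=max_seq_length,
        dtype=None,                  # auto-detect bf16/fp16
        load_in_4bit=True,           # same quantization as during training
    )
    FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode
    return model, tokenizer


# Example call (not executed here):
# model, tokenizer = load_finetuned("../../shevchenko_finetuned_lora/final_adapter")
```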


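Run inference with the finetuned model:

In [11]:
# Prompt: the opening line of Shevchenko's "Testament" ("When I die, bury me")
prompt = "User: Як умру, то поховайте \n Agent: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# The LoRA adapter is already attached to `model`; switch Unsloth to inference mode
FastLanguageModel.for_inference(model)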


In [12]:
# Generate a continuation (greedy decoding by default) and decode it
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    # Optional sampling settings:
    # do_sample=True,
    # temperature=0.9,
    # top_p=0.95,
    # top_k=50,
    # num_return_sequences=1,
    # pad_token_id=tokenizer.eos_token_id,
)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)

User: Як умру, то поховайте 
 Agent:  Як умру, то поховайте,
Ні, не поховайте,
Поховайте, не поховайте,
То поховайте в неволі.
І не ходіте похилитись,
Щоб не вийшло.
Похилитесь, не похилитесь,
Щоб не вийшло.
І не говорите, що
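
The greedy output above repeats the same few lines. One thing to try is enabling the sampling settings that are commented out in the generation cell. The sketch below collects them into a dictionary; the `repetition_penalty` value of 1.1 is an untested assumption added here for illustration, and the helper name `sampling_kwargs` is hypothetical.

```python
def sampling_kwargs(max_new_tokens=100, temperature=0.9, top_p=0.95, top_k=50,
                    repetition_penalty=1.1):
    """Build keyword arguments for `model.generate` with sampling enabled."""
    return {
        "do_sample": True,                         # sample instead of greedy decoding
        "temperature": temperature,                # values from the commented-out settings
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,  # assumed value, not from the notebook
        "max_new_tokens": max_new_tokens,
    }


gen_kwargs = sampling_kwargs()
print(gen_kwargs)

# Usage with the objects from the cells above (not executed here):
# outputs = model.generate(**inputs, **gen_kwargs, pad_token_id=tokenizer.eos_token_id)
```

Whether this actually reduces the repetition would need to be checked by running it; no results are claimed here.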
