## Completion finetuning using Unsloth

This notebook uses Unsloth to finetune a model on a completion task.
In this example we finetune DeepSeek-R1-Distill-Llama-8B on an Arabic reasoning dataset, after segmenting the text with a CAMeL Tools morphological tokenizer. I recommend Unsloth over the plain Hugging Face libraries because it needs less memory and trains faster.

Adapted from the Unsloth notebooks. If something is broken, check:
https://unsloth.ai/

In [1]:
!pip install git+https://github.com/omarxadel/camel_tools.git
!camel_data -i all
!pip install accelerate peft bitsandbytes transformers trl unsloth unsloth_zoo
!pip install --no-deps vllm


import requests
import re

# vLLM requirements - vLLM breaks Colab due to reinstalling numpy
f = requests.get("https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt").content
with open("vllm_requirements.txt", "wb") as file:
    file.write(re.sub(rb"(transformers|numpy|xformers)[^\n]{1,}\n", b"", f))

    
!pip install -r vllm_requirements.txt

Collecting git+https://github.com/omarxadel/camel_tools.git
  Cloning https://github.com/omarxadel/camel_tools.git to /tmp/pip-req-build-6yc22mkh
  Running command git clone --filter=blob:none --quiet https://github.com/omarxadel/camel_tools.git /tmp/pip-req-build-6yc22mkh
  Resolved https://github.com/omarxadel/camel_tools.git to commit e8e831ed781f2f141a2513e33c8a9c7e5b94554d
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting camel-kenlm@ git+https://github.com/omarxadel/camel-kenlm.git (from camel_tools==1.5.5)
  Cloning https://github.com/omarxadel/camel-kenlm.git to /tmp/pip-install-gtzrses1/camel-kenlm_c71b1737fbe94818b8bb38c8063ca880
  Running command git clone --filter=blob:none --quiet https://github.com/omarxadel/camel-kenlm.git /tmp/pip-install-gtzrses1/camel-kenlm_c71b1737fbe94818b8bb38c8063ca880
  Resolved https://github.com/omarxadel/camel-ken
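
To see what the `re.sub` filter in the install cell does, here is a minimal sketch run on a made-up requirements snippet. The package pins below are illustrative only and are not copied from vLLM's actual file. The pattern removes everything from the package name to the end of its line, so vLLM's pins for `transformers`, `numpy`, and `xformers` never reach `pip`.

```python
import re

# Hypothetical requirements content; the real file is fetched from the vLLM repo.
sample = b"requests>=2.26.0\nnumpy<2.0.0\ntransformers>=4.48.2\ntqdm\nxformers==0.0.29\n"

# Same pattern as the install cell: drop from the package name to the end of its line.
filtered = re.sub(rb"(transformers|numpy|xformers)[^\n]{1,}\n", b"", sample)
print(filtered.decode())
```

Note that `[^\n]{1,}` needs at least one character after the package name, so a bare `numpy` line without a version specifier would not be removed by this pattern.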

### Load base model

In [2]:
from datasets import load_dataset
from unsloth import FastLanguageModel
import torch
import re

# Load the Arabic reasoning dataset
dataset = load_dataset("Omartificial-Intelligence-Space/Arabic_Reasoning_Dataset")

# Load tokenizer and model
model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = False
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO 04-06 11:29:21 [__init__.py:239] Automatically detected platform cuda.


==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.51.0. vLLM: 0.8.3.
   \\   /|    NVIDIA L40S. Num GPUs = 1. Max memory: 44.521 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [3]:
tokenizer.clean_up_tokenization_spaces = False

### Create the Morphological Tokenizer Class and tokenize the dataset

In [4]:
from camel_tools.morphology.database import MorphologyDB
from camel_tools.morphology.analyzer import Analyzer
from camel_tools.tokenizers.morphological import MorphologicalTokenizer
from camel_tools.disambig.mle import MLEDisambiguator

mle_msa = MLEDisambiguator.pretrained('calima-msa-r13')
morph_tokenizer = MorphologicalTokenizer(disambiguator=mle_msa, scheme='atbtok')

class CustomArabicTokenizer:
    def __init__(self, base_tokenizer, morph_tokenizer):
        self.base_tokenizer = base_tokenizer
        self.morph_tokenizer = morph_tokenizer

    def __call__(self, text, **kwargs):
        # camel_morph_tokenize already returns one space-joined string,
        # so pass it straight to the base tokenizer.
        morph_text = self.camel_morph_tokenize(text)
        return self.base_tokenizer(morph_text, **kwargs)

    def camel_morph_tokenize(self, text):
        if isinstance(text, list):
            text = ' '.join(text)
        elif not isinstance(text, str):
            raise TypeError("Input text must be a string or a list of strings.")
        words = text.split() 
        tokenized_words = self.morph_tokenizer.tokenize(words)
        morph_text = ' '.join(tokenized_words)
        return morph_text

    def tokenize(self, text, **kwargs):
        morph_text = self.camel_morph_tokenize(text)
        return self.base_tokenizer.tokenize(morph_text, **kwargs)

    def decode(self, token_ids, **kwargs):
        return self.base_tokenizer.decode(token_ids, **kwargs)


custom_tokenizer = CustomArabicTokenizer(tokenizer, morph_tokenizer)
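
One side effect worth knowing: `camel_morph_tokenize` splits on whitespace and rejoins with single spaces, so newlines in the input are lost. The sketch below reproduces only that split and join step. `identity_segment` is a hypothetical stand-in that does no real morphological analysis, and `morph_join` is an illustrative name, not part of CAMeL Tools.

```python
def identity_segment(words):
    # Hypothetical stand-in for MorphologicalTokenizer.tokenize: returns words unchanged.
    return list(words)

def morph_join(text):
    # Mirrors the split / tokenize / join steps of camel_morph_tokenize.
    words = text.split()
    return ' '.join(identity_segment(words))

prompt = "### Instruction:\nQ\n\n### Response:\nA"
print(repr(morph_join(prompt)))
```

This is consistent with the single-line `text` field in the dataset sample printed later in the notebook.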

In [5]:
empty_prompt = """
{input}
"""

EOS_TOKEN = tokenizer.eos_token

def format_samples(instruction, answer):
    text = "### Instruction:\n" + instruction + "\n\n### Response:\n" + answer
    text = re.sub(r'\\d', '', text)
    text = re.sub(r'\\l', '', text)
    text = re.sub(r'\\c', '', text)
    text = re.sub(r'\\s', '', text)
    return text

def tokenize_function(examples):
    # Wrap each instruction/answer pair in the prompt template
    instructions = examples['instruction']
    answers = examples['answer']
    training_prompts = []
    for instruction, answer in zip(instructions, answers):
        formatted_example = format_samples(instruction, answer)
        training_prompt = empty_prompt.format(input=formatted_example) + EOS_TOKEN
        # Morphological segmentation only; the result is still a string, not token IDs
        segmented_prompt = custom_tokenizer.camel_morph_tokenize(training_prompt)
        training_prompts.append(segmented_prompt)
    # SFTTrainer tokenizes the "text" column later
    return { "text" : training_prompts, }
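
As a quick check of what the template produces before segmentation, here is a small sketch on a toy English sample. The `"<eos>"` string is a placeholder for `tokenizer.eos_token`, and the toy question is made up. The four patterns in `format_samples` match a literal backslash followed by a letter (for example the two characters `\s`), not regex character classes; my guess is that they strip stray escape sequences from the data, but the notebook does not say.

```python
import re

empty_prompt = """
{input}
"""
EOS_TOKEN = "<eos>"  # placeholder; the notebook uses tokenizer.eos_token

def format_samples(instruction, answer):
    text = "### Instruction:\n" + instruction + "\n\n### Response:\n" + answer
    # Each pattern matches a literal backslash plus one letter, e.g. "\d".
    for pattern in (r'\\d', r'\\l', r'\\c', r'\\s'):
        text = re.sub(pattern, '', text)
    return text

# The raw string ends with a literal backslash-s, which gets stripped.
example = format_samples("What is 2 + 3?", r"2 + 3 = 5\s")
training_prompt = empty_prompt.format(input=example) + EOS_TOKEN
print(repr(training_prompt))
```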

In [6]:
# Apply the prompt template and morphological segmentation to the dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)

### Add LoRA to base model and patch with Unsloth

In [7]:
# More info about parameters: https://huggingface.co/docs/peft/v0.11.0/en/package_reference/lora#peft.LoraConfig
target_modules =  ["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"]

# Set to True when adding special tokens, so lm_head is trained as well
train_embeddings = False

if train_embeddings:
    target_modules = target_modules + ["lm_head"]

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # rank of the LoRA matrices; the LoRA paper reports little quality loss at low ranks
    target_modules = target_modules,  # modules of the LLM that receive LoRA adapters
    lora_alpha = 16, # adapter updates are scaled by lora_alpha / r (1.0 here); 16 follows a community recommendation
    lora_dropout = 0, # tutorials often use 0.05, but Unsloth optimizes for 0
    bias = "none",    # "none" is the optimized setting in Unsloth
    use_gradient_checkpointing = "unsloth", # "unsloth" mode lowers VRAM use, useful for very long contexts
    random_state = 3407,
    use_rslora = False,  # rank-stabilized LoRA: scales by lora_alpha / sqrt(r) instead of lora_alpha / r
    loftq_config = None, # no LoftQ initialization
)

Unsloth 2025.3.19 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
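
The number of trainable parameters that LoRA adds can be worked out by hand. The sketch below assumes Llama-3.1-8B-style dimensions (hidden size 4096, key/value projection width 1024, MLP width 14336, 32 layers); the notebook does not print these, so treat them as my assumption. Each adapted matrix gets two low-rank factors, which adds `r * (in_features + out_features)` parameters.

```python
# Assumed Llama-3.1-8B-style dimensions (not printed in the notebook).
hidden, kv_dim, mlp_dim, layers, r = 4096, 1024, 14336, 32, 16

# (in_features, out_features) of each adapted projection in one decoder layer.
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp_dim),
    "up_proj": (hidden, mlp_dim),
    "down_proj": (mlp_dim, hidden),
}

# LoRA adds A (r x in) and B (out x r) per adapted matrix: r * (in + out) parameters.
per_layer = sum(r * (i + o) for i, o in shapes.values())
total = per_layer * layers
print(per_layer, total)
```

Under these assumptions the total comes to 41,943,040, which matches the trainable parameter count Unsloth reports in the training banner.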


### Visualize the dataset

In [8]:
for i, sample in enumerate(tokenized_datasets['train']):
    print(f"\n------ Sample {i + 1} ----")
    print(sample)
    if i > 2:
      break


------ Sample 1 ----
{'instruction': 'إذا كان لديك حديقة تحتوي على 12 شجرة، وكل شجرة تنتج 5 تفاحات. إذا تم اقتطاف نصف المحصول، كم تفاحة بقيت على الأشجار؟', 'answer': 'المعطيات:\nأولاً: عدد الأشجار = 12 --- (1)\nثانياً: عدد التفاحات لكل شجرة = 5 --- (2)\nثالثاً: نسبة التفاح المقطوف = نصف --- (3)\nالخطوات:\n1. حساب العدد الكلي للتفاحات: 12 × 5 = 60 تفاحة.\n2. حساب عدد التفاحات المقطوفة: 60 ÷ 2 = 30 تفاحة.\n3. حساب عدد التفاحات المتبقية: 60 - 30 = 30 تفاحة.\nإذن، بقيت 30 تفاحة على الأشجار.', 'text': '### Instruction: إذا كان لدى_+ك حديقة تحتوي على 12 شجرة، و+_كل شجرة تنتج 5 تفاحات. إذا تم اقتطاف نصف المحصول، كم تفاح_+ه بقيت على الأشجار؟ ### Response: المعطيات: أولا: عدد الأشجار = 12 --- (1) ثانيا: عدد التفاحات ل+_كل شجرة = 5 --- (2) ثالثا: نسبة التفاح المقطوف = نصف --- (3) الخطوات: 1. حساب العدد الكلي للتفاحات: 12 × 5 = 60 تفاحة. 2. حساب عدد التفاحات المقطوفة: 60 ÷ 2 = 30 تفاحة. 3. حساب عدد التفاحات المتبقية: 60 - 30 = 30 تفاحة. إذن، بقيت 30 تفاح_+ه على الأشجار. <｜end▁of▁sentence｜>'}

--

In [9]:
from datasets import DatasetDict

train_testvalid = tokenized_datasets['train'].train_test_split(0.1)
test_valid = train_testvalid['test'].train_test_split(0.5)
dataset = DatasetDict({
    'train': train_testvalid['train'],
    'test': test_valid['test'],
    'valid': test_valid['train']})
dataset

DatasetDict({
    train: Dataset({
        features: ['instruction', 'answer', 'text'],
        num_rows: 8289
    })
    test: Dataset({
        features: ['instruction', 'answer', 'text'],
        num_rows: 461
    })
    valid: Dataset({
        features: ['instruction', 'answer', 'text'],
        num_rows: 460
    })
})
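
The split sizes above follow from the two `train_test_split` calls. Here is a minimal sketch of the arithmetic, using exact fractions to avoid float rounding. The rule that the test size rounds up and the train size rounds down is my assumption, chosen because it reproduces the printed sizes; `split_sizes` is a hypothetical helper, not a `datasets` function.

```python
import math
from fractions import Fraction

def split_sizes(n, test_fraction):
    # Assumed rounding rule: test size rounds up, train size rounds down.
    test_fraction = Fraction(test_fraction)
    n_test = math.ceil(n * test_fraction)
    n_train = math.floor(n * (1 - test_fraction))
    return n_train, n_test

# First split: 10% held out. Second split: the held-out part is halved.
train_n, holdout_n = split_sizes(9210, "1/10")
valid_n, test_n = split_sizes(holdout_n, "1/2")
print(train_n, valid_n, test_n)
```

Because no seed is passed, the rows that land in each split differ between runs. `train_test_split` accepts a `seed` argument if a reproducible split is needed.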

### Train the model

In [10]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset['train'],
    dataset_text_field = "text",
    max_seq_length = 2048,
    dataset_num_proc = 2,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4, # accumulate 4 batches per parameter update (one update == one step)
        num_train_epochs = 5, # 1 to 3 is the usual range to limit overfitting; this run uses 5
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none"
    ),
)

In [11]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 8,289 | Num Epochs = 5 | Total steps = 5,180
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,072,204,288 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,2.3135
2,2.1068
3,2.3171
4,1.9089
5,1.6036
6,1.6644
7,1.2481
8,1.1615
9,1.409
10,1.2701
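
The step count in the banner can be reproduced from the training arguments. This is a sketch of the arithmetic only; the assumption that a trailing partial accumulation group does not count as its own step is mine, chosen because it matches the reported total.

```python
import math

num_examples = 8289
per_device_batch = 2
grad_accum = 4
num_gpus = 1
epochs = 5

# Examples consumed per parameter update.
effective_batch = per_device_batch * grad_accum * num_gpus
batches_per_epoch = math.ceil(num_examples / per_device_batch)
# Assumption: a trailing partial accumulation group does not count as its own step.
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs
print(effective_batch, steps_per_epoch, total_steps)
```

With `logging_steps = 1`, each row of the loss table corresponds to one of these steps, so the ten rows shown cover 80 training examples.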


### Inference

In [12]:
from transformers import TextStreamer

def generate_responses(model, text):
    FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode
    inputs = tokenizer(text, return_tensors = "pt").to("cuda")
    # The streamer prints tokens as they are generated, so no extra print loop is needed
    text_streamer = TextStreamer(tokenizer)
    model.generate(**inputs, streamer = text_streamer, max_new_tokens = 2048)
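
The training text was wrapped in the `### Instruction:` / `### Response:` template and morphologically segmented, so it may help to give the model a prompt in the same shape at inference time. The sketch below shows one way to build it. `build_inference_prompt` and its `segment` hook are hypothetical names; in this notebook the hook could be `custom_tokenizer.camel_morph_tokenize`. Whether this improves the generations is not something the notebook measures.

```python
def build_inference_prompt(instruction, segment=lambda text: text):
    # `segment` is a hypothetical hook for the morphological segmenter;
    # the default leaves the text unchanged.
    prompt = "### Instruction:\n" + instruction + "\n\n### Response:\n"
    return segment(prompt)

# Stand-in segmenter that only collapses whitespace, like the split/join step does.
collapse = lambda text: ' '.join(text.split())

print(repr(build_inference_prompt("What is 2 + 3?")))
print(repr(build_inference_prompt("What is 2 + 3?", segment=collapse)))
```

The result would then be passed to `generate_responses` in place of the bare instruction.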

In [13]:
for i, sample in enumerate(dataset['test']):
    print(f"\n------ Sample {i + 1} ----")
    print(sample)
    generate_responses(model, sample['instruction'])
    if i > 2:
      break


------ Sample 1 ----
{'instruction': 'في مزرعة، يوجد 120 دجاجة و80 أرنبًا. إذا كان مجموع أرجل الحيوانات في المزرعة 560 رجلًا، فكم عدد الأبقار في المزرعة؟', 'answer': 'المعطيات:\n1. عدد الدجاج = 120 (لكل دجاجة رجلان)\n2. عدد الأرانب = 80 (لكل أرنب 4 أرجل)\n3. مجموع الأرجل الكلي = 560\n4. لكل بقرة 4 أرجل\n\nالخطوات:\n1. حساب عدد أرجل الدجاج: 120 × 2 = 240\n2. حساب عدد أرجل الأرانب: 80 × 4 = 320\n3. حساب مجموع أرجل الدجاج والأرانب: 240 + 320 = 560\n4. حساب الأرجل المتبقية للأبقار: 560 - 560 = 0\n\nإذن، لا توجد أبقار في المزرعة، وعددها 0.', 'text': '### Instruction: في مزرعة، يوجد 120 دجاج_+ه و80 أرنبا. إذا كان مجموع أرجل الحيوانات في المزرعة 560 رجلا، ف+_كم عدد الأبقار في المزرعة؟ ### Response: المعطيات: 1. عدد الدجاج = 120 (لكل دجاج_+ه رجلان) 2. عدد الأرانب = 80 (لكل أرنب 4 أرجل) 3. مجموع الأرجل الكلي = 560 4. ل+_كل بقر_+ه 4 أرجل الخطوات: 1. حساب عدد أرجل الدجاج: 120 × 2 = 240 2. حساب عدد أرجل الأرانب: 80 × 4 = 320 3. حساب مجموع أرجل الدجاج والأرانب: 240 + 320 = 560 4. حساب الأرجل المتب

## Saving

### Save lora adapter

Saving the LoRA adapter is useful both for inference and for loading the finetuned model again later.

In [None]:
model.push_to_hub(
    "omarxadel/Arabic-Morph-DeepSeek-R1-Distill-Llama-8B",
    tokenizer, 
    token = ''  # Hugging Face write token (left blank here)
)

README.md:   0%|          | 0.00/594 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/168M [00:00<?, ?B/s]

Saved model to https://huggingface.co/omarxadel/Arabic-Morph-DeepSeek-R1-Distill-Llama-8B
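
To load the adapter again later, a sketch along the lines of the Unsloth notebooks is to pass the adapter repository as `model_name`. The helper name `load_finetuned_adapter` is hypothetical, the settings simply mirror the training cell, and this assumes the repository is accessible from your account. It needs a GPU and downloads the base weights, so the import is kept inside the function.

```python
def load_finetuned_adapter(repo_id="omarxadel/Arabic-Morph-DeepSeek-R1-Distill-Llama-8B",
                           max_seq_length=2048):
    # Imported lazily so this sketch can be defined on machines without a GPU.
    from unsloth import FastLanguageModel

    # The adapter repo is passed as model_name; settings mirror the training cell.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=repo_id,
        max_seq_length=max_seq_length,
        dtype=None,
        load_in_4bit=False,
    )
    FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode
    return model, tokenizer
```

Calling `model, tokenizer = load_finetuned_adapter()` should then give objects that can be used with `generate_responses` as above.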
