# Welcome to Our Graduation Project: **"Automatic Story Generation - Hakawaty"**

This notebook walks through the steps involved in fine-tuning Qwen2.5-7B to generate high-quality stories. After extensive experimentation with various parameters, we present the configuration that produced the best stories in our experiments.

This notebook was originally created by Unsloth; we modified it and added elements to fit our project's purpose and objectives.


# Fine-tuning Qwen-2.5
### **Version**: *Qwen2.5-7B (4-bit)*


## 1. Preparation


#### **1.1 Install Unsloth**

Unsloth is a library that, according to its authors, makes fine-tuning LLMs 2x faster and uses 70% less memory, with no degradation in accuracy.

In [None]:
%%capture
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git



We use the `FastLanguageModel` class, which provides faster loading and training routines, to initialize the model and its settings below.






In [None]:
from unsloth import FastLanguageModel
import torch


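max_seq_length = 1024  # maximum number of tokens per training example
dtype = None           # None lets Unsloth pick float16 or bfloat16 for the GPU
load_in_4bit = True    # load 4-bit quantized weights to reduce memory use

# The later cells refer to `model` and `tokenizer`, so the loaded objects
# are bound to those names here.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)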


==((====))==  Unsloth 2024.12.12: Fast Qwen2 patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


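#### **1.2 Adding LoRA adapters**

We add LoRA adapters so that only a small fraction of the parameters (typically 1 to 10% or less) needs to be updated, while the base weights stay frozen.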

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 8,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2024.12.12 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
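
To make the effect of the rank concrete, here is a rough sketch of how the number of trainable LoRA parameters can be estimated. Each adapted linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The layer sizes below are assumptions taken from the publicly documented Qwen2.5-7B configuration, not values read from the loaded model, and `lora_param_count` is a hypothetical helper.

```python
# Rough estimate of LoRA trainable parameters for the configuration above.
# Layer sizes are ASSUMED from the public Qwen2.5-7B config, not read from the model.
HIDDEN = 3584          # hidden size (assumed)
KV_DIM = 512           # 4 key/value heads x 128 head dimension (assumed)
INTERMEDIATE = 18944   # MLP intermediate size (assumed)
NUM_LAYERS = 28        # matches "patched 28 layers" in the log above

# (input features, output features) of each targeted projection
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Each adapter holds A (rank x d_in) and B (d_out x rank) matrices."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(f"{lora_param_count(8):,}")  # 20,185,088
```

Under these assumptions, rank 8 gives 20,185,088 trainable parameters, which agrees with the count reported by the training log in section 3. Doubling the rank doubles this number.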



## 2. Data Preparation

In [None]:
import pandas as pd
from datasets import Dataset


file_path = "/content/merged_file.xlsx"
data = pd.read_excel(file_path)


data = data[["Prompt", "Story"]]



#### 2.1 Prepare a dataset in an **instruction format** (Alpaca-style prompt) for fine-tuning Qwen2.5-7B

In [None]:

template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

def format_for_finetuning(row):
    instruction = row["Prompt"]
    response = row["Story"]

    formatted_text = template.format(instruction, response) + tokenizer.eos_token
    return {"text": formatted_text}
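
As an illustration of what a single formatted training example looks like, the sketch below applies the same template to a made-up row. The prompt, the story, and the `EOS_TOKEN` value are placeholders chosen for the example; in the notebook the end-of-sequence token comes from `tokenizer.eos_token`.

```python
# Sketch of the text produced for one row; the row contents are made up.
TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = "<|endoftext|>"  # hypothetical stand-in for tokenizer.eos_token


def format_example(prompt, story, eos_token=EOS_TOKEN):
    """Fill the template and append EOS so the model learns where a story ends."""
    return TEMPLATE.format(prompt, story) + eos_token


example = format_example("Write a short story about a cat.", "Once upon a time...")
print(example)
```

Appending the end-of-sequence token matters: without it, the fine-tuned model has no signal for when to stop generating.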

We convert the data frame to a Hugging Face `Dataset` object, which ensures compatibility with Hugging Face tools such as tokenizers and trainers.

In [None]:

formatted_data = data.apply(format_for_finetuning, axis=1).tolist()


formatted_df = pd.DataFrame(formatted_data)


hf_dataset = Dataset.from_pandas(formatted_df)


print(hf_dataset[0])

#### 2.2 **Split the formatted data** into training, validation, and test sets.
**Split dataset into:**
*   Train: 80%
*   Validation: 10%
*   Test: 10%

In [None]:
import math

def split_data(dataset, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1):
    """Split a dataset sequentially (no shuffling) into train/validation/test."""
    # Compare with a tolerance: exact float equality can fail for valid ratios.
    assert math.isclose(train_ratio + val_ratio + test_ratio, 1.0), "Ratios must sum to 1."

    dataset_size = len(dataset)
    train_size = int(dataset_size * train_ratio)
    val_size = int(dataset_size * val_ratio)
    # The test set takes the remaining rows, so none are lost to rounding.

    train_dataset = dataset.select(range(train_size))
    val_dataset = dataset.select(range(train_size, train_size + val_size))
    test_dataset = dataset.select(range(train_size + val_size, dataset_size))

    return {
        "train": train_dataset,
        "validation": val_dataset,
        "test": test_dataset,
    }


split_datasets = split_data(hf_dataset)

train_dataset = split_datasets["train"]
validation_dataset = split_datasets["validation"]
test_dataset = split_datasets["test"]

print("Train dataset size:", len(train_dataset))
print("Validation dataset size:", len(validation_dataset))
print("Test dataset size:", len(test_dataset))

Train dataset size: 896
Validation dataset size: 112
Test dataset size: 112
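
The split above takes rows in file order. If the spreadsheet happens to be sorted (for example by age group or country), a shuffled split may give more representative subsets. The following is only a sketch of that alternative, not what this notebook ran; `shuffled_split_indices` is a hypothetical helper, and the seed simply reuses the value used elsewhere in the notebook.

```python
import random


def shuffled_split_indices(n, train_ratio=0.8, val_ratio=0.1, seed=3407):
    """Return three disjoint index lists covering range(n) in shuffled order."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so global state is untouched
    train_size = int(n * train_ratio)
    val_size = int(n * val_ratio)
    return (
        indices[:train_size],
        indices[train_size:train_size + val_size],
        indices[train_size + val_size:],  # remainder becomes the test set
    )


train_idx, val_idx, test_idx = shuffled_split_indices(1120)
print(len(train_idx), len(val_idx), len(test_idx))  # 896 112 112
```

Each index list could then be passed to `dataset.select(...)` in place of the `range(...)` calls used above.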


## 3. Train the model

#### 3.1 Using Huggingface TRL's `SFTTrainer` (Supervised Fine-tuning Trainer)

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments, EarlyStoppingCallback
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=5,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        evaluation_strategy="steps",
        save_steps=50,
        eval_steps=10,
        optim="adamw_8bit",
        weight_decay=0.05,
        max_grad_norm=3.0,
        load_best_model_at_end=True,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
    callbacks=[EarlyStoppingCallback(
        early_stopping_patience=35,
        early_stopping_threshold=0.01,
    )],
)





We also use Unsloth's `train_on_responses_only` function so that the loss is computed only on the responses (the stories) and ignored on the instruction part of each example.



In [None]:
from unsloth.chat_templates import train_on_responses_only

# The markers must match the prompt template from section 2.1. The Llama-3
# header tokens used in the original Unsloth notebook never occur in this data.
trainer = train_on_responses_only(
    trainer,
    instruction_part = "### Instruction:\n",
    response_part = "### Response:\n",
)


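We decode one training example and one validation example to check the tokenized text. The masking itself is stored in each example's `labels`, where masked positions are set to -100.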

In [None]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nأكتب قصة قصيرة تكون مفهومة للأطفال في عمر 3-5. نهاية القصة يجب أن تكون سعيدة. عدد الشخصيات في القصة يجب أن يكون 3. الدرس المستفاد من القصة هو اتق شر من أحسنت اليه. بلد الأحداث هي  البحرين. الأحداث تدور في فصل الربيع.  أكتب القصة مباشرة.\n\n### Response:\nفي يوم جميل من أيام الربيع بالبحرين، استيقظ الأصدقاء الثلاثة: حميد وفرح والبطة بطوطة، مع طلوع الشمس الدافئة. كان حميد وفرح يسكنان قريبين من بعضهما البعض، وكانا يعشقان اللعب في الحدائق الغنّاء بالأزهار والأشجار المليئة بالثمار، بينما كانت بطوطة ترافقهما دائمًا، مستمتعة بالماء والبرك الصغيرة التي خلفتها أمطار الربيع.\n\nذات يوم، وأثناء لعبهم معًا، وجدوا عصفور صغير مصاب تحت شجرة. أظهر الأصدقاء الثلاثة قلوبهم الطيبة وقرروا مساعدته. عالجوا جناحه الصغير بكل حرص وحنان، وأعطوه الماء والطعام حتى يستعيد قواه.\n\nلكن بعد أيام، عندما شفي العصفور وأصبح قادرًا على الطيران مجددًا، سرق قطعة خبز كان حميد يحضرها للفطور وطار بها

In [None]:
tokenizer.decode(trainer.eval_dataset[5]["input_ids"])

'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nأكتب قصة قصيرة تكون مفهومة للأطفال في عمر 5-8. نهاية القصة يجب أن تكون حزينة. عدد الشخصيات في القصة يجب أن يكون 1. بلد الأحداث هي  فلسطين.  أكتب القصة مباشرة.\n\n### Response:\nفي قرية صغيرة وجميلة في فلسطين، عاشت قطة بيضاء صغيرة يُدعى موزا. كان موزا يحب اللعب بين أشجار الزيتون والنوم تحت أشعة الشمس الدافئة. لكن الشيء الذي كان يحبه أكثر من أي شيء آخر هو الجلوس على الشرفة في المساء لينظر إلى النجوم مع صديقه الوحيد في العالم، القمر.\n\nكل ليلة، كان موزا ينتظر بفارغ الصبر غروب الشمس ليبدأ حديثه مع القمر. كان يشاركه أحلامه الصغيرة ومغامرات يومه، وكيف أنه يحلم بزيارة القمر يومًا ما. القمر، بدوره، كان ينصت إليه دائمًا ويضيء له طريقه في الليل.\n\nولكن، في إحدى الليالي، لاحظ موزا شيئًا غير عادي. القمر لم يظهر. بدأ موزا يشعر بالقلق والحزن. لم يستطع النوم وظل ينتظر طوال الليل، لكن القمر لم يأتِ. مرت الليالي، وكل ليلة كان موزا يأمل بظهور صديقه، لكن دون جدوى. \n\nبمرور ال
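
To show what response-only masking does to the labels, here is a simplified sketch on toy token IDs. It is an illustration of the idea only and is not Unsloth's implementation; the token IDs, `find_subsequence`, and `mask_prompt_tokens` are all made up for the example.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_tokens(input_ids, response_marker_ids):
    """Copy input_ids into labels, hiding everything up to and including the marker."""
    start = find_subsequence(input_ids, response_marker_ids)
    if start == -1:
        # Marker not found: nothing can be attributed to the response.
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(response_marker_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Toy example: 90, 91 stand in for the tokens of "### Response:\n".
input_ids = [11, 12, 13, 90, 91, 21, 22, 23]
labels = mask_prompt_tokens(input_ids, [90, 91])
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 23]

# Only the unmasked positions contribute to the loss.
visible = [token for token in labels if token != IGNORE_INDEX]
print(visible)  # [21, 22, 23]
```

On the real trainer, the same check can be done by filtering the -100 entries out of `trainer.train_dataset[5]["labels"]` before decoding, assuming the processed dataset exposes a `labels` column. If the marker strings do not match the template, every label ends up masked, which is why the markers have to match the prompt format exactly.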

In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.
11.543 GB of memory reserved.


Parameters adjusted for training (values as set in the code cells above):
* Epochs: 5
* Early stopping patience: 35
* LoRA dropout: 0
* LoRA rank: 8
* Weight decay: 0.05
* Max gradient norm: 3.0

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 896 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 560
 "-____-"     Number of trainable parameters = 20,185,088


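| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 10 | 1.5284 | 1.53808 |
| 20 | 1.3292 | 1.381554 |
| 30 | 1.308 | 1.357298 |
| 40 | 1.2772 | 1.321462 |
| 50 | 1.2632 | 1.306162 |
| 60 | 1.2956 | 1.297671 |
| 70 | 1.2643 | 1.286159 |
| 80 | 1.2716 | 1.276663 |
| 90 | 1.2121 | 1.26839 |
| 100 | 1.2482 | 1.263191 |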
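
The step counts in the training banner follow directly from the batch settings. The sketch below reproduces that arithmetic; `training_schedule` is a hypothetical helper, and rounding the steps per epoch up is an assumption that does not matter here because 896 divides evenly by 8.

```python
import math


def training_schedule(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


effective_batch, steps_per_epoch, total_steps = training_schedule(
    num_examples=896, per_device_batch=2, grad_accum=4, epochs=5
)
print(effective_batch, steps_per_epoch, total_steps)  # 8 112 560

# Early stopping counts evaluations, and one evaluation runs every eval_steps.
patience, eval_steps = 35, 10
print(patience * eval_steps)  # 350 steps without sufficient improvement
```

With a patience of 35 evaluations and an evaluation every 10 steps, training stops early only after 350 consecutive steps in which the validation loss fails to improve by more than the 0.01 threshold.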


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

1748.9159 seconds used for training.
29.15 minutes used for training.
Peak reserved memory = 13.254 GB.
Peak reserved memory for training = 1.711 GB.
Peak reserved memory % of max memory = 33.5 %.
Peak reserved memory for training % of max memory = 4.325 %.



## 4. Saving and Loading Fine-tuned Models


In [None]:
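import os
from huggingface_hub import login

# Read the access token from the environment instead of hard-coding it in the
# notebook; HF_TOKEN is assumed to be set before this cell runs.
login(token=os.environ["HF_TOKEN"])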

In [None]:
# Save model and tokenizer locally in a separate directory
model.save_pretrained("./qwen_arabic_stories_5_epochs_local")
tokenizer.save_pretrained("./qwen_arabic_stories_5_epochs_local")


('./qwen_arabic_stories_5_epochs_local/tokenizer_config.json',
 './qwen_arabic_stories_5_epochs_local/special_tokens_map.json',
 './qwen_arabic_stories_5_epochs_local/vocab.json',
 './qwen_arabic_stories_5_epochs_local/merges.txt',
 './qwen_arabic_stories_5_epochs_local/added_tokens.json',
 './qwen_arabic_stories_5_epochs_local/tokenizer.json')
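
For the loading half of this section, the following sketch shows one way the saved adapter might be reloaded and used for generation. It assumes the local directory saved above, a CUDA GPU, and the same prompt template as in section 2.1; `build_prompt`, `generate_story`, and the `max_new_tokens` value are hypothetical choices for illustration.

```python
ALPACA_TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""


def build_prompt(instruction):
    """Leave the response slot empty so the model writes the story."""
    return ALPACA_TEMPLATE.format(instruction, "")


def generate_story(instruction,
                   adapter_dir="./qwen_arabic_stories_5_epochs_local",
                   max_new_tokens=512):
    """Load the fine-tuned adapter and generate one story (needs a CUDA GPU)."""
    # Imported lazily so this file can be read and tested without Unsloth installed.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=adapter_dir,
        max_seq_length=1024,
        dtype=None,
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode

    inputs = tokenizer([build_prompt(instruction)], return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


print(build_prompt("Write a short story for children aged 3-5."))
```

The decoded output contains the prompt followed by the generated story, so the text after `### Response:` is the part of interest.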

In [None]:
import os

# Reuse the token from the environment; HF_TOKEN is assumed to be set beforehand.
hf_token = os.environ["HF_TOKEN"]
model.push_to_hub("GhadyIbra250/qwen_arabic_stories_with_5_epochs_show", token=hf_token)
tokenizer.push_to_hub("GhadyIbra250/qwen_arabic_stories_with_5_epochs_show", token=hf_token)


Saved model to https://huggingface.co/lamaishere/qwen_arabic_storiessss_5_epochs

