# 🦙 DPO Fine-Tuning with LlamaFactory: Qwen3-VL-4B Vision-Language Model

---

## 📖 Overview

This notebook walks through a complete **Direct Preference Optimization (DPO)** fine-tuning pipeline for a **Vision-Language Model (VLM)** using the **LlamaFactory** framework.

### What is DPO?
**Direct Preference Optimization (DPO)** is a fine-tuning technique that teaches a language model to prefer certain responses over others, without requiring a separate reward model. Instead of reinforcement learning, it directly optimizes the model using pairs of:
- ✅ **Chosen** response: the preferred, higher-quality answer
- ❌ **Rejected** response: the non-preferred, lower-quality answer
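
To make the objective concrete, here is a minimal sketch of the per-example DPO loss. It assumes the summed token log-probabilities of each response are already available, both under the model being trained and under a frozen reference copy. The function name, the argument names, and the `beta=0.1` default are illustrative assumptions, not LlamaFactory's implementation.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    # How much more the policy favors "chosen" over "rejected" than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # policy identical to reference
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)    # policy now prefers the chosen answer
```

When the policy and the reference agree, the loss is log 2 (about 0.693). It falls as the policy raises the chosen response relative to the rejected one.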

### What is LlamaFactory?
**LlamaFactory** is an open-source, unified fine-tuning framework that supports:
- Multiple training methods: SFT, DPO, ORPO, KTO, PPO, and more
- Tuning approaches: LoRA, QLoRA, full fine-tuning
- Many model families: LLaMA, Qwen, Mistral, Gemma, and more
- Multimodal models (images, audio, video)

### Model Used
We fine-tune **Qwen3-VL-4B-Instruct**, a 4-billion parameter vision-language model from Alibaba's Qwen3 family, capable of understanding both text and images.

### Dataset
We use the **`helehan/topic-overwrite`** dataset from HuggingFace, which contains image-question pairs with chosen and rejected answers, exactly the pairing DPO training needs.

---

## 🗺️ Pipeline Overview

```
1. Load Dataset          →  HuggingFace dataset with images + chosen/rejected answers
2. Process Images        →  Save as JPEG, map paths back to DataFrame
3. Format for DPO        →  Convert to LlamaFactory's ShareGPT DPO format
4. Download Extra Data   →  Pull pre-prepared files from Google Drive
5. Fix Image Paths       →  Remap paths from Colab → Kaggle
6. Install Dependencies  →  torch, transformers, LlamaFactory
7. Register Datasets     →  Add custom datasets to LlamaFactory registry
8. Write YAML Config     →  Define all training hyperparameters
9. Run Training          →  Launch DPO training via CLI
```

---

# ***EDA***

## 📦 Import Libraries & Load Dataset

### What this cell does:
This is the foundation cell. We import all the Python libraries we need throughout the notebook and load the raw dataset from HuggingFace.

### Libraries explained:

| Library | Purpose |
|---|---|
| `os`, `glob` | File system operations: navigating directories, finding files by pattern |
| `base64` | Encoding binary data (images) into text, sometimes needed for API calls |
| `tqdm` | Progress bars that show how long loops will take |
| `datasets` | HuggingFace library for loading and processing datasets |
| `json` | Reading and writing JSON files (our data format) |
| `pandas` | DataFrame operations: filtering, indexing, splitting data |
| `sklearn` | `train_test_split` for splitting data into train/validation sets |
| `PIL (Pillow)` | Image processing: opening, converting, saving images |
| `matplotlib` | Plotting: visualizing training curves and sample images |
| `gdown` | Downloading files directly from Google Drive by file ID |

### Dataset:
The `helehan/topic-overwrite` dataset is a vision DPO dataset. Each sample contains:
- An **image**
- A **question** about that image
- A **chosen** (good) answer
- A **rejected** (bad) answer

We load only the `train` split and immediately convert it to a pandas DataFrame for easier manipulation.

In [1]:
# gdown is not preinstalled in every environment; install it before importing.
!pip install -q gdown

import os
from glob import glob
import base64
from tqdm.auto import tqdm
from datasets import load_dataset
import json
import pandas as pd
from sklearn.model_selection import train_test_split
from PIL import Image
import matplotlib.pyplot as plt
import gdown

In [None]:
dataset = load_dataset(
    "helehan/topic-overwrite", split="train"
)
df = dataset.to_pandas()

In [None]:
df

## 🖼️ Process & Save Images as JPEG

### Why do we need this cell?
The images in HuggingFace datasets are stored as in-memory PIL Image objects. LlamaFactory needs images saved as **actual files on disk** with known paths. This cell:
1. Creates a local folder `imagesdpo/` to store images
2. Converts every image to RGB (some images may be RGBA, grayscale, or palette mode)
3. Saves each image as a compressed JPEG
4. Builds a sorted list of image paths
5. Maps those paths back to the DataFrame

### Key decisions:
- **RGB conversion**: JPEG has no alpha channel and no palette support, so Pillow refuses to save images in mode `RGBA` or `P` as JPEG. Grayscale (`L`) images would save without error, but converting everything to `RGB` keeps the saved files uniform.
- **JPEG quality=85**: A good balance between file size and visual quality. Lower = smaller files but more compression artifacts.
- **optimize=True**: Makes the encoder take an extra pass to pick optimal Huffman tables, which is slightly slower but produces smaller files.
- **Filename format**: `{original_name}_{index}.jpg`, where the index at the end ensures unique filenames even if original names collide.
- **Sorted by index**: After glob (which returns files in arbitrary OS order), we sort by the numeric index in the filename so image order matches DataFrame row order.
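
The last point is easy to get wrong, so here is a small sketch of why the sort key has to parse the index as an integer. The filenames are made up for illustration, and `index_of` is a hypothetical helper that mirrors the lambda used in the cell further down.

```python
import os

def index_of(path):
    """Return the trailing integer index from a name like 'photo_12.jpg'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem.split("_")[-1])

# Hypothetical filenames, in the arbitrary order glob might return them.
paths = ["imagesdpo/cat_10.jpg", "imagesdpo/dog_2.jpg", "imagesdpo/cat_0.jpg"]

lexicographic = sorted(paths)            # compares characters; the index plays no special role
by_index = sorted(paths, key=index_of)   # compares the numeric suffix

print(lexicographic)  # ['imagesdpo/cat_0.jpg', 'imagesdpo/cat_10.jpg', 'imagesdpo/dog_2.jpg']
print(by_index)       # ['imagesdpo/cat_0.jpg', 'imagesdpo/dog_2.jpg', 'imagesdpo/cat_10.jpg']
```

Only the second ordering lines up with DataFrame rows 0, 2 and 10. Checking `len(images) == len(df)` before assigning the column is a cheap extra guard against stale files in the folder.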

In [None]:
os.makedirs("imagesdpo", exist_ok=True)

In [None]:
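# Wrap the dataset (not enumerate) so tqdm knows the total and can show real progress.
for i, sample in enumerate(tqdm(dataset)):
  image = sample['image']
  # Normalize every image to RGB before saving as JPEG.
  if image.mode != 'RGB':
    image = image.convert('RGB')
  # Append the row index so filenames stay unique and sortable.
  base_name = os.path.splitext(os.path.basename(sample['image_path']))[0]
  path = os.path.join("imagesdpo", f"{base_name}_{i}.jpg")
  image.save(path, 'JPEG', quality=85, optimize=True)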


In [None]:
images = glob("imagesdpo/*.jpg")
images = sorted(images,
               key=lambda x: int(os.path.splitext(os.path.basename(x))[0].split("_")[-1]))
df['image_edited'] = images

In [None]:
image = Image.open(images[0])
plt.imshow(image)
plt.axis("off")
plt.show()

In [None]:
df

## ✂️ Train / Validation Split

### Why split the data?
In any machine learning workflow, we split data into:
- **Training set (80%)**: The model learns from this data
- **Validation set (20%)**: We evaluate the model on this data to detect overfitting and track performance during training

### Parameters explained:
- **`test_size=0.2`**: 20% of data goes to validation, 80% stays for training
- **`shuffle=True`**: Randomly shuffles the rows before splitting, so neither split inherits whatever ordering the source dataset happens to have
- **`random_state=42`**: Fixes the random seed so results are **reproducible**: running the same code always gives the same split. (42 is a convention in ML, from *The Hitchhiker's Guide to the Galaxy*)

### Important note:
The split happens at the **DataFrame level**, meaning both `train_df` and `test_df` retain all columns including `image_edited`, `question`, `chosen`, and `rejected`.
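
The effect of the seed can be illustrated without scikit-learn. The sketch below is a standard-library stand-in, not `train_test_split`'s actual algorithm, and `split_indices` is a hypothetical helper: it shuffles row indices with a fixed seed and cuts off the validation share.

```python
import random

def split_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then cut off the validation share."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = round(n_rows * test_size)
    # First n_val shuffled indices form the validation set, the rest the training set.
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(10)
again_train, again_val = split_indices(10)  # same seed, so the same split
```

The two splits never share a row, and repeating the call with the same seed reproduces them exactly.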

In [None]:
train_df, test_df = train_test_split(df, test_size=0.2, shuffle=True, random_state=42)

In [None]:
train_df

In [None]:
test_df

## 🔄 Format Data into LlamaFactory DPO Format

### Why do we need a special format?
LlamaFactory expects data in a specific **ShareGPT-style JSON structure** for DPO training. Raw data from HuggingFace doesn't match this structure, so we convert it.

### The DPO record structure:
```json
{
  "conversations": [
    {"from": "human", "value": "<image>What is shown in this image?"}
  ],
  "chosen":   {"from": "gpt", "value": "A detailed correct answer..."},
  "rejected": {"from": "gpt", "value": "A vague or wrong answer..."},
  "images":   ["imagesdpo/photo_42.jpg"]
}
```

### Key format details:

| Field | Description |
|---|---|
| `conversations` | List of conversation turns. For DPO we only need the **user turn** (the question) |
| `"from": "human"` | Marks this turn as coming from the user |
| `"<image>"` | Special token that tells the model where the image appears in the text |
| `chosen` | The preferred response: the answer DPO wants the model to **favor** |
| `rejected` | The non-preferred response: the answer DPO wants the model to **avoid** |
| `images` | List of image file paths. Must match the number of `<image>` tokens in conversations |

### Why `<image>` before the question?
Placing `<image>` before the text is the standard convention for Qwen-VL models: the image tokens then come before the question in the model's input sequence.
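
Because the number of `<image>` tokens must match the number of image paths, a quick consistency check before training can save a failed run. This is a minimal sketch; `find_bad_records` is a hypothetical helper and the two sample records are made up.

```python
def find_bad_records(records):
    """Return indices of records whose <image> count differs from len(images)."""
    bad = []
    for i, record in enumerate(records):
        n_tokens = sum(turn["value"].count("<image>")
                       for turn in record["conversations"])
        if n_tokens != len(record.get("images", [])):
            bad.append(i)
    return bad

sample = [
    {"conversations": [{"from": "human", "value": "<image>What is shown?"}],
     "chosen": {"from": "gpt", "value": "A cat."},
     "rejected": {"from": "gpt", "value": "A dog."},
     "images": ["imagesdpo/photo_0.jpg"]},
    # Second record is deliberately broken: one image path but no <image> token.
    {"conversations": [{"from": "human", "value": "What is shown?"}],
     "chosen": {"from": "gpt", "value": "A cat."},
     "rejected": {"from": "gpt", "value": "A dog."},
     "images": ["imagesdpo/photo_1.jpg"]},
]
bad = find_bad_records(sample)
print(bad)  # [1]
```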

In [None]:
def format_fine_tuning(df):
  """Convert each DataFrame row into one ShareGPT-style DPO record."""
  formatted_data = []
  for i in range(len(df)):
    row = df.iloc[i]  # look the row up once instead of once per field
    task_dpo_record = {
      "conversations": [
        {
          "from": "human",
          "value": "<image>" + row['question'],
        },
      ],
      "chosen": {
        "from": "gpt",
        "value": row['chosen']
      },
      "rejected": {
        "from": "gpt",
        "value": row['rejected']
      },
      "images": [
        row['image_edited']
      ]
    }
    formatted_data.append(task_dpo_record)
  return formatted_data


In [None]:
train_dpo = format_fine_tuning(train_df)
test_dpo = format_fine_tuning(test_df)

## 💾 Save Formatted JSON Files

### Why save to JSON?
LlamaFactory reads training data from **local JSON files**. We need to serialize our Python list of dictionaries into JSON format and save them at known paths.

### JSON writing parameters explained:
- **`ensure_ascii=False`**: Allows non-ASCII characters (Arabic, Chinese, accented letters, emoji) to be saved as-is instead of being escaped as `\uXXXX` sequences. This is important if questions/answers contain non-English text.
- **`default=str`**: If any Python object can't be serialized to JSON (e.g., a numpy int64 or a Path object), convert it to a string instead of crashing. A safety net.
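
A short sketch of what the two parameters change, using only the standard library. The record is made up for illustration.

```python
import json
from pathlib import Path

# Made-up record: non-ASCII text plus a Path object that json cannot serialize natively.
record = {"question": "¿Qué muestra la imagen?", "image": Path("imagesdpo/photo_0.jpg")}

try:
    json.dumps(record)  # no default=... so the Path object is rejected
except TypeError as err:
    failure = str(err)

escaped = json.dumps(record, default=str)                        # ensure_ascii defaults to True
readable = json.dumps(record, ensure_ascii=False, default=str)   # keeps characters as written

print(failure)
print(escaped)
print(readable)
```

With `ensure_ascii=False` the file contains raw non-ASCII characters, so it is safest to open it with `encoding="utf-8"` rather than rely on the platform default.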

### File structure created:
```
imagesdpo/
└── datasets/
    └── llamafactory-dpo-finetune-data/
        ├── train-v1.json   ← 80% of data for training
        └── val-v1.json     ← 20% of data for evaluation
```

In [None]:
os.makedirs(
    os.path.join("imagesdpo", "datasets", "llamafactory-dpo-finetune-data"), exist_ok=True
)

In [None]:
# Write as UTF-8 explicitly: ensure_ascii=False emits raw non-ASCII characters.
with open(os.path.join("imagesdpo", "datasets", "llamafactory-dpo-finetune-data", "train-v1.json"), "w", encoding="utf-8") as dest:
    json.dump(train_dpo, dest, ensure_ascii=False, default=str)

with open(os.path.join("imagesdpo", "datasets", "llamafactory-dpo-finetune-data", "val-v1.json"), "w", encoding="utf-8") as dest:
    json.dump(test_dpo, dest, ensure_ascii=False, default=str)

## ☁️ Download Pre-Prepared Data from Google Drive

### Why download from Google Drive?
This project was originally developed in **Google Colab** where the dataset (images + formatted JSON files) was prepared and saved directly to **Google Drive**. When it came time to train, Colab's free GPU (**Tesla T4**) was too weak to handle a 4B vision-language model with DPO training efficiently.

So the workflow was moved to **Kaggle**, which offers a free **NVIDIA P100** GPU with higher FP32 throughput and memory bandwidth than the T4:

| Platform | GPU | VRAM | Why |
|---|---|---|---|
| Google Colab (free) | Tesla T4 | 15 GB | Optimized for inference, weaker at FP32 training |
| Kaggle (free) | NVIDIA P100 | 16 GB | Built for scientific training workloads, stronger FP32 compute |

Since the data was already prepared and sitting on Google Drive, we use `gdown` to pull it directly into the Kaggle environment instead of re-running the entire data preparation pipeline from scratch.

### What is gdown?
`gdown` is a Python library for downloading files from Google Drive using their **file ID** (the random string in a Drive sharing URL).

For example, in `https://drive.google.com/file/d/1SlSKfmTzigLxJHOfL2DJYKTES5Ry0394/view`, the file ID is `1SlSKfmTzigLxJHOfL2DJYKTES5Ry0394`.
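
If only the sharing URL is at hand, the ID can be pulled out with a regular expression. This is a minimal sketch that assumes the `/file/d/<id>/` URL shape shown above; `drive_file_id` is a hypothetical helper, not part of `gdown`.

```python
import re

def drive_file_id(url):
    """Extract the file ID from a Drive sharing URL of the form .../file/d/<id>/..."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"No Drive file ID found in: {url}")
    return match.group(1)

file_id = drive_file_id(
    "https://drive.google.com/file/d/1SlSKfmTzigLxJHOfL2DJYKTES5Ry0394/view"
)
print(file_id)  # 1SlSKfmTzigLxJHOfL2DJYKTES5Ry0394
```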

### Files downloaded:

| File | Content |
|---|---|
| `pdf_images.zip` | ZIP of images extracted from PDF documents (prepared in Colab) |
| `train-v1.json` | Pre-formatted DPO training data in ShareGPT format (prepared in Colab) |
| `val-v1.json` | Pre-formatted DPO validation data in ShareGPT format (prepared in Colab) |


In [None]:
pdf_images_file_id = "1SlSKfmTzigLxJHOfL2DJYKTES5Ry0394"
gdown.download(id=pdf_images_file_id, output="./pdf_images.zip")


train_file_id = "1YPhsnMLN3D1-qTAyKW0-bMqfha8jBtnk"
gdown.download(id=train_file_id, output="./train-v1.json")


val_file_id = "1QoH17Z-VAPLkoxLc9qi09kX4DnlczI2w"
gdown.download(id=val_file_id, output="./val-v1.json")

## 📂 Unzip Images & Collect Paths

### What this cell does:
1. **Unzips** the downloaded `pdf_images.zip` archive into the working directory
2. **Collects** all extracted `.jpg` image paths using `glob`

### Why unzip?
LlamaFactory loads images from ordinary file paths, so the archive has to be extracted first. The `-q` flag makes `unzip` quiet (no verbose output), keeping the notebook clean.

### The nested path problem:
Notice the long path after unzipping:
```
/kaggle/working/imagesdpo/content/drive/MyDrive/RAG Techinques/pdf_images/imagesdpo/
```
This happens because when the ZIP was created in Google Colab, it preserved the **full absolute Colab path** inside the archive. When unzipped on Kaggle, that Colab path becomes a subdirectory. The next cell fixes the paths in the JSON files to point here correctly.
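
The behavior can be reproduced in miniature with the standard library's `zipfile` module. The sketch below builds a throwaway archive whose single member is stored under a shortened, hypothetical Colab-style path, extracts it into a target folder, and lists what lands on disk.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "images.zip")
    # Store a member under a Colab-style path, as zipping an absolute path would.
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("content/drive/MyDrive/pdf_images/imagesdpo/photo_0.jpg", b"fake")

    target = os.path.join(tmp, "imagesdpo")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)

    # Collect every extracted file, relative to the extraction target.
    extracted = []
    for root, _dirs, files in os.walk(target):
        for name in files:
            extracted.append(os.path.relpath(os.path.join(root, name), target))

print(extracted)
```

The stored directory chain is recreated underneath the extraction target, which is why the real image folder ends up several levels deep.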

### The glob pattern:
`glob("path/*.jpg")` uses a wildcard `*` to match all files ending in `.jpg` in the specified directory.

In [None]:
!unzip -q /kaggle/working/pdf_images.zip -d /kaggle/working/imagesdpo

In [None]:
images = glob("/kaggle/working/imagesdpo/content/drive/MyDrive/RAG Techinques/pdf_images/imagesdpo/*.jpg")

In [None]:
images[:2]

In [None]:
len(images)

## 🔧 Load JSON & Fix Image Paths

### Why do we need to fix paths?
The downloaded `train-v1.json` and `val-v1.json` files were created in **Google Colab** where images were stored at:
```
/content/drive/MyDrive/RAG Techinques/pdf_images/imagesdpo/image.jpg
```

On **Kaggle**, the same images are located at:
```
/kaggle/working/imagesdpo/content/drive/MyDrive/RAG Techinques/pdf_images/imagesdpo/image.jpg
```

If we don't fix the paths, LlamaFactory will look for images that **don't exist** at those locations and crash.

### How we fix it:
We use Python's `str.replace()` to swap the old Colab prefix with the new Kaggle prefix for every single image path in every record.
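
One caveat with a plain `str.replace()` here: the old prefix is a substring of the new one, so running the fix twice rewrites the path again. A prefix check is a possible alternative. The sketch below uses shortened, hypothetical prefixes and a hypothetical `remap` helper to show the difference.

```python
OLD_PREFIX = "/content/drive/"
NEW_PREFIX = "/kaggle/working/imagesdpo/content/drive/"

def remap(path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Swap the leading prefix only; paths that are already remapped pass through."""
    if path.startswith(old):
        return new + path[len(old):]
    return path

colab_path = "/content/drive/MyDrive/photo_0.jpg"
once = remap(colab_path)
twice = remap(once)  # second run leaves the path untouched

# Plain replace applied twice nests the Kaggle prefix inside itself.
naive_twice = colab_path.replace(OLD_PREFIX, NEW_PREFIX).replace(OLD_PREFIX, NEW_PREFIX)

print(once)
print(naive_twice)
```

Running the path-fixing cell exactly once avoids the problem either way. The prefix check simply makes a re-run harmless.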

In [None]:
with open("/kaggle/working/train-v1.json", "r", encoding="utf-8") as file:
    train_data = json.load(file)
with open("/kaggle/working/val-v1.json", "r", encoding="utf-8") as file:
    test_data = json.load(file)

In [None]:
# Prefix every stored Colab path with the Kaggle extraction directory.
colab_prefix = "/content/drive/MyDrive/RAG Techinques/pdf_images/imagesdpo/"
kaggle_prefix = "/kaggle/working/imagesdpo" + colab_prefix
for record in train_data + test_data:
    record['images'] = [path.replace(colab_prefix, kaggle_prefix) for path in record['images']]

In [None]:
with open("./train-v1-edited.json", "w", encoding="utf-8") as dest:
    json.dump(train_data, dest, ensure_ascii=False, default=str)

with open("./val-v1-edited.json", "w", encoding="utf-8") as dest:
    json.dump(test_data, dest, ensure_ascii=False, default=str)

# ***Finetuning***

## ⚙️ Install All Required Dependencies

### Why pin specific versions?
Machine learning libraries change rapidly and often break compatibility between versions. Pinning **exact versions** ensures:
- **Reproducibility**: anyone running this notebook gets the same behavior
- **Compatibility**: LlamaFactory has been tested with these specific versions
- **Stability**: No surprise API changes from upgrading

### Packages explained:

| Package | Version | Role |
|---|---|---|
| `transformers` | 4.57.6 | Core HuggingFace library: model loading, tokenization, training loops |
| `optimum` | 1.26.0 | HuggingFace optimization toolkit for efficient inference and training |
| `datasets` | 4.4.0 | HuggingFace data library: loading, streaming, processing datasets |
| `torch` | 2.8.0 | PyTorch, the deep learning framework powering all computations |
| `torchvision` | 0.23 | Image transforms and vision utilities for PyTorch |
| `torchaudio` | 2.8.0 | Audio processing for PyTorch (included for compatibility) |

### Why torch 2.8.0 specifically?
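Qwen3-VL is only supported by recent `transformers` releases, which in turn need a reasonably recent PyTorch; 2.8.0 is the release that the pinned `torchvision` 0.23 and `torchaudio` 2.8.0 builds are paired with. Note that FlashAttention-2 is a separate package that requires an Ampere-or-newer GPU, so on a T4 or P100 the model falls back to PyTorch's own attention implementations.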
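
Once the install cell below has run, a quick check can confirm that the environment matches the pins. This is a sketch using the standard library's `importlib.metadata`. `PINNED` copies four versions from the table above, and `find_mismatches` is a hypothetical helper. Stripping a local build tag such as `+cu128` before comparing is an assumption about how the installed `torch` version string may look.

```python
from importlib import metadata

# Pins copied from the table above.
PINNED = {"transformers": "4.57.6", "optimum": "1.26.0",
          "datasets": "4.4.0", "torch": "2.8.0"}

def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # Ignore local build tags such as '+cu128' when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

print(find_mismatches(PINNED))  # an empty dict means every pin is satisfied
```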

In [None]:
!pip install transformers==4.57.6
!pip install optimum==1.26.0
!pip install datasets==4.4.0

!pip install torch==2.8.0
!pip install torchvision==0.23
!pip install torchaudio==2.8.0

## 🦙 Clone & Install LlamaFactory

### What is LlamaFactory?
LlamaFactory is an open-source framework that wraps HuggingFace Transformers with:
- A simple **YAML config** interface for training (no complex Python scripts needed)
- Built-in support for **LoRA, QLoRA, full fine-tuning**
- Pre-built **chat templates** for dozens of models
- Multi-modal support (vision, audio, video)
- A **CLI tool** (`llamafactory-cli`) for one-command training

### Why a specific git commit?
```bash
git checkout 762b480131908d37736ad9aa3f12e87f8f7e6313
```
This pins LlamaFactory to a **specific commit** that is known to work with our setup. The main branch of active open-source projects can change daily, so pinning a commit prevents breaking changes from affecting our training run.

### `--depth 1` in git clone:
Downloads only the **latest snapshot** of the repository without the full git history. This is much faster and uses less disk space, which suits Kaggle's limited storage. The trade-off is that a depth-1 clone contains only the tip of the default branch, so an older pinned commit has to be fetched explicitly before `git checkout` can find it.

### `pip install -e .`:
Installs LlamaFactory in **editable mode**, meaning Python imports the code directly from the cloned folder. Changes to the source files are immediately reflected without reinstalling.

### `requirements/metrics.txt`:
Installs additional libraries needed for computing evaluation metrics like BLEU, ROUGE, and perplexity during the eval phase.

In [None]:
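!git clone --depth 1 https://github.com/hiyouga/LlamaFactory.git
# A depth-1 clone holds only the newest commit, so fetch the pinned one before checking it out.
!cd LlamaFactory && git fetch --depth 1 origin 762b480131908d37736ad9aa3f12e87f8f7e6313
!cd LlamaFactory && git checkout 762b480131908d37736ad9aa3f12e87f8f7e6313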

!cd LlamaFactory && pip install -e .
!cd LlamaFactory && pip install -r requirements/metrics.txt

## 📋 Register Custom Datasets in LlamaFactory

### What is `dataset_info.json`?
LlamaFactory maintains a **central registry** of all known datasets in `data/dataset_info.json`. Before you can use any dataset in training, it must be registered here.

### What information goes in the registry?
Each dataset entry tells LlamaFactory:
- **`file_name`**: Where to find the local JSON file
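- **`formatting`**: Data format, either `"sharegpt"` for multi-turn conversation format or `"alpaca"` for single instruction-response pairs
- **`ranking: True`**: Marks this as a **preference dataset** with chosen/rejected pairs (required for DPO). It is written as Python's `True` because the registry is built as a Python dict; `json.dump` serializes it as `true`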
- **`columns`**: Maps our JSON field names to LlamaFactory's expected field names

### Column mapping for our DPO dataset:

| Our JSON field | LlamaFactory expects | Meaning |
|---|---|---|
| `conversations` | `messages` | The conversation history (user turns) |
| `chosen` | `chosen` | The preferred response |
| `rejected` | `rejected` | The non-preferred response |
| `images` | `images` | Paths to image files |

### Why write to the LlamaFactory folder?
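By default, LlamaFactory reads `dataset_info.json` from its own `data/` directory at training time, so the simplest option is to write our custom entries directly into that file. (The `dataset_dir` setting can instead point training at a different folder that holds its own `dataset_info.json`.)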
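
As a compact illustration of what a custom DPO entry looks like, the sketch below builds two entries with the same fields as the image-based preference datasets already in the registry. The dataset names `my_dpo_train` and `my_dpo_val` and the `dpo_entry` helper are hypothetical. The file names are the edited JSON files written earlier in this notebook.

```python
import json

def dpo_entry(file_name):
    """Build a registry entry for a local ShareGPT-style DPO file with images."""
    return {
        "file_name": file_name,
        "ranking": True,            # preference data with chosen/rejected pairs
        "formatting": "sharegpt",
        "columns": {
            # LlamaFactory's column name on the left, our JSON field on the right.
            "messages": "conversations",
            "chosen": "chosen",
            "rejected": "rejected",
            "images": "images",
        },
    }

# Hypothetical dataset names for illustration.
custom_entries = {
    "my_dpo_train": dpo_entry("train-v1-edited.json"),
    "my_dpo_val": dpo_entry("val-v1-edited.json"),
}
serialized = json.dumps(custom_entries, indent=2)
print(serialized)
```

The keys of `custom_entries` are the names a training config would later refer to.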


In [None]:
data_info = {
  "identity": {
    "file_name": "identity.json"
  },
  "alpaca_en_demo": {
    "file_name": "alpaca_en_demo.json"
  },
  "alpaca_zh_demo": {
    "file_name": "alpaca_zh_demo.json"
  },
  "glaive_toolcall_en_demo": {
    "file_name": "glaive_toolcall_en_demo.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "tools": "tools"
    }
  },
  "glaive_toolcall_zh_demo": {
    "file_name": "glaive_toolcall_zh_demo.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "tools": "tools"
    }
  },
  "mllm_demo": {
    "file_name": "mllm_demo.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "mllm_audio_demo": {
    "file_name": "mllm_audio_demo.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "audios": "audios"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "mllm_video_demo": {
    "file_name": "mllm_video_demo.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "videos": "videos"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "mllm_video_audio_demo": {
    "file_name": "mllm_video_audio_demo.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "videos": "videos",
      "audios": "audios"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "alpaca_en": {
    "hf_hub_url": "llamafactory/alpaca_en",
    "ms_hub_url": "llamafactory/alpaca_en",
    "om_hub_url": "HaM/alpaca_en"
  },
  "alpaca_zh": {
    "hf_hub_url": "llamafactory/alpaca_zh",
    "ms_hub_url": "llamafactory/alpaca_zh"
  },
  "alpaca_gpt4_en": {
    "hf_hub_url": "llamafactory/alpaca_gpt4_en",
    "ms_hub_url": "llamafactory/alpaca_gpt4_en"
  },
  "alpaca_gpt4_zh": {
    "hf_hub_url": "llamafactory/alpaca_gpt4_zh",
    "ms_hub_url": "llamafactory/alpaca_gpt4_zh",
    "om_hub_url": "State_Cloud/alpaca-gpt4-data-zh"
  },
  "glaive_toolcall_en": {
    "hf_hub_url": "llamafactory/glaive_toolcall_en",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "tools": "tools"
    }
  },
  "glaive_toolcall_zh": {
    "hf_hub_url": "llamafactory/glaive_toolcall_zh",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "tools": "tools"
    }
  },
  "lima": {
    "hf_hub_url": "llamafactory/lima",
    "formatting": "sharegpt"
  },
  "guanaco": {
    "hf_hub_url": "JosephusCheung/GuanacoDataset",
    "ms_hub_url": "AI-ModelScope/GuanacoDataset"
  },
  "belle_2m": {
    "hf_hub_url": "BelleGroup/train_2M_CN",
    "ms_hub_url": "AI-ModelScope/train_2M_CN"
  },
  "belle_1m": {
    "hf_hub_url": "BelleGroup/train_1M_CN",
    "ms_hub_url": "AI-ModelScope/train_1M_CN"
  },
  "belle_0.5m": {
    "hf_hub_url": "BelleGroup/train_0.5M_CN",
    "ms_hub_url": "AI-ModelScope/train_0.5M_CN"
  },
  "belle_dialog": {
    "hf_hub_url": "BelleGroup/generated_chat_0.4M",
    "ms_hub_url": "AI-ModelScope/generated_chat_0.4M"
  },
  "belle_math": {
    "hf_hub_url": "BelleGroup/school_math_0.25M",
    "ms_hub_url": "AI-ModelScope/school_math_0.25M"
  },
  "open_platypus": {
    "hf_hub_url": "garage-bAInd/Open-Platypus",
    "ms_hub_url": "AI-ModelScope/Open-Platypus"
  },
  "codealpaca": {
    "hf_hub_url": "sahil2801/CodeAlpaca-20k",
    "ms_hub_url": "AI-ModelScope/CodeAlpaca-20k"
  },
  "alpaca_cot": {
    "hf_hub_url": "QingyiSi/Alpaca-CoT",
    "ms_hub_url": "AI-ModelScope/Alpaca-CoT"
  },
  "openorca": {
    "hf_hub_url": "Open-Orca/OpenOrca",
    "ms_hub_url": "AI-ModelScope/OpenOrca",
    "columns": {
      "prompt": "question",
      "response": "response",
      "system": "system_prompt"
    }
  },
  "slimorca": {
    "hf_hub_url": "Open-Orca/SlimOrca",
    "formatting": "sharegpt"
  },
  "mathinstruct": {
    "hf_hub_url": "TIGER-Lab/MathInstruct",
    "ms_hub_url": "AI-ModelScope/MathInstruct",
    "columns": {
      "prompt": "instruction",
      "response": "output"
    }
  },
  "firefly": {
    "hf_hub_url": "YeungNLP/firefly-train-1.1M",
    "columns": {
      "prompt": "input",
      "response": "target"
    }
  },
  "wikiqa": {
    "hf_hub_url": "wiki_qa",
    "columns": {
      "prompt": "question",
      "response": "answer"
    }
  },
  "webqa": {
    "hf_hub_url": "suolyer/webqa",
    "ms_hub_url": "AI-ModelScope/webqa",
    "columns": {
      "prompt": "input",
      "response": "output"
    }
  },
  "webnovel": {
    "hf_hub_url": "zxbsmk/webnovel_cn",
    "ms_hub_url": "AI-ModelScope/webnovel_cn"
  },
  "nectar_sft": {
    "hf_hub_url": "AstraMindAI/SFT-Nectar",
    "ms_hub_url": "AI-ModelScope/SFT-Nectar"
  },
  "deepctrl": {
    "ms_hub_url": "deepctrl/deepctrl-sft-data"
  },
  "adgen_train": {
    "hf_hub_url": "HasturOfficial/adgen",
    "ms_hub_url": "AI-ModelScope/adgen",
    "split": "train",
    "columns": {
      "prompt": "content",
      "response": "summary"
    }
  },
  "adgen_eval": {
    "hf_hub_url": "HasturOfficial/adgen",
    "ms_hub_url": "AI-ModelScope/adgen",
    "split": "validation",
    "columns": {
      "prompt": "content",
      "response": "summary"
    }
  },
  "sharegpt_hyper": {
    "hf_hub_url": "totally-not-an-llm/sharegpt-hyperfiltered-3k",
    "formatting": "sharegpt"
  },
  "sharegpt4": {
    "hf_hub_url": "shibing624/sharegpt_gpt4",
    "ms_hub_url": "AI-ModelScope/sharegpt_gpt4",
    "formatting": "sharegpt"
  },
  "ultrachat_200k": {
    "hf_hub_url": "HuggingFaceH4/ultrachat_200k",
    "ms_hub_url": "AI-ModelScope/ultrachat_200k",
    "split": "train_sft",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "infinity_instruct": {
    "hf_hub_url": "BAAI/Infinity-Instruct",
    "formatting": "sharegpt"
  },
  "agent_instruct": {
    "hf_hub_url": "THUDM/AgentInstruct",
    "ms_hub_url": "ZhipuAI/AgentInstruct",
    "formatting": "sharegpt"
  },
  "lmsys_chat": {
    "hf_hub_url": "lmsys/lmsys-chat-1m",
    "ms_hub_url": "AI-ModelScope/lmsys-chat-1m",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversation"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "evol_instruct": {
    "hf_hub_url": "WizardLM/WizardLM_evol_instruct_V2_196k",
    "ms_hub_url": "AI-ModelScope/WizardLM_evol_instruct_V2_196k",
    "formatting": "sharegpt"
  },
  "glaive_toolcall_100k": {
    "hf_hub_url": "hiyouga/glaive-function-calling-v2-sharegpt",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "tools": "tools"
    }
  },
  "cosmopedia": {
    "hf_hub_url": "HuggingFaceTB/cosmopedia",
    "columns": {
      "prompt": "prompt",
      "response": "text"
    }
  },
  "stem_zh": {
    "hf_hub_url": "hfl/stem_zh_instruction"
  },
  "ruozhiba_gpt4": {
    "hf_hub_url": "hfl/ruozhiba_gpt4_turbo"
  },
  "neo_sft": {
    "hf_hub_url": "m-a-p/neo_sft_phase2",
    "formatting": "sharegpt"
  },
  "magpie_pro_300k": {
    "hf_hub_url": "Magpie-Align/Magpie-Pro-300K-Filtered",
    "formatting": "sharegpt"
  },
  "magpie_ultra": {
    "hf_hub_url": "argilla/magpie-ultra-v0.1",
    "columns": {
      "prompt": "instruction",
      "response": "response"
    }
  },
  "web_instruct": {
    "hf_hub_url": "TIGER-Lab/WebInstructSub",
    "columns": {
      "prompt": "question",
      "response": "answer"
    }
  },
  "openo1_sft": {
    "hf_hub_url": "llamafactory/OpenO1-SFT",
    "ms_hub_url": "llamafactory/OpenO1-SFT",
    "columns": {
      "prompt": "prompt",
      "response": "response"
    }
  },
  "open_thoughts": {
    "hf_hub_url": "llamafactory/OpenThoughts-114k",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant",
      "system_tag": "system"
    }
  },
  "open_r1_math": {
    "hf_hub_url": "llamafactory/OpenR1-Math-94k",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant",
      "system_tag": "system"
    }
  },
  "chinese_r1_distill": {
    "hf_hub_url": "Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT",
    "ms_hub_url": "liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT"
  },
  "llava_1k_en": {
    "hf_hub_url": "BUAADreamer/llava-en-zh-2k",
    "subset": "en",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "llava_1k_zh": {
    "hf_hub_url": "BUAADreamer/llava-en-zh-2k",
    "subset": "zh",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "llava_150k_en": {
    "hf_hub_url": "BUAADreamer/llava-en-zh-300k",
    "subset": "en",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "llava_150k_zh": {
    "hf_hub_url": "BUAADreamer/llava-en-zh-300k",
    "subset": "zh",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "pokemon_cap": {
    "hf_hub_url": "llamafactory/pokemon-gpt4o-captions",
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "images": "images"
    }
  },
  "mllm_pt_demo": {
    "hf_hub_url": "BUAADreamer/mllm_pt_demo",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "oasst_de": {
    "hf_hub_url": "mayflowergmbh/oasst_de"
  },
  "dolly_15k_de": {
    "hf_hub_url": "mayflowergmbh/dolly-15k_de"
  },
  "alpaca-gpt4_de": {
    "hf_hub_url": "mayflowergmbh/alpaca-gpt4_de"
  },
  "openschnabeltier_de": {
    "hf_hub_url": "mayflowergmbh/openschnabeltier_de"
  },
  "evol_instruct_de": {
    "hf_hub_url": "mayflowergmbh/evol-instruct_de"
  },
  "dolphin_de": {
    "hf_hub_url": "mayflowergmbh/dolphin_de"
  },
  "booksum_de": {
    "hf_hub_url": "mayflowergmbh/booksum_de"
  },
  "airoboros_de": {
    "hf_hub_url": "mayflowergmbh/airoboros-3.0_de"
  },
  "ultrachat_de": {
    "hf_hub_url": "mayflowergmbh/ultra-chat_de"
  },
  "dlr_web": {
    "hf_hub_url": "Attention1115/DLR-Web",
    "split": "full",
    "columns": {
      "prompt": "question",
      "response": "response"
    }
  },
  "dpo_en_demo": {
    "file_name": "dpo_en_demo.json",
    "ranking": True,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  },
  "dpo_zh_demo": {
    "file_name": "dpo_zh_demo.json",
    "ranking": True,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  },
  "dpo_mix_en": {
    "hf_hub_url": "llamafactory/DPO-En-Zh-20k",
    "subset": "en",
    "ranking": True,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  },
  "dpo_mix_zh": {
    "hf_hub_url": "llamafactory/DPO-En-Zh-20k",
    "subset": "zh",
    "ranking": True,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  },
  "ultrafeedback": {
    "hf_hub_url": "llamafactory/ultrafeedback_binarized",
    "ms_hub_url": "llamafactory/ultrafeedback_binarized",
    "ranking": True,
    "columns": {
      "prompt": "instruction",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  },
  "coig_p": {
    "hf_hub_url": "m-a-p/COIG-P",
    "ranking": True,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  },
  "rlhf_v": {
    "hf_hub_url": "llamafactory/RLHF-V",
    "ranking": True,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected",
      "images": "images"
    }
  },
  "vlfeedback": {
    "hf_hub_url": "Zhihui/VLFeedback",
    "ranking": True,
    "formatting": "sharegpt",
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected",
      "images": "images"
    }
  },
  "rlaif_v": {
    "hf_hub_url": "openbmb/RLAIF-V-Dataset",
    "ranking": True,
    "columns": {
      "prompt": "question",
      "chosen": "chosen",
      "rejected": "rejected",
      "images": "image"
    }
  },
  "orca_pairs": {
    "hf_hub_url": "Intel/orca_dpo_pairs",
    "ranking": True,
    "columns": {
      "prompt": "question",
      "chosen": "chosen",
      "rejected": "rejected",
      "system": "system"
    }
  },
  "nectar_rm": {
    "hf_hub_url": "AstraMindAI/RLAIF-Nectar",
    "ms_hub_url": "AI-ModelScope/RLAIF-Nectar",
    "ranking": True
  },
  "orca_dpo_de": {
    "hf_hub_url": "mayflowergmbh/intel_orca_dpo_pairs_de",
    "ranking": True
  },
  "kto_en_demo": {
    "file_name": "kto_en_demo.json",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "kto_tag": "label"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "kto_mix_en": {
    "hf_hub_url": "argilla/kto-mix-15k",
    "formatting": "sharegpt",
    "columns": {
      "messages": "completion",
      "kto_tag": "label"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "ultrafeedback_kto": {
    "hf_hub_url": "argilla/ultrafeedback-binarized-preferences-cleaned-kto",
    "ms_hub_url": "AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto",
    "columns": {
      "prompt": "prompt",
      "response": "completion",
      "kto_tag": "label"
    }
  },
  "wiki_demo": {
    "file_name": "wiki_demo.txt",
    "columns": {
      "prompt": "text"
    }
  },
  "c4_demo": {
    "file_name": "c4_demo.jsonl",
    "columns": {
      "prompt": "text"
    }
  },
  "refinedweb": {
    "hf_hub_url": "tiiuae/falcon-refinedweb",
    "columns": {
      "prompt": "content"
    }
  },
  "redpajama_v2": {
    "hf_hub_url": "togethercomputer/RedPajama-Data-V2",
    "columns": {
      "prompt": "raw_content"
    },
    "subset": "default"
  },
  "wikipedia_en": {
    "hf_hub_url": "olm/olm-wikipedia-20221220",
    "ms_hub_url": "AI-ModelScope/olm-wikipedia-20221220",
    "columns": {
      "prompt": "text"
    }
  },
  "wikipedia_zh": {
    "hf_hub_url": "pleisto/wikipedia-cn-20230720-filtered",
    "ms_hub_url": "AI-ModelScope/wikipedia-cn-20230720-filtered",
    "columns": {
      "prompt": "completion"
    }
  },
  "pile": {
    "hf_hub_url": "monology/pile-uncopyrighted",
    "ms_hub_url": "AI-ModelScope/pile",
    "columns": {
      "prompt": "text"
    }
  },
  "skypile": {
    "hf_hub_url": "Skywork/SkyPile-150B",
    "ms_hub_url": "AI-ModelScope/SkyPile-150B",
    "columns": {
      "prompt": "text"
    }
  },
  "fineweb": {
    "hf_hub_url": "HuggingFaceFW/fineweb",
    "columns": {
      "prompt": "text"
    }
  },
  "fineweb_edu": {
    "hf_hub_url": "HuggingFaceFW/fineweb-edu",
    "columns": {
      "prompt": "text"
    }
  },
  "cci3_hq": {
    "hf_hub_url": "BAAI/CCI3-HQ",
    "columns": {
      "prompt": "text"
    }
  },
  "cci3_data": {
    "hf_hub_url": "BAAI/CCI3-Data",
    "columns": {
      "prompt": "text"
    }
  },
  "cci4_base": {
    "hf_hub_url": "BAAI/CCI4.0-M2-Base-v1",
    "columns": {
      "prompt": "text"
    }
  },
  "cci4_cot": {
    "hf_hub_url": "BAAI/CCI4.0-M2-CoT-v1",
    "columns": {
      "prompt": "text"
    }
  },
  "cci4_extra": {
    "hf_hub_url": "BAAI/CCI4.0-M2-Extra-v1",
    "columns": {
      "prompt": "text"
    }
  },
  "the_stack": {
    "hf_hub_url": "bigcode/the-stack",
    "ms_hub_url": "AI-ModelScope/the-stack",
    "columns": {
      "prompt": "content"
    }
  },
  "starcoder_python": {
    "hf_hub_url": "bigcode/starcoderdata",
    "ms_hub_url": "AI-ModelScope/starcoderdata",
    "columns": {
      "prompt": "content"
    },
    "folder": "python"
  },
  "dpo_finetuning_train": {
    "file_name": "/kaggle/working/train-v1-edited.json",
    "formatting": "sharegpt",
    "ranking": True,
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected",
      "images": "images"
    }
  },
  "dpo_finetuning_test": {
    "file_name": "/kaggle/working/val-v1-edited.json",
    "formatting": "sharegpt",
    "ranking": True,
    "columns": {
      "messages": "conversations",
      "chosen": "chosen",
      "rejected": "rejected",
      "images": "images"
    }
  }
}

with open("/kaggle/working/LlamaFactory/data/dataset_info.json", "w", encoding="utf-8") as dest:
    json.dump(data_info, dest, ensure_ascii=False, default=str, indent=2)
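
Before training, it can help to confirm that the JSON files behind the two custom entries match the column mapping registered above. The sketch below assumes each record follows the ShareGPT preference layout (a `conversations` list, one `chosen` and one `rejected` message, and an `images` list with one path per `<image>` placeholder). `validate_dpo_record` and the sample record are hypothetical, not part of LlamaFactory.

```python
def validate_dpo_record(record: dict) -> list[str]:
    """Return a list of problems found in one ShareGPT-style DPO record."""
    problems = []
    conversations = record.get("conversations")
    if not isinstance(conversations, list) or not conversations:
        problems.append("'conversations' must be a non-empty list")
        conversations = []
    for key in ("chosen", "rejected"):
        message = record.get(key)
        if not isinstance(message, dict) or not message.get("value"):
            problems.append(f"'{key}' must be a message with a non-empty 'value'")
    # Every <image> placeholder in the prompt needs exactly one image path.
    placeholders = sum(
        turn.get("value", "").count("<image>")
        for turn in conversations
        if isinstance(turn, dict)
    )
    images = record.get("images") or []
    if placeholders != len(images):
        problems.append(
            f"{placeholders} <image> placeholder(s) but {len(images)} image path(s)"
        )
    return problems


# Hypothetical record; the path and texts are made up for illustration.
sample = {
    "conversations": [{"from": "human", "value": "<image>What is shown here?"}],
    "chosen": {"from": "gpt", "value": "A red bicycle leaning on a wall."},
    "rejected": {"from": "gpt", "value": "A car."},
    "images": ["/kaggle/working/imagesdpo/example.jpg"],
}
print(validate_dpo_record(sample))  # an empty list means no problems were found
```

Running this over every record in `train-v1-edited.json` and `val-v1-edited.json` surfaces mismatched placeholders before they turn into a preprocessing error.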

## 📝 Write the Training Configuration YAML

### What is a YAML config?
Instead of long Python scripts with dozens of arguments, LlamaFactory uses a **single YAML file** that defines everything about the training run. This makes experiments easy to track, share, and reproduce.

### 🏗️ Model Section

| Parameter | Value | Explanation |
|---|---|---|
| `model_name_or_path` | `Qwen/Qwen3-VL-4B-Instruct` | HuggingFace model ID; downloaded automatically on first use |
| `image_max_pixels` | `262144` | Pixel budget per image: 512×512 = 262144. A larger budget keeps more detail but uses more VRAM |
| `video_max_pixels` | `16384` | Pixel budget per video frame (128×128); set low since we don't use video |
| `trust_remote_code` | `true` | Allow executing custom model code shipped with the HuggingFace repo |
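
To make the pixel budget concrete, here is a minimal sketch of fitting an image under `image_max_pixels`. It assumes the preprocessor applies one aspect-preserving scale factor when an image is over budget; the exact rounding used by LlamaFactory and the Qwen processor may differ. `fit_within_pixel_budget` is a hypothetical helper.

```python
import math


def fit_within_pixel_budget(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Scale (width, height) down, keeping aspect ratio, so width * height <= max_pixels."""
    if width * height <= max_pixels:
        return width, height  # already within budget, leave untouched
    scale = math.sqrt(max_pixels / (width * height))
    return int(width * scale), int(height * scale)


print(fit_within_pixel_budget(1024, 1024, 262144))  # (512, 512)
print(fit_within_pixel_budget(400, 300, 262144))    # (400, 300)
```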

### 🎯 Method Section

| Parameter | Value | Explanation |
|---|---|---|
| `stage` | `dpo` | Training stage: Direct Preference Optimization |
| `do_train` | `true` | Enable training (vs. inference only) |
| `finetuning_type` | `lora` | Use LoRA adapters instead of full fine-tuning |
| `lora_rank` | `32` | LoRA rank: a higher rank adds more trainable parameters, which is more expressive but uses more VRAM |
| `lora_target` | `all` | Apply LoRA to all linear layers in the model |
| `pref_beta` | `0.1` | DPO β: scales the log-probability margin in the loss. Higher values keep the model closer to the reference model; lower values let it move further toward the chosen responses |
| `pref_loss` | `sigmoid` | DPO loss type: standard sigmoid DPO (as in the original DPO paper) |
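
The roles of `pref_beta` and the `sigmoid` loss are easier to see on a single preference pair. The sketch below computes the standard sigmoid DPO loss from summed sequence log-probabilities under the policy and the frozen reference model. The log-probability values are made up for illustration.

```python
import math


def dpo_sigmoid_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one pair, given summed sequence log-probabilities."""
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))


# At the start of training the policy equals the reference, so the margin is 0.
start = dpo_sigmoid_loss(-42.0, -40.0, -42.0, -40.0)
# Later, the policy has raised the chosen and lowered the rejected log-probability.
later = dpo_sigmoid_loss(-38.0, -45.0, -42.0, -40.0)
print(f"{start:.4f}")  # 0.6931, i.e. log(2)
print(later < start)   # True
```

This is why a DPO run typically starts with a loss near 0.69 and improves as the margin between chosen and rejected responses grows.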

### 📚 Dataset Section

| Parameter | Value | Explanation |
|---|---|---|
| `dataset` | `dpo_finetuning_train` | Name matching our registry entry for training |
| `eval_dataset` | `dpo_finetuning_test` | Name matching our registry entry for evaluation |
| `template` | `qwen3_vl_nothink` | Chat template; **no-thinking mode** means direct answers without `<think>` tokens |
| `cutoff_len` | `4092` | Maximum sequence length in tokens; longer sequences are truncated |
| `max_samples` | `1000` | Cap on the number of examples used from each dataset |
| `preprocessing_num_workers` | `16` | CPU workers for data preprocessing |
| `dataloader_num_workers` | `4` | CPU workers for feeding data to GPU during training |

### 🔄 Training Hyperparameters

| Parameter | Value | Explanation |
|---|---|---|
| `per_device_train_batch_size` | `2` | 2 samples per GPU per step |
| `gradient_accumulation_steps` | `8` | Accumulate gradients over 8 mini-batches. Effective batch size per GPU = 2×8 = **16** |
| `learning_rate` | `5.0e-6` | Very small LR; fine-tuning needs gentle updates to avoid catastrophic forgetting |
| `num_train_epochs` | `3.0` | Train for 3 full passes through the dataset |
| `lr_scheduler_type` | `cosine` | Cosine annealing: the LR gradually decays following a cosine curve |
| `warmup_ratio` | `0.1` | First 10% of steps: LR linearly warms up from 0 to avoid unstable early updates |
| `bf16` | `true` | BFloat16 precision: faster than FP32, more stable than FP16 for training |
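
The interaction between `warmup_ratio`, the cosine schedule, and the peak `learning_rate` can be sketched in a few lines. This is a minimal re-implementation of linear warmup followed by cosine decay, not the trainer's own code. The step count of 200 is a hypothetical value chosen for illustration.

```python
import math


def lr_at_step(step, total_steps, peak_lr=5.0e-6, warmup_ratio=0.1):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size per optimizer step (single GPU assumed here).
effective_batch_size = 2 * 8 * 1  # per-device batch x accumulation steps x GPUs

total_steps = 200  # hypothetical; the real value depends on dataset size
for step in (0, 10, 20, 110, 200):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

The rate climbs to the peak over the first 20 steps, sits at half the peak midway through the decay, and reaches zero on the final step.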

### 📊 Output & Evaluation

| Parameter | Value | Explanation |
|---|---|---|
| `output_dir` | `saves/qwen3-vl-4b/lora/dpo` | Where LoRA adapter weights and training artifacts are saved |
| `logging_steps` | `25` | Log training loss every 25 steps |
| `save_strategy` | `epoch` | Save a checkpoint after each epoch |
| `eval_strategy` | `epoch` | Run evaluation after each epoch |
| `plot_loss` | `true` | Generate a loss curve plot after training |
| `report_to` | `none` | Don't send metrics to W&B, TensorBoard, etc. |

In [None]:
yaml_content = """\
### model
model_name_or_path: Qwen/Qwen3-VL-4B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_rank: 32
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: dpo_finetuning_train
eval_dataset: dpo_finetuning_test
template: qwen3_vl_nothink
cutoff_len: 4092
max_samples: 1000
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/qwen3-vl-4b/lora/dpo
logging_strategy: steps
logging_steps: 25
save_strategy: epoch
plot_loss: true
overwrite_output_dir: true
report_to: none

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
#resume_from_checkpoint: null

### eval
per_device_eval_batch_size: 16
eval_strategy: epoch
"""

with open("/kaggle/working/LlamaFactory/examples/train_lora/dpo_finetuning.yaml", "w") as f:
    f.write(yaml_content)

In [None]:
import os

# Keep Weights & Biases from starting a run during training
os.environ["WANDB_DISABLED"] = "true"

## 🚀 Launch DPO Fine-Tuning

### What happens when you run this cell?
This is the main training command. LlamaFactory reads the YAML config and orchestrates the entire training process:

1. **Downloads the model**: Qwen3-VL-4B-Instruct from HuggingFace (~8GB)
2. **Loads the dataset**: reads `train-v1-edited.json` and `val-v1-edited.json`
3. **Preprocesses data**: tokenizes text, loads and resizes images, creates input tensors
4. **Initializes LoRA**: adds trainable adapter layers to the frozen base model
5. **Trains for 3 epochs**: runs the DPO loss on (question, chosen, rejected, image) quadruples
6. **Evaluates each epoch**: computes validation loss to track overfitting
7. **Saves checkpoints**: saves LoRA adapter weights after each epoch
8. **Plots loss curves**: generates PNG plots of the logged training and validation loss over time

### What is `DISABLE_VERSION_CHECK=1`?
Tells LlamaFactory to skip its startup check of installed dependency versions. That check can otherwise complain or stop the run when the pinned versions fall outside the range it expects.

### What is LoRA?
**Low-Rank Adaptation (LoRA)** freezes the original model weights and adds small trainable matrices (adapters) alongside them. This means:
- 🔵 Only ~1–5% of total parameters are trained
- 💾 Much less VRAM required than full fine-tuning
- ⚡ Faster training
- 🛡️ Less risk of catastrophic forgetting
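
The small trainable fraction follows directly from the shapes involved. For a frozen `d_out × d_in` weight, LoRA adds two matrices, `A` (`rank × d_in`) and `B` (`d_out × rank`). The sketch below counts them for a single layer; the 4096 layer size is a hypothetical value, not Qwen3-VL's actual dimension.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer of shape d_out x d_in."""
    # A is (rank x d_in), B is (d_out x rank); the frozen weight W is untouched.
    return rank * d_in + d_out * rank


d_in = d_out = 4096  # hypothetical layer size for illustration
added = lora_extra_params(d_in, d_out, rank=32)
frozen = d_in * d_out
print(added, frozen, f"{100 * added / frozen:.1f}%")  # 262144 16777216 1.6%
```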

### Expected console output (illustrative; exact values will vary):
```
[INFO] Loading model: Qwen/Qwen3-VL-4B-Instruct
[INFO] trainable params: 83,886,080 || all params: 4,184,166,400 || trainable%: 2.004
{'loss': 0.6823, 'learning_rate': 2.5e-06, 'epoch': 1.0}
{'eval_loss': 0.6541, 'epoch': 1.0}
...
[INFO] Training completed. Output saved to saves/qwen3-vl-4b/lora/dpo
```

### Output files saved:
```
saves/qwen3-vl-4b/lora/dpo/
├── adapter_config.json        ← LoRA architecture config
├── adapter_model.safetensors  ← Trained LoRA weights  ✅ most important
├── training_args.bin          ← Saved training arguments
├── trainer_log.jsonl          ← Step-by-step training metrics
└── training_loss.png          ← Loss curve visualization
```
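
If the loss plot is not enough, the step-by-step metrics in `trainer_log.jsonl` can be read directly. The sketch below assumes each line is a JSON object with a `current_steps` field and, on training log lines, a `loss` field. Those field names are an assumption and should be checked against the actual file. The sample lines are made up.

```python
import json


def read_loss_curve(lines):
    """Collect (step, loss) pairs from JSONL log lines, skipping entries without a loss."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if "loss" in entry:
            points.append((entry.get("current_steps"), entry["loss"]))
    return points


# Made-up lines standing in for a real log file.
sample_lines = [
    '{"current_steps": 25, "loss": 0.69}',
    '{"current_steps": 50, "loss": 0.66}',
    '{"current_steps": 63, "eval_loss": 0.65}',
]
print(read_loss_curve(sample_lines))  # [(25, 0.69), (50, 0.66)]
```

For the real file, pass an open file handle instead of the sample list, for example `read_loss_curve(open(path, encoding="utf-8"))`.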

In [None]:
!cd LlamaFactory && export DISABLE_VERSION_CHECK=1 && llamafactory-cli train /kaggle/working/LlamaFactory/examples/train_lora/dpo_finetuning.yaml

## 🚀 Inference

In [None]:
# Set this to the model to load (a local checkpoint directory or a Hub model ID)
MODEL_ID = ""

In [None]:
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# default: Load the model on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    MODEL_ID, dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained(MODEL_ID)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "put-image-path",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
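
The trimming step in the cell above exists because `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch with plain lists shows the slicing; the token IDs here are made up.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


prompt_batch = [[101, 7, 8, 9]]                  # made-up prompt token IDs
generated_batch = [[101, 7, 8, 9, 42, 43, 102]]  # prompt + continuation
print(trim_prompt_tokens(prompt_batch, generated_batch))  # [[42, 43, 102]]
```

Without this step, the decoded text would repeat the chat-formatted prompt before the model's answer.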


## 🎉 Summary

You have successfully completed a full **DPO fine-tuning pipeline** for a Vision-Language Model!

| Step | Task | Output |
|---|---|---|
| 1 | Load HuggingFace dataset | Raw image+QA DataFrame |
| 2 | Process & save images | JPEG files in `imagesdpo/` |
| 3 | Split data | 80% train / 20% val |
| 4 | Format for DPO | ShareGPT-style JSON records |
| 5 | Save JSON files | `train-v1.json`, `val-v1.json` |
| 6 | Download extra data | PDF images + pre-built JSONs |
| 7 | Unzip & collect paths | Extracted image paths |
| 8 | Fix paths | Colab → Kaggle path remapping |
| 9 | Install dependencies | PyTorch, Transformers stack |
| 10 | Install LlamaFactory | Training framework |
| 11 | Register datasets | Custom entries in `dataset_info.json` |
| 12 | Write YAML config | Training hyperparameters |
| 13 | Run DPO training | Trained LoRA adapter weights |