# Introduction

This Kaggle Notebook demonstrates how we fine-tune **LLaVA 1.5 (Large Language and Vision Assistant)** with **7B parameters** on curated **agriculture-specific datasets**.  
The goal is to build a domain-adapted model that can answer farmer queries, recommend crops, detect diseases and understand agricultural context from both **text** and **images**.  

## Datasets Used  

The notebook uses multiple datasets to cover **production statistics, crop recommendation, crop disease diagnosis and conversational farmer queries**.  

The [Finetunning-With-Unsloth](https://github.com/navik11/Finetunning-With-Unsloth.git) repo contains some of the datasets, so we clone it to make them available in the notebook.

In [1]:
!git clone https://github.com/navik11/Finetunning-With-Unsloth.git
%cd Finetunning-With-Unsloth

Cloning into 'Finetunning-With-Unsloth'...
remote: Enumerating objects: 21, done.
remote: Counting objects: 100% (21/21), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 21 (delta 3), reused 16 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (21/21), 6.33 MiB | 15.04 MiB/s, done.
Resolving deltas: 100% (3/3), done.
/kaggle/working/Finetunning-With-Unsloth


## Environment Setup  

Before starting fine-tuning, we need to pull the latest code and install dependencies.

[Unsloth](https://github.com/unslothai/unsloth) is a framework for efficient fine-tuning of large models (LLMs and VLMs).

In [2]:
!git pull
!pip install -q unsloth
!pip install -q triton==3.2.0

Already up to date.

# Crop Disease Diagnosis Multimodal (CDDM) Dataset  

The [CDDM dataset](https://github.com/UnicomAI/UnicomBenchmark/tree/main/CDDMBench) is a **crop disease domain multimodal dataset**, a pioneering resource designed to advance agricultural research through **multimodal learning techniques**.  

It provides paired **image + text data** of plants and their corresponding disease labels, enabling training of models that can understand both **visual symptoms** and **textual descriptions**.  

We will use the **CDDM Dataset** to fine-tune our model so that it can:  
- Detect the type of plant from images  
- Identify crop diseases from leaf symptoms  
- Enhance agricultural support systems with **vision-language capabilities**  


## Processing the CDDM Dataset  

The **CDDM dataset** is organized into folders named `Plant,Disease` (e.g., `Apple,Alternaria Blotch`, `Tomato,Healthy`). Each folder contains multiple leaf images.  

We convert it into a **chat-style multimodal format** for LLaVA fine-tuning:  
- **User:** “What is the content of this picture?” + image  
- **Assistant:**  
  - `"This image shows a healthy <plant> leaf."` (if Healthy)  
  - `"This image shows a <plant> leaf affected by <disease>."`  
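
The mapping from folder name to assistant answer can be sketched on its own, without touching the image files. This is a minimal illustration; the helper name `describe_folder` is hypothetical and not part of the notebook.

```python
def describe_folder(folder_name: str) -> str:
    """Build the assistant answer for a folder named 'Plant,Disease'."""
    # Split only on the first comma so a disease name containing commas stays whole.
    # A folder name without any comma raises ValueError on unpacking.
    plant, disease = (part.strip() for part in folder_name.split(",", 1))
    if disease.lower() == "healthy":
        return f"This image shows a healthy {plant} leaf."
    return f"This image shows a {plant} leaf affected by {disease}."


print(describe_folder("Tomato,Healthy"))
print(describe_folder("Apple,Alternaria Blotch"))
```

The article is always "a", so plant names that start with a vowel produce answers such as "a Apple leaf"; this matches the answer template used in this notebook.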

### Example
```json
{
  "messages": [
    {"role": "user", "content": [
      {"type": "text", "text": "What is the content of this picture?"},
      {"type": "image", "image": "/.../Apple,Alternaria Blotch/plant_69422.jpg"}
    ]},
    {"role": "assistant", "content": [
      {"type": "text", "text": "This image shows a Apple leaf affected by Alternaria Blotch."}
    ]}
  ]
}
```


In [3]:
import os
import json

base_path = "/kaggle/input/cddm-dataset/dataset/dataset/images"
converted_dataset = []

iters = 0
for folder_name in os.listdir(base_path):
    folder_path = os.path.join(base_path, folder_name)
    if not os.path.isdir(folder_path):
        continue

    try:
        plant_name, disease_name = folder_name.split(",", 1)
    except ValueError:
        print(f"Skipping folder with unexpected name format: {folder_name}")
        continue

    for img_file in os.listdir(folder_path):
        if not img_file.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        
        img_path = os.path.join(folder_path, img_file)

        # Create the answer text
        if disease_name.strip().lower() == "healthy":
            answer_text = f"This image shows a healthy {plant_name.strip()} leaf."
        else:
            answer_text = f"This image shows a {plant_name.strip()} leaf affected by {disease_name.strip()}."

        entry = {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "What is the content of this picture?"},
                        {"type": "image", "image": img_path}
                    ]
                },
                {
                    "role": "assistant",
                    "content": [
                        {"type": "text", "text": answer_text}
                    ]
                }
            ]
        }

        iters += 1
        if iters % 5000 == 0:
            print(f"Processed {iters} images...")

        converted_dataset.append(entry)

print(f"Total entries created: {len(converted_dataset)}")
print(json.dumps(converted_dataset[0], indent=2))

Processed 5000 images...
Processed 10000 images...
Processed 15000 images...
Processed 20000 images...
Processed 25000 images...
Processed 30000 images...
Processed 35000 images...
Processed 40000 images...
Processed 45000 images...
Processed 50000 images...
Processed 55000 images...
Processed 60000 images...
Processed 65000 images...
Processed 70000 images...
Processed 75000 images...
Processed 80000 images...
Processed 85000 images...
Processed 90000 images...
Processed 95000 images...
Processed 100000 images...
Processed 105000 images...
Processed 110000 images...
Processed 115000 images...
Processed 120000 images...
Processed 125000 images...
Processed 130000 images...
Processed 135000 images...
Total entries created: 137000
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the content of this picture?"
        },
        {
          "type": "image",
          "image": "/kaggle/input/cddm-dataset/dataset/

## Loading LLaVA 1.5 7B with Unsloth  

We use **Unsloth’s FastVisionModel** to load the pretrained **LLaVA 1.5 7B** model efficiently.  

* `load_in_4bit=True`: loads the model weights with 4-bit quantization, which sharply reduces GPU memory use during fine-tuning.
* `use_gradient_checkpointing="unsloth"`: enables gradient checkpointing, saving memory at the cost of a slight compute overhead.

This setup makes it possible to fine-tune LLaVA 1.5 7B on Kaggle/Colab GPUs without running out of memory.
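
A rough back-of-the-envelope calculation shows why 4-bit loading matters on a GPU with about 15 GB of memory. The sketch below counts weight storage only and assumes a round 7 billion parameters; it ignores quantization metadata, layers kept in higher precision, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 7e9  # assumption: a round 7B parameters
for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: about {weight_memory_gb(num_params, bits):.1f} GB")
```

Under these assumptions the 16-bit weights alone come to roughly 14 GB, while the 4-bit weights need about 3.5 GB, which leaves room for activations and the LoRA adapters.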

In [4]:
from unsloth import FastVisionModel
import torch

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/llava-1.5-7b-hf",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-08-18 13:10:31.328491: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1755522631.704816      19 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1755522631.803080      19 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.6: Fast Llava patching. Transformers: 4.55.2.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.



## Applying LoRA with Unsloth  

We prepare the model for fine-tuning using **PEFT (Parameter-Efficient Fine-Tuning)** with LoRA.  

`FastVisionModel.get_peft_model(...)` applies LoRA adapters to selected components:  
- `finetune_vision_layers=True` → fine-tune vision encoder  
- `finetune_language_layers=True` → fine-tune language model layers  
- `finetune_attention_modules=True` and `finetune_mlp_modules=True` → fine-tune key transformer modules  

LoRA hyperparameters:  
- `r=16` and `lora_alpha=16` → rank and scaling factor  
- `lora_dropout=0` → no dropout for stability  
- `bias="none"` → LoRA applied without bias terms  
- `random_state=3407` → ensures reproducibility  
- `use_rslora=False` and `loftq_config=None` → standard LoRA setup  

This makes fine-tuning **efficient and GPU-friendly**, while still adapting both vision and language parts of LLaVA.  
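
To see why LoRA is cheap, it helps to count parameters for a single linear layer. In standard LoRA the frozen weight of shape `d_out x d_in` gets two small trainable matrices of shapes `r x d_in` and `d_out x r`, and their product is scaled by `lora_alpha / r`. The sketch below uses a hypothetical layer width of 4096 purely for illustration; it is not read from the model.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one rank-r adapter (matrices A and B)."""
    return r * d_in + d_out * r


def lora_scale(lora_alpha: float, r: int) -> float:
    """Scaling applied to the adapter output in standard (non-rsLoRA) LoRA."""
    return lora_alpha / r


d = 4096  # hypothetical layer width, for illustration only
full = d * d
adapter = lora_param_count(d, d, r=16)
print(f"full layer: {full:,} params, adapter: {adapter:,} params "
      f"({adapter / full:.2%} of the layer)")
print(f"scale with r=16, lora_alpha=16: {lora_scale(16, 16)}")
```

With `r=16` and `lora_alpha=16` the scale is 1.0, so the adapter output is added to the frozen layer's output unscaled.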

In [5]:
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,

    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth: Making `base_model.model.model.vision_tower.vision_model` require gradients


## Fine-Tuning Setup with Unsloth  

We use **Unsloth + TRL (Hugging Face’s Transformer Reinforcement Learning library)** to fine-tune LLaVA 1.5 on the processed dataset.  

- `FastVisionModel.for_training(model)` → Puts the model in training mode with Unsloth optimizations.  
- `UnslothVisionDataCollator(model, tokenizer)` → Prepares multimodal batches (text + images) for training.  
- `SFTTrainer` → Runs **Supervised Fine-Tuning (SFT)** with LoRA applied.  

### Training Config Highlights  
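- `per_device_train_batch_size=2` → Small batch size to fit in GPU memory  
- `gradient_accumulation_steps=4` → Nominal effective batch size = 2 × 4 = 8 (the training log for this run reports a per-device batch of 4 and a total batch size of 16)  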
- `warmup_steps=5` → Gradually increase LR at start  
- `max_steps=50` → Runs only 50 optimization steps
- `learning_rate=2e-4` → Standard LR for LoRA fine-tuning  
- `optim="adamw_8bit"` → Memory-efficient optimizer with bitsandbytes  
- `weight_decay=0.01` → Regularization to prevent overfitting  
- `max_length=1024` → Maximum sequence length for text tokens  
- `remove_unused_columns=False` → Keeps dataset in multimodal format  
- `dataset_kwargs={"skip_prepare_dataset": True}` → Avoids reformatting since dataset is already prepared  
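
The batch settings and `max_steps` together determine how much of the dataset the run actually sees. The sketch below works through that arithmetic; the function name is hypothetical, and the dataset size of 137,000 is the entry count printed by the conversion cell.

```python
def samples_seen(per_device_batch: int, grad_accum: int, max_steps: int,
                 num_devices: int = 1) -> tuple:
    """Return (effective batch size, total samples consumed by the run)."""
    effective = per_device_batch * grad_accum * num_devices
    return effective, effective * max_steps


dataset_size = 137_000  # entries created by the CDDM conversion cell
effective, seen = samples_seen(per_device_batch=2, grad_accum=4, max_steps=50)
print(f"effective batch size: {effective}")
print(f"samples seen: {seen} ({seen / dataset_size:.2%} of the dataset)")
```

With the configured values this is 400 samples; with a per-device batch of 4 it would be 800. Either way the run covers well under 1% of the images, so it serves as a short demonstration rather than a full epoch.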

The trainer will fine-tune both **vision and language layers** of LLaVA efficiently using LoRA and Unsloth’s optimizations.  


In [6]:
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer),
    train_dataset = converted_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 50,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 1024,
    ),
)

## Training the Model  

We start the supervised fine-tuning process with:  

In [7]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 137,000 | Num Epochs = 1 | Total steps = 50
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16
 "-____-"     Trainable parameters = 47,054,848 of 7,110,481,920 (0.66% trained)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,5.2221
2,5.3141
3,5.2923
4,5.0785
5,4.9638
6,4.5573
7,3.8943
8,3.147
9,2.4477
10,1.9374


## Saving the Fine-Tuned Model  

After training, we save both the model weights and tokenizer for future inference:  

Output directory: /kaggle/working/Finetunning-With-Unsloth/llava-1.5-7b-finetuned-for-cddm/


In [8]:
print("Saving model...")
model.save_pretrained("llava-1.5-7b-finetuned-for-cddm")
tokenizer.save_pretrained("llava-1.5-7b-finetuned-for-cddm")

Saving model...


['llava-1.5-7b-finetuned-for-cddm/processor_config.json']

# Other Datasets

## Crop Recommendation Dataset  

Source: [Kaggle - Crop Recommendation Dataset](https://www.kaggle.com/datasets/atharvaingle/crop-recommendation-dataset)  

This dataset is designed to recommend suitable crops based on **soil nutrients** and **climatic conditions**.  
It was built by augmenting datasets of **rainfall, climate, and fertilizer data** available for India.  

### Data Fields  
- **N** → Nitrogen content ratio in soil  
- **P** → Phosphorus content ratio in soil  
- **K** → Potassium content ratio in soil  
- **temperature** → Temperature in °C  
- **humidity** → Relative humidity (%)  
- **ph** → Soil pH value  
- **rainfall** → Rainfall in mm  

This dataset enables training models for **crop recommendation systems**, helping farmers decide the best crop to cultivate under given environmental and soil conditions.  


## Crop Recommendation Dataset Conversion

We load the **Crop Recommendation CSV** using `pandas`, then transform each row into a **prompt-response format** suitable for fine-tuning.

- **Prompt** includes soil nutrients (N, P, K), temperature, humidity, pH, and rainfall.  
- **Response** specifies the recommended crop.  


In [9]:
import pandas as pd

df = pd.read_csv('/kaggle/working/Finetunning-With-Unsloth/Crop_recommendation.csv')
df.head()

converted_cr_dataset = []
for _, row in df.iterrows():
    prompt = (
        f"N: {row['N']}, P: {row['P']}, K: {row['K']}, temperature: {row['temperature']:.2f}, "
        f"humidity: {row['humidity']:.2f}, pH: {row['ph']:.2f}, rainfall: {row['rainfall']:.2f}"
    )
    response = f"Recommended crop: {row['label']}"
    converted_cr_dataset.append({"prompt": prompt, "response": response})

print(f"Total entries created: {len(converted_cr_dataset)}")
print(json.dumps(converted_cr_dataset[0], indent=2))

Total entries created: 2200
{
  "prompt": "N: 90, P: 42, K: 43, temperature: 20.88, humidity: 82.00, pH: 6.50, rainfall: 202.94",
  "response": "Recommended crop: rice"
}


## Farmer Call Query Dataset

Source: [Kaggle - Farmer Call Query Dataset](https://www.kaggle.com/datasets/daskoushik/farmers-call-query-data-qa)

This dataset was generated from data published on data.gov.in, the Government of India's open data platform. It comes from the Kisan Call Centre, where farmers phone in with queries and experts reply with answers.

The dataset has two columns:
- `questions` → queries asked by farmers
- `answers` → replies from experts

## Farmer Call Query Dataset Conversion

We process the **farmer_call_query_dataset.csv** file and prepare it for fine-tuning.

### Steps:
1. Load the dataset using **pandas**.  
2. Drop every row in which any column contains the word "detail" (case-insensitive).  
3. Convert each row into a **prompt-response format**:
   - **Prompt** → Farmer's question  
   - **Response** → Assistant's answer  
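
The code cell for this dataset also drops every row in which any column mentions "detail" (case-insensitive) before converting the rest. A plain-Python sketch of that filter is shown here; the helper name and the sample rows are made up for illustration and do not come from the dataset.

```python
def drop_rows_mentioning(rows, keyword):
    """Keep only rows where no field contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [
        row for row in rows
        if not any(needle in str(value).lower() for value in row.values())
    ]


# Hypothetical rows, for illustration only.
rows = [
    {"questions": "how to control aphids in mustard", "answers": "spray in the evening"},
    {"questions": "weather information", "answers": "Details given to the farmer"},
]
kept = drop_rows_mentioning(rows, "detail")
print(len(kept), "row(s) kept")
```

Because the match is a substring match, rows containing words such as "details" or "detailed" are removed as well.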

In [10]:
df = pd.read_csv("/kaggle/working/Finetunning-With-Unsloth/farmer_call_query_dataset.csv")
mask = df.apply(lambda row: row.astype(str).str.contains("detail", case=False, na=False).any(), axis=1)
df = df[~mask]

converted_fq_dataset = []

for _, row in df.iterrows():
    converted_fq_dataset.append({
        "prompt": str(row["questions"]),
        "response": str(row["answers"])
    })

print(f"Total entries created: {len(converted_fq_dataset)}")
print(json.dumps(converted_fq_dataset[0], indent=2))

Total entries created: 159576
{
  "prompt": "asking about the control measure for aphid infestation in mustard crops",
  "response": "suggested him to spray rogor@2ml/lit.at evening time."
}


## Agricultural Yield Dataset

A structured dataset of agricultural production across Indian states and districts.  

**Columns:**  
- `State_Name` – Name of the state  
- `District_Name` – Name of the district  
- `Crop_Year` – Year of production  
- `Season` – Agricultural season (e.g., Kharif, Rabi, etc.)  
- `Crop` – Type of crop  
- `Area` – Cultivation area (in hectares)  
- `Production` – Crop production (in tonnes)  

## Agricultural Yield Dataset Conversion

We process the **AgrcultureDataset.csv** file and prepare it for fine-tuning.
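
Each usable row becomes a question about yield, where yield is production divided by area. The sketch below isolates that calculation with made-up numbers; the function name is hypothetical, and rows with a non-positive area return `None` to mirror the idea of skipping them.

```python
from typing import Optional


def yield_per_hectare(production: float, area: float) -> Optional[float]:
    """Yield = production / area; None when the area is not positive."""
    if area <= 0:
        return None
    return production / area


# Illustrative values, not taken from the dataset.
print(f"{yield_per_hectare(300.0, 120.0):.2f}")
print(yield_per_hectare(300.0, 0.0))
```

Guarding against a zero area avoids a division error and keeps meaningless yields out of the training text.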

In [11]:
import csv
input_csv = "/kaggle/working/Finetunning-With-Unsloth/AgrcultureDataset.csv"

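def safe_float(x):
    """Parse a CSV field as a float; return None if it is missing or not numeric."""
    try:
        return float(x.strip())
    except (ValueError, AttributeError):
        return None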
        
converted_ay_dataset = []
with open(input_csv, "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        state = row["State_Name"].strip()
        district = row["District_Name"].strip()
        year = row["Crop_Year"].strip()
        season = row["Season"].strip()
        crop = row["Crop"].strip()
        area = safe_float(row["Area"])
        production = safe_float(row["Production"])

        if area is None or production is None or area <= 0:
            continue

        # area > 0 is guaranteed by the check above
        yield_value = production / area

        prompt = f"What was the yield of {crop} in {district} district of {state} during {season} season of {year}?"
        response = f"The yield of {crop} in {district} ({state}) during {season} {year} was approximately {yield_value:.2f} tons per hectare."

        converted_ay_dataset.append({
            "prompt": prompt,
            "response": response
        })
        
print(f"Total entries created: {len(converted_ay_dataset)}")
print(json.dumps(converted_ay_dataset[0], indent=2))

Total entries created: 242364
{
  "prompt": "What was the yield of Arecanut in NICOBARS district of Andaman and Nicobar Islands during Kharif season of 2000?",
  "response": "The yield of Arecanut in NICOBARS (Andaman and Nicobar Islands) during Kharif 2000 was approximately 1.59 tons per hectare."
}


In [12]:
# merging all the datasets
merged_dataset = converted_cr_dataset + converted_fq_dataset + converted_ay_dataset
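
The three converted sets differ greatly in size, so it is worth checking how the merged dataset is composed. The sketch below uses the entry counts printed by the conversion cells above; the dictionary keys are labels chosen for this example.

```python
# Entry counts as printed by the three conversion cells.
sizes = {
    "crop_recommendation": 2_200,
    "farmer_queries": 159_576,
    "agricultural_yield": 242_364,
}

total = sum(sizes.values())
print(f"total: {total}")
for name, count in sizes.items():
    print(f"{name}: {count} ({count / total:.2%})")
```

The crop recommendation examples make up well under 1% of the merged data, so a short run will see very few of them unless the sets are rebalanced.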

# Model Loading with Unsloth for Other Datasets

We load **LLaVA 1.5 (7B)** using Unsloth's `FastModel`. The model is initialized with 4-bit quantization for efficiency, automatic dtype selection, and gradient checkpointing.  

**Key Points**  
- 4-bit quantization → saves GPU memory  
- Auto dtype → picks best precision automatically  
- Gradient checkpointing → reduces memory in training  

In [13]:
from unsloth import FastModel
import torch

model, tokenizer = FastModel.from_pretrained(
    "unsloth/llava-1.5-7b-hf",
    load_in_4bit = True,
    dtype = None, 
    use_gradient_checkpointing = "unsloth",
)

==((====))==  Unsloth 2025.8.6: Fast Llava patching. Transformers: 4.55.2.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Dataset Formatting for Fine-Tuning  

We format the dataset into **chat-style prompts** compatible with LLaVA training. The function applies the tokenizer's chat template to each *user → assistant* pair, strips a leading `<bos>` token if the template emits one, and stores the result in a `text` field used for training.  
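
To make the formatting step concrete without loading the tokenizer, the sketch below uses a stand-in template. The function `render_chat`, the `USER: ... ASSISTANT: ...` layout, and the default `<bos>` marker are assumptions for illustration; the real layout comes from the chat template shipped with the tokenizer and may differ.

```python
def render_chat(prompt: str, response: str, bos: str = "<bos>") -> str:
    """Stand-in for a chat template: one user turn followed by one assistant turn."""
    return f"{bos}USER: {prompt} ASSISTANT: {response}"


raw = render_chat("N: 90, P: 42, K: 43", "Recommended crop: rice")
text = raw.removeprefix("<bos>")  # requires Python 3.9+
print(text)

# removeprefix leaves the string unchanged when the prefix does not match.
other = render_chat("hello", "hi", bos="<s>").removeprefix("<bos>")
print(other)
```

Since `removeprefix` is a no-op when the string starts with a different marker, it is worth printing one formatted example to confirm what the tokenizer's template actually emits.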

In [14]:
def formatting_prompts_func(examples):
    prompts = examples["prompt"]
    responses = examples["response"]
    texts = [
        tokenizer.apply_chat_template(
            [{"role": "user", "content": p}, {"role": "assistant", "content": r}],
            tokenize=False,
            add_generation_prompt=False
        ).removeprefix("<bos>")
        for p, r in zip(prompts, responses)
    ]
    return { "text": texts }

from datasets import Dataset

dataset = Dataset.from_list(merged_dataset)
dataset = dataset.map(formatting_prompts_func, batched=True)

Map:   0%|          | 0/404140 [00:00<?, ? examples/s]

## LoRA Configuration  

We enable **parameter-efficient fine-tuning (PEFT)** with LoRA applied on the language, attention, and MLP modules, while keeping the vision encoder frozen.  

**Key Settings**  
- Vision layers: **Frozen**  
- Language, Attention, MLP: **Fine-tuned**  
- LoRA rank = 16, α = 16, dropout = 0  
- Random seed fixed for reproducibility  
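
The effect of freezing the vision layers shows up in the trainable-parameter counts that the two training logs in this notebook report. The sketch below recomputes the percentages and the difference from those logged figures; the dictionary labels are chosen for this example.

```python
# (trainable, total) parameter counts as reported in the two training logs.
runs = {
    "vision + language adapters": (47_054_848, 7_110_481_920),
    "language-only adapters": (39_976_960, 7_103_404_032),
}

for name, (trainable, total) in runs.items():
    print(f"{name}: {trainable:,} trainable ({trainable / total:.2%})")

vision_adapter_params = runs["vision + language adapters"][0] - runs["language-only adapters"][0]
print(f"difference between the two runs: {vision_adapter_params:,}")
```

The two runs differ by 7,077,888 trainable parameters, which is consistent with that being the size of the adapters attached to the vision encoder.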

In [15]:
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,

    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
)

Unsloth: Making `model.base_model.model.model.language_model` require gradients


## Fine-tuning with SFTTrainer  

We use **Hugging Face TRL’s `SFTTrainer`** to fine-tune the model on our dataset. The dataset is pre-formatted into chat-style text using the `"text"` field.  

**Training Configuration**  
- Batch size per device: **2**  
- Gradient accumulation: **4** → nominal effective batch size = 8 (the training log for this run reports a total batch size of 16)  
- Warmup steps: **5**  
- Max training steps: **50**  
- Learning rate: **2e-4**  
- Optimizer: **AdamW (8-bit)**  
- Weight decay: **0.01**  
- LR Scheduler: **Linear**  
- Logging every step  
- Output directory: `outputs_text`  
- Random seed: **3407** (reproducibility)  
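
The warmup and linear scheduler settings together describe a learning rate that ramps up and then decays. The sketch below reproduces the general shape of a linear schedule with warmup, using `max_steps=50` as set in the code cell; it is an illustration of the idea, not the trainer's own implementation.

```python
def linear_lr(step: int, base_lr: float = 2e-4,
              warmup_steps: int = 5, max_steps: int = 50) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at max_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, max_steps - step)
    return base_lr * remaining / (max_steps - warmup_steps)


for step in (0, 1, 5, 25, 50):
    print(f"step {step:2d}: lr = {linear_lr(step):.2e}")
```

The rate peaks at `2e-4` when warmup ends at step 5 and reaches zero at step 50, so most of the short run happens at a decaying learning rate.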

In [16]:
from trl import SFTTrainer, SFTConfig

trainer_text = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 50,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs_text",
        report_to = "none",
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/404140 [00:00<?, ? examples/s]

## Training the Model  

Now we start the supervised fine-tuning process:

In [17]:
trainer_text.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 404,140 | Num Epochs = 1 | Total steps = 50
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16
 "-____-"     Trainable parameters = 39,976,960 of 7,103,404,032 (0.56% trained)


Step,Training Loss
1,5.5396
2,5.5396
3,5.348
4,4.8362
5,3.9721
6,2.8074
7,2.3944
8,2.0031
9,1.7338
10,1.51


TrainOutput(global_step=50, training_loss=1.1610584645473865, metrics={'train_runtime': 125.7182, 'train_samples_per_second': 6.363, 'train_steps_per_second': 0.398, 'total_flos': 334630993920000.0, 'train_loss': 1.1610584645473865})

## Saving the Fine-Tuned Model  

After training, we save both the model weights and tokenizer for future inference:  

Output directory: /kaggle/working/Finetunning-With-Unsloth/llava-1.5-7b-finetuned-for-cr_fq_ay/


In [18]:
print("Saving model...")
model.save_pretrained("llava-1.5-7b-finetuned-for-cr_fq_ay")
tokenizer.save_pretrained("llava-1.5-7b-finetuned-for-cr_fq_ay")

Saving model...


['llava-1.5-7b-finetuned-for-cr_fq_ay/processor_config.json']