To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your local device, follow [our guide](https://docs.unsloth.ai/get-started/install-and-update). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News


Introducing FP8 precision training for faster RL inference. [Read Blog](https://docs.unsloth.ai/new/fp8-reinforcement-learning).

Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).

[gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning) is now supported with the fastest inference & lowest VRAM. Try our [new notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) which creates kernels!

Introducing [Vision](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) and [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) for RL! Train Qwen, Gemma etc. VLMs with GSPO - even faster with less VRAM.

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [1]:
import kagglehub
path = kagglehub.dataset_download("mamun1113/doctors-handwritten-prescription-bd-dataset")

Using Colab cache for faster access to the 'doctors-handwritten-prescription-bd-dataset' dataset.


In [2]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9]{1,}\.[0-9]{1,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.33.post1" if v=="2.9" else "0.0.32.post2" if v=="2.8" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2
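
The Colab branch of the install cell pins `xformers` to match the installed `torch` release. The same selection logic can be sketched as a small standalone function. The helper name `pick_xformers` is hypothetical and only mirrors the one-liner in the cell above.

```python
import re

def pick_xformers(torch_version: str) -> str:
    """Return the pinned xformers requirement for a torch version string."""
    # Keep only "major.minor", e.g. "2.9" from "2.9.0+cu126".
    match = re.match(r"[0-9]+\.[0-9]+", torch_version)
    if match is None:
        raise ValueError(f"Unrecognised torch version: {torch_version!r}")
    # Pins copied from the install cell; anything else falls back to the oldest pin.
    pins = {"2.9": "0.0.33.post1", "2.8": "0.0.32.post2"}
    return "xformers==" + pins.get(match.group(0), "0.0.29.post3")

print(pick_xformers("2.9.0+cu126"))  # prints xformers==0.0.33.post1
```

The result for `2.9.0+cu126` matches the `Xformers = 0.0.33.post1` reported in the Unsloth banner printed when the model loads.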

### Unsloth

In [3]:
from unsloth import FastVisionModel # FastLanguageModel for LLMs
import torch
from transformers import AutoModelForCausalLM ,AutoProcessor
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Qwen3-VL-8B-Instruct-bnb-4bit", # Qwen 3 vision support
    "unsloth/Qwen3-VL-8B-Thinking-bnb-4bit",
    "unsloth/Qwen3-VL-32B-Instruct-bnb-4bit",
    "unsloth/Qwen3-VL-32B-Thinking-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model_path = "unsloth/PaddleOCR-VL"
model, tokenizer = FastVisionModel.from_pretrained(
    model_path,
    max_seq_length = 1024, # Choose any for long context!
    load_in_4bit = False,     # 4bit uses much less memory
    load_in_8bit = False,    # A bit more accurate, uses 2x memory
    full_finetuning=True, # We support full finetuning now!
    auto_model=AutoModelForCausalLM,
    trust_remote_code = True,
    unsloth_force_compile = True,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Are you certain you want to do remote code execution?
==((====))==  Unsloth 2025.12.8: Fast Paddleocr_Vl patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Float16 full finetuning uses more memory since we upcast weights to float32.


We now load the processor

In [4]:
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

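We now add LoRA adapters for parameter-efficient finetuning, which normally lets us train only about 1% of all parameters. Note that this notebook loads the model with `full_finetuning=True`, so `get_peft_model` has no effect here (see the message printed below the cell) and all parameters are trained.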

**[NEW]** We also support finetuning ONLY the vision part of the model, or ONLY the language part. Or you can select both! You can also select to finetune the attention or the MLP layers!

In [5]:
model = FastVisionModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,
    target_modules = [
      "q_proj", "k_proj", "v_proj", "o_proj",


    ]
)

Unsloth: Full finetuning is enabled, so .get_peft_model has no effect
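
To make the "small fraction of parameters" idea concrete, here is a minimal sketch of how many extra weights a rank-`r` LoRA adapter adds to one linear layer, and how `lora_alpha` and `use_rslora` set the scaling factor. The layer size of 1024 is a hypothetical value chosen for illustration, not PaddleOCR-VL's real dimension, and the helper names are made up for this example.

```python
import math

def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by LoRA: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Standard LoRA scales by alpha / r; rank-stabilised LoRA by alpha / sqrt(r)."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

d = 1024  # hypothetical hidden size, for illustration only
extra = lora_extra_params(d, d, r=16)
fraction = extra / (d * d)

print(f"Extra parameters per layer: {extra}")           # 32768
print(f"Fraction of the frozen weight: {fraction:.2%}")  # 3.12%
print(f"Scaling with r=16, alpha=16: {lora_scaling(16, 16)}")  # 1.0
```

With `r = 16` and `lora_alpha = 16` as in the cell above, the standard scaling factor is 1.0. Since only the attention projections are listed in `target_modules`, the trainable share of the whole model would be smaller than the per-layer figure.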


In [6]:
import os
import pandas as pd
from PIL import Image
import io
from datasets import Dataset

# Define your paths
CSV_PATH = os.path.join(path, "Doctor’s Handwritten Prescription BD dataset/Training/training_labels.csv")
IMAGES_FOLDER = os.path.join(path, "Doctor’s Handwritten Prescription BD dataset/Training/training_words")

def load_custom_data(csv_path, images_folder):
    df = pd.read_csv(csv_path, header=None)
    formatted_data = []
    print(f"Loading data from {csv_path}...")

    for index, row in df.iterrows():
        img_name = str(row[0])
        ground_truth = str(row[1])
        img_full_path = os.path.join(images_folder, img_name)
        try:
            image = Image.open(img_full_path).convert("RGB")
            formatted_data.append({"image": image, "text": ground_truth})
        except Exception as e:
            print(f"⚠️ Error loading image {img_name}: {e}")
            continue
    return formatted_data

# 1. Load Data
custom_raw_data = load_custom_data(CSV_PATH, IMAGES_FOLDER)

# 2. Convert to list format
instruction = "OCR:"
converted_dataset = []

for sample in custom_raw_data:
    conversation = [
        {
            "role": "user",
            "content": [
                # Text First, then Image
                {"type": "text", "text": instruction},
                {"type": "image", "image": sample["image"]}
            ]
        },
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": sample["text"]}
            ]
        }
    ]
    converted_dataset.append({ "images": [sample["image"]], "messages": conversation })

# 3. Create Dataset
hf_dataset = Dataset.from_list(converted_dataset)

# 4. === THE CORRECTED TRANSFORM ===
def format_output(batch):
    formatted_batch = {"images": [], "messages": []}

    for i in range(len(batch["images"])):
        # -- Step A: Prepare Master Image --
        raw_imgs = batch["images"][i]
        valid_images = []
        for img in raw_imgs:
            if isinstance(img, dict) and "bytes" in img:
                valid_images.append(Image.open(io.BytesIO(img["bytes"])).convert("RGB"))
            else:
                valid_images.append(img)

        formatted_batch["images"].append(valid_images)
        master_image = valid_images[0]

        # -- Step B: Fix Messages --
        raw_msgs = batch["messages"][i]
        clean_msgs = []

        for msg in raw_msgs:
            new_content = []
            for item in msg["content"]:
                # FIX: Check if image is present AND NOT NONE
                # The library inserts "image": None for text items, so we must check values.
                if "image" in item and item["image"] is not None:
                    new_content.append({"type": "image", "image": master_image})
                else:
                    # It's a text item. Clean up the None keys.
                    clean_item = {k: v for k, v in item.items() if v is not None}
                    new_content.append(clean_item)

            clean_msgs.append({"role": msg["role"], "content": new_content})

        formatted_batch["messages"].append(clean_msgs)

    return formatted_batch

# Apply the transform
hf_dataset.set_transform(format_output)

# 5. Split
dataset_split = hf_dataset.train_test_split(test_size=0.1)
train_dataset = dataset_split["train"]
eval_dataset = dataset_split["test"]

# === VERIFY OUTPUT ===
print("\nSample Output Structure:")
sample = train_dataset[0]

# Standard print allows you to see the true structure
print(sample)

Loading data from /kaggle/input/doctors-handwritten-prescription-bd-dataset/Doctor’s Handwritten Prescription BD dataset/Training/training_labels.csv...
⚠️ Error loading image IMAGE: [Errno 2] No such file or directory: '/kaggle/input/doctors-handwritten-prescription-bd-dataset/Doctor’s Handwritten Prescription BD dataset/Training/training_words/IMAGE'

Sample Output Structure:
{'images': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=116x46 at 0x7A66B6C8BE00>], 'messages': [{'role': 'user', 'content': [{'text': 'OCR:', 'type': 'text'}, {'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=116x46 at 0x7A66B6C8BE00>}]}, {'role': 'assistant', 'content': [{'text': 'Opton', 'type': 'text'}]}]}
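
The "Error loading image IMAGE" warning above is most likely harmless. Reading the CSV with `header=None` treats the header row as data, so the loader tries to open a file literally named `IMAGE` and then skips it. The sketch below reproduces the effect with the standard `csv` module on a tiny in-memory file. The second column name `LABEL` and the file names are assumptions made for illustration, and `read_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical miniature of training_labels.csv (header name LABEL is assumed).
sample_csv = "IMAGE,LABEL\n0.png,Fexofast\n1.png,Metro\n"

def read_rows(text: str, has_header: bool) -> list[tuple[str, str]]:
    """Return (image_name, label) pairs, optionally skipping a header row."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # discard the header instead of treating it as a sample
    return [(row[0], row[1]) for row in reader]

with_header_as_data = read_rows(sample_csv, has_header=False)
clean = read_rows(sample_csv, has_header=True)

print(with_header_as_data[0])  # ('IMAGE', 'LABEL'), the bogus first "sample"
print(clean)                   # [('0.png', 'Fexofast'), ('1.png', 'Metro')]
```

In pandas the equivalent change would be to drop `header=None` and read the columns by position with `.iloc`, which avoids the spurious warning without changing which images are loaded.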


<a name="Data"></a>
### Data Prep
We'll be using the Doctor's Handwritten Prescription BD dataset, downloaded from Kaggle above (`mamun1113/doctors-handwritten-prescription-bd-dataset`). It contains small word-level images of handwritten medical terms (for example `Fexofast` or `Metro`), each paired with a text label in `training_labels.csv`. The goal is to transcribe each image into computer-readable text.

To format the dataset, all vision finetuning tasks should be formatted as follows:

```python
[
{ "role": "user",
  "content": [{"type": "text",  "text": Q}, {"type": "image", "image": image} ]
},
{ "role": "assistant",
  "content": [{"type": "text",  "text": A} ]
},
]
```
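
As a minimal sketch, the template above can be wrapped in a small helper that builds one conversation per sample. The function name `build_conversation` is hypothetical, and a plain placeholder object stands in for the PIL image so the snippet runs without any image library.

```python
def build_conversation(image, question: str, answer: str) -> list[dict]:
    """Build one user/assistant exchange in the vision finetuning format."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image", "image": image},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": answer}],
        },
    ]

placeholder_image = object()  # stands in for a PIL.Image in this sketch
conversation = build_conversation(
    placeholder_image, "Read the handwritten medical term.", "Fexofast"
)
print([turn["role"] for turn in conversation])  # ['user', 'assistant']
```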

Let's convert the dataset into the "correct" format for finetuning:

In [7]:
import os
import pandas as pd
from PIL import Image
import io
from datasets import Dataset

# Define your paths
CSV_PATH = os.path.join(path, "Doctor’s Handwritten Prescription BD dataset/Training/training_labels.csv")      # Update this path
IMAGES_FOLDER = os.path.join(path, "Doctor’s Handwritten Prescription BD dataset/Training/training_words")



def load_custom_data(csv_path, images_folder):
    df = pd.read_csv(csv_path, header=None)
    formatted_data = []
    print(f"Loading data from {csv_path}...")

    for index, row in df.iterrows():
        img_name = str(row[0])
        ground_truth = str(row[1])
        img_full_path = os.path.join(images_folder, img_name)
        try:
            image = Image.open(img_full_path).convert("RGB")
            formatted_data.append({"image": image, "text": ground_truth})
        except Exception as e:
            print(f"⚠️ Error loading image {img_name}: {e}")
            continue
    return formatted_data

# 1. Load Data
custom_raw_data = load_custom_data(CSV_PATH, IMAGES_FOLDER)

# 2. Convert to list format
instruction = "Read the handwritten medical term."
converted_dataset = []

for sample in custom_raw_data:
    # We create the structure with specific keys here
    # Arrow will technically fill missing ones with nulls internally,
    # but our transform below will remove them.
    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image", "image": sample["image"]}
            ]
        },
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": sample["text"]}
            ]
        }
    ]
    converted_dataset.append({ "images": [sample["image"]], "messages": conversation })

# 3. Create Dataset
hf_dataset = Dataset.from_list(converted_dataset)

# 4. === THE CLEANING TRANSFORM ===
# This function does two things:
# a) Converts bytes back to PIL Images
# b) Removes any key that is None (cleaning the schema)
def format_output(batch):
    formatted_batch = {"images": [], "messages": []}

    for i in range(len(batch["images"])):
        # -- Handle Top-Level Images --
        raw_imgs = batch["images"][i]
        processed_imgs = []
        for img in raw_imgs:
            if isinstance(img, dict) and "bytes" in img:
                processed_imgs.append(Image.open(io.BytesIO(img["bytes"])).convert("RGB"))
            else:
                processed_imgs.append(img)
        formatted_batch["images"].append(processed_imgs)

        # -- Handle Messages (The Clean-up) --
        raw_msgs = batch["messages"][i]
        clean_msgs = []
        for msg in raw_msgs:
            clean_content = []
            for item in msg["content"]:
                # Create a new dictionary excluding any None values
                clean_item = {}
                for k, v in item.items():
                    if v is not None:
                        # Decode image if present
                        if k == "image" and isinstance(v, dict) and "bytes" in v:
                            clean_item[k] = Image.open(io.BytesIO(v["bytes"])).convert("RGB")
                        else:
                            clean_item[k] = v
                clean_content.append(clean_item)
            clean_msgs.append({"role": msg["role"], "content": clean_content})

        formatted_batch["messages"].append(clean_msgs)

    return formatted_batch

# Apply the cleaning transform
hf_dataset.set_transform(format_output)

# 5. Split
dataset_split = hf_dataset.train_test_split(test_size=0.1)
train_dataset = dataset_split["train"]
eval_dataset = dataset_split["test"]

print("Data Prep Complete!")

# === VERIFY OUTPUT ===
print("\nSample Output Structure:")
sample = train_dataset[0]

# This print ensures the structure is exactly what you asked for
import pprint
pprint.pprint(sample)

Loading data from /kaggle/input/doctors-handwritten-prescription-bd-dataset/Doctor’s Handwritten Prescription BD dataset/Training/training_labels.csv...
⚠️ Error loading image IMAGE: [Errno 2] No such file or directory: '/kaggle/input/doctors-handwritten-prescription-bd-dataset/Doctor’s Handwritten Prescription BD dataset/Training/training_words/IMAGE'
Data Prep Complete!

Sample Output Structure:
{'images': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=89x38 at 0x7A66B7C69B80>],
 'messages': [{'content': [{'text': 'Read the handwritten medical term.',
                            'type': 'text'},
                           {'image': <PIL.Image.Image image mode=RGB size=89x38 at 0x7A66B7C6A660>,
                            'type': 'image'}],
               'role': 'user'},
              {'content': [{'text': 'Fexofast', 'type': 'text'}],
               'role': 'assistant'}]}
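
The cleaning transform exists because storing mixed content items in one Arrow-backed column gives every item the same set of keys, with missing ones filled by `None`. A minimal sketch of the clean-up step in isolation, using plain dictionaries rather than a real `datasets` batch, looks like this. The helper name `drop_none_keys` and the string standing in for an image are assumptions for illustration.

```python
def drop_none_keys(messages: list[dict]) -> list[dict]:
    """Remove keys whose value is None from every content item."""
    return [
        {
            "role": message["role"],
            "content": [
                {key: value for key, value in item.items() if value is not None}
                for item in message["content"]
            ],
        }
        for message in messages
    ]

# What a padded row might look like after a round trip through the dataset.
padded = [
    {"role": "user", "content": [
        {"type": "text", "text": "OCR:", "image": None},
        {"type": "image", "text": None, "image": "<image placeholder>"},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Fexofast", "image": None},
    ]},
]

cleaned = drop_none_keys(padded)
print(cleaned[0]["content"][0])  # {'type': 'text', 'text': 'OCR:'}
```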


We look at how the conversation is structured for one example (index 50):

In [8]:
train_dataset[50]

{'images': [<PIL.PngImagePlugin.PngImageFile image mode=RGB size=150x38>],
 'messages': [{'role': 'user',
   'content': [{'text': 'Read the handwritten medical term.', 'type': 'text'},
    {'image': <PIL.Image.Image image mode=RGB size=150x38>, 'type': 'image'}]},
  {'role': 'assistant', 'content': [{'text': 'Metro', 'type': 'text'}]}]}

Before we do any finetuning, let's see what the model outputs for the first example!

In [9]:
from transformers import TextStreamer
def run_inference(idx):
    # 1. Setup Data
    # CHANGED: "image" -> "images"[0]
    image = train_dataset[idx]["images"][0]

    ground_truth = train_dataset[idx]["messages"][1]["content"][0]["text"]
    instruction = "OCR:"

    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]}
    ]

    # 2. Prepare Inputs
    text_prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        image, text_prompt, add_special_tokens=False, return_tensors="pt"
    ).to("cuda")

    # 3. Generate
    print(f"\n--- Index {idx} ---")
    print(f"Ground Truth: {ground_truth}")
    print("Prediction: ", end="")

    streamer = TextStreamer(processor.tokenizer, skip_prompt=True)
    _ = model.generate(
        **inputs,
        streamer=streamer,
        max_new_tokens=128,
        use_cache=False,
        temperature=1.5,
        min_p=0.1
    )

# Try running it now
run_inference(0)


--- Index 0 ---
Ground Truth: Fexofast
Prediction: fexoFast</s>


In [10]:
!pip install jiwer



In [11]:
import random
import torch
from jiwer import cer
from tqdm import tqdm

# 1. Setup Selection
num_samples = 10
if len(train_dataset) < num_samples:
    num_samples = len(train_dataset)

indices = random.sample(range(len(train_dataset)), num_samples)

predictions = []
references = []

print(f"Starting inference on {num_samples} random samples...")

# 2. Loop through samples
for idx in tqdm(indices):
    image = train_dataset[idx]["images"][0]
    ground_truth = train_dataset[idx]["messages"][1]["content"][0]["text"]
    instruction = "OCR:"

    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]}
    ]

    text_prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    inputs = processor(
        image, text_prompt, add_special_tokens=False, return_tensors="pt"
    ).to("cuda")

    # Generate
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=128,
            use_cache=False,  # <--- CHANGED THIS TO FALSE
            temperature=1.5,
            min_p=0.1
        )

    # Decode Output
    input_length = inputs.input_ids.shape[1]
    generated_ids = output_ids[:, input_length:]
    pred_text = processor.decode(generated_ids[0], skip_special_tokens=True)

    predictions.append(pred_text)
    references.append(ground_truth)

# 3. Calculate CER
score = cer(references, predictions)
print(f"\nResults over {num_samples} samples:")
print(f"Average CER: {score:.4f} (Lower is better)")
print(f"Percentage Accuracy: {(1-score)*100:.2f}%")

Starting inference on 10 random samples...


 20%|██        | 2/10 [00:01<00:04,  1.89it/s]

smart_resize: height=26 < factor=28, reset height=factor


100%|██████████| 10/10 [00:05<00:00,  1.72it/s]


Results over 10 samples:
Average CER: 0.3099 (Lower is better)
Percentage Accuracy: 69.01%
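
For intuition about the metric, here is a dependency-free sketch of character error rate: total character edits divided by total reference characters. It is an illustration of the idea, not jiwer's implementation, and the function names are hypothetical. It uses the pre-finetuning prediction shown earlier (`fexoFast` against `Fexofast`).

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def corpus_cer(references: list[str], predictions: list[str]) -> float:
    """Total edits over total reference characters across all pairs."""
    edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    return edits / sum(len(r) for r in references)

print(corpus_cer(["Fexofast"], ["fexoFast"]))                  # 0.25
print(corpus_cer(["fexofast"], ["fexoFast".lower()]))          # 0.0
```

Two wrong characters out of eight give a CER of 0.25, even though the only mistakes are letter case. If case does not matter for your use case, lowercasing both sides before scoring is one option. The "Percentage Accuracy" printed by the cell is simply `(1 - CER) * 100`.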





<a name="Train"></a>
### Train the model
Now let's train our model. We do 300 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`. We also support TRL's `DPOTrainer`!

We use our new `UnslothVisionDataCollator` which will help in our vision finetuning setup.
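
To see how `max_steps` relates to a full epoch, the arithmetic can be sketched as below. The batch settings come from the training cell that follows and the example count (2,808) from the training log printed in this notebook; the variable names are only for illustration.

```python
import math

num_examples = 2808            # "Num examples" in the training log
per_device_batch_size = 4      # per_device_train_batch_size
gradient_accumulation = 2      # gradient_accumulation_steps
num_gpus = 1
max_steps = 300

# One optimizer step consumes this many examples.
effective_batch = per_device_batch_size * gradient_accumulation * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
examples_seen = max_steps * effective_batch

print(f"Effective batch size: {effective_batch}")                      # 8
print(f"Steps for one full epoch: {steps_per_epoch}")                  # 351
print(f"Fraction of an epoch in {max_steps} steps: "
      f"{examples_seen / num_examples:.2f}")                           # 0.85
```

So the 300-step run covers a little under one pass over the training split, and `num_train_epochs=1` would correspond to 351 steps with these settings.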

In [12]:
from trl import SFTTrainer, SFTConfig
from unsloth.trainer import UnslothVisionDataCollator

FastVisionModel.for_training(model) # Enable for training!

custom_collator = UnslothVisionDataCollator(
    model=model,
    processor=processor,
    ignore_index=-100,
    max_seq_length=1024,
    train_on_responses_only=True,
    instruction_part = "User: ",
    response_part = "\nAssistant:",
    pad_to_multiple_of = 8,
)

trainer = SFTTrainer(
    model = model,
    tokenizer = processor.tokenizer,
    data_collator = custom_collator,
    train_dataset = train_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 2, # Use GA to mimic batch size!
        warmup_steps = 5,
        max_steps = 300,
        # num_train_epochs = 1, # Set this instead of max_steps for full training runs
        learning_rate = 5e-6,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.001,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",

        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 1024,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
    ),
)
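
The collator is configured with `train_on_responses_only=True` and `ignore_index=-100`. The underlying idea is that prompt tokens keep their place in the input but get the ignore label, so the loss only covers the assistant's answer. The sketch below is a simplified illustration with made-up token IDs. The real collator presumably finds the response boundary from `instruction_part` and `response_part` rather than from an index passed in by hand, and `mask_prompt_labels` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # same value as ignore_index in the collator above

def mask_prompt_labels(input_ids: list[int], response_start: int) -> list[int]:
    """Copy input_ids into labels, hiding everything before the response."""
    return [IGNORE_INDEX] * response_start + list(input_ids[response_start:])

# Hypothetical token IDs: four prompt tokens followed by three answer tokens.
input_ids = [101, 7, 8, 9, 42, 43, 2]
labels = mask_prompt_labels(input_ids, response_start=4)

print(labels)  # [-100, -100, -100, -100, 42, 43, 2]
```

Positions labelled -100 are skipped by PyTorch's cross-entropy loss (its default `ignore_index`), which is why that value is the conventional choice.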


In [13]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
5.266 GB of memory reserved.


In [14]:
trainer_stats = trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,808 | Num Epochs = 1 | Total steps = 300
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 958,588,736 of 958,588,736 (100.00% trained)


smart_resize: height=19 < factor=28, reset height=factor
smart_resize: height=25 < factor=28, reset height=factor
Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|------|---------------|
| 1 | 18.4284 |
| 2 | 15.5422 |
| 3 | 17.5844 |
| 4 | 18.4389 |
| 5 | 17.0597 |
| 6 | 15.2965 |
| 7 | 16.3724 |
| 8 | 16.5777 |
| 9 | 12.9384 |
| 10 | 14.058 |


smart_resize: height=23 < factor=28, reset height=factor
smart_resize: height=18 < factor=28, reset height=factor
smart_resize: height=18 < factor=28, reset height=factor
smart_resize: height=14 < factor=28, reset height=factor
smart_resize: height=24 < factor=28, reset height=factor
smart_resize: height=18 < factor=28, reset height=factor
smart_resize: height=17 < factor=28, reset height=factor
smart_resize: height=19 < factor=28, reset height=factor
smart_resize: height=24 < factor=28, reset height=factor
smart_resize: height=27 < factor=28, reset height=factor
smart_resize: height=26 < factor=28, reset height=factor
smart_resize: height=17 < factor=28, reset height=factor
smart_resize: height=26 < factor=28, reset height=factor
smart_resize: height=16 < factor=28, reset height=factor
smart_resize: height=21 < factor=28, reset height=factor
smart_resize: height=16 < factor=28, reset height=factor
smart_resize: height=18 < factor=28, reset height=factor
smart_resize: height=24 < facto

In [15]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

2012.9737 seconds used for training.
33.55 minutes used for training.
Peak reserved memory = 12.635 GB.
Peak reserved memory for training = 7.369 GB.
Peak reserved memory % of max memory = 85.713 %.
Peak reserved memory for training % of max memory = 49.99 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

We use `min_p = 0.1` and `temperature = 1.5`. Read this [Tweet](https://x.com/menhguin/status/1826132708508213629) for more information on why.
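
As a rough illustration of what these two settings do together, the sketch below applies temperature scaling and then a min-p cut-off to a toy set of logits. It is a simplified model of the idea, not Hugging Face's implementation, and the logits and the helper name `min_p_filter` are made up. In `generate`, sampling parameters generally only take effect when sampling is enabled, and this notebook does not show whether the model's generation config turns it on.

```python
import math

def min_p_filter(logits: list[float], temperature: float = 1.5,
                 min_p: float = 0.1) -> list[float]:
    """Temperature-scale logits, drop tokens below min_p * max prob, renormalise."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [value / total for value in exps]
    threshold = min_p * max(probs)  # cut-off is relative to the most likely token
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]

toy_logits = [2.0, 1.0, 0.0, -3.0]  # hypothetical scores for four tokens
filtered = min_p_filter(toy_logits)
print([round(p, 3) for p in filtered])
```

The high temperature flattens the distribution, while the min-p threshold removes the unlikely tail (here the last token), so sampling stays varied without picking very improbable tokens.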

In [16]:
import random
import torch
from jiwer import cer
from tqdm import tqdm

# 1. Setup Selection
num_samples = 10
if len(train_dataset) < num_samples:
    num_samples = len(train_dataset)

indices = random.sample(range(len(train_dataset)), num_samples)

predictions = []
references = []

print(f"Starting inference on {num_samples} random samples...")

# 2. Loop through samples
for idx in tqdm(indices):
    image = train_dataset[idx]["images"][0]
    ground_truth = train_dataset[idx]["messages"][1]["content"][0]["text"]
    instruction = "OCR:"

    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]}
    ]

    text_prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    inputs = processor(
        image, text_prompt, add_special_tokens=False, return_tensors="pt"
    ).to("cuda")

    # Generate
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=128,
            use_cache=False,  # <--- CHANGED THIS TO FALSE
            temperature=1.5,
            min_p=0.1
        )

    # Decode Output
    input_length = inputs.input_ids.shape[1]
    generated_ids = output_ids[:, input_length:]
    pred_text = processor.decode(generated_ids[0], skip_special_tokens=True)

    predictions.append(pred_text)
    references.append(ground_truth)

# 3. Calculate CER
score = cer(references, predictions)
print(f"\nResults over {num_samples} samples:")
print(f"Average CER: {score:.4f} (Lower is better)")
print(f"Percentage Accuracy: {(1-score)*100:.2f}%")

Starting inference on 10 random samples...


100%|██████████| 10/10 [00:04<00:00,  2.21it/s]


Results over 10 samples:
Average CER: 0.1017 (Lower is better)
Percentage Accuracy: 89.83%
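
Note that both CER measurements draw 10 random samples from `train_dataset`, and a different 10 each time, so the before and after numbers are noisy and measured on data the model has trained on. One possible refinement, sketched below under stated assumptions, is to fix the evaluation indices with a seed and draw them from the held-out `eval_dataset` instead. The helper name `pick_eval_indices` is hypothetical, and the size of 312 is an assumption based on a 10% split alongside the 2,808 training examples.

```python
import random

def pick_eval_indices(dataset_size: int, num_samples: int = 10,
                      seed: int = 3407) -> list[int]:
    """Return a reproducible random subset of indices for evaluation."""
    rng = random.Random(seed)  # private generator, leaves global random state alone
    return rng.sample(range(dataset_size), min(num_samples, dataset_size))

assumed_eval_size = 312  # assumption: roughly len(eval_dataset) for a 10% split
before_indices = pick_eval_indices(assumed_eval_size)
after_indices = pick_eval_indices(assumed_eval_size)

# The same samples are scored before and after training.
print(before_indices == after_indices)  # True
```

Reusing one index list for both evaluation cells would make the two CER values directly comparable.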





<a name="Save"></a>
### Saving, loading finetuned models
To save the final model, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** With LoRA finetuning, this ONLY saves the LoRA adapters, and not the full model. This run used `full_finetuning=True`, so no adapters were trained here. To save to 16bit, scroll down!

In [20]:
model.save_pretrained("lora_model_v2")  # Local saving
tokenizer.save_pretrained("lora_model_v2")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model_v2/tokenizer_config.json',
 'lora_model_v2/special_tokens_map.json',
 'lora_model_v2/chat_template.jinja',
 'lora_model_v2/tokenizer.model',
 'lora_model_v2/added_tokens.json',
 'lora_model_v2/tokenizer.json')

Now if you want to load the model we just saved for inference, change `False` to `True`:

In [18]:
if False:
    from unsloth import FastVisionModel
    model, tokenizer = FastVisionModel.from_pretrained(
        model_name = "lora_model_v2", # The folder saved above with save_pretrained
        load_in_4bit = False, # Set to False for 16bit LoRA
    )
    FastVisionModel.for_inference(model) # Enable for inference!


from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens=128,
                   use_cache=False, temperature=1.5, min_p=0.1)

Delfu</s>


### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [19]:
# Select ONLY 1 to save! (Both not needed!)

# Save locally to 16bit
if False: model.save_pretrained_merged("unsloth_finetune", tokenizer,)

# To export and save to your Hugging Face account
if False: model.push_to_hub_merged("YOUR_USERNAME/unsloth_finetune", tokenizer, token = "PUT_HERE")

And we're done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/unsloth)!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐

  This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)
</div>
