# Parsight Meter Reading: Fine-Tune Qwen2-VL-2B

**This notebook fine-tunes Qwen2-VL-2B-Instruct on your 75 gas meter images.**

### Key design decisions:
- **NO bitsandbytes**: avoids the `triton.ops` error on Colab T4
- **float16 instead of 4-bit**: the Qwen2-VL-2B weights are ~4.4 GB in float16 (2.2B parameters × 2 bytes) and fit on a T4 (16GB)
- **LoRA**: trains only ~0.2% of the parameters (~4.4M out of 2.2B)
- **Gradient checkpointing**: saves ~40% VRAM by recomputing activations

### Cost after training:
| Method | Cost/image |
|--------|----------|
| GPT-4o (current) | $0.10 |
| This fine-tuned model | ~$0.001 |

---
## Cell 1: Check GPU

**WHY:** Confirms a GPU is available and checks VRAM to set batch size.
- T4 (16GB free tier): batch_size=1
- A100 (40GB Pro): batch_size=2

In [None]:
import torch

if not torch.cuda.is_available():
    raise RuntimeError(
        "No GPU! Go to Runtime → Change runtime type → GPU (T4 or A100)"
    )

gpu_name = torch.cuda.get_device_name(0)
gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1e9

print(f"GPU: {gpu_name}")
print(f"VRAM: {gpu_mem:.1f} GB")

if gpu_mem >= 35:      # A100 40GB
    BATCH_SIZE = 2
    GRAD_ACCUM = 4
else:                  # T4 16GB and other smaller GPUs
    BATCH_SIZE = 1
    GRAD_ACCUM = 8

print(f"Batch: {BATCH_SIZE}, Grad accum: {GRAD_ACCUM}, Effective batch: {BATCH_SIZE * GRAD_ACCUM}")
print("✅ GPU ready!")

GPU: Tesla T4
VRAM: 15.6 GB
Batch: 1, Grad accum: 8, Effective batch: 8
✅ GPU ready!


---
## Cell 2: Install Dependencies

**WHY each package:**
- `transformers`: Has the Qwen2-VL model code
- `peft`: LoRA fine-tuning (trains ~0.2% of the weights)
- `accelerate`: GPU memory management
- `qwen-vl-utils`: Qwen's image processing
- `datasets`: Data loading

**NOTE: We do NOT install bitsandbytes.** The 2B model fits on a T4 in float16 (roughly 4 GB of weights plus an estimated ~10 GB of training overhead, about 14 GB < 16 GB). This avoids the `triton.ops` error entirely.
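
As a rough check on the memory claim, the weight footprint can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch, not a measurement: the parameter count is the total printed by the LoRA cell in this notebook, the 4-bit row ignores quantization metadata, and activations, gradients, and optimizer state all come on top of these numbers.

```python
# Rough weight-memory estimate; excludes activations, gradients, optimizer state.
TOTAL_PARAMS = 2_213_343_744  # total printed in Cell 5 (base model plus LoRA adapters)


def weight_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, nbytes in [("float32", 4), ("float16", 2), ("4-bit (approx.)", 0.5)]:
    print(f"{name:>16}: {weight_gb(TOTAL_PARAMS, nbytes):.2f} GB")
```

Under these assumptions float16 weights come to about 4.4 GB, which is why quantization is optional on a 16 GB card for a model of this size.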

In [None]:
# NO bitsandbytes - avoids triton.ops error on Colab T4
!pip install -q \
    transformers==4.46.3 \
    peft==0.13.2 \
    accelerate==1.1.1 \
    "qwen-vl-utils==0.0.8" \
    datasets==3.1.0

print("\n✅ All packages installed (no bitsandbytes needed!)")

---
## Cell 3: Upload & Extract Dataset

**WHY:** Uploads your `parsight_complete_dataset_75.zip` to Colab's temporary disk and extracts the images + JSONL files.

In [None]:
import os
import zipfile
from google.colab import files

print("Upload your parsight_complete_dataset_75.zip:")
uploaded = files.upload()

zip_name = list(uploaded.keys())[0]
print(f"\nUploaded: {zip_name} ({len(uploaded[zip_name]) / 1e6:.1f} MB)")

DATASET_DIR = "/content/dataset"
with zipfile.ZipFile(zip_name, 'r') as z:
    z.extractall(DATASET_DIR)

# Find the actual data directory (may be nested)
for root, dirs, fnames in os.walk(DATASET_DIR):
    if 'train.jsonl' in fnames:
        DATASET_DIR = root
        break

print(f"Dataset dir: {DATASET_DIR}")
print(f"Images: {len(os.listdir(os.path.join(DATASET_DIR, 'images')))}")

with open(os.path.join(DATASET_DIR, 'train.jsonl')) as f:
    train_count = sum(1 for _ in f)
with open(os.path.join(DATASET_DIR, 'val.jsonl')) as f:
    val_count = sum(1 for _ in f)

print(f"Train: {train_count} | Val: {val_count}")
print("✅ Dataset ready!")

Upload your parsight_complete_dataset_75.zip:


Saving complete_dataset.zip to complete_dataset.zip

Uploaded: complete_dataset.zip (2.2 MB)
Dataset dir: /content/dataset/complete_dataset
Images: 75
Train: 67 | Val: 8
✅ Dataset ready!


---
## Cell 4: Load Base Model (float16, NO quantization)

**WHY float16 instead of 4-bit:**
- Qwen2-VL-2B weights in float16 take ~4.4 GB (2.2B parameters × 2 bytes)
- T4 has 16GB VRAM, which leaves room for training overhead
- No need for bitsandbytes/triton (which causes errors on Colab)
- float16 is actually BETTER quality than 4-bit (no quantization loss)

**WHY `device_map="auto"`:** Automatically places model on GPU.

**WHY min/max_pixels:** Controls image resolution sent to the model.
- Too small: can't read meter digits
- Too large: VRAM explodes
- 256-512 visual tokens (one token per 28×28-pixel block, so roughly 200K-400K pixels) is the sweet spot for meter photos
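
The pixel limits are easier to reason about as a visual-token budget. The sketch below assumes the Qwen2-VL image processor uses a patch size of 14 pixels and merges 2×2 patches into one token, so each token covers a 28×28-pixel block; both constants are assumptions about the model's preprocessing rather than values stated in this notebook. The last line applies the same arithmetic to the `pixel_values` shape `[1064, 1176]` printed in Cell 6, where 1064 is the number of patches.

```python
PATCH = 14                   # assumed ViT patch size in pixels
MERGE = 2                    # assumed 2x2 patch merge
TOKEN_SIDE = PATCH * MERGE   # one visual token covers 28x28 pixels


def visual_tokens_from_pixels(num_pixels: int) -> int:
    """Number of visual tokens for an image with this many pixels."""
    return num_pixels // (TOKEN_SIDE * TOKEN_SIDE)


def visual_tokens_from_patches(num_patches: int) -> int:
    """Number of visual tokens for this many 14x14 patches."""
    return num_patches // (MERGE * MERGE)


min_pixels = 256 * 28 * 28
max_pixels = 512 * 28 * 28
print(visual_tokens_from_pixels(min_pixels), visual_tokens_from_pixels(max_pixels))  # 256 512
print(visual_tokens_from_patches(1064))  # 266
```

Under these assumptions the sample image lands at 266 visual tokens, inside the 256-512 budget set by `min_pixels` and `max_pixels`.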

In [None]:
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch

MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"

print(f"Loading {MODEL_ID} in float16 (no quantization needed for 2B model)...")

# Load model in float16: NO bitsandbytes, NO 4-bit
# WHY torch_dtype=float16: Half precision, 2x smaller than float32
# WHY attn_implementation="eager": Avoids flash_attention issues on T4
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="eager",  # T4 doesn't support flash_attention_2
)

# Load processor (handles image resizing + text tokenization)
processor = AutoProcessor.from_pretrained(
    MODEL_ID,
    min_pixels=256 * 28 * 28,   # ~200K pixels min
    max_pixels=512 * 28 * 28,   # ~400K pixels max
)

mem_used = torch.cuda.memory_allocated() / 1e9
mem_total = torch.cuda.get_device_properties(0).total_memory / 1e9
print(f"\nGPU memory: {mem_used:.1f} / {mem_total:.1f} GB used")
print(f"Free for training: {mem_total - mem_used:.1f} GB")
print("✅ Base model loaded!")

Loading Qwen/Qwen2-VL-2B-Instruct in float16 (no quantization needed for 2B model)...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]


GPU memory: 8.8 / 15.6 GB used
Free for training: 6.8 GB
✅ Base model loaded!


---
## Cell 5: Configure LoRA

**WHY LoRA:**
- 2.2B params total, but we only train ~4.4M (0.2%)
- Reduces overfitting on just 75 images
- Much faster than full fine-tuning

**WHY these settings:**
- `r=16`: Adapter rank. Higher = more capacity but more risk of overfitting
- `lora_alpha=32`: Scaling factor = 2×r (a common choice)
- `lora_dropout=0.1`: 10% dropout, critical with only 75 images
- `target_modules`: The attention projection layers, where the model decides "what to look at"

**WHY `gradient_checkpointing`:**
- Saves ~40% VRAM by recomputing activations during backward pass
- Makes training ~20% slower but prevents OOM on T4
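
The trainable-parameter count follows directly from the LoRA rank and the shapes of the targeted layers: each adapted linear layer of shape `in × out` adds `r × (in + out)` parameters. The sketch below assumes the language model has 28 decoder layers, a hidden size of 1536, and key/value projections of width 256; these dimensions are assumptions taken from the published Qwen2-VL-2B configuration, not from this notebook. With them, the arithmetic reproduces the 4,358,144 trainable parameters printed by the cell below.

```python
R = 16               # LoRA rank used in this notebook
HIDDEN = 1536        # assumed hidden size
KV_DIM = 256         # assumed key/value projection width
NUM_LAYERS = 28      # assumed number of decoder layers

# (in_features, out_features) for each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) next to the frozen weight."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")  # 155,648 per layer, 4,358,144 total
```

Doubling `r` doubles the adapter size, which is one way to see the capacity versus overfitting trade-off mentioned above.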

In [None]:
from peft import LoraConfig, get_peft_model

# Enable gradient checkpointing BEFORE applying LoRA
# WHY: Saves ~40% VRAM by recomputing activations instead of storing them
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)

# Enable input gradients (required for gradient checkpointing)
model.enable_input_require_grads()

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)

trainable, total = model.get_nb_trainable_parameters()
print(f"Total params:     {total:>12,}")
print(f"Trainable params: {trainable:>12,}")
print(f"Trainable %:      {trainable/total*100:.2f}%")
print(f"\n✅ LoRA ready! Training {trainable/1e6:.1f}M of {total/1e6:.0f}M params.")

Total params:     2,213,343,744
Trainable params:    4,358,144
Trainable %:      0.20%

✅ LoRA ready! Training 4.4M of 2213M params.


---
## Cell 6: Prepare Training Data

**WHY this custom Dataset class:**
- Loads each JSONL entry + its image from disk
- Formats into Qwen2-VL's chat template (`<|im_start|>user...`)
- Tokenizes text + processes image together
- Masks the user prompt in labels (set to -100) so the model only learns to predict the assistant's JSON response

**WHY mask labels:**
The model should learn to OUTPUT the JSON reading, not to parrot the input question. We set user tokens to -100 (PyTorch's "ignore" index) so loss is only computed on the assistant's answer.
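
The masking step can be illustrated on plain lists, without a tokenizer. This is a minimal sketch of the idea used by the dataset class below: find the assistant marker in the token ids and replace everything up to and including it with -100. The token ids and the two-token marker are made-up values for illustration only.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips


def mask_prompt(input_ids, marker):
    """Return labels with everything through the first marker set to IGNORE_INDEX."""
    labels = list(input_ids)
    for i in range(len(input_ids) - len(marker) + 1):
        if input_ids[i:i + len(marker)] == marker:
            start = i + len(marker)
            labels[:start] = [IGNORE_INDEX] * start
            return labels
    return labels  # marker not found: nothing is masked


# Hypothetical ids: 3 prompt tokens, a 2-token assistant marker, 3 answer tokens
ids = [11, 12, 13, 90, 91, 41, 42, 43]
marker = [90, 91]
print(mask_prompt(ids, marker))  # [-100, -100, -100, -100, -100, 41, 42, 43]
```

Only the last three positions contribute to the loss, which matches the "Masked tokens" and "Target tokens" counts the verification step prints for a real sample.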

In [None]:
import json
from PIL import Image
from torch.utils.data import Dataset


class MeterReadingDataset(Dataset):
    def __init__(self, jsonl_path, dataset_dir, processor):
        self.dataset_dir = dataset_dir
        self.processor = processor
        self.examples = []
        with open(jsonl_path, 'r') as f:
            for line in f:
                self.examples.append(json.loads(line.strip()))
        print(f"Loaded {len(self.examples)} examples from {os.path.basename(jsonl_path)}")

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        example = self.examples[idx]

        # Load image
        image_path = os.path.join(self.dataset_dir, example['image'])
        image = Image.open(image_path).convert('RGB')

        user_text = example['conversations'][0]['content']
        assistant_text = example['conversations'][1]['content']

        # Format as Qwen2-VL messages
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image},
                    {"type": "text", "text": user_text},
                ],
            },
            {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": assistant_text},
                ],
            },
        ]

        # Apply Qwen2-VL chat template
        text = self.processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=False
        )

        # Tokenize text + process image
        inputs = self.processor(
            text=[text],
            images=[image],
            padding=True,
            return_tensors="pt",
        )

        # Remove batch dim (DataLoader adds it back)
        inputs = {k: v.squeeze(0) for k, v in inputs.items()}

        # Create labels: mask user prompt with -100, keep assistant response
        input_ids = inputs['input_ids']
        labels = input_ids.clone()

        # Find where assistant response starts
        assistant_marker = self.processor.tokenizer.encode(
            '<|im_start|>assistant\n', add_special_tokens=False
        )
        marker_len = len(assistant_marker)

        ids_list = input_ids.tolist()
        assistant_start = -1
        for i in range(len(ids_list) - marker_len + 1):
            if ids_list[i:i + marker_len] == assistant_marker:
                assistant_start = i + marker_len
                break

        if assistant_start > 0:
            labels[:assistant_start] = -100  # Mask user prompt

        inputs['labels'] = labels
        return inputs


# Create datasets
train_dataset = MeterReadingDataset(
    os.path.join(DATASET_DIR, 'train.jsonl'), DATASET_DIR, processor
)
val_dataset = MeterReadingDataset(
    os.path.join(DATASET_DIR, 'val.jsonl'), DATASET_DIR, processor
)

# Verify
sample = train_dataset[0]
print(f"\nSample check:")
print(f"  input_ids: {sample['input_ids'].shape}")
print(f"  pixel_values: {sample['pixel_values'].shape}")
print(f"  Masked tokens: {(sample['labels'] == -100).sum().item()}")
print(f"  Target tokens: {(sample['labels'] != -100).sum().item()}")
print("✅ Datasets ready!")

Loaded 67 examples from train.jsonl
Loaded 8 examples from val.jsonl

Sample check:
  input_ids: torch.Size([434])
  pixel_values: torch.Size([1064, 1176])
  Masked tokens: 329
  Target tokens: 105
✅ Datasets ready!


---
## Cell 7: Data Collator

**WHY custom collator:**
Different images produce different token sequence lengths. The collator pads shorter sequences so they can be batched together. Standard collators don't handle Qwen2-VL's `image_grid_thw` tensor.
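
The padding logic is the core of the collator and can be shown with plain lists. In this simplified sketch, shorter sequences are right-padded to the longest one in the batch, padded positions get attention mask 0, and padded labels get -100 so they are ignored by the loss. The function name and the pad id of 0 are illustrative choices; the real collator below works on tensors and also handles `pixel_values` and `image_grid_thw`.

```python
def pad_batch(samples, pad_id, label_pad=-100):
    """Right-pad input_ids, attention_mask and labels to the longest sample."""
    max_len = max(len(s["input_ids"]) for s in samples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for s in samples:
        n = len(s["input_ids"])
        pad = max_len - n
        batch["input_ids"].append(s["input_ids"] + [pad_id] * pad)
        batch["attention_mask"].append([1] * n + [0] * pad)
        batch["labels"].append(s["labels"] + [label_pad] * pad)
    return batch


samples = [
    {"input_ids": [5, 6, 7], "labels": [-100, 6, 7]},
    {"input_ids": [8], "labels": [8]},
]
batch = pad_batch(samples, pad_id=0)
print(batch["input_ids"])       # [[5, 6, 7], [8, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
print(batch["labels"])          # [[-100, 6, 7], [8, -100, -100]]
```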

In [None]:
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MeterCollator:
    processor: object

    def __call__(self, features: List[Dict]) -> Dict:
        max_len = max(f['input_ids'].shape[0] for f in features)
        pad_token_id = self.processor.tokenizer.pad_token_id or 0

        batch = {
            'input_ids': [], 'attention_mask': [], 'labels': [],
            'pixel_values': [], 'image_grid_thw': [],
        }

        for f in features:
            pad_len = max_len - f['input_ids'].shape[0]
            if pad_len > 0:
                batch['input_ids'].append(
                    torch.cat([f['input_ids'], torch.full((pad_len,), pad_token_id)]))
                batch['attention_mask'].append(
                    torch.cat([f['attention_mask'], torch.zeros(pad_len, dtype=torch.long)]))
                batch['labels'].append(
                    torch.cat([f['labels'], torch.full((pad_len,), -100)]))
            else:
                batch['input_ids'].append(f['input_ids'])
                batch['attention_mask'].append(f['attention_mask'])
                batch['labels'].append(f['labels'])

            batch['pixel_values'].append(f['pixel_values'])
            batch['image_grid_thw'].append(f['image_grid_thw'])

        batch['input_ids'] = torch.stack(batch['input_ids'])
        batch['attention_mask'] = torch.stack(batch['attention_mask'])
        batch['labels'] = torch.stack(batch['labels'])
        batch['pixel_values'] = torch.cat(batch['pixel_values'], dim=0)
        batch['image_grid_thw'] = torch.stack(batch['image_grid_thw'], dim=0)  # each item is shape (3,), so stack to (batch, 3)

        return batch


collator = MeterCollator(processor=processor)
print("✅ Collator ready!")

✅ Collator ready!


---
## Cell 8: Training Config

**WHY these hyperparameters:**
- `epochs=3`: 67 images × 3 epochs = 201 training samples seen, which is ~25 optimizer steps at an effective batch of 8. As a rule of thumb, 1 epoch underfits and 5+ overfits
- `lr=2e-4`: A common LoRA learning rate, also used in the QLoRA paper
- `warmup_ratio=0.1`: The LR ramps up over the first 10% of steps to prevent early instability
- `cosine scheduler`: LR decays smoothly to 0
- `fp16=True`: Mixed-precision training, up to ~2× faster
- `gradient_checkpointing=True`: Saves ~40% VRAM
- `optim="adamw_torch"`: Standard optimizer (NOT 8-bit since no bitsandbytes)
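
To make the warmup and cosine settings concrete, the sketch below computes the learning rate at a given optimizer step. It is a standalone approximation of a linear-warmup, cosine-decay schedule, not the Trainer's own code; the function name `lr_at` is hypothetical, and the step count of 25 uses the same estimate as the config cell (`67 * 3 // 8`).

```python
import math


def lr_at(step, total_steps, base_lr=2e-4, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 67 * 3 // 8  # samples x epochs // effective batch size
for step in (0, 1, 3, 14, total_steps):
    print(f"step {step:>2}: lr = {lr_at(step, total_steps):.6f}")
```

With 25 steps, warmup covers only the first 3, so almost the whole run is spent on the decaying part of the curve.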

In [None]:
from transformers import TrainingArguments, Trainer

OUTPUT_DIR = "/content/parsight_meter_model"

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,

    # Training duration
    num_train_epochs=3,

    # Batch size
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=GRAD_ACCUM,

    # Learning rate
    learning_rate=2e-4,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    weight_decay=0.01,

    # Memory optimization
    fp16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    optim="adamw_torch",       # Standard optimizer (no bitsandbytes needed)
    max_grad_norm=1.0,

    # Logging & saving
    logging_steps=5,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",

    # Other
    remove_unused_columns=False,   # Keep pixel_values, image_grid_thw
    dataloader_pin_memory=True,
    report_to="none",
)

print(f"Epochs: {training_args.num_train_epochs}")
print(f"Effective batch: {BATCH_SIZE} × {GRAD_ACCUM} = {BATCH_SIZE * GRAD_ACCUM}")
print(f"LR: {training_args.learning_rate}")
# Estimate only: samples x epochs // effective batch size
print(f"Total steps: ~{int(len(train_dataset) * training_args.num_train_epochs) // (BATCH_SIZE * GRAD_ACCUM)}")
print("✅ Training config ready!")

Epochs: 3
Effective batch: 1 × 8 = 8
LR: 0.0002
Total steps: ~25
✅ Training config ready!


---
## Cell 9: Train!

**WHY `Trainer`:** Handles gradient accumulation, mixed precision, checkpointing, evaluation, and saving. Doing this manually would take ~200 lines.

**Expected time:** A few minutes for 67 images (the recorded run below finished in 154 seconds on a T4).

**Watch the loss:** Training loss should decrease. If validation loss starts rising while training loss keeps falling, the model is overfitting.
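
The overfitting check can also be written down as a small helper. This is a hypothetical sketch, not part of the Trainer API: it returns the first epoch whose validation loss is higher than the previous epoch's. The first list holds the per-epoch validation losses recorded in this notebook's training output; the second is made up to show the failing case.

```python
def first_overfit_epoch(val_losses):
    """Return the index of the first epoch where validation loss rose, or None."""
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] > val_losses[epoch - 1]:
            return epoch
    return None


recorded = [1.772646, 0.846862, 0.655273]   # validation losses from the run below
made_up = [1.0, 0.8, 0.9]                   # illustrative only

print(first_overfit_epoch(recorded))  # None
print(first_overfit_epoch(made_up))   # 2
```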

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=collator,
)

print("Starting training...")
print("Watch the loss - should decrease over time.")
print("=" * 60)

train_result = trainer.train()

print("\n" + "=" * 60)
print("TRAINING COMPLETE!")
# Note: training_loss is the average over all training steps, not the last logged value
print(f"  Final train loss: {train_result.training_loss:.4f}")
print(f"  Training time: {train_result.metrics['train_runtime']:.0f} seconds")

eval_result = trainer.evaluate()
print(f"  Validation loss: {eval_result['eval_loss']:.4f}")
print("\n✅ Training done!")

Starting training...
Watch the loss - should decrease over time.


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Epoch,Training Loss,Validation Loss
0,3.0704,1.772646
1,1.1655,0.846862
2,0.7227,0.655273



TRAINING COMPLETE!
  Final train loss: 1.5761
  Training time: 154 seconds


  Validation loss: 0.6553

✅ Training done!


---
## Cell 10: Save Model

**WHY save adapters only:** The full model is ~4.4 GB in float16. The LoRA adapter file is ~17.5 MB (about 33 MB with the tokenizer files). We save just the adapters and load them on top of the base model later.
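
The adapter file size can be sanity-checked from the trainable-parameter count. The sketch below assumes the adapter weights are stored as float32 (4 bytes each), which is an assumption rather than something this notebook states; it is consistent with the 17.5 MB `adapter_model.safetensors` in the listing below, with the small remainder attributable to file metadata.

```python
TRAINABLE_PARAMS = 4_358_144   # printed by the LoRA cell in this notebook
BYTES_PER_PARAM = 4            # assumption: adapters saved as float32


def size_mb(num_params: int, bytes_per_param: int) -> float:
    """Return the raw tensor size in MB (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


print(f"float32 adapters: {size_mb(TRAINABLE_PARAMS, BYTES_PER_PARAM):.2f} MB")  # 17.43 MB
print(f"float16 adapters: {size_mb(TRAINABLE_PARAMS, 2):.2f} MB")                # 8.72 MB
```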

In [None]:
SAVE_DIR = "/content/parsight_meter_lora"

model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)

print(f"Saved to: {SAVE_DIR}")
total_size = 0
for f in sorted(os.listdir(SAVE_DIR)):
    fpath = os.path.join(SAVE_DIR, f)
    if os.path.isfile(fpath):
        size = os.path.getsize(fpath)
        total_size += size
        print(f"  {f:45s} {size/1e6:.1f} MB")
print(f"\n  Total: {total_size/1e6:.1f} MB")
print("✅ Model saved!")

Saved to: /content/parsight_meter_lora
  README.md                                     0.0 MB
  adapter_config.json                           0.0 MB
  adapter_model.safetensors                     17.5 MB
  added_tokens.json                             0.0 MB
  chat_template.json                            0.0 MB
  merges.txt                                    1.7 MB
  preprocessor_config.json                      0.0 MB
  special_tokens_map.json                       0.0 MB
  tokenizer.json                                11.4 MB
  tokenizer_config.json                         0.0 MB
  vocab.json                                    2.8 MB

  Total: 33.3 MB
✅ Model saved!


---
## Cell 11: Test on Validation Image

**WHY test before downloading:** Verify the model actually learned to read meters before spending time downloading and deploying.

In [None]:
# Run this in a new Colab cell
import shutil
from google.colab import files

# Zip the LoRA folder
shutil.make_archive("/content/parsight_meter_lora", 'zip', "/content/parsight_meter_lora")

# Download
files.download("/content/parsight_meter_lora.zip")

In [None]:
import json
from PIL import Image

# Load first validation example
with open(os.path.join(DATASET_DIR, 'val.jsonl')) as f:
    test_example = json.loads(f.readline())

test_image_path = os.path.join(DATASET_DIR, test_example['image'])
test_image = Image.open(test_image_path).convert('RGB')
expected = test_example['conversations'][1]['content']

print(f"Testing: {test_example['image']}")
print(f"Expected: {expected[:100]}...")

# Create inference prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": test_image},
            {"type": "text", "text": test_example['conversations'][0]['content']},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(
    text=[text], images=[test_image], return_tensors="pt", padding=True
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs, max_new_tokens=256, do_sample=False,
    )

input_len = inputs['input_ids'].shape[1]
response = processor.tokenizer.decode(
    output_ids[0][input_len:], skip_special_tokens=True
)

print(f"\nModel output:\n{response}")

try:
    parsed = json.loads(response)
    exp = json.loads(expected)
    print(f"\n{'Field':<20} {'Expected':>15} {'Got':>15} {'Match':>6}")
    print("-" * 60)
    for key in ['meter_number', 'full_reading', 'billing_reading', 'meter_type']:
        e = exp.get(key, 'N/A')
        g = parsed.get(key, 'N/A')
        m = '✅' if str(e) == str(g) else '❌'
        print(f"{key:<20} {str(e):>15} {str(g):>15} {m:>6}")
    print("\n✅ Valid JSON output!")
except json.JSONDecodeError:
    print("\n⚠️ Output is not valid JSON. May need more training.")

Testing: images/meter_012.png
Expected: {"meter_number": "SMTT7731934LES5", "bp_number": "789243576", "full_reading": 347.215, "billing_read...





Model output:
```json
{
  "meter_reading": "000347.215",
  "bp_number": "BP-789243576",
  "full_reading": "000347.215 m³",
  "billing_reading": "215",
  "meter_type": "ALPS G1.6",
  "manufacturer": "SMART",
  "black_digit": "000",
  "red_digit": "347",
  "customer_name": "KHAN-503 OKHLA"
}
```

⚠️ Output is not valid JSON. May need more training.
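
The output above fails `json.loads` because the model wrapped its answer in a Markdown code fence, not because the JSON inside is malformed. A minimal sketch of a more tolerant parser is shown here; `parse_model_json` is a hypothetical helper, and it only handles the single-fence case seen in this output. Note that a successful parse says nothing about whether the keys and values are correct.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example readable
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_json(response: str):
    """Parse JSON from a model response, unwrapping one Markdown fence if present."""
    match = FENCE_RE.search(response)
    payload = match.group(1) if match else response
    try:
        return json.loads(payload.strip())
    except json.JSONDecodeError:
        return None  # caller decides how to report the failure


fenced = FENCE + 'json\n{"billing_reading": "215"}\n' + FENCE
print(parse_model_json(fenced))              # {'billing_reading': '215'}
print(parse_model_json('{"meter_type": 1}')) # {'meter_type': 1}
print(parse_model_json("no json here"))      # None
```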


---
## Cell 12: Download Model

**WHY:** Colab's disk is temporary. Download the LoRA weights (~33 MB with the tokenizer files) before the session ends.

In [None]:
import shutil
from google.colab import files

shutil.make_archive("/content/parsight_meter_lora", 'zip', SAVE_DIR)

zip_file = "/content/parsight_meter_lora.zip"
print(f"Model ZIP: {os.path.getsize(zip_file)/1e6:.1f} MB")
print("Downloading...")

files.download(zip_file)
print("\n✅ Download started!")

---
## Cell 13: Local Inference Script (save as meter_inference.py)

**Usage on your Mac:**
```bash
pip install transformers peft torch qwen-vl-utils Pillow
unzip parsight_meter_lora.zip -d ./parsight_meter_lora
python meter_inference.py meter_photo.jpg
```

In [None]:
INFERENCE_CODE = '''#!/usr/bin/env python3
"""
Parsight Meter Reading - Local Inference
Usage: python meter_inference.py <image_path>
"""
import sys, json, torch
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2-VL-2B-Instruct"
LORA_PATH = "./parsight_meter_lora"
PROMPT = (
    "Extract the meter reading from this gas meter image. "
    "Return a JSON object with meter_number, bp_number, full_reading, "
    "billing_reading, meter_type, manufacturer, black_digits, red_digits, "
    "and customer_name."
)

def load_model():
    print("Loading model...")
    if torch.cuda.is_available():
        device, dtype = "cuda", torch.float16
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        device, dtype = "mps", torch.float16
    else:
        device, dtype = "cpu", torch.float32
    print(f"Device: {device}")

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        BASE_MODEL, torch_dtype=dtype,
        device_map="auto" if device != "cpu" else None,
    )
    model = PeftModel.from_pretrained(model, LORA_PATH)
    model.eval()
    processor = AutoProcessor.from_pretrained(
        LORA_PATH, min_pixels=256*28*28, max_pixels=512*28*28,
    )
    return model, processor

def extract_reading(model, processor, image_path):
    image = Image.open(image_path).convert("RGB")
    messages = [{"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": PROMPT},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True).to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    resp = processor.tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    try:
        return json.loads(resp)
    except json.JSONDecodeError:
        # Keep the raw text so a failed parse can be inspected
        return {"raw": resp, "error": "JSON parse failed"}

if __name__ == "__main__":
    if len(sys.argv) < 2: print("Usage: python meter_inference.py <image>"); sys.exit(1)
    model, proc = load_model()
    print(json.dumps(extract_reading(model, proc, sys.argv[1]), indent=2))
'''

with open('/content/meter_inference.py', 'w') as f:
    f.write(INFERENCE_CODE)

files.download('/content/meter_inference.py')
print("✅ Inference script ready!")

In [None]:
import json
import torch
from PIL import Image
from google.colab import files

# Upload a test image
print("Upload a meter image:")
uploaded = files.upload()
image_path = list(uploaded.keys())[0]
print(f"\nTesting: {image_path}")

# Load image
image = Image.open(image_path).convert('RGB')

# Create prompt
PROMPT = (
    "Extract the meter reading from this gas meter image. "
    "Return a JSON object with meter_number, bp_number, full_reading, "
    "billing_reading, meter_type, manufacturer, black_digits, red_digits, "
    "and customer_name."
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": PROMPT},
        ],
    }
]

# Process
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True).to(model.device)

# Generate
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode
input_len = inputs['input_ids'].shape[1]
response = processor.tokenizer.decode(output_ids[0][input_len:], skip_special_tokens=True)

print(f"\nModel output:\n{response}")

# Try parsing JSON
try:
    parsed = json.loads(response)
    print("\n✅ Valid JSON!")
    for k, v in parsed.items():
        print(f"  {k}: {v}")
except json.JSONDecodeError:
    print("\n⚠️ Raw text output (not valid JSON)")

Upload a meter image:


Saving WhatsApp Image 2026-02-18 at 17.10.33.jpeg to WhatsApp Image 2026-02-18 at 17.10.33.jpeg

Testing: WhatsApp Image 2026-02-18 at 17.10.33.jpeg





Model output:
{
  "customer_name": "Indraprastha Gas Limited",
  "utility_reading": "396.6",
  "utility_reading_number": "17619818",
  "utility_reading_type": "G1.6",
  "utility_reading_blown": "396.6",
  "utility_reading_reading": "396.6",
  "utility_reading_blown_reading": "396.6",
  "utility_reading_blown_type": "G1.6",
  "utility_reading_blown_number": "17619818",
  "utility_reading_reading_blown": "396.6",
  "utility_reading_reading_blown_type": "G1.6",
  "utility_reading_reading_blown_number": "17619818"
}

✅ Valid JSON!
  customer_name: Indraprastha Gas Limited
  utility_reading: 396.6
  utility_reading_number: 17619818
  utility_reading_type: G1.6
  utility_reading_blown: 396.6
  utility_reading_reading: 396.6
  utility_reading_blown_reading: 396.6
  utility_reading_blown_type: G1.6
  utility_reading_blown_number: 17619818
  utility_reading_reading_blown: 396.6
  utility_reading_reading_blown_type: G1.6
  utility_reading_reading_blown_number: 17619818


---
## Done! 🎉

### You now have:
1. `parsight_meter_lora.zip`: Fine-tuned LoRA weights and tokenizer files (~33 MB before zipping)
2. `meter_inference.py`: Local inference script

### Next steps:
1. Test locally: `python meter_inference.py test_image.jpg`
2. If accuracy ≥ 90%, replace the GPT-4o endpoint in the Parsight API
3. If accuracy < 90%, collect more images and retrain
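
The 90% threshold needs a concrete accuracy definition. One option, sketched below, is per-field exact match over the validation set, using the same four fields and the same string comparison as the test cell earlier in this notebook. The function name and the toy rows are hypothetical; a prediction of `None` stands for a response that could not be parsed as JSON.

```python
FIELDS = ["meter_number", "full_reading", "billing_reading", "meter_type"]


def field_accuracy(expected_rows, predicted_rows, fields=FIELDS):
    """Fraction of rows where each field matches exactly (compared as strings)."""
    n = len(expected_rows)
    if n == 0:
        return {}
    correct = {f: 0 for f in fields}
    for exp, got in zip(expected_rows, predicted_rows):
        got = got or {}  # unparseable output counts as all fields wrong
        for f in fields:
            if f in exp and str(exp[f]) == str(got.get(f)):
                correct[f] += 1
    return {f: correct[f] / n for f in fields}


# Toy data for illustration only
expected = [
    {"meter_number": "A1", "full_reading": 347.215, "billing_reading": 347, "meter_type": "G1.6"},
    {"meter_number": "B2", "full_reading": 12.5, "billing_reading": 12, "meter_type": "G1.6"},
]
predicted = [
    {"meter_number": "A1", "full_reading": "347.215", "billing_reading": 347, "meter_type": "G1.6"},
    None,
]
print(field_accuracy(expected, predicted))
```

With only 8 validation images, each image moves a field's accuracy by 12.5 percentage points, so a larger held-out set gives a more trustworthy number.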

### Cost savings:
- GPT-4o: ~₹8 per reading
- This model: ~₹0.08 per reading (100x cheaper)

In [None]:
# Resume training for 3 more epochs with LOWER learning rate
# WHY lower LR: Model already learned the basics. Now we fine-tune gently.
# 1e-4 instead of 2e-4 = half the step size = less risk of overfitting.

from transformers import TrainingArguments, Trainer

training_args_v2 = TrainingArguments(
    output_dir="/content/parsight_meter_model_v2",

    num_train_epochs=3,

    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=GRAD_ACCUM,

    learning_rate=1e-4,            # HALF of before (was 2e-4)
    warmup_ratio=0.05,             # Less warmup (model is already warmed up)
    lr_scheduler_type="cosine",
    weight_decay=0.01,

    fp16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    optim="adamw_torch",
    max_grad_norm=1.0,

    logging_steps=5,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",

    remove_unused_columns=False,
    dataloader_pin_memory=True,
    report_to="none",
)

trainer_v2 = Trainer(
    model=model,
    args=training_args_v2,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=collator,
)

print("Resuming training (3 more epochs, LR=1e-4)...")
print("Watch eval_loss - if it INCREASES, we're overfitting.")
print("=" * 60)

result = trainer_v2.train()

print("\n" + "=" * 60)
print(f"Train loss: {result.training_loss:.4f}")
eval_r = trainer_v2.evaluate()
print(f"Val loss:   {eval_r['eval_loss']:.4f}")

# Save updated model
model.save_pretrained("/content/parsight_meter_lora_v2")
processor.save_pretrained("/content/parsight_meter_lora_v2")
print("\n✅ Saved to /content/parsight_meter_lora_v2")

Resuming training (3 more epochs, LR=1e-4)...
Watch eval_loss - if it INCREASES, we're overfitting.


Epoch,Training Loss,Validation Loss
0,0.5726,0.431517
1,0.3592,0.352497
2,0.2605,0.326005



Train loss: 0.3824


Val loss:   0.3260

✅ Saved to /content/parsight_meter_lora_v2


In [None]:
import shutil
from google.colab import files

# Zip v2 model
shutil.make_archive("/content/parsight_meter_lora_v2", 'zip', "/content/parsight_meter_lora_v2")
files.download("/content/parsight_meter_lora_v2.zip")

# Also download the inference script. It exists only if Cell 13 ran in this
# session; otherwise files.download raises FileNotFoundError.
import os

if os.path.exists("/content/meter_inference.py"):
    files.download("/content/meter_inference.py")
else:
    print("meter_inference.py not found: re-run Cell 13 to regenerate it.")

In [None]:
import json
import torch
from PIL import Image
from google.colab import files

print("Upload a meter image:")
uploaded = files.upload()
image_path = list(uploaded.keys())[0]
print(f"\nTesting: {image_path}")

image = Image.open(image_path).convert('RGB')

PROMPT = (
    "Extract the meter reading from this gas meter image. "
    "Return a JSON object with meter_number, bp_number, full_reading, "
    "billing_reading, meter_type, manufacturer, black_digits, red_digits, "
    "and customer_name."
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)

input_len = inputs['input_ids'].shape[1]
response = processor.tokenizer.decode(output_ids[0][input_len:], skip_special_tokens=True)

print(f"\nModel output:\n{response}")

try:
    json_str = response.strip()
    # Handle markdown code blocks: take the fenced part and drop a leading "json" tag only
    if "```" in json_str:
        json_str = json_str.split("```")[1].removeprefix("json").strip()
    parsed = json.loads(json_str)
    print("\n✅ Valid JSON!")
    for k, v in parsed.items():
        print(f"  {k}: {v}")
except json.JSONDecodeError:
    print("\n⚠️ Raw text output")

Upload a meter image:


Saving WhatsApp Image 2026-02-18 at 17.10.33.jpeg to WhatsApp Image 2026-02-18 at 17.10.33 (1).jpeg

Testing: WhatsApp Image 2026-02-18 at 17.10.33 (1).jpeg

Model output:
{
  "customer_name": "Indraprastha Gas Limited",
  "utility_reading": "396.6",
  "utility_reading_number": "17619818",
  "utility_reading_type": "G1.6",
  "utility_reading_blown": "396.6",
  "utility_reading_reading": "396.6",
  "utility_reading_blown_reading": "396.6",
  "utility_reading_blown_type": "G1.6",
  "utility_reading_blown_number": "17619818",
  "utility_reading_reading_blown": "396.6",
  "utility_reading_reading_blown_type": "G1.6",
  "utility_reading_reading_blown_number": "17619818"
}

✅ Valid JSON!
  customer_name: Indraprastha Gas Limited
  utility_reading: 396.6
  utility_reading_number: 17619818
  utility_reading_type: G1.6
  utility_reading_blown: 396.6
  utility_reading_reading: 396.6
  utility_reading_blown_reading: 396.6
  utility_reading_blown_type: G1.6
  utility_reading_blown_number: 176198