MeterMind

ERNIE AI Developer Challenge 2025 Submission

Fine-tuning ERNIE-4.5-VL for industrial gauge reading with 100x inference speedup.

Fine-tuning Vision-Language Models for Precise Analog Gauge Reading

MeterMind is a complete pipeline for training AI models to accurately read analog gauges, meters, and dials. It includes synthetic data generation, model fine-tuning on Modal, and comprehensive evaluation tools.

Key Results

Fine-tuning ERNIE-4.5-VL-28B on 570 synthetic gauge images yields substantial improvements over the zero-shot baseline:

| Metric | Baseline | 1-Epoch Fine-tuned | Improvement |
|---|---|---|---|
| MAE | 2.82 | 0.60 | 79% better |
| RMSE | 5.18 | 0.93 | 82% better |
| Exact Match | 16.7% | 53.3% | +36.6 pp |
| Within ±1 | 46.7% | 86.7% | +40.0 pp |
| Within ±2 | 56.7% | 100.0% | +43.3 pp |
| Within ±5 | 90.0% | 100.0% | +10.0 pp |

Performance by Gauge Type (MAE)

| Gauge Type | Baseline | Fine-tuned | Improvement |
|---|---|---|---|
| Bimetal Thermometer | 5.65 | 1.00 | 82% better |
| Standard Pressure | 1.60 | 0.50 | 69% better |
| Glycerin-filled | 1.20 | 0.30 | 75% better |
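The absolute MAE figures sit on different scale ranges (the bimetal thermometer reads 0-250 °F, the pressure gauges 0-100 PSI), so normalizing by range puts the three types on a common footing:

```python
# MAE as a fraction of each gauge's full scale range, using the
# fine-tuned numbers from the table above.
scale_range = {"bimetal": 250.0, "standard": 100.0, "glycerin": 100.0}
mae = {"bimetal": 1.00, "standard": 0.50, "glycerin": 0.30}

relative_mae = {k: mae[k] / scale_range[k] for k in mae}
# bimetal 0.4%, standard 0.5%, glycerin 0.3% of full scale
```

By this measure the fine-tuned model is roughly equally accurate across all three gauge types.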

Features

  • Synthetic Data Generation: Procedural gauge images with realistic effects

    • 3D perspective transforms
    • Industrial backgrounds (metal, concrete, machinery)
    • Damage effects (scratches, dust, rust)
    • HDRI-style lighting variations
  • Cloud Training: Modal-based fine-tuning with LoRA

    • NVIDIA B200 GPU support
    • Automatic experiment tracking
    • Resume capability
  • Comprehensive Evaluation: MSE, MAE, RMSE, and accuracy metrics
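At the heart of the synthetic generator is the mapping from a ground-truth reading to a needle angle. A minimal sketch of that geometry, assuming a typical 270-degree dial; the start angle, sweep, and function names here are illustrative, not the exact constants in `gauge_renderer.py`:

```python
import math

def value_to_needle_angle(value, vmin=0.0, vmax=100.0,
                          start_deg=225.0, sweep_deg=270.0):
    """Map a reading to a needle angle in degrees.

    Assumes the scale starts at 225 deg (lower-left) and sweeps
    270 deg clockwise, as on a common analog dial.
    """
    frac = (value - vmin) / (vmax - vmin)
    return start_deg - frac * sweep_deg

def needle_tip(cx, cy, radius, angle_deg):
    """Convert the angle to pixel coordinates (image y grows downward)."""
    rad = math.radians(angle_deg)
    return cx + radius * math.cos(rad), cy - radius * math.sin(rad)
```

Because the renderer knows the exact value it drew, every image comes with a perfect label for free, which is what makes fully synthetic fine-tuning data practical here.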

Quick Start

1. Installation

```bash
cd MeterMind
pip install -r requirements.txt
```

2. Generate Synthetic Data

```bash
# Generate 600 images (200 per gauge type)
python src/generation/generate.py --output data/synthetic --per-type 200

# Prepare training splits
python src/training/prepare_data.py
```

3. Upload to Modal & Train

```bash
# Upload data to Modal volume
modal run src/training/modal_finetune.py --upload

# Run training (1 epoch)
modal run src/training/modal_finetune.py --max-steps 285 --name my_experiment

# Run full training (3 epochs)
modal run src/training/modal_finetune.py --max-steps 855 --name full_training
```

4. Evaluate

```bash
# List experiments
modal run src/training/modal_finetune.py --list-exp

# Evaluate specific experiment
modal run src/training/modal_finetune.py --eval-only my_experiment
```

Web App

A simple Gradio interface for testing the deployed model.

1. Deploy Your Own Modal Endpoint

```bash
# Create a Modal secret for your API key
modal secret create metermind-api-key API_KEY="your-secret-key"

# Deploy the endpoint
modal deploy src/deployment/deploy_simple.py
```

Note the endpoint URL printed in the output (e.g., `https://your-name--metermind-simple-gaugereader-predict.modal.run`).

2. Run the Gradio App

```bash
# Set environment variables
export METERMIND_API_URL="https://your-endpoint.modal.run"
export METERMIND_API_KEY="your-secret-key"

# Run the app
python app.py
```

Open http://localhost:7860, upload a gauge image, and get the reading.

Direct API Usage

```python
import requests
import base64

with open("gauge.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://your-endpoint.modal.run",
    json={"image": image_b64, "api_key": "your-api-key"}
)
print(response.json())  # {"reading": 70.0, "raw": "70"}
```
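The endpoint already returns the parsed value in `reading`, but when experimenting with your own prompts the `raw` text can contain extra words. A small hedged helper (not part of the repository) for pulling a number out of such replies:

```python
import re

def parse_reading(raw):
    """Extract the first numeric token from a model reply such as
    '70' or 'about 70 PSI'; returns None when no number is present."""
    m = re.search(r"-?\d+(?:\.\d+)?", raw)
    return float(m.group()) if m else None
```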

Deployment with vLLM

For production use, we provide a vLLM-based deployment that achieves ~100x faster inference.

Performance Comparison

| Deployment | Warm Latency | Cold Start |
|---|---|---|
| deploy_simple.py (Unsloth) | ~110 s | ~2 min |
| deploy_vllm.py (vLLM) | ~0.85 s | ~2.5 min |

Key Findings

  • vLLM nightly required: ERNIE-4.5-VL support requires vLLM nightly build (--extra-index-url https://wheels.vllm.ai/nightly)
  • Model merging: LoRA adapter must be merged with base model and pushed to HuggingFace (vLLM doesn't support LoRA adapters directly for this model)
  • Processor configs: Must push preprocessor_config.json alongside merged weights for vLLM's image processor
  • Prompt format: Uses OpenAI chat format with base64 data URI for images
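The last bullet can be sketched concretely. Assuming an OpenAI-compatible chat completions server, a request with a base64 data URI image looks like this; the model id and prompt text are illustrative placeholders, not the exact strings in `deploy_vllm.py`:

```python
import base64

def build_chat_request(image_bytes,
                       prompt="Read the gauge. Answer with a number."):
    """Build an OpenAI-style chat payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": "your-hf-username/metermind-merged",  # hypothetical merged-model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "max_tokens": 16,
        "temperature": 0.0,
    }
```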

Setup

```bash
# 1. Create secrets
modal secret create huggingface-secret HF_TOKEN=hf_your_token
modal secret create metermind-api-key API_KEY=your-secret-key

# 2. Merge LoRA and push to HuggingFace (one-time)
modal run src/deployment/deploy_vllm.py::merge_model

# 3. Deploy vLLM server
modal deploy src/deployment/deploy_vllm.py
```

Usage

```python
import requests
import base64

with open("gauge.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://your-vllm-endpoint.modal.run",
    json={"image": image_b64, "api_key": "your-api-key"}
)
print(response.json())  # {"reading": 70.0, "raw": "70"}
```

Project Structure

```text
MeterMind/
├── README.md                       # This file
├── requirements.txt                # Python dependencies
├── .gitignore                      # Git ignore patterns
│
├── configs/
│   └── default.yaml                # Training hyperparameters
│
├── data/
│   └── synthetic/
│       ├── images/                 # Generated gauge images (512x512)
│       ├── annotations.json        # Raw annotations with metadata
│       ├── train.json              # Training set (570 samples)
│       └── val.json                # Validation set (30 samples)
│
├── src/
│   ├── generation/                 # Synthetic data generation
│   │   ├── gauge_renderer.py       # Base gauge rendering classes
│   │   ├── backgrounds.py          # Procedural backgrounds
│   │   ├── effects_3d.py           # Perspective & glass effects
│   │   ├── damage_effects.py       # Wear & damage simulation
│   │   ├── lighting_effects.py     # HDRI-style lighting
│   │   ├── augmentations.py        # Data augmentations
│   │   └── generate.py             # Main generation script
│   │
│   ├── training/                   # Model training
│   │   ├── prepare_data.py         # Convert to training format
│   │   └── modal_finetune.py       # Modal cloud training script
│   │
│   ├── evaluation/                 # Evaluation tools
│   │   └── baseline_eval.py        # Baseline API evaluation
│   │
│   └── deployment/                 # Modal deployments
│       ├── deploy_simple.py        # Unsloth-based (~110s/inference)
│       └── deploy_vllm.py          # vLLM-based (~0.85s/inference)
│
├── scripts/                        # Shell utility scripts
│   ├── generate_data.sh
│   ├── upload_to_modal.sh
│   └── train.sh
│
├── results/                        # Evaluation results
│   ├── baseline/
│   └── experiments/
│
└── docs/                           # Documentation
    ├── BASELINE.md                 # Baseline evaluation report
    ├── TRAINING.md                 # Training guide
    └── RESULTS.md                  # Experiment results
```

Dataset

Gauge Types

| Type | Scale Range | Unit | Count |
|---|---|---|---|
| Standard Pressure | 0-100 | PSI | 200 |
| Glycerin-filled | 0-100 | PSI | 200 |
| Bimetal Thermometer | 0-250 | °F | 200 |

Augmentations Applied

  • 3D Effects: Perspective transforms (54% of images)
  • Backgrounds: Metal panels, concrete, machinery, brick
  • Lighting: Studio, harsh sun, low-light, industrial
  • Damage: Scratches, dust, rust (69% with some damage)

Model Configuration

Base Model

  • Model: ERNIE-4.5-VL-28B-A3B (via Unsloth)
  • Architecture: Vision-Language Transformer

LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, fc1, fc2 |
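In plain PEFT terms the table corresponds to a config like the following. This is a hedged equivalent, not the repository's code: `modal_finetune.py` drives training through Unsloth, which wraps PEFT, so the actual call site differs.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                # LoRA rank
    lora_alpha=16,      # effective scaling alpha / r = 2.0
    lora_dropout=0.0,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "fc1", "fc2",                            # per the table above
    ],
)
```

With rank 8 over these modules, only a small fraction of the 28B parameters is trained, which is what makes single-GPU fine-tuning on Modal feasible.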

Training Hyperparameters

| Parameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Batch Size | 1 |
| Gradient Accumulation | 2 |
| Effective Batch Size | 2 |
| Warmup Steps | 10 |
| Weight Decay | 0.001 |
| LR Scheduler | Linear |
| Optimizer | AdamW 8-bit |
| Max Seq Length | 1024 |
| Precision | BFloat16 |
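Expressed as Hugging Face `TrainingArguments`, the table corresponds roughly to the sketch below (hedged: the actual run goes through Unsloth's trainer inside `modal_finetune.py`, and the max sequence length is set on the trainer/tokenizer side rather than here):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,   # effective batch size 2
    warmup_steps=10,
    weight_decay=0.001,
    lr_scheduler_type="linear",
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
    bf16=True,
    max_steps=285,                   # 1 epoch over 570 samples
)
```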

Experiments

| Experiment | Steps | Epochs | MAE | Within ±2 | Within ±5 |
|---|---|---|---|---|---|
| Baseline (no fine-tuning) | - | - | 2.82 | 56.7% | 90.0% |
| test_50steps | 50 | 0.18 | 1.50 | 73.3% | 93.3% |
| synthetic_1epoch | 285 | 1 | 0.60 | 100.0% | 100.0% |
| synthetic_3epoch | 855 | 3 | TBD | TBD | TBD |
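The step counts follow directly from the data and batch settings: with 570 training samples and an effective batch size of 2, one epoch is 285 optimizer steps.

```python
# Sanity-check the step counts used in the experiments table.
train_samples = 570
effective_batch = 2  # batch size 1 x gradient accumulation 2

steps_per_epoch = train_samples // effective_batch   # 285
assert steps_per_epoch == 285
assert 3 * steps_per_epoch == 855                    # the 3-epoch run
assert round(50 / steps_per_epoch, 2) == 0.18        # test_50steps
```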

Requirements

  • Python 3.11+
  • Modal account (for cloud training)
  • CUDA-compatible GPU (for local inference)

Key Dependencies

  • torch>=2.0.0
  • transformers>=4.56.0
  • unsloth
  • modal
  • pillow
  • opencv-python
  • tqdm

Citation

If you use MeterMind in your research, please cite:

```bibtex
@software{metermind2024,
  title={MeterMind: Fine-tuning Vision-Language Models for Analog Gauge Reading},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/MeterMind}
}
```

License

MIT License - see LICENSE file for details.

Model Weights

Fine-tuned LoRA adapter available on HuggingFace:

luliuzee/metermind-ernie-gauge-lora
