# MeterMind

*Fine-tuning Vision-Language Models for Precise Analog Gauge Reading*

ERNIE AI Developer Challenge 2025 submission: fine-tuning ERNIE-4.5-VL for industrial gauge reading with a ~100x inference speedup.

MeterMind is a complete pipeline for training AI models to accurately read analog gauges, meters, and dials. It includes synthetic data generation, model fine-tuning on Modal, and comprehensive evaluation tools.
## Results

Fine-tuning ERNIE-4.5-VL-28B on 570 synthetic gauge images yields substantial improvements:
| Metric | Baseline | 1-Epoch Fine-tuned | Improvement |
|---|---|---|---|
| MAE | 2.82 | 0.60 | 79% better |
| RMSE | 5.18 | 0.93 | 82% better |
| Exact Match | 16.7% | 53.3% | +36.6 pp |
| Within ±1 | 46.7% | 86.7% | +40.0 pp |
| Within ±2 | 56.7% | 100.0% | +43.3 pp |
| Within ±5 | 90.0% | 100.0% | +10.0 pp |
MAE by gauge type:

| Gauge Type | Baseline MAE | Fine-tuned MAE | Improvement |
|---|---|---|---|
| Bimetal Thermometer | 5.65 | 1.00 | 82% better |
| Standard Pressure | 1.60 | 0.50 | 69% better |
| Glycerin-filled | 1.20 | 0.30 | 75% better |
## Features

- **Synthetic Data Generation**: procedural gauge images with realistic effects
  - 3D perspective transforms
  - Industrial backgrounds (metal, concrete, machinery)
  - Damage effects (scratches, dust, rust)
  - HDRI-style lighting variations
- **Cloud Training**: Modal-based fine-tuning with LoRA
  - NVIDIA B200 GPU support
  - Automatic experiment tracking
  - Resume capability
- **Comprehensive Evaluation**: MSE, MAE, RMSE, and accuracy metrics
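The evaluation metrics above are simple to state precisely. A minimal sketch (the `gauge_metrics` helper is illustrative, not the project's actual evaluator in `src/evaluation/`):

```python
import numpy as np

def gauge_metrics(pred, true, tolerances=(1, 2, 5)):
    """Compute MAE, RMSE, exact-match rate, and 'within +/-k' accuracy
    for predicted vs. ground-truth gauge readings."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    err = np.abs(pred - true)
    out = {
        "mae": float(err.mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "exact": float((err == 0).mean()),
    }
    for k in tolerances:
        out[f"within_{k}"] = float((err <= k).mean())
    return out

m = gauge_metrics([70, 52, 31], [70, 50, 30])
```

For the three toy predictions above, the absolute errors are 0, 2, and 1, giving an MAE of 1.0 and a 100% within-±2 rate.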
## Quick Start

```bash
cd MeterMind
pip install -r requirements.txt
```

```bash
# Generate 600 images (200 per gauge type)
python src/generation/generate.py --output data/synthetic --per-type 200

# Prepare training splits
python src/training/prepare_data.py
```

```bash
# Upload data to Modal volume
modal run src/training/modal_finetune.py --upload

# Run training (1 epoch)
modal run src/training/modal_finetune.py --max-steps 285 --name my_experiment

# Run full training (3 epochs)
modal run src/training/modal_finetune.py --max-steps 855 --name full_training
```

```bash
# List experiments
modal run src/training/modal_finetune.py --list-exp

# Evaluate a specific experiment
modal run src/training/modal_finetune.py --eval-only my_experiment
```

A simple Gradio interface for testing the deployed model:

```bash
# Create a Modal secret for your API key
modal secret create metermind-api-key API_KEY="your-secret-key"

# Deploy the endpoint
modal deploy src/deployment/deploy_simple.py
```

Note the endpoint URL from the output (e.g., `https://your-name--metermind-simple-gaugereader-predict.modal.run`).

```bash
# Set environment variables
export METERMIND_API_URL="https://your-endpoint.modal.run"
export METERMIND_API_KEY="your-secret-key"

# Run the app
python app.py
```

Open http://localhost:7860, upload a gauge image, and get the reading.
```python
import requests
import base64

with open("gauge.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://your-endpoint.modal.run",
    json={"image": image_b64, "api_key": "your-api-key"},
)
print(response.json())  # {"reading": 70.0, "raw": "70"}
```

For production use, we provide a vLLM-based deployment that achieves roughly 100x faster inference.
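Both deployments return the same JSON shape, where `raw` is the model's text output and `reading` is the parsed number. A hedged sketch of the kind of parsing the endpoint might do (`parse_reading` is an illustrative name, not the deployed code):

```python
import re

def parse_reading(raw: str):
    """Pull the first numeric token out of the model's raw text output.

    Returns None when no number is present, so callers can fall back
    gracefully instead of crashing on an unexpected generation.
    """
    m = re.search(r"-?\d+(?:\.\d+)?", raw)
    return float(m.group()) if m else None
```

For example, `parse_reading("70")` returns `70.0` and `parse_reading("The gauge reads 42.5 PSI")` returns `42.5`.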
| Deployment | Warm Latency | Cold Start |
|---|---|---|
| `deploy_simple.py` (Unsloth) | ~110s | ~2 min |
| `deploy_vllm.py` (vLLM) | ~0.85s | ~2.5 min |
Key implementation notes:

- **vLLM nightly required**: ERNIE-4.5-VL support requires a vLLM nightly build (`--extra-index-url https://wheels.vllm.ai/nightly`)
- **Model merging**: the LoRA adapter must be merged with the base model and pushed to HuggingFace (vLLM doesn't support LoRA adapters directly for this model)
- **Processor configs**: `preprocessor_config.json` must be pushed alongside the merged weights for vLLM's image processor
- **Prompt format**: uses the OpenAI chat format with a base64 data URI for images
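The last point (OpenAI chat format with a base64 data URI) can be sketched as plain request-building code; `build_chat_payload`, the question text, and the field values are illustrative assumptions, not the exact request the deployment sends:

```python
import base64

def build_chat_payload(image_bytes, model, question="What is the gauge reading?"):
    """Build an OpenAI-style chat-completions payload with the image
    inlined as a base64 data URI (the format vLLM's server accepts)."""
    data_uri = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_uri}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 16,
    }
```

The payload can then be POSTed to the server's `/v1/chat/completions` route with any HTTP client.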
```bash
# 1. Create secrets
modal secret create huggingface-secret HF_TOKEN=hf_your_token
modal secret create metermind-api-key API_KEY=your-secret-key

# 2. Merge LoRA and push to HuggingFace (one-time)
modal run src/deployment/deploy_vllm.py::merge_model

# 3. Deploy vLLM server
modal deploy src/deployment/deploy_vllm.py
```

```python
import requests
import base64

with open("gauge.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://your-vllm-endpoint.modal.run",
    json={"image": image_b64, "api_key": "your-api-key"},
)
print(response.json())  # {"reading": 70.0, "raw": "70"}
```

- LoRA Adapter: luliuzee/metermind-ernie-gauge-lora
- Merged Model (for vLLM): luliuzee/metermind-ernie-merged
## Project Structure

```
MeterMind/
├── README.md                   # This file
├── requirements.txt            # Python dependencies
├── .gitignore                  # Git ignore patterns
│
├── configs/
│   └── default.yaml            # Training hyperparameters
│
├── data/
│   └── synthetic/
│       ├── images/             # Generated gauge images (512x512)
│       ├── annotations.json    # Raw annotations with metadata
│       ├── train.json          # Training set (570 samples)
│       └── val.json            # Validation set (30 samples)
│
├── src/
│   ├── generation/             # Synthetic data generation
│   │   ├── gauge_renderer.py   # Base gauge rendering classes
│   │   ├── backgrounds.py      # Procedural backgrounds
│   │   ├── effects_3d.py       # Perspective & glass effects
│   │   ├── damage_effects.py   # Wear & damage simulation
│   │   ├── lighting_effects.py # HDRI-style lighting
│   │   ├── augmentations.py    # Data augmentations
│   │   └── generate.py         # Main generation script
│   │
│   ├── training/               # Model training
│   │   ├── prepare_data.py     # Convert to training format
│   │   └── modal_finetune.py   # Modal cloud training script
│   │
│   ├── evaluation/             # Evaluation tools
│   │   └── baseline_eval.py    # Baseline API evaluation
│   │
│   └── deployment/             # Modal deployments
│       ├── deploy_simple.py    # Unsloth-based (~110s/inference)
│       └── deploy_vllm.py      # vLLM-based (~0.85s/inference)
│
├── scripts/                    # Shell utility scripts
│   ├── generate_data.sh
│   ├── upload_to_modal.sh
│   └── train.sh
│
├── results/                    # Evaluation results
│   ├── baseline/
│   └── experiments/
│
└── docs/                       # Documentation
    ├── BASELINE.md             # Baseline evaluation report
    ├── TRAINING.md             # Training guide
    └── RESULTS.md              # Experiment results
```
## Synthetic Dataset

| Type | Scale Range | Unit | Count |
|---|---|---|---|
| Standard Pressure | 0-100 | PSI | 200 |
| Glycerin-filled | 0-100 | PSI | 200 |
| Bimetal Thermometer | 0-250 | °F | 200 |
- 3D Effects: Perspective transforms (54% of images)
- Backgrounds: Metal panels, concrete, machinery, brick
- Lighting: Studio, harsh sun, low-light, industrial
- Damage: Scratches, dust, rust (69% with some damage)
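The perspective-transform effect above boils down to a 3x3 homography that maps the image's four corners to four jittered target corners. A numpy-only sketch of that math (the project's `effects_3d.py` presumably uses OpenCV's equivalent, `cv2.getPerspectiveTransform`; the function names here are illustrative):

```python
import numpy as np

def homography(src, dst):
    """Solve for H (3x3, with H[2,2] fixed to 1) such that each dst point
    equals H applied to the matching src point in homogeneous coordinates."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Apply homography H to a 2D point (divide out the projective scale)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Jittered corners of a 512x512 image, mimicking an off-axis camera view.
src = [(0, 0), (512, 0), (512, 512), (0, 512)]
dst = [(20, 10), (500, 30), (480, 512), (0, 490)]
H = homography(src, dst)
```

Warping the image itself then amounts to resampling each output pixel through the inverse of `H`, which is what `cv2.warpPerspective` does.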
## Model & Training

- Model: ERNIE-4.5-VL-28B-A3B (via Unsloth)
- Architecture: Vision-Language Transformer
### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, fc1, fc2 |
### Training Hyperparameters

| Parameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Batch Size | 1 |
| Gradient Accumulation | 2 |
| Effective Batch Size | 2 |
| Warmup Steps | 10 |
| Weight Decay | 0.001 |
| LR Scheduler | Linear |
| Optimizer | AdamW 8-bit |
| Max Seq Length | 1024 |
| Precision | BFloat16 |
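For reference, the two tables above can be expressed as plain config dicts, along with the steps-per-epoch arithmetic behind the `--max-steps` values. This is a sketch using PEFT-style field names; the actual training script configures Unsloth/TRL and may name things differently:

```python
# LoRA hyperparameters (mirrors the table above).
lora = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "fc1", "fc2",
    ],
}

# Training hyperparameters (mirrors the table above).
train = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 2,
    "warmup_steps": 10,
    "weight_decay": 0.001,
    "lr_scheduler_type": "linear",
    "optim": "adamw_8bit",
    "max_seq_length": 1024,
    "bf16": True,
}

# With 570 training samples and an effective batch of 1 * 2 = 2,
# one epoch is 285 optimizer steps and three epochs is 855.
effective_batch = (train["per_device_train_batch_size"]
                   * train["gradient_accumulation_steps"])
steps_per_epoch = 570 // effective_batch
```

This is why the quick-start commands use `--max-steps 285` for one epoch and `--max-steps 855` for three.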
### Experiment Results

| Experiment | Steps | Epochs | MAE | Within ±2 | Within ±5 |
|---|---|---|---|---|---|
| Baseline (no fine-tuning) | - | - | 2.82 | 56.7% | 90.0% |
| test_50steps | 50 | 0.18 | 1.50 | 73.3% | 93.3% |
| synthetic_1epoch | 285 | 1 | 0.60 | 100.0% | 100.0% |
| synthetic_3epoch | 855 | 3 | TBD | TBD | TBD |
## Requirements

- Python 3.11+
- Modal account (for cloud training)
- CUDA-compatible GPU (for local inference)

```
torch>=2.0.0
transformers>=4.56.0
unsloth
modal
pillow
opencv-python
tqdm
```
## Citation

If you use MeterMind in your research, please cite:

```bibtex
@software{metermind2024,
  title={MeterMind: Fine-tuning Vision-Language Models for Analog Gauge Reading},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/MeterMind}
}
```

## License

MIT License - see LICENSE file for details.
Fine-tuned LoRA adapter available on HuggingFace: `luliuzee/metermind-ernie-gauge-lora`
## Acknowledgments

- ERNIE-4.5-VL by Baidu
- Unsloth for efficient fine-tuning
- Modal for cloud GPU infrastructure