# MeterMind

*Fine-tuning Vision-Language Models for Precise Analog Gauge Reading*

ERNIE AI Developer Challenge 2025 submission: fine-tuning ERNIE-4.5-VL for industrial gauge reading with a ~100x inference speedup.

MeterMind is a complete pipeline for training AI models to accurately read analog gauges, meters, and dials. It includes synthetic data generation, model fine-tuning on Modal, and comprehensive evaluation tools.
## Results

Fine-tuning ERNIE-4.5-VL-28B on 570 synthetic gauge images yields substantial improvements:
| Metric | Baseline | 1-Epoch Fine-tuned | Improvement |
|---|---|---|---|
| MAE | 2.82 | 0.60 | 79% better |
| RMSE | 5.18 | 0.93 | 82% better |
| Exact Match | 16.7% | 53.3% | +36.6 pp |
| Within ±1 | 46.7% | 86.7% | +40.0 pp |
| Within ±2 | 56.7% | 100.0% | +43.3 pp |
| Within ±5 | 90.0% | 100.0% | +10.0 pp |
MAE by gauge type:

| Gauge Type | Baseline MAE | Fine-tuned MAE | Improvement |
|---|---|---|---|
| Bimetal Thermometer | 5.65 | 1.00 | 82% better |
| Standard Pressure | 1.60 | 0.50 | 69% better |
| Glycerin-filled | 1.20 | 0.30 | 75% better |
## Features

- **Synthetic Data Generation**: procedural gauge images with realistic effects
  - 3D perspective transforms
  - Industrial backgrounds (metal, concrete, machinery)
  - Damage effects (scratches, dust, rust)
  - HDRI-style lighting variations
- **Cloud Training**: Modal-based fine-tuning with LoRA
  - NVIDIA B200 GPU support
  - Automatic experiment tracking
  - Resume capability
- **Comprehensive Evaluation**: MSE, MAE, RMSE, and accuracy metrics
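The evaluation metrics above are simple to state precisely. A minimal sketch (the `gauge_metrics` helper is illustrative, not the project's actual evaluator in `src/evaluation/`):

```python
import numpy as np

def gauge_metrics(pred, true, tolerances=(1, 2, 5)):
    """Compute MAE, RMSE, exact-match rate, and 'within +/-k' accuracy
    for predicted vs. ground-truth gauge readings."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    err = np.abs(pred - true)
    out = {
        "mae": float(err.mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "exact": float((err == 0).mean()),
    }
    for k in tolerances:
        out[f"within_{k}"] = float((err <= k).mean())
    return out

m = gauge_metrics([70, 52, 31], [70, 50, 30])
```

For the three toy predictions above, the absolute errors are 0, 2, and 1, giving an MAE of 1.0 and a 100% within-±2 rate.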
## Quick Start

```bash
cd MeterMind
pip install -r requirements.txt
```

```bash
# Generate 600 images (200 per gauge type)
python src/generation/generate.py --output data/synthetic --per-type 200

# Prepare training splits
python src/training/prepare_data.py
```

```bash
# Upload data to Modal volume
modal run src/training/modal_finetune.py --upload

# Run training (1 epoch)
modal run src/training/modal_finetune.py --max-steps 285 --name my_experiment

# Run full training (3 epochs)
modal run src/training/modal_finetune.py --max-steps 855 --name full_training
```

```bash
# List experiments
modal run src/training/modal_finetune.py --list-exp

# Evaluate a specific experiment
modal run src/training/modal_finetune.py --eval-only my_experiment
```

A simple Gradio interface for testing the deployed model:

```bash
# Create a Modal secret for your API key
modal secret create metermind-api-key API_KEY="your-secret-key"

# Deploy the endpoint
modal deploy src/deployment/deploy_simple.py
```

Note the endpoint URL from the output (e.g., `https://your-name--metermind-simple-gaugereader-predict.modal.run`).

```bash
# Set environment variables
export METERMIND_API_URL="https://your-endpoint.modal.run"
export METERMIND_API_KEY="your-secret-key"

# Run the app
python app.py
```

Open http://localhost:7860, upload a gauge image, and get the reading.
```python
import requests
import base64

with open("gauge.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://your-endpoint.modal.run",
    json={"image": image_b64, "api_key": "your-api-key"},
)
print(response.json())  # {"reading": 70.0, "raw": "70"}
```

For production use, we provide a vLLM-based deployment that achieves roughly 100x faster inference.
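Both deployments return the same JSON shape, where `raw` is the model's text output and `reading` is the parsed number. A hedged sketch of the kind of parsing the endpoint might do (`parse_reading` is an illustrative name, not the deployed code):

```python
import re

def parse_reading(raw: str):
    """Pull the first numeric token out of the model's raw text output.

    Returns None when no number is present, so callers can fall back
    gracefully instead of crashing on an unexpected generation.
    """
    m = re.search(r"-?\d+(?:\.\d+)?", raw)
    return float(m.group()) if m else None
```

For example, `parse_reading("70")` returns `70.0` and `parse_reading("The gauge reads 42.5 PSI")` returns `42.5`.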
| Deployment | Warm Latency | Cold Start |
|---|---|---|
| `deploy_simple.py` (Unsloth) | ~110s | ~2 min |
| `deploy_vllm.py` (vLLM) | ~0.85s | ~2.5 min |
Key implementation notes:

- **vLLM nightly required**: ERNIE-4.5-VL support requires a vLLM nightly build (`--extra-index-url https://wheels.vllm.ai/nightly`)
- **Model merging**: the LoRA adapter must be merged with the base model and pushed to HuggingFace (vLLM doesn't support LoRA adapters directly for this model)
- **Processor configs**: `preprocessor_config.json` must be pushed alongside the merged weights for vLLM's image processor
- **Prompt format**: uses the OpenAI chat format with a base64 data URI for images
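The last point (OpenAI chat format with a base64 data URI) can be sketched as plain request-building code; `build_chat_payload`, the question text, and the field values are illustrative assumptions, not the exact request the deployment sends:

```python
import base64

def build_chat_payload(image_bytes, model, question="What is the gauge reading?"):
    """Build an OpenAI-style chat-completions payload with the image
    inlined as a base64 data URI (the format vLLM's server accepts)."""
    data_uri = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_uri}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 16,
    }
```

The payload can then be POSTed to the server's `/v1/chat/completions` route with any HTTP client.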
```bash
# 1. Create secrets
modal secret create huggingface-secret HF_TOKEN=hf_your_token
modal secret create metermind-api-key API_KEY=your-secret-key

# 2. Merge LoRA and push to HuggingFace (one-time)
modal run src/deployment/deploy_vllm.py::merge_model

# 3. Deploy vLLM server
modal deploy src/deployment/deploy_vllm.py
```

```python
import requests
import base64

with open("gauge.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://your-vllm-endpoint.modal.run",
    json={"image": image_b64, "api_key": "your-api-key"},
)
print(response.json())  # {"reading": 70.0, "raw": "70"}
```

- LoRA Adapter: luliuzee/metermind-ernie-gauge-lora
- Merged Model (for vLLM): luliuzee/metermind-ernie-merged
## Project Structure

```
MeterMind/
├── README.md                   # This file
├── requirements.txt            # Python dependencies
├── .gitignore                  # Git ignore patterns
│
├── configs/
│   └── default.yaml            # Training hyperparameters
│
├── data/
│   └── synthetic/
│       ├── images/             # Generated gauge images (512x512)
│       ├── annotations.json    # Raw annotations with metadata
│       ├── train.json          # Training set (570 samples)
│       └── val.json            # Validation set (30 samples)
│
├── src/
│   ├── generation/             # Synthetic data generation
│   │   ├── gauge_renderer.py   # Base gauge rendering classes
│   │   ├── backgrounds.py      # Procedural backgrounds
│   │   ├── effects_3d.py       # Perspective & glass effects
│   │   ├── damage_effects.py   # Wear & damage simulation
│   │   ├── lighting_effects.py # HDRI-style lighting
│   │   ├── augmentations.py    # Data augmentations
│   │   └── generate.py         # Main generation script
│   │
│   ├── training/               # Model training
│   │   ├── prepare_data.py     # Convert to training format
│   │   └── modal_finetune.py   # Modal cloud training script
│   │
│   ├── evaluation/             # Evaluation tools
│   │   └── baseline_eval.py    # Baseline API evaluation
│   │
│   └── deployment/             # Modal deployments
│       ├── deploy_simple.py    # Unsloth-based (~110s/inference)
│       └── deploy_vllm.py      # vLLM-based (~0.85s/inference)
│
├── scripts/                    # Shell utility scripts
│   ├── generate_data.sh
│   ├── upload_to_modal.sh
│   └── train.sh
│
├── results/                    # Evaluation results
│   ├── baseline/
│   └── experiments/
│
└── docs/                       # Documentation
    ├── BASELINE.md             # Baseline evaluation report
    ├── TRAINING.md             # Training guide
    └── RESULTS.md              # Experiment results
```
## Synthetic Dataset

| Type | Scale Range | Unit | Count |
|---|---|---|---|
| Standard Pressure | 0-100 | PSI | 200 |
| Glycerin-filled | 0-100 | PSI | 200 |
| Bimetal Thermometer | 0-250 | °F | 200 |
- 3D Effects: Perspective transforms (54% of images)
- Backgrounds: Metal panels, concrete, machinery, brick
- Lighting: Studio, harsh sun, low-light, industrial
- Damage: Scratches, dust, rust (69% with some damage)
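The perspective-transform effect above boils down to a 3x3 homography that maps the image's four corners to four jittered target corners. A numpy-only sketch of that math (the project's `effects_3d.py` presumably uses OpenCV's equivalent, `cv2.getPerspectiveTransform`; the function names here are illustrative):

```python
import numpy as np

def homography(src, dst):
    """Solve for H (3x3, with H[2,2] fixed to 1) such that each dst point
    equals H applied to the matching src point in homogeneous coordinates."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Apply homography H to a 2D point (divide out the projective scale)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Jittered corners of a 512x512 image, mimicking an off-axis camera view.
src = [(0, 0), (512, 0), (512, 512), (0, 512)]
dst = [(20, 10), (500, 30), (480, 512), (0, 490)]
H = homography(src, dst)
```

Warping the image itself then amounts to resampling each output pixel through the inverse of `H`, which is what `cv2.warpPerspective` does.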
## Model & Training

- Model: ERNIE-4.5-VL-28B-A3B (via Unsloth)
- Architecture: Vision-Language Transformer
### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 8 |
| Alpha | 16 |
| Dropout | 0 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, fc1, fc2 |
### Training Hyperparameters

| Parameter | Value |
|---|---|
| Learning Rate | 2e-4 |
| Batch Size | 1 |
| Gradient Accumulation | 2 |
| Effective Batch Size | 2 |
| Warmup Steps | 10 |
| Weight Decay | 0.001 |
| LR Scheduler | Linear |
| Optimizer | AdamW 8-bit |
| Max Seq Length | 1024 |
| Precision | BFloat16 |
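For reference, the two tables above can be expressed as plain config dicts, along with the steps-per-epoch arithmetic behind the `--max-steps` values. This is a sketch using PEFT-style field names; the actual training script configures Unsloth/TRL and may name things differently:

```python
# LoRA hyperparameters (mirrors the table above).
lora = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "fc1", "fc2",
    ],
}

# Training hyperparameters (mirrors the table above).
train = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 2,
    "warmup_steps": 10,
    "weight_decay": 0.001,
    "lr_scheduler_type": "linear",
    "optim": "adamw_8bit",
    "max_seq_length": 1024,
    "bf16": True,
}

# With 570 training samples and an effective batch of 1 * 2 = 2,
# one epoch is 285 optimizer steps and three epochs is 855.
effective_batch = (train["per_device_train_batch_size"]
                   * train["gradient_accumulation_steps"])
steps_per_epoch = 570 // effective_batch
```

This is why the quick-start commands use `--max-steps 285` for one epoch and `--max-steps 855` for three.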
### Experiment Results

| Experiment | Steps | Epochs | MAE | Within ±2 | Within ±5 |
|---|---|---|---|---|---|
| Baseline (no fine-tuning) | - | - | 2.82 | 56.7% | 90.0% |
| test_50steps | 50 | 0.18 | 1.50 | 73.3% | 93.3% |
| synthetic_1epoch | 285 | 1 | 0.60 | 100.0% | 100.0% |
| synthetic_3epoch | 855 | 3 | TBD | TBD | TBD |
## Requirements

- Python 3.11+
- Modal account (for cloud training)
- CUDA-compatible GPU (for local inference)

```
torch>=2.0.0
transformers>=4.56.0
unsloth
modal
pillow
opencv-python
tqdm
```
## Citation

If you use MeterMind in your research, please cite:

```bibtex
@software{metermind2024,
  title={MeterMind: Fine-tuning Vision-Language Models for Analog Gauge Reading},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/MeterMind}
}
```

## License

MIT License - see LICENSE file for details.
Fine-tuned LoRA adapter available on HuggingFace: `luliuzee/metermind-ernie-gauge-lora`
## Acknowledgments

- ERNIE-4.5-VL by Baidu
- Unsloth for efficient fine-tuning
- Modal for cloud GPU infrastructure