Production-Grade Real-Time Multi-Model Image Recognition
VisionCore is a modular, real-time image recognition system that runs five AI models concurrently on a webcam feed. It provides object detection, image classification, semantic segmentation, face detection with head-pose estimation, and monocular depth estimation, all running in parallel alongside multi-object tracking.
- 5 parallel AI models: YOLOv8 detector, EfficientNet classifier, DeepLabV3 segmenter, MediaPipe face mesh, MiDaS depth estimator
- SORT multi-object tracker: built from scratch using a Kalman filter and the Hungarian algorithm
- Threaded pipeline: `ThreadPoolExecutor` runs each model independently per frame
- FastAPI server: REST endpoints, MJPEG streaming, WebSocket real-time detections
- Live dashboard: dark-themed web UI with FPS gauge, model toggles, detection cards
- Video recorder: non-blocking background writer with metadata JSON sidecar
- Click CLI: `run`, `snapshot`, `benchmark`, `download`, `serve` commands
- Configurable: single YAML config with deep-merge override support
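
The threaded pipeline idea from the list above can be sketched roughly as follows. This is a minimal illustration, not VisionCore's actual `InferencePipeline`: the two model functions and `run_models_parallel` are hypothetical stand-ins, and only `ThreadPoolExecutor` itself comes from the description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


# Hypothetical stand-ins for real models: each takes a frame and returns a result.
def fake_detector(frame: Any) -> dict:
    return {"boxes": [(10, 20, 110, 220)]}


def fake_classifier(frame: Any) -> dict:
    return {"label": "cat", "score": 0.9}


def run_models_parallel(
    frame: Any,
    models: Dict[str, Callable[[Any], dict]],
    executor: ThreadPoolExecutor,
) -> Dict[str, dict]:
    """Submit every model for the same frame and gather the results by name."""
    futures = {name: executor.submit(fn, frame) for name, fn in models.items()}
    # result() blocks until that model finishes. If the models release the GIL
    # during inference, per-frame latency approaches that of the slowest model
    # instead of the sum of all models.
    return {name: future.result() for name, future in futures.items()}


models = {"detector": fake_detector, "classifier": fake_classifier}
with ThreadPoolExecutor(max_workers=len(models)) as pool:
    results = run_models_parallel(frame=None, models=models, executor=pool)
```

Reusing one executor across frames, rather than creating a pool per frame, avoids paying thread start-up cost on every iteration.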

Architecture:

    ┌───────────┐     ┌──────────────────┐     ┌───────────────┐
    │  Webcam   │────►│ InferencePipeline│────►│   Annotator   │
    │  Capture  │     │   (ThreadPool)   │     │ (OpenCV draw) │
    └───────────┘     ├──────────────────┤     └───────┬───────┘
                      │ ┌──────────────┐ │             │
                      │ │ EfficientNet │ │       ┌─────▼─────┐
                      │ │  Classifier  │ │       │  FastAPI  │
                      │ ├──────────────┤ │       │  Server   │
                      │ │    YOLOv8    │ │       ├───────────┤
                      │ │   Detector   │ │       │ /stream   │──► MJPEG
                      │ ├──────────────┤ │       │ /ws       │──► WebSocket
                      │ │  DeepLabV3   │ │       │ /api/v1/* │──► REST
                      │ │  Segmenter   │ │       └───────────┘
                      │ ├──────────────┤ │             │
                      │ │  MediaPipe   │ │       ┌─────▼─────┐
                      │ │  Face Mesh   │ │       │ Dashboard │
                      │ ├──────────────┤ │       │ (HTML/JS) │
                      │ │    MiDaS     │ │       └───────────┘
                      │ │    Depth     │ │
                      │ └──────────────┘ │
                      │ ┌──────────────┐ │
                      │ │ SORT Tracker │ │
                      │ └──────────────┘ │
                      └──────────────────┘

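
To illustrate the data-association step of a SORT-style tracker, the sketch below matches existing track boxes to new detections by IoU. It is a simplified stand-in under stated assumptions: the real tracker is described as using a Kalman filter and the Hungarian algorithm, whereas this example skips motion prediction and brute-forces the optimal assignment, which is only practical for a handful of boxes. The names `iou` and `associate` and the default threshold of 0.3 are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: List[Box], dets: List[Box], iou_threshold: float = 0.3
) -> List[Tuple[int, int]]:
    """Return (track_index, detection_index) pairs that maximize total IoU.

    Brute-force search over assignments; the Hungarian algorithm solves the
    same problem in polynomial time.
    """
    if not tracks or not dets:
        return []
    swap = len(tracks) > len(dets)
    small, large = (dets, tracks) if swap else (tracks, dets)
    best_pairs: List[Tuple[int, int]] = []
    best_score = -1.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    pairs = [(j, i) if swap else (i, j) for i, j in best_pairs]
    # Drop weak matches so unrelated boxes are not forced together.
    return [(t, d) for t, d in pairs if iou(tracks[t], dets[d]) >= iou_threshold]


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches = associate(tracks, dets)
```

Detections left unmatched (index 2 here) would typically start new tracks, and tracks left unmatched would age until they are dropped.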
Requirements:

- Python 3.10+
- Webcam (or use `MockCapture` for testing)
- ~2 GB disk (for model weights, auto-downloaded on first run)

Installation:

    git clone https://github.com/gokulprakash30/visioncore.git
    cd visioncore
    python -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt

Usage:

    # Live inference + API server + dashboard
    python cli/main.py run
    # Standalone mode (OpenCV window only, no server)
    python cli/main.py run --no-server
    # Headless (API server only, no display)
    python cli/main.py run --no-display
    # Single snapshot
    python cli/main.py snapshot --output snapshot.jpg
    # Benchmark all models
    python cli/main.py benchmark --frames 100 --output results.json
    # Pre-download all model weights
    python cli/main.py download

Open http://localhost:8000 after starting the server.

Project structure:

    visioncore/
    ├── cli/main.py                 # Click CLI (run, snapshot, benchmark, download, serve)
    ├── config/default.yaml         # Master YAML configuration
    ├── dashboard/                  # Web frontend (HTML + CSS + JS)
    │   ├── index.html
    │   ├── style.css
    │   └── app.js
    ├── scripts/
    │   ├── benchmark.py            # Performance benchmarking
    │   └── download_models.py      # Pre-download all weights
    ├── tests/                      # Pytest suite (21 tests)
    │   ├── conftest.py
    │   ├── test_annotator.py
    │   ├── test_api.py
    │   ├── test_capture.py
    │   └── test_pipeline.py
    ├── visioncore/                 # Core Python package
    │   ├── __init__.py
    │   ├── annotator.py            # Frame annotation (boxes, masks, HUD)
    │   ├── capture.py              # WebcamCapture + MockCapture
    │   ├── pipeline.py             # Multi-model InferencePipeline
    │   ├── recorder.py             # Background video recording
    │   ├── tracker.py              # SORT multi-object tracker
    │   ├── api/
    │   │   ├── schemas.py          # Pydantic schemas
    │   │   └── server.py           # FastAPI server
    │   ├── models/
    │   │   ├── base.py             # Abstract BaseModel + Detection dataclass
    │   │   ├── classifier.py       # EfficientNet-B0
    │   │   ├── detector.py         # YOLOv8
    │   │   ├── depth.py            # MiDaS depth
    │   │   ├── face.py             # MediaPipe face mesh + head pose
    │   │   └── segmenter.py        # DeepLabV3 MobileNet
    │   └── utils/
    │       ├── config.py           # YAML config with deep-merge + AttrDict
    │       ├── fps.py              # Thread-safe EMA FPS counter
    │       └── logger.py           # Structured JSON logger + Rich console
    ├── assets/demo_frame.jpg
    ├── pyproject.toml
    ├── requirements.txt
    ├── requirements-dev.txt
    ├── Makefile
    ├── Dockerfile
    ├── LICENSE                     # Apache 2.0
    └── .github/workflows/ci.yml

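
As an illustration of what a thread-safe EMA FPS counter such as the one listed for `utils/fps.py` might look like, here is a minimal sketch. The class name `EmaFpsCounter`, the `tick` method, and the default smoothing factor are assumptions for this example, not the module's actual interface.

```python
import threading
import time
from typing import Optional


class EmaFpsCounter:
    """Exponentially smoothed FPS estimate that is safe to update from several threads."""

    def __init__(self, alpha: float = 0.1) -> None:
        self._alpha = alpha  # smoothing factor (assumed default)
        self._fps = 0.0
        self._last: Optional[float] = None
        self._lock = threading.Lock()

    def tick(self, now: Optional[float] = None) -> float:
        """Record one frame and return the current smoothed FPS."""
        now = time.perf_counter() if now is None else now
        with self._lock:
            if self._last is not None and now > self._last:
                instant = 1.0 / (now - self._last)
                if self._fps == 0.0:
                    # The first measurement seeds the average.
                    self._fps = instant
                else:
                    self._fps = self._alpha * instant + (1.0 - self._alpha) * self._fps
            self._last = now
            return self._fps


# Timestamps are injected here so the example is deterministic.
counter = EmaFpsCounter(alpha=0.5)
for t in (0.0, 0.1, 0.2, 0.4):
    fps = counter.tick(now=t)
```

A smaller `alpha` gives a steadier reading for a dashboard gauge at the cost of reacting more slowly to real changes in frame rate.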
All settings live in `config/default.yaml`. Override them with a custom YAML file:

    python cli/main.py run --config my_config.yaml

Key sections:

| Section | What it controls |
|---|---|
| `capture` | Camera device, resolution, FPS |
| `pipeline` | Enabled models, device (cpu/cuda/mps), confidence threshold |
| `models.*` | Per-model weights, hyperparameters |
| `tracker` | SORT `max_age`, `min_hits`, IoU threshold |
| `annotator` | Bbox thickness, font scale, mask alpha |
| `api` | Host, port, CORS origins, JPEG quality |
| `recorder` | Output dir, codec, FPS |

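
The deep-merge override behaviour can be pictured with a short sketch. It assumes the custom YAML has already been parsed into nested dictionaries; `deep_merge` and the example values are hypothetical, and only the section and key names (`capture`, `tracker`, `max_age`, `min_hits`) are taken from the table above.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values in `override` replace or extend `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Example values are made up for illustration.
defaults = {
    "capture": {"device": 0, "fps": 30},
    "tracker": {"max_age": 30, "min_hits": 3},
}
custom = {"capture": {"fps": 60}, "tracker": {"max_age": 10}}
config = deep_merge(defaults, custom)
```

With this approach a custom file only needs to list the keys it changes; every other setting keeps its default.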

API endpoints:

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/status` | System status (FPS, models, device) |
| GET | `/api/v1/models` | List all available models |
| POST | `/api/v1/models/toggle` | Enable/disable a model at runtime |
| GET | `/api/v1/frame/latest` | Latest inference metadata |
| GET | `/api/v1/snapshot` | Save annotated frame to disk |
| GET | `/api/v1/metrics` | Per-model inference latencies |
| POST | `/api/v1/record/start` | Start MP4 recording |
| POST | `/api/v1/record/stop` | Stop recording + save metadata |
| POST | `/api/v1/config` | Update config value at runtime |
| GET | `/stream` | MJPEG video stream |
| WS | `/ws/detections` | Real-time detection WebSocket |

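
A client for the REST endpoints could be sketched with the standard library alone. The paths and the `localhost:8000` address come from this README; the helper names and, in particular, the JSON body sent to `/api/v1/models/toggle` are assumptions, since the request schema is not documented here.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # default address mentioned in this README


def build_request(path: str, payload: Optional[dict] = None) -> Request:
    """Build a GET request when payload is None, otherwise a JSON POST."""
    if payload is None:
        return Request(BASE_URL + path, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def call(path: str, payload: Optional[dict] = None, timeout: float = 5.0) -> dict:
    """Send the request to a running server and decode the JSON response."""
    with urlopen(build_request(path, payload), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


status_request = build_request("/api/v1/status")
# Hypothetical body: the field names "model" and "enabled" are guesses.
toggle_request = build_request(
    "/api/v1/models/toggle", {"model": "depth", "enabled": False}
)
```

With the server running, `call("/api/v1/status")` would return the decoded status payload; the request objects above are built without any network access.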

Testing:

    # Run all tests
    pytest tests/ -v
    # With coverage
    pytest tests/ --cov=visioncore --cov-report=term-missing

Docker:

    docker build -t visioncore .
    docker run --device=/dev/video0 -p 8000:8000 visioncore

License: Apache License 2.0; see LICENSE.

Built by gokulprakash30.
