VisionCore

Production-Grade Real-Time Multi-Model Image Recognition

Overview

VisionCore is a modular, real-time image recognition system that runs five AI models concurrently on a webcam feed. It provides object detection, image classification, semantic segmentation, face detection with head-pose estimation, and monocular depth estimation, all running in parallel alongside multi-object tracking.

Key Features

5 parallel AI models — YOLOv8 detector, EfficientNet classifier, DeepLabV3 segmenter, MediaPipe face mesh, MiDaS depth estimator
SORT multi-object tracker — built from scratch using Kalman filter + Hungarian algorithm
Threaded pipeline — ThreadPoolExecutor runs each model independently per frame
FastAPI server — REST endpoints, MJPEG streaming, WebSocket real-time detections
Live dashboard — dark-themed web UI with FPS gauge, model toggles, detection cards
Video recorder — non-blocking background writer with metadata JSON sidecar
Click CLI — run, snapshot, benchmark, download, serve commands
Configurable — single YAML config with deep-merge override support
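
The tracker bullet above pairs a Kalman filter with the Hungarian algorithm. The association half of that idea can be sketched as follows: score every track/detection pair by IoU, then keep the assignment with the highest total IoU. This is a minimal standard-library sketch, not VisionCore's code. The names `iou` and `associate` and the 0.3 threshold are hypothetical, and the brute-force search over permutations is a stand-in for the Hungarian algorithm (it finds the same optimum but is only practical for a handful of boxes).

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, iou_threshold=0.3):
    """Return sorted (track_idx, det_idx) pairs that maximise total IoU.

    Brute force over permutations (hypothetical stand-in for Hungarian).
    Pairs whose IoU is below the threshold are dropped afterwards.
    """
    if not tracks or not detections:
        return []
    # Permute the longer side so each item on the shorter side gets a partner.
    swap = len(tracks) > len(detections)
    short, long_ = (detections, tracks) if swap else (tracks, detections)
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(long_)), len(short)):
        pairs = list(enumerate(perm))
        score = sum(iou(short[i], long_[j]) for i, j in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    if swap:
        # Restore (track_idx, det_idx) ordering.
        best_pairs = [(j, i) for i, j in best_pairs]
    return sorted(
        (t, d) for t, d in best_pairs
        if iou(tracks[t], detections[d]) >= iou_threshold
    )
```

In a full SORT loop, the `tracks` boxes would be the Kalman filter's predictions for the current frame, and unmatched detections would start new tracks.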

Architecture

┌───────────┐     ┌──────────────────┐     ┌───────────────┐
│  Webcam   │────►│ InferencePipeline│────►│   Annotator   │
│  Capture  │     │  (ThreadPool)    │     │  (OpenCV draw)│
└───────────┘     ├──────────────────┤     └───────┬───────┘
                  │ ┌──────────────┐ │             │
                  │ │ EfficientNet │ │       ┌─────▼─────┐
                  │ │  Classifier  │ │       │  FastAPI  │
                  │ ├──────────────┤ │       │  Server   │
                  │ │   YOLOv8     │ │       ├───────────┤
                  │ │  Detector    │ │       │ /stream   │──► MJPEG
                  │ ├──────────────┤ │       │ /ws       │──► WebSocket
                  │ │  DeepLabV3   │ │       │ /api/v1/* │──► REST
                  │ │  Segmenter   │ │       └───────────┘
                  │ ├──────────────┤ │             │
                  │ │  MediaPipe   │ │       ┌─────▼─────┐
                  │ │  Face Mesh   │ │       │ Dashboard │
                  │ ├──────────────┤ │       │ (HTML/JS) │
                  │ │   MiDaS      │ │       └───────────┘
                  │ │   Depth      │ │
                  │ └──────────────┘ │
                  │ ┌──────────────┐ │
                  │ │ SORT Tracker │ │
                  │ └──────────────┘ │
                  └──────────────────┘
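
The fan-out in the middle box of the diagram can be sketched with the standard library alone: one task per model is submitted for the same frame, and results are gathered by model name. This is an illustrative sketch under stated assumptions, not the actual `InferencePipeline`. The helper names `_timed` and `run_models` and the stub models are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def _timed(fn, frame):
    """Run one model on a frame and return (output, latency in ms)."""
    start = time.perf_counter()
    output = fn(frame)
    return output, (time.perf_counter() - start) * 1000.0


def run_models(frame, models, executor):
    """Submit every model for the same frame, then collect results by name."""
    futures = {
        name: executor.submit(_timed, fn, frame)
        for name, fn in models.items()
    }
    results, latencies_ms = {}, {}
    for name, future in futures.items():
        # result() blocks until that model finishes and re-raises its errors.
        results[name], latencies_ms[name] = future.result()
    return results, latencies_ms


# Stub callables standing in for real models (hypothetical outputs).
models = {
    "detector": lambda frame: [("person", 0.9)],
    "classifier": lambda frame: "laptop",
}

with ThreadPoolExecutor(max_workers=len(models)) as pool:
    results, latencies_ms = run_models(b"fake-frame", models, pool)
```

Threads help here only to the extent that the underlying inference calls release the GIL, which native PyTorch and OpenCV operations generally do.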

Quick Start

Prerequisites

Python 3.10+
Webcam (or use MockCapture for testing)
~2 GB of free disk space (model weights are downloaded automatically on first run)

Installation

git clone https://github.com/gokulprakash30/visioncore.git
cd visioncore
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Run

# Live inference + API server + dashboard
python cli/main.py run

# Standalone mode (OpenCV window only, no server)
python cli/main.py run --no-server

# Headless (API server only, no display)
python cli/main.py run --no-display

# Single snapshot
python cli/main.py snapshot --output snapshot.jpg

# Benchmark all models
python cli/main.py benchmark --frames 100 --output results.json

# Pre-download all model weights
python cli/main.py download

Dashboard

Open http://localhost:8000 after starting the server.

Project Structure

visioncore/
├── cli/main.py              # Click CLI (run, snapshot, benchmark, download, serve)
├── config/default.yaml      # Master YAML configuration
├── dashboard/               # Web frontend (HTML + CSS + JS)
│   ├── index.html
│   ├── style.css
│   └── app.js
├── scripts/
│   ├── benchmark.py         # Performance benchmarking
│   └── download_models.py   # Pre-download all weights
├── tests/                   # Pytest suite (21 tests)
│   ├── conftest.py
│   ├── test_annotator.py
│   ├── test_api.py
│   ├── test_capture.py
│   └── test_pipeline.py
├── visioncore/              # Core Python package
│   ├── __init__.py
│   ├── annotator.py         # Frame annotation (boxes, masks, HUD)
│   ├── capture.py           # WebcamCapture + MockCapture
│   ├── pipeline.py          # Multi-model InferencePipeline
│   ├── recorder.py          # Background video recording
│   ├── tracker.py           # SORT multi-object tracker
│   ├── api/
│   │   ├── schemas.py       # Pydantic schemas
│   │   └── server.py        # FastAPI server
│   ├── models/
│   │   ├── base.py          # Abstract BaseModel + Detection dataclass
│   │   ├── classifier.py    # EfficientNet-B0
│   │   ├── detector.py      # YOLOv8
│   │   ├── depth.py         # MiDaS depth
│   │   ├── face.py          # MediaPipe face mesh + head pose
│   │   └── segmenter.py     # DeepLabV3 MobileNet
│   └── utils/
│       ├── config.py         # YAML config with deep-merge + AttrDict
│       ├── fps.py            # Thread-safe EMA FPS counter
│       └── logger.py         # Structured JSON logger + Rich console
├── assets/demo_frame.jpg
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
├── Makefile
├── Dockerfile
├── LICENSE                  # Apache 2.0
└── .github/workflows/ci.yml
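
The tree lists `utils/fps.py` as a thread-safe EMA FPS counter. One way such a counter could work is sketched below, assuming an exponential moving average over instantaneous frame rates guarded by a lock. The class name `EmaFpsCounter`, the `alpha` default, and the injectable `now` argument are hypothetical and not taken from the repository.

```python
import threading
import time


class EmaFpsCounter:
    """Exponential-moving-average FPS counter that is safe to tick from threads."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha          # weight given to the newest sample
        self._fps = 0.0
        self._last = None           # timestamp of the previous tick
        self._lock = threading.Lock()

    def tick(self, now=None):
        """Record one frame and return the smoothed FPS."""
        now = time.perf_counter() if now is None else now
        with self._lock:
            if self._last is not None:
                dt = now - self._last
                if dt > 0:
                    instant = 1.0 / dt
                    if self._fps == 0.0:
                        # First measurable interval seeds the average.
                        self._fps = instant
                    else:
                        self._fps = (
                            self.alpha * instant
                            + (1.0 - self.alpha) * self._fps
                        )
            self._last = now
            return self._fps
```

A smaller `alpha` gives a steadier reading for a dashboard gauge at the cost of reacting more slowly to real changes in frame rate.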

Configuration

All settings live in config/default.yaml. To override them, pass a custom YAML file; its values are deep-merged over the defaults:

python cli/main.py run --config my_config.yaml

Key sections:

| Section | What it controls |
| --- | --- |
| `capture` | Camera device, resolution, FPS |
| `pipeline` | Enabled models, device (cpu/cuda/mps), confidence threshold |
| `models.*` | Per-model weights, hyperparameters |
| `tracker` | SORT max_age, min_hits, IoU threshold |
| `annotator` | Bbox thickness, font scale, mask alpha |
| `api` | Host, port, CORS origins, JPEG quality |
| `recorder` | Output dir, codec, FPS |
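
Deep-merge override means a custom file only needs to list the keys it changes; nested sections keep their remaining defaults. The sketch below shows the general technique on plain dictionaries (what a YAML loader would return). It is not the code in `utils/config.py`: the function name `deep_merge` and the specific keys (`device`, `confidence`, `port`) are hypothetical examples.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override is merged recursively into base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults and a one-key override.
defaults = {
    "pipeline": {"device": "cpu", "confidence": 0.5},
    "api": {"port": 8000},
}
override = {"pipeline": {"device": "cuda"}}

config = deep_merge(defaults, override)
```

Here `config["pipeline"]` ends up with the overridden `device` and the default `confidence`, and `defaults` itself is left unchanged.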

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/api/v1/status` | System status (FPS, models, device) |
| GET | `/api/v1/models` | List all available models |
| POST | `/api/v1/models/toggle` | Enable/disable a model at runtime |
| GET | `/api/v1/frame/latest` | Latest inference metadata |
| GET | `/api/v1/snapshot` | Save annotated frame to disk |
| GET | `/api/v1/metrics` | Per-model inference latencies |
| POST | `/api/v1/record/start` | Start MP4 recording |
| POST | `/api/v1/record/stop` | Stop recording + save metadata |
| POST | `/api/v1/config` | Update config value at runtime |
| GET | `/stream` | MJPEG video stream |
| WS | `/ws/detections` | Real-time detection WebSocket |
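
For readers unfamiliar with MJPEG over HTTP, the sketch below shows how a stream like `/stream` is typically framed: each JPEG is sent as one part of a `multipart/x-mixed-replace` response, and the browser replaces the image as each part arrives. This illustrates the general technique only. The generator name `mjpeg_chunks` and the boundary string `frame` are assumptions, not details taken from VisionCore's server.

```python
def mjpeg_chunks(jpeg_frames, boundary="frame"):
    """Wrap already-encoded JPEG byte strings as multipart MJPEG parts."""
    for jpeg in jpeg_frames:
        header = (
            f"--{boundary}\r\n"
            "Content-Type: image/jpeg\r\n"
            f"Content-Length: {len(jpeg)}\r\n\r\n"
        ).encode("ascii")
        # Each part is header + image bytes + trailing CRLF.
        yield header + jpeg + b"\r\n"


# The response media type must name the same boundary (assumed value).
MEDIA_TYPE = "multipart/x-mixed-replace; boundary=frame"
```

In a FastAPI app, a generator like this could be passed to `StreamingResponse` with `media_type=MEDIA_TYPE`; whether the project does exactly that is not stated here.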

Testing

# Run all tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=visioncore --cov-report=term-missing

Docker

docker build -t visioncore .
docker run --device=/dev/video0 -p 8000:8000 visioncore

License

Apache License 2.0 — see LICENSE.

Author

Built by gokulprakash30.
