Skip to content

gokulprakash30/visioncore

Repository files navigation

VisionCore

Production-Grade Real-Time Multi-Model Image Recognition

Demo

CI Python 3.10+ License: Apache 2.0


Overview

VisionCore is a modular, real-time image recognition system that orchestrates five AI models simultaneously through a webcam feed. It provides object detection, image classification, semantic segmentation, face detection with head-pose estimation, and monocular depth estimation — all running in parallel with multi-object tracking.

Key Features

  • 5 parallel AI models — YOLOv8 detector, EfficientNet classifier, DeepLabV3 segmenter, MediaPipe face mesh, MiDaS depth estimator
  • SORT multi-object tracker — built from scratch using Kalman filter + Hungarian algorithm
  • Threaded pipelineThreadPoolExecutor runs each model independently per frame
  • FastAPI server — REST endpoints, MJPEG streaming, WebSocket real-time detections
  • Live dashboard — dark-themed web UI with FPS gauge, model toggles, detection cards
  • Video recorder — non-blocking background writer with metadata JSON sidecar
  • Click CLIrun, snapshot, benchmark, download, serve commands
  • Configurable — single YAML config with deep-merge override support

Architecture

┌───────────┐     ┌──────────────────┐     ┌───────────────┐
│  Webcam   │────►│ InferencePipeline│────►│   Annotator   │
│  Capture  │     │  (ThreadPool)    │     │  (OpenCV draw)│
└───────────┘     ├──────────────────┤     └───────┬───────┘
                  │ ┌──────────────┐ │             │
                  │ │ EfficientNet │ │       ┌─────▼─────┐
                  │ │  Classifier  │ │       │  FastAPI   │
                  │ ├──────────────┤ │       │  Server    │
                  │ │   YOLOv8     │ │       ├───────────┤
                  │ │  Detector    │ │       │ /stream   │──► MJPEG
                  │ ├──────────────┤ │       │ /ws       │──► WebSocket
                  │ │  DeepLabV3   │ │       │ /api/v1/* │──► REST
                  │ │  Segmenter   │ │       └───────────┘
                  │ ├──────────────┤ │             │
                  │ │  MediaPipe   │ │       ┌─────▼─────┐
                  │ │  Face Mesh   │ │       │ Dashboard │
                  │ ├──────────────┤ │       │ (HTML/JS) │
                  │ │   MiDaS      │ │       └───────────┘
                  │ │   Depth      │ │
                  │ └──────────────┘ │
                  │ ┌──────────────┐ │
                  │ │ SORT Tracker │ │
                  │ └──────────────┘ │
                  └──────────────────┘

Quick Start

Prerequisites

  • Python 3.10+
  • Webcam (or use MockCapture for testing)
  • ~2 GB disk (for model weights, auto-downloaded on first run)

Installation

git clone https://github.com/gokulprakash30/visioncore.git
cd visioncore
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Run

# Live inference + API server + dashboard
python cli/main.py run

# Standalone mode (OpenCV window only, no server)
python cli/main.py run --no-server

# Headless (API server only, no display)
python cli/main.py run --no-display

# Single snapshot
python cli/main.py snapshot --output snapshot.jpg

# Benchmark all models
python cli/main.py benchmark --frames 100 --output results.json

# Pre-download all model weights
python cli/main.py download

Dashboard

Open http://localhost:8000 after starting the server.


Project Structure

visioncore/
├── cli/main.py              # Click CLI (run, snapshot, benchmark, download, serve)
├── config/default.yaml      # Master YAML configuration
├── dashboard/               # Web frontend (HTML + CSS + JS)
│   ├── index.html
│   ├── style.css
│   └── app.js
├── scripts/
│   ├── benchmark.py         # Performance benchmarking
│   └── download_models.py   # Pre-download all weights
├── tests/                   # Pytest suite (21 tests)
│   ├── conftest.py
│   ├── test_annotator.py
│   ├── test_api.py
│   ├── test_capture.py
│   └── test_pipeline.py
├── visioncore/              # Core Python package
│   ├── __init__.py
│   ├── annotator.py         # Frame annotation (boxes, masks, HUD)
│   ├── capture.py           # WebcamCapture + MockCapture
│   ├── pipeline.py          # Multi-model InferencePipeline
│   ├── recorder.py          # Background video recording
│   ├── tracker.py           # SORT multi-object tracker
│   ├── api/
│   │   ├── schemas.py       # Pydantic schemas
│   │   └── server.py        # FastAPI server
│   ├── models/
│   │   ├── base.py          # Abstract BaseModel + Detection dataclass
│   │   ├── classifier.py    # EfficientNet-B0
│   │   ├── detector.py      # YOLOv8
│   │   ├── depth.py         # MiDaS depth
│   │   ├── face.py          # MediaPipe face mesh + head pose
│   │   └── segmenter.py     # DeepLabV3 MobileNet
│   └── utils/
│       ├── config.py         # YAML config with deep-merge + AttrDict
│       ├── fps.py            # Thread-safe EMA FPS counter
│       └── logger.py         # Structured JSON logger + Rich console
├── assets/demo_frame.jpg
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
├── Makefile
├── Dockerfile
├── LICENSE                  # Apache 2.0
└── .github/workflows/ci.yml

Configuration

All settings live in config/default.yaml. Override with a custom YAML:

python cli/main.py run --config my_config.yaml

Key sections:

Section What it controls
capture Camera device, resolution, FPS
pipeline Enabled models, device (cpu/cuda/mps), confidence threshold
models.* Per-model weights, hyperparameters
tracker SORT max_age, min_hits, IoU threshold
annotator Bbox thickness, font scale, mask alpha
api Host, port, CORS origins, JPEG quality
recorder Output dir, codec, FPS

API Reference

Method Endpoint Description
GET /api/v1/status System status (FPS, models, device)
GET /api/v1/models List all available models
POST /api/v1/models/toggle Enable/disable a model at runtime
GET /api/v1/frame/latest Latest inference metadata
GET /api/v1/snapshot Save annotated frame to disk
GET /api/v1/metrics Per-model inference latencies
POST /api/v1/record/start Start MP4 recording
POST /api/v1/record/stop Stop recording + save metadata
POST /api/v1/config Update config value at runtime
GET /stream MJPEG video stream
WS /ws/detections Real-time detection WebSocket

Testing

# Run all tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=visioncore --cov-report=term-missing

Docker

docker build -t visioncore .
docker run --device=/dev/video0 -p 8000:8000 visioncore

License

Apache License 2.0 — see LICENSE.


Author

Built by gokulprakash30.

About

Production-grade real-time multi-model image recognition system

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors