
Snapback — On-device Vision Prototype (OpenCV + MediaPipe + VLM)

A small prototype that runs a real-time desk camera loop and overlays user state.

  • Capture/overlay: OpenCV
  • On-device inference: MediaPipe FaceLandmarker (Tasks API)
  • Coaching layer (optional): Ollama + qwen2.5vl:3b (VLM)

This repo is used as a concrete “on-device vision pipeline” artifact (capture → inference → overlay → latency measurement).


What it does

  • Grabs webcam frames
  • Runs face landmark detection (and derives simple features like EAR)
  • Tracks a coarse state (focused/distracted/etc.)
  • (Optional) calls a VLM periodically and shows a short coaching message
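The coarse state tracking in the loop above can be sketched as follows. This is an illustrative sketch only: the class name, thresholds, and the EAR-based rule are assumptions, not the repo's actual API.

```python
class StateTracker:
    """Coarse focused/distracted state with simple hysteresis.

    Requires `frames_to_flip` consecutive frames of a new candidate
    state before switching, to avoid flicker. Thresholds are
    illustrative; the repo's actual logic may differ.
    """

    def __init__(self, ear_closed=0.21, frames_to_flip=15):
        self.ear_closed = ear_closed
        self.frames_to_flip = frames_to_flip
        self.state = "focused"
        self._counter = 0

    def update(self, ear):
        # Low eye aspect ratio -> eyes closing -> treat as distracted.
        candidate = "distracted" if ear < self.ear_closed else "focused"
        if candidate != self.state:
            self._counter += 1
            if self._counter >= self.frames_to_flip:
                self.state = candidate
                self._counter = 0
        else:
            self._counter = 0
        return self.state
```

The hysteresis counter is the design point: per-frame signals like EAR are noisy, so the state only flips after a sustained run of contrary frames.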

Setup

This project uses uv (recommended).

# inside the repo
uv sync

If you don’t use uv, you can still install from pyproject.toml using your preferred tool.

Note: On first run, detector.py downloads face_landmarker.task from the official MediaPipe model bucket.
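If you prefer to fetch the model yourself, a download step might look like the sketch below. The URL shown is the publicly listed MediaPipe FaceLandmarker model path; detector.py may pin a different variant, so confirm against its source.

```python
import pathlib
import urllib.request

# Publicly listed MediaPipe FaceLandmarker model; the variant detector.py
# actually downloads may differ -- check its source before relying on this.
MODEL_URL = ("https://storage.googleapis.com/mediapipe-models/"
             "face_landmarker/face_landmarker/float16/1/face_landmarker.task")


def ensure_model(path="face_landmarker.task"):
    """Download the model once; skip the download if the file exists."""
    p = pathlib.Path(path)
    if not p.exists():
        urllib.request.urlretrieve(MODEL_URL, p)
    return p
```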


Run

uv run python main.py

Press q to quit.


Benchmark (latency)

A quick benchmark of the FaceLandmarker inference step:

uv run python bench_mediapipe.py --frames 200

It prints average / p50 / p95 milliseconds per frame.
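One way to compute those summary statistics from a list of per-frame timings is sketched below; this is not the repo's actual bench_mediapipe.py code, just a minimal stand-in.

```python
def summarize_ms(samples):
    """Return avg / p50 / p95 for a list of per-frame latencies in ms."""
    s = sorted(samples)

    def pct(p):
        # Nearest-rank percentile on the sorted samples.
        k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
        return s[k]

    return {"avg": sum(s) / len(s), "p50": pct(50), "p95": pct(95)}
```

p50 and p95 matter more than the average for an interactive loop: the average hides occasional slow frames, which is exactly what the p95 surfaces.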


Files

  • main.py — OpenCV loop + overlay + glue
  • detector.py — MediaPipe FaceLandmarker (Tasks API) + EAR/head signals
  • bench_mediapipe.py — simple latency benchmark runner
  • vlm_coach.py, vlm.py — VLM layer (Ollama + qwen2.5vl)
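The EAR (eye aspect ratio) signal that detector.py derives is conventionally computed from six eye landmarks using the formula from Soukupová & Čech; a minimal sketch (detector.py's exact landmark indexing may differ):

```python
import math


def ear(p1, p2, p3, p4, p5, p6):
    """Eye aspect ratio from six (x, y) eye landmarks.

    Ordering: p1 = outer corner, p2/p3 = upper lid, p4 = inner corner,
    p5/p6 = lower lid. Near ~0.3 for an open eye, drops toward 0 on blink.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Two vertical eye openings over twice the horizontal eye width.
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))
```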

Notes

  • This is a prototype; it is fine to describe it as a prototype / exploration on a resume.
  • To make it a shareable artifact, add:
    • README run steps (already included)
    • a screenshot under assets/
    • benchmark output (this repo includes a runner)
