Score Vision — Real-Time Football Game State Recognition

Score Vision is a computer vision system for real-time football video analysis. It aims to cut the cost and time of manual game-state annotation (player/ball/referee tracking, pitch keypoints) by at least an order of magnitude, targeting sports analytics, broadcast, and data providers. The pipeline is designed to process multiple HD video streams in parallel on distributed inference nodes, with lightweight validation that keeps quality high while minimizing compute overhead.

Sample output

The pipeline takes a video stream and returns per-frame detections and tracking (players, ball, referee, goalkeeper, pitch keypoints). Example output on a football clip: sample-output.gif

Full video: 64aacaf72b1948b88a61982970d81c-10M.mp4

Overview

The framework combines detection and tracking (YOLO, HRNet, OSNet, ByteTrack) with lightweight validation (CLIP and keypoint checks) so that complex video analysis can run at scale without re-running full inference for verification. The result is fast, cost-effective game-state recognition suitable for live or batch processing.

Why football?

Football is a demanding domain: high-stakes, real-time decisions, crowded scenes, and varying camera angles. It is an ideal testbed for robust detection and tracking. The same pipeline design extends to other sports and general video analytics.

Market context

Manual video annotation in sports can cost $10–55 per minute, with complex scenarios requiring up to four hours of human labelling per minute of footage. Score Vision aims to cut these costs by 10× to 100× while improving speed and accuracy, serving clubs, broadcasters, betting operators, and analytics providers.

Technologies

Category	Technologies
Detection & tracking	YOLO (Ultralytics), HRNet, OSNet — custom-trained on football datasets for player, goalkeeper, referee, and ball detection; ByteTrack for multi-object tracking
Pitch / keypoints	YOLO-based pitch detection; keypoint estimation for field geometry
Backend & API	Python, FastAPI, Uvicorn; async frame streaming; request verification and concurrency control
Video & inference	OpenCV, supervision; async video download and frame-by-frame streaming; GPU (CUDA) / Apple Silicon (MPS) support
Validation	CLIP and VLM-based checks for bounding-box semantics; keypoint stability and reprojection scoring
Infrastructure	Hugging Face (model hosting); SQLite for challenge/response storage; PM2/Docker for deployment

Model choices

YOLO: Single-stage detection suited to real-time video; strong accuracy/speed trade-off for players and ball; tunable confidence and IoU for production.
HRNet: High-resolution feature maps across scales → better localization and keypoint/pose-style outputs; helps in crowded scenes and with small objects (e.g. ball, distant players).
OSNet: Lightweight person re-identification; improves identity consistency across frames when combined with ByteTrack, reducing ID switches in multi-player tracking.
Custom training: Models are trained on football-specific data (players, referees, goalkeepers, ball, pitch) to maximize accuracy and robustness to camera angles, lighting, and occlusions typical in match footage.
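
To make the re-identification idea concrete, here is a minimal sketch of appearance matching with cosine similarity, one common way to compare re-ID embeddings. It is an illustration only, not the project's tracking code: the function names, the 0.6 similarity floor, and the toy 2-D vectors are assumptions (real OSNet embeddings are much higher-dimensional).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_track_match(embedding, track_embeddings, min_similarity=0.6):
    """Return the ID of the most similar stored track, or None.

    `track_embeddings` maps a tracker ID to a reference embedding.
    `min_similarity` is a hypothetical floor below which no match is made.
    """
    best_id, best_sim = None, min_similarity
    for track_id, reference in track_embeddings.items():
        sim = cosine_similarity(embedding, reference)
        if sim >= best_sim:
            best_id, best_sim = track_id, sim
    return best_id


# Toy example: the new detection looks like track 1, not track 2.
tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
matched = best_track_match([0.9, 0.1], tracks)
unmatched = best_track_match([-1.0, 0.0], tracks)
```

A tracker can use a match like this to recover an identity after an occlusion, when box overlap alone is not enough to tell two players apart.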

Inference pipeline

Ingest: Receive video URL → download (with retries/timeouts) → validate file (OpenCV).
Frame stream: Stream frames asynchronously (OpenCV VideoCapture) with device-specific timeouts to avoid stalls on CPU/GPU.
Per-frame inference:
- Pitch: YOLO pitch model → extract pitch keypoints (field lines, corners) for downstream geometry.
- Players / ball / referee / goalkeeper: YOLO (and, where used, HRNet/OSNet) detection → confidence threshold and NMS → ByteTrack to assign stable IDs across frames.
- Output: Per-frame list of objects (bbox, class_id, tracker_id) and keypoints; coordinates rounded for compact JSON.
Response: Aggregate frames + processing time; optional signing/verification for integration with distributed or API-driven systems.
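
The per-frame output step can be sketched as below. This is a minimal illustration under stated assumptions, not the project's actual schema: only `bbox`, `class_id`, and `tracker_id` come from the description above, while `frame_number`, the `[x1, y1, x2, y2]` box layout, the one-decimal rounding, and the helper `frame_to_json` are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class TrackedObject:
    bbox: list[float]  # [x1, y1, x2, y2] in pixels (assumed layout)
    class_id: int
    tracker_id: int


def frame_to_json(frame_number, objects, keypoints, ndigits=1):
    """Serialize one frame, rounding coordinates to keep the JSON compact."""
    payload = {
        "frame_number": frame_number,
        "objects": [
            {**asdict(obj), "bbox": [round(v, ndigits) for v in obj.bbox]}
            for obj in objects
        ],
        "keypoints": [
            [round(x, ndigits), round(y, ndigits)] for x, y in keypoints
        ],
    }
    # Compact separators drop the whitespace json.dumps adds by default.
    return json.dumps(payload, separators=(",", ":"))


frame_json = frame_to_json(
    0,
    [TrackedObject(bbox=[10.123, 20.456, 30.789, 40.0], class_id=2, tracker_id=7)],
    [(100.04, 200.06)],
)
```

Rounding to a fixed number of decimals trades sub-pixel precision for smaller responses, which adds up over thousands of frames.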

Lightweight validation

Validation ensures output quality without re-running full inference:

Frame filtering & keypoint validation
Frames are filtered using pitch detection. A global scoring system evaluates keypoint accuracy (stability, plausibility, reprojection error).
Semantic bounding-box assessment
Selected frames are checked with CLIP-based object verification (players, ball, referees, goalkeepers). The result is a confidence-weighted quality score.

Scores are combined and normalized (0–1) for quality control and optional use in ranking or routing.
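
One possible way to combine the two checks is sketched below. The exact formula used by the project is not stated here, so everything in this block is an assumption: the function names, the equal default weighting, and the idea that each bounding-box check yields a detector confidence plus an agreement value in [0, 1].

```python
def confidence_weighted_score(checks):
    """Average agreement over checks, weighted by detector confidence.

    `checks` is a list of (confidence, agreement) pairs, where agreement
    is 1.0 when the semantic check confirms the label and 0.0 when not.
    """
    total_weight = sum(conf for conf, _ in checks)
    if total_weight == 0:
        return 0.0
    return sum(conf * agreement for conf, agreement in checks) / total_weight


def combine_scores(keypoint_score, bbox_score, keypoint_weight=0.5):
    """Weighted mean of the two quality scores, clamped to [0, 1]."""
    if not 0.0 <= keypoint_weight <= 1.0:
        raise ValueError("keypoint_weight must be in [0, 1]")
    combined = (
        keypoint_weight * keypoint_score + (1.0 - keypoint_weight) * bbox_score
    )
    return min(1.0, max(0.0, combined))


# Two checked boxes: a confident one that passed and a weaker one that failed.
bbox_quality = confidence_weighted_score([(0.9, 1.0), (0.6, 0.0)])
overall = combine_scores(keypoint_score=0.8, bbox_score=bbox_quality)
```

Weighting by confidence means a failed check on a box the detector was sure about costs more than a failed check on a marginal box.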

Main technical challenge: duplicate detections

Problem: During bounding-box inference, the same player or object was often detected multiple times in overlapping boxes, leading to duplicate detections and noisy tracking.

Approach:

Confidence (score) threshold: Only high-confidence detections are kept, reducing duplicates while retaining true positives.
Typical range: Confidence (ROI score) is set between 0.7 and 0.9 in production; the exact value is chosen per model and dataset from validation metrics (precision/recall, tracking quality).
Trade-off: Lower thresholds increase recall but add duplicate detections; higher thresholds reduce duplicates but can drop valid detections under occlusion or motion blur. Threshold choice depends on model training output (e.g. confidence distribution and calibration).
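
The approach above, together with the NMS step mentioned in the inference pipeline, can be sketched in plain Python. This is a minimal sketch, not the production code: the function names, the `(bbox, confidence)` tuple layout, and the 0.5 IoU threshold are assumptions; the 0.7 confidence default is the lower end of the range quoted above.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.7, iou_threshold=0.5):
    """Drop low-confidence boxes, then greedily suppress overlapping ones.

    `detections` is a list of (bbox, confidence) pairs. Boxes are visited
    from most to least confident; a box is kept only if it does not
    overlap an already kept box by `iou_threshold` or more.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for bbox, conf in candidates:
        if all(iou(bbox, kept_bbox) < iou_threshold for kept_bbox, _ in kept):
            kept.append((bbox, conf))
    return kept


detections = [
    ((0, 0, 10, 10), 0.95),         # player
    ((1, 1, 11, 11), 0.85),         # duplicate of the same player
    ((50, 50, 60, 60), 0.90),       # another player
    ((100, 100, 110, 110), 0.40),   # low-confidence box
]
kept = filter_detections(detections)
```

In this example the duplicate is removed by the overlap test and the last box by the confidence threshold, leaving one box per player.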

Performance metrics

Game State Higher Order Tracking Accuracy (GS-HOTA) is used for quality assessment:

GS-HOTA = √(Detection × Association)
Detection: Object detection accuracy.
Association: Tracking consistency across frames.
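
Because the formula is a geometric mean, a weak component pulls the score down more than an arithmetic mean would. A minimal sketch, assuming both components are already computed and lie in [0, 1]; the function name `gs_hota` is illustrative, and how Detection and Association themselves are measured is not shown.

```python
import math


def gs_hota(detection, association):
    """Geometric mean of the detection and association scores."""
    for name, value in (("detection", detection), ("association", association)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return math.sqrt(detection * association)


balanced = gs_hota(0.81, 0.64)   # sqrt(0.5184), about 0.72
imbalanced = gs_hota(0.9, 0.1)   # about 0.30; the arithmetic mean would be 0.50
```

A system cannot score well by excelling at detection while losing track of identities, or the reverse.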

Evaluation covers detection and tracking accuracy, consistency over time, and response latency. The system is designed to support quality-weighted scoring and volume-based contribution metrics where the pipeline is run in a distributed or multi-instance setup.

Architecture (high level)

Inference nodes: Receive video streams (or URLs), run the detection-and-tracking pipeline per frame, and return structured JSON (objects, keypoints, IDs).
Validation layer: Optional lightweight checks (CLIP, keypoint scoring) on a subset of frames to verify quality without full re-inference.
API & storage: FastAPI for ingestion and responses; SQLite or similar for challenge/response and state; optional Hugging Face for model hosting.
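
Checking only a subset of frames is what keeps the validation layer cheap. A minimal sketch of frame selection, assuming uniform random sampling; the function name, the seed parameter, and the sample size are hypothetical, and the project may select frames differently.

```python
import random


def sample_frames_for_validation(total_frames, sample_size, seed=None):
    """Pick a sorted subset of frame indices to validate.

    A fixed `seed` makes the selection reproducible; leaving it as None
    gives a different subset on each call.
    """
    if total_frames <= 0:
        return []
    k = max(0, min(sample_size, total_frames))
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_frames), k))


# Validate 20 frames out of a 750-frame clip instead of re-running all of them.
frame_indices = sample_frames_for_validation(750, 20, seed=42)
```

With these illustrative numbers the validator inspects under 3% of the frames in the clip.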

No blockchain or token terminology is required to run or integrate the pipeline; it can be deployed as a standard microservice or multi-instance API.

Quick start

Check that the system meets the requirements: Python 3.10+, with a GPU recommended for real-time inference (see min_compute.yaml for reference specs).
Clone the repository and install dependencies (see setup guides in the repo).
Configure environment (API keys, model paths, device).
Run the inference service and send video URLs or streams to the API.

Detailed setup for inference nodes (running the detection pipeline) and for validation / quality-check services is covered in the repository documentation. These guides may use internal role names; functionally they describe the “run the detector” and “run the validator” services.

Roadmap

Current: Game State Recognition pipeline, VLM-based validation, benchmarking.
Planned: Human-in-the-loop validation, additional footage types (e.g. grassroots), dashboard and leaderboard.
Future: Action spotting, match event captioning, advanced player tracking, integration APIs, adaptation to other sports, developer tools and SDKs.

Research

The paper "Score Vision: Enabling Complex Computer Vision Through Lightweight Validation - A Game State Recognition Framework for Live Football" describes the lightweight validation approach and its role in reducing computational overhead while maintaining accuracy at scale.

Contributing

Contributions are welcome. See Contributing Guidelines for code style, pull request process, and testing.

License

This project is licensed under the MIT License. See LICENSE for details.
