MatchMind - A Football Analytics Platform

MatchMind is a production-grade football data analysis pipeline. It takes raw StatsBomb event data through to actionable tactical insights, interactive dashboards, automated PDF reports, Monte Carlo match simulation, and coaching-ready presentations.

Key Features

High-performance data pipeline: COPY-protocol bulk loading + async concurrent fetching (3-4x faster)
Core analyses: Opponent profiling, player performance, xG/xA, passing networks, pressing heatmaps
Custom xG model: Trainable logistic regression + gradient boosting upgrade with hyperparameter tuning
Player similarity engine: Embedding-based player comparison for recruitment shortlisting
Possession chain analysis: Sequence modelling of build-up patterns, transitions, and dangerous possessions
Set-piece analysis: Corner/free-kick clustering, delivery zone classification, efficiency metrics
Match simulation: Monte Carlo outcome prediction with scoreline probabilities and in-match updates
Player development tracking: Longitudinal trajectory analysis, breakout identification, age curves
Spatial dominance: Voronoi tessellation, passing lanes, defensive coverage gaps
Video timestamp alignment: Event-to-broadcast sync, FFmpeg clip generation, SRT subtitles
Tracking data integration: Pitch control, physical metrics, and event synchronisation
FastAPI REST layer: Full API for external integrations (Tableau, mobile, Slack bots)
Interactive dashboard: Plotly Dash with opponent scouting and player performance views
Automated PDF reports: Match reports, opponent scouts, and player profiles via Jinja2 + WeasyPrint
Parquet cache layer: Instant notebook/dashboard loads bypassing database for read-heavy workflows
Materialised views: Pre-computed SQL aggregates refreshed after each ingestion
Table partitioning: Event table partitioned by season_id for scalable query performance
Performance-optimised: Indexed queries (70x speedup), COPY protocol (5-10x), async I/O (3-4x)
Production engineering: Type hints, unit tests, CI/CD, Docker, modular package structure

Quick Start

Prerequisites

Python 3.10+
uv package manager
PostgreSQL 14+ (or use Docker Compose)

Setup

# Clone the repo
git clone https://github.com/ronniepiku/MatchMind.git
cd MatchMind

# Install dependencies with uv
uv sync

# Start PostgreSQL (via Docker)
docker compose up -d db

# Configure environment
cp .env.example .env  # Edit credentials if needed

# Ingest sample data (3 matches for quick demo)
uv run fb-ingest --max-matches 3

# Or use async ingestion (3-4x faster for full datasets)
uv run fb-ingest-async --max-matches 3

# Run tests
uv run pytest

# Launch dashboard
uv run fb-dashboard
# → Open http://localhost:8050

# Generate a PDF report
uv run fb-report --type opponent --team-id 771 --season-id 106

Without Docker (Local PostgreSQL)

# Create database
createdb football_analytics
psql football_analytics < src/football_analytics/db/schema.sql

# Set connection in .env
echo "POSTGRES_HOST=localhost" >> .env

# Ingest and run
uv sync
uv run fb-ingest --competition-id 43 --season-id 106
uv run fb-dashboard

Full Docker Environment

docker compose up --build
# Dashboard: http://localhost:8050
# Postgres: localhost:5433

Project Structure

MatchMind/
├── src/football_analytics/
│   ├── __init__.py              # Package root (v0.3.0)
│   ├── config.py                # Environment & settings
│   ├── ingest.py                # Sync ingestion (COPY protocol)
│   ├── async_ingest.py          # Async concurrent ingestion (httpx)
│   ├── cache.py                 # Parquet cache layer
│   ├── api.py                   # FastAPI REST endpoints
│   ├── db/
│   │   ├── __init__.py          # Engine & session management
│   │   ├── schema.sql           # DDL + indexes + materialised views
│   │   ├── partitioning.sql     # Table partitioning by season_id
│   │   └── queries.sql          # Optimised analytical queries
│   ├── analysis/
│   │   ├── opponent_profile.py  # Opponent scouting reports
│   │   ├── player_performance.py # Player metrics & radar charts
│   │   ├── visualisations.py    # Static + interactive plots
│   │   ├── xg_model.py          # Custom xG model (logistic regression)
│   │   ├── xg_model_advanced.py # Gradient boosting xG (HistGradient)
│   │   ├── similarity.py        # Player similarity engine
│   │   ├── tracking.py          # Tracking data integration
│   │   ├── possession_chains.py # Possession sequence analysis
│   │   ├── set_pieces.py        # Set-piece analysis & clustering
│   │   ├── simulation.py        # Monte Carlo match simulation
│   │   ├── development.py       # Player development tracking
│   │   ├── spatial.py           # Voronoi tessellation & space control
│   │   └── video_alignment.py   # Video timestamp sync & clip generation
│   ├── dashboard/
│   │   └── app.py               # Plotly Dash application
│   └── reports/
│       ├── pdf_report.py        # Automated PDF generation
│       └── templates/           # Jinja2 HTML report templates
├── tests/                       # Unit tests (pytest)
│   ├── test_ingest.py           # Ingestion tests
│   ├── test_analysis.py         # Visualisation tests
│   ├── test_enhancements.py     # xG model, similarity, cache, tracking tests
│   └── test_v030_enhancements.py # v0.3 module tests (chains, simulation, API)
├── notebooks/                   # Reproducible Jupyter analyses
├── docs/                        # Technical docs
│   ├── PERFORMANCE.md           # Profiling results & optimisation notes
│   ├── VIDEO_INTEGRATION.md     # Video sync & clip extraction guide
│   └── TECHNICAL_APPENDIX.md    # Methods, metrics, assumptions
├── data/                        # Raw + processed + cache (gitignored)
├── .github/workflows/ci.yml     # GitHub Actions CI
├── Dockerfile                   # Container build
├── docker-compose.yml           # Full stack orchestration
├── pyproject.toml               # Dependencies (uv/hatch)
└── README.md                    # This file

Analyses & Modules

Module	Capability	Use Case
`opponent_profile.py`	Attack patterns, defensive shape, key threats	Pre-match preparation
`player_performance.py`	Season stats, rolling form, radar percentiles	Player reviews, recruitment
`xg_model.py`	Custom trainable xG model with evaluation	Model understanding, custom features
`xg_model_advanced.py`	Gradient boosting xG with hyperparameter tuning	Higher accuracy xG predictions
`similarity.py`	Cosine similarity on normalised player vectors	Recruitment shortlisting, replacement finding
`possession_chains.py`	Build-up sequence analysis, transition metrics	Tactical pattern identification
`set_pieces.py`	Corner/FK clustering, delivery zones, efficiency	Set-piece coaching & defence
`simulation.py`	Monte Carlo match/season outcome simulation	Pre-match strategy, projections
`development.py`	Multi-season trajectory analysis, breakout detection	Academy scouting, squad planning
`spatial.py`	Voronoi tessellation, passing lanes, coverage gaps	Space control, defensive analysis
`video_alignment.py`	Event-to-video sync, clip generation, SRT export	Coach video review workflow
`tracking.py`	Pitch control, physical metrics, space analysis	Advanced tactical analysis (with tracking data)
`visualisations.py`	Shot maps, passing networks, heatmaps, xG timeline	Reports, presentations, dashboards
`cache.py`	Parquet-based query result caching	Fast notebook/dashboard iteration
`pdf_report.py`	Automated PDF/HTML reports	Coach presentations, weekly reports
`api.py`	FastAPI REST endpoints with Pydantic validation	External integrations, mobile apps

Dashboard

Three interactive views accessible at http://localhost:8050:

Opponent Profile — Select a team and season to generate a scouting report with attack pattern breakdown, defensive zone analysis, and key player identification.
Player Performance — Individual player season summary, rolling form chart, and squad-level comparison table.
Team Scorecard — Holistic team performance dashboard with KPI cards (dangerous possession %, box entry rate, xG/chain), possession style breakdown pie chart, defensive actions by zone, set-piece efficiency, and transition metrics.

Report Generation

Generate coach-ready PDF reports from the command line:

# Post-match report
uv run fb-report --type match --match-id 3869685

# Opponent scouting document
uv run fb-report --type opponent --team-id 771 --season-id 106

# Player profile
uv run fb-report --type player --player-id 5503 --season-id 106

Reports are saved to data/reports/ as PDF when WeasyPrint is available, with HTML as the fallback format.

Performance

See docs/PERFORMANCE.md for full profiling results.

Key wins:

70x query speedup via composite indexes matching analytical query patterns
5-10x faster ingestion via PostgreSQL COPY protocol (staging table + upsert)
3-4x faster downloads via async concurrent fetching (httpx, concurrency=8)
Instant dashboard loads via materialised views (pre-computed aggregates)
50ms data access via Parquet cache layer (vs 800ms from PostgreSQL)
Vectorised Python — pandas operations replace row-level loops throughout
Connection pooling — Reuse connections across dashboard callbacks
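
The bounded-concurrency idea behind the async fetching figure above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of `async_ingest.py`: `fake_fetch` and `fetch_all` are hypothetical names, and the sleep stands in for a real HTTP request.

```python
import asyncio


async def fake_fetch(match_id: int) -> dict:
    """Hypothetical stand-in for one HTTP request for a match's events."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"match_id": match_id, "events": []}


async def fetch_all(match_ids: list[int], concurrency: int = 8) -> list[dict]:
    """Fetch every match while keeping at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(match_id: int) -> dict:
        # The semaphore blocks here once `concurrency` fetches are running.
        async with semaphore:
            return await fake_fetch(match_id)

    # gather returns results in the same order as the input match_ids.
    return await asyncio.gather(*(bounded(m) for m in match_ids))


results = asyncio.run(fetch_all(list(range(20))))
print(len(results), "matches fetched")
```

The semaphore caps how many requests hit the data source at once, which is what keeps a concurrent download fast without flooding the remote host.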

Video Integration

See docs/VIDEO_INTEGRATION.md for the full guide on:

Synchronising StatsBomb event timestamps with broadcast video
Extracting tactical clips with FFmpeg
Tagging clips with analytical context
Integrating clips into the dashboard

Custom xG Model

The project includes two xG models, a logistic regression baseline and a gradient boosting model:

from football_analytics.analysis.xg_model import train_xg_model
from football_analytics.analysis.xg_model_advanced import train_advanced_xg_model, compare_models

# Baseline model
baseline, metrics, cv_probs = train_xg_model(shots_df)
print(metrics.summary())
# Brier Score: 0.0712 | ROC-AUC: 0.782

# Advanced model (gradient boosting with hyperparameter tuning)
result = train_advanced_xg_model(shots_df, backend="hist", tune_hyperparams=True)
print(result.metrics.summary())
# Brier Score: 0.0648 | ROC-AUC: 0.821

# Compare models
comparison = compare_models(shots_df, baseline, result.model)
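
For readers new to xG, the kind of geometry a logistic-regression baseline typically consumes can be sketched as follows. This is not the feature set used by `xg_model.py`; the function names are hypothetical and the coefficients are illustrative placeholders, not fitted values. The only facts assumed are the StatsBomb pitch coordinates (120 x 80, goal posts at y = 36 and y = 44).

```python
import math

# StatsBomb coordinates: pitch is 120 x 80, attacking goal centred at (120, 40),
# posts at y = 36 and y = 44, so the goal is 8 units wide.
GOAL_X, GOAL_Y, GOAL_WIDTH = 120.0, 40.0, 8.0


def shot_features(x: float, y: float) -> tuple[float, float]:
    """Return (distance to goal centre, angle subtended by the posts in radians)."""
    dx = GOAL_X - x
    dy = y - GOAL_Y
    distance = math.hypot(dx, dy)
    # Angle between the two shooter-to-post vectors: tan(theta) = cross / dot.
    angle = math.atan2(GOAL_WIDTH * dx, dx**2 + dy**2 - (GOAL_WIDTH / 2) ** 2)
    return distance, angle


def logistic_xg(distance: float, angle: float,
                intercept: float = -1.0,
                w_distance: float = -0.1,
                w_angle: float = 1.5) -> float:
    """Logistic link over the two features. Weights are illustrative, NOT fitted."""
    z = intercept + w_distance * distance + w_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


distance, angle = shot_features(108.0, 40.0)  # penalty spot
print(f"distance={distance:.1f}, angle={angle:.3f} rad, xG={logistic_xg(distance, angle):.3f}")
```

A trained model would learn the intercept and weights from labelled shots; the sketch only shows how location becomes model input.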

Match Simulation

Monte Carlo simulation for match outcome prediction:

from football_analytics.analysis.simulation import simulate_match, format_simulation_report

result = simulate_match(home_xg=1.8, away_xg=1.2, home_team="Arsenal", away_team="Chelsea")
print(format_simulation_report(result))
# Arsenal win: 48.2% | Draw: 24.1% | Chelsea win: 27.7%
# Most likely score: 2-1 | Over 2.5: 58.3% | BTTS: 62.1%
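
The Monte Carlo idea can be illustrated with a small standard-library sketch. It assumes each side's goals follow an independent Poisson distribution with mean equal to its xG, which is a common simplification and not necessarily what `simulation.py` does, so its numbers will not match the report above. `sample_poisson` and `simulate` are hypothetical names.

```python
import math
import random
from collections import Counter


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(home_xg: float, away_xg: float, n_sims: int = 20_000, seed: int = 42) -> dict:
    """Simulate n_sims matches, assuming independent Poisson goal counts."""
    rng = random.Random(seed)
    scores: Counter = Counter()
    for _ in range(n_sims):
        scores[(sample_poisson(home_xg, rng), sample_poisson(away_xg, rng))] += 1
    return {
        "home_win": sum(c for (h, a), c in scores.items() if h > a) / n_sims,
        "draw": sum(c for (h, a), c in scores.items() if h == a) / n_sims,
        "away_win": sum(c for (h, a), c in scores.items() if h < a) / n_sims,
        "most_likely_score": scores.most_common(1)[0][0],
    }


outcome = simulate(home_xg=1.8, away_xg=1.2)
print(outcome)
```

Fixing the seed makes a run reproducible; raising `n_sims` tightens the estimates at the cost of runtime.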

Possession Chain Analysis

Analyse build-up patterns and dangerous possessions:

from football_analytics.analysis.possession_chains import (
    extract_possession_chains, chains_to_dataframe, compute_team_possession_profile
)

chains = extract_possession_chains(events_df)
chains_df = chains_to_dataframe(chains)
profile = compute_team_possession_profile(chains_df, team_id=771)
# → style_distribution, box_entry_rate, xg_per_chain, transition metrics
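
To make the chain concept concrete, here is a simplified sketch that groups consecutive events by team and derives a box entry rate. It is not the logic in `possession_chains.py`: the event keys (`team_id`, `x`, `y`, `xg`), the `Chain` fields, and the function names are all assumptions for illustration. StatsBomb events also carry their own possession index, which a real implementation would likely prefer over this grouping.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Chain:
    team_id: int
    n_events: int
    total_xg: float
    entered_box: bool


def in_box(x: float, y: float) -> bool:
    """Attacking penalty area in StatsBomb coordinates (x 102-120, y 18-62)."""
    return x >= 102 and 18 <= y <= 62


def extract_chains(events: list[dict]) -> list[Chain]:
    """Split an ordered event list into runs of consecutive events by one team."""
    chains = []
    for team_id, run in groupby(events, key=lambda e: e["team_id"]):
        run = list(run)
        chains.append(Chain(
            team_id=team_id,
            n_events=len(run),
            total_xg=sum(e.get("xg", 0.0) for e in run),
            entered_box=any(in_box(e["x"], e["y"]) for e in run),
        ))
    return chains


def box_entry_rate(chains: list[Chain], team_id: int) -> float:
    """Share of a team's chains that reached the penalty area."""
    own = [c for c in chains if c.team_id == team_id]
    return sum(c.entered_box for c in own) / len(own) if own else 0.0


# Hypothetical toy data: two chains for team 771, one for team 772.
events = [
    {"team_id": 771, "x": 40, "y": 40},
    {"team_id": 771, "x": 70, "y": 30},
    {"team_id": 771, "x": 105, "y": 38, "xg": 0.12},
    {"team_id": 772, "x": 20, "y": 40},
    {"team_id": 771, "x": 50, "y": 50},
    {"team_id": 771, "x": 60, "y": 55},
]
chains = extract_chains(events)
print(box_entry_rate(chains, team_id=771))
```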

REST API

Launch the FastAPI server for external integrations:

uv run fb-api
# → API docs at http://localhost:8000/docs

# Example: predict xG
curl -X POST http://localhost:8000/api/v1/xg/predict \
  -H "Content-Type: application/json" \
  -d '{"location_x": 105, "location_y": 40, "shot_body_part": "Foot"}'

# Example: simulate match
curl -X POST http://localhost:8000/api/v1/simulation/match \
  -H "Content-Type: application/json" \
  -d '{"home_xg": 1.8, "away_xg": 1.2, "home_team": "Liverpool", "away_team": "Everton"}'

Player Similarity

Find similar players for recruitment or tactical replacement:

from football_analytics.analysis.similarity import compute_player_vectors, find_similar_players

vectors = compute_player_vectors(season_id=106, engine=engine)
similar = find_similar_players(target_player_id=5503, player_vectors=vectors, position_group="FW")
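
The underlying technique, cosine similarity on normalised player vectors, can be sketched in plain Python. The feature choice, player IDs, and function names below are hypothetical, and z-score scaling is one assumed way to normalise; the project's `similarity.py` may differ on all of these.

```python
import math


def zscore_columns(vectors: dict[int, list[float]]) -> dict[int, list[float]]:
    """Standardise each feature across players so no single metric dominates."""
    ids = list(vectors)
    n_features = len(vectors[ids[0]])
    scaled: dict[int, list[float]] = {pid: [] for pid in ids}
    for j in range(n_features):
        column = [vectors[pid][j] for pid in ids]
        mean = sum(column) / len(column)
        # Fall back to 1.0 when a feature is constant, to avoid division by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
        for pid in ids:
            scaled[pid].append((vectors[pid][j] - mean) / std)
    return scaled


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(target_id: int, vectors: dict[int, list[float]], top_n: int = 3):
    """Rank the other players by cosine similarity to the target, best first."""
    scaled = zscore_columns(vectors)
    target = scaled[target_id]
    scores = [(pid, cosine(target, vec)) for pid, vec in scaled.items() if pid != target_id]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


# Hypothetical per-90 features: [goals, key passes, tackles]
players = {
    1: [0.60, 1.2, 0.5],
    2: [0.55, 1.1, 0.6],
    3: [0.10, 0.4, 2.8],
    4: [0.05, 2.5, 1.0],
}
print(most_similar(1, players))
```

Scaling before comparing matters because raw per-90 metrics live on different ranges; without it, the largest-valued feature would drive the ranking.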

Development

# Install with dev dependencies
uv sync --all-extras

# Run linter
uv run ruff check src/ tests/

# Run type checker
uv run mypy src/football_analytics/

# Run tests with coverage
uv run pytest --cov=football_analytics --cov-report=html

# Format code
uv run ruff format src/ tests/

Data Source

This project uses StatsBomb Open Data — freely available event-level football data under a non-commercial license. No proprietary data is included.

Default dataset: FIFA World Cup 2022 (competition_id=43, season_id=106).

Documentation

For Users

API Guide — Complete REST API reference with endpoint examples, error handling, integration patterns
Video Integration — How to sync events with broadcast video and extract clips
Technical Appendix — Detailed methodology for all metrics, models, and analyses

For Developers

Testing Guide — Test organization (73 tests), running tests, writing new tests, coverage targets
PERFORMANCE.md — Profiling results, optimisation strategies, benchmarks

License

MIT — see LICENSE.
