Statistical predictions and betting analysis for La Liga (Spain's Primera División).
Part of the Betting Oracle suite — built with the same architecture as the MLS Predictor and Premier League Predictor apps.
# 1. Clone and set up environment
git clone https://github.com/gmalbert/la-liga.git
cd la-liga
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS / Linux
pip install -r requirements.txt
# 2. Configure API keys
copy .env.example .env # Windows
# cp .env.example .env # macOS / Linux
# Edit .env and add your keys
# 3. Fetch data
python fetch_historical_csvs.py # 10 seasons of La Liga results
python fetch_upcoming_fixtures.py # upcoming PD fixtures
# 4. Run the app (model trains automatically on first load)
streamlit run predictions.py

Create a .env file in the project root (never commit this):
FOOTBALL_DATA_KEY=your_football_data_org_key
ODDS_API_KEY=your_the_odds_api_key
Get your keys:
- football-data.org: free tier covers La Liga (competition code PD)
- The Odds API: free tier, sport key: soccer_spain_la_liga
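How the app itself reads these keys is not shown in this README. As a rough, standard-library-only sketch of the idea, the snippet below parses a .env file and falls back to it only when a key is not already set in the environment. The names `parse_env_text`, `load_api_keys`, and `REQUIRED_KEYS` are hypothetical and not part of this repository.

```python
import os
from pathlib import Path

# The two keys named in this README; everything else here is illustrative.
REQUIRED_KEYS = ("FOOTBALL_DATA_KEY", "ODDS_API_KEY")


def parse_env_text(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_keys(env_path: str = ".env") -> dict:
    """Return the required keys, preferring real environment variables."""
    path = Path(env_path)
    file_values = parse_env_text(path.read_text(encoding="utf-8")) if path.exists() else {}
    keys = {name: os.environ.get(name) or file_values.get(name, "") for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
    return keys


# Usage (raises RuntimeError if a key is missing from both sources):
# keys = load_api_keys(".env")
```

Failing early with a clear message tends to be easier to debug than an HTTP 401 from a data fetch later on.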
la-liga/
├── predictions.py # App entry point — run this with streamlit
├── utils.py # Shared helpers: data, features, model, display
├── footer.py # Betting Oracle footer
├── themes.py # CSS theme (La Liga red)
├── team_name_mapping.py # Normalize team names across data sources
│
├── pages/
│ ├── predictions_tab.py # 🎯 Default page — upcoming match predictions
│ ├── fixtures.py # 🗓️ Fixtures, standings, live scores
│ ├── statistics.py # 📊 xG rankings, form, H2H, Copa congestion
│ ├── team_deep_dive.py # 🔬 Per-team deep dive
│ ├── raw_data.py # 📁 Historical data browser
│ ├── markets.py # 📈 Bookmaker odds and implied probs
│ └── best_bets.py # 💰 Model-vs-market value plays
│
├── data_files/
│ ├── logo.png
│ ├── combined_historical_data.csv # generated by fetch_historical_csvs.py
│ ├── upcoming_fixtures.csv # generated by fetch_upcoming_fixtures.py
│ ├── predictions_log.csv # generated by app
│ └── raw/
│     ├── fbref_team_xg.csv # generated by fetch_fbref_xg.py
│     ├── copa_fixtures.csv # generated by fetch_copa_fixtures.py
│     └── odds.csv # generated by fetch_odds.py
│
├── models/
│ ├── ensemble_model.pkl # auto-generated on first app run
│ └── metrics.json # model evaluation metrics
│
├── .streamlit/config.toml # La Liga red theme
├── docs/ # Roadmap documents
├── requirements.txt
├── .env # NOT committed — create from .env.example
└── .gitignore
- Data: 10 seasons of La Liga results from football-data.co.uk (SP1.csv files)
- Features: computed per team; rolling statistics apply shift(1) so a match's own result never leaks into its features:
  - Goals scored/conceded (last 5), win rate (last 10), momentum points (last 3), rest days
  - Vig-removed bookmaker implied probabilities (when odds are available)
- Model: soft-voting ensemble of XGBoost (weight 2), Random Forest (1.5), Gradient Boosting (1), and Logistic Regression (0.5)
- Predictions: a feature vector is built from each team's most recent stats; the model returns P(HomeWin), P(Draw), and P(AwayWin)
- Risk: Shannon entropy of the predicted probability distribution; higher entropy means a less certain prediction and therefore higher risk
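The leakage guard in the feature step can be illustrated with pandas. This is a minimal sketch on made-up data, not the code in utils.py: the column names (`team`, `date`, `goals_for`) and the function `add_rolling_goals` are assumptions chosen for the example.

```python
import pandas as pd

# Toy long-format data: one row per team per match (values are invented).
matches = pd.DataFrame({
    "team": ["A"] * 7 + ["B"] * 2,
    "date": list(pd.date_range("2024-08-01", periods=7, freq="7D"))
            + [pd.Timestamp("2024-08-02"), pd.Timestamp("2024-08-12")],
    "goals_for": [1, 2, 0, 3, 1, 2, 4, 0, 5],
})


def add_rolling_goals(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add a last-N goals average and rest days, using only earlier matches."""
    df = df.sort_values(["team", "date"]).copy()
    # shift(1) moves each value down one match within the team, so the
    # rolling mean for a match is built from previous matches only.
    df["goals_for_last5"] = df.groupby("team")["goals_for"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    # Rest days: gap between this match and the team's previous one.
    previous_date = df.groupby("team")["date"].shift(1)
    df["rest_days"] = (df["date"] - previous_date).dt.days
    return df


features = add_rolling_goals(matches)
print(features)
```

A team's first match has no history, so its rolling value is NaN; how the app fills or drops such rows is not stated here.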
The model trains automatically on first load. Delete models/ensemble_model.pkl to retrain.
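To make the soft-voting and risk steps concrete, here is a small pure-Python sketch. The weights are the ones listed above; the per-model probabilities are invented, and the function names `soft_vote` and `shannon_entropy_bits` are hypothetical. The entropy is reported in bits; the unit and any "high risk" threshold used by the app are not specified in this README.

```python
import math

# Ensemble weights as listed in this README.
WEIGHTS = {
    "xgboost": 2.0,
    "random_forest": 1.5,
    "gradient_boosting": 1.0,
    "logistic_regression": 0.5,
}

# Invented per-model outputs, ordered (home win, draw, away win).
model_probs = {
    "xgboost": [0.5, 0.3, 0.2],
    "random_forest": [0.4, 0.3, 0.3],
    "gradient_boosting": [0.6, 0.2, 0.2],
    "logistic_regression": [0.3, 0.4, 0.3],
}


def soft_vote(probs_by_model, weights):
    """Weighted average of class probabilities across models."""
    total = sum(weights[name] for name in probs_by_model)
    n_classes = len(next(iter(probs_by_model.values())))
    return [
        sum(weights[name] * probs[i] for name, probs in probs_by_model.items()) / total
        for i in range(n_classes)
    ]


def shannon_entropy_bits(probs):
    """Shannon entropy in bits; 0 for a certain outcome, log2(3) for a 3-way coin flip."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


blended = soft_vote(model_probs, WEIGHTS)
risk = shannon_entropy_bits(blended)
print(blended, risk)
```

With three outcomes the entropy ranges from 0 to log2(3), about 1.585 bits, so dividing by log2(3) gives a 0 to 1 risk score if a normalized value is more convenient.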
| Page | Description |
|---|---|
| Predictions | Upcoming fixture predictions with risk scoring and betting tips |
| Fixtures & Standings | Live La Liga table, upcoming fixtures, live scores |
| Statistics | xG rankings, last-5 form, H2H analyzer, Copa del Rey flag |
| Team Deep Dive | Per-team KPIs, home/away split, last 10 results |
| Raw Data | Filterable historical data browser |
| Markets | Bookmaker odds and implied probabilities |
| Best Bets | Value plays where model edge ≥ 4% vs market |
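
The Markets and Best Bets pages rest on two calculations: turning bookmaker odds into vig-free probabilities, and comparing those with the model. The sketch below assumes decimal odds, proportional normalization to remove the vig, and an edge defined as model probability minus fair market probability. All three are assumptions for illustration; the README only states the 4% threshold. The function names are hypothetical.

```python
def implied_probs(decimal_odds):
    """Convert decimal odds to probabilities and normalize away the overround."""
    raw = [1.0 / odds for odds in decimal_odds]
    overround = sum(raw)  # exceeds 1.0 when the bookmaker margin is included
    return [p / overround for p in raw]


def value_edges(model_probs, decimal_odds, min_edge=0.04):
    """Return outcomes where the model beats the vig-free market by min_edge or more."""
    market = implied_probs(decimal_odds)
    outcomes = ("home", "draw", "away")
    edges = {}
    for name, model_p, market_p in zip(outcomes, model_probs, market):
        edge = model_p - market_p
        if edge >= min_edge:
            edges[name] = edge
    return edges


# Invented example: home 1.90, draw 3.60, away 4.20.
fair = implied_probs([1.9, 3.6, 4.2])
edges = value_edges([0.56, 0.24, 0.20], [1.9, 3.6, 4.2])
print(fair, edges)
```

In this example only the home win clears the threshold, since the model's 0.56 sits roughly 5.5 points above the fair market probability.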
- Competition code: PD · FBref comp ID: 12 · Season: August–May
- All natural grass, so no surface flag is needed
- Copa del Rey congestion flag is La Liga-specific
- Avg goals: ~1.45 home / ~1.12 away
See docs/README.md for the full implementation roadmap.