gmalbert/bundesliga

La Liga Linea 🇪🇸

Statistical predictions and betting analysis for La Liga (Spain's Primera División).

Part of the Betting Oracle suite — built with the same architecture as the MLS Predictor and Premier League Predictor apps.


Quick Start

# 1. Clone and set up environment
git clone https://github.com/gmalbert/la-liga.git
cd la-liga
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS / Linux

pip install -r requirements.txt

# 2. Configure API keys
copy .env.example .env         # Windows
# cp .env.example .env         # macOS / Linux
# Edit .env and add your keys

# 3. Fetch data
python fetch_historical_csvs.py    # 10 seasons of La Liga results
python fetch_upcoming_fixtures.py  # upcoming PD fixtures

# 4. Run the app (model trains automatically on first load)
streamlit run predictions.py

Environment Variables

Create a .env file in the project root (never commit this):

FOOTBALL_DATA_KEY=your_football_data_org_key
ODDS_API_KEY=your_the_odds_api_key
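
As a rough illustration of how the two keys might be read at runtime, here is a minimal stdlib-only sketch. The helper names `load_env_file` and `get_api_key` are hypothetical and not taken from this repo; the app itself may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(name, env_file=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    value = os.environ.get(name) or load_env_file(env_file).get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to {env_file}")
    return value
```

Failing loudly on a missing key tends to be easier to debug than a rejected API request later in a fetch script.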

Get your keys from football-data.org (FOOTBALL_DATA_KEY) and The Odds API (ODDS_API_KEY).


Project Structure

la-liga/
├── predictions.py              # App entry point — run this with streamlit
├── utils.py                    # Shared helpers: data, features, model, display
├── footer.py                   # Betting Oracle footer
├── themes.py                   # CSS theme (La Liga red)
├── team_name_mapping.py        # Normalize team names across data sources
│
├── pages/
│   ├── predictions_tab.py      # 🎯 Default page — upcoming match predictions
│   ├── fixtures.py             # 🗓️ Fixtures, standings, live scores
│   ├── statistics.py           # 📊 xG rankings, form, H2H, Copa congestion
│   ├── team_deep_dive.py       # 🔬 Per-team deep dive
│   ├── raw_data.py             # 📁 Historical data browser
│   ├── markets.py              # 📈 Bookmaker odds and implied probs
│   └── best_bets.py            # 💰 Model-vs-market value plays
│
├── data_files/
│   ├── logo.png
│   ├── combined_historical_data.csv   # generated by fetch_historical_csvs.py
│   ├── upcoming_fixtures.csv          # generated by fetch_upcoming_fixtures.py
│   ├── predictions_log.csv            # generated by app
│   └── raw/
│       ├── fbref_team_xg.csv          # generated by fetch_fbref_xg.py
│       ├── copa_fixtures.csv          # generated by fetch_copa_fixtures.py
│       └── odds.csv                   # generated by fetch_odds.py
│
├── models/
│   ├── ensemble_model.pkl             # auto-generated on first app run
│   └── metrics.json                   # model evaluation metrics
│
├── .streamlit/config.toml             # La Liga red theme
├── docs/                              # Roadmap documents
├── requirements.txt
├── .env                               # NOT committed — create from .env.example
└── .gitignore

How the Model Works

  1. Data — 10 seasons of La Liga results from football-data.co.uk (SP1.csv files)
  2. Features — Rolling averages computed per team (shift(1) prevents leakage):
    • Goals scored/conceded (last 5), win rate (last 10), momentum points (last 3), rest days
    • Vig-removed bookmaker implied probabilities (when odds are available)
  3. Model — Soft-voting ensemble: XGBoost (weight 2) + Random Forest (1.5) + Gradient Boosting (1) + Logistic Regression (0.5)
  4. Predictions — Feature vector built from each team's most recent stats; model returns P(HomeWin / Draw / AwayWin)
  5. Risk — Shannon entropy of the probability distribution; high entropy = high risk
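
The leakage guard in step 2 is the easiest part to get wrong, so here is a small stdlib sketch of the idea. It is not the code in utils.py: the function name and the dict keys (`home`, `away`, `home_goals`, `away_goals`) are assumptions made for illustration. Each feature row is computed from a team's earlier matches only, and the current result is added to the history afterwards.

```python
from collections import defaultdict, deque


def rolling_goals_scored(matches, window=5):
    """Average goals scored by each side over its previous `window` matches.

    `matches` must be sorted by date. Each feature row uses only matches
    played before the current one, mirroring pandas' shift(1) before rolling.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    features = []
    for match in matches:
        home_past = history[match["home"]]
        away_past = history[match["away"]]
        features.append({
            "home_avg_scored": sum(home_past) / len(home_past) if home_past else None,
            "away_avg_scored": sum(away_past) / len(away_past) if away_past else None,
        })
        # Update history only after the features are recorded, so the current
        # result never leaks into its own feature row.
        home_past.append(match["home_goals"])
        away_past.append(match["away_goals"])
    return features
```

With pandas, the same effect usually comes from applying `shift(1)` to each team's series before `rolling(5).mean()`; a team's first match has no history, so its feature is empty (`None` here, `NaN` in pandas).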

The model trains automatically on first load. Delete models/ensemble_model.pkl to retrain.
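
To make steps 3 and 5 concrete, the sketch below combines per-model probabilities using the weights listed above and scores the result with Shannon entropy. It is an illustration under assumptions, not the repo's implementation: the function names and dictionary keys are hypothetical, and dividing the entropy by log(3) to get a 0 to 1 score is a choice made here, not something the README specifies.

```python
import math

# Ensemble weights as listed in step 3 of "How the Model Works"
WEIGHTS = {
    "xgboost": 2.0,
    "random_forest": 1.5,
    "gradient_boosting": 1.0,
    "logistic_regression": 0.5,
}


def soft_vote(model_probs, weights=WEIGHTS):
    """Weighted average of each model's [P(home), P(draw), P(away)] vector."""
    total = sum(weights[name] for name in model_probs)
    n_outcomes = len(next(iter(model_probs.values())))
    return [
        sum(weights[name] * probs[i] for name, probs in model_probs.items()) / total
        for i in range(n_outcomes)
    ]


def entropy_risk(probs):
    """Shannon entropy scaled to [0, 1] (assumed normalisation).

    0 means the forecast is certain; 1 means all outcomes are equally likely.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


if __name__ == "__main__":
    combined = soft_vote({
        "xgboost": [0.5, 0.3, 0.2],
        "random_forest": [0.6, 0.2, 0.2],
        "gradient_boosting": [0.4, 0.3, 0.3],
        "logistic_regression": [0.4, 0.4, 0.2],
    })
    print(combined, entropy_risk(combined))
```

scikit-learn's `VotingClassifier` with `voting="soft"` and a `weights` list produces the same kind of weighted average from its estimators' `predict_proba` outputs.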


Pages

| Page | Description |
| --- | --- |
| Predictions | Upcoming fixture predictions with risk scoring and betting tips |
| Fixtures & Standings | Live La Liga table, upcoming fixtures, live scores |
| Statistics | xG rankings, last-5 form, H2H analyzer, Copa del Rey flag |
| Team Deep Dive | Per-team KPIs, home/away split, last 10 results |
| Raw Data | Filterable historical data browser |
| Markets | Bookmaker odds and implied probabilities |
| Best Bets | Value plays where model edge ≥ 4% vs market |
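
The Markets and Best Bets pages rest on two small calculations: turning bookmaker odds into vig-removed probabilities, and comparing those with the model's. The sketch below shows one common way to do both. It assumes decimal odds, removes the margin by proportional normalisation, and defines edge as model probability minus market probability; the repo may use different conventions, and the function names are hypothetical.

```python
def implied_probabilities(odds_home, odds_draw, odds_away):
    """Convert decimal odds to probabilities that sum to 1.

    The raw inverses sum to more than 1 (the bookmaker's overround);
    dividing by that sum removes the margin proportionally.
    """
    raw = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    overround = sum(raw)
    return [p / overround for p in raw]


def value_plays(model_probs, market_probs, min_edge=0.04):
    """Return {outcome: edge} for outcomes where the model beats the market by min_edge."""
    outcomes = ("home", "draw", "away")
    plays = {}
    for outcome, model_p, market_p in zip(outcomes, model_probs, market_probs):
        edge = model_p - market_p
        if edge >= min_edge:
            plays[outcome] = edge
    return plays


if __name__ == "__main__":
    market = implied_probabilities(2.0, 3.5, 4.0)
    print(value_plays([0.55, 0.25, 0.20], market))
```

The default `min_edge=0.04` corresponds to the 4% threshold quoted for the Best Bets page.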

La Liga Notes

  • Competition code: PD · FBref comp ID: 12 · Season: August–May
  • All natural grass — no surface flag needed
  • Copa del Rey congestion flag is La Liga-specific
  • Avg goals: ~1.45 home / ~1.12 away

See docs/README.md for the full implementation roadmap.

About

Data analysis and sports betting for Bundesliga
